Samples from the distribution are not fully reparameterized, and straight-through gradients are partially or entirely unsupported. In this case, for purposes such as reinforcement learning or variational inference, it is generally safest to wrap the sample results in a stopGradients call and use policy gradients or a surrogate loss instead.
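As a rough illustration of the policy-gradient route, the sketch below uses the score-function (REINFORCE) estimator to recover the gradient of an expectation under a non-reparameterizable Bernoulli sample, computed by hand in NumPy rather than through autodiff. The objective `f` and the parameter value `p` are illustrative assumptions, not part of any particular library's API.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical downstream objective applied to the sample.
    return 3.0 * x + 1.0

p = 0.3          # Bernoulli parameter we differentiate with respect to
n = 200_000      # number of Monte Carlo samples

# Draw non-reparameterizable discrete samples; gradients cannot flow
# through this sampling step, which is why a surrogate is needed.
x = rng.binomial(1, p, size=n).astype(float)

# Score function: d/dp log Bernoulli(x; p) = x/p - (1 - x)/(1 - p).
score = x / p - (1.0 - x) / (1.0 - p)

# REINFORCE estimate of d/dp E[f(x)] = E[f(x) * d/dp log p(x)].
# In an autodiff framework, f(x) here would be wrapped in a
# stop-gradient call so only the log-probability term carries gradients.
grad_est = np.mean(f(x) * score)

# Analytic check: E[f(x)] = 3p + 1, so the true gradient is 3.
```

With the analytic gradient equal to 3, the Monte Carlo estimate lands close to it at this sample size, which is the sense in which the surrogate loss substitutes for the unavailable reparameterized gradient.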