ARMS: Antithetic-REINFORCE-Multi-Sample Gradient for Binary Variables
Alek Dimitriev¹, Mingyuan Zhou¹
Abstract
Estimating the gradients for binary variables is a task that arises frequently in various domains, such as training discrete latent v ...

If b is discrete, however, Tφ is not differentiable, and f is not always differentiable, e.g., if it corresponds to the reward function in reinforcement learning. This has inspired work
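For reference, the sketch below shows the plain score-function (REINFORCE) estimator for a single binary variable, assuming a logit parameterization b ~ Bernoulli(σ(φ)). It needs only evaluations of f, so it still applies when f is not differentiable. This is the standard textbook estimator, not the ARMS estimator; the function names and the test function f are illustrative assumptions. The exact gradient is included so the Monte Carlo estimate can be checked.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def exact_grad(f, phi: float) -> float:
    # E[f(b)] = p*f(1) + (1-p)*f(0) with p = sigmoid(phi),
    # and dp/dphi = p*(1-p), so the gradient is p*(1-p)*(f(1) - f(0)).
    p = sigmoid(phi)
    return p * (1.0 - p) * (f(1) - f(0))


def reinforce_grad(f, phi: float, num_samples: int, rng: random.Random) -> float:
    # Score-function estimator: d/dphi log Bernoulli(b; sigmoid(phi)) = b - sigmoid(phi).
    # Only f(b) is evaluated; f is never differentiated.
    p = sigmoid(phi)
    total = 0.0
    for _ in range(num_samples):
        b = 1 if rng.random() < p else 0
        total += f(b) * (b - p)
    return total / num_samples


# Illustrative usage with a hypothetical objective f(b) = (b - 0.45)^2.
f = lambda b: (b - 0.45) ** 2
print(exact_grad(f, 0.0))
print(reinforce_grad(f, 0.0, 200_000, random.Random(0)))
```

The estimator is unbiased, but its variance grows with the magnitude of f, which is what motivates variance-reduction schemes such as baselines and multiple correlated samples.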