Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy
Boyi Liu, Qi Cai, Zhuoran Yang, Zhaoran Wang
Abstract
Proximal policy optimization and trust region policy optimization (PPO and TRPO), with actor and critic parametrized by neural networks, achieve significant empirical success in deep reinforcement learning. However, due to nonconvexity, the global convergence of PPO and TRPO remains less un ...