Trust Region-Guided Proximal Policy Optimization
Yuhui Wang, Hao He, Xiaoyang Tan, Yaozhong Gan
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
MIIT Key Laboratory of Pattern Analysis and Machine Intelligence
Collaborative Innovation Center of Novel Software Technology and Industrialization
{y.wang, hugo, x.tan, yzgancn}@nuaa.edu.cn
Abstract
Proximal policy optimization (P ...









