Monotonic Robust Policy Optimization with Model Discrepancy
Yuankun Jiang 1 Chenglin Li 2 Wenrui Dai 1 Junni Zou 1 Hongkai Xiong 2
Abstract
State-of-the-art deep reinforcement learning (DRL) algorithms tend to overfit due to the model discrepancy between source and t ...

control tasks, e.g., playing computer games with human-level performance (Mnih et al., 2013; Silver et al., 2018) and traffic signal control (Chen et al., 2020). DRL often









