CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee
Tengyu Xu 1 Yingbin Liang 1 Guanghui Lan 2
Abstract
In safe reinforcement learning (SRL) problems, an ag ...

Mind, 2019) and recommendation system (Zheng et al., 2018), etc. In these settings, the agent is allowed to explore the entire state and action space to maximize the expected









