Emphatic Algorithms for Deep Reinforcement Learning
Ray Jiang 1 Tom Zahavy 1 Zhongwen Xu 1 Adam White 1 2
Matteo Hessel 1 Charles Blundell 1 Hado van Hasselt 1
Abstract
Off-policy learning allows us to learn about pos-

Many reinforcement learning (RL) agents learn off-policy to some extent, to learn about the greedy policy while exploring (Watkins, 1989), to make predictions about policies
...









