On-Policy Deep Reinforcement Learning for the Average-Reward Criterion
Yiming Zhang 1 Keith W. Ross 2 1
Abstract
We develop theory and algorithms for average-reward on-policy Reinforcement Learning (RL).

... Haarnoja et al., 2018) or in a queuing scenario (Tadepalli & Ok, 1994; Sutton & Barto, 2018), there is no natural separation of episodes and the agent-environment interaction continues in ...













