Near-Optimal Model-Free Reinforcement Learning
in Non-Stationary Episodic MDPs
Weichao Mao 1 Kaiqing Zhang 1 Ruihao Zhu 2 David Simchi-Levi 2 Tamer Basar 1
Abstract
We consider model-free reinforcement learning (RL) in non-stationary Markov decision processes.

... through sequential interactions with an initially unknown but fixed environment, usually modeled by a Markov Decision Process (MDP). In classical RL problems, the state ...













