Sample Complexity of Asynchronous Q-Learning:
Sharper Analysis and Variance Reduction
Gen Li (Tsinghua), Yuting Wei (CMU), Yuejie Chi (CMU), Yuantao Gu (Tsinghua), Yuxin Chen (Princeton)
Abstract
Asynchronous Q-learning aims to learn the optimal action-value function (or
Q-function) of a Markov decision process (MDP), based on a single trajectory of
Markovian samples induced by a behavior policy. Focusing on a γ-dis ...









