Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning
Gen Li¹, Changxiao Cai², Yuxin Chen², Yuantao Gu¹, Yuting Wei³, Yuejie Chi⁴
Abstract

Q-learning, which seeks to learn the optimal Q-

Q-learning (Borkar & Meyn, 2000; Jaakkola et al., 1994; Szepesvari, 1998; Tsitsiklis, 1994) have been primarily focused on the asymptotic regime, in which ...









