Provably Efficient Q-Learning
with Low Switching Cost
Yu Bai Tengyang Xie Nan Jiang Yu-Xiang Wang
Stanford University UIUC UC Santa Barbara
yub@stanford.edu {tx10, nanjiang}@illinois.edu yuxiangw@cs.ucsb.edu
Abstract
We take initial steps in studying PAC-MDP algorithms with limited adaptivity,
that is, algorithms that change their exploration policy as infrequently as possible
during r ...









