Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs
Max Simchowitz, UC Berkeley, msimchow@berkeley.edu
Kevin Jamieson, University of Washington, jamieson@cs.washington.edu
Abstract
This paper establishes that optimistic algorithms attain gap-dependent and
non-asymptotic logarithmic regret for episodic MDPs. In contrast to prior work,
our bounds do not suffer a dependence on diamet ...









