Online Stochastic Shortest Path with
Bandit Feedback and Unknown Transition Function
Aviv Rosenberg
Tel Aviv University, Israel
avivros007@gmail.com

Yishay Mansour
Tel Aviv University, Israel and Google Research, Israel
mansour.yishay@gmail.com
Abstract
We consider online learning in episodic loop-free Markov decision processes
(MDPs), where the loss function can change arbitr ...









