On the Optimality of Batch Policy Optimization Algorithms
Chenjun Xiao * 1 2 Yifan Wu * 3 Tor Lattimore 4 Bo Dai 2 Jincheng Mei 1 2 Lihong Li 5 Csaba Szepesvari 1 4 Dale Schuurmans 1 2
Abstract
Batch policy optimization considers leveraging existing data for policy construction before inter- ...

... a fixed dataset of previously collected experience, with no further environment interaction available. Interest in this problem has ...









