Provably Correct Optimization and Exploration with Non-linear Policies
Fei Feng¹, Wotao Yin¹, Alekh Agarwal², Lin Yang³
Abstract
Policy optimization methods remain a powerful workhorse in empirical Reinforcement Learning (RL), with a focus on neur ...

… rer & Geist, 2014; Geist et al., 2019; Abbasi-Yadkori et al., 2019; Agarwal et al., 2020c; Bhandari & Russo, 2019) when the agent has access to a distribution over states which is …