Is Pessimism Provably Efficient for Offline RL?
Ying Jin * 1 Zhuoran Yang * 2 Zhaoran Wang * 3
Abstract

We study offline reinforcement learning (RL), which aims to learn an optimal policy based on a dataset collected a priori. Due to the lack of ...

Vinyals et al., 2017) relies on two ingredients: (i) expressive function approximators, e.g., deep neural networks (LeCun et al., 2015), which approximate policies and values, and









