On Reward-Free RL with Kernel and Neural Function Approximations:
Single-Agent MDP and Markov Game
Shuang Qiu 1 Jieping Ye 1 Zhaoran Wang 2 Zhuoran Yang 3
Abstract
To achieve sample efficiency in reinforcement learning (RL), it necessitates to efficiently explore ...
... is large and function approximators such as neural networks are employed. To achieve sample efficiency, any RL algorithm needs to accurately learn the transit ...









