OptiDICE: Offline Policy Optimization via
Stationary Distribution Correction Estimation
Jongmin Lee 1*, Wonseok Jeon 2,3*, Byung-Jun Lee 4, Joelle Pineau 2,3,5, Kee-Eung Kim 1,6
Abstract
We consider the offline reinforcement learning ...

... and then to deploy the model with its parameter fixed when we are satisfied with training. This offline training allows us to address various operationa ...









