Hyperparameter Selection for Imitation Learning
Leonard Hussenot * 1 2 Marcin Andrychowicz * 1 Damien Vincent * 1 Robert Dadashi 1 Anton Raichuk 1
Lukasz Stafiniak 1 Sertan Girgin 1 Raphael Marinier 1 Nikola Momchev 1 Sabela Ramos 1 Manu Orsini 1
Olivier Bachem 1 Matthieu Geist 1 Olivier Pietquin 1
Abstract

the expert implements an optimal policy according to an
unknown reward function. This approach, also known ...









