Guided Exploration with Proximal Policy Optimization using a Single Demonstration
Gabriele Libardi 1, Sebastian Dittert 1, Gianni De Fabritiis 1 2
Abstract

Solving sparse reward tasks through exploration is one of the major challenges ... Learning from demonstrations allows one to bypass this problem directly, but it only works under specific conditions, e.g., a large number of demonstration trajectories or access to ...

