Self-Imitation Learning via Generalized Lower Bound Q-learning
Yunhao Tang
Columbia University
yt2541@columbia.edu
Abstract
Self-imitation learning motivated by lower-bound Q-learning is a novel and effective
approach for off-policy learning. In this work, we propose an n-step lower bound
which generalizes the original return-based lower-bound Q-learning, and introduce
a new fa ...









