UCB Momentum Q-learning:
Correcting the bias without forgetting
Pierre Ménard (1), Omar Darwiche Domingues (2), Xuedong Shang (2, 3), Michal Valko (2, 3, 4)
Abstract
We propose UCBMQ, Upper Confidence Bound Momentum Q-learning, a new algorithm for rein- ...

... balance the exploration of the environment and exploitation of the current knowledge to act optimally. In particular, we study the non-stationary setting where re ...









