Breaking the Deadly Triad with a Target Network
Shangtong Zhang 1 Hengshuai Yao 2 3 Shimon Whiteson 1
Abstract

The deadly triad refers to the instability of a reinforcement learning algorithm when it employs ...

...ping methods construct update targets for an estimate by using the estimate itself recursively, which usually has lower variance than Monte Carlo methods (Sutton, 1988). However, when an al ...













