Taylor Expansions of Discount Factors
Yunhao Tang¹, Mark Rowland², Remi Munos³, Michal Valko³
Abstract
In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the ...

... example, T could be the first time the MDP gets into a terminal state (e.g., a robot falls); when the MDP does not have a natural terminal state, T could be enforced as a deterministic ...

