RUDDER: Return Decomposition for Delayed Rewards
Jose A. Arjona-Medina Michael Gillhofer Michael Widrich
Thomas Unterthiner Johannes Brandstetter Sepp Hochreiter
LIT AI Lab
Institute for Machine Learning
Johannes Kepler University Linz, Austria
Abstract
We propose RUDDER, a novel reinforcement learning approach for delayed rewards
in finite Markov decision proc ...









