Two Time-scale Off-Policy TD Learning:
Non-asymptotic Analysis over Markovian Samples
Tengyu Xu
Department of Electrical and Computer Engineering
The Ohio State University
xu.3260@osu.edu
Shaofeng Zou
Department of Electrical Engineering
University at Buffalo, The State University of New York
szou3@buffalo.edu
Yingbin Liang
...









