Decentralized Single-Timescale Actor Critic on Zero-Sum Two-Player
Stochastic Games
Hongyi Guo 1 Zuyue Fu 1 Zhuoran Yang 2 Zhaoran Wang 1
Abstract
We study the global convergence and global optimality of the actor-c ...

... as a Markov decision process (Puterman, 2014, MDP), where an agent aims to learn an optimal policy via interaction with the environment. However, a wide range of real-world com- ...













