Online Learning in Unknown Markov Games
Yi Tian 1*, Yuanhao Wang 2*, Tiancheng Yu 1*, Suvrit Sra 1
Abstract
We study online learning in unknown Markov games, a problem that arises in episodic multi- ...

... control both/all players and aim to minimize the number of episodes required to find a good policy; and (2) the online setting, where we can only control one player (which we refer to as our player), tre ...

