DFAC Framework: Factorizing the Value Function via
Quantile Mixture for Multi-Agent Distributional Q-Learning
Wei-Fang Sun 1 2 3 Cheng-Kuang Lee 2 Chun-Yi Lee 1
Abstract
In fully cooperative multi-agent reinforcement ...

... optimize the overall rewards in each episode. Nevertheless, each agent’s policy may not converge owing to two main difficulties: (1) non-stationary environments ca ...













