Adapting to Delays and Data in Adversarial Multi-Armed Bandits
Andras Gyorgy 1 Pooria Joulani 1
Abstract
We consider the adversarial multi-armed bandit problem under delayed
feedback. We analyze variants of th ...

1. Introduction
The multi-armed bandit problem is a canonical model for sequential
decision making with limited feedback. In this model a learner makes a
sequence of actions. After ev-









