Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions
Tal Lancewicki * 1 Shahar Segal * 1 Tomer Koren 1 2 Yishay Mansour 1 2
Abstract
We study the stochastic Multi-Armed Bandit (MAB) p ...

tion, like in the classic stochastic MAB problem. However, the reward is observed only at time t + d_t, where d_t is a random variable denoting the delay at step t. This prob-













