Meta-Thompson Sampling
Branislav Kveton 1 Mikhail Konobeev 2 Manzil Zaheer 1 Chih-wei Hsu 1 Martin Mladenov 1 Craig Boutilier 1
Csaba Szepesvari 3 2
Abstract
Efficient exploration in bandits is a fundamental online learning problem. We propose a variant of ...

... which the prior distribution over mean arm rewards, a vital part of TS, is unknown. While the prior is unknown, the designer may know that it is one of two possible priors ...













