Best Model Identification: A Rested Bandit Formulation
Leonardo Cella¹, Massimiliano Pontil¹˒², Claudio Gentile³
Abstract

We introduce and analyze a best arm identification problem in the rested bandit setting, wherein ...

... 2002), the feedback generated when pulling an arm is modeled as a random variable sampled from a prescribed distribution associated with the selected arm. In contrast, in this pa ...













