Incentivized Bandit Learning with Self-Reinforcing User Preferences
Tianchen Zhou¹, Jia Liu¹, Chaosheng Dong², Jingyuan Deng²
Abstract
In this paper, we investigate a new multi-armed bandit (MAB) online learning model that considers real-world phenomena in man ...

... accumulates more positive feedback. For example, on a movie rental website, current customers tend to have more interest in Movie A, which has 500 positive reviews, compared ...









