Online Limited Memory Neural-Linear Bandits with Likelihood Matching
Ofir Nabati (1), Tom Zahavy (1, 2), Shie Mannor (1, 3)
Abstract
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role. ...

[...] exploration during the representation learning phase is still an open problem. The ε-greedy policy (Langford & Zhang, 2008) is simple to implement and widely used in practice













