Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with √T Regret
Asaf Cassel¹, Tomer Koren¹,²
Abstract
We consider the task of learning to control a linear dynamical system under fixed quadratic costs, ...

Model-based methods, which perform planning based on a system identification procedure that estimates the transition matrices, have been studied extensively ...
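To make the setting concrete, the following is a minimal sketch of the generic model-based (certainty-equivalence) pipeline referred to above: estimate the transition matrices by least squares from one trajectory, then plan by solving the discrete-time Riccati equation for the estimated system. It assumes the standard LQR formulation x_{t+1} = A x_t + B u_t + w_t with Gaussian noise w_t and per-step cost x_t^T Q x_t + u_t^T R u_t. The example matrices, noise levels, horizon T, and the helper names `estimate_dynamics` and `riccati_gain` are hypothetical choices for illustration. This is not the model-free policy gradient method of this paper.

```python
import numpy as np


def riccati_gain(A, B, Q, R, iters=1000):
    """Approximate the infinite-horizon LQR solution by iterating the
    discrete-time Riccati recursion. Returns the gain K (policy u = -K x)
    and the cost-to-go matrix P."""
    P = Q.copy()
    for _ in range(iters):
        # K = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P <- Q + A^T P A - A^T P B K
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P


def estimate_dynamics(A, B, T, sigma_u=1.0, sigma_w=0.1, seed=0):
    """Hypothetical system-identification step: roll out T steps of the true
    system under random Gaussian inputs, then fit [A_hat B_hat] by least squares."""
    rng = np.random.default_rng(seed)
    n, m = B.shape
    x = np.zeros(n)
    Z, X_next = [], []
    for _ in range(T):
        u = sigma_u * rng.standard_normal(m)  # exploratory input
        w = sigma_w * rng.standard_normal(n)  # process noise (assumed Gaussian)
        x_next = A @ x + B @ u + w
        Z.append(np.concatenate([x, u]))
        X_next.append(x_next)
        x = x_next
    # Solve min_Theta ||Z Theta - X_next||_F, where Theta^T = [A B].
    Theta, *_ = np.linalg.lstsq(np.array(Z), np.array(X_next), rcond=None)
    Theta = Theta.T
    return Theta[:, :n], Theta[:, n:]


if __name__ == "__main__":
    # Hypothetical stable example system with a single input.
    A = np.array([[0.9, 0.2], [0.0, 0.8]])
    B = np.array([[0.0], [1.0]])
    Q, R = np.eye(2), np.eye(1)

    A_hat, B_hat = estimate_dynamics(A, B, T=2000)
    K_hat, _ = riccati_gain(A_hat, B_hat, Q, R)  # certainty-equivalent controller
    K_star, _ = riccati_gain(A, B, Q, R)         # optimal controller for the true system
    print("K_hat  =", np.round(K_hat, 3))
    print("K_star =", np.round(K_star, 3))
```

Exciting the system with random inputs is one simple way to keep the least-squares problem well-posed. The Riccati fixed-point iteration converges in this example because Q is positive definite and the system is stabilizable.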