PID Accelerated Value Iteration Algorithm
Amir-massoud Farahmand 1 2 Mohammad Ghavamzadeh 3
Abstract
The convergence rate of Value Iteration (VI), a fundamental procedure in dynamic programming and reinforcement learning, for solving MDPs can ...

... approximation of the value or action-value functions, i.e., V_{k+1} ← T^π V_k or Q_{k+1} ← T Q_k. For discounted MDPs, the Bellman operator is a contraction, and standard fixed-point ite ...
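As a rough illustration of the conventional update Q_{k+1} ← T Q_k referred to above, the following is a minimal sketch of plain (unaccelerated) Value Iteration on a tiny tabular MDP. It is not the PID-accelerated algorithm of the title. The two-state, two-action MDP, its transition probabilities `P`, rewards `R`, the discount factor `GAMMA`, and all function names are assumptions made up for this example.

```python
# Hypothetical 2-state, 2-action discounted MDP (all numbers are illustrative).
GAMMA = 0.9  # assumed discount factor

# P[s][a] lists the next-state probabilities; R[s][a] is the expected reward.
P = [
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
]
R = [
    [1.0, 0.0],
    [0.0, 2.0],
]


def bellman_optimality(Q):
    """Apply (T Q)(s, a) = R(s, a) + GAMMA * sum_s' P(s'|s, a) * max_a' Q(s', a')."""
    V = [max(q_s) for q_s in Q]  # greedy state values under the current Q
    return [
        [R[s][a] + GAMMA * sum(p * v for p, v in zip(P[s][a], V))
         for a in range(len(Q[s]))]
        for s in range(len(Q))
    ]


def sup_distance(Q1, Q2):
    """Supremum-norm distance between two tabular action-value functions."""
    return max(abs(x - y) for r1, r2 in zip(Q1, Q2) for x, y in zip(r1, r2))


def value_iteration(tol=1e-8, max_iters=10_000):
    """Iterate Q_{k+1} <- T Q_k until successive iterates differ by less than tol."""
    Q = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(max_iters):
        Q_next = bellman_optimality(Q)
        if sup_distance(Q_next, Q) < tol:
            return Q_next, k + 1
        Q = Q_next
    return Q, max_iters


if __name__ == "__main__":
    Q_star, iters = value_iteration()
    print(f"stopped after {iters} iterations")
    for s, row in enumerate(Q_star):
        print(f"state {s}: {row}")
```

Because T is a GAMMA-contraction in the supremum norm, the distance to the fixed point shrinks by at least a factor of GAMMA per iteration, so the iteration count of this sketch grows as GAMMA approaches 1.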









