Contents
6.1. General Issues of Cost Approximation . . . . . . . . p. 327
6.1.1. Approximation Architectures . . . . . . . . . p. 327
6.1.2. Approximate Policy Iteration . . . . . . . . . p. 331
6.1.3. Direct and Indirect Approximation . . . . . . p. 336
6.1.4. Simplifications . . . . . . . . . . . . . . . p. 338
6.1.5. The Role of Contraction Mappings . . . . . . p. 344
6.1.6. The Role of Monte Carlo Simulation . . . . . . p. 345
6.2. Direct Policy Evaluation - Gradient Methods . . . . . p. 349
6.3. Projected Equation Methods . . . . . . . . . . . . p. 354
6.3.1. The Projected Bellman Equation . . . . . . . p. 355
6.3.2. Deterministic Iterative Methods . . . . . . . . p. 361
6.3.3. Simulation-Based Methods . . . . . . . . . . p. 365
6.3.4. LSTD, LSPE, and TD(0) Methods . . . . . . p. 367
6.3.5. Optimistic Versions . . . . . . . . . . . . . p. 375
6.3.6. Multistep Simulation-Based Methods . . . . . p. 376
6.3.7. Policy Iteration Issues - Exploration . . . . . . p. 382
6.3.8. Policy Oscillations - Chattering . . . . . . . . p. 390
6.3.9. A Synopsis . . . . . . . . . . . . . . . . . p. 400
6.4. Aggregation Methods . . . . . . . . . . . . . . . p. 405
6.4.1. Cost Approximation via the Aggregate Problem . p. 408
6.4.2. Cost Approximation via the Enlarged Problem . p. 411
6.5. Q-Learning . . . . . . . . . . . . . . . . . . . . p. 421
6.5.1. Convergence Properties of Q-Learning . . . . . p. 424
6.5.2. Q-Learning and Approximate Policy Iteration . . p. 428
6.5.3. Q-Learning for Optimal Stopping Problems . . . p. 431
6.5.4. Finite Horizon Q-Learning . . . . . . . . . . p. 436
6.6. Stochastic Shortest Path Problems . . . . . . . . . p. 438
6.7. Average Cost Problems . . . . . . . . . . . . . . p. 443
6.7.1. Approximate Policy Evaluation . . . . . . . . p. 443
6.7.2. Approximate Policy Iteration . . . . . . . . . p. 452
6.7.3. Q-Learning for Average Cost Problems . . . . . p. 454
6.8. Simulation-Based Solution of Large Systems . . . . . p. 458
6.8.1. Projected Equations - Simulation-Based Versions p. 458
6.8.2. Matrix Inversion and Regression-Type Methods . p. 462
6.8.3. Iterative/LSPE-Type Methods . . . . . . . . p. 464
6.8.4. Extension of Q-Learning for Optimal Stopping . p. 472
6.8.5. Bellman Equation Error-Type Methods . . . . p. 474
6.8.6. Oblique Projections . . . . . . . . . . . . . p. 478
6.8.7. Generalized Aggregation by Simulation . . . . . p. 480
6.9. Approximation in Policy Space . . . . . . . . . . . p. 485
6.9.1. The Gradient Formula . . . . . . . . . . . . p. 486
6.9.2. Computing the Gradient by Simulation . . . . p. 487
6.9.3. Essential Features of Critics . . . . . . . . . p. 488
6.9.4. Approximations in Policy and Value Space . . . p. 491
6.10. Notes, Sources, and Exercises . . . . . . . . . . . p. 492

