CONTENTS
Preface xi
Acknowledgments xv
1 The challenges of dynamic programming 1
1.1 A dynamic programming example: a shortest path problem 2
1.2 The three curses of dimensionality 3
1.3 Some real applications 6
1.4 Problem classes 9
1.5 The many dialects of dynamic programming 12
1.6 What is new in this book? 14
1.7 Bibliographic notes 15
2 Some illustrative models 17
2.1 Deterministic problems 18
2.2 Stochastic problems 23
2.3 Information acquisition problems 36
2.4 A simple modeling framework for dynamic programs 39
2.5 Bibliographic notes 42
Problems 43
3 Introduction to Markov decision processes 47
3.1 The optimality equations 48
3.2 Finite horizon problems 53
3.3 Infinite horizon problems 55
3.4 Value iteration 56
3.5 Policy iteration 61
3.6 Hybrid value-policy iteration 62
3.7 The linear programming method for dynamic programs 63
3.8 Monotone policies* 64
3.9 Why does it work?** 70
3.10 Bibliographic notes 85
Problems 85
4 Introduction to approximate dynamic programming 91
4.1 The three curses of dimensionality (revisited) 92
4.2 The basic idea 93
4.3 Sampling random variables 100
4.4 ADP using the post-decision state variable 101
4.5 Low-dimensional representations of value functions 107
4.6 So just what is approximate dynamic programming? 110
4.7 Experimental issues 112
4.8 Dynamic programming with missing or incomplete models 117
4.9 Relationship to reinforcement learning 117
4.10 But does it work? 119
4.11 Bibliographic notes 120
Problems 121
5 Modeling dynamic programs 127
5.1 Notational style 129
5.2 Modeling time 130
5.3 Modeling resources 133
5.4 The states of our system 137
5.5 Modeling decisions 144
5.6 The exogenous information process 149
5.7 The transition function 157
5.8 The contribution function 164
5.9 The objective function 166
5.10 A measure-theoretic view of information** 168
5.11 Bibliographic notes 170
Problems 171
6 Stochastic approximation methods 177
6.1 A stochastic gradient algorithm 179
6.2 Some stepsize recipes 181
6.3 Stochastic stepsizes 188
6.4 Computing bias and variance 193
6.5 Optimal stepsizes 195
6.6 Some experimental comparisons of stepsize formulas 202
6.7 Convergence 207
6.8 Why does it work?** 209
6.9 Bibliographic notes 218
Problems 219
7 Approximating value functions 225
7.1 Approximation using aggregation 226
7.2 Approximation methods using regression models 235
7.3 Recursive methods for regression models 246
7.4 Neural networks 252
7.5 Batch processes 257
7.6 Why does it work?** 261
7.7 Bibliographic notes 264
Problems 266
8 ADP for finite horizon problems 269
8.1 Strategies for finite horizon problems 270
8.2 Q-learning
8.3 Temporal difference learning 277
8.4 Policy iteration 280
8.5 Monte Carlo value and policy iteration 282
8.6 The actor-critic paradigm 283
8.7 Bias in value function estimation 284
8.8 State sampling strategies 288
8.9 Starting and stopping 291
8.10 A taxonomy of approximate dynamic programming strategies 294
8.11 Why does it work?** 296
8.12 Bibliographic notes 296
Problems 297
9 Infinite horizon problems 301
9.1 From finite to infinite horizon 302
9.2 Algorithmic strategies 302
9.3 Stepsizes for infinite horizon problems 311
9.4 Error measures 313
9.5 Direct ADP for online applications 315
9.6 Finite horizon models for steady state applications 315
9.7 Why does it work?** 317
9.8 Bibliographic notes 317
Problems 317
10 Exploration vs. exploitation 321
10.1 A learning exercise: the nomadic trucker 321
10.2 Learning strategies 324
10.3 A simple information acquisition problem 328
10.4 Gittins indices and the information acquisition problem 330
10.5 Variations 335
10.6 The knowledge gradient algorithm 337
10.7 Information acquisition in dynamic programming 340
10.8 Bibliographic notes 343
Problems 344
11 Value function approximations for special functions 349
11.1 Value functions versus gradients 350
11.2 Linear approximations 351
11.3 Piecewise linear approximations 353
11.4 The SHAPE algorithm 357
11.5 Regression methods 360
11.6 Cutting planes* 363
11.7 Why does it work?** 375
11.8 Bibliographic notes 381
Problems 382
12 Dynamic resource allocation 385
12.1 An asset acquisition problem 386
12.2 The blood management problem 390
12.3 A portfolio optimization problem 399
12.4 A general resource allocation problem 402
12.5 A fleet management problem 414
12.6 A driver management problem 420
12.7 Bibliographic references 424
Problems 425
13 Implementation challenges 431
13.1 Will ADP work for your problem? 431
13.2 Designing an ADP algorithm for complex problems 432
13.3 Debugging an ADP algorithm 434
13.4 Convergence issues 435
13.5 Modeling your problem 436
13.6 Online