Advanced Dynamic Programming
Instructor: Dr. Rajesh Ganesan
Eng Bldg. Room 2217
Phone: (703) 993-1693
Fax: (703) 993-1521
Email: rganesan at gmu dot edu
Week 1
Textbook: https://onlinelibrary.wiley.com/doi/book/10.1002/9781118029176 (ADP by Warren Powell)
Handbook: https://onlinelibrary.wiley.com/doi/book/10.1002/9780470544785 by Jennie Si et al.
Examples for DP/Approx DP: read the chapter by Paul Werbos in Si et al., pages 3-44
Weeks 2 - 3
DP refresher Notes
DP example question
Excel Example value iteration for MDP
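A minimal sketch of synchronous value iteration for a discounted-cost MDP, written in Python rather than Excel. The two-state machine example, the transition matrices `P`, the costs `C`, and the discount factor are made-up illustration values, not the numbers from the course spreadsheet.

```python
def value_iteration(P, C, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Synchronous value iteration for a discounted-cost MDP.

    P[a][s][s2] is the probability of moving from s to s2 under action a
    (the TPM); C[s][a] is the one-step cost of action a in state s.
    """
    n_states, n_actions = len(C), len(C[0])

    def q_value(V, s, a):
        # One-step cost plus discounted expected value of the next state.
        return C[s][a] + gamma * sum(P[a][s][s2] * V[s2] for s2 in range(n_states))

    V = [0.0] * n_states
    for _ in range(max_iter):
        # Sync update: every state is updated from the same old V.
        V_new = [min(q_value(V, s, a) for a in range(n_actions))
                 for s in range(n_states)]
        change = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if change < tol:
            break
    policy = [min(range(n_actions), key=lambda a: q_value(V, s, a))
              for s in range(n_states)]
    return V, policy


# Hypothetical data: state 0 = good machine, state 1 = bad machine;
# action 0 = keep, action 1 = replace.
P = [
    [[0.7, 0.3], [0.0, 1.0]],  # keep
    [[1.0, 0.0], [1.0, 0.0]],  # replace
]
C = [[0.0, 5.0], [4.0, 5.0]]
V, policy = value_iteration(P, C)
```

With these assumed numbers the resulting policy keeps a good machine and replaces a bad one.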
Weeks 4 - 6
Chapters 1-3
ADP motivation
Chapter 4
ADP Dialect 1 - Asynchronous update with TPM, use of lowercase (little) v, use of pre-decision states (Fig 4.2 page 120)
ADP Dialect 2 - Q-learning around pre-decision state (no TPM), use of lowercase (little) q, Robbins-Monro stochastic approximation scheme, use of learning parameter alpha, use of pre-decision states
ADP Dialect 3 - Asynchronous update with TPM, use of lowercase (little) v, Robbins-Monro stochastic approximation scheme, use of learning parameter alpha, use of pre-decision states (Fig 4.4 page 128)
ADP Dialect 4 - Asynchronous update (no TPM), use of lowercase (little) v, Robbins-Monro stochastic approximation scheme, use of learning parameter alpha, use of post-decision states (Fig 4.7 page 141)
ADP Dialect 5 - Q-learning around post-decision state (no TPM), use of lowercase (little) q, Robbins-Monro stochastic approximation scheme, use of learning parameter alpha, use of post-decision states
SARSA - evaluate a policy
RTDP - start from an optimistic value of a state (same as ADP Dialect 1)
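A rough sketch of the Q-learning idea in Dialect 2: the learner never sees a TPM, only sampled transitions, and each sampled estimate q_hat (little q) is smoothed into Q with the Robbins-Monro update Q = (1-alpha)Q + alpha q_hat. The simulator `machine_step`, its costs and probabilities, the purely random action choice, and the 1/n learning parameter are all assumptions made for illustration, not the course's MATLAB code.

```python
import random


def q_learning(step, n_states, n_actions, gamma=0.9, n_iter=20_000, seed=0):
    """Tabular Q-learning for cost minimization around pre-decision states."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    s = 0
    for _ in range(n_iter):
        a = rng.randrange(n_actions)         # pure exploration (assumption)
        cost, s_next = step(s, a, rng)       # sampled transition, no TPM
        visits[s][a] += 1
        alpha = 1.0 / visits[s][a]           # learning parameter alpha = 1/n
        q_hat = cost + gamma * min(Q[s_next])            # little q
        Q[s][a] = (1 - alpha) * Q[s][a] + alpha * q_hat  # Robbins-Monro update
        s = s_next
    return Q


def machine_step(s, a, rng):
    """Hypothetical simulator: state 0 = good, 1 = bad; action 0 = keep, 1 = replace."""
    if a == 1:
        return 5.0, 0                        # pay for replacement, machine is good
    if s == 0:
        return 0.0, (1 if rng.random() < 0.3 else 0)
    return 4.0, 1                            # a bad machine stays bad if kept


Q = q_learning(machine_step, n_states=2, n_actions=2)
```

A greedy policy can be read off as the action with the smallest Q value in each state. With a 1/n learning parameter the estimates can converge slowly, so the number of iterations matters.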
Week 7
Summary of models and coding
MATLAB
DP - Value iteration, discounted cost criterion, uses TPM, sync update so all possible next states are evaluated in each iteration
ADP Dialect 1 - Figure 4.2 page 120 value iteration_ADP, uses TPM, V = v, async update, TPM used for finding next state
ADP Dialect 3 - Figure 4.4 page 128 value iteration_ADP2, uses TPM, V = (1-alpha)V + alpha v, async update, TPM used for finding next state
ADP Dialect 4 - Figure 4.7 page 141 value iteration_ADP 3, no TPM, uses post-decision state, V = (1-alpha)V + alpha v, async update, needs some uncertainty model for finding next state
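To make the difference between the sync and async updates in this summary concrete, here is a small Python sketch in the spirit of Dialect 3: only the currently visited state is updated, with V = (1-alpha)V + alpha v, and the TPM is used both to compute v and to sample the next state. It is not a reproduction of Figure 4.4 or of the course's MATLAB file; the data `P` and `C`, the constant alpha, and the greedy action choice are assumptions.

```python
import random


def adp_async_smoothed(P, C, gamma=0.9, alpha=0.1, n_iter=20_000, seed=0):
    """Asynchronous value update with smoothing, using the TPM P[a][s][s2]."""
    rng = random.Random(seed)
    n_states, n_actions = len(C), len(C[0])
    V = [0.0] * n_states
    s = 0
    for _ in range(n_iter):
        # Little v: best one-step lookahead value at the visited state only.
        q = [C[s][a] + gamma * sum(P[a][s][s2] * V[s2] for s2 in range(n_states))
             for a in range(n_actions)]
        a_best = min(range(n_actions), key=q.__getitem__)
        V[s] = (1 - alpha) * V[s] + alpha * q[a_best]   # V = (1-alpha)V + alpha v
        # Async: move to one sampled next state instead of sweeping all states.
        s = rng.choices(range(n_states), weights=P[a_best][s])[0]
    return V


# Hypothetical machine example: state 0 = good, 1 = bad; action 0 = keep, 1 = replace.
P = [
    [[0.7, 0.3], [0.0, 1.0]],  # keep
    [[1.0, 0.0], [1.0, 0.0]],  # replace
]
C = [[0.0, 5.0], [4.0, 5.0]]
V = adp_async_smoothed(P, C)
```

Setting alpha = 1 in this sketch gives the V = v update listed for Dialect 1.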
Week 8
Chapters 5 and 6 - Modeling and Policies
Week 9
Chapter 7 - Policy search
Stopping criteria with Mean square error
Fig 4.7 page 141 with MSE page 255 value iteration_ADP3x for machine replacement problem.
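A possible reading of the MSE stopping rule, sketched in Python. The outline does not define the error here, so this assumes it is the mean squared difference between successive value estimates; the function names, the tolerance, and the toy `update` are hypothetical.

```python
def mean_square_error(v_old, v_new):
    """Mean squared difference between two value vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(v_old, v_new)) / len(v_old)


def run_until_mse_small(update, v0, tol=1e-6, max_iter=100_000):
    """Apply `update` repeatedly until successive estimates differ by MSE < tol."""
    v = list(v0)
    err = float("inf")
    for n in range(1, max_iter + 1):
        v_next = update(v)
        err = mean_square_error(v, v_next)
        v = v_next
        if err < tol:
            return v, n, err
    return v, max_iter, err


# Toy update with fixed point 2.0 for every component (illustration only).
v_final, n_used, last_err = run_until_mse_small(
    lambda v: [0.5 * x + 1.0 for x in v], [0.0, 0.0]
)
```

In an async scheme only one state changes per iteration, so the MSE between two successive iterates can be small well before convergence. Comparing estimates several iterations apart is one way around that.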
Week 10
Inventory control example: see handout, Excel for inventory control
MATLAB using Fig 4.7 page 141 with uniform demand probability. No knowledge of uncertainty is available
MATLAB using Fig 4.7 page 141 with Poisson (lambda = 1) demand probability from the question. Knowledge of uncertainty is available
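The two demand models above can be sketched in Python as samplers that feed the transition from a post-decision state to the next pre-decision state. The maximum uniform demand, the lost-sales assumption (inventory cannot go negative), and the function names are illustrative assumptions, not details from the handout.

```python
import math
import random


def uniform_demand(rng, max_demand=4):
    """Demand drawn uniformly from 0..max_demand (no knowledge of uncertainty)."""
    return rng.randint(0, max_demand)


def poisson_demand(rng, lam=1.0):
    """Poisson(lam) demand via Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def next_inventory(inventory, order, demand):
    """Post-decision state is inventory + order; demand then arrives."""
    post_decision = inventory + order
    return max(0, post_decision - demand)    # unmet demand is lost (assumption)


rng = random.Random(0)
poisson_draws = [poisson_demand(rng) for _ in range(20_000)]
uniform_draws = [uniform_demand(rng) for _ in range(1_000)]
```

Swapping one sampler for the other changes only how the next state is generated; the value update around the post-decision state stays the same.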
Chapter 11
alpha decay
MATLAB - alpha decay
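A few common ways to decay the learning parameter alpha, sketched in Python. The parameter values (`a = 10`, constant `0.1`) are arbitrary examples, and these rules are not necessarily the ones in the course's MATLAB file.

```python
def harmonic(n):
    """alpha_n = 1/n: smoothing with this rule gives the sample average."""
    return 1.0 / n


def generalized_harmonic(n, a=10.0):
    """alpha_n = a / (a + n - 1): larger a slows the decay."""
    return a / (a + n - 1)


def constant(n, alpha=0.1):
    """No decay: keeps tracking, but the estimate never settles completely."""
    return alpha


def smooth(observations, stepsize):
    """Apply V = (1 - alpha_n) V + alpha_n v over a sequence of observations."""
    estimate = 0.0
    for n, obs in enumerate(observations, start=1):
        alpha = stepsize(n)
        estimate = (1 - alpha) * estimate + alpha * obs
    return estimate


average = smooth([2.0, 4.0, 6.0], harmonic)
```

All three rules start at or near 1 and differ only in how fast they shrink, which trades off early responsiveness against late-stage noise.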
Week 11
Chapter 12
Exploration vs exploitation (learning)
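One simple way to balance exploration and exploitation is an epsilon-greedy rule with a decaying epsilon, sketched below in Python for a cost-minimizing problem. The decay schedule and its parameters are assumptions chosen for illustration.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else the lowest-cost one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))                    # explore
    return min(range(len(q_row)), key=q_row.__getitem__)    # exploit


def decayed_epsilon(n, eps0=1.0, decay=0.01):
    """Hypothetical schedule: explore heavily at first, less as n grows."""
    return eps0 / (1.0 + decay * n)


rng = random.Random(0)
greedy_action = epsilon_greedy([3.0, 1.0, 2.0], 0.0, rng)
random_actions = [epsilon_greedy([3.0, 1.0, 2.0], 1.0, rng) for _ in range(200)]
```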
Chapter 8
VFA - Value Function Approximation
MATLAB code with 4 schemes
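As a generic illustration of value function approximation (not one of the four schemes in the MATLAB code, which this outline does not describe), the Python sketch below fits a linear model V(s) ≈ theta · phi(s) by stochastic gradient steps. The basis functions, target values, and stepsize are made up.

```python
def features(s, n_states):
    """Hypothetical basis: a constant term and the state scaled to [0, 1]."""
    return [1.0, s / (n_states - 1)]


def vfa_value(theta, phi):
    """Approximate value as a linear combination of basis functions."""
    return sum(t * f for t, f in zip(theta, phi))


def sgd_update(theta, phi, v_hat, alpha):
    """Move theta against the gradient of 0.5 * (theta . phi - v_hat)^2."""
    error = vfa_value(theta, phi) - v_hat
    return [t - alpha * error * f for t, f in zip(theta, phi)]


n_states = 5
theta = [0.0, 0.0]
for _ in range(2_000):                    # repeated sweeps over the states
    for s in range(n_states):
        v_hat = 1.0 + 2.0 * s             # noise-free target, illustration only
        theta = sgd_update(theta, features(s, n_states), v_hat, alpha=0.1)
```

Two parameters replace a table of five values here; the saving grows with the size of the state space, at the price of approximation error when the true values are not linear in the basis.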
Weeks 12-13
Chapters 9, 10, 13, and 15
VFA with diffusion wavelet
VFA DW Theory
Diffusion wavelets DW code for best basis - multiple levels - MATLAB
Excel to show value determination
ADP with scaling and wavelet functions code - MATLAB
VFA Steps implementation
Week 14
Project Presentation - Week of Nov 30 - Dec 4. We will pick a 2.5-hour time slot that suits everyone.
*******************************************************************************
Project - Individual or 2 in a group. Email me your group and the title of your work by Dec 1
Pick one application from the following books and prepare a 5-10 minute overview to present in class. You are welcome to prepare slides. Provide a 1-2 page write-up by Dec 9 via email only.
For slides and report use this structure: background describing the problem, objective, state and its variables, action variables, uncertainty and how the state transitions to another state, reward/penalty (contribution function), transition probability, references, etc.
https://onlinelibrary.wiley.com/doi/pdf/10.1002/9780470544785 - Handbook of ADP, Jennie Si et al. (eds) - See Part III for applications. You may also use Part II.
https://onlinelibrary.wiley.com/doi/pdf/10.1002/9781118029176 - Textbook, ADP by Warren Powell - Chapter 14
http://incompleteideas.net/book/bookdraft2017nov5.pdf - RL by Sutton and Barto - Chapter 16 - computer games
********************************************************************************