Advanced Dynamic Programming
Instructor: Dr. Rajesh Ganesan
Eng Bldg. Room 2217
Phone: (703) 993-1693
Fax: (703) 993-1521
Email: rganesan at gmu dot edu
Week 1
Textbook: https://onlinelibrary.wiley.com/doi/book/10.1002/9781118029176 (ADP by Warren Powell)
Handbook: https://onlinelibrary.wiley.com/doi/book/10.1002/9780470544785 by Jennie Si et al.
Examples for DP/Approximate DP: read the chapter by Paul Werbos in Si et al., pages 3-44
Weeks 2 - 3
DP refresher Notes
DP example question
Excel Example value iteration for MDP
Weeks 4 - 9
Chapters 1-3
ADP motivation
Chapter 4
ADP Dialect 1 - Asynchronous update with TPM; uses lowercase (little) v and pre-decision states (Fig. 4.2, page 120)
ADP Dialect 2 - Q-learning around the pre-decision state (no TPM); uses lowercase (little) q, the Robbins-Monro stochastic approximation scheme, the learning parameter alpha, and pre-decision states
SARSA - evaluate a policy
RTDP - start from an optimistic value of a state (same as ADP Dialect 1)
ADP Dialect 3 - Asynchronous update (no TPM, but uses sample realizations from a simulator; hopefully the simulator can get close to the TPM); uses lowercase (little) v, the Robbins-Monro stochastic approximation scheme, the learning parameter alpha, and pre-decision states (Fig. 4.4, page 128)
ADP Dialect 4 - Asynchronous update (no TPM); uses lowercase (little) v, the Robbins-Monro stochastic approximation scheme, the learning parameter alpha, and post-decision states (Fig. 4.7, page 141)
ADP Dialect 5 - Q-learning around the post-decision state (no TPM); uses lowercase (little) q, the Robbins-Monro stochastic approximation scheme, the learning parameter alpha, and post-decision states
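The Q-learning update described in Dialect 2 can be sketched as follows. This is a minimal illustration in Python, not the course's MATLAB code: the two-state machine-replacement problem, the `simulate` function, the contribution values, and the discount factor are all hypothetical and chosen only to keep the example small. It maximizes contributions; a cost-minimizing version would use `min` instead of `max`.

```python
import random

# Hypothetical two-state machine-replacement problem (illustration only).
# States: 0 = machine working, 1 = machine failed.
# Actions: 0 = keep, 1 = replace.
GAMMA = 0.5  # assumed discount factor


def simulate(state, action, rng):
    """Simulator: return (contribution, next_state). No TPM is exposed."""
    if action == 1:                       # replace: pay 0.5, machine works next period
        return -0.5, 0
    if state == 0:                        # keep a working machine: earn 1, may fail
        return 1.0, (1 if rng.random() < 0.3 else 0)
    return -1.0, 1                        # keep a failed machine: lose 1, stays failed


def q_learning(n_iters=20000, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]          # q[state][action], pre-decision states
    visits = [[0, 0], [0, 0]]
    state = 0
    for _ in range(n_iters):
        action = rng.randrange(2)         # pure exploration so every pair is visited
        contribution, next_state = simulate(state, action, rng)
        visits[state][action] += 1
        alpha = 1.0 / visits[state][action]                  # Robbins-Monro stepsize
        q_hat = contribution + GAMMA * max(q[next_state])    # sampled (little) q
        q[state][action] = (1 - alpha) * q[state][action] + alpha * q_hat
        state = next_state
    return q


q = q_learning()
policy = [max(range(2), key=lambda a: q[s][a]) for s in range(2)]
print("greedy action per state:", policy)
```

The stepsize 1/n per state-action pair is only one choice that satisfies the Robbins-Monro conditions; other decay rules can be swapped in without changing the rest of the loop.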
Summary of models and coding
MATLAB
DP - Value iteration, discounted cost criterion, uses TPM, synchronous update so all possible next states are evaluated in each iteration
ADP Dialect 1 - Figure 4.2 page 120 value iteration_ADP, uses TPM, V=v, async update, TPM used for finding next state
ADP Dialect 3 - Figure 4.4 page 128 value iteration_ADP2, uses TPM, V = (1-alpha)V + alpha v, async update, TPM used for finding next state
ADP Dialect 4 - Figure 4.7 page 141 value iteration_ADP 3, no TPM, uses Post-decision state, V = (1-alpha)V + alpha v, async update, need some uncertainty model for finding next state
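The difference between the synchronous DP update and the asynchronous updates `V = v` and `V = (1-alpha)V + alpha v` listed above can be sketched in a few lines. This is a hedged Python illustration, not a reproduction of the textbook figures or of the MATLAB files: the two-state problem, the matrices `P` and `C`, and the discount factor are hypothetical, and the sketch maximizes contributions rather than minimizing costs.

```python
import random

# Hypothetical two-state machine-replacement problem (illustration only).
GAMMA = 0.5  # assumed discount factor
# P[action][state][next_state]: one-step transition probability matrix (TPM)
P = [
    [[0.7, 0.3], [0.0, 1.0]],   # action 0 = keep
    [[1.0, 0.0], [1.0, 0.0]],   # action 1 = replace
]
# C[state][action]: one-period contribution
C = [[1.0, -0.5], [-1.0, -0.5]]


def greedy(state, values):
    """Return (value, action) maximizing contribution plus discounted expected value."""
    options = []
    for action in range(len(P)):
        expected = sum(p * v for p, v in zip(P[action][state], values))
        options.append((C[state][action] + GAMMA * expected, action))
    return max(options)


def value_iteration(tol=1e-10):
    """Synchronous DP: every state is updated in every sweep."""
    values = [0.0, 0.0]
    while True:
        new_values = [greedy(s, values)[0] for s in range(len(values))]
        if max(abs(n - o) for n, o in zip(new_values, values)) < tol:
            return new_values
        values = new_values


def adp_async(alpha=1.0, n_iters=2000, seed=0):
    """Asynchronous ADP: only the state being visited is updated.

    alpha = 1 gives V = v; alpha < 1 gives V = (1 - alpha) V + alpha v.
    The TPM is also used to sample the next state.
    """
    rng = random.Random(seed)
    values = [0.0, 0.0]
    state = 0
    for _ in range(n_iters):
        v_hat, action = greedy(state, values)
        values[state] = (1 - alpha) * values[state] + alpha * v_hat
        state = rng.choices(range(len(values)), weights=P[action][state])[0]
    return values


print("synchronous DP:  ", value_iteration())
print("async, V = v:    ", adp_async(alpha=1.0))
print("async, smoothed: ", adp_async(alpha=0.5, n_iters=5000))
```

On a problem this small all three variants should agree closely; the asynchronous versions matter when the state space is too large to sweep in full.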
Chapter 6 - policy representation
policy iteration for discounted criteria
Chapter 7 - Policy search
Stochastic gradient, page 266 of 647
Stopping criteria with Mean square error
Fig 4.7 page 141 with MSE page 255 value iteration_ADP3x for machine replacement problem.
Chapter 9 Value of a policy
TD learning page 350 of 647
TD(0) = DP class policy iteration for discounted criteria
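As a rough illustration of TD(0) used to evaluate a fixed policy (the evaluation step of policy iteration), the sketch below estimates the value of one policy from simulated transitions. It is written in Python under stated assumptions: the two-state machine-replacement problem, the fixed policy (keep while working, replace when failed), the contributions, and the discount factor are hypothetical.

```python
import random

GAMMA = 0.5  # assumed discount factor


def simulate(state, rng):
    """One transition under the fixed policy: return (contribution, next_state).

    Hypothetical policy: keep in state 0 (working), replace in state 1 (failed).
    """
    if state == 0:                        # keep: earn 1, machine fails w.p. 0.3
        return 1.0, (1 if rng.random() < 0.3 else 0)
    return -0.5, 0                        # replace: pay 0.5, machine works again


def td0(n_iters=50000, seed=0):
    rng = random.Random(seed)
    values = [0.0, 0.0]                   # value of the fixed policy per state
    visits = [0, 0]
    state = 0
    for _ in range(n_iters):
        contribution, next_state = simulate(state, rng)
        visits[state] += 1
        alpha = 1.0 / visits[state]       # declining stepsize per state
        td_error = contribution + GAMMA * values[next_state] - values[state]
        values[state] += alpha * td_error
        state = next_state
    return values


print("estimated policy values:", td0())
```

For this toy problem the exact policy values can be found by solving two linear equations, which gives roughly 1.61 for the working state and 0.30 for the failed state; the TD(0) estimates should land near those numbers.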
Chapter 10
Actor-critic Page 420 of 647
DPG - Deterministic policy gradient
DQN + DPG = DDPG
Q-learning - DQN - DPG - DDPG
https://cse.buffalo.edu/~avereshc/rl_fall19/lecture_21_Actor_Critic_DPG_DDPG.pdf
(DDPG in Matlab) https://www.mathworks.com/help/reinforcement-learning/ug/ddpg-agents.html#mw_086ee5c6-c185-4597-aefc-376207c6c24c
https://2020blogfor.github.io/posts/2020/04/rlddpg/
Week 10 onwards
Chapter 11
alpha decay
Matlab - alpha decay
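A few common ways to decay the learning parameter alpha can be sketched as below. This is a Python illustration rather than the MATLAB file referenced above; the function names and the default constants `a=10.0` and `beta=0.7` are assumptions chosen for the example.

```python
def harmonic(n):
    """alpha_n = 1/n: smoothing with this rule reproduces the sample mean."""
    return 1.0 / n


def generalized_harmonic(n, a=10.0):
    """alpha_n = a / (a + n - 1): larger a slows the decay."""
    return a / (a + n - 1)


def polynomial(n, beta=0.7):
    """alpha_n = 1 / n**beta, with beta between 0.5 and 1."""
    return 1.0 / n ** beta


def smooth(observations, stepsize):
    """Apply V = (1 - alpha) V + alpha v to a stream of observations v."""
    estimate = 0.0
    for n, v_hat in enumerate(observations, start=1):
        alpha = stepsize(n)
        estimate = (1 - alpha) * estimate + alpha * v_hat
    return estimate


for n in (1, 10, 100):
    print(n, harmonic(n), generalized_harmonic(n), polynomial(n))
print("smoothed estimate:", smooth([2.0, 4.0, 6.0], harmonic))
```

All three rules start at alpha = 1 for n = 1, so the first observation replaces the initial estimate; they differ in how quickly later observations are discounted.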
Chapter 12
Exploration vs exploitation (learning)
Chapter 5 - Modeling
Importance of a good uncertainty simulator
Inventory control Example, see hand out, excel for inventory control
MATLAB using Fig. 4.7, page 141, with uniform demand probability. No knowledge of uncertainty is available
MATLAB using Fig. 4.7, page 141, with Poisson (lambda = 1) demand probability from the question. Knowledge of uncertainty is available
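The two uncertainty models above differ only in how demand is sampled inside the simulator. A minimal Python sketch of both samplers follows; it is not the course's MATLAB code, and the demand cap `max_demand=4` and the lost-sales transition in `next_inventory` are assumptions made for illustration.

```python
import math
import random


def uniform_demand(rng, max_demand=4):
    """No knowledge of uncertainty: each demand level 0..max_demand is equally likely."""
    return rng.randint(0, max_demand)


def poisson_demand(rng, lam=1.0):
    """Known uncertainty: Poisson(lam) demand, sampled with Knuth's method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def next_inventory(stock, order, demand):
    """State transition: stock after ordering minus demand, floored at zero."""
    return max(0, stock + order - demand)


rng = random.Random(0)
stock = 2
for _ in range(5):
    demand = poisson_demand(rng)
    stock = next_inventory(stock, order=1, demand=demand)
    print("demand", demand, "-> stock", stock)
```

Swapping `poisson_demand` for `uniform_demand` in the loop changes only the sample path, which is why the quality of the uncertainty simulator directly affects the values the ADP algorithm learns.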
Chapter 8
VFA - Value Function Approximation
MATLAB code with 4 schemes
VFA with diffusion wavelet
VFA DW Theory
Diffusion wavelets DW code for best basis - multiple levels - MATLAB
Excel to show value determination
ADP with scaling and wavelet functions code - MATLAB
VFA Steps implementation
Chapters 13, 14, and 15
Week 14
Project Presentation - Nov 30 Thursday 4:30 PM
*******************************************************************************
Project - Individual or 2 in a group. Email me your group and the title of your work by Nov 30
Pick one application from the following books and prepare a 5-10 minute overview to present in class. You are welcome to prepare slides. Provide a 1-2 page write-up by Dec 7 via email only.
For slides and report use this structure: background describing the problem, objective, state and its variables, action variables, uncertainty and how the state transitions to another state, reward/penalty (contribution function), transition probability, references, etc.
https://onlinelibrary.wiley.com/doi/pdf/10.1002/9780470544785 - Handbook of ADP, Jennie Si et al. (eds) - See Part III for applications. You may also use Part II.
https://onlinelibrary.wiley.com/doi/pdf/10.1002/9781118029176 - Textbook, ADP by Warren Powell - Chapter 14
http://incompleteideas.net/book/bookdraft2017nov5.pdf - RL by Sutton and Barto - Chapter 16 - computer games
********************************************************************************