Advanced Dynamic Programming

 

Instructor: Dr. Rajesh Ganesan

    

Eng Bldg. Room 2217

Phone: (703) 993-1693                                                    

Fax: (703) 993-1521                                                                                 

Email: rganesan at gmu dot edu

 

Week 1

Syllabus

Textbook: https://onlinelibrary.wiley.com/doi/book/10.1002/9781118029176 by Warren Powell

https://onlinelibrary.wiley.com/doi/book/10.1002/9780470544785 by Jennie Si et al.

Big picture

Examples for DP/Approx DP     Read the chapter by Paul Werbos in Si et al., pages 3-44

Weeks  2 - 3

DP refresher   Notes

DP example question  

Excel Example value iteration for MDP

 

 

Weeks 4 - 6

 

Chapters 1-3

ADP motivation

 

Chapter 4

ADP Dialect 1 - Asynchronous update with TPM; uses lowercase v and pre-decision states (Fig 4.2 page 120)

ADP Dialect 2 - Q-learning around the pre-decision state (no TPM); uses lowercase q, the Robbins-Monro stochastic approximation scheme, a learning parameter alpha, and pre-decision states

ADP Dialect 3 - Asynchronous update with TPM; uses lowercase v, the Robbins-Monro stochastic approximation scheme, a learning parameter alpha, and pre-decision states (Fig 4.4 page 128)

ADP Dialect 4 - Asynchronous update (no TPM); uses lowercase v, the Robbins-Monro stochastic approximation scheme, a learning parameter alpha, and post-decision states (Fig 4.7 page 141)

ADP Dialect 5 - Q-learning around the post-decision state (no TPM); uses lowercase q, the Robbins-Monro stochastic approximation scheme, a learning parameter alpha, and post-decision states
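The Robbins-Monro smoothing step shared by Dialects 2-5 can be sketched in a few lines of Python. This is only an illustration, not the course's MATLAB code: the names `smooth` and `q_update`, the dict-of-dicts layout for `Q`, and the choice to maximize contribution (rather than minimize cost) are all assumptions made here.

```python
def smooth(old, sample, alpha):
    """Robbins-Monro style update: blend the old estimate with a new sample."""
    return (1.0 - alpha) * old + alpha * sample


def q_update(Q, s, a, contribution, s_next, alpha, gamma):
    """One Q-learning step around a pre-decision state (no TPM required).

    Q is a hypothetical dict of dicts: Q[state][action] -> estimate.
    Assumes a maximization problem; use min() for a cost criterion.
    """
    # Sampled estimate (little q): observed contribution plus the
    # discounted best value currently estimated for the observed next state.
    q_sample = contribution + gamma * max(Q[s_next].values())
    Q[s][a] = smooth(Q[s][a], q_sample, alpha)
    return Q[s][a]


if __name__ == "__main__":
    Q = {0: {"keep": 0.0, "replace": 0.0},
         1: {"keep": 10.0, "replace": 4.0}}
    print(q_update(Q, 0, "keep", 2.0, 1, alpha=0.5, gamma=0.9))
```

The same `smooth` function applies to the little-v dialects; only the sampled quantity changes.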

 

SARSA - evaluate a policy

RTDP - start from an optimistic value of a state (same as ADP Dialect 1)

 

Week 7

Summary of models and coding

MATLAB

DP - Value iteration, discounted cost criterion, uses TPM, sync update so all possible next states are evaluated in each iteration

ADP Dialect 1 - Figure 4.2 page 120 value iteration_ADP, uses TPM, V=v, async update, TPM used for finding next state

ADP Dialect 3 - Figure 4.4 page 128 value iteration_ADP2, uses TPM, V = (1-alpha)V + alpha v, async update, TPM used for finding next state

ADP Dialect 4 - Figure 4.7 page 141 value iteration_ADP 3, no TPM, uses Post-decision state, V = (1-alpha)V + alpha v, async update, need some uncertainty model for finding next state
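To make the contrast between the synchronous and asynchronous updates listed above concrete, here is a rough Python sketch. It is not a transcription of the textbook figures or of the MATLAB files; the data layout (`P[a][s][s2]` for the TPM, `R[a][s]` for the expected one-step contribution), the function names, and the maximization convention are assumptions for illustration.

```python
import random


def bellman_value(P, R, V, s, gamma):
    """Best one-step lookahead value and action at state s, using the TPM."""
    best_a, best_v = None, float("-inf")
    for a in range(len(P)):
        v = R[a][s] + gamma * sum(p * V[s2] for s2, p in enumerate(P[a][s]))
        if v > best_v:
            best_a, best_v = a, v
    return best_v, best_a


def sync_value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Classical DP: every state is updated in every iteration."""
    V = [0.0] * len(R[0])
    for _ in range(max_iter):
        V_new = [bellman_value(P, R, V, s, gamma)[0] for s in range(len(V))]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


def async_adp(P, R, gamma=0.9, alpha=0.1, n_iter=5000, seed=0):
    """Asynchronous update with smoothing: only the visited state changes.

    V = (1 - alpha) V + alpha v, and the TPM is used to sample the next state.
    """
    rng = random.Random(seed)
    V = [0.0] * len(R[0])
    s = 0
    for _ in range(n_iter):
        v, a = bellman_value(P, R, V, s, gamma)
        V[s] = (1.0 - alpha) * V[s] + alpha * v
        s = rng.choices(range(len(V)), weights=P[a][s])[0]
    return V


if __name__ == "__main__":
    # Toy two-state chain with a single action (made-up numbers).
    P = [[[0.0, 1.0], [1.0, 0.0]]]
    R = [[1.0, 0.0]]
    print(sync_value_iteration(P, R))
    print(async_adp(P, R))
```

Setting `alpha=1` in `async_adp` reduces the update to V = v, which corresponds to the first asynchronous variant in the list.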

 

Week 8

Chapters 5 and 6 - Modeling and Policies

Week 9

Chapter 7 - Policy search

Stopping criteria with Mean square error

Fig 4.7 page 141 with MSE page 255 value iteration_ADP3x for machine replacement problem.
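One plausible way to implement a mean-square-error stopping rule is to compare successive value estimates and stop once the MSE falls below a tolerance. The sketch below assumes that reading; the function names and the tolerance are hypothetical, and the textbook's exact criterion may differ.

```python
def mean_square_error(v_old, v_new):
    """Average squared change between two value vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(v_old, v_new)) / len(v_old)


def has_converged(v_old, v_new, tol=1e-6):
    """Stop when the value estimates have essentially stopped moving."""
    return mean_square_error(v_old, v_new) < tol


if __name__ == "__main__":
    print(mean_square_error([0.0, 0.0], [1.0, 3.0]))
    print(has_converged([1.0, 2.0], [1.0, 2.0]))
```

With an asynchronous update only one state changes per iteration, so in practice the check is often applied every few hundred iterations rather than after each one.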

 

Week 10

Inventory control example: see handout, Excel for inventory control

MATLAB using Fig 4.7 page 141 with uniform demand probability. No knowledge of uncertainty is available

MATLAB using Fig 4.7 page 141 with Poisson (lambda = 1) demand probability from the question. Knowledge of uncertainty is available
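The two demand models above differ only in how the next demand is sampled. The Python sketch below illustrates this under assumptions not stated on this page: lost sales (inventory cannot go negative), a hypothetical `max_demand` bound for the uniform case, and a post-decision state defined as inventory plus order quantity.

```python
import math
import random


def sample_uniform_demand(max_demand, rng):
    """No knowledge of the demand distribution: each level equally likely."""
    return rng.randint(0, max_demand)


def sample_poisson_demand(lam, rng):
    """Poisson demand via Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def next_inventory(inventory, order, demand):
    """Post-decision state is inventory + order; demand then arrives.

    Assumes lost sales, so the next pre-decision state is floored at zero.
    """
    post_decision = inventory + order
    return max(0, post_decision - demand)


if __name__ == "__main__":
    rng = random.Random(0)
    d = sample_poisson_demand(1.0, rng)
    print(d, next_inventory(2, 3, d))
```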

 

Chapter 11

alpha decay

MATLAB - alpha decay
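As an illustration of alpha decay, the sketch below uses the generalized harmonic rule alpha_n = a / (a + n - 1), one common choice among several; whether the course's MATLAB code uses this rule is not stated here, and the parameter name `a` and its default are assumptions.

```python
def harmonic_stepsize(n, a=10.0):
    """Generalized harmonic rule: alpha_n = a / (a + n - 1) for n = 1, 2, ...

    a = 1 gives the plain 1/n rule; larger a slows the decay.
    """
    return a / (a + n - 1)


def smoothed_estimate(samples, a=1.0):
    """Apply V = (1 - alpha_n) V + alpha_n v with a decaying step size."""
    v_bar = 0.0
    for n, v in enumerate(samples, start=1):
        alpha = harmonic_stepsize(n, a)
        v_bar = (1.0 - alpha) * v_bar + alpha * v
    return v_bar


if __name__ == "__main__":
    # With a = 1 (alpha_n = 1/n) the smoothed estimate is the sample mean.
    print(smoothed_estimate([2.0, 4.0, 6.0], a=1.0))
```

A constant alpha keeps reacting to noise forever, while 1/n can decay too quickly when early samples are poor; tuning `a` trades off between the two.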

 

Week 11

Chapter 12

Exploration vs exploitation (learning)
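A simple way to trade off exploration and exploitation is an epsilon-greedy rule: with probability epsilon pick a random action, otherwise pick the action with the best current estimate. The sketch below is a generic illustration (the function name and list-based `q_values` are assumptions), not necessarily the policy emphasized in the chapter.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    # exploit: action with the highest current estimate
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    rng = random.Random(0)
    picks = [epsilon_greedy([1.0, 5.0, 2.0], 0.1, rng) for _ in range(10)]
    print(picks)
```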

 

Chapter 8

VFA - Value Function Approximation

MATLAB code with 4 schemes

 

Weeks 12-13

Chapters 9, 10, 13, and 15

 

 

VFA with diffusion wavelet

VFA DW Theory

Diffusion wavelets DW code for best basis - multiple levels - MATLAB

Excel to show value determination

ADP with scaling and wavelet functions code - MATLAB

VFA Steps implementation

 

ADP summary

 

Week 14

 

Project Presentation - Week of Nov 30 - Dec 4. We will pick a 2.5-hour time slot that suits everyone.

*******************************************************************************

Project - Individual or in groups of 2. Email me your group and the title of your work by Dec 1.

 

Pick one application from the following books and prepare a 5-10 minute overview to present in class. You are welcome to prepare slides. Provide a 1-2 page write-up by Dec 9 via email only.

 

For slides and report, use this structure: background describing the problem, objective, state and its variables, action variables, uncertainty and how the state transitions to another state, reward/penalty (contribution function), transition probability, references, etc.

 

https://onlinelibrary.wiley.com/doi/pdf/10.1002/9780470544785    - Handbook of ADP, Jennie Si et al. (eds)    - See Part III for applications. You may also use Part II.

https://onlinelibrary.wiley.com/doi/pdf/10.1002/9781118029176   - Textbook, ADP by Warren Powell   - Chapter 14

http://incompleteideas.net/book/bookdraft2017nov5.pdf   - RL by Sutton and Barto    - Chapter 16  -  computer games

 

 

********************************************************************************