Advanced Dynamic Programming

Fall 2019

Instructor: Dr. Rajesh Ganesan

    

Eng Bldg. Room 2217

Phone: (703) 993-1693                                                    

Fax: (703) 993-1521                                                                                 

Email: rganesan at gmu dot edu

 

Syllabus

 

DP refresher: Notes

ADP: need and equations

 

Machine replacement problem:

Limiting Probabilities and MDPs, Exhaustive Enumeration, LP Solution to MDPs, MDP Average Reward/Cost: Policy and Value Iteration, Discounted Cost: Policy and Value Iteration (a minimal policy iteration sketch follows the Excel example below)

notes

Excel example: value iteration for MDP
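
For reference, a minimal Matlab sketch of policy iteration for a discounted-cost MDP. The two-state, two-action TPM and cost data are illustrative placeholders, not the machine replacement numbers used in class.

    % Policy iteration for a discounted-cost MDP (illustrative 2-state, 2-action data)
    nS = 2; nA = 2; gamma = 0.9;
    P = zeros(nS, nS, nA);                  % P(i,j,a) = transition probability i -> j under a
    P(:,:,1) = [0.8 0.2; 0.6 0.4];          % action 1 (placeholder)
    P(:,:,2) = [0.9 0.1; 0.9 0.1];          % action 2 (placeholder)
    C = [1 10; 5 10];                       % C(i,a) = immediate cost (placeholder)
    policy = ones(nS, 1);
    while true
        % policy evaluation: solve (I - gamma*P_pi) V = C_pi exactly
        Ppi = zeros(nS); Cpi = zeros(nS, 1);
        for i = 1:nS
            Ppi(i,:) = P(i,:,policy(i));
            Cpi(i)   = C(i,policy(i));
        end
        V = (eye(nS) - gamma * Ppi) \ Cpi;
        % policy improvement: act greedily with respect to V
        newPolicy = policy;
        for i = 1:nS
            q = zeros(nA, 1);
            for a = 1:nA, q(a) = C(i,a) + gamma * P(i,:,a) * V; end
            [~, newPolicy(i)] = min(q);
        end
        if isequal(newPolicy, policy), break; end   % stable policy -> optimal
        policy = newPolicy;
    end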

Matlab files:

Value iteration, discounted cost criterion (exact DP): uses the TPM; synchronous update, so all possible next states are evaluated in each iteration
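
A minimal Matlab sketch of the synchronous update this file implements: every state is swept in each iteration and the expectation is taken over the full TPM. The data are the same illustrative placeholders as above, not the course numbers.

    % Synchronous value iteration, discounted cost, known TPM (illustrative data)
    nS = 2; nA = 2; gamma = 0.9; tol = 1e-8;
    P = zeros(nS, nS, nA);
    P(:,:,1) = [0.8 0.2; 0.6 0.4];
    P(:,:,2) = [0.9 0.1; 0.9 0.1];
    C = [1 10; 5 10];
    V = zeros(nS, 1);
    while true
        Vnew = zeros(nS, 1);
        for i = 1:nS                                % sweep ALL states each iteration
            q = zeros(nA, 1);
            for a = 1:nA
                q(a) = C(i,a) + gamma * P(i,:,a) * V;   % full expectation over next states
            end
            Vnew(i) = min(q);                       % cost minimization
        end
        if max(abs(Vnew - V)) < tol, break; end     % sup-norm stopping test
        V = Vnew;
    end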

Figure 4.2, page 120: value iteration_ADP; uses the TPM; V = v; asynchronous update; TPM used for finding the next state

Figure 4.4, page 128: value iteration_ADP2; uses the TPM; V = (1 - alpha)V + alpha*v; asynchronous update; TPM used for finding the next state
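
In the same spirit as these two files (a sketch, not the posted code): only the visited state is updated, the update is smoothed with a step size alpha, and the TPM row of the greedy action is sampled to pick the next state. The data and the harmonic step-size rule are illustrative assumptions; setting alpha = 1 recovers the unsmoothed V = v update of the Figure 4.2 variant.

    % Asynchronous ADP value iteration with smoothing (illustrative data)
    nS = 2; nA = 2; gamma = 0.9; nIter = 20000;
    P = zeros(nS, nS, nA);
    P(:,:,1) = [0.8 0.2; 0.6 0.4];
    P(:,:,2) = [0.9 0.1; 0.9 0.1];
    C = [1 10; 5 10];
    V = zeros(nS, 1);
    s = 1;                                          % start state
    for n = 1:nIter
        alpha = 25 / (25 + n);                      % step size (assumed harmonic decay)
        q = zeros(nA, 1);
        for a = 1:nA
            q(a) = C(s,a) + gamma * P(s,:,a) * V;   % TPM used inside the expectation
        end
        [v, aStar] = min(q);
        V(s) = (1 - alpha) * V(s) + alpha * v;      % smoothed update of the visited state only
        s = find(rand <= cumsum(P(s,:,aStar)), 1);  % TPM used to sample the next state
    end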

Figure 4.7, page 141: value iteration_ADP 3; no TPM; uses the post-decision state; V = (1 - alpha)V + alpha*v; asynchronous update; needs some uncertainty model for finding the next state
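
A minimal sketch of the post-decision-state idea on a toy machine-replacement model (not the posted value iteration_ADP 3 code): no TPM appears anywhere, only the ability to sample the exogenous deterioration. All costs, the aging model, and the step-size rule are illustrative assumptions; poissrnd needs the Statistics and Machine Learning Toolbox.

    % Post-decision-state ADP (Fig. 4.7 flavor) on a toy machine-replacement model
    maxAge = 20; gamma = 0.9; nIter = 50000;
    opCost = @(age) 1 + 0.5 * age;           % operating cost grows with age (assumed)
    replaceCost = 10;                        % cost of a new machine (assumed)
    Vpost = zeros(maxAge + 1, 1);            % value of post-decision state = age after decision
    s = 0;                                   % pre-decision state: current machine age
    sPostPrev = 0;                           % post-decision state visited last iteration
    for n = 1:nIter
        alpha = 25 / (25 + n);               % harmonic step size (assumed)
        % evaluate both actions using only the post-decision value -- no TPM needed
        vKeep    = opCost(s)               + gamma * Vpost(s + 1);
        vReplace = replaceCost + opCost(0) + gamma * Vpost(0 + 1);
        if vKeep <= vReplace
            vhat = vKeep;    sPost = s;      % keep: machine retains its age
        else
            vhat = vReplace; sPost = 0;      % replace: age resets to 0
        end
        % smoothed update of the PREVIOUS post-decision state with the new observation
        Vpost(sPostPrev + 1) = (1 - alpha) * Vpost(sPostPrev + 1) + alpha * vhat;
        % sample exogenous deterioration (the uncertainty model) for the next pre-decision state
        w = 1 + poissrnd(1);                 % machine ages by at least one period (assumed)
        s = min(sPost + w, maxAge);
        sPostPrev = sPost;
    end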

 

stopping criteria

Fig. 4.7, page 141, with the MSE stopping criterion (page 255): value iteration_ADP3x for the machine replacement problem.
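
One common way to implement such a stopping rule is to track the mean squared change of the value estimates over a sliding window and stop once it falls below a tolerance; a minimal Matlab sketch with a dummy noisy target (not necessarily the exact criterion on page 255):

    % Illustrative MSE-style stopping rule for the asynchronous loops above
    window = 500; tol = 1e-6; maxIter = 1e5;
    V = 0; sqChange = inf(window, 1);            % ring buffer of squared value changes
    for n = 1:maxIter
        alpha = 10 / (10 + n);
        vhat = 5 + randn;                        % noisy observation of a true value of 5 (dummy)
        Vold = V;
        V = (1 - alpha) * V + alpha * vhat;      % smoothed update, as in the ADP loops above
        sqChange(mod(n - 1, window) + 1) = (V - Vold)^2;
        if n >= window && mean(sqChange) < tol
            fprintf('stopped at iteration %d, V = %.3f\n', n, V);
            break;                               % estimates have stabilized
        end
    end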

 

 

 

Inventory control example: see handout; Excel for inventory control

Matlab using Fig. 4.7, p. 141, with uniform demand probability; no knowledge of the uncertainty is available

Matlab using Fig. 4.7, p. 141, with Poisson (lambda = 1) demand probability; some knowledge of the uncertainty is available
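
The only difference between the two runs is the demand sampler plugged into the Fig. 4.7-style loop. A tiny Matlab sketch; the uniform support and the placeholder inventory value are assumptions:

    % Two illustrative demand samplers for the Fig. 4.7-style loop
    maxDemand = 5;                         % assumed support for the uniform case
    dUniform = randi([0 maxDemand]);       % no knowledge of the true uncertainty: uniform guess
    dPoisson = poissrnd(1);                % partial knowledge: Poisson with lambda = 1
    % either sample then drives the post-decision -> next pre-decision transition, e.g.
    sPost = 8;                             % post-decision inventory (placeholder value)
    sNext = max(sPost - dPoisson, 0);      % next pre-decision inventory after demand is served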

 

Alpha decay schemes

alpha decay
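
A Matlab sketch of a few commonly used step-size decay rules for comparison; the constants are illustrative, and these may or may not match the schemes used in the posted code.

    % Common alpha (step-size) decay rules, plotted for comparison
    N = 1000; n = (1:N)';
    alphaConst    = 0.1 * ones(N, 1);              % constant step size
    alphaInvN     = 1 ./ n;                        % 1/n (stochastic-averaging rule)
    alphaHarmonic = 25 ./ (25 + n - 1);            % generalized harmonic, a/(a + n - 1)
    alphaPoly     = 1 ./ n.^0.7;                   % polynomial decay, 1/n^beta with beta = 0.7
    % McClain's rule: converges to a target step size alphaBar instead of zero
    alphaBar = 0.05; alphaMcClain = zeros(N, 1); alphaMcClain(1) = 1;
    for k = 2:N
        alphaMcClain(k) = alphaMcClain(k-1) / (1 + alphaMcClain(k-1) - alphaBar);
    end
    plot(n, [alphaConst alphaInvN alphaHarmonic alphaPoly alphaMcClain]);
    legend('constant', '1/n', 'harmonic', 'n^{-0.7}', 'McClain');
    xlabel('iteration n'); ylabel('\alpha_n');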

 

VFA matlab code with 4 schemes

 

VFA with diffusion wavelet

VFA DW Theory

Diffusion wavelets DW code for best basis - multiple levels

excel to show value determination

ADP with scaling and wavelet functions code

 

VFA implementation
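
A minimal sketch of a linear value function approximation, V(s) ≈ theta' * phi(s), fitted by a stochastic gradient step on each observed value sample. The polynomial basis, the sampling of states, and the dummy target are illustrative assumptions, not the diffusion-wavelet basis used in the course code.

    % Minimal linear VFA sketch: V(s) ~= theta' * phi(s)
    phi = @(s) [1; s; s.^2];                  % basis (feature) vector for a scalar state s (assumed)
    theta = zeros(3, 1);                      % weights to be learned
    nIter = 10000;
    for n = 1:nIter
        alpha = 25 / (25 + n);                % step size (assumed harmonic decay)
        s = rand * 10;                        % visit a state (here: sampled at random)
        vhat = 2 + 3 * s + randn;             % observed value sample at s (dummy target)
        err = vhat - theta' * phi(s);         % prediction error
        theta = theta + alpha * err * phi(s); % gradient step on 0.5 * err^2
    end
    Vapprox = @(s) theta' * phi(s);           % value can now be estimated at unvisited states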

 

ADP summary

*******************************************************************************

Project - Individual or 2 in a group.

Pick one application from the following books and prepare a 5-10 minute overview to present in class on Nov 25th. You are welcome to prepare slides or just talk about it. Provide a 1-2 page write-up by Dec 9 via email only.

 

Background describing the problem: objective, state and its variables, action variables, uncertainty and how the state transitions to another state, reward/penalty (contribution function), and transition probability if any.

 

https://onlinelibrary.wiley.com/doi/pdf/10.1002/9780470544785    - Handbook of ADP, Jennie Si et al. (eds). See Part III for applications; you may also use Part II.

https://onlinelibrary.wiley.com/doi/pdf/10.1002/9781118029176   - Textbook: ADP by Warren Powell - Chapter 14

http://incompleteideas.net/book/bookdraft2017nov5.pdf   - RL by Sutton and Barto    - Chapter 16  -  computer games

http://www-anw.cs.umass.edu/rlr/ - RL repository

 

********************************************************************************

Alternate Project - Individual or 2 in a group.

For those looking to code and solve an ADP problem: present it in class on Nov 25th and provide a 1-2 page write-up by Dec 9 via email only.

 

Define the objective, state and its variables, action variables, uncertainty and how the state transitions to another state, reward/penalty (contribution function), and transition probability if any for the following inventory problem:

 

14-day cycle - time in hours (336 hrs)

job arrival rate (Poisson average) - 50 per hour per sensor at the beginning of the hour.

job service rate (deterministic) - 51 per hour per analyst

M/D/1 queue

performance metric: keep the queue length below 50 jobs in any given hour (green zone); 50-100 is the yellow zone, and >100 is the red zone.

queue length Lq = rho^2 / (2(1 - rho)), where rho = lambda/mu (arrival/service) = 50/51 (a quick numerical check appears after the test case below)

# of sensors = 10

# of people available per hour = 10

uncertainty model for learning: additional demand ~ Poisson(50); generate 6 events over 14 days (repeat for several 14-day runs)

additional resource available over the 14-day period = 7 hours

 

contribution function = weight * normalized backlog + weight * normalized (leftover resource / leftover time in hours over the 14 days)

 

Normalize onto a 0-1 scale, where 1 means most desirable.

 

Test case:

uncertainty model for testing the learned policy - 6 events over 14 days; additional demand = 33 (at hour 32), 47 (@54), 70 (@80), 69 (@170), 51 (@210), 47 (@320)

Plot the following for the test case: allocate the additional resource and calculate the queue length per hour and the additional resource consumption for the myopic (immediate action) strategy and the RL strategy.
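
A quick numerical check of the baseline M/D/1 numbers above, plus a rough look at what an additional-demand event does to one sensor's queue; how the surge enters the queue is an assumption for illustration, not part of the problem statement.

    % Quick numerical check of the M/D/1 baseline (values from the problem statement)
    lambda = 50; mu = 51;                     % jobs/hour per sensor, jobs/hour per analyst
    rho = lambda / mu;
    Lq = rho^2 / (2 * (1 - rho));             % mean M/D/1 queue length ~ 24.5 jobs (green zone)
    fprintf('rho = %.4f, baseline Lq = %.1f jobs\n', rho, Lq);
    % Rough effect of one additional-demand event of d jobs in an hour (assumed to simply
    % add to that hour's arrivals at the affected sensor):
    d = 50;                                   % e.g., a Poisson(50) surge
    rhoSurge = (lambda + d) / mu;             % rho > 1, so the queue no longer settles
    backlogGrowth = (lambda + d) - mu;        % backlog grows by roughly this many jobs/hour
    fprintf('with surge: rho = %.2f, backlog grows ~%d jobs/hour\n', rhoSurge, backlogGrowth);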

*****************************************************************************