Advanced Dynamic Programming

Fall 2017

 

Instructor: Dr. Rajesh Ganesan

 

 

    

Eng Bldg. Room 2217

Phone: (703) 993-1693                                                    

Fax: (703) 993-1521                                                                                 

Email: rganesan at gmu dot edu

 

Syllabus

 

DP refresher  Notes

ADP need and equations

 

Machine replacement problem:

Limiting probabilities and MDP, exhaustive enumeration, LP solution to MDP, MDP average reward/cost (policy and value iteration), discounted cost (policy and value iteration)

notes

Excel Example value iteration for MDP
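
As an illustration of the discounted-cost value iteration listed above, here is a minimal Python sketch. It is not the course Excel or Matlab material. The two-state, two-action machine (states good/bad, actions keep/replace), its transition probabilities, its costs, and the discount factor 0.9 are all hypothetical numbers chosen only to make the example run.

```python
def value_iteration(P, C, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Synchronous value iteration for a discounted-cost MDP.

    P[a][s][s2] is the transition probability from s to s2 under action a,
    C[s][a] is the one-step cost. Returns (V, policy).
    """
    n_states = len(C)
    n_actions = len(C[0])

    def q(s, a, V):
        # One-step cost plus discounted expected value of the next state.
        return C[s][a] + gamma * sum(P[a][s][s2] * V[s2] for s2 in range(n_states))

    V = [0.0] * n_states
    for _ in range(max_iter):
        # Synchronous update: every state is updated from the same old V.
        V_new = [min(q(s, a, V) for a in range(n_actions)) for s in range(n_states)]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    policy = [min(range(n_actions), key=lambda a: q(s, a, V)) for s in range(n_states)]
    return V, policy


# Hypothetical data: state 0 = good, state 1 = bad; action 0 = keep, 1 = replace.
P = [
    [[0.7, 0.3], [0.0, 1.0]],  # keep: a good machine may fail, a bad one stays bad
    [[1.0, 0.0], [1.0, 0.0]],  # replace: the machine is good next period
]
C = [[0.0, 5.0], [10.0, 5.0]]  # C[state][action]
V, policy = value_iteration(P, C)
print(V, policy)
```

With these made-up numbers the minimizing policy keeps a good machine and replaces a bad one; other costs or probabilities can change that.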

 

Matlab files:

Value iteration, discounted cost criterion, DP: uses the transition probability matrix (TPM); synchronous update, so all possible next states are evaluated in each iteration

Figure 4.2 page 120 value iteration_ADP: uses TPM; V = v; asynchronous update; TPM used for finding the next state

Figure 4.4 page 128 value iteration_ADP2: uses TPM; V = (1-alpha)V + alpha v; asynchronous update; TPM used for finding the next state

Figure 4.7 page 141 value iteration_ADP 3: no TPM; uses the post-decision state; V = (1-alpha)V + alpha v; asynchronous update; needs some uncertainty model for finding the next state
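
The last scheme in the list (no TPM, post-decision state, smoothing update, sampled next state) can be sketched roughly as follows. This is a Python illustration of the general idea, not a transcription of the Matlab file or of the book figure. The toy inventory problem, its cost parameters, the backlog limit, the uniform demand on 0..3, and the 1/n stepsize are all assumptions made for the example.

```python
import random


def adp_post_decision(n_iter=20000, gamma=0.9, max_stock=5, max_backlog=3,
                      order_cost=1.0, hold_cost=0.5, backlog_cost=4.0,
                      sample_demand=lambda rng: rng.randint(0, 3), seed=0):
    """Asynchronous ADP around the post-decision state (hypothetical toy problem)."""
    rng = random.Random(seed)
    # Value estimates are indexed by the post-decision state (stock after ordering).
    post_states = range(-max_backlog, max_stock + 1)
    V = {sx: 0.0 for sx in post_states}
    visits = {sx: 0 for sx in post_states}

    def cost(s, x):
        # Deterministic given the pre-decision state s and the order quantity x.
        return order_cost * x + hold_cost * max(s, 0) + backlog_cost * max(-s, 0)

    s = 0             # pre-decision state: net stock (negative means backlog)
    prev_post = None  # post-decision state visited in the previous iteration
    for _ in range(n_iter):
        # No expectation inside the min, so no TPM is needed here.
        actions = range(0, max_stock - s + 1)
        v_hat, x_best = min((cost(s, x) + gamma * V[s + x], x) for x in actions)
        if prev_post is not None:
            # Asynchronous update: only the previously visited post-decision state changes.
            visits[prev_post] += 1
            alpha = 1.0 / visits[prev_post]
            V[prev_post] = (1 - alpha) * V[prev_post] + alpha * v_hat
        prev_post = s + x_best
        # The uncertainty model is used only to sample the next pre-decision state.
        s = max(prev_post - sample_demand(rng), -max_backlog)
    return V


V_post = adp_post_decision()
for sx in sorted(V_post):
    print(sx, round(V_post[sx], 2))
```

The sketch always takes the greedy action, so some post-decision states may be visited rarely or never; their estimates then stay at or near the initial value of zero.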

 

Inventory control example: see handout, Excel for inventory control

Matlab using Fig 4.7 page 141 with uniform demand probability (no knowledge of the uncertainty is available)

Matlab using Fig 4.7 page 141 with Poisson (lambda = 1) demand probability (some knowledge of the uncertainty is available)

 

stopping criteria

Fig 4.7 page 141 with MSE value iteration_ADP3x for machine replacement problem.
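
One way to read an MSE-based stopping criterion is sketched below in Python. The page does not spell out the exact MSE definition, so this sketch assumes it is the mean squared difference between successive value estimates. The function names, the tolerance, and the toy update rule are hypothetical.

```python
def mse(v_old, v_new):
    """Mean squared difference between two value vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(v_old, v_new)) / len(v_old)


def run_until_converged(update, v0, tol=1e-6, max_iter=100_000):
    """Apply update(v, n) until the MSE between successive estimates drops below tol."""
    v = list(v0)
    for n in range(1, max_iter + 1):
        v_next = update(v, n)
        if mse(v, v_next) < tol:
            return v_next, n
        v = v_next
    return v, max_iter


# Toy update with a known fixed point at 2.0, used only to exercise the loop.
v_final, n_used = run_until_converged(lambda v, n: [0.5 * x + 1.0 for x in v], [0.0, 0.0])
print(v_final, n_used)
```

In an asynchronous scheme only one state changes per iteration, so the MSE between consecutive iterations can be small long before the estimates settle; comparing estimates several iterations apart is a common variation.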

Alpha decay schemes

alpha decay

Fig 4.7 page 141 with MSE value iteration_ADP4 to study different alpha decay rates for machine replacement problem.

Fig 4.7 page 141 with MSE value iteration_ADP4 to study different alpha decay rates for the inventory control problem with uniform demand probability (no knowledge of the uncertainty is available)

Fig 4.7 page 141 with MSE value iteration_ADP4 to study different alpha decay rates for the inventory control problem with Poisson (lambda = 1) demand probability (some knowledge of the uncertainty is available)
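
A few commonly used alpha decay (stepsize) rules are sketched below in Python for use in V = (1-alpha)V + alpha v. The page does not say which schemes the Matlab files compare, so the rules and the default parameters here are assumptions for illustration.

```python
def alpha_constant(n, alpha0=0.1):
    """Constant stepsize: never decays."""
    return alpha0


def alpha_harmonic(n):
    """1/n rule: smoothing with it gives the running average of the observations."""
    return 1.0 / n


def alpha_generalized_harmonic(n, a=10.0):
    """a / (a + n - 1): a larger a slows the decay."""
    return a / (a + n - 1)


def alpha_polynomial(n, beta=0.7):
    """1 / n**beta with beta in (0.5, 1]: decays more slowly than 1/n."""
    return 1.0 / n ** beta


def smooth(observations, stepsize):
    """Apply V = (1 - alpha) V + alpha v over a sequence of observations v."""
    V = 0.0
    for n, v in enumerate(observations, start=1):
        alpha = stepsize(n)
        V = (1 - alpha) * V + alpha * v
    return V


print(smooth([2.0, 4.0, 6.0], alpha_harmonic))  # running average of the three values
for n in (1, 10, 100):
    print(n, alpha_harmonic(n), alpha_generalized_harmonic(n), alpha_polynomial(n))
```

Rules that decay too fast can freeze the estimates before they are accurate, while rules that decay too slowly leave them noisy, which is the trade-off such a study would explore.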

Water resource example: see handout, value iteration using Excel

The possible decisions are to release any amount up to the water available in the dam. Use the same cost functions as in the handout. The inflow probabilities are:

Prob of inflow
P(0) 0.02
P(1) 0.05
P(2) 0.01
P(3) 0.06
P(4) 0.02
P(5) 0.02
P(6) 0.08
P(7) 0.09
P(8) 0.01
P(9) 0.04
P(10) 0.1
P(11) 0.04
P(12) 0.02
P(13) 0.16
P(14) 0.04
P(15) 0.05
P(16) 0.02
P(17) 0.08
P(18) 0.06
P(19) 0.02
P(20) 0.01
 

Solve the water resource problem using a) value iteration, b) Fig 4.2, c) Fig 4.4, and d) Fig 4.7 - use uniform (equal prob. of 1/21) given above with dam capacity 20 units. e) How does the solution change if the probability of inflow is Poisson with mean 10 (use Fig 4.7)?
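
The inflow models mentioned in the exercise can be set up as in the Python sketch below, which only builds the distributions and checks them; it does not solve the problem, since the cost functions are in the handout. Truncating the Poisson distribution at 20 units and renormalizing is an assumption (lumping the tail mass into 20 would be another reasonable choice), and the function names are hypothetical.

```python
import math
import random

# P(0) .. P(20), copied from the table above.
tabulated = [0.02, 0.05, 0.01, 0.06, 0.02, 0.02, 0.08, 0.09, 0.01, 0.04, 0.10,
             0.04, 0.02, 0.16, 0.04, 0.05, 0.02, 0.08, 0.06, 0.02, 0.01]

# Equal probability of 1/21 for inflows 0 .. 20.
uniform = [1.0 / 21] * 21


def truncated_poisson(mean, max_inflow):
    """Poisson pmf restricted to 0 .. max_inflow and renormalized (an assumption)."""
    pmf = [math.exp(-mean) * mean ** k / math.factorial(k) for k in range(max_inflow + 1)]
    total = sum(pmf)
    return [p / total for p in pmf]


def mean_inflow(pmf):
    return sum(k * p for k, p in enumerate(pmf))


def sample_inflow(pmf, rng):
    """Draw one inflow value; this is all a sampled-next-state scheme needs."""
    return rng.choices(range(len(pmf)), weights=pmf)[0]


poisson10 = truncated_poisson(10, 20)
rng = random.Random(0)
for name, pmf in [("tabulated", tabulated), ("uniform", uniform), ("poisson", poisson10)]:
    print(name, round(sum(pmf), 6), round(mean_inflow(pmf), 3), sample_inflow(pmf, rng))
```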

 

VFA matlab code with 4 schemes

 

VFA with diffusion wavelet

VFA DW Theory

DW - Haar wavelet demo to go with the ppt above

Diffusion wavelets DW code for best basis - multiple levels

excel to show value determination

ADP with only scaling function code

ADP with scaling and wavelet functions code

 

 

 

Inventory problem

 

14 day cycle - time in hours (336 hrs)

job arrival rate (Poisson average) - 50 per hour per sensor at the beginning of the hour.

job service rate (deterministic)- 51 per hour per analyst

M/D/1 queue

perf metric: maintain the queue length of jobs below 50 in any given hour (green zone); 50-100 is the yellow zone, and >100 is the red zone.

queue length = rho^2 / (2(1 - rho)), where rho = lambda/mu (arrival rate/service rate) = 50/51

# of sensors = 10

# of people available per hour = 10

uncertainty model for learning: additional demand ~ Poisson(50); generate 6 events over 14 days (repeat for several 14-day runs)

additional resource available over the 14-day period = 7 hours
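
The M/D/1 queue-length formula and the zone thresholds listed above can be checked with a short Python sketch. The function names are hypothetical, and treating exactly 50 as yellow follows the "50-100 is yellow" wording.

```python
def md1_queue_length(arrival_rate, service_rate):
    """Mean queue length of an M/D/1 queue: rho^2 / (2 (1 - rho))."""
    rho = arrival_rate / service_rate
    if rho >= 1:
        raise ValueError("queue is unstable when arrival rate >= service rate")
    return rho ** 2 / (2 * (1 - rho))


def zone(queue_length):
    """Green below 50, yellow from 50 to 100, red above 100."""
    if queue_length < 50:
        return "green"
    if queue_length <= 100:
        return "yellow"
    return "red"


q = md1_queue_length(50, 51)  # one sensor feeding one analyst
print(round(q, 2), zone(q))
```

With rho = 50/51 the formula gives roughly 24.5 jobs, which is in the green zone before any additional demand arrives.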

 

contribution function = weight*normalized backlog + weight*normalized (leftover resource/leftover time in hrs over 14 days)

 

Normalize onto a 0-1 scale, where 1 is the most desirable.
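
One possible reading of the contribution function is sketched below in Python. The page gives neither the weights nor the normalization, so everything specific here is an assumption: equal weights of 0.5, a backlog score that falls linearly from 1 at an empty queue to 0 at the red-zone threshold of 100, and a resource score that compares the leftover resource per leftover hour with the initial budget of 7 hours over 336 hours.

```python
def normalized_backlog(queue_length, worst=100.0):
    """1 for an empty queue, 0 at or beyond `worst` jobs (assumed scale)."""
    clipped = min(max(queue_length, 0.0), worst)
    return 1.0 - clipped / worst


def normalized_resource_rate(leftover_resource_hrs, leftover_time_hrs,
                             total_resource_hrs=7.0, total_time_hrs=336.0):
    """Leftover resource per leftover hour, relative to the initial rate, capped at 1."""
    if leftover_time_hrs <= 0:
        raise ValueError("leftover time must be positive")
    rate = leftover_resource_hrs / leftover_time_hrs
    baseline = total_resource_hrs / total_time_hrs
    return min(rate / baseline, 1.0)


def contribution(queue_length, leftover_resource_hrs, leftover_time_hrs,
                 w_backlog=0.5, w_resource=0.5):
    """Weighted sum of the two normalized terms; 1 is the most desirable."""
    return (w_backlog * normalized_backlog(queue_length)
            + w_resource * normalized_resource_rate(leftover_resource_hrs, leftover_time_hrs))


print(contribution(0, 7, 336))    # empty queue, full budget
print(contribution(50, 7, 336))   # yellow-zone backlog, full budget
print(contribution(100, 0, 336))  # red-zone backlog, budget exhausted
```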

 

Test:

uncertainty model for testing the learnt policy: 6 events over 14 days, additional demand = 33 (at hour 32), 47 (at hour 54), 70 (at hour 80), 69 (at hour 170), 51 (at hour 210), 47 (at hour 320)

Allocate the additional resource and calculate the queue length per hour and the additional resource consumption for the myopic strategy (immediate reaction) and the RL strategy.