Advanced Dynamic Programming

Fall 2017

Instructor: Dr. Rajesh Ganesan

Eng Bldg. Room 2217

Phone: (703) 993-1693

Fax: (703) 993-1521

Email: rganesan at gmu dot edu

Syllabus

DP refresher Notes

ADP: need and equations

Machine replacement problem:

Limiting probabilities and MDPs, exhaustive enumeration, LP solution to MDPs, MDP with average reward/cost (policy and value iteration), discounted cost (policy and value iteration)
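A minimal Python sketch of limiting probabilities and exhaustive enumeration for an average-cost MDP (the course files themselves are in Matlab). The 3-state machine (0 = new, 1 = used, 2 = worn), its transition probabilities, and its costs are hypothetical numbers chosen for illustration, not the data from the notes. For each of the 2^3 stationary policies the sketch solves pi P = pi with sum(pi) = 1 and scores the policy by its long-run average cost.

```python
import itertools

import numpy as np

# Hypothetical 3-state machine (0 = new, 1 = used, 2 = worn); not the course data.
KEEP, REPLACE = 0, 1
P = {
    KEEP: np.array([[0.6, 0.3, 0.1],
                    [0.0, 0.6, 0.4],
                    [0.0, 0.0, 1.0]]),
    # After a replacement the machine evolves like a new one kept in state 0.
    REPLACE: np.array([[0.6, 0.3, 0.1]] * 3),
}
COST = {KEEP: np.array([0.0, 30.0, 100.0]),    # operating cost by state
        REPLACE: np.array([40.0, 40.0, 40.0])}  # replacement cost


def limiting_probabilities(P_pi):
    """Solve pi P = pi with sum(pi) = 1 (assumes a single recurrent class)."""
    n = P_pi.shape[0]
    A = P_pi.T - np.eye(n)
    A[-1, :] = 1.0            # swap one redundant balance equation for normalization
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)


def exhaustive_enumeration(n_states=3):
    """Average cost of every stationary policy; returns the best one and all costs."""
    results = {}
    for policy in itertools.product((KEEP, REPLACE), repeat=n_states):
        P_pi = np.array([P[a][s] for s, a in enumerate(policy)])
        c_pi = np.array([COST[a][s] for s, a in enumerate(policy)])
        pi = limiting_probabilities(P_pi)
        results[policy] = float(pi @ c_pi)   # long-run average cost per period
    best = min(results, key=results.get)
    return best, results


best, results = exhaustive_enumeration()
```

With these made-up numbers the best policy is keep in state 0 and replace in states 1 and 2, with an average cost of 16 per period. Enumeration is only practical for tiny problems, since the number of policies grows as (number of actions)^(number of states).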

notes

Excel example: value iteration for MDP

Matlab files:

Value iteration, discounted cost criterion (DP): uses the TPM; synchronous update, so all possible next states are evaluated in each iteration
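A short Python sketch of synchronous value iteration for a discounted-cost MDP, assuming a hypothetical 3-state machine (0 = new, 1 = used, 2 = worn) with made-up transition probabilities and costs. Every state is updated in each sweep using the full expectation over next states taken from the TPM, which is what distinguishes this DP version from the asynchronous ADP variants.

```python
import numpy as np

# Hypothetical 3-state machine (0 = new, 1 = used, 2 = worn); not the course data.
# P[a, i, j] = probability of moving from state i to j under action a (0 = keep, 1 = replace).
P = np.array([
    [[0.6, 0.3, 0.1], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]],  # keep
    [[0.6, 0.3, 0.1]] * 3,                                 # replace: behaves like a new machine
])
COST = np.array([[0.0, 30.0, 100.0],   # keep: operating cost by state
                 [40.0, 40.0, 40.0]])  # replace: replacement cost


def value_iteration(P, cost, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Synchronous value iteration for a discounted-cost MDP (minimization)."""
    V = np.zeros(cost.shape[1])
    for _ in range(max_iter):
        # Every state is updated at once, using the expectation over all next states.
        Q = cost + gamma * (P @ V)          # shape (actions, states)
        V_new = Q.min(axis=0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    policy = (cost + gamma * (P @ V)).argmin(axis=0)
    return V, policy


V, policy = value_iteration(P, COST)
```

For these numbers V converges to about [144, 184, 184] and the greedy policy is keep, replace, replace.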

Figure 4.2, page 120 (value iteration_ADP): uses the TPM; V = v; asynchronous update; the TPM is used to find the next state

Figure 4.4, page 128 (value iteration_ADP2): uses the TPM; V = (1-alpha)V + alpha v; asynchronous update; the TPM is used to find the next state
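The two asynchronous variants above differ only in how the new estimate v is stored. A Python sketch based solely on these one-line descriptions (not a reproduction of the book's figures), assuming a hypothetical 3-state machine: the TPM gives the expected cost-to-go of each action, only the currently visited state is updated, and the TPM is then used to sample the next state. Setting alpha = 1.0 gives V = v; 0 < alpha < 1 gives V = (1-alpha)V + alpha v.

```python
import numpy as np

# Hypothetical 3-state machine (0 = new, 1 = used, 2 = worn); not the course data.
# P[a, i, j]: transition probability from i to j under action a (0 = keep, 1 = replace).
P = np.array([
    [[0.6, 0.3, 0.1], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]],  # keep
    [[0.6, 0.3, 0.1]] * 3,                                 # replace
])
COST = np.array([[0.0, 30.0, 100.0],   # keep
                 [40.0, 40.0, 40.0]])  # replace


def async_adp(alpha=1.0, gamma=0.9, n_iter=20_000, start=0, seed=0):
    """Asynchronous value iteration along one simulated trajectory.

    alpha = 1.0 gives the plain update V = v; 0 < alpha < 1 gives the
    smoothed update V = (1 - alpha) V + alpha v.
    """
    rng = np.random.default_rng(seed)
    n_states = COST.shape[1]
    V = np.zeros(n_states)
    s = start
    for _ in range(n_iter):
        # Expected cost-to-go of each action, computed with the TPM.
        q = COST[:, s] + gamma * (P[:, s, :] @ V)
        a = int(np.argmin(q))
        v = q[a]
        V[s] = (1 - alpha) * V[s] + alpha * v      # only the visited state is updated
        s = int(rng.choice(n_states, p=P[a, s]))   # TPM used to sample the next state
    return V


V_plain = async_adp(alpha=1.0)    # V = v
V_smooth = async_adp(alpha=0.5)   # V = (1 - alpha) V + alpha v
```

Actions are chosen greedily with no exploration; that works here only because every state of this made-up machine stays reachable under any policy. Because v uses the exact expectation, both variants approach the same fixed point; smoothing mainly matters once v becomes a noisy sample.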

Figure 4.7, page 141 (value iteration_ADP 3): no TPM; uses the post-decision state; V = (1-alpha)V + alpha v; asynchronous update; an uncertainty model is needed to find the next state
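A Python sketch of the post-decision-state idea on a hypothetical 3-state machine, written from the description above rather than from the book's figure. The decision moves the system deterministically to a post-decision state (replace gives a new machine, keep leaves it as is). A simulator, here a made-up degradation model, then supplies the next pre-decision state, so no TPM is ever built. The new estimate v observed at the current state is smoothed into the value of the previous post-decision state using a harmonic stepsize (a0 = 10 is an assumption).

```python
import numpy as np

KEEP, REPLACE = 0, 1
# Hypothetical 3-state machine (0 = new, 1 = used, 2 = worn); not the course data.
COST = np.array([[0.0, 30.0, 100.0],   # keep
                 [40.0, 40.0, 40.0]])  # replace


def post_decision_state(s, a):
    """Deterministic effect of the decision, before any randomness."""
    return 0 if a == REPLACE else s


def sample_next_state(post, rng):
    """Hypothetical uncertainty model: the machine degrades by W in {0, 1, 2} steps."""
    w = int(rng.choice(3, p=[0.6, 0.3, 0.1]))
    return min(2, post + w)


def adp_post_decision(gamma=0.9, n_iter=50_000, a0=10.0, seed=0):
    rng = np.random.default_rng(seed)
    V_post = np.zeros(3)      # value estimates around post-decision states
    visits = np.zeros(3)
    s, prev_post = 0, None
    for _ in range(n_iter):
        # No TPM: each action is scored with the current post-decision value estimate.
        q = [COST[a, s] + gamma * V_post[post_decision_state(s, a)] for a in (KEEP, REPLACE)]
        a = int(np.argmin(q))
        v = q[a]
        if prev_post is not None:
            # Smooth v into the estimate for the previous post-decision state.
            visits[prev_post] += 1
            alpha = a0 / (a0 + visits[prev_post] - 1)   # harmonic stepsize
            V_post[prev_post] = (1 - alpha) * V_post[prev_post] + alpha * v
        prev_post = post_decision_state(s, a)
        s = sample_next_state(prev_post, rng)   # simulate the exogenous information
    return V_post


V_post = adp_post_decision()
```

For these made-up numbers the exact post-decision value of a new machine is 160 (the expected cost-to-go after the random degradation), so V_post[0] should settle near it. Estimates for post-decision states that the greedy policy stops visiting can stay inaccurate, which is why exploration matters in larger problems.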

Inventory control example: see handout; Excel for inventory control

Matlab using Fig. 4.7, page 141, with uniform demand probabilities (no knowledge of the uncertainty is available)

Matlab using Fig. 4.7, page 141, with Poisson (lambda = 1) demand probabilities (some knowledge of the uncertainty is available)
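The two inventory runs differ only in the demand model the simulator samples from. A Python sketch of both, assuming a hypothetical maximum demand D_MAX = 4 per period (the handout's actual range should be used instead): uniform probabilities when nothing is known about the uncertainty, and a Poisson(1) pmf, with the tail mass above D_MAX lumped into D_MAX, when some knowledge is available.

```python
import math

import numpy as np

D_MAX = 4   # hypothetical largest demand per period; use the handout's value


def uniform_demand_probs(d_max=D_MAX):
    """No knowledge of the uncertainty: every demand 0..d_max is equally likely."""
    return np.full(d_max + 1, 1.0 / (d_max + 1))


def poisson_demand_probs(lam=1.0, d_max=D_MAX):
    """Some knowledge: Poisson(lam) on 0..d_max, with the tail mass lumped into d_max."""
    p = np.array([math.exp(-lam) * lam**k / math.factorial(k) for k in range(d_max + 1)])
    p[-1] += 1.0 - p.sum()
    return p


def sample_demand(probs, rng):
    """Draw one period's demand, as the ADP loop would when simulating the next state."""
    return int(rng.choice(len(probs), p=probs))


rng = np.random.default_rng(0)
demands = [sample_demand(poisson_demand_probs(), rng) for _ in range(5)]
```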

stopping criteria

Fig. 4.7, page 141, with MSE (value iteration_ADP3x) for the machine replacement problem.
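The exact criterion in value iteration_ADP3x is not shown here. A Python sketch of one common MSE-based stopping rule, under the assumption that the run keeps a history of value-function estimates: stop when the mean squared change over the last `window` iterations falls below `tol` (both defaults are hypothetical). The same `mse` helper can instead compare against a known exact solution when one is available.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two value-function estimates."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.mean((a - b) ** 2))


def should_stop(history, window=100, tol=1e-4):
    """Stop once the estimate has barely moved over the last `window` iterations.

    `history` is a list of value-function arrays, one per iteration.
    """
    if len(history) <= window:
        return False
    return mse(history[-1], history[-1 - window]) < tol
```

Comparing estimates `window` iterations apart, rather than consecutive ones, guards against stopping early when a small stepsize makes successive estimates look artificially close.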

Alpha decay schemes

alpha decay
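Common stepsize (alpha) rules from the ADP literature, sketched as Python functions of the iteration count n = 1, 2, 3, .... Which rules value iteration_ADP4 actually compares is not stated here, so treat this selection and the default parameters (a = 10, beta = 0.7, c = 0.1) as assumptions.

```python
def alpha_one_over_n(n):
    """1/n: a plain running average; can decay too fast when gamma is close to 1."""
    return 1.0 / n


def alpha_harmonic(n, a=10.0):
    """a / (a + n - 1): larger a keeps alpha large for longer."""
    return a / (a + n - 1)


def alpha_polynomial(n, beta=0.7):
    """1 / n**beta with 0.5 < beta <= 1."""
    return 1.0 / n ** beta


def alpha_constant(n, c=0.1):
    """Constant stepsize: never settles exactly, but keeps adapting."""
    return c
```

The first three rules satisfy the usual conditions that the alphas sum to infinity while their squares have a finite sum; the constant rule does not.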

Fig. 4.7, page 141, with MSE (value iteration_ADP4) to study different alpha decay rates for the machine replacement problem.

Fig. 4.7, page 141, with MSE (value iteration_ADP4) to study different alpha decay rates for the inventory control problem with uniform demand probabilities (no knowledge of the uncertainty is available)

Fig. 4.7, page 141, with MSE (value iteration_ADP4) to study different alpha decay rates for the inventory control problem with Poisson (lambda = 1) demand probabilities (some knowledge of the uncertainty is available)

Water resource example: see handout; value iteration using Excel

The possible decisions are to release any amount up to the water available in the dam. Use the same cost functions as in the handout. The inflow probabilities are:

Prob of inflow
P(0)	0.02
P(1)	0.05
P(2)	0.01
P(3)	0.06
P(4)	0.02
P(5)	0.02
P(6)	0.08
P(7)	0.09
P(8)	0.01
P(9)	0.04
P(10)	0.1
P(11)	0.04
P(12)	0.02
P(13)	0.16
P(14)	0.04
P(15)	0.05
P(16)	0.02
P(17)	0.08
P(18)	0.06
P(19)	0.02
P(20)	0.01

Solve the water resource problem using (a) value iteration, (b) Fig. 4.2, (c) Fig. 4.4, and (d) Fig. 4.7; use uniform probabilities (equal probability of 1/21) given above, with a dam capacity of 20 units. (e) How does the solution change if the inflow probability is Poisson with mean 10 (use Fig. 4.7)?
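Before running the exercise it helps to have the three inflow models side by side. A Python sketch that copies the table above (it sums to 1, with a mean inflow of 10.39 units), builds the uniform 1/21 model, and builds a Poisson(10) model for part (e). Truncating the Poisson at the dam capacity and lumping inflows above 20 into 20 (treating the excess as spill) is an assumption, not something the handout specifies.

```python
import math

CAPACITY = 20
# Inflow probabilities P(0)..P(20), copied from the table above.
INFLOW_PROBS = [0.02, 0.05, 0.01, 0.06, 0.02, 0.02, 0.08, 0.09, 0.01, 0.04, 0.10,
                0.04, 0.02, 0.16, 0.04, 0.05, 0.02, 0.08, 0.06, 0.02, 0.01]
UNIFORM_PROBS = [1.0 / (CAPACITY + 1)] * (CAPACITY + 1)


def mean_inflow(probs):
    """Expected inflow in units."""
    return sum(k * p for k, p in enumerate(probs))


def poisson_inflow_probs(lam=10.0, cap=CAPACITY):
    """Poisson(lam) on 0..cap; inflows above cap are lumped into cap (assumption)."""
    p = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(cap + 1)]
    p[-1] += 1.0 - sum(p)
    return p


POISSON_PROBS = poisson_inflow_probs()
```

The table and the uniform model have nearly the same mean (10.39 versus 10), so differences in the solutions come mostly from the shape of the distribution rather than its average.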

VFA matlab code with 4 schemes

VFA with diffusion wavelet

VFA DW Theory

DW: Haar wavelet demo to accompany the PPT above
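A Python sketch of one level of the orthonormal Haar transform applied to a vector, such as a value function tabulated on a regular grid: scaled pairwise sums become scaling coefficients (the coarse approximation) and scaled pairwise differences become wavelet coefficients (the detail). This covers only the classical Haar case; diffusion wavelets build their bases from a diffusion operator on the state space and are not shown here.

```python
import numpy as np


def haar_level(x):
    """One level of the orthonormal Haar transform (length must be even)."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:
        raise ValueError("length must be even")
    even, odd = x[0::2], x[1::2]
    scaling = (even + odd) / np.sqrt(2)   # coarse approximation
    wavelet = (even - odd) / np.sqrt(2)   # detail
    return scaling, wavelet


def inverse_haar_level(scaling, wavelet):
    """Rebuild the original vector from one level of coefficients."""
    x = np.empty(2 * scaling.size)
    x[0::2] = (scaling + wavelet) / np.sqrt(2)
    x[1::2] = (scaling - wavelet) / np.sqrt(2)
    return x


x = np.array([4.0, 2.0, 5.0, 5.0])
s, w = haar_level(x)
```

Keeping only `s` gives a smoother, half-length approximation of `x`; keeping `w` as well recovers `x` exactly. Applying `haar_level` again to `s` gives the next, coarser level.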

Diffusion wavelets (DW): code for best basis, multiple levels

Excel to show value determination
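Value determination (policy evaluation) solves a linear system for a fixed policy instead of iterating. A discounted-cost Python sketch, assuming a hypothetical 3-state machine and the policy keep in state 0, replace in states 1 and 2; the Excel file may use the average-cost form instead.

```python
import numpy as np


def value_determination(P_pi, c_pi, gamma=0.9):
    """Solve v = c_pi + gamma * P_pi v, i.e. (I - gamma P_pi) v = c_pi."""
    n = len(c_pi)
    return np.linalg.solve(np.eye(n) - gamma * np.asarray(P_pi, dtype=float),
                           np.asarray(c_pi, dtype=float))


# Hypothetical machine and policy: keep in state 0, replace in states 1 and 2.
P_pi = [[0.6, 0.3, 0.1]] * 3   # every row behaves like a new machine
c_pi = [0.0, 40.0, 40.0]       # keep cost in state 0, replacement cost elsewhere
v = value_determination(P_pi, c_pi)
```

For these numbers v is [144, 184, 184]. In policy iteration this step alternates with a greedy policy-improvement step until the policy stops changing.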

ADP with only scaling function code

ADP with scaling and wavelet functions code

adp summary

Inventory problem

14-day cycle; time in hours (336 hrs)

job arrival rate (Poisson mean): 50 per hour per sensor, at the beginning of the hour

job service rate (deterministic): 51 per hour per analyst

M/D/1 queue

performance metric: keep the queue length below 50 jobs in any given hour (green zone); 50-100 is the yellow zone and >100 is the red zone.

queue length $L_q = \rho^2 / (2(1-\rho))$, where $\rho = \lambda/\mu$ (arrival rate/service rate) = 50/51, which gives $L_q \approx 24.5$ jobs (green zone)
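A small Python check of the M/D/1 formula with the rates above, plus the green/yellow/red classification from the performance metric. The formula gives the steady-state mean number of jobs waiting.

```python
def md1_queue_length(arrival_rate, service_rate):
    """Mean number waiting in an M/D/1 queue: rho^2 / (2 (1 - rho))."""
    rho = arrival_rate / service_rate
    if rho >= 1:
        raise ValueError("the queue is unstable when rho >= 1")
    return rho**2 / (2 * (1 - rho))


def zone(queue_length):
    """Color zone from the performance metric: <50 green, 50-100 yellow, >100 red."""
    if queue_length < 50:
        return "green"
    if queue_length <= 100:
        return "yellow"
    return "red"


lq = md1_queue_length(50, 51)
print(round(lq, 2), zone(lq))   # prints: 24.51 green
```

Because rho = 50/51 is so close to 1, the queue is very sensitive to extra demand: an additional batch of jobs takes a long time to clear at the spare rate of one job per hour per analyst.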

# of sensors = 10

# of people available per hour = 10

uncertainty model for learning: additional demand ~ Poisson(50); generate 6 events over 14 days (repeat for several 14-day runs)
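A Python sketch of the training uncertainty model: each 14-day run has 6 events, each adding a Poisson(50) number of jobs. The notes do not say when events occur, so drawing the event hours uniformly from the 336 hours, without repeats, is an assumption.

```python
import numpy as np

HOURS = 14 * 24   # 336-hour cycle
N_EVENTS = 6


def sample_events(rng, n_events=N_EVENTS, mean_demand=50, hours=HOURS):
    """One 14-day training run: (hour, additional demand) pairs sorted by hour.

    Assumption: event hours are drawn uniformly without replacement.
    """
    hrs = np.sort(rng.choice(hours, size=n_events, replace=False))
    demand = rng.poisson(mean_demand, size=n_events)
    return list(zip(hrs.tolist(), demand.tolist()))


rng = np.random.default_rng(0)
runs = [sample_events(rng) for _ in range(100)]   # repeat for several 14-day runs
```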

additional resource available over the 14-day period = 7 hours

contribution function = weight * normalized backlog + weight * normalized (leftover resource / leftover time in hours over the 14 days)

normalize onto a 0-1 scale, where 1 means most desirable.
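The notes leave the weights and the normalizations open, so the following Python sketch is one hypothetical choice. The backlog term is 1 for an empty queue and falls linearly to 0 at the red threshold of 100 jobs. The resource term is the leftover resource per leftover hour, divided by the initial rate of 7/336 and capped at 1. The weights w_backlog = w_resource = 0.5 are also assumptions.

```python
HOURS = 336          # 14-day cycle in hours
EXTRA_HOURS = 7.0    # additional resource available over the cycle
RED = 100.0          # queue length where the red zone starts


def contribution(backlog, leftover_resource, hour, w_backlog=0.5, w_resource=0.5):
    """Hypothetical contribution on a 0-1 scale (1 = most desirable)."""
    # Backlog term: 1 when the queue is empty, 0 at or beyond the red threshold.
    backlog_score = 1.0 - min(backlog, RED) / RED
    # Resource term: leftover resource per leftover hour, relative to the initial rate.
    time_left = max(HOURS - hour, 1)
    rate = leftover_resource / time_left
    resource_score = min(rate / (EXTRA_HOURS / HOURS), 1.0)
    return w_backlog * backlog_score + w_resource * resource_score
```

With these choices, an empty queue and an untouched 7-hour reserve at hour 0 score 1.0, while a queue in the red zone with no reserve left scores 0.0.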

Test:

uncertainty model for the learned policy (test): 6 events over 14 days, with additional demand = 33 (at hour 32), 47 (at hour 54), 70 (at hour 80), 69 (at hour 170), 51 (at hour 210), 47 (at hour 320)

allocate the additional resource and calculate the queue length per hour and the additional resource consumption for (a) a myopic strategy with immediate reaction and (b) the RL strategy.

result