Advanced Dynamic Programming
Fall 2017
Instructor: Dr. Rajesh Ganesan
Eng Bldg. Room 2217
Phone: (703) 993-1693
Fax: (703) 993-1521
Email: rganesan at gmu dot edu
DP refresher Notes
ADP need and equations
Machine replacement problem:
Limiting probabilities and MDP, exhaustive enumeration, LP solution to MDP, MDP average reward/cost (policy and value iteration), discounted cost (policy and value iteration)
Excel Example value iteration for MDP
Matlab files:
Value iteration, discounted cost criterion, DP: uses the TPM (transition probability matrix); synchronous update, so all possible next states are evaluated in each iteration
Figure 4.2, page 120, value iteration_ADP: uses the TPM; V = v; asynchronous update; the TPM is used to find the next state
Figure 4.4, page 128, value iteration_ADP2: uses the TPM; V = (1-alpha)V + alpha v; asynchronous update; the TPM is used to find the next state
Figure 4.7, page 141, value iteration_ADP 3: no TPM; uses the post-decision state; V = (1-alpha)V + alpha v; asynchronous update; needs some uncertainty model to find the next state
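The difference between the synchronous update and the asynchronous, smoothed update listed above can be sketched in Python as follows. This is only an illustration, not a translation of the Matlab files: the two-state machine replacement data (P, C, GAMMA) and the function names are hypothetical. With alpha = 1 the asynchronous update reduces to V = v; with 0 < alpha < 1 it is V = (1-alpha)V + alpha v.

```python
import random

# Hypothetical machine replacement data (NOT taken from the course files).
# States: 0 = good, 1 = worn. Actions: 0 = keep, 1 = replace.
P = [
    [[0.7, 0.3], [0.0, 1.0]],   # keep: TPM rows for states 0 and 1
    [[0.7, 0.3], [0.7, 0.3]],   # replace: machine behaves like a good one
]
C = [
    [1.0, 5.0],   # keep: operating cost by state
    [4.0, 4.0],   # replace: replacement plus operating cost
]
GAMMA = 0.9       # discount factor (assumed)


def backup(V, s):
    """One-step Bellman backup at state s: returns (min cost v, best action)."""
    q = [C[a][s] + GAMMA * sum(p * v for p, v in zip(P[a][s], V))
         for a in range(len(P))]
    a_best = min(range(len(q)), key=q.__getitem__)
    return q[a_best], a_best


def sync_value_iteration(tol=1e-8, max_iter=10_000):
    """Synchronous update: every state is backed up from the same old V."""
    V = [0.0] * len(C[0])
    for _ in range(max_iter):
        V_new = [backup(V, s)[0] for s in range(len(V))]
        done = max(abs(x - y) for x, y in zip(V_new, V)) < tol
        V = V_new
        if done:
            break
    return V, [backup(V, s)[1] for s in range(len(V))]


def async_adp(alpha=1.0, n_iter=5000, seed=0):
    """Asynchronous update: only the visited state changes in each iteration."""
    rng = random.Random(seed)
    V = [0.0] * len(C[0])
    s = 0
    for _ in range(n_iter):
        v, a = backup(V, s)                       # expectation still uses the TPM
        V[s] = (1 - alpha) * V[s] + alpha * v     # alpha = 1 gives V = v
        s = rng.choices(range(len(V)), weights=P[a][s])[0]  # TPM picks next state
    return V
```

For this made-up data both routines should agree on the value function, with "keep" chosen in the good state and "replace" in the worn state.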
Inventory control example: see handout, Excel for inventory control
Matlab using Fig. 4.7, page 141, with uniform demand probability (no knowledge of the uncertainty is available)
Matlab using Fig. 4.7, page 141, with Poisson (lambda = 1) demand probability (some knowledge of the uncertainty is available)
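A rough Python sketch of a post-decision-state update for an inventory problem is given below, with the demand drawn either uniformly or from a Poisson(1) model. It is not the handout's model: the capacity, the cost parameters, the uniform demand range 0..5, the placement of the holding/shortage cost after the demand is observed, and the 1/n step size are all assumptions made for illustration. No TPM is used; only a demand sampler is needed.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def adp_post_decision(sample_demand, capacity=5, order_cost=2.0, hold_cost=1.0,
                      short_cost=6.0, gamma=0.9, n_iter=20000, seed=0):
    """Estimate V[x], the discounted cost-to-go of post-decision inventory x.

    All parameter values are hypothetical. x is the stock right after
    ordering and before the demand is seen.
    """
    rng = random.Random(seed)
    V = [0.0] * (capacity + 1)
    visits = [0] * (capacity + 1)
    x = 0
    for _ in range(n_iter):
        d = sample_demand(rng)                  # uncertainty model, no TPM
        s = max(x - d, 0)                       # next pre-decision state
        stage = hold_cost * s + short_cost * max(d - x, 0)
        # Greedy order quantity against the current approximation.
        a = min(range(capacity - s + 1),
                key=lambda q: order_cost * q + V[s + q])
        v_hat = stage + gamma * (order_cost * a + V[s + a])
        visits[x] += 1
        alpha = 1.0 / visits[x]                 # simple 1/n decay (assumed)
        V[x] = (1 - alpha) * V[x] + alpha * v_hat   # V = (1-alpha)V + alpha v
        x = s + a                               # move to new post-decision state
    return V


V_uniform = adp_post_decision(lambda rng: rng.randint(0, 5))
V_poisson = adp_post_decision(lambda rng: sample_poisson(1.0, rng))
```

Swapping the demand sampler is the only change needed to move from "no knowledge" (uniform) to "some knowledge" (Poisson) of the uncertainty.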
stopping criteria
Fig 4.7 page 141 with MSE value iteration_ADP3x for machine replacement problem.
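One way an MSE-based stopping rule could look is sketched below in Python. It assumes the MSE is taken between two snapshots of the value vector; the function names and the threshold are hypothetical, and the Matlab file may define the criterion differently.

```python
def mse(V_old, V_new):
    """Mean squared difference between two value-function snapshots."""
    return sum((a - b) ** 2 for a, b in zip(V_old, V_new)) / len(V_old)


def has_converged(V_old, V_new, threshold=1e-6):
    """Stop when the snapshots differ by less than the (assumed) threshold."""
    return mse(V_old, V_new) < threshold
```

With asynchronous updates only one entry of V changes per iteration, so it may be safer to compare snapshots taken several iterations apart rather than consecutive ones.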
Alpha decay schemes
Fig 4.7 page 141 with MSE value iteration_ADP4 to study different alpha decay rates for machine replacement problem.
Fig 4.7 page 141 with MSE value iteration_ADP4 to study different alpha decay rates for the inventory control problem with uniform demand probability (no knowledge of the uncertainty is available)
Fig 4.7 page 141 with MSE value iteration_ADP4 to study different alpha decay rates for the inventory control problem with Poisson (lambda = 1) demand probability (some knowledge of the uncertainty is available)
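For reference, a few commonly used alpha decay rules are sketched below in Python. These are standard step-size choices and are not necessarily the ones coded in value iteration_ADP4; the function names and default parameters are hypothetical.

```python
def alpha_constant(n, alpha0=0.1):
    """Constant step size: no decay."""
    return alpha0


def alpha_harmonic(n):
    """1/n rule: smoothing with it reproduces the running sample mean."""
    return 1.0 / n


def alpha_generalized_harmonic(n, a=10.0):
    """a/(a+n-1): larger a slows the decay."""
    return a / (a + n - 1)


def alpha_polynomial(n, beta=0.7):
    """1/n^beta with 0.5 < beta <= 1: decays more slowly than 1/n."""
    return 1.0 / n ** beta


def smooth(observations, schedule):
    """Apply V = (1-alpha)V + alpha v along a sequence of observations v."""
    V = 0.0
    for n, v in enumerate(observations, start=1):
        alpha = schedule(n)
        V = (1 - alpha) * V + alpha * v
    return V
```

Running smooth on the same sequence of observations with each rule is a quick way to see how fast each one forgets early values.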
Water resource example: see handout; value iteration using Excel
The possible decisions are to release up to the dam's availability. Use the same cost functions as in the handout. The probability of inflow is
Inflow probability | Value |
P(0) | 0.02 |
P(1) | 0.05 |
P(2) | 0.01 |
P(3) | 0.06 |
P(4) | 0.02 |
P(5) | 0.02 |
P(6) | 0.08 |
P(7) | 0.09 |
P(8) | 0.01 |
P(9) | 0.04 |
P(10) | 0.1 |
P(11) | 0.04 |
P(12) | 0.02 |
P(13) | 0.16 |
P(14) | 0.04 |
P(15) | 0.05 |
P(16) | 0.02 |
P(17) | 0.08 |
P(18) | 0.06 |
P(19) | 0.02 |
P(20) | 0.01 |
Solve the water resource problem using a) value iteration, b) Figure 4.2, c) Figure 4.4, and d) Figure 4.7 - use uniform (equal probability of 1/21) given above, with dam capacity 20 units. e) How does the solution change if the probability of inflow is Poisson with mean 10 (use Figure 4.7)?
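As a starting point, the three inflow models mentioned above can be set up in Python as follows. The table values are copied from the list above. Truncating the Poisson distribution to 0..20 and renormalizing it, and letting overflow above the dam capacity spill, are assumptions of this sketch; the handout may treat them differently.

```python
import math

# Inflow probabilities P(0)..P(20), copied from the table above.
TABLE_PMF = [0.02, 0.05, 0.01, 0.06, 0.02, 0.02, 0.08, 0.09, 0.01, 0.04, 0.10,
             0.04, 0.02, 0.16, 0.04, 0.05, 0.02, 0.08, 0.06, 0.02, 0.01]


def uniform_pmf(n_levels=21):
    """Equal probability 1/21 for inflows 0..20."""
    return [1.0 / n_levels] * n_levels


def truncated_poisson_pmf(mean=10.0, max_inflow=20):
    """Poisson(mean) restricted to 0..max_inflow and renormalized (assumption)."""
    raw = [math.exp(-mean) * mean ** k / math.factorial(k)
           for k in range(max_inflow + 1)]
    total = sum(raw)
    return [p / total for p in raw]


def mean_inflow(pmf):
    """Expected inflow under a probability mass function on 0, 1, 2, ..."""
    return sum(k * p for k, p in enumerate(pmf))


def next_level(level, release, inflow, capacity=20):
    """Dam level after a release and an inflow; excess water spills (assumption)."""
    if not 0 <= release <= level:
        raise ValueError("release must be between 0 and the current level")
    return min(level - release + inflow, capacity)
```

Checking that each distribution sums to 1 and comparing the mean inflows is a useful sanity check before running any of the value iteration variants.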
VFA Matlab code with 4 schemes
VFA with diffusion wavelet
VFA DW Theory
DW - Haar wavelet demo to go with the ppt above
Diffusion wavelets DW code for best basis - multiple levels
Excel to show value determination
ADP with only scaling function code
ADP with scaling and wavelet functions code
Inventory problem
14 day cycle - time in hours (336 hrs)
job arrival rate (Poisson average) - 50 per hour per sensor at the beginning of the hour.
job service rate (deterministic)- 51 per hour per analyst
M/D/1 queue
performance metric: maintain the queue length of jobs at less than 50 in any given hour (green zone); 50-100 is the yellow zone, and >100 is the red zone.
queue length = rho^2 / (2(1 - rho)), where rho = lambda/mu (arrival rate / service rate) = 50/51
# of sensors = 10
# of people available per hour = 10
uncertainty model for learning: additional demand Poisson(50); generate 6 events over 14 days (repeat for several 14-day runs)
additional resource available over the 14-day period = 7 hours
contribution function = weight*normalized backlog + weight*normalized (leftover resource/left over time in hrs over 14 days)
normalize onto a 0-1 scale, where 1 means most desirable.
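The M/D/1 figures above can be checked with a short Python sketch. The formula and the zone thresholds come from the list above; the function names are hypothetical, and placing a queue length of exactly 50 or 100 in the yellow zone is an assumption about the boundaries.

```python
def md1_queue_length(arrival_rate, service_rate):
    """Mean M/D/1 queue length rho^2 / (2(1 - rho)), valid only for rho < 1."""
    rho = arrival_rate / service_rate
    if rho >= 1:
        raise ValueError("queue is unstable when arrival rate >= service rate")
    return rho ** 2 / (2 * (1 - rho))


def zone(queue_length):
    """Classify an hourly queue length: <50 green, 50-100 yellow, >100 red."""
    if queue_length < 50:
        return "green"
    if queue_length <= 100:
        return "yellow"
    return "red"


# One sensor feeding one analyst: 50 arrivals/hour against 51 services/hour.
baseline = md1_queue_length(50, 51)   # equals 2500/102, about 24.5 jobs
baseline_zone = zone(baseline)
```

With rho = 50/51 the steady-state queue sits in the green zone, so it is the additional demand events that push the system toward yellow or red.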
Test:
uncertainty model for learnt (test): 6 events over 14 days, additional demand = 33 (at hour 32), 47 (at hour 54), 70 (at hour 80), 69 (at hour 170), 51 (at hour 210), 47 (at hour 320)
Allocate the additional resource and calculate the queue length per hour and the additional resource consumption for the myopic strategy with immediate reaction and for the RL strategy.