Algorithm 1: HRLearning algorithm |
Partition M into {M0 , M1 , … , Mn} such that, for each i (0 ≤ i ≤ n), Mi = <Ti , Ai , Ri> |
Randomly initialize π = {π0 , π1 , … , πn} |
For each Mi in M do |
repeat |
Extract features and get Ti and Ni ; |
Randomly initialize Wi ; |
Let s ← s0 ; |
repeat |
Select an action a for s using the policy derived from Q |
Execute action a; compute r, s', and R' = r + F' |
Let Q(i, s, a) ← Q(i, s, a) + α(R' + γ max_{a' ∈ A(s')} Q(i, s', a') − Q(i, s, a)) |
Until s ∈ Ti ; |
Until Mi is finished; |
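
The inner loop of Algorithm 1 can be illustrated with a minimal tabular sketch for a single sub-task Mi. It is not the authors' implementation: it assumes the standard Q-learning update, an epsilon-greedy policy as the "strategy of Q", and an explicit s ← s' step after each update. The names `q_learning_subtask`, `chain_step`, `step`, `actions`, `is_terminal`, and `shaping` (standing in for F') are hypothetical, and the feature extraction and the quantities Ni and Wi are not modeled.

```python
import random
from collections import defaultdict


def q_learning_subtask(step, actions, s0, is_terminal,
                       shaping=lambda s, s_next: 0.0,
                       alpha=0.1, gamma=0.9, epsilon=0.1,
                       episodes=200, max_steps=100, seed=0):
    """Tabular Q-learning for one sub-task (assumed form of the inner loop).

    step(s, a)      -> (r, s_next), the environment transition (hypothetical)
    actions(s)      -> list of actions available in s, i.e. A(s)
    is_terminal(s)  -> True when s belongs to the terminal set Ti
    shaping(s, s2)  -> extra reward F', so that R' = r + F'
    """
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(s, a)], zero-initialized
    for _ in range(episodes):              # outer "repeat ... until Mi is finished"
        s = s0
        for _ in range(max_steps):         # inner "repeat ... until s in Ti"
            if is_terminal(s):
                break
            acts = actions(s)
            # Epsilon-greedy selection (assumed policy derived from Q).
            if rng.random() < epsilon:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda b: Q[(s, b)])
            r, s_next = step(s, a)
            shaped = r + shaping(s, s_next)          # R' = r + F'
            # Bootstrap from the best next action; terminal states have value 0.
            if is_terminal(s_next):
                best_next = 0.0
            else:
                best_next = max(Q[(s_next, b)] for b in actions(s_next))
            Q[(s, a)] += alpha * (shaped + gamma * best_next - Q[(s, a)])
            s = s_next                               # assumed s <- s' step
    return Q


def chain_step(s, a):
    """Toy 5-state chain (states 0..4); reward 1 on reaching state 4."""
    s_next = min(max(s + a, 0), 4)
    return (1.0 if s_next == 4 else 0.0), s_next


if __name__ == "__main__":
    Q = q_learning_subtask(chain_step, lambda s: [1, -1], 0, lambda s: s == 4)
    for s in range(4):
        print(s, Q[(s, 1)], Q[(s, -1)])
```

In the toy chain, the value of moving right should come to dominate in the states near the goal, since only that action leads to the rewarded terminal state. A full version of the algorithm would run one such loop per sub-task Mi and keep a separate table (or weight vector Wi) for each.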