Algorithm 1: HRLearning algorithm
Partition M into {M0, M1, …, Mn} such that, for each i (0 ≤ i ≤ n), Mi = <Ti, Ai, Ri>
Randomly initialize π = {π0, π1, …, πn}
For each Mi in M do
    Repeat
        Extract features and obtain Ti and Ni;
        Randomly initialize Wi;
        Let s ← s0;
        Repeat
            Select an action a for s using the policy derived from Q;
            Execute a; observe the reward r and next state s'; compute the shaped reward R' = r + F';
            Q(i, s, a) ← Q(i, s, a) + α[R' + γ · max_{a' ∈ A(s')} Q(i, s', a') − Q(i, s, a)];
            Let s ← s';
        Until s ∈ Ti;
    Until Mi is finished;
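The inner loop for a single subtask Mi can be sketched in Python as follows. This is a minimal tabular sketch, not the authors' implementation, and it rests on assumptions that the algorithm does not state: Q is stored as a table keyed by (i, s, a) rather than computed from the feature weights Wi; the policy derived from Q is taken to be ε-greedy; "Mi is finished" is approximated by a fixed number of episodes; the max term is taken as 0 when s' is terminal; and the toy subtask (`chain_step`), the shaping term F' (`potential_shaping`, with a hypothetical potential φ(s) = s), and all parameter values are hypothetical. The set of policies π is not modeled.

```python
import random
from collections import defaultdict

GOAL = 5  # hypothetical toy subtask: states 0..GOAL on a line, Ti = {GOAL}


def q_update(q_sa, r_shaped, max_q_next, alpha, gamma):
    """Q(s,a) <- Q(s,a) + alpha * (R' + gamma * max_a' Q(s',a') - Q(s,a))."""
    return q_sa + alpha * (r_shaped + gamma * max_q_next - q_sa)


def q_learning_subtask(i, start, terminals, actions, step, shaping,
                       episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1,
                       max_steps=1000, seed=0):
    """Tabular Q-learning for one subtask Mi with shaped reward R' = r + F'."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q(i, s, a) keyed by (i, s, a); unseen entries are 0.0
    for _ in range(episodes):                  # assumption: fixed count stands in for "Mi is finished"
        s = start                              # Let s <- s0
        for _ in range(max_steps):             # inner loop: until s in Ti (capped at max_steps)
            if s in terminals:
                break
            acts = actions(s)
            if rng.random() < epsilon:         # assumption: epsilon-greedy policy derived from Q
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda b: q[(i, s, b)])
            r, s_next = step(s, a)             # execute a, observe r and s'
            r_shaped = r + shaping(s, s_next)  # R' = r + F'
            # Assumption: no bootstrapping from a terminal state (max term taken as 0).
            if s_next in terminals:
                max_q_next = 0.0
            else:
                max_q_next = max(q[(i, s_next, b)] for b in actions(s_next))
            q[(i, s, a)] = q_update(q[(i, s, a)], r_shaped, max_q_next, alpha, gamma)
            s = s_next                         # Let s <- s'
    return q


def chain_step(s, a):
    """Hypothetical environment: move by a in {-1, +1}, clipped to [0, GOAL], reward -1 per step."""
    return -1.0, min(max(s + a, 0), GOAL)


def potential_shaping(s, s_next, gamma=0.9):
    """Hypothetical F' = gamma * phi(s') - phi(s) with potential phi(s) = s."""
    return gamma * s_next - s


if __name__ == "__main__":
    q = q_learning_subtask(i=0, start=0, terminals={GOAL},
                           actions=lambda s: [-1, 1],
                           step=chain_step, shaping=potential_shaping)
    greedy = [max((-1, 1), key=lambda a: q[(0, s, a)]) for s in range(GOAL)]
    print("greedy action per state:", greedy)
```

The default `gamma` of `potential_shaping` is chosen to match the learner's `gamma`; if one is changed, the other should be changed too. Passing a shaping function that always returns 0.0 reduces the update to plain Q-learning with R' = r, which gives a convenient baseline for checking whether F' helps.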