| Algorithm 1: HRLearning algorithm |
| Partition M into {M0, M1, …, Mn}, such that for each i (0 ≤ i ≤ n), Mi = <Ti, Ai, Ri> |
| Randomly initialize π = {π0, π1, …, πn} |
| For each Mi in M do |
| repeat |
| Extract features and get Ti and Ni ; |
| Randomly initialize Wi ; |
| Let s ← s0 ; |
| repeat |
| Select an action a for s using the policy derived from Q |
| Execute action a, observe r and s', and compute R' = r + F' |
| Let Q(i, s, a) ← Q(i, s, a) + α(R' + γ max_{a' ∈ A(s')} Q(i, s', a') − Q(i, s, a)) ; Let s ← s' |
| Until s ∈ Ti ; |
| Until Mi is finished; |
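
The inner loop of Algorithm 1 can be illustrated with a minimal tabular sketch for a single subtask Mi. Several things here are assumptions rather than part of the algorithm as stated: the helper name `q_learning_subtask`, the ε-greedy selection rule, the reading of F' as an additive shaping term supplied by a hypothetical `shaping(s, s_next)` function, the hyperparameter values, and the five-state chain environment used as a toy subtask. Feature extraction and the weights Wi are omitted; a plain lookup table stands in for Q.

```python
import random
from collections import defaultdict


def q_learning_subtask(step, actions, s0, terminals, shaping=None,
                       alpha=0.1, gamma=0.9, epsilon=0.1,
                       episodes=200, seed=0):
    """Tabular Q-learning for one subtask (hypothetical helper)."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(s, a)], 0.0 for unseen pairs
    for _ in range(episodes):
        s = s0
        while s not in terminals:
            # Epsilon-greedy selection (an assumed choice of policy),
            # with random tie-breaking among equally valued actions.
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                best = max(Q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if Q[(s, b)] == best])
            s_next, r = step(s, a)
            # R' = r + F', with F' assumed to be a shaping term.
            f = shaping(s, s_next) if shaping is not None else 0.0
            shaped = r + f
            # Terminal states contribute no future value.
            if s_next in terminals:
                best_next = 0.0
            else:
                best_next = max(Q[(s_next, b)] for b in actions)
            # Temporal-difference update toward the shaped target.
            Q[(s, a)] += alpha * (shaped + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q


def chain_step(s, a):
    """Toy subtask: states 0..4 on a line, reward 1.0 on reaching state 4."""
    s_next = min(max(s + a, 0), 4)
    return s_next, (1.0 if s_next == 4 else 0.0)


Q = q_learning_subtask(chain_step, actions=[-1, 1], s0=0, terminals={4})
# Greedy action per non-terminal state under the learned table.
greedy = {s: max([-1, 1], key=lambda b: Q[(s, b)]) for s in range(4)}
```

In this toy chain the greedy policy moves right from every non-terminal state. The outer loop of Algorithm 1 would call such a routine once per subtask Mi and keep one table (or one policy πi) per subtask.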