Hierarchical Reinforcement Learning Based Self-balancing Algorithm for Two-wheeled Robots

Algorithm 1: HRLearning algorithm

Partition M into {M₀, M₁, …, M_n} such that for each i (0 ≤ i ≤ n), M_i = <T_i, A_i, R_i>

Randomly initialize π = {π₀, π₁, …, π_n}

For each M_i in M do

repeat

Extract features and get T_i and N_i ;

Randomly initialize W_i ;

Let s ← s₀ ;

repeat

Select an action a for s using the strategy derived from Q ;

Execute action a; compute r, s', and R' = r + F' ;

Let Q(i, s, a) ← Q(i, s, a) + α(R' + γ max_{a' ∈ A(s')} Q(i, s', a') − Q(i, s, a)) ;

Let s ← s' ;

Until s ∈ T_i ;

Until M_i is finished;
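
The inner loop of Algorithm 1 can be illustrated with a minimal tabular sketch for a single sub-task M_i. It makes several assumptions that are not stated in the listing: the "strategy of Q" is taken to be epsilon-greedy, the outer repeat is read as a fixed number of episodes, a lookup table stands in for the features and weights W_i, and the shaping term F' is passed in as an optional function. The names `q_learning_subtask`, `chain_step`, and `shaping` are hypothetical, and the five-state chain is a toy environment, not the two-wheeled robot model.

```python
import random
from collections import defaultdict


def q_learning_subtask(step, actions, start, terminals, shaping=None,
                       alpha=0.1, gamma=0.9, epsilon=0.1,
                       episodes=200, max_steps=1000, seed=0):
    """Tabular Q-learning for one sub-task M_i (illustrative sketch only)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q(s, a); unseen pairs default to 0.0

    for _ in range(episodes):           # assumed reading of the outer repeat
        s = start                       # s <- s0
        for _ in range(max_steps):      # inner repeat, until s is in T_i
            if s in terminals:
                break
            # Assumed strategy: epsilon-greedy with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                best = max(q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if q[(s, b)] == best])

            r, s_next = step(s, a)
            # R' = r + F', where F' is the optional shaping term.
            bonus = shaping(s, s_next) if shaping is not None else 0.0
            shaped = r + bonus
            # A terminal successor contributes no future value.
            if s_next in terminals:
                future = 0.0
            else:
                future = max(q[(s_next, b)] for b in actions)
            # Temporal-difference update of Q(s, a).
            q[(s, a)] += alpha * (shaped + gamma * future - q[(s, a)])
            s = s_next                  # s <- s'
    return q


def chain_step(s, a):
    """Hypothetical toy environment: states 0..4 on a line, goal at state 4."""
    s_next = min(4, max(0, s + a))
    reward = 1.0 if s_next == 4 else 0.0
    return reward, s_next


if __name__ == "__main__":
    q = q_learning_subtask(chain_step, actions=[-1, 1], start=0, terminals={4})
    # Greedy action per non-terminal state under the learned table.
    policy = {s: max([-1, 1], key=lambda a: q[(s, a)]) for s in range(4)}
    print(policy)
```

In a hierarchical setting one such table would be learned per sub-task M_i, which is what the index i in Q(i, s, a) denotes. Random tie-breaking is used here so that the initial all-zero table does not bias exploration toward the first listed action.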