SpotMicroAI: Learning-based gait modulation

Abstract

This project builds upon D2-GMBC and investigates how the complexity of the Reinforcement Learning (RL) algorithm affects the sim-to-real gap of gait-modulation methods for locomotion learning. Previous work in learning-based locomotion has shown that more complex RL algorithms can increase robustness. At the same time, other works found that reducing the complexity of the model, for instance by reducing the observation space, also decreases the sim-to-real gap. This project therefore inspects the influence of RL algorithm complexity on the sim-to-real gap. To achieve this, the Soft Actor-Critic (SAC) algorithm from Haarnoja et al. (2018a) is applied within the D2-GMBC framework and compared against Augmented Random Search (ARS). We pretrain both algorithms in simulation and perform two evaluations on the real robot.
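
For context, the sketch below shows the general shape of the gait-modulation loop that both agents plug into, assuming a D2-GMBC-style setup in which the policy observes IMU data and outputs residuals that are added to the foot positions of an open-loop Bezier gait before inverse kinematics. The names (`policy`, `bezier_gait`, `ik`, `imu`) are hypothetical placeholders, not the actual project API.

```python
import numpy as np

def gait_modulation_step(policy, bezier_gait, ik, imu, phase):
    """One control step of learning-based gait modulation (hypothetical interface names).

    The open-loop gait generator proposes nominal foot positions, the RL policy
    observes the IMU state and outputs small corrections, and inverse kinematics
    turns the corrected foot positions into joint commands.
    """
    obs = np.concatenate([imu.roll_pitch(), imu.angular_velocity()])   # IMU-based observation
    residuals = policy(obs)                                            # per-foot corrections from the agent
    nominal_feet = bezier_gait.foot_positions(phase)                   # open-loop Bezier gait targets
    corrected_feet = nominal_feet + residuals.reshape(nominal_feet.shape)
    return ik.joint_angles(corrected_feet)                             # joint commands sent to the servos
```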

For more information, see the report or the GitHub repository:

Experiment 1: simulation training

From the learning curves in Figure 1 we see that, although the starting reward of SAC is much lower, it initially learns much faster than ARS. This is in line with our expectations, since ARS only updates its policy once per training iteration, whereas SAC performs an update after every episode step once the replay buffer is sufficiently filled. However, we also see that ARS achieves a much higher reward than SAC and that the reward of ARS never decreases over time. Both this non-decreasing reward and the much higher reward of ARS are likely a result of the multiple rollouts, since ARS performs several training episodes in parallel in each iteration.
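
The difference in update frequency can be made concrete with a short sketch. This only illustrates the per-iteration (ARS) versus per-step (SAC) update schedules and uses hypothetical helper names (`sample_directions`, `perturb`, `run_episode`, `ars_update`); it is not the project's training code.

```python
import random

def train_ars(env, policy, n_iterations, n_directions):
    """ARS: several parallel rollouts per iteration, but only one policy update."""
    for _ in range(n_iterations):
        directions = sample_directions(n_directions)                   # random perturbation directions
        rollouts = [run_episode(env, perturb(policy, d)) for d in directions]
        policy = ars_update(policy, rollouts)                          # single update per iteration
    return policy

def train_sac(env, agent, n_steps, batch_size, warmup):
    """SAC: one gradient update per environment step once the replay buffer is warm."""
    buffer, obs = [], env.reset()
    for _ in range(n_steps):
        action = agent.act(obs)
        next_obs, reward, done, _ = env.step(action)
        buffer.append((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs
        if len(buffer) >= warmup:                                      # buffer "full enough"
            agent.update(random.sample(buffer, batch_size))            # off-policy update every step
    return agent
```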

(a) SAC
(b) ARS
Figure 1: Training results in simulation.

Before the quality of the RL agents on the real robot can be inspected, we first compare the IMU data between the simulation and the real robot. For this comparison, we use inverse kinematics (IK) to make the robot apply a roll in both directions, followed by a pitch in both directions. Figure 2 displays three of the eight measurements. As seen from the figures, the roll and pitch measurements are similar to those in the simulation. The angular twist also follows the same values as in the simulation, but contains considerably more noise, especially in the z-direction.
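
A minimal sketch of how such a comparison could be scripted is shown below; `set_body_pose` and `read_imu` are hypothetical interface names, and the same command sequence would be replayed in simulation and on the real robot so the logged signals can be overlaid.

```python
import time
import numpy as np

def log_roll_pitch_sweep(set_body_pose, read_imu, amplitude=0.3, steps=50, dt=0.02):
    """Command a roll sweep in both directions followed by a pitch sweep,
    logging the IMU reading at every step (hypothetical interface names)."""
    log = []
    sweep = np.concatenate([np.linspace(0.0, amplitude, steps),
                            np.linspace(amplitude, -amplitude, 2 * steps),
                            np.linspace(-amplitude, 0.0, steps)])
    for roll in sweep:                          # roll in both directions
        set_body_pose(roll=roll, pitch=0.0)
        log.append(read_imu())                  # roll, pitch, angular twist, ...
        time.sleep(dt)
    for pitch in sweep:                         # then pitch in both directions
        set_body_pose(roll=0.0, pitch=pitch)
        log.append(read_imu())
        time.sleep(dt)
    return np.array(log)
```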

(a) Roll
(b) Pitch
(c) Angular twist: x-direction
Figure 2: IMU results of the same movement in simulation and on the real robot. In red are the results of the real robot and in blue the results of the simulation.

Experiment 2: traversal speed

From the table below we notice that neither agent was able to improve the forward walking speed beyond that of the baseline. Looking at the standard deviation, we do note that the ARS agent has a more predictable walking speed, as indicated by its low standard deviation. The SAC agent is significantly slower than the other two, with an average traversal time of 21 seconds per meter compared to roughly 15 seconds per meter for the other two models. However, walking speed was only one part of the reward signal, so to evaluate walking stability we also need to inspect the quality of the gait itself.

            Mean (s/m)   Standard deviation (s/m)   Failed runs
Baseline    15.20        2.56                       0
ARS         15.33        0.94                       1
SAC         21.22        1.47                       1
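
The summary statistics above can be computed from the per-run traversal times with a few lines; the run times in the sketch below are placeholders, not the actual measurements.

```python
import numpy as np

# Placeholder per-run traversal times in seconds per meter (not the real measurements);
# failed runs (falls) are excluded from the mean and standard deviation.
runs = {
    "Baseline": [14.1, 16.8, 13.9, 17.5, 13.7],
    "ARS":      [15.0, 16.2, 14.6, 15.5],        # one failed run excluded
    "SAC":      [20.1, 22.4, 21.9, 20.5],        # one failed run excluded
}

for name, times in runs.items():
    print(f"{name}: mean={np.mean(times):.2f} s/m, std={np.std(times, ddof=1):.2f} s/m")
```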

Experiment 3: gait analysis

The ARS video shows adaptive behavior when the robot threatens to fall backward or forward. However, the video also shows that the agent sometimes overcompensates, causing the body to swing from left to right or from front to back. We therefore hypothesize that the lack of improvement in traversal time likely arises from this falling-prevention behavior.

The SAC agent displays even more complex falling-prevention behavior. In the video, this is visible when the roll or pitch becomes too high, causing the agent to stand still for a short time. The agent thus does not only modify one or two legs based on the body angles: when the IMU values diverge far enough, it even seems to halt all movement. However, the start of the SAC clip shows that this falling prevention can hinder the gait significantly, since the agent has trouble getting the gait started. Yet, once a stable gait has been achieved, the SAC agent demonstrates considerable adaptive behavior.

(a) SAC
(b) ARS
Figure 3: Real-world gait modulation using the two RL agents.

For the full videos, see the link below: