Terrain Adaptation of Hexapod Robot via Central Pattern Generator and Policy Gradient
- Jorge Vasquez
- 7 May 2021
- 2 min read
Abstract
Recently, model-free deep reinforcement learning methods have shown promise in learning complex locomotion skills for legged robots. However, these approaches suffer from low sample efficiency due to a large action space. To address this problem we propose a solution that parameterizes a robot's gait using a Central Pattern Generator (CPG). By performing policy gradient on CPG parameters instead of joint velocities and positions, we reduce the size of our action space to contain strictly periodic behaviors. We demonstrate how our method is able to perform online gait adaptation on a real-world hexapod robot in rugged terrain. Furthermore, we present a novel approach for terrain representation which allows for a smoother sim-to-real transfer. By combining gait parameterization and simplified terrain representation with deep reinforcement learning, our robot learns a sample-efficient way to traverse unstructured terrain.

1 Introduction
Developing a motion controller for a legged robot to traverse unstructured terrain is challenging because of the difficulty of modeling system dynamics and the high-dimensional action space. In recent works, policy gradient methods for Reinforcement Learning have had success in combating these challenges to learn complex control strategies on legged robots [1]. These methods learn an optimal policy by allowing the robot to interact with a simulated environment over many trials. They directly learn a state-to-action policy that minimizes a user-specified cost function without any knowledge of the system dynamics. In our work we frame the problem of legged robot motion planning as a Partially Observable Markov Decision Process (POMDP) that we solve using a model-free RL algorithm called Proximal Policy Optimization (PPO) [2]. We reward our legged robot for making forward progress through an unstructured terrain in simulation.

Some of the major downsides to model-free approaches are their lack of sample efficiency and the difficulty of transferring to the real world. In order to increase the sample efficiency of our RL agent and reduce the size of our action space, we decide to search only in the space of periodic policies. We do this by introducing a Central Pattern Generator as the low-level motion planner for our robot. Central pattern generators (CPGs) are biological neural circuits that produce rhythmic outputs in the absence of rhythmic input [3]. Typically, CPG models are used in robotics to design open-loop gaits for articulated robots, such as crawling, swimming, or legged robots. We look to close the loop on our CPG model by using reinforcement learning to learn the optimal parameters of the robot's gait shape based on sensory feedback.

In addition to using the CPG to reduce the size of the action space, a bulk of our contribution also focuses on reducing the size of our observation space. In our work we explore several methods for lower-dimensional terrain representation, including classical height maps and a trained variational auto-encoder. In the full system, we use this module to simplify the perception task for our agent. All of the work that we present in this paper is applied to a hexapod robot with 18 degrees of freedom. We show that by effectively reducing the size of both our action and observation spaces, we are able to increase sample efficiency. Furthermore, we are able to transition from simulation to the real world much more easily.
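
To make the POMDP framing above concrete, the following is a minimal sketch of a Gym-style environment whose actions are CPG parameters and whose reward is forward progress per step. `HexapodCPGEnv`, the `sim` interface, and all dimensions here are hypothetical illustrations, not the paper's actual implementation.

```python
# A minimal sketch, assuming a Gym-style simulator wrapper. The sim handle and
# its methods (reset, base_position, apply_cpg_parameters, advance, observation)
# are hypothetical; the paper's exact cost function is not reproduced here.
import gymnasium as gym
import numpy as np

class HexapodCPGEnv(gym.Env):
    """Actions are CPG parameters; the reward is forward progress per step."""

    def __init__(self, sim):
        super().__init__()
        self.sim = sim  # hypothetical physics-simulator handle
        # 3 CPG parameters per leg (amplitude, frequency, phase) x 6 legs.
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(18,), dtype=np.float32)
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(64,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.sim.reset()
        return self.sim.observation(), {}

    def step(self, action):
        x_before = self.sim.base_position()[0]
        self.sim.apply_cpg_parameters(action)  # low-level CPG tracks these targets
        self.sim.advance(dt=0.05)
        x_after = self.sim.base_position()[0]
        reward = x_after - x_before            # forward progress along x
        obs = self.sim.observation()           # partial view of state (POMDP)
        return obs, reward, False, False, {}

# An off-the-shelf PPO implementation such as stable-baselines3 could then be
# trained on this environment, e.g.:
#   model = PPO("MlpPolicy", HexapodCPGEnv(sim)); model.learn(total_timesteps=2_000_000)
```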
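As an illustration of how a CPG restricts the policy search to strictly periodic behaviors, here is a minimal sketch of a phase-oscillator model for six legs. The sinusoidal oscillator form, the tripod phase offsets, and the parameter values are assumptions for illustration; the paper's exact CPG equations are not reproduced here.

```python
# A minimal sketch of a phase-oscillator CPG, assuming the policy outputs
# amplitude, frequency, and per-leg phase offsets as its action.
import numpy as np

class LegOscillator:
    def __init__(self, amplitude, frequency_hz, phase_offset):
        self.amplitude = amplitude     # joint swing amplitude (rad)
        self.frequency = frequency_hz  # gait cycle frequency (Hz)
        self.phase = phase_offset      # phase offset relative to leg 0 (rad)

    def step(self, t):
        # Rhythmic output with no rhythmic input: the defining CPG property.
        return self.amplitude * np.sin(2.0 * np.pi * self.frequency * t + self.phase)

# Alternating tripod gait: legs {0, 2, 4} and {1, 3, 5} are half a cycle apart.
legs = [LegOscillator(amplitude=0.4, frequency_hz=1.5,
                      phase_offset=(i % 2) * np.pi) for i in range(6)]
joint_targets = [leg.step(t=0.1) for leg in legs]
```

Because the policy only reshapes these oscillators rather than commanding 18 joint positions directly, every action the agent can take corresponds to a periodic gait, which is what shrinks the search space.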
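The terrain-compression idea can likewise be sketched as a small variational auto-encoder that maps a local height-map patch to a low-dimensional latent code consumed by the policy. The 32x32 patch size, 16-D latent, and fully-connected architecture below are assumptions for illustration, not the paper's reported architecture.

```python
# A minimal sketch of height-map compression with a VAE; sizes are assumed.
import torch
import torch.nn as nn

class TerrainVAE(nn.Module):
    def __init__(self, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(32 * 32, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, 32 * 32))

    def encode(self, heightmap):
        h = self.encoder(heightmap)
        return self.to_mu(h), self.to_logvar(h)

    def forward(self, heightmap):
        mu, logvar = self.encode(heightmap)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # reparameterization trick
        return self.decoder(z), mu, logvar

# At policy run time only the encoder mean would serve as the terrain observation.
vae = TerrainVAE()
patch = torch.rand(1, 32 * 32)   # stand-in for a flattened local height map
mu, _ = vae.encode(patch)        # 16-D terrain code fed to the policy
```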


