X-Morph retargets human motions to kinematically plausible references for multiple legged morphologies, trains privileged RL trackers, and distills them into deployable policies that generalize and enable teleoperation and text-conditioned generation.
APEX: Action Priors Enable Efficient Exploration for Robust Motion Tracking on Legged Robots
1 Pith paper cite this work. Polarity classification is still indexing.
abstract
Learning natural, animal-like locomotion from demonstrations has become a core paradigm in legged robotics. While motion tracking can reproduce reference gaits, many approaches still require substantial tuning and depend on reference motion inputs at deployment, which can limit responsiveness to task objectives and reduce adaptability. We present APEX (Action Priors enable Efficient eXploration), a motion-tracking reinforcement learning (RL) framework that removes deployment-time dependence on reference motion inputs, improves sample efficiency, and reduces tuning effort. APEX integrates demonstrations into RL via decaying action priors, which guide early exploration toward demonstration-consistent actions and then fade to zero, yielding a pure RL policy at deployment. This is combined with a multi-critic framework that separates style and task + regularization learning signals. Moreover, APEX enables a single policy to learn diverse motions and transfer reference-like styles across different terrains and velocities, while remaining robust to variations in training parameters. We validate our method in simulation on both humanoid and quadruped robots, and with zero-shot deployment on a Unitree Go2 robot. Website and code: https://marmotlab.github.io/APEX/.
fields
cs.RO 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
X-Morph: Human Motion Priors for Scalable Robot Learning Across Morphologies
X-Morph retargets human motions to kinematically plausible references for multiple legged morphologies, trains privileged RL trackers, and distills them into deployable policies that generalize and enable teleoperation and text-conditioned generation.