MuGen: Multi-Skill Generative Locomotion Controller for Humanoid Robots

Baoquan Chen; Boyang Yu; Heyuan Yao; Libin Liu; Pengyun Qiu; Ruijie Zhao; Xiang Wang; Xinyu Huo; Yusen Feng; Zixi Kang

arxiv: 2605.24592 · v1 · pith:6RPVGBDPnew · submitted 2026-05-23 · 💻 cs.RO


Pith reviewed 2026-06-30 13:10 UTC · model grok-4.3

classification 💻 cs.RO

keywords humanoid locomotion · motion imitation · vector-quantized autoencoder · reinforcement learning · policy distillation · generative control · multi-skill controller

The pith

A humanoid robot learns to track and mimic unseen human motions by compressing motion data into a reusable latent space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MuGen, a framework that trains vector-quantized autoencoders on heterogeneous human motion data using model-based reinforcement learning to build a generative locomotion representation. A teacher-student distillation produces a deployable policy that follows new motion sequences and reuses the learned latent space for additional tasks. If the approach holds, robots could handle expressive multi-skill behaviors from example data without separate retraining for each skill.

Core claim

MuGen trains vector-quantized autoencoders with model-based reinforcement learning on hours of heterogeneous human performance data to obtain a generative representation of locomotion. A student policy distilled from a teacher then tracks and mimics unseen human motions and allows the latent space to be reused for other tasks. The paper demonstrates this on a diverse set of motions with accurate execution.

What carries the argument

Vector-quantized autoencoders (VQ-VAEs) trained with model-based reinforcement learning that compress human motion patterns into a discrete generative latent space.
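As a rough illustration of what a vector-quantized bottleneck does, the sketch below snaps a continuous embedding to its nearest codebook entry. It is a minimal sketch in plain Python, not the paper's implementation: the function name `quantize`, the toy two-dimensional codebook, and the squared-Euclidean distance are assumptions made for this example, and codebook training is not shown.

```python
def quantize(embedding, codebook):
    """Return (index, code) of the codebook entry nearest to `embedding`.

    Hypothetical helper for illustration; distance is squared Euclidean.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda k: sq_dist(embedding, codebook[k]))
    return index, codebook[index]


# Toy codebook with three 2-D codes (assumed values, not from the paper).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
index, code = quantize([0.9, 0.2], codebook)
print(index, code)
```

Because every embedding is replaced by one of a finite set of codes, downstream components only ever see discrete choices, which is what makes such a latent space easy to reuse across tasks.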

If this is right

The robot executes a diverse set of motions accurately when guided by example sequences.
The latent space supports direct reuse for other locomotion tasks without full policy retraining.
The distilled student policy transfers to physical humanoid hardware.
Training on heterogeneous data yields generalization to novel motion inputs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same latent space might simplify high-level task planning by providing reusable motion primitives.
Similar compression could apply to non-humanoid robots if the representation separates motion style from platform dynamics.
Physical results would need to quantify how sim-to-real gaps affect tracking of unseen motions.
The approach might reduce the data required for new skills if the latent codes already encode transferable patterns.

Load-bearing premise

The vector-quantized autoencoders trained with model-based reinforcement learning produce a generative representation that captures key patterns of human motion from heterogeneous data in a way that supports generalization to unseen motions on a physical humanoid.

What would settle it

A physical robot test in which the policy receives motion sequences drawn from outside the training distribution and shows large tracking errors compared with in-distribution motions would falsify the generalization claim.
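One way to make this test concrete is to compute a tracking error per motion and compare the averages for in-distribution and out-of-distribution sequences. The sketch below assumes a mean per-body position error as the metric and a simple ratio as the comparison. Neither choice comes from the paper, and the names `mean_tracking_error` and `generalization_gap` are hypothetical.

```python
import math


def mean_tracking_error(reference, executed):
    """Mean Euclidean distance between matching 3-D body positions.

    Both arguments are lists of frames; each frame is a list of (x, y, z) tuples.
    """
    total, count = 0.0, 0
    for ref_frame, exe_frame in zip(reference, executed):
        for ref_pos, exe_pos in zip(ref_frame, exe_frame):
            total += math.dist(ref_pos, exe_pos)
            count += 1
    return total / count


def generalization_gap(in_dist_errors, out_dist_errors):
    """Ratio of mean out-of-distribution error to mean in-distribution error."""
    mean_in = sum(in_dist_errors) / len(in_dist_errors)
    mean_out = sum(out_dist_errors) / len(out_dist_errors)
    return mean_out / mean_in


# One frame, two bodies; the first body is off by 0.1 along z (toy data).
reference = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
executed = [[(0.0, 0.0, 0.1), (1.0, 0.0, 0.0)]]
error = mean_tracking_error(reference, executed)

# Per-motion errors for each split (toy numbers, not results from the paper).
gap = generalization_gap([0.05, 0.05], [0.10, 0.20])
print(error, gap)
```

A ratio close to 1 would be consistent with the generalization claim; how large a ratio counts as a failure is a threshold the experimenter would have to fix in advance.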

Figures

Figures reproduced from arXiv: 2605.24592 by Baoquan Chen, Boyang Yu, Heyuan Yao, Libin Liu, Pengyun Qiu, Ruijie Zhao, Xiang Wang, Xinyu Huo, Yusen Feng, Zixi Kang.

**Figure 1.** MuGen enables multi-skill humanoid locomotion by learning a generative controller. (a–d): A simulated humanoid tracks (a) a long walking motion manually crafted from motion data and (b–d) dance, run, and crouching walk motions selected from the motion dataset. (e–f): A real Unitree G1 robot tracks short segments of (e) a straight walk and (f) a crouching walk from the motion dataset. Details are provided i…

**Figure 2.** System overview. 1) Motion Skill Embedding: states and reference motions are encoded into continuous representations, then a VQ bottleneck maps embeddings to a trainable codebook. 2) Student policy: using the shared codebook, the student decodes actions from partial observations and aligns them with teacher outputs through behavior cloning. All training is performed in a simulation environment.

**Figure 3.** The trajectory generated by the student policy using the learned prior encoder.

**Figure 4.** Training curves of different model designs. We cal…

**Figure 5.** Data distribution. Rotations are represented using the 6D continuous format from [84]. After this transformation, at each timestep, the state $(\bar{x}_i, \bar{q}_i, \bar{v}_i, \bar{\omega}_i)$ has shape $(N_{\text{body}}, 3 + 6 + 3 + 3)$, corresponding to body position, body orientation (6D), linear velocity, and angular velocity. States and reference motions in global space encompass privileged information that is typically inaccessible to onbo…
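The per-body state layout described in the Figure 5 text can be sketched as follows. This assumes that the 6D format cited as [84] is the common continuous representation that keeps the first two columns of the rotation matrix and recovers the third by Gram-Schmidt orthogonalization; that reference is not visible here, so treat this as an assumption. The function names are hypothetical.

```python
import math


def rotmat_to_6d(R):
    """Flatten the first two columns of a 3x3 rotation matrix (nested lists)."""
    return [R[0][0], R[1][0], R[2][0], R[0][1], R[1][1], R[2][1]]


def sixd_to_rotmat(d6):
    """Rebuild a rotation matrix from a 6D vector via Gram-Schmidt."""
    a1, a2 = d6[:3], d6[3:]
    n1 = math.sqrt(sum(x * x for x in a1))
    b1 = [x / n1 for x in a1]
    dot = sum(x * y for x, y in zip(b1, a2))
    u2 = [y - dot * x for x, y in zip(b1, a2)]
    n2 = math.sqrt(sum(x * x for x in u2))
    b2 = [x / n2 for x in u2]
    b3 = [b1[1] * b2[2] - b1[2] * b2[1],
          b1[2] * b2[0] - b1[0] * b2[2],
          b1[0] * b2[1] - b1[1] * b2[0]]
    # b1, b2, b3 are the columns of the rotation matrix.
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


def body_features(position, R, linear_velocity, angular_velocity):
    """Concatenate one body's state into 3 + 6 + 3 + 3 = 15 numbers."""
    return (list(position) + rotmat_to_6d(R)
            + list(linear_velocity) + list(angular_velocity))


# 90-degree rotation about the z axis (toy input).
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
features = body_features((0.0, 0.0, 1.0), R, (0.0, 0.0, 0.0), (0.0, 0.0, 0.0))
R_back = sixd_to_rotmat(rotmat_to_6d(R))
print(len(features))
```

Stacking `body_features` over all bodies would give an array of shape (Nbody, 15), matching the 3 + 6 + 3 + 3 layout stated above.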


This paper presents MuGen, a data-driven framework for learning and deploying multi-skill locomotion on humanoid robots. MuGen enables a robot to perform expressive motions like humans under the guidance of example motion sequences. To achieve this, we employ vector-quantized autoencoders (VQ-VAEs) trained with model-based reinforcement learning, resulting in a generative representation of locomotion that captures key patterns of human motion from hours of heterogeneous human performance data. We employ a teacher-student learning framework and develop a new policy distillation strategy to enable a deployable student policy learning this efficient latent representation. This policy allows the robot to track and mimic unseen human motions and further enables the robot to reuse the learned latent space for other tasks. We demonstrate the effectiveness of our framework through a diverse set of motions and accurate execution.
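The teacher-student step can be pictured with the simplest form of behavior cloning, in which the student is penalized for deviating from the teacher's actions on the same states. The abstract does not specify the paper's new distillation strategy, so the sketch below is only a generic mean-squared-error objective; the name `behavior_cloning_loss` and the toy action vectors are assumptions.

```python
def behavior_cloning_loss(student_actions, teacher_actions):
    """Mean squared error between student and teacher action vectors.

    Both arguments are lists of equal-length action vectors, one per sample.
    """
    total, count = 0.0, 0
    for student, teacher in zip(student_actions, teacher_actions):
        for s_i, t_i in zip(student, teacher):
            total += (s_i - t_i) ** 2
            count += 1
    return total / count


# One sample with a 2-D action; the student is off by 0.5 in the second entry.
loss = behavior_cloning_loss([[0.0, 1.0]], [[0.0, 0.5]])
print(loss)
```

In practice the student would see only partial observations while the teacher uses privileged state, and the loss would be minimized by gradient descent over many sampled states.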

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

MuGen combines VQ-VAE with model-based RL and distillation for multi-skill humanoid locomotion, but the abstract supplies no numbers or validation to support the unseen-motion generalization claim.

The paper's main move is to train vector-quantized autoencoders with model-based reinforcement learning on heterogeneous human motion data, then distill the resulting latent representation into a deployable student policy via a new teacher-student strategy. The stated payoff is that the robot can track motions outside the training set and reuse the same latent space for additional tasks.

The concrete contribution is the specific pipeline that turns hours of varied human performance data into a generative locomotion controller. Framing the problem around a reusable discrete latent space rather than direct imitation or single-skill tracking is a reasonable direction for multi-skill humanoid work, and the distillation step addresses the practical need to get something that runs on hardware.

The soft spot is the complete absence of quantitative results, ablations, error bars, or hardware metrics in the abstract. The central claim that the latent codes support tracking of truly unseen motions on a physical robot rests on an untested assumption that the VQ-VAE has extracted transferable patterns rather than dataset artifacts. No architecture details, loss terms, retargeting procedure, or definition of "unseen" appear, so the soundness cannot be judged from the given text. The stress-test concern holds up on the material provided.

This is aimed at researchers working on data-driven locomotion and generative control for humanoids. Readers already exploring VQ-VAE or model-based RL for motion might pick up the distillation tactic. It deserves a serious referee because the problem is relevant and the method is specific enough for reviewers to evaluate the experiments once they are in the manuscript.

Recommendation: send it to peer review rather than desk reject, on the condition that the full paper contains the missing validation and comparisons.

Referee Report

2 major / 0 minor

Summary. The paper presents MuGen, a data-driven framework for multi-skill locomotion control on humanoid robots. It trains vector-quantized autoencoders (VQ-VAEs) via model-based reinforcement learning on hours of heterogeneous human motion data to obtain a generative latent representation, then applies a teacher-student distillation procedure to produce a deployable student policy. The resulting policy is claimed to track and mimic unseen human motions while also permitting reuse of the learned latent space for additional tasks, with effectiveness shown through a diverse set of motions and accurate execution on the robot.

Significance. If the central generalization claims hold with supporting quantitative evidence, the work would offer a practical route to expressive, multi-skill humanoid controllers learned from real human data that transfer to physical hardware and support downstream task reuse. The integration of VQ-VAE discretization with model-based RL and policy distillation addresses a relevant gap between motion capture data and deployable controllers.

major comments (2)

[Abstract] The central claim that the policy 'allows the robot to track and mimic unseen human motions' and 'enables the robot to reuse the learned latent space for other tasks' is presented without any quantitative tracking metrics, success rates, error statistics, ablation studies, or hardware validation details, rendering the generalization performance impossible to assess.
[Abstract] No definition is supplied for what constitutes an 'unseen' motion, no description of the data retargeting procedure, and no architecture or loss terms for the VQ-VAE or model-based RL training are given, all of which are load-bearing for verifying that the discrete codes capture transferable patterns rather than dataset-specific or simulation artifacts.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We address the points below and will revise the abstract to better support the claims with key details while respecting length constraints.


Referee: [Abstract] The central claim that the policy 'allows the robot to track and mimic unseen human motions' and 'enables the robot to reuse the learned latent space for other tasks' is presented without any quantitative tracking metrics, success rates, error statistics, ablation studies, or hardware validation details, rendering the generalization performance impossible to assess.

Authors: The abstract is a high-level summary; quantitative results (tracking errors, success rates on unseen motions, hardware execution accuracy) and ablations appear in Sections 5 and 6. We will revise the abstract to include concise quantitative highlights (e.g., average tracking error and success rate on held-out motions) to make the generalization claims more immediately assessable.

revision: yes

Referee: [Abstract] No definition is supplied for what constitutes an 'unseen' motion, no description of the data retargeting procedure, and no architecture or loss terms for the VQ-VAE or model-based RL training are given, all of which are load-bearing for verifying that the discrete codes capture transferable patterns rather than dataset-specific or simulation artifacts.

Authors: Space limits prevent full details in the abstract, but 'unseen' motions are defined as sequences from held-out subjects/styles (Section 4.1), retargeting uses standard SMPL-to-robot mapping (Section 3.2), and VQ-VAE architecture/losses plus model-based RL objective are specified in Section 3.3. We will add brief clarifications to the abstract (e.g., 'unseen motions from different performers') to address verifiability.

revision: yes

Circularity Check

0 steps flagged

No circularity detected; claims rest on empirical training outcomes

The provided abstract and description contain no equations, loss functions, or derivation steps. The central claim, that VQ-VAE latent codes trained via model-based RL on heterogeneous motion data enable tracking of unseen motions and latent-space reuse, is presented as an empirical result of the training and distillation process rather than a quantity derived from or equivalent to its own inputs by construction. No self-citations, ansatzes, or fitted-input-as-prediction patterns appear in the given text. With no derivation chain in the provided text to audit, the check returns the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; free parameters, axioms, and invented entities cannot be enumerated without the methods and results sections.


Reference graph

Works this paper leans on

84 extracted references · 33 canonical work pages · 3 internal anchors

[1] A. Escontrela, X. B. Peng, W. Yu, T. Zhang, A. Iscen, K. Goldberg, and P. Abbeel, “Adversarial motion priors make good substitutes for complex reward functions,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), vol. 2, 2022.
[2] Y. Wang, Z. Jiang, and J. Chen, “Amp in the wild: Learning robust, agile, natural legged locomotion skills,” arXiv preprint arXiv:2304.10888, 2023.
[3] M. Ji, X. Peng, F. Liu, J. Li, G. Yang, X. Cheng, and X. Wang, “Exbody2: Advanced expressive humanoid whole-body control,” arXiv preprint arXiv:2412.13196, 2024.
[4] L. Ma, Z. Meng, T. Liu, Y. Li, R. Song, W. Zhang, and S. Huang, “Styleloco: Generative adversarial distillation for natural humanoid robot locomotion,” arXiv preprint arXiv:2503.15082, 2025.
[5] H. Yao, Z. Song, B. Chen, and L. Liu, “Controlvae: Model-based learning of generative controllers for physics-based characters,” ACM Transactions on Graphics (TOG), vol. 41, no. 6, pp. 1–16, 2022.
[6] H. Yao, Z. Song, Y. Zhou, T. Ao, B. Chen, and L. Liu, “Moconvq: Unified physics-based motion control via scalable discrete representations,” ACM Transactions on Graphics (TOG), vol. 43, no. 4, pp. 1–21, 2024.
[7] S. Ross, G. Gordon, and D. Bagnell, “A reduction of imitation learning and structured prediction to no-regret online learning,” in Proceedings of the fourteenth international conference on artificial intelligence and statistics, 2011.
[8] I. Kato, “Development of wabot 1,” Biomechanism, vol. 2, pp. 173–214, 1973.
[9] H. Miura and I. Shimoyama, “Dynamic walk of a biped,” IJRR, 1984.
[10] K. Hirai, M. Hirose, Y. Haikawa, and T. Takenaka, “The development of honda humanoid robot,” in Proceedings. 1998 IEEE international conference on robotics and automation (Cat. No. 98CH36146), vol. 2. IEEE, 1998, pp. 1321–1326.
[11] Y. Ishiguro, K. Kojima, F. Sugai, S. Nozawa, Y. Kakiuchi, K. Okada, and M. Inaba, “High speed whole body dynamic motion experiment with real time master-slave humanoid robot system,” in 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 5835–5841.
[12] J. Ramos and S. Kim, “Dynamic locomotion synchronization of bipedal robot and human operator via bilateral feedback teleoperation,” Science Robotics, vol. 4, no. 35, p. eaav4282, 2019.
[13] F.-J. Montecillo-Puente, M. N. Sreenivasa, and J.-P. Laumond, “On real-time whole-body human to humanoid motion transfer,” in International Conference on Informatics in Control, Automation and Robotics, 2010. [Online]. Available: https://api.semanticscholar.org/CorpusID:20676844
[14] J. Koenemann, F. Burget, and M. Bennewitz, “Real-time imitation of human whole-body motions by humanoids,” in 2014 IEEE International Conference on Robotics and Automation (ICRA), 2014, pp. 2806–2812.
[15] O. E. Ramos, N. Mansard, O. Stasse, C. Benazeth, S. Hak, and L. Saab, “Dancing humanoid robots: Systematic use of osid to compute dynamically consistent movements following a motion capture pattern,” IEEE Robotics and Automation Magazine, vol. 22, no. 4, pp. 16–26, 2015.
[16] E. R. Westervelt, J. W. Grizzle, and D. E. Koditschek, “Hybrid zero dynamics of planar biped walkers,” IEEE transactions on automatic control, vol. 48, no. 1, pp. 42–56, 2003.
[17] B. Dariush, M. Gienger, B. Jian, C. Goerick, and K. Fujimura, “Whole body humanoid control from human motion descriptors,” in 2008 IEEE International Conference on Robotics and Automation. IEEE, 2008, pp. 2677–2684.
[18] S. Kajita, F. Kanehiro, K. Kaneko, K. Yokoi, and H. Hirukawa, “The 3d linear inverted pendulum mode: A simple modeling for a biped walking pattern generation,” in Proceedings 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems. Expanding the Societal Role of Robotics in the the Next Millennium (Cat. No. 01CH37180), vol. 1. IEEE, 2001...
[19] M. Hutter, C. Gehring, D. Jud, A. Lauber, C. D. Bellicoso, V. Tsounis, J. Hwangbo, K. Bodie, P. Fankhauser, M. Bloesch, et al., “Anymal-a highly mobile and dynamic quadrupedal robot,” in IROS, 2016.
[20] K. Darvish, Y. Tirupachuri, G. Romualdi, L. Rapetti, D. Ferigo, F. J. A. Chavez, and D. Pucci, “Whole-body geometric retargeting for humanoid robots,” in 2019 IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids), 2019, pp. 679–686.
[21] L. Penco, N. Scianca, V. Modugno, L. Lanari, G. Oriolo, and S. Ivaldi, “A multimode teleoperation framework for humanoid loco-manipulation: An application for the icub robot,” IEEE Robotics and Automation Magazine, vol. 26, no. 4, pp. 73–82, 2019.
[22] J. Ramos and S. Kim, “Dynamic locomotion synchronization of bipedal robot and human operator via bilateral feedback teleoperation,” Science Robotics, vol. 4, no. 35, p. eaav4282, 2019. [Online]. Available: https://www.science.org/doi/abs/10.1126/scirobotics.aav4282
[23] Y. Ishiguro, T. Makabe, Y. Nagamatsu, Y. Kojio, K. Kojima, F. Sugai, Y. Kakiuchi, K. Okada, and M. Inaba, “Bilateral humanoid teleoperation system using whole-body exoskeleton cockpit tablis,” IEEE Robotics and Automation Letters, vol. 5, no. 4, pp. 6419–6426, 2020.
[24] L. Penco, K. Momose, S. McCrory, D. Anderson, N. Kitchel, D. Calvert, and R. J. Griffin, “Mixed reality teleoperation assistance for direct control of humanoids,” IEEE Robotics and Automation Letters, vol. 9, no. 2, pp. 1937–1944, 2024.
[25] F. L. Moro and L. Sentis, “Whole-body control of humanoid robots,” Humanoid Robotics: A reference, Springer, Dordrecht, 2019.
[26] G. Romualdi, S. Dafarra, G. L’Erario, I. Sorrentino, S. Traversaro, and D. Pucci, “Online non-linear centroidal mpc for humanoid robot locomotion with step adjustment,” in 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 10412–10419.
[27] J. Englsberger, A. Dietrich, G.-A. Mesesan, G. Garofalo, C. Ott, and A. O. Albu-Schäffer, “Mptc-modular passive tracking controller for stack of tasks based control frameworks,” 16th Robotics: Science and Systems, RSS 2020, 2020.
[28] M. Elobaid, G. Romualdi, G. Nava, L. Rapetti, H. A. O. Mohamed, and D. Pucci, “Online non-linear centroidal mpc for humanoid robots payload carrying with contact-stable force parametrization,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 12233–12239.
[29] M. Chignoli, D. Kim, E. Stanger-Jones, and S. Kim, “The mit humanoid robot: Design, motion planning, and control for acrobatic behaviors,” in 2020 IEEE-RAS 20th International Conference on Humanoid Robots (Humanoids). IEEE, 2021, pp. 1–8.
[30] A. Dallard, M. Benallegue, F. Kanehiro, and A. Kheddar, “Synchronized human-humanoid motion imitation,” IEEE Robotics and Automation Letters, vol. 8, no. 7, pp. 4155–4162, 2023.
[31] X. B. Peng, E. Coumans, T. Zhang, T.-W. Lee, J. Tan, and S. Levine, “Learning agile robotic locomotion skills by imitating animals,” Apr. 2020.
[32] A. Kumar, Z. Fu, D. Pathak, and J. Malik, “Rma: Rapid motor adaptation for legged robots,” arXiv preprint arXiv:2107.04034, 2021.
[33] Z. Fu, A. Kumar, J. Malik, and D. Pathak, “Minimizing energy consumption leads to the emergence of gaits in legged robots,” Conference on Robot Learning (CoRL), 2021.
[34] J. Siekmann, K. Green, J. Warila, A. Fern, and J. Hurst, “Blind bipedal stair traversal via sim-to-real reinforcement learning,” arXiv preprint arXiv:2105.08328, 2021.
[35] Z. Zhuang, Z. Fu, J. Wang, C. Atkeson, S. Schwertfeger, C. Finn, and H. Zhao, “Robot parkour learning,” in Conference on Robot Learning (CoRL), 2023.
[36] X. Cheng, K. Shi, A. Agarwal, and D. Pathak, “Extreme parkour with legged robots,” arXiv preprint arXiv:2309.14341, 2023.
[37] H. Wang, Z. Wang, J. Ren, Q. Ben, T. Huang, W. Zhang, and J. Pang, “Beamdojo: Learning agile humanoid locomotion on sparse footholds,” arXiv preprint arXiv:2502.10363, 2025.
[38] X. Cheng, A. Kumar, and D. Pathak, “Legs as manipulator: Pushing quadrupedal agility beyond locomotion,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023.
[39] S. Jeon, M. Jung, S. Choi, B. Kim, and J. Hwangbo, “Learning whole-body manipulation for quadrupedal robot,” arXiv preprint arXiv:2308.16820, 2023.
[40] C. Schwarke, V. Klemm, M. Van der Boon, M. Bjelonic, and M. Hutter, “Curiosity-driven learning of joint locomotion and manipulation tasks,” in Proceedings of The 7th Conference on Robot Learning, vol. 229. PMLR, 2023, pp. 2594–2610.
[41] Y. Ji, G. B. Margolis, and P. Agrawal, “Dribblebot: Dynamic legged manipulation in the wild,” arXiv preprint arXiv:2304.01159, 2023.
[42] X. B. Peng, P. Abbeel, S. Levine, and M. Van de Panne, “Deepmimic: Example-guided deep reinforcement learning of physics-based character skills,” ACM Transactions On Graphics (TOG), pp. 1–14, 2018.
[43] X. B. Peng, Z. Ma, P. Abbeel, S. Levine, and A. Kanazawa, “Amp: Adversarial motion priors for stylized physics-based character control,” ACM Trans. Graph., vol. 40, no. 4, July 2021. [Online]. Available: http://doi.acm.org/10.1145/3450626.3459670
[44] X. B. Peng, Y. Guo, L. Halper, S. Levine, and S. Fidler, “Ase: Large-scale reusable adversarial skill embeddings for physically simulated characters,” ACM Trans. Graph., vol. 41, no. 4, July 2022.
[45] C. Tessler, Y. Kasten, Y. Guo, S. Mannor, G. Chechik, and X. B. Peng, “Calm: Conditional adversarial latent models for directable virtual characters,” in ACM SIGGRAPH 2023 Conference Proceedings, ser. SIGGRAPH ’23. New York, NY, USA: Association for Computing Machinery, 2023. [Online]. Available: https://doi.org/10.1145/3588432.3591541
[46] Z. Li, X. Cheng, X. B. Peng, P. Abbeel, S. Levine, G. Berseth, and K. Sreenath, “Reinforcement learning for robust parameterized locomotion control of bipedal robots,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 2811–2817.
[47] Z. Li, X. B. Peng, P. Abbeel, S. Levine, G. Berseth, and K. Sreenath, “Reinforcement learning for versatile, dynamic, and robust bipedal locomotion control,” arXiv preprint arXiv:2401.16889, 2024.
[48] I. Radosavovic, T. Xiao, B. Zhang, T. Darrell, J. Malik, and K. Sreenath, “Real-world humanoid locomotion with reinforcement learning,” arXiv:2303.03381, 2023.
[49] Q. Ben, F. Jia, J. Zeng, J. Dong, D. Lin, and J. Pang, “Homie: Humanoid loco-manipulation with isomorphic exoskeleton cockpit,” arXiv preprint arXiv:2502.13013, 2025.
[50] Z. Li, X. B. Peng, P. Abbeel, S. Levine, G. Berseth, and K. Sreenath, “Robust and versatile bipedal jumping control through multi-task reinforcement learning,” arXiv preprint arXiv:2302.09450, 2023.
[51] T. Huang, J. Ren, H. Wang, Z. Wang, Q. Ben, M. Wen, X. Chen, J. Li, and J. Pang, “Learning humanoid standing-up control across diverse postures,” arXiv preprint arXiv:2502.08378, 2025.
[52] Y. Jiang, T. Zhang, D. Ho, Y. Bai, C. K. Liu, S. Levine, and J. Tan, “Simgan: Hybrid simulator identification for domain adaptation via adversarial reinforcement learning,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 2884–2890.
[53] T. He, J. Gao, W. Xiao, Y. Zhang, Z. Wang, J. Wang, Z. Luo, G. He, N. Sobanbab, C. Pan, et al., “Asap: Aligning simulation and real-world physics for learning agile humanoid whole-body skills,” arXiv preprint arXiv:2502.01143, 2025.
[54] A. Agarwal, A. Kumar, J. Malik, and D. Pathak, “Legged locomotion in challenging terrains using egocentric vision,” in Conference on Robot Learning. PMLR, 2023, pp. 403–415.
[55] R. Yang, G. Yang, and X. Wang, “Neural volumetric memory for visual locomotion control,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1430–1440.
[56] H. Duan, B. Pandit, M. S. Gadde, B. J. van Marum, J. Dao, C. Kim, and A. Fern, “Learning vision-based bipedal locomotion for challenging terrain,” arXiv preprint arXiv:2309.14594, 2023.
[57] H. Y. Ling, F. Zinno, G. Cheng, and M. Van De Panne, “Character controllers using motion vaes,” ACM Transactions on Graphics (TOG), vol. 39, no. 4, pp. 40–1, 2020.
[58] J. Won, D. Gopinath, and J. Hodgins, “Physics-based character controllers using conditional vaes,” ACM Trans. Graph., vol. 41, no. 4, July 2022. [Online]. Available: https://doi.org/10.1145/3528223.3530067
[59] J. Ho and S. Ermon, “Generative adversarial imitation learning,” Advances in neural information processing systems, vol. 29, 2016.
[60] X. B. Peng, Z. Ma, P. Abbeel, S. Levine, and A. Kanazawa, “Amp: Adversarial motion priors for stylized physics-based character control,” ACM Transactions on Graphics (ToG), vol. 40, no. 4, pp. 1–20, 2021.
[61]

Neural categorical priors for physics-based character control,

Q. Zhu, H. Zhang, M. Lan, and L. Han, “Neural categorical priors for physics-based character control,” 2023. [Online]. Available: https://arxiv.org/abs/2308.07200

work page arXiv 2023
[62] C. Tessler, Y. Guo, O. Nabati, G. Chechik, and X. B. Peng, “Maskedmimic: Unified physics-based character control through masked motion inpainting,” ACM Transactions on Graphics (TOG), 2024.
[63] Y. Fuchioka, Z. Xie, and M. Van de Panne, “Opt-mimic: Imitation of optimized trajectories for dynamic quadruped behaviors,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 5092–5098.
[64] X. Huang, Y. Chi, R. Wang, Z. Li, X. B. Peng, S. Shao, B. Nikolic, and K. Sreenath, “Diffuseloco: Real-time legged locomotion control with diffusion from offline datasets,” 2024. [Online]. Available: https://arxiv.org/abs/2404.19264
[65] H. Shi, T. Li, Q. Zhu, J. Sheng, L. Han, and M. Q. H. Meng, “An efficient model-based approach on learning agile motor skills without reinforcement,” 2024. [Online]. Available: https://arxiv.org/abs/2403.01962
[66] P. Dugar, A. Shrestha, F. Yu, B. van Marum, and A. Fern, “Learning multi-modal whole-body control for real-world humanoid robots,” 2024. [Online]. Available: https://arxiv.org/abs/2408.07295
[67] X. Cheng, Y. Ji, J. Chen, R. Yang, G. Yang, and X. Wang, “Expressive whole-body control for humanoid robots,” 2024. [Online]. Available: https://arxiv.org/abs/2402.16796
[68] T. He, Z. Luo, X. He, W. Xiao, C. Zhang, W. Zhang, K. Kitani, C. Liu, and G. Shi, “Omnih2o: Universal and dexterous human-to-humanoid whole-body teleoperation and learning,” 2024. [Online]. Available: https://arxiv.org/abs/2406.08858
[69] T. He, Z. Luo, W. Xiao, C. Zhang, K. Kitani, C. Liu, and G. Shi, “Learning human-to-humanoid real-time whole-body teleoperation,” in arXiv, 2024.
[70] T. He, W. Xiao, T. Lin, Z. Luo, Z. Xu, Z. Jiang, C. Liu, G. Shi, X. Wang, L. Fan, and Y. Zhu, “Hover: Versatile neural whole-body controller for humanoid robots,” arXiv preprint arXiv:2410.21229, 2024.
[71] Z. Fu, Q. Zhao, Q. Wu, G. Wetzstein, and C. Finn, “Humanplus: Humanoid shadowing and imitation from humans,” 2024. [Online]. Available: https://arxiv.org/abs/2406.10454
[72] I. Radosavovic, B. Zhang, B. Shi, J. Rajasegaran, S. Kamat, T. Darrell, K. Sreenath, and J. Malik, “Humanoid locomotion as next token prediction,” arXiv:2402.19469, 2024.
[73] Z. Luo, J. Cao, J. Merel, A. Winkler, J. Huang, K. M. Kitani, and W. Xu, “Universal humanoid motion representations for physics-based control,” in The Twelfth International Conference on Learning Representations, 2024. [Online]. Available: https://openreview.net/forum?id=OrOd8PxOO2
[74] J. Wang, J. Hodgins, and J. Won, “Strategy and skill learning for physics-based table tennis animation,” in ACM SIGGRAPH 2024 Conference Papers, 2024, pp. 1–11.
[75] A. Serifi, R. Grandia, E. Knoop, M. Gross, and M. Bächer, “Robot motion diffusion model: Motion generation for robotic characters,” in SIGGRAPH Asia 2024 Conference Papers, ser. SA ’24. New York, NY, USA: Association for Computing Machinery, 2024. [Online]. Available: https://doi.org/10.1145/3680528.3687626
[76] X. B. Peng, E. Coumans, T. Zhang, T.-W. E. Lee, J. Tan, and S. Levine, “Learning agile robotic locomotion skills by imitating animals,” in Robotics: Science and Systems, July 2020.
[77] A. van den Oord, O. Vinyals, and K. Kavukcuoglu, “Neural discrete representation learning,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, ser. NIPS’17. Red Hook, NY, USA: Curran Associates Inc., 2017, pp. 6309–6318.
[78] F. Wu, X. Nal, J. Jang, W. Zhu, Z. Gu, A. Wu, and Y. Zhao, “Learn to teach: Sample-efficient privileged learning for humanoid locomotion over diverse terrains,” 2025. [Online]. Available: https://arxiv.org/abs/2402.06783
[79] H. Wang, H. Luo, W. Zhang, and H. Chen, “Cts: Concurrent teacher-student reinforcement learning for legged locomotion,” 2024. [Online]. Available: https://arxiv.org/abs/2405.10830
[80] V. Makoviychuk, L. Wawrzyniak, Y. Guo, M. Lu, K. Storey, M. Macklin, D. Hoeller, N. Rudin, A. Allshire, A. Handa, et al., “Isaac gym: High performance gpu-based physics simulation for robot learning,” arXiv preprint arXiv:2108.10470, 2021.
