
arxiv: 2605.24592 · v1 · pith:6RPVGBDPnew · submitted 2026-05-23 · 💻 cs.RO

MuGen: Multi-Skill Generative Locomotion Controller for Humanoid Robots

Pith reviewed 2026-06-30 13:10 UTC · model grok-4.3

classification 💻 cs.RO
keywords humanoid locomotion · motion imitation · vector-quantized autoencoder · reinforcement learning · policy distillation · generative control · multi-skill controller

The pith

A humanoid robot learns to track and mimic unseen human motions by compressing motion data into a reusable latent space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MuGen, a framework that trains vector-quantized autoencoders on heterogeneous human motion data using model-based reinforcement learning to build a generative locomotion representation. A teacher-student distillation produces a deployable policy that follows new motion sequences and reuses the learned latent space for additional tasks. If the approach holds, robots could handle expressive multi-skill behaviors from example data without separate retraining for each skill.

Core claim

By training vector-quantized autoencoders with model-based reinforcement learning on hours of heterogeneous human performance data, MuGen creates a generative representation of locomotion; a student policy distilled from a teacher then tracks and mimics unseen human motions while enabling reuse of the latent space for other tasks, demonstrated across a diverse set of motions with accurate execution.
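The distillation step in this claim can be illustrated with a minimal sketch, assuming plain behavior cloning with a mean-squared-error loss. The linear student, the random stand-in teacher, and the names `bc_loss`, `obs`, and `W_teacher` are all hypothetical; the paper's actual distillation strategy and network design are not described on this page.

```python
import numpy as np

def bc_loss(student_actions, teacher_actions):
    """Mean squared error between student and teacher actions."""
    return float(((student_actions - teacher_actions) ** 2).mean())

rng = np.random.default_rng(0)
obs = rng.normal(size=(64, 5))         # hypothetical partial observations
W_teacher = rng.normal(size=(5, 3))    # stand-in for a trained teacher
teacher_actions = obs @ W_teacher      # actions the student should imitate

W = np.zeros((5, 3))                   # linear student, assumed for illustration
loss_before = bc_loss(obs @ W, teacher_actions)

for _ in range(200):
    err = obs @ W - teacher_actions
    grad = 2.0 * obs.T @ err / err.size  # gradient of the mean squared error
    W -= 0.1 * grad                      # plain gradient descent step

loss_after = bc_loss(obs @ W, teacher_actions)
print(loss_before, loss_after)
```

The student only ever sees `obs`, which is the point of the teacher-student split: the teacher may use privileged state during training, while the deployed policy cannot.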

What carries the argument

Vector-quantized autoencoders (VQ-VAEs) trained with model-based reinforcement learning that compress human motion patterns into a discrete generative latent space.
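To make the discrete latent space concrete, here is a minimal sketch of the vector-quantization lookup, assuming the standard nearest-neighbor rule from the VQ-VAE literature. The toy codebook, the embeddings, and the function name `quantize` are hypothetical and are not taken from the paper.

```python
import numpy as np

def quantize(z, codebook):
    """Map each embedding in z (N, D) to its nearest row of codebook (K, D)."""
    # Squared Euclidean distance between every embedding and every code: (N, K)
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)            # discrete code index per embedding
    return idx, codebook[idx]          # indices and quantized embeddings

# Toy 2-D codebook with four codes (illustrative values only)
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([[0.9, 0.1], [0.1, 0.8]])

idx, z_q = quantize(z, codebook)       # idx -> [1, 2]
# Commitment-style term: how far the encoder outputs sit from their codes
commitment = float(((z - z_q) ** 2).mean())
print(idx, z_q, commitment)
```

In a trained model the codebook rows are learned parameters, and the index sequence `idx` is what a downstream task would reuse as a compact motion representation.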

If this is right

  • The robot executes a diverse set of motions accurately when guided by example sequences.
  • The latent space supports direct reuse for other locomotion tasks without full policy retraining.
  • The distilled student policy transfers to physical humanoid hardware.
  • Training on heterogeneous data yields generalization to novel motion inputs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same latent space might simplify high-level task planning by providing reusable motion primitives.
  • Similar compression could apply to non-humanoid robots if the representation separates motion style from platform dynamics.
  • Physical results would need to quantify how sim-to-real gaps affect tracking of unseen motions.
  • The approach might reduce the data required for new skills if the latent codes already encode transferable patterns.

Load-bearing premise

The vector-quantized autoencoders trained with model-based reinforcement learning produce a generative representation that captures key patterns of human motion from heterogeneous data in a way that supports generalization to unseen motions on a physical humanoid.

What would settle it

A physical-robot test would settle it: feed the policy motion sequences drawn from outside the training distribution and compare tracking error against in-distribution motions. Substantially larger errors on the out-of-distribution sequences would falsify the generalization claim.
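One way such a comparison could be scored is sketched below, assuming mean per-body position error as the tracking metric. The metric choice, the function names `mean_tracking_error` and `generalization_gap`, and the toy numbers are assumptions for illustration; this page does not state which metric the paper reports.

```python
import numpy as np

def mean_tracking_error(ref, sim):
    """Mean Euclidean distance between reference and executed body positions.

    ref, sim: arrays of shape (T, N_body, 3), in the same units.
    """
    return float(np.linalg.norm(ref - sim, axis=-1).mean())

def generalization_gap(in_dist_errors, out_dist_errors):
    """Ratio of mean out-of-distribution error to mean in-distribution error."""
    return float(np.mean(out_dist_errors) / np.mean(in_dist_errors))

# Toy clip: every body is offset by (0.03, 0.04, 0.0) from the reference
ref = np.zeros((2, 3, 3))
sim = ref + np.array([0.03, 0.04, 0.0])
err = mean_tracking_error(ref, sim)

# Hypothetical per-clip errors for the two test sets
gap = generalization_gap([0.05, 0.05], [0.10, 0.20])
print(err, gap)
```

A gap near 1 would support the generalization claim, while a gap far above 1 would count against it. Where to draw that line is a judgment the paper's own evaluation would need to justify.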

Figures

Figures reproduced from arXiv: 2605.24592 by Baoquan Chen, Boyang Yu, Heyuan Yao, Libin Liu, Pengyun Qiu, Ruijie Zhao, Xiang Wang, Xinyu Huo, Yusen Feng, Zixi Kang.

Figure 1: MuGen enables multi-skill humanoid locomotion by learning a generative controller. (a–d): A simulated humanoid tracks (a) a long walking motion manually crafted from motion data and (b–d) dance, run, and crouching walk motions selected from the motion dataset. (e–f): A real Unitree G1 robot tracks short segments of (e) a straight walk and (f) a crouching walk from the motion dataset. Details are provided i…
Figure 2: System overview. 1) Motion Skill Embedding: states and reference motions are encoded into continuous representations, then a VQ bottleneck maps embeddings to a trainable codebook. 2) Student policy: using the shared codebook, the student decodes actions from partial observations and aligns them with teacher outputs through behavior cloning. All training is performed in simulation environment.
Figure 3: The trajectory generated by the student policy using the learned prior encoder
Figure 4: Training curves of different model designs. We cal…
Figure 5: Data Distribution. Rotations are represented using the 6D continuous format from [84]. After this transformation, at each timestep, the state (x̄_i, q̄_i, v̄_i, ω̄_i) has shape (N_body, 3 + 6 + 3 + 3), corresponding to body position, body orientation (6D), linear velocity, and angular velocity. States and reference motions in global space encompass privileged information that is typically inaccessible to onbo…
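The state layout described in the Figure 5 text can be sketched as follows, assuming the common 6D convention that keeps the first two columns of each rotation matrix. Whether the paper keeps columns or rows is not stated here, and the helper names `rotmat_to_6d` and `build_state` are hypothetical.

```python
import numpy as np

def rotmat_to_6d(R):
    """Keep the first two columns of each rotation matrix: (..., 3, 3) -> (..., 6)."""
    return np.concatenate([R[..., :, 0], R[..., :, 1]], axis=-1)

def build_state(pos, R, lin_vel, ang_vel):
    """Stack per-body features into shape (N_body, 3 + 6 + 3 + 3)."""
    return np.concatenate([pos, rotmat_to_6d(R), lin_vel, ang_vel], axis=-1)

n_body = 4                                    # illustrative body count
pos = np.zeros((n_body, 3))                   # body positions
R = np.tile(np.eye(3), (n_body, 1, 1))        # identity orientations
lin_vel = np.zeros((n_body, 3))               # linear velocities
ang_vel = np.zeros((n_body, 3))               # angular velocities

state = build_state(pos, R, lin_vel, ang_vel)
print(state.shape)                            # (4, 15)
```

The 6D form avoids the discontinuities of Euler angles and quaternions, which is the usual reason it is chosen for network inputs and outputs.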
Original abstract

This paper presents MuGen, a data-driven framework for learning and deploying multi-skill locomotion on humanoid robots. MuGen enables a robot to perform expressive motions like humans under the guidance of example motion sequences. To achieve this, we employ vector-quantized autoencoders (VQ-VAEs) trained with model-based reinforcement learning, resulting in a generative representation of locomotion that captures key patterns of human motion from hours of heterogeneous human performance data. We employ a teacher-student learning framework and develop a new policy distillation strategy to enable a deployable student policy learning this efficient latent representation. This policy allows the robot to track and mimic unseen human motions and further enables the robot to reuse the learned latent space for other tasks. We demonstrate the effectiveness of our framework through a diverse set of motions and accurate execution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper presents MuGen, a data-driven framework for multi-skill locomotion control on humanoid robots. It trains vector-quantized autoencoders (VQ-VAEs) via model-based reinforcement learning on hours of heterogeneous human motion data to obtain a generative latent representation, then applies a teacher-student distillation procedure to produce a deployable student policy. The resulting policy is claimed to track and mimic unseen human motions while also permitting reuse of the learned latent space for additional tasks, with effectiveness shown through a diverse set of motions and accurate execution on the robot.

Significance. If the central generalization claims hold with supporting quantitative evidence, the work would offer a practical route to expressive, multi-skill humanoid controllers learned from real human data that transfer to physical hardware and support downstream task reuse. The integration of VQ-VAE discretization with model-based RL and policy distillation addresses a relevant gap between motion capture data and deployable controllers.

major comments (2)
  1. [Abstract] Abstract: the central claim that the policy 'allows the robot to track and mimic unseen human motions' and 'enables the robot to reuse the learned latent space for other tasks' is presented without any quantitative tracking metrics, success rates, error statistics, ablation studies, or hardware validation details, rendering the generalization performance impossible to assess.
  2. [Abstract] Abstract: no definition is supplied for what constitutes an 'unseen' motion, no description of the data retargeting procedure, and no architecture or loss terms for the VQ-VAE or model-based RL training are given, all of which are load-bearing for verifying that the discrete codes capture transferable patterns rather than dataset-specific or simulation artifacts.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We address the points below and will revise the abstract to better support the claims with key details while respecting length constraints.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that the policy 'allows the robot to track and mimic unseen human motions' and 'enables the robot to reuse the learned latent space for other tasks' is presented without any quantitative tracking metrics, success rates, error statistics, ablation studies, or hardware validation details, rendering the generalization performance impossible to assess.

    Authors: The abstract is a high-level summary; quantitative results (tracking errors, success rates on unseen motions, hardware execution accuracy) and ablations appear in Sections 5 and 6. We will revise the abstract to include concise quantitative highlights (e.g., average tracking error and success rate on held-out motions) to make the generalization claims more immediately assessable. revision: yes

  2. Referee: [Abstract] Abstract: no definition is supplied for what constitutes an 'unseen' motion, no description of the data retargeting procedure, and no architecture or loss terms for the VQ-VAE or model-based RL training are given, all of which are load-bearing for verifying that the discrete codes capture transferable patterns rather than dataset-specific or simulation artifacts.

    Authors: Space limits prevent full details in the abstract, but 'unseen' motions are defined as sequences from held-out subjects/styles (Section 4.1), retargeting uses standard SMPL-to-robot mapping (Section 3.2), and VQ-VAE architecture/losses plus model-based RL objective are specified in Section 3.3. We will add brief clarifications to the abstract (e.g., 'unseen motions from different performers') to address verifiability. revision: yes

Circularity Check

0 steps flagged

No circularity detected; claims rest on empirical training outcomes

full rationale

The provided abstract and description contain no equations, loss functions, or derivation steps. The central claim—that VQ-VAE latent codes trained via model-based RL on heterogeneous motion data enable tracking of unseen motions and latent-space reuse—is presented as an empirical result of the training and distillation process rather than a quantity derived from or equivalent to its own inputs by construction. No self-citations, ansatzes, or fitted-input-as-prediction patterns appear in the given text. The derivation chain is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; free parameters, axioms, and invented entities cannot be enumerated without the methods and results sections.

pith-pipeline@v0.9.1-grok · 5694 in / 1024 out tokens · 28469 ms · 2026-06-30T13:10:42.472522+00:00 · methodology


Reference graph

Works this paper leans on

84 extracted references · 33 canonical work pages · 3 internal anchors

  1. [1]

    Adversarial motion priors make good substitutes for complex reward functions,

    A. Escontrela, X. B. Peng, W. Yu, T. Zhang, A. Iscen, K. Goldberg, and P. Abbeel, “Adversarial motion priors make good substitutes for complex reward functions,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), vol. 2, 2022

  2. [2]

    Amp in the wild: Learning robust, agile, natural legged locomotion skills,

    Y. Wang, Z. Jiang, and J. Chen, “Amp in the wild: Learning robust, agile, natural legged locomotion skills,” arXiv preprint arXiv:2304.10888, 2023

  3. [3]

    Exbody2: Advanced expressive humanoid whole-body control,

    M. Ji, X. Peng, F. Liu, J. Li, G. Yang, X. Cheng, and X. Wang, “Exbody2: Advanced expressive humanoid whole-body control,” arXiv preprint arXiv:2412.13196, 2024

  4. [4]

    Styleloco: Generative adversarial distillation for natural humanoid robot locomotion,

    L. Ma, Z. Meng, T. Liu, Y. Li, R. Song, W. Zhang, and S. Huang, “Styleloco: Generative adversarial distillation for natural humanoid robot locomotion,” arXiv preprint arXiv:2503.15082, 2025

  5. [5]

    Controlvae: Model-based learning of generative controllers for physics-based characters,

    H. Yao, Z. Song, B. Chen, and L. Liu, “Controlvae: Model-based learning of generative controllers for physics-based characters,” ACM Transactions on Graphics (TOG), vol. 41, no. 6, pp. 1–16, 2022

  6. [6]

    Moconvq: Unified physics-based motion control via scalable discrete representations,

    H. Yao, Z. Song, Y. Zhou, T. Ao, B. Chen, and L. Liu, “Moconvq: Unified physics-based motion control via scalable discrete representations,” ACM Transactions on Graphics (TOG), vol. 43, no. 4, pp. 1–21, 2024

  7. [7]

    A reduction of imitation learning and structured prediction to no-regret online learning,

    S. Ross, G. Gordon, and D. Bagnell, “A reduction of imitation learning and structured prediction to no-regret online learning,” in Proceedings of the fourteenth international conference on artificial intelligence and statistics, 2011

  8. [8]

    Development of wabot 1,

    I. Kato, “Development of wabot 1,” Biomechanism, vol. 2, pp. 173–214, 1973

  9. [9]

    Dynamic walk of a biped,

    H. Miura and I. Shimoyama, “Dynamic walk of a biped,” IJRR, 1984

  10. [10]

    The development of honda humanoid robot,

    K. Hirai, M. Hirose, Y. Haikawa, and T. Takenaka, “The development of honda humanoid robot,” in Proceedings. 1998 IEEE international conference on robotics and automation (Cat. No. 98CH36146), vol. 2. IEEE, 1998, pp. 1321–1326

  11. [11]

    High speed whole body dynamic motion experiment with real time master-slave humanoid robot system,

    Y. Ishiguro, K. Kojima, F. Sugai, S. Nozawa, Y. Kakiuchi, K. Okada, and M. Inaba, “High speed whole body dynamic motion experiment with real time master-slave humanoid robot system,” in 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 5835–5841

  12. [12]

    Dynamic locomotion synchronization of bipedal robot and human operator via bilateral feedback teleoperation,

    J. Ramos and S. Kim, “Dynamic locomotion synchronization of bipedal robot and human operator via bilateral feedback teleoperation,” Science Robotics, vol. 4, no. 35, p. eaav4282, 2019

  13. [13]

    On real-time whole-body human to humanoid motion transfer,

    F.-J. Montecillo-Puente, M. N. Sreenivasa, and J.-P. Laumond, “On real-time whole-body human to humanoid motion transfer,” in International Conference on Informatics in Control, Automation and Robotics, 2010. [Online]. Available: https://api.semanticscholar.org/CorpusID:20676844

  14. [14]

    Real-time imitation of human whole-body motions by humanoids,

    J. Koenemann, F. Burget, and M. Bennewitz, “Real-time imitation of human whole-body motions by humanoids,” in 2014 IEEE International Conference on Robotics and Automation (ICRA), 2014, pp. 2806–2812

  15. [15]

    Dancing humanoid robots: Systematic use of osid to compute dynamically consistent movements following a motion capture pattern,

    O. E. Ramos, N. Mansard, O. Stasse, C. Benazeth, S. Hak, and L. Saab, “Dancing humanoid robots: Systematic use of osid to compute dynamically consistent movements following a motion capture pattern,” IEEE Robotics and Automation Magazine, vol. 22, no. 4, pp. 16–26, 2015

  16. [16]

    Hybrid zero dynamics of planar biped walkers,

    E. R. Westervelt, J. W. Grizzle, and D. E. Koditschek, “Hybrid zero dynamics of planar biped walkers,” IEEE transactions on automatic control, vol. 48, no. 1, pp. 42–56, 2003

  17. [17]

    Whole body humanoid control from human motion descriptors,

    B. Dariush, M. Gienger, B. Jian, C. Goerick, and K. Fujimura, “Whole body humanoid control from human motion descriptors,” in 2008 IEEE International Conference on Robotics and Automation. IEEE, 2008, pp. 2677–2684

  18. [18]

    The 3d linear inverted pendulum mode: A simple modeling for a biped walking pattern generation,

    S. Kajita, F. Kanehiro, K. Kaneko, K. Yokoi, and H. Hirukawa, “The 3d linear inverted pendulum mode: A simple modeling for a biped walking pattern generation,” in Proceedings 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems. Expanding the Societal Role of Robotics in the the Next Millennium (Cat. No. 01CH37180), vol. 1. IEEE, 2001...

  19. [19]

    Anymal-a highly mobile and dynamic quadrupedal robot,

    M. Hutter, C. Gehring, D. Jud, A. Lauber, C. D. Bellicoso, V. Tsounis, J. Hwangbo, K. Bodie, P. Fankhauser, M. Bloesch, et al., “Anymal-a highly mobile and dynamic quadrupedal robot,” in IROS, 2016

  20. [20]

    Whole-body geometric retargeting for humanoid robots,

    K. Darvish, Y. Tirupachuri, G. Romualdi, L. Rapetti, D. Ferigo, F. J. A. Chavez, and D. Pucci, “Whole-body geometric retargeting for humanoid robots,” in 2019 IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids), 2019, pp. 679–686

  21. [21]

    A multimode teleoperation framework for humanoid loco-manipulation: An application for the icub robot,

    L. Penco, N. Scianca, V. Modugno, L. Lanari, G. Oriolo, and S. Ivaldi, “A multimode teleoperation framework for humanoid loco-manipulation: An application for the icub robot,” IEEE Robotics and Automation Magazine, vol. 26, no. 4, pp. 73–82, 2019

  22. [22]

    Dynamic locomotion synchronization of bipedal robot and human operator via bilateral feedback teleoperation,

    J. Ramos and S. Kim, “Dynamic locomotion synchronization of bipedal robot and human operator via bilateral feedback teleoperation,” Science Robotics, vol. 4, no. 35, p. eaav4282, 2019. [Online]. Available: https://www.science.org/doi/abs/10.1126/scirobotics.aav4282

  23. [23]

    Bilateral humanoid teleoperation system using whole-body exoskeleton cockpit tablis,

    Y. Ishiguro, T. Makabe, Y. Nagamatsu, Y. Kojio, K. Kojima, F. Sugai, Y. Kakiuchi, K. Okada, and M. Inaba, “Bilateral humanoid teleoperation system using whole-body exoskeleton cockpit tablis,” IEEE Robotics and Automation Letters, vol. 5, no. 4, pp. 6419–6426, 2020

  24. [24]

    Mixed reality teleoperation assistance for direct control of humanoids,

    L. Penco, K. Momose, S. McCrory, D. Anderson, N. Kitchel, D. Calvert, and R. J. Griffin, “Mixed reality teleoperation assistance for direct control of humanoids,” IEEE Robotics and Automation Letters, vol. 9, no. 2, pp. 1937–1944, 2024

  25. [25]

    Whole-body control of humanoid robots,

    F. L. Moro and L. Sentis, “Whole-body control of humanoid robots,” Humanoid Robotics: A reference, Springer, Dordrecht, 2019

  26. [26]

    Online non-linear centroidal mpc for humanoid robot locomotion with step adjustment,

    G. Romualdi, S. Dafarra, G. L’Erario, I. Sorrentino, S. Traversaro, and D. Pucci, “Online non-linear centroidal mpc for humanoid robot locomotion with step adjustment,” in 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 10412–10419

  27. [27]

    Mptc-modular passive tracking controller for stack of tasks based control frameworks,

    J. Englsberger, A. Dietrich, G.-A. Mesesan, G. Garofalo, C. Ott, and A. O. Albu-Schäffer, “Mptc-modular passive tracking controller for stack of tasks based control frameworks,” 16th Robotics: Science and Systems, RSS 2020, 2020

  28. [28]

    Online non-linear centroidal mpc for humanoid robots payload carrying with contact-stable force parametrization,

    M. Elobaid, G. Romualdi, G. Nava, L. Rapetti, H. A. O. Mohamed, and D. Pucci, “Online non-linear centroidal mpc for humanoid robots payload carrying with contact-stable force parametrization,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 12233–12239

  29. [29]

    The mit humanoid robot: Design, motion planning, and control for acrobatic behaviors,

    M. Chignoli, D. Kim, E. Stanger-Jones, and S. Kim, “The mit humanoid robot: Design, motion planning, and control for acrobatic behaviors,” in 2020 IEEE-RAS 20th International Conference on Humanoid Robots (Humanoids). IEEE, 2021, pp. 1–8

  30. [30]

    Synchronized human-humanoid motion imitation,

    A. Dallard, M. Benallegue, F. Kanehiro, and A. Kheddar, “Synchronized human-humanoid motion imitation,” IEEE Robotics and Automation Letters, vol. 8, no. 7, pp. 4155–4162, 2023

  31. [31]

    Learning agile robotic locomotion skills by imitating animals,

    X. B. Peng, E. Coumans, T. Zhang, T.-W. Lee, J. Tan, and S. Levine, “Learning agile robotic locomotion skills by imitating animals,” Apr. 2020

  32. [32]

    RMA: Rapid Motor Adaptation for Legged Robots

    A. Kumar, Z. Fu, D. Pathak, and J. Malik, “Rma: Rapid motor adaptation for legged robots,” arXiv preprint arXiv:2107.04034, 2021

  33. [33]

    Minimizing energy consumption leads to the emergence of gaits in legged robots,

    Z. Fu, A. Kumar, J. Malik, and D. Pathak, “Minimizing energy consumption leads to the emergence of gaits in legged robots,” Conference on Robot Learning (CoRL), 2021

  34. [34]

    Blind bipedal stair traversal via sim-to-real reinforcement learning,

    J. Siekmann, K. Green, J. Warila, A. Fern, and J. Hurst, “Blind bipedal stair traversal via sim-to-real reinforcement learning,” arXiv preprint arXiv:2105.08328, 2021

  35. [35]

    Robot parkour learning,

    Z. Zhuang, Z. Fu, J. Wang, C. Atkeson, S. Schwertfeger, C. Finn, and H. Zhao, “Robot parkour learning,” in Conference on Robot Learning (CoRL), 2023

  36. [36]

    Extreme parkour with legged robots,

    X. Cheng, K. Shi, A. Agarwal, and D. Pathak, “Extreme parkour with legged robots,” arXiv preprint arXiv:2309.14341, 2023

  37. [37]

    Beamdojo: Learning agile humanoid locomotion on sparse footholds,

    H. Wang, Z. Wang, J. Ren, Q. Ben, T. Huang, W. Zhang, and J. Pang, “Beamdojo: Learning agile humanoid locomotion on sparse footholds,” arXiv preprint arXiv:2502.10363, 2025

  38. [38]

    Legs as manipulator: Pushing quadrupedal agility beyond locomotion,

    X. Cheng, A. Kumar, and D. Pathak, “Legs as manipulator: Pushing quadrupedal agility beyond locomotion,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023

  39. [39]

    Learning whole-body manipulation for quadrupedal robot,

    S. Jeon, M. Jung, S. Choi, B. Kim, and J. Hwangbo, “Learning whole-body manipulation for quadrupedal robot,” arXiv preprint arXiv:2308.16820, 2023

  40. [40]

    Curiosity-driven learning of joint locomotion and manipulation tasks,

    C. Schwarke, V. Klemm, M. Van der Boon, M. Bjelonic, and M. Hutter, “Curiosity-driven learning of joint locomotion and manipulation tasks,” in Proceedings of The 7th Conference on Robot Learning, vol. 229. PMLR, 2023, pp. 2594–2610

  41. [41]

    Dribblebot: Dynamic legged manipulation in the wild,

    Y. Ji, G. B. Margolis, and P. Agrawal, “Dribblebot: Dynamic legged manipulation in the wild,” arXiv preprint arXiv:2304.01159, 2023

  42. [42]

    Deepmimic: Example-guided deep reinforcement learning of physics-based character skills,

    X. B. Peng, P. Abbeel, S. Levine, and M. Van de Panne, “Deepmimic: Example-guided deep reinforcement learning of physics-based character skills,” ACM Transactions On Graphics (TOG), pp. 1–14, 2018

  43. [43]

    Amp: Adversarial motion priors for stylized physics-based character control,

    X. B. Peng, Z. Ma, P. Abbeel, S. Levine, and A. Kanazawa, “Amp: Adversarial motion priors for stylized physics-based character control,” ACM Trans. Graph., vol. 40, no. 4, July 2021. [Online]. Available: http://doi.acm.org/10.1145/3450626.3459670

  44. [44]

    Ase: Large-scale reusable adversarial skill embeddings for physically simulated characters,

    X. B. Peng, Y. Guo, L. Halper, S. Levine, and S. Fidler, “Ase: Large-scale reusable adversarial skill embeddings for physically simulated characters,” ACM Trans. Graph., vol. 41, no. 4, July 2022

  45. [45]

    Calm: Conditional adversarial latent models for directable virtual characters,

    C. Tessler, Y. Kasten, Y. Guo, S. Mannor, G. Chechik, and X. B. Peng, “Calm: Conditional adversarial latent models for directable virtual characters,” in ACM SIGGRAPH 2023 Conference Proceedings, ser. SIGGRAPH ’23. New York, NY, USA: Association for Computing Machinery, 2023. [Online]. Available: https://doi.org/10.1145/3588432.3591541

  46. [46]

    Reinforcement learning for robust parameterized locomotion control of bipedal robots,

    Z. Li, X. Cheng, X. B. Peng, P. Abbeel, S. Levine, G. Berseth, and K. Sreenath, “Reinforcement learning for robust parameterized locomotion control of bipedal robots,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 2811–2817

  47. [47]

    Reinforcement learning for versatile, dynamic, and robust bipedal locomotion control,

    Z. Li, X. B. Peng, P. Abbeel, S. Levine, G. Berseth, and K. Sreenath, “Reinforcement learning for versatile, dynamic, and robust bipedal locomotion control,” arXiv preprint arXiv:2401.16889, 2024

  48. [48]

    Real-world humanoid locomotion with reinforcement learning,

    I. Radosavovic, T. Xiao, B. Zhang, T. Darrell, J. Malik, and K. Sreenath, “Real-world humanoid locomotion with reinforcement learning,” arXiv:2303.03381, 2023

  49. [49]

    Homie: Humanoid loco-manipulation with isomorphic exoskeleton cockpit,

    Q. Ben, F. Jia, J. Zeng, J. Dong, D. Lin, and J. Pang, “Homie: Humanoid loco-manipulation with isomorphic exoskeleton cockpit,” arXiv preprint arXiv:2502.13013, 2025

  50. [50]

    Robust and versatile bipedal jumping control through multi-task reinforcement learning,

    Z. Li, X. B. Peng, P. Abbeel, S. Levine, G. Berseth, and K. Sreenath, “Robust and versatile bipedal jumping control through multi-task reinforcement learning,” arXiv preprint arXiv:2302.09450, 2023

  51. [51]

    Learning humanoid standing-up control across diverse postures,

    T. Huang, J. Ren, H. Wang, Z. Wang, Q. Ben, M. Wen, X. Chen, J. Li, and J. Pang, “Learning humanoid standing-up control across diverse postures,” arXiv preprint arXiv:2502.08378, 2025

  52. [52]

    Simgan: Hybrid simulator identification for domain adaptation via adversarial reinforcement learning,

    Y. Jiang, T. Zhang, D. Ho, Y. Bai, C. K. Liu, S. Levine, and J. Tan, “Simgan: Hybrid simulator identification for domain adaptation via adversarial reinforcement learning,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 2884–2890

  53. [53]

    Asap: Aligning simulation and real-world physics for learning agile humanoid whole-body skills,

    T. He, J. Gao, W. Xiao, Y. Zhang, Z. Wang, J. Wang, Z. Luo, G. He, N. Sobanbab, C. Pan, et al., “Asap: Aligning simulation and real-world physics for learning agile humanoid whole-body skills,” arXiv preprint arXiv:2502.01143, 2025

  54. [54]

    Legged locomotion in challenging terrains using egocentric vision,

    A. Agarwal, A. Kumar, J. Malik, and D. Pathak, “Legged locomotion in challenging terrains using egocentric vision,” in Conference on Robot Learning. PMLR, 2023, pp. 403–415

  55. [55]

    Neural volumetric memory for visual locomotion control,

    R. Yang, G. Yang, and X. Wang, “Neural volumetric memory for visual locomotion control,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1430–1440

  56. [56]

    Learning vision-based bipedal locomotion for challenging terrain,

    H. Duan, B. Pandit, M. S. Gadde, B. J. van Marum, J. Dao, C. Kim, and A. Fern, “Learning vision-based bipedal locomotion for challenging terrain,” arXiv preprint arXiv:2309.14594, 2023

  57. [57]

    Character controllers using motion vaes,

    H. Y. Ling, F. Zinno, G. Cheng, and M. Van De Panne, “Character controllers using motion vaes,” ACM Transactions on Graphics (TOG), vol. 39, no. 4, pp. 40–1, 2020

  58. [58]

    Physics-based character controllers using conditional vaes,

    J. Won, D. Gopinath, and J. Hodgins, “Physics-based character controllers using conditional vaes,” ACM Trans. Graph., vol. 41, no. 4, July 2022. [Online]. Available: https://doi.org/10.1145/3528223.3530067

  59. [59]

    Generative adversarial imitation learning,

    J. Ho and S. Ermon, “Generative adversarial imitation learning,” Advances in neural information processing systems, vol. 29, 2016

  60. [60]

    Amp: Adversarial motion priors for stylized physics-based character control,

    X. B. Peng, Z. Ma, P. Abbeel, S. Levine, and A. Kanazawa, “Amp: Adversarial motion priors for stylized physics-based character control,” ACM Transactions on Graphics (ToG), vol. 40, no. 4, pp. 1–20, 2021

  61. [61]

    Neural categorical priors for physics-based character control,

    Q. Zhu, H. Zhang, M. Lan, and L. Han, “Neural categorical priors for physics-based character control,” 2023. [Online]. Available: https://arxiv.org/abs/2308.07200

  62. [62]

    Maskedmimic: Unified physics-based character control through masked motion inpainting,

    C. Tessler, Y. Guo, O. Nabati, G. Chechik, and X. B. Peng, “Maskedmimic: Unified physics-based character control through masked motion inpainting,” ACM Transactions on Graphics (TOG), 2024

  63. [63]

    Opt-mimic: Imitation of optimized trajectories for dynamic quadruped behaviors,

    Y. Fuchioka, Z. Xie, and M. Van de Panne, “Opt-mimic: Imitation of optimized trajectories for dynamic quadruped behaviors,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023, pp. 5092–5098

  64. [64]

    Diffuseloco: Real-time legged locomotion control with diffusion from offline datasets,

    X. Huang, Y. Chi, R. Wang, Z. Li, X. B. Peng, S. Shao, B. Nikolic, and K. Sreenath, “Diffuseloco: Real-time legged locomotion control with diffusion from offline datasets,” 2024. [Online]. Available: https://arxiv.org/abs/2404.19264

  65. [65]

    An efficient model-based approach on learning agile motor skills without reinforcement,

    H. Shi, T. Li, Q. Zhu, J. Sheng, L. Han, and M. Q. H. Meng, “An efficient model-based approach on learning agile motor skills without reinforcement,” 2024. [Online]. Available: https://arxiv.org/abs/2403.01962

  66. [66]

    Learning Multi-Modal Whole-Body Control for Real-World Humanoid Robots

    P. Dugar, A. Shrestha, F. Yu, B. van Marum, and A. Fern, “Learning multi-modal whole-body control for real-world humanoid robots,” 2024. [Online]. Available: https://arxiv.org/abs/2408.07295

  67. [67]

    Expressive whole-body control for humanoid robots,

    X. Cheng, Y. Ji, J. Chen, R. Yang, G. Yang, and X. Wang, “Expressive whole-body control for humanoid robots,” 2024. [Online]. Available: https://arxiv.org/abs/2402.16796

  68. [68]

    Omnih2o: Universal and dexterous human-to-humanoid whole-body teleoperation and learning,

    T. He, Z. Luo, X. He, W. Xiao, C. Zhang, W. Zhang, K. Kitani, C. Liu, and G. Shi, “Omnih2o: Universal and dexterous human-to-humanoid whole-body teleoperation and learning,” 2024. [Online]. Available: https://arxiv.org/abs/2406.08858

  69. [69]

    Learning human-to-humanoid real-time whole-body teleoperation,

    T. He, Z. Luo, W. Xiao, C. Zhang, K. Kitani, C. Liu, and G. Shi, “Learning human-to-humanoid real-time whole-body teleoperation,” in arXiv, 2024

  70. [70]

    Hover: Versatile neural whole-body controller for humanoid robots,

    T. He, W. Xiao, T. Lin, Z. Luo, Z. Xu, Z. Jiang, C. Liu, G. Shi, X. Wang, L. Fan, and Y. Zhu, “Hover: Versatile neural whole-body controller for humanoid robots,” arXiv preprint arXiv:2410.21229, 2024

  71. [71]

    Humanplus: Humanoid shadowing and imitation from humans,

    Z. Fu, Q. Zhao, Q. Wu, G. Wetzstein, and C. Finn, “Humanplus: Humanoid shadowing and imitation from humans,” 2024. [Online]. Available: https://arxiv.org/abs/2406.10454

  72. [72]

    Humanoid locomotion as next token prediction,

    I. Radosavovic, B. Zhang, B. Shi, J. Rajasegaran, S. Kamat, T. Darrell, K. Sreenath, and J. Malik, “Humanoid locomotion as next token prediction,” arXiv:2402.19469, 2024

  73. [73]

    Universal humanoid motion representations for physics-based control,

    Z. Luo, J. Cao, J. Merel, A. Winkler, J. Huang, K. M. Kitani, and W. Xu, “Universal humanoid motion representations for physics-based control,” in The Twelfth International Conference on Learning Representations, 2024. [Online]. Available: https://openreview.net/forum?id=OrOd8PxOO2

  74. [74]

    Strategy and skill learning for physics-based table tennis animation,

    J. Wang, J. Hodgins, and J. Won, “Strategy and skill learning for physics-based table tennis animation,” in ACM SIGGRAPH 2024 Conference Papers, 2024, pp. 1–11

  75. [75]

    Robot motion diffusion model: Motion generation for robotic characters,

    A. Serifi, R. Grandia, E. Knoop, M. Gross, and M. Bächer, “Robot motion diffusion model: Motion generation for robotic characters,” in SIGGRAPH Asia 2024 Conference Papers, ser. SA ’24. New York, NY, USA: Association for Computing Machinery, 2024. [Online]. Available: https://doi.org/10.1145/3680528.3687626

  76. [76]

    Learning agile robotic locomotion skills by imitating animals,

    X. B. Peng, E. Coumans, T. Zhang, T.-W. E. Lee, J. Tan, and S. Levine, “Learning agile robotic locomotion skills by imitating animals,” in Robotics: Science and Systems, 07 2020

  77. [77]

    Neural discrete representation learning,

    A. van den Oord, O. Vinyals, and K. Kavukcuoglu, “Neural discrete representation learning,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, ser. NIPS’17. Red Hook, NY, USA: Curran Associates Inc., 2017, p. 6309–6318

  78. [78]

    Learn to teach: Sample-efficient privileged learning for humanoid locomotion over diverse terrains,

    F. Wu, X. Nal, J. Jang, W. Zhu, Z. Gu, A. Wu, and Y. Zhao, “Learn to teach: Sample-efficient privileged learning for humanoid locomotion over diverse terrains,” 2025. [Online]. Available: https://arxiv.org/abs/2402.06783

  79. [79]

    Cts: Concurrent teacher-student reinforcement learning for legged locomotion,

    H. Wang, H. Luo, W. Zhang, and H. Chen, “Cts: Concurrent teacher-student reinforcement learning for legged locomotion,” 2024. [Online]. Available: https://arxiv.org/abs/2405.10830

  80. [80]

    Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning

    V. Makoviychuk, L. Wawrzyniak, Y. Guo, M. Lu, K. Storey, M. Macklin, D. Hoeller, N. Rudin, A. Allshire, A. Handa, et al., “Isaac gym: High performance gpu-based physics simulation for robot learning,” arXiv preprint arXiv:2108.10470, 2021

Showing first 80 references.