arxiv: 2602.00678 · v4 · submitted 2026-01-31 · 💻 cs.RO

Toward Reliable Sim-to-Real Predictability for MoE-based Robust Quadrupedal Locomotion

Tianyang Wu, Hanwei Guo, Yuhang Wang, Junshu Yang, Xinyang Sui, Jiayi Xie, Xingyu Chen, Zeyang Liu, Xuguang Lan

Pith reviewed 2026-05-16 08:56 UTC · model grok-4.3

classification 💻 cs.RO

keywords quadrupedal locomotion · mixture of experts · sim-to-real transfer · reinforcement learning · terrain generalization · proprioception · policy robustness · robotics

The pith

A gated Mixture-of-Experts policy paired with sim-to-sim metrics lets quadruped controllers transfer reliably to real hardware on unseen rough terrain.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a locomotion controller that routes commands and terrain cues through a set of specialist experts inside a single policy network. It pairs the controller with RoboGauge, a battery of simulation tests that score how well any given policy should hold up when moved to a physical robot. The goal is to pick policies that work on snow, sand, stairs, slopes, and tall obstacles using only onboard sensors, while cutting down on dangerous and slow real-world trial runs. If the approach holds, teams could train once in simulation and deploy with higher confidence that the robot will keep moving when conditions change.

Core claim

The central claim is that an MoE locomotion policy, whose gated experts decompose latent terrain features and velocity commands, achieves stronger robustness and generalization when selected by RoboGauge's multi-dimensional proprioception metrics obtained from controlled sim-to-sim trials across terrains, difficulty levels, and domain randomizations. According to the paper, this combination allows reliable deployment on a Unitree Go2 without extensive physical validation, as shown by successful traversal of snow, sand, stairs, slopes, and 30 cm obstacles, plus speeds reaching 4 m/s with an emergent narrow-width gait.

What carries the argument

Mixture-of-Experts policy whose gated specialist experts decompose latent terrain and command modeling, together with RoboGauge's proprioception-based sim-to-sim metrics for predictive policy selection.
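
To make the gating idea concrete, here is a minimal sketch of a dense softmax-gated mixture in plain Python. It is not the paper's architecture: the linear experts, the two-dimensional latent, and the names `moe_forward`, `gate_weights`, and `experts` are all assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Bias-free dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def moe_forward(
    latent: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
    experts: Sequence[Callable[[Sequence[float]], Vector]],
) -> Tuple[Vector, Vector]:
    """Blend expert outputs with softmax gate weights computed from `latent`."""
    gate = softmax(linear(gate_weights, latent))
    outputs = [expert(latent) for expert in experts]
    action_dim = len(outputs[0])
    action = [
        sum(g * out[j] for g, out in zip(gate, outputs))
        for j in range(action_dim)
    ]
    return action, gate


# Hypothetical toy setup: a 2-D latent (terrain cue, velocity command)
# and two single-output linear experts.
experts = [
    lambda z: linear([[1.0, 0.0]], z),  # expert that reads the first latent entry
    lambda z: linear([[0.0, 1.0]], z),  # expert that reads the second latent entry
]
gate_weights = [[2.0, 0.0], [0.0, 2.0]]
action, gate = moe_forward([1.0, 0.0], gate_weights, experts)
```

In this toy case the gate puts most of its weight on the first expert, so the blended action is dominated by that expert's output. A real policy would use learned, nonlinear experts and a latent produced by an encoder over proprioceptive history.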

If this is right

The MoE policy delivers superior robustness and generalization from proprioception alone on multi-terrain tasks.
RoboGauge metrics enable policy selection that avoids most physical trial-and-error.
The selected policies handle previously unseen surfaces including snow, sand, stairs, slopes, and 30 cm obstacles.
High-speed runs reach 4 m/s while producing a stable narrow-width gait that emerges without explicit reward shaping.
The framework reduces the cost and risk of moving reinforcement-learned controllers from simulation to hardware.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same expert-decomposition idea could be tested on bipeds or wheeled platforms to check whether the sim-to-real predictability generalizes beyond quadrupeds.
If RoboGauge scores prove consistent across robot platforms, future work could replace part of today's heavy domain randomization with targeted metric-guided training.
The appearance of a narrow gait at high speed suggests that stability at velocity may arise from the policy architecture itself rather than from hand-crafted reward terms.
RoboGauge-style predictive suites might be adapted to manipulation or navigation tasks where physical resets are equally expensive.

Load-bearing premise

RoboGauge's multi-dimensional proprioception-based sim-to-sim metrics accurately forecast which policies will transfer and remain robust on physical hardware without needing physical validation.

What would settle it

A side-by-side physical test on the Unitree Go2 in which a policy ranked highest by RoboGauge metrics fails to complete the reported terrain suite while a lower-ranked policy succeeds would falsify the predictive claim.
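
The falsification condition can be phrased as a search for rank inversions: pairs where the policy with the higher simulation score fails on hardware while a lower-scored one passes. The sketch below assumes one scalar score per policy and a pass/fail hardware outcome; the function name, policy names, and numbers are invented for illustration and do not come from the paper.

```python
from typing import Dict, List, Tuple


def find_rank_inversions(
    sim_scores: Dict[str, float],
    hardware_passed: Dict[str, bool],
) -> List[Tuple[str, str]]:
    """Return (higher_scored_failure, lower_scored_success) policy pairs."""
    ranked = sorted(sim_scores, key=sim_scores.get, reverse=True)
    inversions = []
    for i, better in enumerate(ranked):
        for worse in ranked[i + 1:]:
            strictly_higher = sim_scores[better] > sim_scores[worse]
            if strictly_higher and not hardware_passed[better] and hardware_passed[worse]:
                inversions.append((better, worse))
    return inversions


# Made-up values for illustration only.
sim_scores = {"policy_a": 0.91, "policy_b": 0.84, "policy_c": 0.62}
hardware_passed = {"policy_a": False, "policy_b": True, "policy_c": True}
inversions = find_rank_inversions(sim_scores, hardware_passed)
```

An empty result is consistent with the predictive claim on the tested policies; any inversion involving the top-ranked policy is the kind of counterexample described above.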

Figures

Figures reproduced from arXiv: 2602.00678 by Hanwei Guo, Jiayi Xie, Junshu Yang, Tianyang Wu, Xingyu Chen, Xinyang Sui, Xuguang Lan, Yuhang Wang, Zeyang Liu.

**Figure 1.** Our proposed framework integrates a Mixture-of-Experts architecture for terrain and command representation with …

**Figure 2.** Comparative analysis against one-stage proprioceptive …

**Figure 3.** The RoboGauge evaluation architecture consists of …

**Figure 4.** Comparison of RoboGauge scores and terrain level …

**Figure 5.** Comparison of maximum terrain levels across varying …

**Figure 6.** PCA visualization of the student encoder latent space …

**Figure 7.** Experiment on wooden stairs with a 10 cm rise and 15 cm drop. The upper-right plot depicts the velocity tracking …

**Figure 8.** Robust locomotion during slope traversal and drop recovery. The left panel highlights a 1.7 s efficiency gain on …

**Figure 9.** Velocity tracking and gait on a µ = 0.6 surface. The left plot exhibits command following reaching 4.01 m/s within 2.16 s with a 0.20 m/s error. The upper-right image captures transient flight phases while the lower-right image highlights a stable narrow-base gait.

**Figure 10.** Continuous lateral pull disturbance rejection experi…

**Figure 11.** Operational workflow of the BasePipeline

**Figure 12.** Operational workflow of the LevelPipeline

**Figure 13.** Ablation study on training strategies. We conducted ablation studies on the training configurations where …

**Figure 14.** Maximum terrain difficulty levels achieved by various …

**Figure 15.** The green dashed lines represent the ground-truth ve…

**Figure 16.** PCA visualization of the student encoder latent space …

**Figure 18.** The top panel shows the robot quickly adjusting its posture to safely descend when the …

Original abstract

Reinforcement learning has shown strong promise for quadrupedal agile locomotion, even with proprioception-only sensing. In practice, however, sim-to-real gap and reward overfitting in complex terrains can produce policies that fail to transfer, while physical validation remains risky and inefficient. To address these challenges, we introduce a unified framework encompassing a Mixture-of-Experts (MoE) locomotion policy for robust multi-terrain representation with RoboGauge, a predictive assessment suite that quantifies sim-to-real transferability. The MoE policy employs a gated set of specialist experts to decompose latent terrain and command modeling, achieving superior deployment robustness and generalization via proprioception alone. RoboGauge further provides multi-dimensional proprioception-based metrics via sim-to-sim tests over terrains, difficulty levels, and domain randomizations, enabling reliable MoE policy selection without extensive physical trials. Experiments on a Unitree Go2 demonstrate robust locomotion on unseen challenging terrains, including snow, sand, stairs, slopes, and 30 cm obstacles. In dedicated high-speed tests, the robot reaches 4 m/s and exhibits an emergent narrow-width gait associated with improved stability at high velocity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The MoE policy gets solid hardware results on tough terrains but RoboGauge's sim-to-sim scores lack the correlation data needed to prove they predict real transfer.

The core of this paper is a gated Mixture-of-Experts policy for quadrupedal locomotion that decomposes terrain and command modeling, paired with RoboGauge, a set of proprioception-based sim-to-sim metrics meant to pick policies likely to transfer without heavy physical testing. The Unitree Go2 experiments show the selected policy handling snow, sand, stairs, slopes, and 30 cm obstacles, plus 4 m/s runs with an emergent narrow gait. That hardware outcome is the clearest positive result here.

Referee Report

2 major / 2 minor

Summary. The paper introduces a Mixture-of-Experts (MoE) locomotion policy for quadrupedal robots that uses gated specialist experts to model terrain and commands, paired with RoboGauge, a suite of multi-dimensional proprioception-based sim-to-sim metrics intended to predict sim-to-real transferability and enable policy selection without extensive physical trials. Real-world experiments on a Unitree Go2 are reported to demonstrate robust locomotion on unseen terrains (snow, sand, stairs, slopes, 30 cm obstacles) at speeds up to 4 m/s with an emergent narrow-width gait.

Significance. If the predictive validity of RoboGauge holds, the framework would meaningfully advance efficient sim-to-real workflows in robotics by reducing reliance on risky hardware validation for RL policies. The reported hardware results on diverse challenging terrains provide concrete evidence of the MoE policy's practical robustness and generalization from proprioception alone.

major comments (2)

[RoboGauge description and Experiments] The central claim that RoboGauge's sim-to-sim metrics reliably forecast sim-to-real transferability (and thus enable selection of successful MoE policies) is not supported by any quantitative correlation analysis between the multi-dimensional metrics and real-world outcomes such as success rate or achieved velocity. The manuscript reports successful Unitree Go2 deployment but provides no Pearson r, regression, or statistical validation linking RoboGauge scores to physical performance.
[Experiments on Unitree Go2] No controlled ablation or baseline comparison is presented to isolate the contribution of RoboGauge-based selection from the MoE architecture itself; without this, it remains unclear whether the reported robustness stems from the predictive framework or from the policy design and training.
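
For reference, the correlation analysis requested in the first major comment could be as simple as the following sketch. It computes the Pearson coefficient between one RoboGauge score per policy and a matching real-world outcome. The function name and every number are placeholders for illustration, not data from the paper.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values: one entry per evaluated policy.
robogauge_scores = [0.55, 0.62, 0.70, 0.81, 0.90]
hardware_success_rates = [0.40, 0.50, 0.65, 0.80, 0.85]
r = pearson_r(robogauge_scores, hardware_success_rates)
```

With only a handful of policies the coefficient carries wide uncertainty, so a revised analysis would also need to report the sample size and a significance test or confidence interval.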

minor comments (2)

[Abstract and Methods] The abstract and methods sections omit full training details, exact definitions of the multi-dimensional RoboGauge metrics, aggregation procedure for policy selection, and error analysis, all of which are required for reproducibility.
[Method] Notation for the MoE gating mechanism and the precise proprioceptive inputs used in RoboGauge should be clarified with explicit equations or pseudocode to avoid ambiguity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the revisions we will make to provide stronger quantitative support for our claims.

Referee: [RoboGauge description and Experiments] The central claim that RoboGauge's sim-to-sim metrics reliably forecast sim-to-real transferability (and thus enable selection of successful MoE policies) is not supported by any quantitative correlation analysis between the multi-dimensional metrics and real-world outcomes such as success rate or achieved velocity. The manuscript reports successful Unitree Go2 deployment but provides no Pearson r, regression, or statistical validation linking RoboGauge scores to physical performance.

Authors: We agree that a quantitative correlation analysis would provide stronger evidence for RoboGauge's predictive validity. In the revised manuscript we will add a dedicated analysis section that computes Pearson correlation coefficients (and associated p-values) between each RoboGauge dimension and the observed real-world success rates and peak velocities across the evaluated policies. This will directly link the sim-to-sim metrics to hardware outcomes.

Revision: yes

Referee: [Experiments on Unitree Go2] No controlled ablation or baseline comparison is presented to isolate the contribution of RoboGauge-based selection from the MoE architecture itself; without this, it remains unclear whether the reported robustness stems from the predictive framework or from the policy design and training.

Authors: We acknowledge that the current experiments do not isolate RoboGauge's contribution via controlled ablation. In the revision we will add an ablation study comparing RoboGauge-selected MoE policies against (i) MoE policies chosen solely by aggregate sim-to-sim reward and (ii) randomly selected MoE policies, reporting transfer success rates and velocity on the same hardware terrains. This will clarify the incremental benefit of the predictive selection framework.

Revision: yes
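
The three selection arms of the proposed ablation can be sketched as follows. This assumes each candidate checkpoint carries a single aggregate RoboGauge score and a mean simulation reward; the `Candidate` fields, checkpoint names, and values are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Candidate:
    name: str
    robogauge_score: float  # hypothetical aggregate of the sim-to-sim metrics
    sim_reward: float       # hypothetical mean simulation reward


def build_ablation_arms(candidates: List[Candidate], seed: int = 0) -> Dict[str, Candidate]:
    """Pick one checkpoint per selection rule so the arms can be compared on hardware."""
    rng = random.Random(seed)  # seeded so the random arm is reproducible
    return {
        "robogauge_selected": max(candidates, key=lambda c: c.robogauge_score),
        "reward_selected": max(candidates, key=lambda c: c.sim_reward),
        "random_selected": rng.choice(candidates),
    }


# Made-up checkpoints for illustration only.
candidates = [
    Candidate("ckpt_1", robogauge_score=0.72, sim_reward=18.4),
    Candidate("ckpt_2", robogauge_score=0.88, sim_reward=17.1),
    Candidate("ckpt_3", robogauge_score=0.65, sim_reward=19.0),
]
arms = build_ablation_arms(candidates, seed=0)
```

The ablation is only informative when the rules disagree, as they do in this toy data, where the highest-reward checkpoint is not the one with the highest RoboGauge score.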

Circularity Check

0 steps flagged

No significant circularity; claims rest on independent sim-to-sim metrics and physical experiments

The paper trains an MoE policy and defines RoboGauge metrics from separate sim-to-sim tests across terrains, difficulties, and domain randomizations. Policy selection uses these metrics, but real-world results on the Unitree Go2 (4 m/s, snow/sand/stairs/obstacles) are reported as direct empirical outcomes rather than derived from the metrics by construction. No equation reduces a prediction to a fitted parameter, no uniqueness theorem is imported from self-citation, and no ansatz is smuggled in. The framework is self-contained against external physical benchmarks.
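
As a rough illustration of what a sim-to-sim test grid over terrains, difficulty levels, and domain-randomization draws could produce, the sketch below reduces per-trial scores to a per-terrain mean and worst case. The `Trial` fields, the score scale, and the choice of mean and minimum as summaries are assumptions; the paper's actual metric definitions and aggregation procedure are not given in the material reviewed here.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, NamedTuple


class Trial(NamedTuple):
    terrain: str
    level: int     # terrain difficulty level
    seed: int      # index of the domain-randomization draw
    score: float   # assumed per-trial metric in [0, 1]


def summarize(trials: Iterable[Trial]) -> Dict[str, Dict[str, float]]:
    """Group trial scores by terrain and report the mean and worst case."""
    by_terrain = defaultdict(list)
    for trial in trials:
        by_terrain[trial.terrain].append(trial.score)
    return {
        terrain: {"mean": mean(scores), "worst": min(scores)}
        for terrain, scores in by_terrain.items()
    }


# Made-up trial results for illustration only.
trials = [
    Trial("stairs", level=1, seed=0, score=0.9),
    Trial("stairs", level=2, seed=0, score=0.7),
    Trial("stairs", level=2, seed=1, score=0.5),
    Trial("slope", level=1, seed=0, score=1.0),
    Trial("slope", level=2, seed=1, score=0.8),
]
summary = summarize(trials)
```

Keeping the worst case next to the mean is one way to avoid selecting a policy that does well on average but collapses under a particular randomization draw.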

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no explicit free parameters, axioms, or invented entities are detailed. RoboGauge metrics and MoE gating likely involve RL-fitted parameters whose values are not reported.

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages · 2 internal anchors

[1]

Real-time obstacle avoidance for ma- nipulators and mobile robots.The international journal of robotics research, 5(1):90–98, 1986

Oussama Khatib. Real-time obstacle avoidance for ma- nipulators and mobile robots.The international journal of robotics research, 5(1):90–98, 1986

work page 1986
[2]

Sampling-based al- gorithms for optimal motion planning.The international journal of robotics research, 30(7):846–894, 2011

Sertac Karaman and Emilio Frazzoli. Sampling-based al- gorithms for optimal motion planning.The international journal of robotics research, 30(7):846–894, 2011

work page 2011
[3]

Learning agile and dynamic motor skills for legged robots.Science Robotics, 4(26):eaau5872, 2019

Jemin Hwangbo, Joonho Lee, Alexey Dosovitskiy, Dario Bellicoso, Vassilios Tsounis, Vladlen Koltun, and Marco Hutter. Learning agile and dynamic motor skills for legged robots.Science Robotics, 4(26):eaau5872, 2019

work page 2019
[4]

Perceptive whole-body planning for multilegged robots in confined spaces.Journal of Field Robotics, 38 (1):68–84, 2021

Russell Buchanan, Lorenz Wellhausen, Marko Bjelonic, Tirthankar Bandyopadhyay, Navinda Kottege, and Marco Hutter. Perceptive whole-body planning for multilegged robots in confined spaces.Journal of Field Robotics, 38 (1):68–84, 2021

work page 2021
[5]

Learning to walk in the real world with minimal human effort

Sehoon Ha, Peng Xu, Zhenyu Tan, Sergey Levine, and Jie Tan. Learning to walk in the real world with minimal human effort. InConference on Robot Learning, pages 1110–1120. PMLR, 2021

work page 2021
[6]

Legged robots that keep on learning: Fine-tuning locomotion policies in the real world

Laura Smith, J Chase Kew, Xue Bin Peng, Sehoon Ha, Jie Tan, and Sergey Levine. Legged robots that keep on learning: Fine-tuning locomotion policies in the real world. In2022 international conference on robotics and automation (ICRA), pages 1593–1599. IEEE, 2022

work page 2022
[7]

Robust autonomous navigation of a small-scale quadruped robot in real-world environments

Thomas Dudzik, Matthew Chignoli, Gerardo Bledt, Bryan Lim, Adam Miller, Donghyun Kim, and Sangbae Kim. Robust autonomous navigation of a small-scale quadruped robot in real-world environments. In2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3664–3671. IEEE, 2020

work page 2020
[8]

Collision-free mpc for legged robots in static and dynamic scenes

Magnus Gaertner, Marko Bjelonic, Farbod Farshidian, and Marco Hutter. Collision-free mpc for legged robots in static and dynamic scenes. In2021 IEEE International Conference on Robotics and Automation (ICRA), pages 8266–8272. IEEE, 2021

work page 2021
[9]

A collision-free mpc for whole-body dynamic locomotion and manipulation

Jia-Ruei Chiu, Jean-Pierre Sleiman, Mayank Mittal, Far- bod Farshidian, and Marco Hutter. A collision-free mpc for whole-body dynamic locomotion and manipulation. In2022 international conference on robotics and au- tomation (ICRA), pages 4686–4693. IEEE, 2022

work page 2022
[10]

Learning a state representation and navigation in cluttered and dynamic environments.IEEE Robotics and Automation Letters, 6(3):5081–5088, 2021

David Hoeller, Lorenz Wellhausen, Farbod Farshidian, and Marco Hutter. Learning a state representation and navigation in cluttered and dynamic environments.IEEE Robotics and Automation Letters, 6(3):5081–5088, 2021

work page 2021
[11]

Vision aided dynamic exploration of unstructured terrain with a small-scale quadruped robot

Donghyun Kim, Daniel Carballo, Jared Di Carlo, Ben- jamin Katz, Gerardo Bledt, Bryan Lim, and Sangbae Kim. Vision aided dynamic exploration of unstructured terrain with a small-scale quadruped robot. In2020 IEEE International Conference on Robotics and Automation (ICRA), pages 2464–2470. IEEE, 2020

work page 2020
[12]

Walking in narrow spaces: Safety-critical locomotion control for quadrupedal robots with duality-based optimization

Qiayuan Liao, Zhongyu Li, Akshay Thirugnanam, Jun Zeng, and Koushil Sreenath. Walking in narrow spaces: Safety-critical locomotion control for quadrupedal robots with duality-based optimization. In2023 IEEE/RSJ In- ternational Conference on Intelligent Robots and Systems (IROS), pages 2723–2730. IEEE, 2023

work page 2023
[13]

An efficient locally reactive controller for safe navigation in visual teach and repeat missions.IEEE Robotics and Automation Letters, 7(2):2353–2360, 2022

Matias Mattamala, Nived Chebrolu, and Maurice Fallon. An efficient locally reactive controller for safe navigation in visual teach and repeat missions.IEEE Robotics and Automation Letters, 7(2):2353–2360, 2022

work page 2022
[14]

Learning vision-guided quadrupedal locomotion end-to-end with cross-modal transformers

Ruihan Yang, Minghao Zhang, Nicklas Hansen, Huazhe Xu, and Xiaolong Wang. Learning vision-guided quadrupedal locomotion end-to-end with cross-modal transformers. InDeep RL Workshop NeurIPS 2021

work page 2021
[15]

Resilient legged local navigation: Learning to traverse with com- promised perception end-to-end

Chong Zhang, Jin Jin, Jonas Frey, Nikita Rudin, Mat ´ıas Mattamala, Cesar Cadena, and Marco Hutter. Resilient legged local navigation: Learning to traverse with com- promised perception end-to-end. In2024 IEEE Interna- tional Conference on Robotics and Automation (ICRA), pages 34–41. IEEE, 2024

work page 2024
[16]

Learning quadrupedal locomotion over challenging terrain.Science robotics, 5 (47):eabc5986, 2020

Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, and Marco Hutter. Learning quadrupedal locomotion over challenging terrain.Science robotics, 5 (47):eabc5986, 2020

work page 2020
[17]

Rma: Rapid motor adaptation for legged robots

Ashish Kumar, Zipeng Fu, Deepak Pathak, and Jitendra Malik. Rma: Rapid motor adaptation for legged robots. Robotics: Science and Systems XVII, 2021

work page 2021
[18]

Leveraging symmetry in rl-based legged locomotion control

Zhi Su, Xiaoyu Huang, Daniel Ordo ˜nez-Apraez, Yunfei Li, Zhongyu Li, Qiayuan Liao, Giulio Turrisi, Massim- iliano Pontil, Claudio Semini, Yi Wu, et al. Leveraging symmetry in rl-based legged locomotion control. In2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 6899–6906. IEEE, 2024

work page 2024
[19]

Concurrent training of a control policy and a state estimator for dynamic and robust legged locomotion.IEEE Robotics and Automation Letters, 7 (2):4630–4637, 2022

Gwanghyeon Ji, Juhyeok Mun, Hyeongjun Kim, and Jemin Hwangbo. Concurrent training of a control policy and a state estimator for dynamic and robust legged locomotion.IEEE Robotics and Automation Letters, 7 (2):4630–4637, 2022

work page 2022
[20]

Learning robust and agile legged locomotion using ad- versarial motion priors.IEEE Robotics and Automation Letters, 8(8):4975–4982, 2023

Jinze Wu, Guiyang Xin, Chenkun Qi, and Yufei Xue. Learning robust and agile legged locomotion using ad- versarial motion priors.IEEE Robotics and Automation Letters, 8(8):4975–4982, 2023

work page 2023
[21]

Daydreamer: World models for physical robot learning

Philipp Wu, Alejandro Escontrela, Danijar Hafner, Pieter Abbeel, and Ken Goldberg. Daydreamer: World models for physical robot learning. InConference on robot learning, pages 2226–2240. PMLR, 2023

work page 2023
[22]

Learning to walk in minutes using massively parallel deep reinforcement learning

Nikita Rudin, David Hoeller, Philipp Reist, and Marco Hutter. Learning to walk in minutes using massively parallel deep reinforcement learning. InConference on robot learning, pages 91–100. PMLR, 2022

work page 2022
[23]

Learning robust perceptive locomotion for quadrupedal robots in the wild.Science robotics, 7(62):eabk2822, 2022

Takahiro Miki, Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, and Marco Hutter. Learning robust perceptive locomotion for quadrupedal robots in the wild.Science robotics, 7(62):eabk2822, 2022

work page 2022
[24]

Dreamwaq: Learning robust quadrupedal lo- comotion with implicit terrain imagination via deep reinforcement learning

I Made Aswin Nahrendra, Byeongho Yu, and Hyun Myung. Dreamwaq: Learning robust quadrupedal lo- comotion with implicit terrain imagination via deep reinforcement learning. In2023 IEEE International Conference on Robotics and Automation (ICRA), pages 5078–5084. IEEE, 2023

work page 2023
[25]

Hybrid internal model: Learning agile legged locomotion with simulated robot response

Junfeng Long, Zirui Wang, Quanyi Li, Liu Cao, Jiawei Gao, and Jiangmiao Pang. Hybrid internal model: Learning agile legged locomotion with simulated robot response. InICLR, 2024

work page 2024
[26]

Rapid locomotion via reinforcement learning

Gabriel B Margolis, Ge Yang, Kartik Paigwar, Tao Chen, and Pulkit Agrawal. Rapid locomotion via reinforcement learning. InRobotics: Science and Systems, 2022

work page 2022
[27]

Minimizing energy consumption leads to the emergence of gaits in legged robots

Zipeng Fu, Ashish Kumar, Jitendra Malik, and Deepak Pathak. Minimizing energy consumption leads to the emergence of gaits in legged robots. InConference on Robot Learning, pages 928–937. PMLR, 2022

work page 2022
[28]

Walk these ways: Tuning robot control for generalization with mul- tiplicity of behavior

Gabriel B Margolis and Pulkit Agrawal. Walk these ways: Tuning robot control for generalization with mul- tiplicity of behavior. InConference on Robot Learning, pages 22–31. PMLR, 2023

work page 2023
[29]

Multi-expert learning of adaptive legged locomotion.Science Robotics, 5(49):eabb2174, 2020

Chuanyu Yang, Kai Yuan, Qiuguo Zhu, Wanming Yu, and Zhibin Li. Multi-expert learning of adaptive legged locomotion.Science Robotics, 5(49):eabb2174, 2020

work page 2020
[30]

The transferability approach: Crossing the reality gap in evolutionary robotics.IEEE Transactions on Evolutionary Computation, 17(1):122–145, 2012

Sylvain Koos, Jean-Baptiste Mouret, and St ´ephane Don- cieux. The transferability approach: Crossing the reality gap in evolutionary robotics.IEEE Transactions on Evolutionary Computation, 17(1):122–145, 2012

work page 2012
[31]

Domain ran- domization for transferring deep neural networks from simulation to the real world

Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. Domain ran- domization for transferring deep neural networks from simulation to the real world. In2017 IEEE/RSJ in- ternational conference on intelligent robots and systems (IROS), pages 23–30. IEEE, 2017

work page 2017
[32]

Sim-to-real transfer of robotic control with dynamics randomization

Xue Bin Peng, Marcin Andrychowicz, Wojciech Zaremba, and Pieter Abbeel. Sim-to-real transfer of robotic control with dynamics randomization. In2018 IEEE international conference on robotics and automa- tion (ICRA), pages 3803–3810. IEEE, 2018

work page 2018
[33]

Learning dexterous in-hand manipula- tion.The International Journal of Robotics Research, 39 (1):3–20, 2020

OpenAI: Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pa- chocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, et al. Learning dexterous in-hand manipula- tion.The International Journal of Robotics Research, 39 (1):3–20, 2020

work page 2020
[34]

Closing the sim-to-real loop: Adapting simulation randomization with real world experience

Yevgen Chebotar, Ankur Handa, Viktor Makoviychuk, Miles Macklin, Jan Issac, Nathan Ratliff, and Dieter Fox. Closing the sim-to-real loop: Adapting simulation randomization with real world experience. In2019 International Conference on Robotics and Automation (ICRA), pages 8973–8979. IEEE, 2019

work page 2019
[35]

Not only rewards but also constraints: Applications on legged robot locomotion.IEEE Trans- actions on Robotics, 40:2984–3003, 2024

Yunho Kim, Hyunsik Oh, Jeonghyun Lee, Jinhyeok Choi, Gwanghyeon Ji, Moonkyu Jung, Donghoon Youm, and Jemin Hwangbo. Not only rewards but also constraints: Applications on legged robot locomotion.IEEE Trans- actions on Robotics, 40:2984–3003, 2024

work page 2024
[36]

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[37]

Learning agile locomotion on risky terrains

Chong Zhang, Nikita Rudin, David Hoeller, and Marco Hutter. Learning agile locomotion on risky terrains. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 11864–11871. IEEE, 2024

work page 2024
[38]

Long short- term memory.Neural Computation, 9(8):1735–1780,

Sepp Hochreiter and J ¨urgen Schmidhuber. Long short- term memory.Neural Computation, 9(8):1735–1780,

work page
[39]

doi: 10.1162/neco.1997.9.8.1735

work page doi:10.1162/neco.1997.9.8.1735 1997
[40]

Hacl: History-aware curriculum learning for fast locomotion.arXiv preprint arXiv:2505.18429, 2025

Prakhar Mishra, Amir Hossain Raj, Xuesu Xiao, and Di- nesh Manocha. Hacl: History-aware curriculum learning for fast locomotion.arXiv preprint arXiv:2505.18429, 2025

work page arXiv 2025
[41]

Gaitor: Learning a unified representation across gaits for real-world quadruped locomotion

Alexander Luis Mitchell, Wolfgang Merkt, Aristotelis Papatheodorou, Ioannis Havoutis, and Ingmar Posner. Gaitor: Learning a unified representation across gaits for real-world quadruped locomotion. In8th Annual Conference on Robot Learning, 2024

work page 2024
[42]

Allgaits: Learning all quadruped gaits and transitions

Guillaume Bellegarda, Milad Shafiee, and Auke Ijspeert. Allgaits: Learning all quadruped gaits and transitions. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 15929–15935. IEEE, 2025

work page 2025
[43]

Viability leads to the emergence of gait transitions in learning agile quadrupedal locomotion on challenging terrains.Nature Communications, 15(1):3073, 2024

Milad Shafiee, Guillaume Bellegarda, and Auke Ijspeert. Viability leads to the emergence of gait transitions in learning agile quadrupedal locomotion on challenging terrains.Nature Communications, 15(1):3073, 2024

work page 2024
[44]

Non-conflicting energy minimization in rein- forcement learning based robot control

Skand Peri, Akhil Perincherry, Bikram Pandit, and Ste- fan Lee. Non-conflicting energy minimization in rein- forcement learning based robot control. In9th Annual Conference on Robot Learning, 2025

work page 2025
[45]

Moe-loco: Mixture of experts for multitask locomotion

Runhan Huang, Shaoting Zhu, and Yilun Du. Moe-loco: Mixture of experts for multitask locomotion. In2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 14218–14225, 10 2025. doi: 10.1109/IROS60139.2025.11246585

work page doi:10.1109/iros60139.2025.11246585 2025
[46]

Evaluating real-world robot manipulation policies in simulation

Xuanlin Li, Kyle Hsu, Jiayuan Gu, Karl Pertsch, Oier Mees, Homer Rich Walke, Chuyuan Fu, Ishikaa Lunawat, Isabel Sieh, Sean Kirmani, et al. Evaluating real-world robot manipulation policies in simulation. InRSS 2024 Workshop: Data Generation for Robotics

work page 2024
[47]

Scalable policy evaluation with video world models

Wei-Cheng Tseng, Jinwei Gu, Qinsheng Zhang, Hanzi Mao, Ming-Yu Liu, Florian Shkurti, and Lin Yen-Chen. Scalable policy evaluation with video world models. arXiv preprint arXiv:2511.11520, 2025

work page arXiv 2025
[48]

Robogsim: A real2sim2real robotic gaussian splatting simulator.arXiv preprint arXiv:2411.11839, 2024

Xinhai Li, Jialin Li, Ziheng Zhang, Rui Zhang, Fan Jia, Tiancai Wang, Haoqiang Fan, Kuo-Kun Tseng, and Ruip- ing Wang. Robogsim: A real2sim2real robotic gaussian splatting simulator.arXiv preprint arXiv:2411.11839, 2024

work page arXiv 2024
[49]

Vr-robo: A real-to- sim-to-real framework for visual robot navigation and locomotion.IEEE Robotics and Automation Letters, 2025

Shaoting Zhu, Linzhan Mou, Derun Li, Baijun Ye, Runhan Huang, and Hang Zhao. Vr-robo: A real-to- sim-to-real framework for visual robot navigation and locomotion.IEEE Robotics and Automation Letters, 2025

work page 2025
[50]

Cts: Concurrent teacher-student reinforcement learning for legged locomotion.IEEE Robotics and Automation Letters, 2024

Hongxi Wang, Haoxiang Luo, Wei Zhang, and Hua Chen. Cts: Concurrent teacher-student reinforcement learning for legged locomotion.IEEE Robotics and Automation Letters, 2024

work page 2024
[51]

Deep mutual learning

Ying Zhang, Tao Xiang, Timothy M Hospedales, and Huchuan Lu. Deep mutual learning. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 4320–4328, 2018

work page 2018
[52]

Robert A Jacobs, Michael I Jordan, Steven J Nowlan, and Geoffrey E Hinton. Adaptive mixtures of local experts. Neural Computation, 3(1):79–87, 1991
[53]

Michael I Jordan and Robert A Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6(2):181–214, 1994
[54]

William Fedus, Barret Zoph, and Noam Shazeer. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research, 23(120):1–39, 2022
[55]

Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538, 2017
[56]

Viktor Makoviychuk, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey, Miles Macklin, David Hoeller, Nikita Rudin, Arthur Allshire, Ankur Handa, et al. Isaac Gym: High performance GPU based physics simulation for robot learning. In NeurIPS Datasets and Benchmarks, 2021
[57]

Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026–5033, 2012. doi: 10.1109/IROS.2012.6386109
[58]

Miomir Vukobratović and Branislav Borovac. Zero-moment point—thirty five years of its life. International Journal of Humanoid Robotics, 1(01):157–173, 2004
[59]

Stéphane Caron, Quang-Cuong Pham, and Yoshihiko Nakamura. Stability of surface contacts for humanoid robots: Closed-form formulae of the contact wrench cone for rectangular support areas. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 5107–5112. IEEE, 2015
[60]

Xue Bin Peng, Michael Chang, Grace Zhang, Pieter Abbeel, and Sergey Levine. MCP: Learning composable hierarchical control with multiplicative compositional policies. Advances in Neural Information Processing Systems, 32, 2019
[61]

Karl Pearson. LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559–572, 1901

APPENDIX A
ROBOGAUGE SUPPLEMENTARY MATERIAL

A. Stability Metric

To provide a more comprehensive evaluation of locomotion stability, we introduce two formal physical criteri...


Zero Moment Point (ZMP) Margin: The Zero Moment Point (ZMP) is a fundamental concept in legged locomotion, defined as the point on the ground where the net moment of inertial and gravitational forces has no horizontal components. To formalize this metric within our framework, we establish the following definitions:

• Support Polygon: The convex hull formed b...
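The definitions above are cut off in this copy, so the following is only a minimal sketch of one common way to compute a ZMP margin, not the paper's exact formula. It assumes the support polygon is the convex hull of the foot contact points projected onto the ground plane, and that the margin is the smallest signed distance from the ZMP to the polygon's edge lines (positive inside, negative outside). The function names `convex_hull` and `zmp_margin` are hypothetical.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def zmp_margin(zmp, contacts):
    """Smallest signed distance from the ZMP to the support polygon's edge lines.

    Assumption (not from the source): positive means the ZMP lies inside the
    polygon, negative means outside. Returns None for a degenerate polygon
    (fewer than three non-collinear contacts).
    """
    hull = convex_hull(contacts)
    if len(hull) < 3:
        return None
    margin = math.inf
    for i in range(len(hull)):
        ax, ay = hull[i]
        bx, by = hull[(i + 1) % len(hull)]
        ex, ey = bx - ax, by - ay
        # For a counter-clockwise polygon, points on the left of an edge are inside.
        signed = (ex * (zmp[1] - ay) - ey * (zmp[0] - ax)) / math.hypot(ex, ey)
        margin = min(margin, signed)
    return margin


# Hypothetical footprint: four feet on the corners of a unit square.
feet = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
center_margin = zmp_margin((0.5, 0.5), feet)
outside_margin = zmp_margin((1.2, 0.5), feet)
```

For a ZMP inside the polygon this value equals the true distance to the boundary; outside the polygon it only signals a violation and can underestimate the distance near corners.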


Coulomb Friction Margin: To account for potential slippage and Contact Wrench Cone (CWC) constraints, we introduce a translational friction margin. Let $N_c$ be the number of active foot contacts with the ground. For each contact $i$, $f_i^{\text{tangent}}$ represents the tangential force, $f_i^{\text{normal}}$ represents the normal force, and $\mu$ is the surface friction coefficient. The ...
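The margin formula itself is truncated in this copy. As a hedged illustration of the quantities just defined, the sketch below uses one plausible normalized form, $1 - \lVert f_i^{\text{tangent}} \rVert / (\mu f_i^{\text{normal}})$, and takes the worst case over active contacts. Both the normalization and the use of a minimum are assumptions, and `friction_margin` is a hypothetical name.

```python
import math


def friction_margin(tangential_forces, normal_forces, mu):
    """Worst-case normalized Coulomb friction margin over active contacts.

    Assumed form (not confirmed by the source): 1 - |f_tangent| / (mu * f_normal).
    A value of 1 means no tangential load, 0 means the contact sits on the
    friction cone boundary, and negative values indicate slipping.
    Contacts with non-positive normal force are treated as inactive.
    """
    margins = []
    for (fx, fy), fn in zip(tangential_forces, normal_forces):
        if fn <= 0.0:
            continue  # foot not pressing on the ground
        margins.append(1.0 - math.hypot(fx, fy) / (mu * fn))
    return min(margins) if margins else None


# Hypothetical forces in newtons for two stance feet.
single = friction_margin([(3.0, 4.0)], [10.0], mu=1.0)
worst = friction_margin([(0.0, 0.0), (6.0, 8.0)], [10.0, 10.0], mu=1.0)
```

Taking the minimum makes the metric conservative: a single foot near the cone boundary dominates the score even if the other feet are well inside it.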


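| Reward term | Expression | Weight |
| --- | --- | --- |
| ... | ... | 1.0/2.0 |
| Ang. velocity tracking | $\exp(-\sigma \lvert \omega_z^{\text{cmd}} - \omega_z \rvert^2)$ | 0.5 |
| Lin. velocity (z) | $v_z^2$ | −2.0 |
| Ang. velocity (xy) | $\lVert \omega_{xy} \rVert_2^2$ | −0.05 |
| Joint acceleration | $\ddot{q}^2$ | $-2.5 \times 10^{-7}$ |
| Joint power | $\lvert \tau \rvert \lvert \dot{q} \rvert^{T}$ | $-2 \times 10^{-5}$ |
| Joint torque | $\lVert \tau \rVert_2^2$ | $-1 \times 10^{-4}$ |
| Base height | $(h^{\text{des}} - h)^2$ | −1.0 |
| Action rate | $\lVert a_t - a_{t-1} \rVert_2^2$ | −0.01 |
| Action smoothness | $\lVert a_t - 2a_{t-1} + a_{t-2} \rVert_2^2$ | −0.01 |
| Collision | $n_{\text{collision}}$ | −1.0 |
| Jo... | | |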
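A few of the reward terms listed above can be sketched as plain functions to show how the expressions and weights combine. This is an illustrative subset only: the value of $\sigma$ is not given in this excerpt and is a placeholder here, the function names are hypothetical, and any per-step scaling (for example by the control time step) is omitted.

```python
import math

# Weights copied from the table; only four of the listed terms are covered.
WEIGHTS = {
    "ang_vel_tracking": 0.5,
    "lin_vel_z": -2.0,
    "action_rate": -0.01,
    "action_smoothness": -0.01,
}


def ang_vel_tracking(omega_cmd_z, omega_z, sigma):
    """exp(-sigma * |omega_cmd_z - omega_z|^2); sigma is an assumed parameter."""
    return math.exp(-sigma * (omega_cmd_z - omega_z) ** 2)


def lin_vel_z(v_z):
    """Squared vertical base velocity."""
    return v_z ** 2


def action_rate(a_t, a_prev):
    """Squared L2 norm of the first difference of actions."""
    return sum((x - y) ** 2 for x, y in zip(a_t, a_prev))


def action_smoothness(a_t, a_prev, a_prev2):
    """Squared L2 norm of the second difference of actions."""
    return sum((x - 2 * y + z) ** 2 for x, y, z in zip(a_t, a_prev, a_prev2))


def partial_reward(omega_cmd_z, omega_z, v_z, a_t, a_prev, a_prev2, sigma=4.0):
    """Weighted sum of the four terms above (sigma=4.0 is a placeholder)."""
    terms = {
        "ang_vel_tracking": ang_vel_tracking(omega_cmd_z, omega_z, sigma),
        "lin_vel_z": lin_vel_z(v_z),
        "action_rate": action_rate(a_t, a_prev),
        "action_smoothness": action_smoothness(a_t, a_prev, a_prev2),
    }
    return sum(WEIGHTS[name] * value for name, value in terms.items())


# Hypothetical step: perfect yaw tracking, no vertical motion, a two-joint action.
example = partial_reward(1.0, 1.0, 0.0, [1.0, 2.0], [0.0, 0.0], [0.0, 0.0])
```

The positive tracking term rewards following the command, while the negative weights penalize vertical bouncing and abrupt action changes.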
