Unified Walking, Running, and Recovery for Humanoids via State-Dependent Adversarial Motion Priors
Pith reviewed 2026-05-20 09:00 UTC · model grok-4.3
The pith
A single policy enables a humanoid robot to walk, run, and recover from falls without any mode switching.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The framework extends Adversarial Motion Priors by replacing the conventional global reference distribution with a state-dependent gate that routes each training transition to one of two discriminators: a dedicated recovery discriminator and a velocity-conditioned locomotion discriminator that jointly covers walking and running. The gate is defined by a single fixed threshold on projected gravity: the recovery discriminator is activated when body tilt exceeds approximately 37 degrees from vertical; otherwise the locomotion discriminator is used, with the normalized commanded velocity serving as a condition that selects the appropriate reference trajectory between walk and run clips.
What carries the argument
A state-dependent gate on projected gravity that routes training transitions to either a recovery discriminator or a velocity-conditioned locomotion discriminator.
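The routing rule can be sketched in a few lines. This is a minimal illustration, not the authors' code: the names `route_transition` and `normalized_speed`, the `max_speed` value, and the reading of `g_z` as the z-component of a unit-norm gravity vector in the body frame (so an upright robot has `g_z = -1`) are all assumptions made for the example. Only the threshold 0.6 comes from the abstract.

```python
TILT_THRESHOLD = 0.6  # from the abstract: recovery when |g_z + 1| > 0.6


def route_transition(g_z: float, threshold: float = TILT_THRESHOLD) -> str:
    """Pick the discriminator that scores one training transition."""
    # Hypothetical convention: g_z = -1 when the robot is upright.
    return "recovery" if abs(g_z + 1.0) > threshold else "locomotion"


def normalized_speed(cmd_speed: float, max_speed: float = 3.0) -> float:
    """Scale a commanded speed to [0, 1]; max_speed is a placeholder value."""
    return min(max(cmd_speed / max_speed, 0.0), 1.0)


# Upright, slightly tilted, heavily tilted, and inverted samples.
routes = [route_transition(g) for g in (-1.0, -0.9, -0.3, 0.8)]
conditions = [normalized_speed(v) for v in (0.0, 1.5, 6.0)]
```

In this sketch the condition value would be passed only to the locomotion discriminator; how the paper encodes the condition is not stated in the text above.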
If this is right
- A single frozen policy executes all three behaviors at a fixed rate with no runtime mode logic.
- Smooth transitions between walking and running occur under the same controller.
- Recovery works from both prone and supine starting positions on hardware.
- The full behavior set is regularized using only three reference motion clips.
Where Pith is reading between the lines
- The gating idea could be extended to additional state signals to unify still more behaviors in one policy.
- Similar orientation-based routing might transfer to other robot types that change modes according to body pose.
- Varying the exact threshold value during training could be tested to measure its effect on transition smoothness.
Load-bearing premise
A fixed gravity threshold can reliably separate recovery states from locomotion states in both training and deployment without causing instability or needing extra boundary logic.
What would settle it
Run the deployed policy through motions that keep the projected gravity vector near the threshold value and check whether behavior switches occur correctly without instability or incorrect activation.
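One way to make this check concrete is to log g_z during such motions and count how often the threshold is crossed. The sketch below does that on a synthetic trace. The helper name `count_gate_switches` and the sample values are hypothetical, and because the gate acts on training transitions, this is a diagnostic on logged data, not part of the deployed controller.

```python
def count_gate_switches(g_z_trace, threshold=0.6):
    """Count how many times consecutive samples land on opposite sides of the gate."""
    labels = [abs(g + 1.0) > threshold for g in g_z_trace]
    return sum(a != b for a, b in zip(labels, labels[1:]))


# Synthetic trace hovering around the boundary (|g_z + 1| near 0.6), then tipping over.
trace = [-0.45, -0.38, -0.42, -0.39, -0.41, -0.10]
switches = count_gate_switches(trace)
```

A high switch count over a short window would indicate that the state sits in the region where the two discriminators give the policy competing signals.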
Original abstract
We propose a unified reinforcement learning framework that enables a single policy to perform walking, running, and fall recovery on the Unitree G1 humanoid robot, validated on physical hardware without any explicit mode-switching command at deployment. The framework extends Adversarial Motion Priors (AMP) by replacing the conventional global reference distribution with a state-dependent gate that routes each training transition to one of two discriminators: a dedicated recovery discriminator and a velocity-conditioned locomotion discriminator that jointly covers walking and running. The gate is defined by a single fixed threshold on projected gravity: the recovery discriminator is activated when body tilt exceeds approximately 37° from vertical (|g_z + 1| > 0.6); otherwise the locomotion discriminator is used, with the normalized commanded velocity serving as a condition that selects the appropriate reference trajectory between walk and run clips. Only three LAFAN1 reference clips are required to regularize the complete behavior set. At deployment, a single frozen ONNX policy executes at 50 Hz with no runtime mode logic; hardware experiments demonstrate successful recovery from both prone and supine falls and smooth walk-to-run transitions under the same controller.
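As a quick arithmetic check on the quoted threshold, the sketch below converts |g_z + 1| > 0.6 into a tilt angle under one common convention. That convention is an assumption here and is not stated in the abstract: projected gravity is a unit vector in the body frame with g_z = -cos θ, so an upright robot has g_z = -1. Under that assumption the boundary sits at cos θ = 0.4, roughly 66°, whereas 37° is the angle whose sine is 0.6. The paper may define the quantity differently, so this is a point to verify against the full text, not a correction.

```python
import math

THRESHOLD = 0.6  # from the abstract: |g_z + 1| > 0.6

# Assumed convention: g_z = -cos(theta), hence |g_z + 1| = 1 - cos(theta).
tilt_from_gz_deg = math.degrees(math.acos(1.0 - THRESHOLD))

# Alternative reading: 0.6 is the sine of the tilt angle.
tilt_from_sine_deg = math.degrees(math.asin(THRESHOLD))

print(round(tilt_from_gz_deg, 1))    # 66.4
print(round(tilt_from_sine_deg, 1))  # 36.9
```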
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a unified RL framework extending Adversarial Motion Priors (AMP) with a state-dependent gate that routes transitions to either a recovery discriminator or a velocity-conditioned locomotion discriminator based on a fixed projected-gravity threshold (|g_z + 1| > 0.6, corresponding to ~37° tilt). This enables a single policy to produce walking, running, and fall recovery behaviors using only three LAFAN1 reference clips. At deployment, the policy is exported as a frozen ONNX model running at 50 Hz on the Unitree G1 humanoid with no explicit mode-switching logic; the authors report successful recovery from prone and supine falls and smooth walk-to-run transitions in hardware experiments.
Significance. If the central claims hold, the work demonstrates a practical route to multi-behavior unification in humanoid locomotion without runtime mode logic or multiple policies, which would reduce deployment complexity. The use of a minimal set of reference clips, together with validation on a physical platform that includes recovery from both fall orientations, constitutes a concrete empirical contribution to the field.
major comments (1)
- [framework section (state-dependent gate)] The unified single-policy claim at deployment depends on the fixed threshold |g_z + 1| > 0.6 producing clean separation between recovery and locomotion states in both training rollouts and real hardware dynamics. No sensitivity analysis, boundary smoothing, or ablation on the threshold value is reported, despite the abstract noting that the threshold is approximate. Projected gravity can fluctuate near the boundary during partial recoveries, under sensor noise, or during high-speed locomotion, risking inconsistent discriminator signals and potential chattering or failed recoveries in the frozen ONNX policy.
minor comments (1)
- [abstract] The abstract would be strengthened by including at least one quantitative hardware metric (e.g., success rate, average recovery time, or number of trials) to support the claimed hardware validation.
Simulated Author's Rebuttal
We thank the referee for the constructive and positive review of our manuscript. We address the single major comment on the state-dependent gate below and outline the revisions we will make.
Point-by-point responses
- Referee: [framework section (state-dependent gate)] The unified single-policy claim at deployment depends on the fixed threshold |g_z + 1| > 0.6 producing clean separation between recovery and locomotion states in both training rollouts and real hardware dynamics. No sensitivity analysis, boundary smoothing, or ablation on the threshold value is reported, despite the abstract noting that the threshold is approximate. Projected gravity can fluctuate near the boundary during partial recoveries, under sensor noise, or during high-speed locomotion, risking inconsistent discriminator signals and potential chattering or failed recoveries in the frozen ONNX policy.
Authors: We agree that the lack of sensitivity analysis and boundary discussion is a valid concern that should be addressed to strengthen the robustness claims. The threshold |g_z + 1| > 0.6 was selected empirically to correspond to an approximate 37-degree tilt where recovery behaviors are required, providing reliable state separation in both our training rollouts and hardware tests on the Unitree G1. No chattering or inconsistent discriminator signals were observed in the reported experiments. To directly respond to the referee's point, we will add a new ablation subsection in the revised manuscript that varies the threshold over [0.4, 0.8], reports performance metrics, and analyzes boundary cases including simulated sensor noise and partial-recovery trajectories. We will also evaluate and report on a simple hysteresis band for smoothing if the results indicate improved stability. These additions will appear in the framework and experimental sections.
Revision: yes
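The hysteresis band mentioned in the rebuttal could look roughly like the sketch below. The class name `HysteresisGate` and the band edges 0.65 and 0.55 are illustrative assumptions, not values from the paper or the rebuttal.

```python
class HysteresisGate:
    """Two-level gate: enter recovery above one level, leave it below a lower one."""

    def __init__(self, enter_level: float = 0.65, exit_level: float = 0.55):
        if exit_level > enter_level:
            raise ValueError("exit_level must not exceed enter_level")
        self.enter_level = enter_level
        self.exit_level = exit_level
        self.in_recovery = False

    def update(self, g_z: float) -> bool:
        """Advance the gate by one sample and return True while in recovery."""
        tilt = abs(g_z + 1.0)
        if self.in_recovery:
            if tilt < self.exit_level:
                self.in_recovery = False
        elif tilt > self.enter_level:
            self.in_recovery = True
        return self.in_recovery


gate = HysteresisGate()
# Samples that hover near 0.6, tip over, partially recover, then return upright.
trace = [-0.45, -0.38, -0.42, -0.39, -0.41, -0.10, -0.42, -0.80]
states = [gate.update(g) for g in trace]
```

With a single threshold at 0.6 the first five samples would flip the label on every step; with the band, the label changes only once on the way into recovery and once on the way out.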
Circularity Check
No significant circularity; empirical design with external references and hardware validation
full rationale
The paper extends AMP with a state-dependent gate using a fixed, author-chosen threshold on projected gravity to route between recovery and locomotion discriminators. This is a methodological design choice, not a fitted parameter or self-referential equation that makes the unified policy output tautological by construction. Training relies on three external LAFAN1 clips and RL optimization, with final claims supported by physical hardware experiments on the Unitree G1 at 50 Hz. No self-citation chains, uniqueness theorems, or ansatzes imported from prior author work are load-bearing in the derivation. The result is self-contained against external benchmarks rather than reducing to its inputs.
Axiom & Free-Parameter Ledger
free parameters (1)
- gravity threshold = 0.6
axioms (1)
- domain assumption: Three LAFAN1 reference clips are representative enough to regularize walking, running, and recovery behaviors.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: The gate is defined by a single fixed threshold on projected gravity: the recovery discriminator is activated when body tilt exceeds approximately 37° from vertical (|g_z + 1| > 0.6); otherwise the locomotion discriminator is used
- IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: A single frozen ONNX policy executes walking, running, and fall recovery at 50 Hz with no runtime mode logic
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.