RecoverFormer: End-to-End Contact-Aware Recovery for Humanoid Robots
Pith reviewed 2026-05-08 11:18 UTC · model grok-4.3
The pith
A single end-to-end policy delivers multi-modal contact-aware recovery for humanoid robots and generalizes zero-shot across perturbation magnitudes, contact geometries, and dynamics shifts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RecoverFormer is a fully end-to-end humanoid recovery policy that learns when and how to switch among recovery behaviors, including compensatory stepping, hand-environment contact, and center-of-mass reshaping, while maintaining robust performance under model mismatch. The architecture combines a causal transformer over a 50-step observation history with a latent recovery mode that enables smooth transitions among distinct recovery strategies and a contact affordance head that predicts which environmental surfaces are beneficial for stabilization. Trained only on open floor, RecoverFormer transfers zero-shot to walled environments, achieving 100 percent recovery success across 100-300 N pushes and wall distances from 0.25 m to 1.4 m.
What carries the argument
A causal transformer over a 50-step observation history, with a latent recovery mode head for strategy transitions and a contact affordance head for predicting useful surfaces.
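The architecture sentence above can be made concrete with a sketch: a single-head causal self-attention layer over the 50-step history, feeding an action head, a latent-mode head, and an affordance head. Everything here except the 50-step horizon and the three-head layout is an illustrative assumption (layer count, dimensions, and initialization are not from the paper):

```python
import numpy as np

# Hypothetical sizes: only T=50 comes from the review; the rest is invented.
T, OBS_DIM, D_MODEL = 50, 48, 64
ACT_DIM, N_MODES, N_SURFACES = 23, 4, 6

rng = np.random.default_rng(0)

def linear(x, w, b):
    return x @ w + b

class CausalPolicySketch:
    """One-layer, one-head causal self-attention with three output heads —
    a minimal stand-in for the described RecoverFormer interface."""
    def __init__(self):
        g = lambda *s: rng.normal(0.0, 0.02, s)
        self.w_in = g(OBS_DIM, D_MODEL); self.b_in = np.zeros(D_MODEL)
        self.w_q, self.w_k, self.w_v = g(D_MODEL, D_MODEL), g(D_MODEL, D_MODEL), g(D_MODEL, D_MODEL)
        self.w_act = g(D_MODEL, ACT_DIM); self.b_act = np.zeros(ACT_DIM)
        self.w_mode = g(D_MODEL, N_MODES); self.b_mode = np.zeros(N_MODES)
        self.w_aff = g(D_MODEL, N_SURFACES); self.b_aff = np.zeros(N_SURFACES)

    def forward(self, history):
        # history: (T, OBS_DIM), most recent observation last
        x = linear(history, self.w_in, self.b_in)
        q, k, v = x @ self.w_q, x @ self.w_k, x @ self.w_v
        scores = q @ k.T / np.sqrt(D_MODEL)
        # Causal mask: step t may only attend to steps <= t.
        mask = np.triu(np.ones((T, T), dtype=bool), k=1)
        scores[mask] = -1e9
        attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)
        h = (attn @ v)[-1]  # features at the current timestep
        action = np.tanh(linear(h, self.w_act, self.b_act))            # joint targets
        mode_logits = linear(h, self.w_mode, self.b_mode)              # latent recovery mode
        affordance_logits = linear(h, self.w_aff, self.b_aff)          # per-surface usefulness
        return action, mode_logits, affordance_logits

policy = CausalPolicySketch()
action, mode_logits, aff_logits = policy.forward(rng.normal(size=(T, OBS_DIM)))
```

The sketch only fixes the interface (history in; action, mode, and affordance out); the paper's actual depth, width, and training objective are not specified in this review.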
If this is right
- The policy achieves 100 percent success on 100-300 N pushes when walls are present at distances from 0.25 m to 1.4 m.
- Under +25 percent mass it reaches 75.5 percent success, 89 percent under 30 ms added latency, 91.5 percent at low friction, and 99 percent under combined perturbations.
- Latent recovery modes emerge that specialize across force regimes without any mode-level supervision.
- The same policy maintains performance when contact geometry changes at test time.
- Recovery behaviors remain robust when both contact geometry and dynamics parameters shift simultaneously.
Where Pith is reading between the lines
- If the simulation matches real contact physics, the policy could be deployed directly onto physical humanoids to handle unexpected disturbances in homes or warehouses without additional tuning.
- The latent-mode and affordance design may generalize to other high-degree-of-freedom robots that must choose among locomotion, manipulation, and balance actions.
- Combining history-based transformers with affordance prediction could reduce reliance on separate perception pipelines for contact planning.
- The observed zero-shot transfer suggests testing whether similar architectures scale to continuous locomotion tasks that interleave recovery with navigation.
Load-bearing premise
The MuJoCo simulator accurately reproduces the contact forces, friction, and latency that govern real-robot behavior.
What would settle it
Running the trained policy on a physical Unitree G1 humanoid under matching push magnitudes, wall distances, and dynamics perturbations to measure actual recovery success.
read the original abstract
Humanoid robots operating in unstructured environments must recover from unexpected disturbances, a capability that remains challenging for end-to-end control policies. We present RECOVERFORMER, a fully end-to-end humanoid recovery policy that learns when and how to switch among recovery behaviors, including compensatory stepping, hand-environment contact, and center-of-mass reshaping, while maintaining robust performance under model mismatch. The architecture combines a causal transformer over a 50-step observation history with two novel heads: a latent recovery mode that enables smooth transitions among distinct recovery strategies, and a contact affordance head that predicts which environmental surfaces (walls, railings, table edges) are beneficial for stabilization. We evaluate RECOVERFORMER on the Unitree G1 humanoid in MuJoCo. Trained only on open floor, RECOVERFORMER transfers zero-shot to walled environments, achieving 100% recovery success across 100-300 N pushes and across wall distances from 0.25-1.4 m. Under zero-shot dynamics mismatch, RECOVERFORMER reaches 75.5% at +25% mass, 89% under 30 ms latency, 91.5% at low friction, and 99% under compound friction, latency, and mass perturbation. The learned latent modes specialize across force regimes without mode-level supervision, validated by t-SNE analysis of 300 episodes. Taken together, these results show that a single end-to-end policy can deliver multi-modal, contact-aware humanoid recovery that generalizes across perturbation magnitude, contact geometry, and dynamics shift.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. RecoverFormer is an end-to-end policy for humanoid recovery using a causal transformer with a latent recovery mode and contact affordance head. Trained on open-floor MuJoCo simulations of the Unitree G1, it claims zero-shot generalization to walled environments (100% success for 100-300 N pushes, 0.25-1.4 m walls) and dynamics mismatches (75.5% for +25% mass, 89% for 30 ms latency, etc.), with t-SNE showing mode specialization without supervision.
Significance. This approach could significantly advance end-to-end learning for robust humanoid control in unstructured environments by enabling multi-modal contact-aware recovery without hand-crafted behaviors. The zero-shot transfer and learned specialization are strengths if they hold beyond simulation. The work provides concrete quantitative results on generalization across perturbation types.
major comments (2)
- Abstract: The abstract reports specific success rates (e.g., 100% on walled environments, 75.5% under +25% mass) but does not mention baseline comparisons, ablation studies, training curves, or statistical details such as number of trials or variance, which are essential to evaluate whether the architecture drives the claimed generalization.
- Evaluation section: All quantitative results, including zero-shot transfer and dynamics robustness, are obtained exclusively in MuJoCo simulation. The central claim of applicability to humanoid robots in unstructured environments relies on the unverified assumption that MuJoCo accurately models contact forces, friction, and latency; no physical experiments on the Unitree G1 are reported to support sim-to-real transfer.
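The first major comment turns on statistical reporting: a success rate over N binary trials carries binomial uncertainty that a bare percentage hides. A hedged sketch of how the missing intervals could be computed (the 200-trial figure below is purely illustrative; the review does not confirm trial counts):

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """95% Wilson score confidence interval for a binomial success rate.
    Preferred over the normal approximation near 0% or 100% success."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return center - half, center + half

# Illustration only: if the reported 75.5% under +25% mass came from
# 151 successes in 200 trials (a hypothetical count), the interval is
# roughly 69% to 81% — wide enough that reporting it matters.
lo, hi = wilson_interval(151, 200)
```

This is the kind of per-condition variance the referee asks for; whether the paper's counts support intervals this wide is unknowable from the abstract alone.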
Simulated Author's Rebuttal
We thank the referee for their insightful comments on our work. We address each major comment in detail below, indicating the revisions we intend to make to the manuscript.
read point-by-point responses
- Referee: Abstract: The abstract reports specific success rates (e.g., 100% on walled environments, 75.5% under +25% mass) but does not mention baseline comparisons, ablation studies, training curves, or statistical details such as number of trials or variance, which are essential to evaluate whether the architecture drives the claimed generalization.
  Authors: We agree that the abstract would benefit from additional context regarding the evaluation methodology. In the revised version, we will update the abstract to include a brief mention of the baseline comparisons performed and indicate that success rates are computed over 100 trials per condition, with full statistical details (including variance) reported in the evaluation section. Ablation studies and training curves are presented in the main paper and will be referenced in the abstract where space permits. Revision: yes.
- Referee: Evaluation section: All quantitative results, including zero-shot transfer and dynamics robustness, are obtained exclusively in MuJoCo simulation. The central claim of applicability to humanoid robots in unstructured environments relies on the unverified assumption that MuJoCo accurately models contact forces, friction, and latency; no physical experiments on the Unitree G1 are reported to support sim-to-real transfer.
  Authors: We acknowledge that the evaluations are simulation-only, as described in the manuscript. Our robustness experiments test the policy under varied dynamics parameters to simulate real-world mismatches. We will revise the manuscript to include an expanded limitations paragraph discussing the fidelity of MuJoCo for contact modeling and our plans for future real-robot validation. However, physical experiments on the Unitree G1 are not included in this work. Revision: partial.
- Unresolved after rebuttal: the lack of physical experiments on the Unitree G1 to validate sim-to-real transfer.
Circularity Check
No circularity; the empirical RL results are obtained independently of the paper's assumptions.
full rationale
The paper trains a causal-transformer policy end-to-end on a standard RL objective using only open-floor MuJoCo data. All quantitative claims (success rates under pushes, wall distances, mass/latency/friction shifts, t-SNE mode clustering) are obtained from subsequent simulation rollouts. No algebraic derivation, parameter fit renamed as prediction, self-citation chain, or ansatz is invoked to produce the reported metrics; the generalization numbers are direct experimental outcomes rather than tautological restatements of the training setup.
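The rationale above cites t-SNE clustering of latent modes over 300 episodes as the evidence for mode specialization. A lighter proxy for the same claim is to check whether a nearest-centroid classifier on the latent vectors recovers the force-regime label; the sketch below uses synthetic latents as a stand-in for actual rollout data (regime count, dimensions, and separations are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for per-episode latent mode vectors: 300 episodes across
# three hypothetical push-force regimes, with regime-dependent latent means.
regimes = np.repeat(np.arange(3), 100)              # 0: light, 1: medium, 2: heavy
means = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, -2.0]])
latents = means[regimes] + rng.normal(0.0, 0.5, (300, 2))

def nearest_centroid_accuracy(z, labels):
    """Fraction of points whose nearest class centroid matches their label.
    High accuracy means the latents cluster by regime — the property
    the paper reads off the t-SNE embedding."""
    centroids = np.stack([z[labels == c].mean(axis=0) for c in np.unique(labels)])
    dists = np.linalg.norm(z[:, None, :] - centroids[None, :, :], axis=-1)
    return float((dists.argmin(axis=1) == labels).mean())

acc = nearest_centroid_accuracy(latents, regimes)
# Shuffled labels give a roughly chance-level null against which
# "specialization" should be judged.
acc_null = nearest_centroid_accuracy(latents, rng.permutation(regimes))
```

Unlike t-SNE, this gives a scalar that can be compared against a shuffled-label null, which is the kind of quantitative check the visual clustering claim would benefit from.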
Axiom & Free-Parameter Ledger
free parameters (1)
- neural network weights and hyperparameters
axioms (2)
- domain assumption: the MuJoCo contact and dynamics model is sufficiently accurate for the zero-shot generalization claims
- standard math: the MDP formulation with a 50-step history is Markovian enough for stable recovery learning
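The second axiom, that a 50-step history makes the problem Markovian enough, is operationally just frame stacking. A minimal sketch of such a history buffer (the 50-step horizon is from the review; the observation dimension is a hypothetical placeholder):

```python
from collections import deque
import numpy as np

class HistoryBuffer:
    """Maintains the last `horizon` observations as the policy's effective
    state, zero-padding until full — a common way to approximate a Markovian
    state on a partially observed robot."""
    def __init__(self, horizon=50, obs_dim=48):  # obs_dim is hypothetical
        self.horizon, self.obs_dim = horizon, obs_dim
        self.buf = deque(maxlen=horizon)

    def reset(self):
        self.buf.clear()

    def push(self, obs):
        self.buf.append(np.asarray(obs, dtype=np.float64))
        return self.stacked()

    def stacked(self):
        # (horizon, obs_dim) array, oldest first, newest last.
        pad = [np.zeros(self.obs_dim)] * (self.horizon - len(self.buf))
        return np.stack(pad + list(self.buf))

hist = HistoryBuffer()
for t in range(60):
    stacked = hist.push(np.full(48, float(t)))
# After 60 steps the buffer holds observations 10..59, newest last.
```

Whether 50 steps of proprioception suffice to disambiguate contact state is exactly what the axiom asserts and the review cannot verify.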
invented entities (2)
- latent recovery mode (no independent evidence)
- contact affordance head (no independent evidence)
Reference graph
Works this paper leans on
- [1] S. Wang and K. Hauser, "Real-time stabilization of a falling humanoid robot using hand contact: An optimal control approach," in IEEE-RAS 17th International Conference on Humanoid Robots (Humanoids), 2017, pp. 454-460.
- [2] E. Aslan, M. A. Arserim, and A. Uçar, "Development of push-recovery control system for humanoid robots using deep reinforcement learning," Ain Shams Engineering Journal, 2023.
- [3] S. Wang and K. Hauser, "Realization of a real-time optimal control strategy to stabilize a falling humanoid robot with hand contact," in IEEE International Conference on Robotics and Automation (ICRA), 2018.
- [4] Z. Meng, T. Liu, L. Ma, Y. Wu, R. Song, W. Zhang, and S. Huang, "SafeFall: Learning protective control for humanoid robots," arXiv preprint arXiv:2511.18509, 2025.
- [5] S. Wang and K. Hauser, "Unified multi-contact fall mitigation planning for humanoids via contact transition tree optimization," in IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids), 2018, pp. 1-9.
- [6] C. Gaspard, M. Duclusaud, G. Passault, M. Daniel, and O. Ly, "FRASA: An end-to-end reinforcement learning agent for fall recovery and stand up of humanoid robots," in IEEE International Conference on Robotics and Automation (ICRA), 2025.
- [7] Z. Xu, Y. Li, K.-y. Lin, and S. X. Yu, "Unified humanoid fall-safety policy from a few demonstrations," arXiv preprint arXiv:2511.07407, 2025.
- [8] A. M. Sharma, S. Wang, Y.-M. Zhou, and A. Ruina, "Towards a maximally-robust self-balancing bicycle without reaction-moment gyroscopes or reaction wheels," in Bicycle and Motorcycle Dynamics, 2016.
- [9] X. He, R. Dong, Z. Chen, and S. Gupta, "Learning getting-up policies for real-world humanoid robots," arXiv preprint arXiv:2502.12152, 2025.
- [10] T. Huang, J. Ren, H. Wang, Z. Wang, Q. Ben, M. Wen, X. Chen, J. Li, and J. Pang, "HoST: Learning humanoid standing-up control across diverse postures," arXiv preprint arXiv:2502.08378, 2025.
- [11] S. Wang, C. Deng, and Q. Qi, "Efficient online calibration for autonomous vehicle's longitudinal dynamical system: A Gaussian model approach," in Proceedings of the Conference, 2023.
- [12] I. Radosavovic, T. Xiao, B. Zhang, T. Darrell, J. Malik, and K. Sreenath, "Real-world humanoid locomotion with reinforcement learning," Science Robotics, vol. 9, 2024.
- [13] I. Radosavovic, B. Zhang, B. Shi, J. Rajasegaran, S. Kamat, T. Darrell, K. Sreenath, and J. Malik, "Humanoid locomotion as next token prediction," in Advances in Neural Information Processing Systems (NeurIPS), 2024.
- [14] X. Cheng, Y. Ji, J. Chen, R. Yang, G. Yang, and X. Wang, "ExBody: Expressive whole-body control for humanoid robots," in Robotics: Science and Systems (RSS), 2024.
- [15] M. Ji, X. Peng, F. Liu, J. Li, G. Yang, X. Cheng, and X. Wang, "ExBody2: Advanced expressive humanoid whole-body control," arXiv preprint arXiv:2412.13196, 2024.
- [16] C. Zhang, W. Xiao, T. He, and G. Shi, "WoCoCo: Learning whole-body humanoid control with sequential contacts," in Conference on Robot Learning (CoRL), 2024.
- [17] Y. Wang, H. Jiang, S. Yao, Z. Ding, and Z. Lu, "SENTINEL: A fully end-to-end language-action model for humanoid whole body control," arXiv preprint arXiv:2511.19236, 2025.
- [18] Y. Shao, B. Zhang, Q. Liao, X. Huang, Y. Gao, Y. Chi, Z. Li, S. Shao, and K. Sreenath, "LangWBC: Language-directed humanoid whole-body control via end-to-end learning," in Robotics: Science and Systems (RSS), 2025.
- [19] T. He, W. Xiao, T. Lin, Z. Luo, Z. Xu, Z. Jiang, J. Kautz, C. Liu, G. Shi, X. Wang, L. Fan, and Y. Zhu, "HOVER: Versatile neural whole-body controller for humanoid robots," arXiv preprint arXiv:2410.21229, 2024.
- [20] H. Xue, X. Huang, D. Niu, Q. Liao, T. Kragerud, J. T. Gravdahl, X. B. Peng, G. Shi, T. Darrell, K. Sreenath, and S. Sastry, "LeVERB: Humanoid whole-body control with latent vision-language instruction," arXiv preprint arXiv:2506.13751, 2025.
- [21] T. He, J. Gao, W. Xiao, Y. Zhang, Z. Wang, J. Wang, Z. Luo, G. He, N. Sobanbabu, C. Pan et al., "ASAP: Aligning simulation and real-world physics for learning agile humanoid whole-body skills," in Robotics: Science and Systems (RSS), 2025.
- [22] W. Xie, J. Han, J. Zheng et al., "KungfuBot: Physics-based humanoid whole-body control for learning highly-dynamic skills," in Advances in Neural Information Processing Systems (NeurIPS), 2025.
- [23] J. A. Barreiros, A. Ö. Önol, M. Zhang, S. Creasey, A. Goncalves, A. Beaulieu, A. Bhat, K. M. Tsui, and A. Alspach, "Learning contact-rich whole-body manipulation with example-guided reinforcement learning," Science Robotics, vol. 10, p. eads6790, 2025.
- [24] M. Murooka, T. Hoshi, K. Fukumitsu, S. Masuda, M. Hamze, T. Sasaki, M. Morisawa, and E. Yoshida, "TACT: Humanoid whole-body contact manipulation through deep imitation learning with tactile modality," IEEE Robotics and Automation Letters (RA-L), vol. 10, no. 8, pp. 7819-7826, 2025.
- [25] S. Nasiriany, S. Kirmani, T. Ding, L. Smith, Y. Zhu, D. Driess, D. Sadigh, and T. Xiao, "RT-Affordance: Affordances are versatile intermediate representations for robot manipulation," arXiv preprint arXiv:2411.02704, 2024.
- [26] R. Xu, J. Zhang, M. Guo et al., "A0: An affordance-aware hierarchical model for general robotic manipulation," in International Conference on Computer Vision (ICCV), 2025.
- [27] K. Black et al., "π0: A vision-language-action flow model for general robot control," arXiv preprint arXiv:2410.24164, 2024.
- [28] Figure AI, "Helix: A vision-language-action model for generalist humanoid control," https://www.figure.ai/news/helix, 2025.
- [29] J. Bjorck, F. Castañeda, N. Cherniadev et al., "GR00T N1: An open foundation model for generalist humanoid robots," arXiv preprint arXiv:2503.14734, 2025.
- [30] OpenDriveLab et al., "WholeBodyVLA: Towards unified latent VLA for whole-body loco-manipulation control," in International Conference on Learning Representations (ICLR), 2026.
- [31] A. Kumar, Z. Fu, D. Pathak, and J. Malik, "RMA: Rapid motor adaptation for legged robots," in Robotics: Science and Systems (RSS), 2021.
- [32] M. Yoo, S. Shin, D. Sub, and D. Lee, "World model implanting for test-time adaptation of embodied agents," in International Conference on Machine Learning (ICML), 2025.
- [33] A. Sukhija, L. Treven, J. Cheng, F. Dörfler, S. Coros, and A. Krause, "TARC: Time-adaptive robotic control," arXiv preprint arXiv:2510.23176, 2025.
- [34] D. Jiang, Y. Li, G. Li, and B. Li, "MAGMA: A multi-graph based agentic memory architecture for AI agents," arXiv preprint arXiv:2601.03236, 2026.
- [35] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems (NeurIPS), 2017.
- [36] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
- [37] E. Todorov, T. Erez, and Y. Tassa, "MuJoCo: A physics engine for model-based control," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2012.