
arXiv: 2606.30362 · v1 · pith:BYKSVUHVnew · submitted 2026-06-29 · 💻 cs.RO · cs.AI · cs.CV

ReactiveBFM: Reactive Closed-Loop Motion Planning Towards Universal Humanoid Whole-Body Control

Pith reviewed 2026-06-30 05:06 UTC · model grok-4.3

classification 💻 cs.RO · cs.AI · cs.CV
keywords humanoid control · closed-loop motion planning · behavior foundation models · exposure bias · reactive whole-body control · generative motion planning · prefix sampling curriculum

The pith

ReactiveBFM trains generative planners on imperfect states via prefix sampling to enable reactive closed-loop humanoid control.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that cascading open-loop Behavior Foundation Models with motion planners fails due to cumulative tracking errors that create exposure bias. ReactiveBFM counters this by training the planner on noisy physical states through a scheduled prefix sampling curriculum, forcing it to learn recovery behaviors. An asynchronous replanning scheme plus trajectory chunking then reconciles planning latency with high-frequency tracking. The resulting system runs on the Unitree G1 and produces text-conditioned whole-body motions that remain stable under perturbation.

Core claim

ReactiveBFM is a real-time closed-loop planning-control framework whose core is a scheduled prefix sampling curriculum that forces the generative planner to learn error-recovery behaviors from imperfect physical states rather than ground-truth trajectories; this is combined with an asynchronous replanning mechanism and trajectory chunking to produce spatio-temporally fluid execution.
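
One way to picture the trajectory-chunking half of this claim is the sketch below. It is only an illustration under stated assumptions: the class name `ChunkEnsembler`, the exponential weighting that favors the newest plan, and the `decay` and `max_chunks` parameters are all hypothetical and are not taken from the paper. The slow planner is assumed to push a chunk whenever it finishes, while the fast tracker queries a blended reference on every control tick.

```python
import math
from collections import deque


class ChunkEnsembler:
    """Blend overlapping planner chunks into one reference per control tick.

    Hypothetical sketch: names and the weighting rule are assumptions,
    not the paper's implementation.
    """

    def __init__(self, decay=0.5, max_chunks=4):
        self.decay = decay                      # larger value = trust the newest chunk more
        self.chunks = deque(maxlen=max_chunks)  # entries are (start_tick, list of reference vectors)

    def add_chunk(self, start_tick, chunk):
        """Called by the (slow) planner whenever a new chunk is ready."""
        self.chunks.append((start_tick, chunk))

    def reference(self, tick):
        """Called by the (fast) tracker; returns a blended reference or None."""
        weights, rows = [], []
        # age 0 is the most recently added chunk
        for age, (start, chunk) in enumerate(reversed(self.chunks)):
            idx = tick - start
            if 0 <= idx < len(chunk):           # this chunk covers the requested tick
                weights.append(math.exp(-self.decay * age))
                rows.append(chunk[idx])
        if not rows:
            return None                         # no plan covers this tick
        total = sum(weights)
        dim = len(rows[0])
        return [sum(w * r[d] for w, r in zip(weights, rows)) / total
                for d in range(dim)]
```

Because the tracker only reads whatever chunks are already stored, it never has to wait for the planner; where old and new chunks overlap, the blend avoids a hard jump in the reference when a replanned chunk arrives.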

What carries the argument

Scheduled prefix sampling curriculum that trains the planner on imperfect physical states to induce error-recovery behaviors.
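
The abstract does not give the schedule, so the following is a minimal sketch of what a scheduled prefix sampling step could look like, by analogy with scheduled sampling in sequence models. Everything specific here is an assumption: the linear ramp, the `warmup_steps`, `ramp_steps`, and `p_max` values, and the choice to swap the whole prefix rather than individual frames are hypothetical.

```python
import random


def rollout_prob(step, warmup_steps=1000, ramp_steps=4000, p_max=0.8):
    """Probability of conditioning on tracked (imperfect) states at a training step.

    Hypothetical linear ramp: 0 during warm-up, then rising to p_max.
    """
    if step < warmup_steps:
        return 0.0
    frac = min(1.0, (step - warmup_steps) / ramp_steps)
    return p_max * frac


def sample_prefix(gt_prefix, rollout_prefix, p, rng=random):
    """Pick the planner's conditioning prefix for one training example.

    gt_prefix:      ground-truth motion frames from the dataset
    rollout_prefix: the same frames as actually reached by the tracking
                    controller (assumed to come from a simulator rollout)
    p:              probability of using the imperfect rollout prefix
    """
    return rollout_prefix if rng.random() < p else gt_prefix
```

Early in training the planner sees only clean prefixes; later it is increasingly asked to continue from states that already contain tracking error, which is the condition under which recovery behavior would have to be learned.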

If this is right

  • Achieves 93.1 percent success rate in sim-to-sim benchmarking under severe perturbations.
  • Outperforms cascaded open-loop baselines by 28.6 percent.
  • Enables zero-shot moving target reaching with intricate whole-body coordination and on-the-fly replanning.
  • Guarantees spatio-temporally fluid execution without physical jitter across a repertoire of text-conditioned motions.
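
The abstract does not say whether the 28.6 percent gain is in absolute percentage points or relative to the baseline. The short calculation below, with a hypothetical helper name `implied_baseline`, shows the baseline success rate each reading would imply: roughly 64.5 percent under the absolute reading and roughly 72.4 percent under the relative one.

```python
def implied_baseline(success_pct, gain_pct, relative):
    """Baseline success rate implied by a reported gain, under one of two readings."""
    if relative:
        # gain is a relative improvement: success = baseline * (1 + gain / 100)
        return success_pct / (1.0 + gain_pct / 100.0)
    # gain is in absolute percentage points: success = baseline + gain
    return success_pct - gain_pct


absolute = implied_baseline(93.1, 28.6, relative=False)
relative = implied_baseline(93.1, 28.6, relative=True)
print(f"absolute reading: baseline = {absolute:.1f}%")
print(f"relative reading: baseline = {relative:.1f}%")
```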

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The curriculum approach may reduce dependence on accurate state estimation in unstructured settings.
  • Similar prefix sampling could be tested on other legged platforms to check transfer of recovery behaviors.
  • Pairing the planner with onboard perception could allow direct closed-loop response to visual changes without separate tracking layers.

Load-bearing premise

The scheduled prefix sampling curriculum successfully induces error-recovery behaviors in the generative planner when driven by imperfect physical states rather than ground-truth trajectories.

What would settle it

Removing the prefix sampling curriculum and measuring whether success rate under the same severe perturbations falls below 70 percent.

Figures

Figures reproduced from arXiv: 2606.30362 by Furui Xu, Huayi Wang, Jiahe Chen, Jianan Li, Jiangmiao Pang, Jingbo Wang, Kailin Li, Lihe Ding, Tai Wang, Tianfan Xue, Weishuai Zeng, Weixiang Zhong, Xiao Chen, Xiaojie Niu, Zirui Wang.

Figure 1. We introduce ReactiveBFM, a closed-loop framework integrating a behavior foundation model with a reactive whole-body motion planner. Guided by proprioceptive feedback, text, and target positions, ReactiveBFM enables robust text-conditioned control and seamless zero-shot replanning to reach moving targets.
Figure 2. Overview of ReactiveBFM. (a) Asynchronous closed-loop inference coupled with a universal controller via trajectory chunking and proprioceptive feedback. (b) Architecture of our reactive motion planner. It generates smooth robot trajectories from multi-modal streaming conditions. (c) Core training strategies including a scheduled prefix sampling curriculum and condition adherence to mitigate exposure bias…
Figure 3. (a) ReactiveBFM enables long-horizon zero-shot deployment of reaching a moving target. (b) The recorded actual robot-target trajectories show the effectiveness. (c) The curve illustrates that our system reactively plans coordinated whole-body motion and always reaches the moving global localizer.
Figure 4. Real-world deployment of ReactiveBFM under text-conditioned and streaming interactive…
Figure 5. Real-world robustness evaluation under diverse physical perturbations: (a) repeated heavy kicks; (b) holding for over 1s to forcibly reverse the rotation direction; (c) repeated strikes with a 3kg ball; and (d) being dragged off-balance and down.
Figure 6. Timeline and latency analysis during real-world deployment.
Figure 7. We utilize HTC VIVE Ultimate Trackers for global localization. During deployment, one tracker is mounted on the back of the robot’s pelvis, and another is attached to a handheld toy sword
Original abstract

While current Behavior Foundation Models (BFMs) provide robust control priors for humanoids, they only execute pre-defined reference motions. As a result, they are vulnerable to environmental shifts and incapable of reactive whole-body coordination. Naively cascading them with generative motion planners fails to achieve true reactivity, as inevitable tracking discrepancies induce fatal cumulative exposure bias. To bridge this gap, we propose ReactiveBFM, a real-time closed-loop planning-control framework. At its core, we effectively mitigate exposure bias via a scheduled prefix sampling curriculum, forcing the generative planner to actively learn error-recovery behaviors from imperfect physical states rather than ground-truth trajectories. Systematically, to reconcile the severe latency mismatch between auto-regressive planning and high-frequency tracking, we introduce an asynchronous replanning mechanism. Combined with trajectory chunking to temporally ensemble spatial references, our system guarantees spatio-temporally fluid execution without physical jitter. Deployed on the Unitree G1 humanoid, ReactiveBFM demonstrates unprecedented physical agility across a vast repertoire of text-conditioned closed-loop motions. Notably, ReactiveBFM achieves zero-shot moving target reaching, showcasing intricate whole-body coordination and on-the-fly replanning. In sim-to-sim benchmarking under severe perturbations, ReactiveBFM achieves a 93.1% success rate, significantly outperforming cascaded open-loop baselines by 28.6%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces ReactiveBFM, a real-time closed-loop framework for humanoid whole-body control that combines generative motion planning with a scheduled prefix sampling curriculum to mitigate exposure bias, asynchronous replanning to handle latency mismatch, and trajectory chunking for fluid execution. It claims this enables reactive text-conditioned motions on the Unitree G1, including zero-shot moving target reaching, and reports a 93.1% success rate in sim-to-sim benchmarking under severe perturbations, outperforming cascaded open-loop baselines by 28.6%.

Significance. If the performance gains and mechanism are substantiated, the work would advance reactive control for humanoids by addressing a key limitation of behavior foundation models (exposure bias under imperfect state feedback), potentially enabling more robust closed-loop behaviors without hand-crafted recovery policies.

major comments (2)
  1. [Abstract] Abstract (paragraph on mitigation of exposure bias): The central claim that the scheduled prefix sampling curriculum induces error-recovery behaviors from imperfect physical states (rather than ground-truth trajectories) is load-bearing for the 28.6% performance gain, yet no ablation is described that removes or randomizes the prefix schedule while holding replanning, chunking, and the base BFM fixed. Without this isolation, the success-rate delta cannot be attributed specifically to the curriculum.
  2. [Abstract] Abstract (sim-to-sim benchmarking paragraph): The reported 93.1% success rate and 28.6% improvement lack supporting details on curriculum schedule parameters, baseline definitions, perturbation magnitudes, number of trials, or statistical tests, leaving the quantitative claim unsupported by visible evidence in the manuscript.
minor comments (2)
  1. The manuscript should include explicit pseudocode or a table for the prefix sampling schedule and the asynchronous replanning logic to allow reproduction.
  2. Clarify whether the sim-to-sim results use the same perturbation distribution for training and testing, and report variance across seeds.
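
To make the request for trial counts and uncertainty concrete, here is a small sketch of a Wilson score interval for a reported success rate. The trial count used in the usage note (1000) is purely hypothetical; the manuscript's actual number of trials is not given in the abstract.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate (z=1.96 is about 95%)."""
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half
```

For example, `wilson_interval(931, 1000)` gives an interval of a few percentage points around 93.1 percent, while the same rate measured over far fewer trials would give a much wider one, which is why the number of trials matters when judging the reported gain.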

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments below and will revise the manuscript to strengthen the presentation of our claims.

Point-by-point responses
  1. Referee: [Abstract] Abstract (paragraph on mitigation of exposure bias): The central claim that the scheduled prefix sampling curriculum induces error-recovery behaviors from imperfect physical states (rather than ground-truth trajectories) is load-bearing for the 28.6% performance gain, yet no ablation is described that removes or randomizes the prefix schedule while holding replanning, chunking, and the base BFM fixed. Without this isolation, the success-rate delta cannot be attributed specifically to the curriculum.

    Authors: We agree that an explicit ablation isolating the scheduled prefix sampling curriculum is necessary to rigorously attribute the performance gains. The manuscript describes the curriculum's role in learning error-recovery from imperfect states, but does not include a controlled comparison against a fixed or randomized prefix schedule with replanning and chunking held constant. In the revised manuscript, we will add this ablation in the experiments section to directly quantify its contribution to the 28.6% improvement. revision: yes

  2. Referee: [Abstract] Abstract (sim-to-sim benchmarking paragraph): The reported 93.1% success rate and 28.6% improvement lack supporting details on curriculum schedule parameters, baseline definitions, perturbation magnitudes, number of trials, or statistical tests, leaving the quantitative claim unsupported by visible evidence in the manuscript.

    Authors: We acknowledge that the abstract and main text require expanded details to support the quantitative results. In the revision, we will update both the abstract and the sim-to-sim benchmarking section to specify curriculum schedule parameters, exact baseline configurations, perturbation magnitudes, number of trials, and statistical tests (including means, standard deviations, and significance levels). revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical results independent of inputs

Full rationale

The paper's core claims rest on an empirical sim-to-sim success rate (93.1%) measured under perturbations, presented as an outcome of the proposed asynchronous replanning, chunking, and prefix-sampling curriculum rather than a quantity defined by construction from those mechanisms. No equations appear that equate a fitted parameter to a renamed prediction, no self-citation chain supplies a uniqueness theorem that forces the architecture, and the exposure-bias mitigation is described as a training procedure whose effect is validated externally by benchmark deltas. The derivation chain therefore remains self-contained against external benchmarks; absence of an ablation isolating the curriculum is a question of evidence strength, not circular reduction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Abstract-only review yields minimal ledger entries; the curriculum schedule parameters and simulation fidelity are the primary unexamined elements.

free parameters (1)
  • prefix sampling schedule parameters
    The curriculum relies on an unspecified schedule whose values determine how quickly imperfect states are introduced.
axioms (1)
  • domain assumption: The simulation environment faithfully reproduces the dynamics needed for sim-to-sim transfer to physical Unitree G1 performance.
    Benchmarking claims rest on this unstated transfer assumption.

pith-pipeline@v0.9.1-grok · 5824 in / 1060 out tokens · 49275 ms · 2026-06-30T05:06:51.077361+00:00 · methodology

