ReactiveBFM: Reactive Closed-Loop Motion Planning Towards Universal Humanoid Whole-Body Control
Pith reviewed 2026-06-30 05:06 UTC · model grok-4.3
The pith
ReactiveBFM trains generative planners on imperfect states via prefix sampling to enable reactive closed-loop humanoid control.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ReactiveBFM is a real-time closed-loop planning-control framework whose core is a scheduled prefix sampling curriculum that forces the generative planner to learn error-recovery behaviors from imperfect physical states rather than ground-truth trajectories; this is combined with an asynchronous replanning mechanism and trajectory chunking to produce spatio-temporally fluid execution.
What carries the argument
Scheduled prefix sampling curriculum that trains the planner on imperfect physical states to induce error-recovery behaviors.
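As a rough illustration of what a scheduled prefix sampling curriculum could look like, the sketch below ramps up the probability that a training sample is conditioned on a prefix taken from an imperfect tracked rollout instead of the ground-truth trajectory. The linear ramp, the 0.8 ceiling, and the names `rollout_prefix_probability` and `choose_prefix` are all assumptions made for illustration; the abstract does not state the schedule the authors use.

```python
import random


def rollout_prefix_probability(step: int, total_steps: int,
                               p_start: float = 0.0, p_end: float = 0.8) -> float:
    """Probability of conditioning on a rollout prefix at a given training step.

    Assumed linear ramp from p_start to p_end; the real schedule is not
    given in the abstract.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return p_start + frac * (p_end - p_start)


def choose_prefix(ground_truth_prefix, rollout_prefix, step, total_steps, rng=random):
    """Pick the conditioning prefix for one training sample.

    Returns (prefix, used_rollout). Early in training the ground-truth
    prefix dominates; later the imperfect rollout prefix is sampled more often.
    """
    p = rollout_prefix_probability(step, total_steps)
    used_rollout = rng.random() < p
    return (rollout_prefix if used_rollout else ground_truth_prefix), used_rollout
```

In this sketch the target the planner is trained to predict would stay the ground-truth continuation in both cases, which is what would push it toward recovering from the imperfect prefix. That pairing is also an assumption.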
If this is right
- Achieves 93.1 percent success rate in sim-to-sim benchmarking under severe perturbations.
- Outperforms cascaded open-loop baselines by 28.6 percent.
- Enables zero-shot moving target reaching with intricate whole-body coordination and on-the-fly replanning.
- Guarantees spatio-temporally fluid execution without physical jitter across a repertoire of text-conditioned motions.
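The "28.6 percent" margin above can be read two ways, and the abstract does not say which is meant. The short calculation below works out the baseline success rate implied by each reading; the function name `implied_baseline` is hypothetical and the only inputs are the two figures quoted in the abstract.

```python
def implied_baseline(success_pct: float, gain_pct: float) -> dict:
    """Baseline success rate implied by a reported gain, under two readings."""
    # Reading 1: the gain is in absolute percentage points.
    absolute = success_pct - gain_pct
    # Reading 2: the gain is relative to the baseline's own success rate.
    relative = success_pct / (1.0 + gain_pct / 100.0)
    return {
        "absolute_points": round(absolute, 1),
        "relative_gain": round(relative, 1),
    }


print(implied_baseline(93.1, 28.6))
# {'absolute_points': 64.5, 'relative_gain': 72.4}
```

The two readings put the baseline at about 64.5 percent and about 72.4 percent respectively, so which one the authors intend changes how large the reported improvement really is.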
Where Pith is reading between the lines
- The curriculum approach may reduce dependence on accurate state estimation in unstructured settings.
- Similar prefix sampling could be tested on other legged platforms to check transfer of recovery behaviors.
- Pairing the planner with onboard perception could allow direct closed-loop response to visual changes without separate tracking layers.
Load-bearing premise
The scheduled prefix sampling curriculum successfully induces error-recovery behaviors in the generative planner when driven by imperfect physical states rather than ground-truth trajectories.
What would settle it
Removing the prefix sampling curriculum and measuring whether success rate under the same severe perturbations falls below 70 percent.
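Whether an ablated success rate "falls below 70 percent" depends on how many trials back it, which the abstract does not report. The sketch below uses a Wilson score interval to decide if an observed rate is clearly below a threshold; the trial counts in the usage note are invented purely to show the call, and `clearly_below` is a hypothetical helper name.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial success rate (z=1.96 is about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


def clearly_below(successes: int, trials: int, threshold: float = 0.70) -> bool:
    """True only if the whole interval sits under the threshold."""
    _, upper = wilson_interval(successes, trials)
    return upper < threshold
```

For example, with made-up counts, `clearly_below(50, 100)` is true, while `clearly_below(68, 100)` is false because 68 successes in 100 trials is still statistically compatible with a true rate of 70 percent.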
Original abstract
While current Behavior Foundation Models (BFMs) provide robust control priors for humanoids, they only execute pre-defined reference motions. As a result, they are vulnerable to environmental shifts and incapable of reactive whole-body coordination. Naively cascading them with generative motion planners fails to achieve true reactivity, as inevitable tracking discrepancies induce fatal cumulative exposure bias. To bridge this gap, we propose ReactiveBFM, a real-time closed-loop planning-control framework. At its core, we effectively mitigate exposure bias via a scheduled prefix sampling curriculum, forcing the generative planner to actively learn error-recovery behaviors from imperfect physical states rather than ground-truth trajectories. Systematically, to reconcile the severe latency mismatch between auto-regressive planning and high-frequency tracking, we introduce an asynchronous replanning mechanism. Combined with trajectory chunking to temporally ensemble spatial references, our system guarantees spatio-temporally fluid execution without physical jitter. Deployed on the Unitree G1 humanoid, ReactiveBFM demonstrates unprecedented physical agility across a vast repertoire of text-conditioned closed-loop motions. Notably, ReactiveBFM achieves zero-shot moving target reaching, showcasing intricate whole-body coordination and on-the-fly replanning. In sim-to-sim benchmarking under severe perturbations, ReactiveBFM achieves a 93.1% success rate, significantly outperforming cascaded open-loop baselines by 28.6%.
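The abstract says trajectory chunking is used "to temporally ensemble spatial references" but gives no formula. One common way to do this is to average every overlapping chunk's prediction for the current timestep with exponentially decaying weights, and the sketch below shows that idea. Favouring newer chunks, the `decay` value, and the function name `ensemble_reference` are assumptions for illustration, not the paper's stated method.

```python
import math


def ensemble_reference(chunks, t, decay=0.1):
    """Blend overlapping chunk predictions for timestep t.

    chunks: list of (start_step, poses), where poses[k] is the reference
            pose (a list of floats) predicted for step start_step + k.
    Returns the weighted-average pose, or None if no chunk covers t.
    """
    covering = [(start, poses[t - start]) for start, poses in chunks
                if 0 <= t - start < len(poses)]
    if not covering:
        return None
    newest = max(start for start, _ in covering)
    # Assumed weighting: the most recent plan gets weight 1, older plans decay.
    weights = [math.exp(-decay * (newest - start)) for start, _ in covering]
    total = sum(weights)
    dim = len(covering[0][1])
    return [sum(w * pose[i] for w, (_, pose) in zip(weights, covering)) / total
            for i in range(dim)]
```

Blending in this way smooths the hand-over between consecutive plans, which is one plausible route to the jitter-free execution the abstract describes.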
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ReactiveBFM, a real-time closed-loop framework for humanoid whole-body control that combines generative motion planning with a scheduled prefix sampling curriculum to mitigate exposure bias, asynchronous replanning to handle latency mismatch, and trajectory chunking for fluid execution. It claims this enables reactive text-conditioned motions on the Unitree G1, including zero-shot moving target reaching, and reports a 93.1% success rate in sim-to-sim benchmarking under severe perturbations, outperforming cascaded open-loop baselines by 28.6%.
Significance. If the performance gains and mechanism are substantiated, the work would advance reactive control for humanoids by addressing a key limitation of behavior foundation models (exposure bias under imperfect state feedback), potentially enabling more robust closed-loop behaviors without hand-crafted recovery policies.
major comments (2)
- [Abstract] Abstract (paragraph on mitigation of exposure bias): The central claim that the scheduled prefix sampling curriculum induces error-recovery behaviors from imperfect physical states (rather than ground-truth trajectories) is load-bearing for the 28.6% performance gain, yet no ablation is described that removes or randomizes the prefix schedule while holding replanning, chunking, and the base BFM fixed. Without this isolation, the success-rate delta cannot be attributed specifically to the curriculum.
- [Abstract] Abstract (sim-to-sim benchmarking paragraph): The reported 93.1% success rate and 28.6% improvement lack supporting details on curriculum schedule parameters, baseline definitions, perturbation magnitudes, number of trials, or statistical tests, leaving the quantitative claim unsupported by visible evidence in the manuscript.
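To make the request for statistical tests concrete, the sketch below runs a pooled two-proportion z-test on the success counts of two methods. It is one standard choice among several, not a test the manuscript is known to use, and any counts passed to it here are placeholders because the abstract reports no trial numbers.

```python
import math


def two_proportion_z(s1: int, n1: int, s2: int, n2: int):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("trial counts must be positive")
    p1, p2 = s1 / n1, s2 / n2
    pooled = (s1 + s2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

With placeholder counts such as `two_proportion_z(90, 100, 60, 100)` the difference is highly significant, whereas the same rates over far fewer trials might not be, which is why the trial count matters for the headline claim.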
minor comments (2)
- The manuscript should include explicit pseudocode or a table for the prefix sampling schedule and the asynchronous replanning logic to allow reproduction.
- Clarify whether the sim-to-sim results use the same perturbation distribution for training and testing, and report variance across seeds.
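As an indication of the kind of pseudocode the first minor comment asks for, here is a tick-based simulation of asynchronous replanning: the controller runs every tick, while each plan takes `plan_latency` ticks to compute and is swapped in when ready. The structure, the latency-compensated start tick, and all names are assumptions for illustration; the manuscript's actual logic is not described in the abstract.

```python
def run_async_loop(num_ticks, plan_latency, planner):
    """Simulate a fast control loop fed by a slow planner.

    planner(start_tick) -> list of references for start_tick, start_tick + 1, ...
    A plan requested at tick t is assumed to target tick t + plan_latency,
    the moment it becomes available.
    Returns the reference the controller tracked at each tick.
    """
    active = (0, planner(0))  # an initial plan is assumed to be ready at tick 0
    pending = None            # (ready_tick, references) while the planner is busy
    executed = []
    for t in range(num_ticks):
        # Swap in the new plan as soon as it has finished computing.
        if pending is not None and pending[0] <= t:
            active, pending = pending, None
        # Keep the planner busy: request the next plan immediately.
        if pending is None:
            ready = t + plan_latency
            pending = (ready, planner(ready))
        start, refs = active
        idx = min(t - start, len(refs) - 1)  # hold the last reference if the plan runs out
        executed.append(refs[idx])
    return executed
```

Calling it with a toy planner that labels each reference by its plan's start tick and offset, for example `run_async_loop(6, 2, lambda s: [(s, k) for k in range(4)])`, shows the controller switching to a fresh plan every two ticks without ever waiting on the planner.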
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the major comments below and will revise the manuscript to strengthen the presentation of our claims.
- Referee: [Abstract] Abstract (paragraph on mitigation of exposure bias): The central claim that the scheduled prefix sampling curriculum induces error-recovery behaviors from imperfect physical states (rather than ground-truth trajectories) is load-bearing for the 28.6% performance gain, yet no ablation is described that removes or randomizes the prefix schedule while holding replanning, chunking, and the base BFM fixed. Without this isolation, the success-rate delta cannot be attributed specifically to the curriculum.
Authors: We agree that an explicit ablation isolating the scheduled prefix sampling curriculum is necessary to rigorously attribute the performance gains. The manuscript describes the curriculum's role in learning error-recovery from imperfect states, but does not include a controlled comparison against a fixed or randomized prefix schedule with replanning and chunking held constant. In the revised manuscript, we will add this ablation in the experiments section to directly quantify its contribution to the 28.6% improvement.
Revision: yes
- Referee: [Abstract] Abstract (sim-to-sim benchmarking paragraph): The reported 93.1% success rate and 28.6% improvement lack supporting details on curriculum schedule parameters, baseline definitions, perturbation magnitudes, number of trials, or statistical tests, leaving the quantitative claim unsupported by visible evidence in the manuscript.
Authors: We acknowledge that the abstract and main text require expanded details to support the quantitative results. In the revision, we will update both the abstract and the sim-to-sim benchmarking section to specify curriculum schedule parameters, exact baseline configurations, perturbation magnitudes, number of trials, and statistical tests (including means, standard deviations, and significance levels).
Revision: yes
Circularity Check
No significant circularity; empirical results independent of inputs
The paper's core claims rest on an empirical sim-to-sim success rate (93.1%) measured under perturbations, presented as an outcome of the proposed asynchronous replanning, chunking, and prefix-sampling curriculum rather than a quantity defined by construction from those mechanisms. No equations appear that equate a fitted parameter to a renamed prediction, no self-citation chain supplies a uniqueness theorem that forces the architecture, and the exposure-bias mitigation is described as a training procedure whose effect is validated externally by benchmark deltas. The derivation chain therefore remains self-contained against external benchmarks; absence of an ablation isolating the curriculum is a question of evidence strength, not circular reduction.
Axiom & Free-Parameter Ledger
free parameters (1)
- prefix sampling schedule parameters
axioms (1)
- domain assumption: The simulation environment faithfully reproduces the dynamics needed for sim-to-sim transfer to physical Unitree G1 performance.