
arxiv: 2606.25706 · v1 · pith:4BXC5VPMnew · submitted 2026-06-24 · 💻 cs.RO

Learning Asynchronous Upper-body Task-space Trajectory Tracking Policy for Humanoid Robots

Pith reviewed 2026-06-25 21:06 UTC · model grok-4.3

classification 💻 cs.RO
keywords humanoid robots · task-space tracking · asynchronous control · reinforcement learning · teacher-student distillation · MPC guidance · frame drift

The pith

Conditioning a humanoid policy on cached future trajectories and an execution-time index enables accurate upper-body tracking at low planner update rates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a method to bridge the gap between the sparse, low-rate task-space trajectories produced by high-level planners and the high-frequency needs of whole-body humanoid control. A student policy is first distilled from a teacher and then trained while receiving the entire cached future trajectory plus the current execution index as input. A sliding-window global reward is used during training to limit accumulated frame drift without any explicit frame estimation step. Post-training adds an MPC module that densifies references into guidance signals, plus self-guidance terms at both the action and forward-kinematics levels to further constrain drift. Experiments in simulation and on Unitree G1 hardware show better tracking than synchronous or decoupled baselines, especially when planner updates are infrequent and when motions lie outside the training distribution.

Core claim

The authors introduce an asynchronous upper-body task-space tracking framework in which a student policy, initialized by teacher-student distillation, receives the full cached future trajectory and an execution-time index as conditioning. Training employs a sliding-window global reward to reduce frame drift without explicit frame estimation. For task-specific refinement, an MPC module converts sparse references into floating-base and upper-body guidance, while action-level and forward-kinematics self-guidance terms limit policy drift. Simulation and Unitree G1 hardware results show improved tracking under low update rates, stronger performance relative to synchronous and decoupled baselines, and safer adaptation to out-of-distribution motions.

What carries the argument

Student policy conditioned on the full cached future trajectory and execution-time index, trained with a sliding-window global reward.
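
To make the conditioning concrete, here is a minimal sketch of how a cached trajectory and an execution-time index might be assembled into a policy input. It is an illustration under stated assumptions, not the authors' implementation: the names `TrajectoryCache` and `build_observation`, the fixed horizon, the index normalization, and the saturation at the last waypoint are all hypothetical choices that the summary does not specify.

```python
from dataclasses import dataclass, field
from typing import List, Sequence

Waypoint = Sequence[float]  # e.g. flattened head/hand task-space targets (assumed layout)


@dataclass
class TrajectoryCache:
    """Holds the latest planner trajectory and an execution-time index (hypothetical)."""
    horizon: int                                   # number of cached future waypoints
    waypoints: List[List[float]] = field(default_factory=list)
    index: int = 0                                 # control steps since the last planner update

    def update(self, new_waypoints: Sequence[Waypoint]) -> None:
        """Called at the low planner rate: replace the cache and reset the index."""
        if len(new_waypoints) != self.horizon:
            raise ValueError("planner must deliver exactly `horizon` waypoints")
        self.waypoints = [list(w) for w in new_waypoints]
        self.index = 0

    def step(self) -> None:
        """Called at the high control rate: advance, saturating at the last waypoint."""
        self.index = min(self.index + 1, self.horizon - 1)

    def conditioning(self) -> List[float]:
        """Full cached trajectory, flattened, followed by the normalized index."""
        flat = [x for w in self.waypoints for x in w]
        return flat + [self.index / max(self.horizon - 1, 1)]


def build_observation(proprio: Sequence[float], cache: TrajectoryCache) -> List[float]:
    """Concatenate proprioception with the trajectory conditioning (assumed ordering)."""
    return list(proprio) + cache.conditioning()
```

The point of the design is that the policy sees the same cached trajectory on every control step between planner updates, and only the index changes, so it can tell where along the cached plan it currently is.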

If this is right

  • Tracking accuracy improves when high-level planners update at low rates.
  • Performance exceeds that of synchronous and decoupled baseline controllers.
  • The policy adapts more safely to motions outside the training distribution.
  • MPC-based guidance combined with self-guidance at action and kinematics levels keeps the policy from diverging.
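
One way to picture self-guidance at the action and forward-kinematics levels is as a penalty that keeps the post-trained policy close to an anchor. The sketch below is a guess at the general shape, not the paper's formulation: it assumes actions are joint-position targets, assumes the anchor is the output of a frozen copy of the pre-trained policy, and uses a toy planar two-link arm (`planar_fk`) in place of the robot's real kinematics. The weights `w_action` and `w_fk` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def planar_fk(q: Sequence[float], l1: float = 0.3, l2: float = 0.25) -> Tuple[float, float]:
    """Toy forward kinematics of a planar two-link arm (stand-in for the real robot)."""
    x = l1 * math.cos(q[0]) + l2 * math.cos(q[0] + q[1])
    y = l1 * math.sin(q[0]) + l2 * math.sin(q[0] + q[1])
    return (x, y)


def self_guidance_penalty(
    action: Sequence[float],
    anchor_action: Sequence[float],
    w_action: float = 1.0,
    w_fk: float = 1.0,
) -> float:
    """Penalize drift from the anchor in joint space and in end-effector space."""
    # Action-level term: squared distance between joint targets.
    action_term = sum((a - b) ** 2 for a, b in zip(action, anchor_action))
    # FK-level term: squared distance between the resulting end-effector positions.
    fk_term = math.dist(planar_fk(action), planar_fk(anchor_action)) ** 2
    return w_action * action_term + w_fk * fk_term
```

The two terms are not redundant: a small joint-space change near a stretched-out configuration can move the end effector a long way, which the FK-level term catches and the action-level term does not.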

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same conditioning and reward structure could be applied to lower-body policies to achieve full-body asynchronous tracking.
  • Explicit frame alignment modules may become unnecessary if global rewards are shaped over sliding windows.
  • The framework suggests planners could safely run at slower rates or on cheaper hardware without sacrificing execution quality.

Load-bearing premise

The method assumes that feeding the cached future trajectory and execution index together with the sliding-window reward will reliably stop frame drift and policy divergence even when planner updates are sparse and the motion is out-of-distribution.
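
A sliding-window reward of the kind this premise relies on could be sketched as follows. This is only an illustration: it assumes position-only errors, an exponential kernel, and that the reference and executed poses have already been expressed in the same frame. The class name, `window`, and `sigma` are hypothetical, and whether the window is reset at each planner update is not stated in the summary.

```python
import math
from collections import deque
from typing import Deque, Tuple

Vec3 = Tuple[float, float, float]


class SlidingWindowReward:
    """Reward based on the mean tracking error over the last `window` control steps."""

    def __init__(self, window: int = 10, sigma: float = 0.1) -> None:
        self.errors: Deque[float] = deque(maxlen=window)  # oldest error drops out automatically
        self.sigma = sigma

    def reset(self) -> None:
        """Clear the window, e.g. at episode start (use at planner updates is an assumption)."""
        self.errors.clear()

    def __call__(self, ref_pos: Vec3, exec_pos: Vec3) -> float:
        # Both positions are assumed to be expressed in the same frame.
        self.errors.append(math.dist(ref_pos, exec_pos))
        mean_err = sum(self.errors) / len(self.errors)
        return math.exp(-mean_err / self.sigma)
```

Because the reward depends on the windowed mean rather than the instantaneous error, a policy that lets a small offset accumulate keeps paying for it on later steps, which is the intuition behind limiting drift without estimating the frame explicitly.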

What would settle it

Deploy the learned policy on the Unitree G1 while deliberately lowering planner update frequency below the rates tested in the paper and measure whether task-space tracking error grows beyond the reported bounds.
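
The measurement protocol for such a test can be sketched as a sweep over planner update periods. The harness below is hypothetical and says nothing about the paper's actual results: `sweep` accepts any rollout function that returns per-step tracking errors, and `zero_order_hold_errors` is a toy stand-in that holds the last planner sample of a sine reference, included only so the example runs end to end.

```python
import math
from typing import Callable, Dict, Iterable, List


def zero_order_hold_errors(period: int, steps: int = 200, dt: float = 0.02) -> List[float]:
    """Toy rollout: hold the latest sample of x(t) = sin(t) until the next planner update."""
    errors: List[float] = []
    held = 0.0
    for k in range(steps):
        if k % period == 0:          # planner update arrives every `period` control steps
            held = math.sin(k * dt)
        errors.append(abs(math.sin(k * dt) - held))
    return errors


def sweep(rollout: Callable[[int], List[float]], periods: Iterable[int]) -> Dict[int, float]:
    """Mean tracking error for each planner update period (in control steps)."""
    results: Dict[int, float] = {}
    for p in periods:
        errs = rollout(p)
        results[p] = sum(errs) / len(errs)
    return results
```

In a real test, `rollout` would wrap a simulator or hardware run of the learned policy, and the question is whether the resulting error curve stays flat as the period grows or climbs the way the held-sample toy does.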

Figures

Figures reproduced from arXiv: 2606.25706 by Dongqi Wang, Jiyu Yu, Rong Xiong, Yijun Fan, Yue Wang, Yumeng Liu.

Figure 1: Execution-interface gap between task-level planners and low-level …
Figure 2: Overview of the proposed asynchronous sparse trajectory tracking and post-training framework.
Figure 3: Asynchronous frame mismatch and sliding-window global reward. Base (right): the base drifts from b_{t_k} to b_t between updates, so the transform ^{b_{t_k}}T_{b_t} is unavailable, causing a mismatch of the reference frame. Head and Hands (left): we keep the cached reference and executed pose in the same frame b_{t_k}, and penalize their global discrepancy over a sliding window.
Figure 4: OOD post-training with MPC-completed guidance. (a) Success rates on ood1 and held-out ood2 for ASYNC-CA and ASYNC-3PT built on the OmniH2O and SONIC teacher backbones. Hollow markers denote zero-shot deployment, filled markers denote post-training (P.T.) with MPC-completed guidance (SG), and Baseline refers to the original synchronous policy under asynchronous reference updates. (b) Normalized interval tra…
Figure 5: (a) Inference-time speed modulation enabled by time-indexed …
Figure 6: We compare ASYNC-CA with and without post-training on challenging OOD trajectories. (a) Post-training reduces end-effector position and rotation errors, indicating improved tracking accuracy. (b) Without post-training, ASYNC-CA exhibits inaccurate tracking and unstable forward staggering, where the reference motion is nearly in-place. With post-training, the policy better follows the reference and maintains…
Figure 7: We compare post-trained policies with and without self …
Figure 8: Additional hardware demonstrations on the Unitree G1. The post-trained policy executes diverse whole-body motions including waving hands, …
Original abstract

High-level humanoid planners often output sparse, low-rate task-space trajectories, whereas whole-body controllers run at high frequency. This creates temporal asynchrony between planning and execution, and structural incompleteness for full-body control. We propose an asynchronous upper-body task-space tracking framework for humanoids. A student policy is initialized by teacher-student distillation, conditioned on the full cached future trajectory and an execution-time index, and trained with a sliding-window global reward to reduce frame drift without explicit frame estimation. For task-specific post-training, an MPC module completes sparse references into floating-base and upper-body guidance, while action- and FK-level self-guidance constrain policy drift. Simulation and Unitree G1 hardware experiments show improved tracking under low update rates, stronger performance than synchronous and decoupled baselines, and safer adaptation to out-of-distribution motions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes an asynchronous upper-body task-space trajectory tracking framework for humanoid robots. A student policy is initialized via teacher-student distillation and conditioned on the full cached future trajectory plus an execution-time index; it is trained with a sliding-window global reward to reduce frame drift without explicit estimation. Post-training uses an MPC module to complete sparse references into floating-base and upper-body guidance, together with action- and FK-level self-guidance to constrain drift. Simulation and Unitree G1 hardware experiments are reported to demonstrate improved tracking under low update rates, stronger performance than synchronous and decoupled baselines, and safer adaptation to out-of-distribution motions.

Significance. If the empirical results hold, the work addresses a practical gap between sparse high-level planning and high-frequency whole-body control in humanoids. The combination of distillation-based initialization, trajectory conditioning, sliding-window rewards, MPC completion, and multi-level self-guidance offers a concrete method for drift mitigation that could improve robustness in real-world asynchronous settings.

minor comments (2)
  1. [Abstract] The claims of 'improved tracking' and 'stronger performance' are stated without any numerical values, error metrics, or statistical comparisons; adding one or two key quantitative results would make the summary more informative.
  2. The description of the teacher policy and the exact distillation procedure would benefit from additional implementation details (e.g., loss weights, data collection protocol) to support reproducibility.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The referee's description of the framework, including teacher-student distillation, trajectory conditioning, sliding-window rewards, MPC completion, and multi-level self-guidance, accurately reflects the manuscript. No major comments were provided in the report.

Circularity Check

0 steps flagged

No circularity in empirical framework

full rationale

The manuscript presents an empirical robotics learning method: teacher-student distillation initializes a student policy conditioned on cached trajectories and an index, trained via sliding-window reward plus MPC/self-guidance modules. All central claims (improved asynchronous tracking, robustness to low rates and OOD motions) rest on simulation and Unitree G1 hardware experiments that directly measure the stated conditions. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear that would reduce any result to its inputs by construction. The approach is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no information on free parameters, axioms, or invented entities can be extracted or audited.

pith-pipeline@v0.9.1-grok · 5678 in / 1195 out tokens · 35767 ms · 2026-06-25T21:06:54.908224+00:00 · methodology


Appendix I: MPC formulation

We adopt an OCS2-based kinematic MPC formulation to complete the sparse upper-body reference with floating-base and upper-body joint guidance. At ...