
arxiv: 2603.09170 · v2 · pith:HX2D5TMJnew · submitted 2026-03-10 · 💻 cs.RO · cs.AI

ZeroWBC: Learning Natural Whole-Body Humanoid Interaction from Human Egocentric Data

keywords: whole-body, humanoid, interaction, egocentric, human, motion, natural, zerowbc

Achieving versatile and natural whole-body humanoid interaction control remains challenging due to the high cost of whole-body teleoperation data. We present ZeroWBC, a teleoperation-free framework that learns humanoid whole-body interaction from human egocentric videos paired with synchronized whole-body motion and text annotations. ZeroWBC adopts a generation-then-tracking formulation to tackle the static scene whole-body interaction control problem. Given an initial egocentric image and a language instruction, a fine-tuned Vision-Language Model generates future human whole-body motion tokens, which are decoded into continuous motions and retargeted to the humanoid. The resulting reference motions, together with root and key body-part trajectories, are then executed by a general interactive motion tracking policy. To improve interaction performance, we introduce an interaction-oriented tracking reward that prioritizes global root and key body-part trajectory alignment while preserving natural whole-body motion. Experiments on the Unitree G1 humanoid robot show that ZeroWBC enables diverse scene-aware behaviors without robot teleoperation demonstrations. These results suggest a scalable paradigm for learning natural humanoid whole-body interaction from human egocentric data.
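
The abstract describes the interaction-oriented tracking reward only in words: global root and key body-part trajectory alignment is prioritized, while the remaining whole-body motion is still tracked to keep it natural. The following is a minimal sketch of one way such a weighting could look. It is not the paper's reward: the exponential kernel, the weights `w_root`, `w_key`, `w_body`, the scale `sigma`, and the body names are all assumptions chosen for illustration.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kernel(sq_err, sigma):
    """Map a squared error to (0, 1]; equals 1.0 when the error is zero."""
    return math.exp(-sq_err / (sigma ** 2))


def interaction_tracking_reward(robot, reference, key_bodies,
                                w_root=0.4, w_key=0.4, w_body=0.2, sigma=0.25):
    """Hypothetical reward: root and key bodies weigh more than other bodies.

    `robot` and `reference` are dicts with a global "root" position and a
    "bodies" mapping from body name to global position. All weights and the
    kernel shape are illustrative assumptions, not values from the paper.
    """
    def mean_sq_err(names):
        if not names:
            return 0.0
        total = sum(sq_dist(robot["bodies"][n], reference["bodies"][n])
                    for n in names)
        return total / len(names)

    other_bodies = [n for n in reference["bodies"] if n not in key_bodies]

    root_term = kernel(sq_dist(robot["root"], reference["root"]), sigma)
    key_term = kernel(mean_sq_err(key_bodies), sigma)
    body_term = kernel(mean_sq_err(other_bodies), sigma)

    return w_root * root_term + w_key * key_term + w_body * body_term


# Usage: the same 0.2 m offset costs more on a hand than on an elbow.
reference = {
    "root": (0.0, 0.0, 0.8),
    "bodies": {
        "left_hand": (0.3, 0.2, 1.0), "right_hand": (0.3, -0.2, 1.0),
        "left_elbow": (0.1, 0.25, 1.1), "right_elbow": (0.1, -0.25, 1.1),
    },
}
keys = ["left_hand", "right_hand"]

hand_off = {"root": reference["root"], "bodies": dict(reference["bodies"])}
hand_off["bodies"]["left_hand"] = (0.5, 0.2, 1.0)

elbow_off = {"root": reference["root"], "bodies": dict(reference["bodies"])}
elbow_off["bodies"]["left_elbow"] = (0.3, 0.25, 1.1)

r_perfect = interaction_tracking_reward(reference, reference, keys)
r_hand = interaction_tracking_reward(hand_off, reference, keys)
r_elbow = interaction_tracking_reward(elbow_off, reference, keys)
```

Under these assumed weights, a positional error on a key body lowers the reward more than the same error on a non-key body, which is the prioritization the abstract describes. A real tracking policy would likely also include orientation, velocity, and regularization terms that this sketch omits.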



Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Imagine2Real: Towards Zero-shot Humanoid-Object Interaction via Video Generative Priors

    cs.RO 2026-05 unverdicted novelty 6.0

    Imagine2Real enables zero-shot humanoid-object interaction by unifying motions as 4D point trajectories, tracking only base/hands/object keypoints inside a BFM latent space, and training with progressive simple reward...

  2. HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation

    cs.RO 2026-04 unverdicted novelty 6.0

    HEX is a new framework with humanoid-aligned state representation, mixture-of-experts proprioceptive predictor, history tokens, and residual-gated fusion that achieves state-of-the-art success and generalization on re...

  3. HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation

    cs.RO 2026-04 unverdicted novelty 6.0

    HEX introduces a state-centric framework with humanoid-aligned representations and mixture-of-experts proprioceptive prediction for coordinated whole-body control on bipedal humanoids.

  4. Imagine2Real: Towards Zero-shot Humanoid-Object Interaction via Video Generative Priors

    cs.RO 2026-05 unverdicted novelty 5.0

    Imagine2Real is a zero-shot humanoid-object interaction method that unifies robot and object motion as 4D point trajectories, tracks only sparse keypoints inside a behavior foundation model latent space, and trains wi...

  5. EgoLive: A Large-Scale Egocentric Dataset from Real-World Human Tasks

    cs.RO 2026-04 unverdicted novelty 4.0

    EgoLive is presented as the largest open-source annotated egocentric dataset for real-world task-oriented human routines, captured with a custom head-mounted device and multi-modal annotations exclusively in unconstra...