SceneBot: Contact-Prompted General Humanoid Whole Body Tracking with Scene-Interaction

2026 · cs.RO · arXiv 2606.27581

1 Pith paper cites this work. Polarity classification is still indexing.



abstract

Current humanoid reinforcement-learning policies excel at free-space motions but struggle with contact-rich tasks, because pure kinematic tracking cannot resolve the physical ambiguities of interacting with objects and uneven terrain. To address this, we introduce SceneBot, a unified motion-tracking framework capable of handling free-space locomotion, terrain traversal, and whole-body manipulation. SceneBot conditions a single policy on both reference motions and per-link contact labels, which explicitly define the expected environmental interactions. To overcome the lack of annotated interaction data, we propose a hindsight scene reconstruction approach that infers scene-interaction graphs from retargeted human motion. Trained on 7.5 hours of this reconstructed, contact-rich data, SceneBot generalizes to unseen motions and environments. Our results demonstrate that SceneBot is the first general framework to unify free-space and contact-rich behaviors, executing complex, long-horizon tasks such as carrying a box upstairs, and they establish contact conditioning as a powerful interface for humanoid control. All code and data will be open-sourced. More demos and information are available at: https://ericcsr.github.io/scenebot/
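
As a rough illustration of what conditioning a policy on "reference motions and per-link contact labels" could look like, the sketch below builds one policy input by concatenating proprioception, reference joint targets, and a binary contact label per link. This is only a minimal sketch under stated assumptions, not the SceneBot implementation: the link names, the `ReferenceFrame` type, the binary encoding, and the flat concatenation are all hypothetical, since the abstract does not specify any of them.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical link set; the abstract does not say which links carry labels.
LINKS = ["left_foot", "right_foot", "left_hand", "right_hand", "torso"]


@dataclass
class ReferenceFrame:
    """One frame of a reference motion (hypothetical structure)."""
    joint_positions: List[float]  # target joint angles in radians
    contacts: Sequence[str]       # links expected to touch the scene


def contact_labels(contacts: Sequence[str], links: Sequence[str] = LINKS) -> List[float]:
    """Encode expected contacts as one binary label per link (assumed encoding)."""
    unknown = set(contacts) - set(links)
    if unknown:
        raise ValueError(f"unknown links: {sorted(unknown)}")
    return [1.0 if link in contacts else 0.0 for link in links]


def build_policy_input(proprio: Sequence[float], frame: ReferenceFrame) -> List[float]:
    """Concatenate proprioception, reference targets, and contact labels."""
    return list(proprio) + list(frame.joint_positions) + contact_labels(frame.contacts)


# Example: a frame where the left foot and right hand should be in contact.
frame = ReferenceFrame(joint_positions=[0.1, -0.2], contacts=["left_foot", "right_hand"])
obs = build_policy_input([0.0, 0.5, 1.0], frame)
print(obs)  # [0.0, 0.5, 1.0, 0.1, -0.2, 1.0, 0.0, 0.0, 1.0, 0.0]
```

The point of the sketch is that two frames with identical joint targets but different contact labels yield different policy inputs, which is one way a contact label could disambiguate motions that look the same kinematically.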

representative citing papers

VLK: Learning Humanoid Loco-Manipulation from Synthetic Interactions in Reconstructed Scenes

cs.RO · 2026-06-29 · unverdicted · novelty 6.0

Generates 48,000 synthetic VLK trajectories in 3D-reconstructed scenes to train a policy for egocentric perception-based humanoid navigation and object transport, shown on physical Unitree G1 robot.

