EgoHumanoid: Unlocking in-the-wild loco-manipulation with robot-free egocentric demonstration

Modi Shi, Shijia Peng, Jin Chen, Haoran Jiang, Yinghui Li, Di Huang, Ping Luo, Hongyang Li, Li Chen · 2026 · cs.RO · arXiv 2602.10106

12 Pith papers cite this work.

abstract

Human demonstrations offer rich environmental diversity and scale naturally, making them an appealing alternative to robot teleoperation. While this paradigm has advanced robot-arm manipulation, its potential for the more challenging, data-hungry problem of humanoid loco-manipulation remains largely unexplored. We present EgoHumanoid, the first framework to co-train a vision-language-action policy using abundant egocentric human demonstrations together with a limited amount of robot data, enabling humanoids to perform loco-manipulation across diverse real-world environments. To bridge the embodiment gap between humans and robots, including discrepancies in physical morphology and viewpoint, we introduce a systematic alignment pipeline spanning hardware design to data processing. We develop a portable system for scalable human data collection and establish practical collection protocols to improve transferability. At the core of our human-to-humanoid alignment pipeline lie two key components. View alignment reduces visual domain discrepancies caused by differences in camera height and perspective. Action alignment maps human motions into a unified, kinematically feasible action space for humanoid control. Extensive real-world experiments demonstrate that co-training with robot-free egocentric data significantly outperforms robot-only baselines by 51%, particularly in unseen environments. Our analysis further reveals which behaviors transfer effectively and the potential for scaling human data.
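The abstract does not say how human and robot samples are mixed during co-training. One common pattern fixes the fraction of each training batch drawn from each source, so the small robot set is not swamped by the much larger human set. The sketch below illustrates that pattern under this assumption only. It is not EgoHumanoid's documented procedure. The function `cotrain_batches` and its `human_ratio` parameter are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def cotrain_batches(
    human: Sequence[T],
    robot: Sequence[T],
    batch_size: int,
    human_ratio: float,
    num_batches: int,
    seed: int = 0,
) -> List[List[Tuple[str, T]]]:
    """Hypothetical co-training sampler (assumption, not the paper's method).

    Each batch takes a fixed share of samples from the human pool and the
    rest from the robot pool, independent of how large each pool is.
    """
    if not 0.0 <= human_ratio <= 1.0:
        raise ValueError("human_ratio must be in [0, 1]")
    rng = random.Random(seed)  # seeded for reproducible batches
    n_human = round(batch_size * human_ratio)
    n_robot = batch_size - n_human
    batches: List[List[Tuple[str, T]]] = []
    for _ in range(num_batches):
        # Sample with replacement so a small robot pool can fill its share.
        batch = [("human", rng.choice(human)) for _ in range(n_human)]
        batch += [("robot", rng.choice(robot)) for _ in range(n_robot)]
        rng.shuffle(batch)  # interleave sources within the batch
        batches.append(batch)
    return batches
```

For example, `batch_size=32` and `human_ratio=0.75` give every batch 24 human samples and 8 robot samples, even if the human pool is many times larger. Sampling with replacement is a deliberate choice here, because the robot pool is assumed to be small.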

citation-role summary

background 2 · method 1

citation-polarity summary

background 2 · use method 1

representative citing papers

Translation as a Bridging Action: Transferring Manipulation Skills from Humans to Robots

cs.RO · 2026-06-26 · unverdicted · novelty 6.0

A relative wrist translation bridging action with a vision-language-action model using interleaved tokens and attention masking transfers human manipulation skills to robots more effectively than 6DoF actions.

ActiveMimic: Egocentric Video Pretraining with Active Perception

cs.RO · 2026-06-04 · unverdicted · novelty 6.0

ActiveMimic pretrains on egocentric human video by recovering and modeling active camera motion as viewpoint actions, matching robot-data pretraining performance on real-world tasks.

LEGS: Fine-Tuning Teleop-Free VLAs for Humanoid Loco-manipulation in an Embodied Gaussian Splatting World

cs.RO · 2026-05-31 · unverdicted · novelty 6.0

LEGS shows that synthetic data from a 3DGS-mesh hybrid simulator trains VLA policies for humanoid pick-and-place that match or exceed human teleoperation performance across multiple backbones and tasks, while enabling low-cost robustness to appearance shifts.

EgoKit: Towards Unified Low-Cost Egocentric Data Collection with Heterogeneous Devices

cs.CV · 2026-05-16 · unverdicted · novelty 6.0

EgoKit is a new toolkit and accessory set that unifies egocentric video collection with wrist views across heterogeneous consumer devices using a consistent interface and log format.

BifrostUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation

cs.RO · 2026-05-05 · unverdicted · novelty 6.0

BifrostUMI enables robot-free human demonstration capture via VR and wrist cameras to train visuomotor policies that predict keypoint trajectories for transfer to humanoid whole-body control through retargeting.

ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control

cs.RO · 2026-04-30 · unverdicted · novelty 6.0

ExoActor uses exocentric video generation to implicitly model robot-environment-object interactions and converts the resulting videos into task-conditioned humanoid control sequences.

HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation

cs.RO · 2026-04-09 · unverdicted · novelty 6.0 · 2 refs

HEX introduces a state-centric framework with humanoid-aligned representations and mixture-of-experts proprioceptive prediction for coordinated whole-body control on bipedal humanoids.

TAMEn: Tactile-Aware Manipulation Engine for Closed-Loop Data Collection in Contact-Rich Tasks

cs.RO · 2026-04-08 · unverdicted · novelty 6.0

TAMEn supplies a cross-morphology wearable interface and pyramid-structured visuo-tactile data regime that raises bimanual manipulation success rates from 34% to 75% via closed-loop collection.

OASIS: From Simulation Data Collection to Real-World Humanoid Loco-Manipulation

cs.RO · 2026-06-07 · unverdicted · novelty 5.0

OASIS generates scalable simulation data for humanoid loco-manipulation via 3D generative asset reconstruction and domain randomization, yielding a policy with higher zero-shot real-world success than a policy trained on real-robot teleoperation data.

EgoLive: A Large-Scale Egocentric Dataset from Real-World Human Tasks

cs.RO · 2026-04-26 · unverdicted · novelty 4.0

EgoLive is presented as the largest open-source annotated egocentric dataset of real-world, task-oriented human routines, captured exclusively in unconstrained environments with a custom head-mounted device and provided with multi-modal annotations.

JoyAI-Sim: A Simulation-Enabled Interconversion Toolchain for the Embodied Data Pyramid

cs.RO · 2026-06-15 · unverdicted · novelty 3.0

JoyAI-Sim provides bidirectional Robot-Simulation-Human pathways for aligned model evaluation and data generation in robotics using the JoySim simulator as an evaluation layer and physical consistency filter.

Controllable Egocentric Video Generation via Occlusion-Aware Sparse 3D Hand Joints

cs.CV · 2026-03-12

citing papers explorer

Showing 10 of 10 citing papers after filters.

Translation as a Bridging Action: Transferring Manipulation Skills from Humans to Robots · cs.RO · 2026-06-26 · unverdicted · none · ref 46
ActiveMimic: Egocentric Video Pretraining with Active Perception · cs.RO · 2026-06-04 · unverdicted · none · ref 19
LEGS: Fine-Tuning Teleop-Free VLAs for Humanoid Loco-manipulation in an Embodied Gaussian Splatting World · cs.RO · 2026-05-31 · unverdicted · none · ref 13
BifrostUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation · cs.RO · 2026-05-05 · unverdicted · none · ref 25
ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control · cs.RO · 2026-04-30 · unverdicted · none · ref 24
HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation · cs.RO · 2026-04-09 · unverdicted · none · ref 39 · 2 links
TAMEn: Tactile-Aware Manipulation Engine for Closed-Loop Data Collection in Contact-Rich Tasks · cs.RO · 2026-04-08 · unverdicted · none · ref 12
OASIS: From Simulation Data Collection to Real-World Humanoid Loco-Manipulation · cs.RO · 2026-06-07 · unverdicted · none · ref 13
EgoLive: A Large-Scale Egocentric Dataset from Real-World Human Tasks · cs.RO · 2026-04-26 · unverdicted · none · ref 39
JoyAI-Sim: A Simulation-Enabled Interconversion Toolchain for the Embodied Data Pyramid · cs.RO · 2026-06-15 · unverdicted · none · ref 40