VisualMimic: Visual humanoid loco-manipulation via motion tracking and generation

Shaofeng Yin, Yanjie Ze, Hong-Xing Yu, C Karen Liu, Jiajun Wu · 2025 · arXiv 2509.20322

13 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

HumanoidArena: Benchmarking Egocentric Hierarchical Whole-body Learning

cs.RO · 2026-06-16 · unverdicted · novelty 7.0

HumanoidArena is a new benchmark of 7 leg-critical HOI/HSI tasks that evaluates egocentric hierarchical whole-body policies in humanoids and finds performance is strongly conditioned on the low-level GMT used.

EgoPriMo: Egocentric Motion Generation for Interactive Humanoid Control

cs.RO · 2026-06-07 · unverdicted · novelty 6.0

EgoPriMo learns a unified egocentric motion prior with a Triple-stream DiT model that supports reconstruction, generation, and forecasting of SMPL motions from egocentric views and text; it outperforms prior methods and transfers to humanoid controllers.

ActiveMimic: Egocentric Video Pretraining with Active Perception

cs.RO · 2026-06-04 · unverdicted · novelty 6.0

ActiveMimic pretrains on egocentric human video by recovering and modeling active camera motion as viewpoint actions, matching robot-data pretraining performance on real-world tasks.

Simultaneous Contact Selection and Planning for Contact-Rich Manipulation with Cascaded Optimization

cs.RO · 2026-05-27 · unverdicted · novelty 6.0

SCSP is a cascaded optimization framework using a surrogate contact model and discrete-continuous search to enable simultaneous contact selection and planning for robust contact-rich manipulation.

Imagine2Real: Towards Zero-shot Humanoid-Object Interaction via Video Generative Priors

cs.RO · 2026-05-21 · unverdicted · novelty 6.0

Imagine2Real enables zero-shot humanoid-object interaction by unifying motions as 4D point trajectories, tracking only base/hands/object keypoints inside a BFM latent space, and training with progressive simple rewards for mocap deployment.

CEER: Compliant End-Effector and Root Control as a Unified Interface for Hierarchical Humanoid Loco-Manipulation

cs.RO · 2026-05-19 · unverdicted · novelty 6.0

CEER proposes a compliant end-effector and root control interface that unifies loco-manipulation for humanoids via a distilled low-level policy and hierarchical planners.

VOFA: Visual Object Goal Pushing with Force-Adaptive Control for Humanoids

cs.RO · 2026-05-02 · unverdicted · novelty 6.0

VOFA combines a high-level visuomotor policy with a low-level force-adaptive controller to let humanoids push objects up to 17 kg to arbitrary goals using only noisy onboard vision, achieving over 80% real-world success.

HAIC: Humanoid Agile Object Interaction Control via Dynamics-Aware World Model

cs.RO · 2026-02-12 · unverdicted · novelty 6.0

HAIC enables robust humanoid interactions with underactuated objects by predicting their dynamics from proprioceptive history and using a world model for adaptive control.

Commanding Humanoid by Free-form Language: A Large Language Action Model with Unified Motion Vocabulary

cs.RO · 2025-11-28 · unverdicted · novelty 6.0

Humanoid-LLA converts unconstrained natural language commands into stable whole-body motions for humanoid robots using a unified motion vocabulary and two-stage supervised-plus-reinforcement fine-tuning.

VAIC: Vision-Guided Humanoid Agile Object Interaction Control via Decoupled Commands

cs.RO · 2026-06-08 · unverdicted · novelty 5.0

VAIC distills a teacher policy into a vision-and-proprioception student policy using recurrent adaptation and decoupled commands, enabling diverse real-robot tasks such as box carrying and skateboarding and outperforming baselines.

HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers

cs.RO · 2026-06-04 · unverdicted · novelty 5.0

HANDOFF is a distilled mixture-of-experts humanoid whole-body controller that follows a compact task-space interface, matches SOTA velocity tracking, provides a large manipulation workspace on Unitree G1, and supports VLM-driven agentic planning with no task-specific data.

GRAIL: Generating Humanoid Loco-Manipulation from 3D Assets and Video Priors

cs.RO · 2026-06-03 · unverdicted · novelty 5.0

GRAIL creates over 20,000 synthetic loco-manipulation sequences from known 3D configurations and video priors, then trains policies that achieve 84% pick-up and 90% stair-climbing success on a real Unitree G1 humanoid using only the generated data.

SplitAdapter: Load-Aware Humanoid Loco-Manipulation via Factorized Adaptation

cs.RO · 2026-06-02 · unverdicted · novelty 4.0

SplitAdapter factorizes adaptation into load-aware and dynamics-aware encoders using split world-model objectives, GRL regularization, and hierarchical FiLM, reporting higher full-task success than baselines across 2-6 kg masses and 0-60 cm heights.
