HumanoidArena is a new benchmark of 7 leg-critical HOI/HSI tasks that evaluates egocentric hierarchical whole-body policies in humanoids and finds performance is strongly conditioned on the low-level GMT used.
VisualMimic: Visual humanoid loco-manipulation via motion tracking and generation
13 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.RO 13
verdicts
UNVERDICTED 13
representative citing papers
EgoPriMo learns a unified egocentric motion prior with a Triple-stream DiT model that supports reconstruction, generation, and forecasting of SMPL motions from egocentric views and text; it outperforms prior methods and is transferable to humanoid controllers.
ActiveMimic pretrains on egocentric human video by recovering and modeling active camera motion as viewpoint actions, matching robot-data pretraining performance on real-world tasks.
SCSP is a cascaded optimization framework using a surrogate contact model and discrete-continuous search to enable simultaneous contact selection and planning for robust contact-rich manipulation.
Imagine2Real enables zero-shot humanoid-object interaction by unifying motions as 4D point trajectories, tracking only base/hands/object keypoints inside a BFM latent space, and training with progressive simple rewards for mocap deployment.
CEER proposes a compliant end-effector and root control interface that unifies loco-manipulation for humanoids via a distilled low-level policy and hierarchical planners.
VOFA combines a high-level visuomotor policy with a low-level force-adaptive controller to let humanoids push objects up to 17 kg to arbitrary goals using only noisy onboard vision, achieving over 80% real-world success.
HAIC enables robust humanoid interactions with underactuated objects by predicting their dynamics from proprioceptive history and using a world model for adaptive control.
Humanoid-LLA converts unconstrained natural language commands into stable whole-body motions for humanoid robots using a unified motion vocabulary and two-stage supervised-plus-reinforcement fine-tuning.
VAIC distills a teacher policy into a vision-and-proprioception student policy using recurrent adaptation and decoupled commands, enabling diverse real-robot tasks such as box carrying and skateboarding and outperforming baselines.
HANDOFF is a distilled mixture-of-experts humanoid whole-body controller that follows a compact task-space interface, matches SOTA velocity tracking, provides a large manipulation workspace on Unitree G1, and supports VLM-driven agentic planning with no task-specific data.
GRAIL creates over 20,000 synthetic loco-manipulation sequences from known 3D configurations and video priors, then trains policies that achieve 84% pick-up and 90% stair-climbing success on a real Unitree G1 humanoid using only the generated data.
SplitAdapter factorizes adaptation into load-aware and dynamics-aware encoders using split world-model objectives, GRL regularization, and hierarchical FiLM, reporting higher full-task success than baselines across 2-6 kg masses and 0-60 cm heights.
citing papers explorer
- Commanding Humanoid by Free-form Language: A Large Language Action Model with Unified Motion Vocabulary