hub Canonical reference
HumanPlus: Humanoid Shadowing and Imitation from Humans
Canonical reference. 80% of citing Pith papers cite this work as background.
representative citing papers
BeyondMimic combines compact motion tracking with a unified guided latent diffusion model to master diverse agile behaviors from human demos and solve unseen downstream tasks via test-time classifier guidance.
Human-as-Humanoid converts ego-exo human videos into executable 60-DoF humanoid actions through embodiment alignment and retargeting, enabling zero-shot real-robot policy deployment without target-task teleoperation data.
FADA is a three-stage Planner-IDM method that achieves few-shot domain adaptation for humanoid control by distilling an oracle policy then finetuning only the IDM on short target-domain rollouts via supervised learning.
CWI decouples MoCap data for upper-body manipulation and lower-body locomotion, using dual discriminators and multi-critic training plus distillation to produce a policy that works from hand poses and velocity commands alone.
Imagine2Real enables zero-shot humanoid-object interaction by unifying motions as 4D point trajectories, tracking only base/hands/object keypoints inside a BFM latent space, and training with progressive simple rewards for mocap deployment.
BifrostUMI enables robot-free human demonstration capture via VR and wrist cameras to train visuomotor policies that predict keypoint trajectories for transfer to humanoid whole-body control through retargeting.
RoSHI is a hybrid wearable that combines sparse IMUs and egocentric SLAM to capture accurate full-body 3D pose and shape data in natural environments for robot learning.
NMR uses VAE-based clustered expert physics refinement and a CNN-Transformer to learn dynamics-aware retargeting, eliminating joint jumps and self-collisions on Unitree G1 while accelerating downstream control policies.
A modular system uses motion matching to compose long-horizon human skill chains, trains RL experts, and distills them into a depth-based policy that lets a Unitree G1 humanoid autonomously climb, vault, and roll over obstacles up to 1.25 m tall.
HUSKY combines humanoid-skateboard dynamics modeling with adversarial motion priors and physics-guided lean-to-steer strategies to achieve real-world stable skateboarding on a humanoid robot.
RIGVid shows that filtered AI-generated videos can serve as effective supervision for complex robotic manipulation tasks without any real demonstrations.
DreamPolicy integrates an autoregressive diffusion world model with policy learning to produce a single scalable policy that generalizes to unseen composite terrains for humanoid locomotion.
CoT-VLA is a 7B VLA that generates future visual frames autoregressively as planning goals before actions, outperforming prior VLAs by 17% on real-world tasks and 6% in simulation.
DexVLA combines a scaled diffusion action expert with embodiment curriculum learning to achieve better generalization and performance than prior VLA models on diverse robot hardware and long-horizon tasks.
MuGen learns a generative latent representation of multi-skill humanoid locomotion from heterogeneous human data using VQ-VAEs and RL, then distills a deployable policy that tracks unseen motions and reuses the latent space.
HoloMotion-1 trains a MoE Transformer policy on hybrid video and MoCap motion data to achieve robust zero-shot tracking that transfers directly to real humanoid robots.
Switch enables humanoid robots to perform agile, seamless transitions between locomotion skills via a kinematic skill graph, DRL tracking policy, and real-time graph-search scheduler.
HTD, a multimodal transformer policy trained with behavioral cloning and touch dreaming to predict future tactile latents, achieves a 90.9% relative success rate improvement over baselines on five real-world contact-rich humanoid loco-manipulation tasks.
A literature review of pHHI that proposes a taxonomy of interaction types by modality and engagement level while outlining pathways to integrate control, intent, and modeling for more seamless humanoid-human collaboration.
Describes an integrated pipeline for curating motion data, adapting real-to-sim models, applying AMP-based RL, and deploying locomotion policies on Booster T1 and K1 humanoid robots.
citing papers explorer
- Unleashing Infinite Motion: Scaling Expressive Quadrupedal Motion via Generative Video Priors
Uni-Mo generates 7,488 language-annotated quadruped motions via LLM prompts and video diffusion, lifts them to 3D trajectories, and trains policies achieving 96.7% real-robot success on 392 sampled motions.
- Human-as-Humanoid: Enabling Zero-Shot Humanoid Learning from Ego-Exo Human Videos with Human-Aligned Embodiments
Human-as-Humanoid converts ego-exo human videos into executable 60-DoF humanoid actions through embodiment alignment and retargeting, enabling zero-shot real-robot policy deployment without target-task teleoperation data.
- FADA: Few-Shot Domain Adaptation via Dynamics Alignment for Humanoid Control
FADA is a three-stage Planner-IDM method that achieves few-shot domain adaptation for humanoid control by distilling an oracle policy then finetuning only the IDM on short target-domain rollouts via supervised learning.
- CWI: Composite Humanoid Whole-Body Imitation System for Loco-manipulation
CWI decouples MoCap data for upper-body manipulation and lower-body locomotion, using dual discriminators and multi-critic training plus distillation to produce a policy that works from hand poses and velocity commands alone.
- Imagine2Real: Towards Zero-shot Humanoid-Object Interaction via Video Generative Priors
Imagine2Real enables zero-shot humanoid-object interaction by unifying motions as 4D point trajectories, tracking only base/hands/object keypoints inside a BFM latent space, and training with progressive simple rewards for mocap deployment.
- BifrostUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation
BifrostUMI enables robot-free human demonstration capture via VR and wrist cameras to train visuomotor policies that predict keypoint trajectories for transfer to humanoid whole-body control through retargeting.
- RoSHI: A Versatile Robot-oriented Suit for Human Data In-the-Wild
RoSHI is a hybrid wearable that combines sparse IMUs and egocentric SLAM to capture accurate full-body 3D pose and shape data in natural environments for robot learning.
- Make Tracking Easy: Neural Motion Retargeting for Humanoid Whole-body Control
NMR uses VAE-based clustered expert physics refinement and a CNN-Transformer to learn dynamics-aware retargeting, eliminating joint jumps and self-collisions on Unitree G1 while accelerating downstream control policies.
- Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching
A modular system uses motion matching to compose long-horizon human skill chains, trains RL experts, and distills them into a depth-based policy that lets a Unitree G1 humanoid autonomously climb, vault, and roll over obstacles up to 1.25 m tall.
- HUSKY: Humanoid Skateboarding System via Physics-Aware Whole-Body Control
HUSKY combines humanoid-skateboard dynamics modeling with adversarial motion priors and physics-guided lean-to-steer strategies to achieve real-world stable skateboarding on a humanoid robot.
- MuGen: Multi-Skill Generative Locomotion Controller for Humanoid Robots
MuGen learns a generative latent representation of multi-skill humanoid locomotion from heterogeneous human data using VQ-VAEs and RL, then distills a deployable policy that tracks unseen motions and reuses the latent space.
- HoloMotion-1 Technical Report
HoloMotion-1 trains a MoE Transformer policy on hybrid video and MoCap motion data to achieve robust zero-shot tracking that transfers directly to real humanoid robots.
- Switch: Learning Agile Skills Switching for Humanoid Robots
Switch enables humanoid robots to perform agile, seamless transitions between locomotion skills via a kinematic skill graph, DRL tracking policy, and real-time graph-search scheduler.
- Learning Versatile Humanoid Manipulation with Touch Dreaming
HTD, a multimodal transformer policy trained with behavioral cloning and touch dreaming to predict future tactile latents, achieves a 90.9% relative success rate improvement over baselines on five real-world contact-rich humanoid loco-manipulation tasks.
- Booster Lab: A Data-Centric Pipeline for Learning Deployable Humanoid Locomotion Policies
Describes an integrated pipeline for curating motion data, adapting real-to-sim models, applying AMP-based RL, and deploying locomotion policies on Booster T1 and K1 humanoid robots.
- RPG: Robust Policy Gating for Smooth Multi-Skill Transitions in Humanoid Fighting
- Learn Weightlessness: Imitate Non-Self-Stabilizing Motions on Humanoid Robot