Canonical reference
Humanplus: Humanoid shadowing and imitation from humans
Canonical reference. 80% of citing Pith papers cite this work as background.
citing papers explorer
- BeyondMimic: From Motion Tracking to Versatile Humanoid Control via Guided Diffusion
BeyondMimic combines compact motion tracking with a unified guided latent diffusion model to master diverse agile behaviors from human demos and solve unseen downstream tasks via test-time classifier guidance.
- Imagine2Real: Towards Zero-shot Humanoid-Object Interaction via Video Generative Priors
Imagine2Real enables zero-shot humanoid-object interaction by unifying motions as 4D point trajectories, tracking only base/hands/object keypoints inside a BFM latent space, and training with progressive simple rewards for mocap deployment.
- BifrostUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation
BifrostUMI enables robot-free human demonstration capture via VR and wrist cameras to train visuomotor policies that predict keypoint trajectories for transfer to humanoid whole-body control through retargeting.
- Learn Weightlessness: Imitate Non-Self-Stabilizing Motions on Humanoid Robot
The Weightlessness Mechanism lets humanoid robots imitate non-self-stabilizing motions by dynamically relaxing specific joints to exploit passive environmental contacts, generalizing from single demonstrations to varied setups.
- RoSHI: A Versatile Robot-oriented Suit for Human Data In-the-Wild
RoSHI is a hybrid wearable that combines sparse IMUs and egocentric SLAM to capture accurate full-body 3D pose and shape data in natural environments for robot learning.
- Make Tracking Easy: Neural Motion Retargeting for Humanoid Whole-body Control
NMR uses VAE-based clustered expert physics refinement and a CNN-Transformer to learn dynamics-aware retargeting, eliminating joint jumps and self-collisions on Unitree G1 while accelerating downstream control policies.
- Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching
A modular system uses motion matching to compose long-horizon human skill chains, trains RL experts, and distills them into a depth-based policy that lets a Unitree G1 humanoid autonomously climb, vault, and roll over obstacles up to 1.25 m tall.
- HUSKY: Humanoid Skateboarding System via Physics-Aware Whole-Body Control
HUSKY combines humanoid-skateboard dynamics modeling with adversarial motion priors and physics-guided lean-to-steer strategies to achieve real-world stable skateboarding on a humanoid robot.
- Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations
RIGVid shows that filtered AI-generated videos can serve as effective supervision for complex robotic manipulation tasks without any real demonstrations.
- DreamPolicy: A Unified World-model Policy for Scalable Humanoid Locomotion
DreamPolicy integrates an autoregressive diffusion world model with policy learning to produce a single scalable policy that generalizes to unseen composite terrains for humanoid locomotion.
- CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
CoT-VLA is a 7B VLA that generates future visual frames autoregressively as planning goals before actions, outperforming prior VLAs by 17% on real-world tasks and 6% in simulation.
- DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control
DexVLA combines a scaled diffusion action expert with embodiment curriculum learning to achieve better generalization and performance than prior VLA models on diverse robot hardware and long-horizon tasks.
- HoloMotion-1 Technical Report
HoloMotion-1 trains a MoE Transformer policy on hybrid video and MoCap motion data to achieve robust zero-shot tracking that transfers directly to real humanoid robots.
- RPG: Robust Policy Gating for Smooth Multi-Skill Transitions in Humanoid Fighting
RPG trains a single policy with transition and timing randomization for stable multi-skill fighting on humanoids, integrated with locomotion for arbitrary-duration combat.
- Switch: Learning Agile Skills Switching for Humanoid Robots
Switch enables humanoid robots to perform agile, seamless transitions between locomotion skills via a kinematic skill graph, DRL tracking policy, and real-time graph-search scheduler.
- Learning Versatile Humanoid Manipulation with Touch Dreaming
HTD, a multimodal transformer policy trained with behavioral cloning and touch dreaming to predict future tactile latents, achieves a 90.9% relative success rate improvement over baselines on five real-world contact-rich humanoid loco-manipulation tasks.
- Toward Seamless Physical Human-Humanoid Interaction: Insights from Control, Intent, and Modeling with a Vision for What Comes Next
A literature review of pHHI that proposes a taxonomy of interaction types by modality and engagement level while outlining pathways to integrate control, intent, and modeling for more seamless humanoid-human collaboration.