Uni-Mo generates 7,488 language-annotated quadruped motions via LLM prompts and video diffusion, lifts them to 3D trajectories, and trains policies achieving 96.7% real-robot success on 392 sampled motions.
29 Pith papers cite this work. Polarity classification is still indexing.
Citation-role and citation-polarity summary
Fields: cs.RO (29)
Roles: background (4)
Polarities: background (4)
Representative citing papers
ReactiveBFM introduces a real-time closed-loop planning-control system for humanoids using curriculum-based error recovery and asynchronous replanning, achieving 93.1% success under severe perturbations in sim-to-sim tests.
FADA is a three-stage Planner-IDM method that achieves few-shot domain adaptation for humanoid control by distilling an oracle policy and then fine-tuning only the IDM on short target-domain rollouts via supervised learning.
CWI decouples MoCap data for upper-body manipulation and lower-body locomotion, using dual discriminators and multi-critic training plus distillation to produce a policy that works from hand poses and velocity commands alone.
Stubborn introduces a unified RL framework with yaw-aligned representation, Bernoulli probabilistic termination, and adaptive sampling for robust humanoid motion tracking and fall recovery.
EgoPriMo learns a unified egocentric motion prior with a Triple-stream DiT model that supports reconstruction, generation, and forecasting of SMPL motions from egocentric views and text, outperforming prior methods and transferring to humanoid controllers.
An MPC-based retargeting framework enables cross-morphology whole-body teleoperation from a single XR device via dynamic feasibility optimization, state synchronization, and SLAM feedback, with reported gains in simulation and real-world tests.
A data-centric approach shows that less than 3% of AMASS motion data, filtered by physics feasibility, diversity, and complexity, yields better humanoid tracking policies than the full dataset.
A multi-condition latent diffusion model transfers human motion styles to diverse humanoid robot contents with physics regularizations, achieving 96% success in real-robot trials on Unitree G1.
PHASOR factorizes motion into an FFT-based phase manifold and pose branch with semantic distillation to produce a cross-embodiment, human-anchored action embedding space for humanoid robots.
Lucid-XR uses XR-headset physics simulation and physics-guided video generation to create synthetic data that trains robot policies transferring zero-shot to unseen real-world manipulation tasks.
NMR uses VAE-based clustered expert physics refinement and a CNN-Transformer to learn dynamics-aware retargeting, eliminating joint jumps and self-collisions on Unitree G1 while accelerating downstream control policies.
Humanoid-LLA converts unconstrained natural language commands into stable whole-body motions for humanoid robots using a unified motion vocabulary and two-stage supervised-plus-reinforcement fine-tuning.
A multi-stage RL curriculum produces a unified whole-body controller enabling humanoid robots to sustain badminton rallies in simulation and return shuttles at up to 19.1 m/s on real hardware, with both EKF-based and prediction-free variants.
RIGVid shows that filtered AI-generated videos can serve as effective supervision for complex robotic manipulation tasks without any real demonstrations.
DreamPolicy integrates an autoregressive diffusion world model with policy learning to produce a single scalable policy that generalizes to unseen composite terrains for humanoid locomotion.
A single learned controller called MHC enables real humanoid robots to execute diverse whole-body behaviors from multi-modal inputs via masked target trajectories.
Marope applies hierarchical MARL with decentralized lower-level rope policies and a centralized scheduler to achieve cooperative long rope skipping on Unitree G1 humanoids in simulation and reality.
Humanoid-GPT is a causal Transformer pre-trained on a unified billion-scale motion dataset that tracks dynamic behaviors with zero-shot generalization to unseen motions and tasks.
ParkourFormer achieves 93.85% average success on multi-terrain humanoid parkour by fusing Transformer sequence modeling with supervised future-state prediction.
MuGen learns a generative latent representation of multi-skill humanoid locomotion from heterogeneous human data using VQ-VAEs and RL, then distills a deployable policy that tracks unseen motions and reuses the latent space.
Any2Any transfers humanoid whole-body tracking models across embodiments via kinematic alignment followed by targeted PEFT, matching full-training performance with 1% of the data and compute on tested platforms.
Switch enables humanoid robots to perform agile, seamless transitions between locomotion skills via a kinematic skill graph, DRL tracking policy, and real-time graph-search scheduler.
HTD, a multimodal transformer policy trained with behavioral cloning and touch dreaming to predict future tactile latents, achieves a 90.9% relative improvement in success rate over baselines on five real-world contact-rich humanoid loco-manipulation tasks.