Rethinking diffusion for text-driven human motion generation
4 Pith papers cite this work.
citing papers explorer
- ScaleMoGen: Autoregressive Next-Scale Prediction for Human Motion Generation
  ScaleMoGen introduces a scale-wise autoregressive framework that quantizes motions into hierarchical discrete tokens and predicts next-scale maps. It reports a state-of-the-art FID of 0.030 on HumanML3D and supports text-guided editing.
- SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control
  Scaling motion tracking models in model size, data volume, and compute produces a foundation model for natural, robust humanoid whole-body control, with downstream uses in kinematic planning and vision-language-action models.
- MARRS: Masked Autoregressive Unit-based Reaction Synthesis
  MARRS synthesizes fine-grained reaction motions using a unit-distinguished VAE, masked action-conditioned fusion, mutual unit modulation, and compact MLP diffusion predictors.
- Next-Scale Autoregressive Models for Text-to-Motion Generation