Motion Anything: Any to Motion Generation
6 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: unverdicted (6)
Citation roles: background (3)
Citation polarities: background (3)
Citing papers
-
TeMuDance: Contrastive Alignment-Based Textual Control for Music-Driven Dance Generation
TeMuDance enables text-based semantic control over music-conditioned dance generation by using motion as a bridge to align existing unpaired datasets and training a lightweight text branch on a frozen diffusion backbone with noise-filtered supervision.
-
ViBES: A Conversational Agent with Behaviorally-Intelligent 3D Virtual Body
ViBES introduces a speech-language-behavior model using modality-specific transformer experts that jointly generates dialogue and 3D body actions, showing gains over separate co-speech and text-to-motion baselines on multi-turn metrics.
-
AnchorRoute: Human Motion Synthesis with Interval-Routed Sparse Control
AnchorRoute couples anchor-conditioned generation via AnchorKV on a frozen text-to-motion diffusion prior with residual-routed refinement through RouteSolver on piecewise-affine interval bases.
-
PresentAgent-2: Towards Generalist Multimodal Presentation Agents
PresentAgent-2 generates query-driven multimodal presentation videos with research grounding, supporting single-speaker, multi-speaker discussion, and interactive question-answering modes.
-
UniMesh: Unifying 3D Mesh Understanding and Generation
UniMesh unifies 3D mesh generation and understanding in one model via a Mesh Head interface, Chain of Mesh iterative editing, and an Actor-Evaluator self-reflection loop.
-
MOGO: Residual Quantized Hierarchical Causal Transformer for High-Quality and Real-Time 3D Human Motion Generation
MOGO introduces MoSA-VQ residual quantization and RQHC-Transformer for efficient real-time text-to-3D-motion generation with competitive quality on HumanML3D, KIT-ML and CMP.