TM2T: Stochastic and tokenized modeling for the reciprocal generation of 3D human motions and texts
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
representative citing papers
PolySLGen generates contextually appropriate and temporally coherent multimodal speaking and listening reactions for polyadic interactions by fusing group motion and social cues.
PhysiGen reduces interpenetration in text-driven 3D human interaction generation by simplifying meshes to geometric primitives for fast collision detection and guiding optimization with collision regions.
citing papers explorer
- Marrying Text-to-Motion Generation with Skeleton-Based Action Recognition
CoAMD unifies skeleton-based action recognition and text-to-motion generation through autoregressive diffusion guided by a multi-modal recognizer, reporting SOTA results on 13 benchmarks for four tasks.
- PolySLGen: Online Multimodal Speaking-Listening Reaction Generation in Polyadic Interaction
PolySLGen generates contextually appropriate and temporally coherent multimodal speaking and listening reactions for polyadic interactions by fusing group motion and social cues.
- PhysiGen: Integrating Collision-Aware Physical Constraints for High-Fidelity Human-Human Interaction Generation
PhysiGen reduces interpenetration in text-driven 3D human interaction generation by simplifying meshes to geometric primitives for fast collision detection and guiding optimization with collision regions.