arXiv preprint arXiv:2411.17335

VersatileMotion: A Unified Framework for Motion Synthesis, Comprehension · 2024 · arXiv 2411.17335

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

NextMotionQA: Benchmarking and Judging Human Motion Understanding with Vision-Language Models

cs.CV · 2026-06-03 · unverdicted · novelty 7.0

The NextMotionQA benchmark reveals that VLMs have critical gaps in fine-grained human motion understanding: they align with experts on coarse judgments (κ=0.70) but not on fine-grained ones (κ=0.10).

Plan, Don't Pose: Long Composite Motion Generation with Text-Aligned BFM

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

Text2BFM aligns language with a frozen BFM via a text-aligned variational behavioral bottleneck to generate long motions by decoding latents into policy actions.

AnyMo: Scaling Any-Modality Conditional Motion Generation with Masked Modeling

cs.CV · 2026-05-28 · unverdicted · novelty 6.0

AnyMo is a masked-modeling framework for any-modality human motion generation trained on the new OmniHuMo dataset of 5,000+ hours of multimodal motion sequences.

UMo: Unified Sparse Motion Modeling for Real-Time Co-Speech Avatars

cs.GR · 2026-05-14 · unverdicted · novelty 4.0

UMo is a sparse MoE-based unified model for real-time co-speech avatar animation; its authors claim superior quality under latency constraints via a keyframe-centric design and multi-stage audio-augmented training.

citing papers explorer

NextMotionQA: Benchmarking and Judging Human Motion Understanding with Vision-Language Models

cs.CV · 2026-06-03 · unverdicted · none · ref 26

The NextMotionQA benchmark reveals that VLMs have critical gaps in fine-grained human motion understanding: they align with experts on coarse judgments (κ=0.70) but not on fine-grained ones (κ=0.10).

AnyMo: Scaling Any-Modality Conditional Motion Generation with Masked Modeling

cs.CV · 2026-05-28 · unverdicted · none · ref 23

AnyMo is a masked-modeling framework for any-modality human motion generation trained on the new OmniHuMo dataset of 5,000+ hours of multimodal motion sequences.
