Mospa: Human motion generation driven by spatial audio
3 Pith papers cite this work. Polarity classification is still indexing.
citation summary
- fields: cs.CV (3)
- years: 2026 (3)
- verdicts: UNVERDICTED (3)
- roles: background (1)
- polarities: background (1)

representative citing papers: MoScale, LLaMo
citing papers explorer
- PianoFlow: Music-Aware Streaming Piano Motion Generation with Bimanual Coordination
  PianoFlow generates coordinated bimanual piano motions from audio via MIDI-distilled flow matching, asymmetric role-gated interaction, and autoregressive streaming continuation, outperforming prior methods with 9x faster inference.
- Next-Scale Autoregressive Models for Text-to-Motion Generation
  MoScale introduces a hierarchical next-scale autoregressive framework for text-to-motion generation that achieves state-of-the-art performance by refining motions from coarse to fine temporal resolutions.
- LLaMo: Scaling Pretrained Language Models for Unified Motion Understanding and Generation with Continuous Autoregressive Tokens
  LLaMo scales pretrained LLMs for unified motion-language tasks by encoding motion into continuous causal latents and adding a flow-matching head for real-time autoregressive generation and captioning.