Live Music Diffusion Models adapt bidirectional diffusion for interactive music generation via KV caching and ARC-Forcing, recovering and exceeding discrete autoregressive efficiency while enabling post-training alignment without RL.
Puonti, Oula, Juan Eugenio Iglesias, and Koen Van Leemput
4 Pith papers cite this work. Polarity classification is still indexing.
years
2026 4verdicts
UNVERDICTED 4representative citing papers
SemaVoice adds SFM-guided alignment to refine continuous speech representations in autoregressive TTS, reporting 1.71% English WER on Seed-TTS and competitiveness with open-source SOTA.
TimeLesSeg delivers a unified contrast-agnostic CNN for MS lesion segmentation that seamlessly handles both cross-sectional and longitudinal inputs by combining empty prior masks with stochastic morphological deformation of lesions during training.
LLaMo scales pretrained LLMs for unified motion-language tasks by encoding motion into continuous causal latents and adding a flow-matching head for real-time autoregressive generation and captioning.
citing papers explorer
-
Live Music Diffusion Models: Efficient Fine-Tuning and Post-Training of Interactive Diffusion Music Generators
Live Music Diffusion Models adapt bidirectional diffusion for interactive music generation via KV caching and ARC-Forcing, recovering and exceeding discrete autoregressive efficiency while enabling post-training alignment without RL.
-
SemaVoice: Semantic-Aware Continuous Autoregressive Speech Synthesis
SemaVoice adds SFM-guided alignment to refine continuous speech representations in autoregressive TTS, reporting 1.71% English WER on Seed-TTS and competitiveness with open-source SOTA.
-
TimeLesSeg: Unified Contrast-Agnostic Cross-Sectional and Longitudinal MS Lesion Segmentation via a Stochastic Generative Model
TimeLesSeg delivers a unified contrast-agnostic CNN for MS lesion segmentation that seamlessly handles both cross-sectional and longitudinal inputs by combining empty prior masks with stochastic morphological deformation of lesions during training.
-
LLaMo: Scaling Pretrained Language Models for Unified Motion Understanding and Generation with Continuous Autoregressive Tokens
LLaMo scales pretrained LLMs for unified motion-language tasks by encoding motion into continuous causal latents and adding a flow-matching head for real-time autoregressive generation and captioning.