Analyzable chain-of-musical-thought prompting for high-fidelity music generation. arXiv preprint arXiv:2503.19611

Max WY Lam, Yijin Xing, Weiya You, Jingcheng Wu, Zongyu Yin, Fuqiang Jiang, Hangyu Liu, Feng Liu, Xingda Li, Wei-Tsung Lu, et al. · 2025 · arXiv 2503.19611

4 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

JenBridge: Adaptive Long-Form Video Soundtracking across Scene Transitions

cs.SD · 2026-06-01 · unverdicted · novelty 6.0

JenBridge pretrains a flow-matching Transformer on text-audio data then adapts it with video conditioning and an LLM director to select transitions, claiming better coherence than prior methods on a new LVS benchmark.

UniVocal: Unified Speech-Singing Code-Switching Synthesis

cs.SD · 2026-06-01 · unverdicted · novelty 6.0

UniVocal presents a text-context-only framework for speech-singing code-switching synthesis via two-stage curriculum learning and a synthetic data pipeline, claiming SOTA on a new benchmark.

LeVo 2: Stable and Melodious Song Generation via Hierarchical Representation Modeling and Progressive Post-Training

cs.SD · 2026-06-29 · unverdicted · novelty 5.0

LeVo 2 presents a hierarchical LLM-Diffusion model with progressive post-training stages to generate full-length songs that balance semantic planning, track-specific acoustics, and musicality.

SketchSong: Hierarchical Song Generation with Sketch Planning and Fine-Grained Multi-Track Modeling

cs.SD · 2026-06-02 · unverdicted · novelty 5.0

SketchSong uses temporal sketch planning with high-level tokens and explicit modeling of four tracks (vocals, bass, drums, other) to generate more coherent songs than baselines.

citing papers explorer


JenBridge: Adaptive Long-Form Video Soundtracking across Scene Transitions · cs.SD · 2026-06-01 · unverdicted · none · ref 11
UniVocal: Unified Speech-Singing Code-Switching Synthesis · cs.SD · 2026-06-01 · unverdicted · none · ref 52
LeVo 2: Stable and Melodious Song Generation via Hierarchical Representation Modeling and Progressive Post-Training · cs.SD · 2026-06-29 · unverdicted · none · ref 17
SketchSong: Hierarchical Song Generation with Sketch Planning and Fine-Grained Multi-Track Modeling · cs.SD · 2026-06-02 · unverdicted · none · ref 13
