Muse: Towards reproducible long-form song generation with fine-grained style control
2 Pith papers cite this work. Polarity classification is still being indexed.
Fields: cs.SD (2). Years: 2026 (2). Verdicts: unverdicted (2).
Citing papers:
- LeVo 2: Stable and Melodious Song Generation via Hierarchical Representation Modeling and Progressive Post-Training
  LeVo 2 presents a hierarchical LLM-Diffusion model with progressive post-training stages to generate full-length songs that balance semantic planning, track-specific acoustics, and musicality.
- SketchSong: Hierarchical Song Generation with Sketch Planning and Fine-Grained Multi-Track Modeling
  SketchSong uses temporal sketch planning with high-level tokens and explicit modeling of four tracks (vocals, bass, drums, other) to generate songs that are more coherent than those of the baselines.