A two-stage coarse-to-fine acoustic token language model with hybrid attention generates high-fidelity music and achieves emergent lyric alignment without separate semantic tokens.
1 Pith paper cites this work.
Fields: cs.SD (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
Khala: Scaling Acoustic Token Language Models Toward High-Fidelity Music Generation