Kuielab-mdx-net: A two-stream neural network for music demixing
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: eess.AS (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
PS-TTS and PS-Comet TTS use isochrony via language model paraphrasing plus phonetic synchronization with DTW on vowel distances to achieve better lip-sync and semantic preservation in automated dubbing than standard TTS or voice actors on tested language pairs.
citing papers explorer
- Discrete Token Modeling for Multi-Stem Music Source Separation with Language Models
A Conformer-conditioned decoder-only language model generates discrete tokens via a neural audio codec to separate four music stems, reaching near state-of-the-art perceptual quality and top NISQA on vocals in MUSDB18-HQ tests.
- PS-TTS: Phonetic Synchronization in Text-to-Speech for Achieving Natural Automated Dubbing
PS-TTS and PS-Comet TTS use isochrony via language model paraphrasing plus phonetic synchronization with DTW on vowel distances to achieve better lip-sync and semantic preservation in automated dubbing than standard TTS or voice actors on tested language pairs.