Music source separation with band-split RNN
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: eess.AS (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)

Citing papers
- Discrete Token Modeling for Multi-Stem Music Source Separation with Language Models
  A Conformer-conditioned decoder-only language model generates discrete tokens via a neural audio codec to separate four music stems, reaching near state-of-the-art perceptual quality and top NISQA on vocals in MUSDB18-HQ tests.
- UNet-Based Fusion and Exponential Moving Average Adaptation for Noise-Robust Speaker Recognition
  Feeding noisy and enhanced speech together into a speaker encoder with EMA adaptation from clean pre-training improves recognition accuracy under noise.