pith. machine review for the scientific record.

Music source separation in the waveform domain

4 Pith papers cite this work. Polarity classification is still indexing.


Citing papers by year: 2026 (3), 2022 (1)

representative citing papers

High Fidelity Neural Audio Compression

eess.AS · 2022-10-24 · accept · novelty 7.0

EnCodec is an end-to-end trained streaming neural audio codec that uses a single multiscale spectrogram discriminator and a gradient-normalizing loss balancer to achieve higher fidelity than prior methods at the same bitrates for 24 kHz mono and 48 kHz stereo audio.

MAGE: Modality-Agnostic Music Generation and Editing

cs.SD · 2026-04-10 · unverdicted · novelty 6.0

MAGE unifies text, visual, and audio-conditioned music generation and editing in one flow-based latent model with dynamic modality masking and cross-gated control.

citing papers explorer


  • ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics · cs.SD · 2026-04-17 · unverdicted · none · ref 19

    ArtifactNet extracts codec residuals from spectrograms with a 4M-parameter network to detect AI music at F1=0.9829 and 1.49% FPR on unseen tracks from 22 generators, outperforming larger baselines.

  • High Fidelity Neural Audio Compression · eess.AS · 2022-10-24 · accept · none · ref 8

    EnCodec is an end-to-end trained streaming neural audio codec that uses a single multiscale spectrogram discriminator and a gradient-normalizing loss balancer to achieve higher fidelity than prior methods at the same bitrates for 24 kHz mono and 48 kHz stereo audio.

  • MAGE: Modality-Agnostic Music Generation and Editing · cs.SD · 2026-04-10 · unverdicted · none · ref 5

    MAGE unifies text, visual, and audio-conditioned music generation and editing in one flow-based latent model with dynamic modality masking and cross-gated control.

  • Discrete Token Modeling for Multi-Stem Music Source Separation with Language Models · eess.AS · 2026-04-10 · unverdicted · none · ref 16

    A Conformer-conditioned decoder-only language model generates discrete tokens via a neural audio codec to separate four music stems, reaching near state-of-the-art perceptual quality and top NISQA on vocals in MUSDB18-HQ tests.