ArtifactNet extracts codec residuals from spectrograms with a 4M-parameter network to detect AI music at F1=0.9829 and 1.49% FPR on unseen tracks from 22 generators, outperforming larger baselines.
Music source separation in the waveform domain
4 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
EnCodec is an end-to-end trained streaming neural audio codec that uses a single multiscale spectrogram discriminator and a gradient-normalizing loss balancer to achieve higher fidelity than prior methods at the same bitrates for 24 kHz mono and 48 kHz stereo audio.
MAGE unifies text, visual, and audio-conditioned music generation and editing in one flow-based latent model with dynamic modality masking and cross-gated control.
A Conformer-conditioned decoder-only language model generates discrete tokens via a neural audio codec to separate four music stems, reaching near state-of-the-art perceptual quality and top NISQA on vocals in MUSDB18-HQ tests.
citing papers explorer
-
ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics
ArtifactNet extracts codec residuals from spectrograms with a 4M-parameter network to detect AI music at F1=0.9829 and 1.49% FPR on unseen tracks from 22 generators, outperforming larger baselines.
-
High Fidelity Neural Audio Compression
EnCodec is an end-to-end trained streaming neural audio codec that uses a single multiscale spectrogram discriminator and a gradient-normalizing loss balancer to achieve higher fidelity than prior methods at the same bitrates for 24 kHz mono and 48 kHz stereo audio.
-
MAGE: Modality-Agnostic Music Generation and Editing
MAGE unifies text, visual, and audio-conditioned music generation and editing in one flow-based latent model with dynamic modality masking and cross-gated control.
-
Discrete Token Modeling for Multi-Stem Music Source Separation with Language Models
A Conformer-conditioned decoder-only language model generates discrete tokens via a neural audio codec to separate four music stems, reaching near state-of-the-art perceptual quality and top NISQA on vocals in MUSDB18-HQ tests.