AudioMosaic learns general-purpose audio representations through contrastive pre-training with structured spectrogram masking, reaching state-of-the-art results on standard benchmarks and improving audio-language tasks.
Esdd 2026: Environmental sound deepfake detection challenge evaluation plan
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3representative citing papers
DeepFense supplies a unified toolkit and large-scale benchmarks showing that pre-trained front-end feature extractors drive most performance differences while top models exhibit strong biases by audio quality, speaker gender, and language.
EnvTriCascade is a tri-stage cascaded framework using mix-consistency detection followed by dual SSL-based five-class classifiers with cross-branch attention and RawBoost augmentation, achieving 0.8266 Macro-F1 on the ESDD2 2026 challenge test set.
citing papers explorer
-
AudioMosaic: Contrastive Masked Audio Representation Learning
AudioMosaic learns general-purpose audio representations through contrastive pre-training with structured spectrogram masking, reaching state-of-the-art results on standard benchmarks and improving audio-language tasks.
-
DeepFense: A Unified, Modular, and Extensible Framework for Robust Deepfake Audio Detection
DeepFense supplies a unified toolkit and large-scale benchmarks showing that pre-trained front-end feature extractors drive most performance differences while top models exhibit strong biases by audio quality, speaker gender, and language.
-
EnvTriCascade: An Environment-Aware Tri-Stage Cascaded Framework for ESDD2 2026 Challenge
EnvTriCascade is a tri-stage cascaded framework using mix-consistency detection followed by dual SSL-based five-class classifiers with cross-branch attention and RawBoost augmentation, achieving 0.8266 Macro-F1 on the ESDD2 2026 challenge test set.