VoxCeleb2: Deep speaker recognition
3 Pith papers cite this work.
fields: eess.AS (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
citing papers explorer
- Ring Mixing with Auxiliary Signal-to-Consistency-Error Ratio Loss for Unsupervised Denoising in Speech Separation
Ring mixing and SCER loss break symmetry in noisy speech separation training, allowing models to learn denoising from noisy mixtures alone and halve residual noise on benchmarks.
- LRS-VoxMM: A benchmark for in-the-wild audio-visual speech recognition
LRS-VoxMM is a new in-the-wild AVSR benchmark that is harder than LRS3 and shows that visual information becomes more valuable as acoustic conditions degrade.
- UNet-Based Fusion and Exponential Moving Average Adaptation for Noise-Robust Speaker Recognition
Feeding noisy and enhanced speech together into a speaker encoder with EMA adaptation from clean pre-training improves recognition accuracy under noise.