Pretraining audio SSL encoders on diverse French broadcast content rather than clean speech yields better downstream performance on ASR, music detection, and speaker recognition, with deduplication mitigating memorization.
We also assess the ability of our models to recall their pretraining dataset with a membership inference attack.
Cited by 1 Pith paper (field: eess.AS; year: 2026; verdict: UNVERDICTED):
Data Selection Effects on Self-Supervised Learning of Audio Representations for French Audiovisual Broadcasts