The ETAPE corpus for the evaluation of speech-based TV content processing in the French language
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
A Comprehensive Analysis of Tokenization and Self-Supervised Learning in End-to-End Automatic Speech Recognition applied on French Language
Comparative study finds that tokenization choices and SSL pretraining models produce distinct effects on French ASR when assessed with linguistic and acoustic metrics beyond CER and WER.