Standard-to-Dialect Transfer Trends Differ across Text and Speech: A Case Study on Intent and Topic Classification in German Dialects

2025 · cs.CL · arXiv 2510.07890

1 Pith paper cites this work.


abstract

Research on cross-dialectal transfer from a standard to a non-standard dialect variety has typically focused on text data. However, dialects are primarily spoken, and non-standard spellings cause issues in text processing. We compare standard-to-dialect transfer in three settings: text models, speech models, and cascaded systems where speech first gets automatically transcribed and then further processed by a text model. We focus on German dialects in the context of written and spoken intent classification -- releasing the first dialectal audio intent classification dataset -- with supporting experiments on topic classification. The speech-only setup provides the best results on the dialect data while the text-only setup works best on the standard data. While the cascaded systems lag behind the text-only models for German, they perform relatively well on the dialectal data if the transcription system generates normalized, standard-like output.
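The cascaded setup described in the abstract chains two models: an ASR system transcribes the audio, and a text classifier labels the transcript. The sketch below shows only that composition and an accuracy score. It assumes hypothetical callables in place of real models. The names `make_cascade` and `accuracy`, the toy stand-ins, and the intent labels are illustrative only. They are not the paper's code, models, or datasets.

```python
from typing import Callable, Sequence

# Hypothetical type aliases: audio is a sequence of samples, labels are intent strings.
Audio = Sequence[float]
TextClassifier = Callable[[str], str]
SpeechClassifier = Callable[[Audio], str]
Transcriber = Callable[[Audio], str]


def make_cascade(transcribe: Transcriber, classify_text: TextClassifier) -> SpeechClassifier:
    """Build a cascaded system: transcribe the audio first, then classify the transcript."""
    def cascade(audio: Audio) -> str:
        transcript = transcribe(audio)
        return classify_text(transcript)
    return cascade


def accuracy(predict: Callable, inputs: Sequence, gold: Sequence[str]) -> float:
    """Fraction of inputs whose predicted label equals the gold label."""
    if len(inputs) != len(gold):
        raise ValueError("inputs and gold labels must have the same length")
    correct = sum(predict(x) == y for x, y in zip(inputs, gold))
    return correct / len(gold)


# Toy stand-ins (hypothetical, purely illustrative; not real ASR or text models).
def toy_transcribe(audio: Audio) -> str:
    return "set an alarm" if sum(audio) > 0 else "play music"


def toy_classify_text(text: str) -> str:
    return "alarm_set" if "alarm" in text else "play_music"


cascade = make_cascade(toy_transcribe, toy_classify_text)
print(accuracy(cascade, [[0.1, 0.2], [-0.3]], ["alarm_set", "play_music"]))  # 1.0
```

In this framing, the text-only setup calls the text classifier directly on written input, and the speech-only setup is a single `SpeechClassifier` with no intermediate transcript. The abstract's observation that cascades do better when transcripts are normalized corresponds to the quality of what `transcribe` hands to `classify_text`.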

representative citing papers

Subtitle-Aligned Fine-Tuning of Whisper for Swiss German ASR: Benchmark Contamination, Convention Mismatch, and an Honest Baseline at 25.6% WER (13.8% cWER)

cs.CL · 2026-05-29 · unverdicted · novelty 7.0

Fine-tuning Whisper on Swiss German speech with subtitle supervision yields an honest 25.6% WER baseline (13.8% cWER) and demonstrates that prior state-of-the-art claims of 17% WER stem from benchmark contamination, which lets a model reach 13.88% WER without any dialect training.
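The summary above reports results as word error rate (WER). WER is the minimum number of word substitutions S, deletions D, and insertions I needed to turn the hypothesis into the reference, divided by the number of reference words N, so WER = (S + D + I) / N. The sketch below computes it with a word-level Levenshtein distance. It assumes plain whitespace tokenization with no casing or punctuation normalization, and the function name `word_error_rate` is hypothetical. It does not compute cWER, because the summary does not define that metric.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumes whitespace tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the previous reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 0.333...
```

Insertions count toward the error, so WER can exceed 100% when the hypothesis is much longer than the reference. Reported scores also depend on how reference and hypothesis text are normalized before scoring.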
