Mixing 636 hours of LLM-generated synthetic conversations with 67 hours of real data outperforms a model trained on 2700 hours of real Hungarian speech on the BEA-Dialogue benchmark.
Best Data is more Supervised Data – Even for Hungarian ASR , year =
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Efficient ASR Training with Conversations that Never Happened
Mixing 636 hours of LLM-generated synthetic conversations with 67 hours of real data outperforms a model trained on 2700 hours of real Hungarian speech on the BEA-Dialogue benchmark.