In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 578–585, Abu Dhabi, United Arab Emirates (Hybrid).
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
- Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs
  Cascaded systems remain the most reliable for speech translation overall, but recent SpeechLLMs match or outperform them in many conditions while standalone speech models lag.
- Why We Need Speech to Evaluate Speech Translation
  Meta-evaluation on gender and prosody contrastive datasets finds text and speech quality estimation metrics fall short at assessing speech-specific features, including newly trained SpeechCOMET models.