LLMs exhibit a persistent modality gap versus specialized audio encoders on MSEB tasks, with no conclusive evidence favoring audio-native over cascaded architectures.
Speech-massive: A multilingual speech dataset for slu and be- yond,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.SD 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Benchmarking LLMs on the Massive Sound Embedding Benchmark (MSEB)
LLMs exhibit a persistent modality gap versus specialized audio encoders on MSEB tasks, with no conclusive evidence favoring audio-native over cascaded architectures.