{"paper":{"title":"MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"MMSU benchmark shows current SpeechLLMs have substantial room for improvement in fine-grained spoken language understanding and reasoning.","cross_cats":["cs.SD","eess.AS"],"primary_cat":"cs.CL","authors_text":"Dingdong Wang, Dongchao Yang, Helen Meng, Jincenzi Wu, Junan Li, Tianhua Zhang, Xueyuan Chen","submitted_at":"2025-06-05T09:09:36Z","abstract_excerpt":"Speech inherently contains rich acoustic information that extends far beyond the textual language. In real-world spoken language understanding, effective interpretation often requires integrating semantic meaning (e.g., content), paralinguistic features (e.g., emotions, speed, pitch) and phonological characteristics (e.g., prosody, intonation, rhythm), which are embedded in speech. While recent multimodal Speech Large Language Models (SpeechLLMs) have demonstrated remarkable capabilities in processing audio information, their ability to perform fine-grained perception and complex reasoning in "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Through a rigorous evaluation of 14 advanced SpeechLLMs, we identify substantial room for improvement in existing models, highlighting meaningful directions for future optimization.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The 5,000 audio-question-answer triplets have been meticulously curated to fairly and comprehensively represent the targeted linguistic phenomena without introducing selection bias or annotation artifacts that would distort model comparisons.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"MMSU is a new benchmark with 5,000 curated audio-QA pairs across 47 linguistically grounded tasks that reveals substantial limitations in existing SpeechLLMs for fine-grained spoken language understanding and reasoning.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"MMSU benchmark shows current SpeechLLMs have substantial room for improvement in fine-grained spoken language understanding and reasoning.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"3f637408ee71faa0360be6441e48eb7f78d557260d0a65e837a766dbff650d78"},"source":{"id":"2506.04779","kind":"arxiv","version":3},"verdict":{"id":"82e7f7c0-b112-499e-80ee-1ee5f79d68bf","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T17:16:48.388929Z","strongest_claim":"Through a rigorous evaluation of 14 advanced SpeechLLMs, we identify substantial room for improvement in existing models, highlighting meaningful directions for future optimization.","one_line_summary":"MMSU is a new benchmark with 5,000 curated audio-QA pairs across 47 linguistically grounded tasks that reveals substantial limitations in existing SpeechLLMs for fine-grained spoken language understanding and reasoning.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The 5,000 audio-question-answer triplets have been meticulously curated to fairly and comprehensively represent the targeted linguistic phenomena without introducing selection bias or annotation artifacts that would distort model comparisons.","pith_extraction_headline":"MMSU benchmark shows current SpeechLLMs have substantial room for improvement in fine-grained spoken language understanding and reasoning."},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"dca2635067f684e6bff1cf0a06edad606b39ef3ad0e6c1fdacc896b8f4b952db"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}