Fine-tuned speech representation models with hierarchical classification outperform multimodal LLMs on pediatric speech sound disorder tasks.
Multimodal LLMs are not all you need for Pediatric Speech Language Pathology
Abstract
Speech Sound Disorders (SSDs) affect roughly five percent of children, yet speech-language pathologists face severe staffing shortages and unmanageable caseloads. We test a hierarchical approach to SSD classification on the granular, multi-task SLPHelmUltraSuitePlus benchmark, cascading from binary classification to disorder-type and then symptom classification. By fine-tuning Speech Representation Models (SRMs) and using targeted data augmentation, we mitigate biases identified in previous work and improve on all clinical tasks in the benchmark. We also apply our data augmentation approach to Automatic Speech Recognition (ASR). Our results demonstrate that SRMs consistently outperform the LLM-based state of the art across all evaluated tasks by a large margin. We publish our models and code to foster future research.
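The cascading idea in the abstract can be sketched as follows. This is a minimal, hypothetical illustration of the control flow only: the stub predictors passed in stand in for the paper's fine-tuned speech representation models, and all names (`cascade`, `CascadeResult`, the label strings) are assumptions, not the authors' API.

```python
# Sketch of a hierarchical cascade: a binary SSD gate, then disorder-type
# classification, then symptom classification. Later stages run only when
# the binary stage detects a disorder.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CascadeResult:
    disordered: bool
    disorder_type: Optional[str] = None  # filled only if disordered
    symptom: Optional[str] = None        # filled only if disordered


def cascade(
    features: dict,
    is_disordered: Callable[[dict], bool],   # stage 1: binary classification
    classify_type: Callable[[dict], str],    # stage 2: disorder type
    classify_symptom: Callable[[dict], str], # stage 3: symptom
) -> CascadeResult:
    """Run each stage only if the previous stage's gate fired."""
    if not is_disordered(features):
        return CascadeResult(disordered=False)
    return CascadeResult(
        disordered=True,
        disorder_type=classify_type(features),
        symptom=classify_symptom(features),
    )


# Toy usage with rule-based stubs in place of the fine-tuned SRMs.
result = cascade(
    {"score": 0.9},
    is_disordered=lambda f: f["score"] > 0.5,
    classify_type=lambda f: "articulation",
    classify_symptom=lambda f: "substitution",
)
```

The benefit of this structure is that typical-speech inputs exit after the cheap binary stage, and each downstream classifier is trained on a narrower, better-balanced label space.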
Fields: cs.CL
Year: 2026
Verdict: UNVERDICTED