SpeechMedAssist: Efficiently and Effectively Adapting Speech Language Models for Medical Consultation

2026 · cs.CL · arXiv 2601.04638

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Medical consultations are intrinsically speech-centric. However, most prior works focus on long-text-based interactions, which are cumbersome and patient-unfriendly. Recent advances in speech language models (SpeechLMs) have enabled more natural speech-based interaction, yet the scarcity of medical speech data and the inefficiency of directly fine-tuning on speech data jointly hinder the adoption of SpeechLMs in medical consultation. In this paper, we propose SpeechMedAssist, a SpeechLM natively capable of conducting speech-based multi-turn interactions with patients. By exploiting the architectural properties of SpeechLMs, we decouple the conventional one-stage training into a two-stage paradigm consisting of (1) Knowledge & Capability Injection via Text and (2) Modality Re-alignment with Limited Speech Data, thereby reducing the requirement for medical speech data to only 10k synthesized samples. To evaluate SpeechLMs for medical consultation scenarios, we design a benchmark comprising both single-turn question answering and multi-turn simulated interactions. Experimental results show that our model outperforms all baselines in both effectiveness and robustness in most evaluation settings.

representative citing papers

Beyond Isolated Behaviors: Hierarchical User Modeling for LLM Personalization

cs.CL · 2026-06-01 · unverdicted · novelty 5.0

PHF applies Bourdieu's Theory of Practice to create hierarchical user models for LLM personalization and reports consistent gains on the LaMP benchmark.
