arXiv: 2606.06211 · v1 · pith:WZKPQL46new · submitted 2026-06-04 · cs.CL · cs.SD · eess.AS

FiLM-Based Speaker Conditioning of a SpeechLLM for Pathological Speech Recognition

classification: cs.CL · cs.SD · eess.AS
keywords: speech · pathological · conditioning · model · recognition · speaker · standard · ability

Automatic speech recognition (ASR) has advanced remarkably for standard speech; however, pathological speech arising from neurological conditions remains a significant challenge. We investigate speaker conditioning via Feature-wise Linear Modulation (FiLM), injecting x-vector-derived information into each transformer layer of a frozen ASR encoder to adapt internal representations to individual pathological speakers without modifying base model weights. We benchmark this approach on the ASR task against standard and parameter-efficient fine-tuning baselines, complemented by post-processing, on Spanish and English pathological speech. Additionally, we evaluate whether the adapted model preserves the ability to answer speech-related questions. Results show that speaker-conditioned ASR is competitive with established adaptation strategies while retaining performance on non-conditioned speech.
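To make the conditioning mechanism concrete, here is a minimal NumPy sketch of FiLM applied to a frozen encoder. It is an illustration under stated assumptions, not the paper's implementation: the `tanh` layers stand in for frozen transformer layers, the modulation is applied to each layer's output, and the identity initialization (scale 1, shift 0) is a common choice that the abstract does not confirm. All names (`film`, `make_film_params`, `encode`) and all dimensions are hypothetical.

```python
import numpy as np

def film(hidden, xvector, w_gamma, b_gamma, w_beta, b_beta):
    """Scale and shift every frame of `hidden` (T, D) using a speaker vector (S,)."""
    gamma = xvector @ w_gamma + b_gamma  # per-feature scale, shape (D,)
    beta = xvector @ w_beta + b_beta     # per-feature shift, shape (D,)
    return gamma * hidden + beta         # broadcasts over the time axis

def make_film_params(spk_dim, hidden_dim):
    # Assumption: identity initialization, so the adapter starts as a no-op.
    return {
        "w_gamma": np.zeros((spk_dim, hidden_dim)),
        "b_gamma": np.ones(hidden_dim),
        "w_beta": np.zeros((spk_dim, hidden_dim)),
        "b_beta": np.zeros(hidden_dim),
    }

def encode(features, frozen_weights, film_params=None, xvector=None):
    """Run the stand-in encoder, optionally modulating each layer's output."""
    hidden = features
    for i, w in enumerate(frozen_weights):
        hidden = np.tanh(hidden @ w)  # stand-in for a frozen transformer layer
        if film_params is not None:
            hidden = film(hidden, xvector, **film_params[i])
    return hidden

rng = np.random.default_rng(0)
T, D, S, L = 5, 8, 4, 3  # frames, hidden size, x-vector size, layers (all illustrative)
features = rng.normal(size=(T, D))
xvector = rng.normal(size=S)
frozen_weights = [0.1 * rng.normal(size=(D, D)) for _ in range(L)]  # never updated
film_params = [make_film_params(S, D) for _ in range(L)]            # the only trainable part

baseline = encode(features, frozen_weights)
conditioned = encode(features, frozen_weights, film_params, xvector)
```

With the identity initialization, `conditioned` equals `baseline`, so the adapter starts from the behaviour of the unmodified encoder. Only the entries of `film_params` would be trained in this sketch; `frozen_weights` stay fixed, which mirrors the abstract's point that base model weights are not modified.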
