Activation Steering for Accent Adaptation in Large Audio Language Models
1 Pith paper cites this work.
abstract
Accent variability remains a major source of errors in automatic speech recognition, yet most adaptation methods rely on parameter fine-tuning without understanding where accent information is encoded. We treat accent variation as an interpretable subspace in hidden representations and investigate whether it can be identified and controlled directly in activation space. We extract layer-wise encoder activations and estimate mean-shift directions capturing accent-induced representation shifts. By injecting these directions into individual layers and measuring how they align accented and standard embeddings, we derive a layer-wise accent sensitivity profile, revealing that accent information concentrates in a narrow band of middle encoder layers. Leveraging this structure, we further introduce parameter-free accent steering that modifies representations during inference without updating model weights. Experiments across eight accents show consistent word error rate reductions.
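The pipeline the abstract describes (estimate a per-layer mean-shift direction, inject it, and score how well accented and standard representations align) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes activations are already pooled to one vector per utterance, uses cosine similarity between centroids as a stand-in alignment measure, scores alignment at the steered layer itself rather than downstream, and runs on synthetic data. All function names, including `make_toy_layers`, are hypothetical.

```python
import numpy as np


def mean_shift_direction(accented, standard):
    """Accent-induced shift at one layer: mean(accented) - mean(standard).

    Both inputs have shape (num_utterances, hidden_dim).
    """
    return accented.mean(axis=0) - standard.mean(axis=0)


def steer(activations, shift, alpha=1.0):
    """Move activations toward the standard accent by removing the shift."""
    return activations - alpha * shift


def centroid_cosine(a, b):
    """Cosine similarity between the mean vectors of two activation sets."""
    ma, mb = a.mean(axis=0), b.mean(axis=0)
    return float(ma @ mb / (np.linalg.norm(ma) * np.linalg.norm(mb)))


def sensitivity_profile(accented_by_layer, standard_by_layer, alpha=1.0):
    """Per-layer gain in accented/standard alignment after steering that layer."""
    gains = []
    for acc, std in zip(accented_by_layer, standard_by_layer):
        shift = mean_shift_direction(acc, std)
        before = centroid_cosine(acc, std)
        after = centroid_cosine(steer(acc, shift, alpha), std)
        gains.append(after - before)
    return np.array(gains)


def make_toy_layers(num_layers=3, accent_layer=1, n=200, dim=16, seed=0):
    """Synthetic stand-in for encoder activations (assumption, not model data)."""
    rng = np.random.default_rng(seed)
    offset = np.zeros(dim)
    offset[0] = 5.0  # toy accent shift, present only at `accent_layer`
    accented, standard = [], []
    for layer in range(num_layers):
        std = np.ones(dim) + 0.1 * rng.normal(size=(n, dim))
        acc = np.ones(dim) + 0.1 * rng.normal(size=(n, dim))
        if layer == accent_layer:
            acc = acc + offset
        accented.append(acc)
        standard.append(std)
    return accented, standard


accented, standard = make_toy_layers()
profile = sensitivity_profile(accented, standard)
print("most accent-sensitive toy layer:", int(profile.argmax()))
```

In this toy setup the shift is planted in a single layer, so the profile peaks there. With a real encoder the activations would come from forward hooks on each layer, and the strength `alpha` would need tuning. Neither detail is specified in the abstract.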
fields
cs.CL 1
years
2026 1
verdicts
UNVERDICTED 1
representative citing papers
SALSA: Speech Aware LLM Adaptation via Learned Steering Activation Vectors
SALSA adapts speech-aware LLMs via supervised layer-wise steering vectors, reporting up to 46.8% relative gains over zero-shot on out-of-domain speech benchmarks.