Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders

Assel Yermekova; Georgii Aparin; Tasnima Sadekova; Vadim Popov

arxiv: 2606.07473 · v1 · pith:JFZJCQ2Nnew · submitted 2026-06-05 · 💻 cs.SD · cs.AI

Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders

Georgii Aparin , Vadim Popov , Tasnima Sadekova , Assel Yermekova This is my paper

Pith reviewed 2026-06-27 20:43 UTC · model grok-4.3

classification 💻 cs.SD cs.AI

keywords Whisper ASRhallucination mitigationSparse AutoEncodersrepresentation steeringaudio hallucinationsencoder activationsnon-speech detectionactivation steering

0 comments

The pith

Steering Sparse AutoEncoder latents in Whisper cuts non-speech hallucinations from 87 percent to 27 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether hallucinations in the Whisper ASR model, which produces coherent but false transcripts on non-speech audio, can be detected and reduced by changing internal encoder activations. It shows that both raw activations and Sparse AutoEncoder latents hold linearly separable signals for these errors, with the signal growing stronger in deeper layers and concentrated in sparse features. Two steering methods are introduced, and SAE latent steering produces the largest drops in hallucination rate on non-speech test sets while keeping word error rate on speech inputs nearly unchanged. This reaches performance close to fine-tuning approaches without retraining the full model. A reader would care because many real-world uses of ASR involve noisy, silent, or ambiguous audio where false outputs undermine reliability.

Core claim

Whisper generates hallucinations on non-speech inputs. These errors are detectable in audio encoder activations, where the relevant information is linearly separable and concentrated in sparse subsets that become more prominent in deeper layers. Activation-space steering and SAE latent-space steering both reduce the problem, with SAE steering lowering the hallucination rate from 72.63 percent to 14.11 percent for the small model and from 86.88 percent to 27.33 percent for large-v3 on the full non-speech test set, accompanied by only small WER degradation on speech data.

What carries the argument

SAE latent-space steering, which locates and adjusts specific latents tied to hallucination features to shift model outputs away from false transcripts.

If this is right

Hallucination rates fall sharply on complete non-speech test collections.
Word error rate on speech data shows only minor increases.
The method reaches results comparable to fine-tuning without model retraining.
Discriminative power concentrates in sparse feature subsets and strengthens toward deeper encoder layers.
Both raw activations and SAE latents encode the hallucination information in a linearly separable way.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could extend to other encoder-based ASR models by extracting similar activation spaces.
Real-time steering during inference might enable on-the-fly correction for live audio streams.
Pairing the method with separate hallucination detectors could create layered safeguards.
The observed linear separability hints that lighter classifiers might suffice for detection alone.

Load-bearing premise

Hallucination signals remain linearly separable in the activations and SAE latents, so steering them reduces errors without creating new mistakes on speech inputs.

What would settle it

Applying the identified steering vectors to a new non-speech dataset yields no drop in hallucination rate or produces a large rise in word error rate on speech audio.

Figures

Figures reproduced from arXiv: 2606.07473 by Assel Yermekova, Georgii Aparin, Tasnima Sadekova, Vadim Popov.

**Figure 1.** Figure 1: Layer-wise AUC scores for classifiers trained on raw Whisper activations (left) and SAE latent representations (right) for Whisper large-v3. Each curve corresponds to a test dataset, except for the CV curve, which displays the cross-validation score on the training set. All results correspond to training on the FULL-train dataset [PITH_FULL_IMAGE:figures/full_fig_p006_1.png] view at source ↗

**Figure 2.** Figure 2: AUC score against the number of top-k SAE features for Whisper small and large-v3, with cross-validation (CV) and FULL-test curves shown for both models, averaged by layers. tive. This makes additive steering more effective for non-speech inputs, where hallucinations may depend on a small set of misactivated or missing latent features [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 4.** Figure 4: Comparison of additive and multiplicative SAE steering across top-k values for Whisper small and large-v3, in terms of HR and WER (α = 0.5 and α = 1.5 respectively). α = 0 denotes inference without steering. 5.3. Reduction of Hallucinations To contextualise our results, we compare our approach against Calm-Whisper [18], which represents the closest alternative targeting the same problem. Calm-Whisper ide… view at source ↗

**Figure 5.** Figure 5: Trade-off between HR, CER and WER for SAE-based additive steering for small (top) and large-v3 (bottom) [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

read the original abstract

Whisper, a widely adopted ASR model, is known to suffer from hallucinations - coherent transcriptions generated for non-speech audio entirely disconnected from the input. We investigate whether hallucinations can be detected and mitigated through Whisper's internal representations. We extract audio encoder activations and evaluate two representation spaces: raw Whisper activations and Sparse AutoEncoder (SAE) latents. We show that both spaces encode linearly separable hallucination-related information, with discriminative power concentrated in a sparse feature subset and increasing toward deeper encoder layers. We propose two steering strategies: activation-space steering and SAE latent-space steering. SAE-based steering reduces hallucination rate from 72.63% to 14.11% for Whisper small and from 86.88% to 27.33% for Whisper large-v3 on the full non-speech test set, with small WER degradation on speech data, approaching the performance of fine-tuning-based methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SAE steering cuts reported hallucination rates on non-speech audio but the abstract and stress-test note leave the generalization claim unverified.

read the letter

The core result is that SAE latent steering drops hallucination rates from 72.63% to 14.11% on Whisper small and 86.88% to 27.33% on large-v3 for their non-speech test set, while keeping WER degradation small on speech inputs. That is the concrete empirical claim.

They extract encoder activations, compare raw space to SAE latents, show hallucination information is linearly separable and concentrated in deeper layers plus a sparse subset of features, then test two steering approaches. The SAE version performs better and approaches fine-tuning results without retraining the model. This is a straightforward new application of existing steering and SAE techniques to a known Whisper failure mode.

The soft spot is the complete lack of experimental detail in what is provided. No description of the non-speech dataset, how it was collected, whether it overlaps with SAE training data, train/test splits, baseline comparisons with numbers, or variance across runs. The stress-test concern about possible in-distribution evaluation is reasonable given the absence of that information; if the test non-speech shares distribution with the SAE data, the separability and steering effect could be narrower than presented.

The work is aimed at people doing representation interventions on audio models or trying to harden ASR without full retraining. It shows clear thinking on the problem but the evidence is too thin for strong conclusions yet. A serious referee could check the methods and data protocol if the authors supply them; right now it is borderline for review.

Referee Report

2 major / 1 minor

Summary. The paper claims that hallucinations in Whisper ASR models can be detected and mitigated via steering of hidden representations in the audio encoder, using both raw activations and Sparse AutoEncoder (SAE) latents. It reports that SAE-based steering reduces hallucination rates from 72.63% to 14.11% (Whisper small) and 86.88% to 27.33% (Whisper large-v3) on a full non-speech test set, with only small WER degradation on speech data and performance approaching fine-tuning methods. Hallucination-related information is described as linearly separable, with power concentrated in sparse SAE features and deeper layers.

Significance. If the empirical results hold under proper controls, the work would demonstrate a practical, low-overhead method for targeted mitigation of a known failure mode in production ASR systems using interpretability tools like SAEs, potentially reducing reliance on full fine-tuning while highlighting representation properties in audio encoders.

major comments (2)

[Abstract] Abstract and Experimental Setup: The manuscript reports precise hallucination-rate reductions but supplies no information on the collection protocol, train/test split, or distributional relationship between the non-speech test set and the audio used to train the SAE or compute steering vectors. This directly bears on whether the claimed linear separability and steering effect reflect genuine feature isolation or in-distribution evaluation.
[Results] Results section: No baseline comparisons (e.g., random steering vectors, non-SAE activation steering), variance across runs, or error analysis are referenced for the headline numbers (72.63%→14.11%, 86.88%→27.33%), making it impossible to assess whether the mitigation is specific to hallucination directions or an artifact of the evaluation protocol.

minor comments (1)

[Abstract] The abstract states that steering 'approaches the performance of fine-tuning-based methods' without naming or citing those methods or providing quantitative comparison tables.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful comments, which highlight important aspects of the experimental design and results reporting. We address each point below and plan to revise the manuscript to incorporate clarifications and additional analyses.

read point-by-point responses

Referee: [Abstract] Abstract and Experimental Setup: The manuscript reports precise hallucination-rate reductions but supplies no information on the collection protocol, train/test split, or distributional relationship between the non-speech test set and the audio used to train the SAE or compute steering vectors. This directly bears on whether the claimed linear separability and steering effect reflect genuine feature isolation or in-distribution evaluation.

Authors: We agree that the current manuscript lacks explicit details on the non-speech audio collection protocol, the train/test splits, and the distributional relationship to the SAE training data. This information is important for assessing the generality of the findings. In the revised version, we will expand the Experimental Setup section to describe the data sources, collection methods, splits, and how the test set relates to the SAE training distribution. We will also discuss whether the evaluation is in-distribution or out-of-distribution based on these details. revision: yes
Referee: [Results] Results section: No baseline comparisons (e.g., random steering vectors, non-SAE activation steering), variance across runs, or error analysis are referenced for the headline numbers (72.63%→14.11%, 86.88%→27.33%), making it impossible to assess whether the mitigation is specific to hallucination directions or an artifact of the evaluation protocol.

Authors: The referee correctly notes the absence of these controls and analyses in the results section. To demonstrate that the steering effect is specific rather than an artifact, we will add comparisons with random steering vectors and non-SAE activation steering in the revised manuscript. Additionally, we will report variance across multiple runs (e.g., standard deviations) and include an error analysis section discussing cases where the steering does not fully mitigate hallucinations or affects WER. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical results with no derivation chain

full rationale

The paper reports experimental outcomes from extracting Whisper activations, training SAEs, identifying hallucination-related directions, and applying steering vectors, with performance measured on held-out test sets. No equations, first-principles derivations, or 'predictions' derived from fitted parameters appear in the abstract or described methodology. Claims reduce directly to measured hallucination rates and WER on explicit test data rather than any self-referential construction or self-citation load-bearing step. This is the standard case of an empirical ML paper whose central results are externally falsifiable via replication on the reported splits.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review yields minimal information on parameters or assumptions; the central premise is linear separability of hallucination signals.

axioms (1)

domain assumption Hallucination-related information is linearly separable in Whisper encoder activations and SAE latents
Stated as the basis for detection and steering effectiveness.

pith-pipeline@v0.9.1-grok · 5698 in / 1228 out tokens · 21911 ms · 2026-06-27T20:43:56.283222+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders
cs.LG 2026-06 unverdicted novelty 6.0

Sparse autoencoders on a TTS language model yield interpretable features that causally control attributes such as laughter, gender, and speech rate via targeted interventions.

Reference graph

Works this paper leans on

46 extracted references · 5 canonical work pages · cited by 1 Pith paper · 3 internal anchors

[1]

fluent and coherent outputs of neural models entirely disconnected from the input

Introduction Automatic Speech Recognition (ASR) is a speech processing task with a long history, evolving from classic machine learning algorithms such as hidden Markov models (HMMs) [1, 2] and finite-state transducers [3] to more accurate hybrid approaches combining HMMs with multi-layer perceptrons [4, 5]. Cur- rently state-of-the-art ASR algorithms are...
[2]

Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders

Background In this section we briefly overview Sparse AutoEncoders and steering – two concepts we will extensively use for analysis of Whisper’s hallucinations and ways of fixing them. arXiv:2606.07473v1 [cs.SD] 5 Jun 2026 2.1. Sparse AutoEncoders Sparse AutoEncoders (SAEs) are a class of neural network models originally developed in the context of mechan...

work page internal anchor Pith review Pith/arXiv arXiv 2026
[3]

Whisper Hallucinations Whisper is a Transformer-based ASR model trained on a large- scale weakly supervised dataset of 680,000 hours of audio col- lected from the Internet

Methodology 3.1. Whisper Hallucinations Whisper is a Transformer-based ASR model trained on a large- scale weakly supervised dataset of 680,000 hours of audio col- lected from the Internet. While this training paradigm en- ables remarkable generalization across languages, domains, and acoustic conditions, it also introduces a well-known failure mode: hall...
[4]

Models We conduct all experiments on two Whisper model variants: Whisper small and Whisper large-v3

Experimental Setup 4.1. Models We conduct all experiments on two Whisper model variants: Whisper small and Whisper large-v3. This choice is motivated by the desire to evaluate our proposed methods across models of substantially different scales, allowing us to assess whether the observed effects generalize beyond a single model size. For inference, we use...
[5]

Classification 5.1.1

Experimental Results 5.1. Classification 5.1.1. Whisper Activations Figure 1 (left column) presents the layer-wise AUC scores ob- tained by the logistic regression classifier trained on raw Whis- per encoder activations for Whisper large-v3. A trend is ob- served across all train and test datasets: classification perfor- mance improves with layer depth, w...

work page arXiv
[6]

We showed that both raw activations and SAE latent representations encode linearly Table 5:Comparison of methods for Whisper large-v3

Conclusion In this paper, we investigated Whisper’s internal representations for hallucination detection and mitigation. We showed that both raw activations and SAE latent representations encode linearly Table 5:Comparison of methods for Whisper large-v3. HR (%) is reported on UrbanSound8K and on the FULL test set. WER is reported on LibriSpeech test-clea...
[7]

Acknowledgments The work of Vadim Popov and Tasnima Sadekova was prepared within the framework of the research project HSE-BR-2025- 019 implemented as part of the Basic Research Program at HSE University

2025
[8]

A Maximum Likeli- hood Approach to Continuous Speech Recognition,

L. R. Bahl, F. Jelinek, and R. L. Mercer, “A Maximum Likeli- hood Approach to Continuous Speech Recognition,”IEEE Trans- actions on Pattern Analysis and Machine Intelligence, vol. PAMI- 5, no. 2, pp. 179–190, 1983

1983
[9]

Linguistic con- straints in hidden Markov model based speech recognition,

M. Weintraub, H. Murveit, M. Cohenet al., “Linguistic con- straints in hidden Markov model based speech recognition,” in International Conference on Acoustics, Speech, and Signal Pro- cessing (ICASSP), vol. 2, 1989, pp. 699–702

1989
[10]

Weighted finite-state trans- ducers in speech recognition,

M. Mohri, F. Pereira, and M. Riley, “Weighted finite-state trans- ducers in speech recognition,”Comput. Speech Lang., vol. 16, no. 1, p. 69–88, jan. 2002

2002
[11]

Phonetic context in hybrid HMM/MLP continuous speech recognition,

N. Morgan, H. Bourlard, C. Wooterset al., “Phonetic context in hybrid HMM/MLP continuous speech recognition,” in2nd Euro- pean Conference on Speech Communication and Technology (Eu- rospeech 1991), 1991, pp. 109–112

1991
[12]

H. A. Bourlard and N. Morgan,Connectionist Speech Recogni- tion: A Hybrid Approach. USA: Kluwer Academic Publishers, 1993

1993
[13]

Robust speech recogni- tion via large-scale weak supervision,

A. Radford, J. W. Kim, T. Xuet al., “Robust speech recogni- tion via large-scale weak supervision,” inProceedings of the 40th International Conference on Machine Learning, ser. ICML’23. JMLR.org, 2023

2023
[14]

Streaming End-to- end Speech Recognition for Mobile Devices,

Y . He, T. N. Sainath, R. Prabhavalkaret al., “Streaming End-to- end Speech Recognition for Mobile Devices,” inICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 6381–6385

2019
[15]

Emilia: An Extensive, Multilin- gual, and Diverse Speech Dataset For Large-Scale Speech Gen- eration,

H. He, Z. Shang, C. Wanget al., “Emilia: An Extensive, Multilin- gual, and Diverse Speech Dataset For Large-Scale Speech Gen- eration,” in2024 IEEE Spoken Language Technology Workshop (SLT), 2024, pp. 885–890

2024
[16]

Kimi-Audio Technical Report,

K. Team, “Kimi-Audio Technical Report,”ArXiv, 2025

2025
[17]

Step-Audio 2 Technical Report,

B. Wu, C. Yan, C. Huet al., “Step-Audio 2 Technical Report,” ArXiv, 2025

2025
[18]

Qwen3-Omni Technical Report,

J. Xu, Z. Guo, H. Huet al., “Qwen3-Omni Technical Report,” ArXiv, 2025

2025
[19]

Recogniz- ing Long-Form Speech Using Streaming End-to-End Models,

A. Narayanan, R. Prabhavalkar, C.-C. Chiuet al., “Recogniz- ing Long-Form Speech Using Streaming End-to-End Models,” in2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019, pp. 920–927

2019
[20]

GigaAM: Ef- ficient Self-Supervised Learner for Speech Recognition,

A. Kutsakov, A. Maximenko, G. Gospodinovet al., “GigaAM: Ef- ficient Self-Supervised Learner for Speech Recognition,” inProc. INTERSPEECH 2025, 2025, pp. 1213–1217

2025
[21]

VibeV oice-ASR Technical Re- port,

Z. Peng, J. Yu, Y . Changet al., “VibeV oice-ASR Technical Re- port,”ArXiv, 2026

2026
[22]

Survey of Hallucination in Natural Language Generation,

Z. Ji, N. Lee, R. Frieskeet al., “Survey of Hallucination in Natural Language Generation,”ACM Comput. Surv., vol. 55, no. 12, mar. 2023

2023
[23]

Artificial Hallucinations in ChatGPT: Implications in Scientific Writing,

H. Alkaissi and S. I. McFarlane, “Artificial Hallucinations in ChatGPT: Implications in Scientific Writing,”Cureus, vol. 15, 2023

2023
[24]

Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Mod- els,

R. Frieske and B. E. Shi, “Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Mod- els,”ArXiv, 2024

2024
[25]

Calm-Whisper: Reduce Whisper Hallucination On Non-Speech By Calming Crazy Heads Down,

Y . Wang, A. Alhmoud, S. Alsahlyet al., “Calm-Whisper: Reduce Whisper Hallucination On Non-Speech By Calming Crazy Heads Down,” inProc. INTERSPEECH 2025, 2025, pp. 3414–3418

2025
[26]

Investigation of Whisper ASR Hallucinations Induced by Non-Speech Audio,

M. Bara ´nski, J. Jasi ´nski, J. Bartolewskaet al., “Investigation of Whisper ASR Hallucinations Induced by Non-Speech Audio,” in ICASSP 2025 - 2025 IEEE International Conference on Acous- tics, Speech and Signal Processing (ICASSP), 2025

2025
[27]

AudioSAE: To- wards understanding of audio-processing models with sparse au- toencoders,

G. Aparin, T. Sadekova, A. Rukhovichet al., “AudioSAE: To- wards understanding of audio-processing models with sparse au- toencoders,” in19th Conference of the European Chapter of the Association for Computational Linguistics, 2026

2026
[28]

Sparse Autoencoders Find Highly Interpretable Features in Language Models,

H. Cunningham, A. Ewart, L. Riggset al., “Sparse Autoencoders Find Highly Interpretable Features in Language Models,”ArXiv, 2023

2023
[29]

Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2,

T. Lieberum, S. Rajamanoharan, A. Conmyet al., “Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2,” inProceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP. Associa- tion for Computational Linguistics, nov. 2024, pp. 278–300

2024
[30]

Sparse autoencoders reveal selec- tive remapping of visual concepts during adaptation,

H. Lim, J. Choi, J. Chooet al., “Sparse autoencoders reveal selec- tive remapping of visual concepts during adaptation,” inThe Thir- teenth International Conference on Learning Representations, 2025

2025
[31]

Sparse autoencoders learn monosemantic features in vision-language models,

M. Pach, S. Karthik, Q. Bouniotet al., “Sparse autoencoders learn monosemantic features in vision-language models,” inThe Thirty- ninth Annual Conference on Neural Information Processing Sys- tems, 2025

2025
[32]

Discovering and Steering Interpretable Concepts in Large Generative Music Models,

N. Singh, M. Cherep, and P. Maes, “Discovering and Steering Interpretable Concepts in Large Generative Music Models,” in The Fourteenth International Conference on Learning Represen- tations, 2026

2026
[33]

Steering Language Models With Activation Engineering

A. M. Turner, L. Thiergart, G. Leech, D. Udell, J. J. Vazquez, U. Mini, and M. MacDiarmid, “Steering language models with activation engineering, 2024,”URL https://arxiv. org/abs/2308.10248, vol. 2308, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[34]

Representation Engineering: A Top-Down Approach to AI Transparency

A. Zou, L. Phan, S. Chen, J. Campbell, P. Guo, R. Ren, A. Pan, X. Yin, M. Mazeika, A.-K. Dombrowskiet al., “Representation engineering: A top-down approach to ai transparency, 2023,”URL https://arxiv. org/abs/2310.01405, vol. 97, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2023
[35]

Inference-time intervention: Eliciting truthful answers from a language model,

K. Li, O. Patel, F. Vi ´egas, H. Pfister, and M. Wattenberg, “Inference-time intervention: Eliciting truthful answers from a language model,”Advances in Neural Information Processing Systems, vol. 36, pp. 41 451–41 530, 2023

2023
[36]

Steering llama 2 via contrastive activation addition,

N. Rimsky, N. Gabrieli, J. Schulz, M. Tong, E. Hubinger, and A. Turner, “Steering llama 2 via contrastive activation addition,” inProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024, pp. 15 504–15 522

2024
[37]

CASteer: Cross-attention steering for controllable concept erasure,

T. Gaintseva, A.-M. Oncescu, C. Ma, Z. Liu, M. Benning, G. Slabaugh, J. Deng, and I. Elezi, “CASteer: Cross-attention steering for controllable concept erasure,” inThe Fourteenth In- ternational Conference on Learning Representations, 2026. [On- line]. Available: https://openreview.net/forum?id=6D5Odqol1B

2026
[38]

Activation patching for inter- pretable steering in music generation, 2025,

S. Facchiano, G. Strano, D. Crisostomi, I. Tallini, T. Mencat- tini, F. Galasso, and E. Rodola, “Activation patching for inter- pretable steering in music generation, 2025,”URL https://arxiv. org/abs/2504.04479

work page arXiv 2025
[39]

Feature-level insights into artificial text detection with sparse autoencoders,

K. Kuznetsov, L. Kushnareva, A. Razzhigaev, P. Druzhinina, A. V oznyuk, I. Piontkovskaya, E. Burnaev, and S. Barannikov, “Feature-level insights into artificial text detection with sparse autoencoders,” inFindings of the Association for Computational Linguistics: ACL 2025, 2025, pp. 25 727–25 748

2025
[40]

MUSAN: A Music, Speech, and Noise Corpus,

D. Snyder, G. Chen, and D. Povey, “MUSAN: A Music, Speech, and Noise Corpus,”ArXiv, 2015

2015
[41]

WHAM!: Extending Speech Separation to Noisy Environments,

G. Wichern, J. Antognini, M. Flynnet al., “WHAM!: Extending Speech Separation to Noisy Environments,” inInterspeech 2019, 2019, pp. 1368–1372

2019
[42]

FSD50K: An Open Dataset of Human-Labeled Sound Events,

E. Fonseca, X. Favory, J. Ponset al., “FSD50K: An Open Dataset of Human-Labeled Sound Events,”IEEE/ACM Trans. Audio, Speech and Lang. Proc., vol. 30, p. 829–852, dec. 2021

2021
[43]

A Dataset and Taxon- omy for Urban Sound Research,

J. Salamon, C. Jacoby, and J. P. Bello, “A Dataset and Taxon- omy for Urban Sound Research,” in22nd ACM International Con- ference on Multimedia (ACM-MM’14), Orlando, FL, USA, Nov. 2014, pp. 1041–1044

2014
[44]

Librispeech: An ASR corpus based on public domain audio books,

V . Panayotov, G. Chen, D. Poveyet al., “Librispeech: An ASR corpus based on public domain audio books,” in2015 IEEE Inter- national Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015, pp. 5206–5210

2015
[45]

FLEURS: FEW-Shot Learning Evaluation of Universal Representations of Speech,

A. Conneau, M. Ma, S. Khanujaet al., “FLEURS: FEW-Shot Learning Evaluation of Universal Representations of Speech,” in 2022 IEEE Spoken Language Technology Workshop (SLT), 2023, pp. 798–805

2022
[46]

AISHELL-1: An open-source Man- darin speech corpus and a speech recognition baseline,

H. Bu, J. Du, X. Naet al., “AISHELL-1: An open-source Man- darin speech corpus and a speech recognition baseline,” in2017 20th Conference of the Oriental Chapter of the International Co- ordinating Committee on Speech Databases and Speech I/O Sys- tems and Assessment (O-COCOSDA), 2017, pp. 1–5

2017

[1] [1]

fluent and coherent outputs of neural models entirely disconnected from the input

Introduction Automatic Speech Recognition (ASR) is a speech processing task with a long history, evolving from classic machine learning algorithms such as hidden Markov models (HMMs) [1, 2] and finite-state transducers [3] to more accurate hybrid approaches combining HMMs with multi-layer perceptrons [4, 5]. Cur- rently state-of-the-art ASR algorithms are...

[2] [2]

Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders

Background In this section we briefly overview Sparse AutoEncoders and steering – two concepts we will extensively use for analysis of Whisper’s hallucinations and ways of fixing them. arXiv:2606.07473v1 [cs.SD] 5 Jun 2026 2.1. Sparse AutoEncoders Sparse AutoEncoders (SAEs) are a class of neural network models originally developed in the context of mechan...

work page internal anchor Pith review Pith/arXiv arXiv 2026

[3] [3]

Whisper Hallucinations Whisper is a Transformer-based ASR model trained on a large- scale weakly supervised dataset of 680,000 hours of audio col- lected from the Internet

Methodology 3.1. Whisper Hallucinations Whisper is a Transformer-based ASR model trained on a large- scale weakly supervised dataset of 680,000 hours of audio col- lected from the Internet. While this training paradigm en- ables remarkable generalization across languages, domains, and acoustic conditions, it also introduces a well-known failure mode: hall...

[4] [4]

Models We conduct all experiments on two Whisper model variants: Whisper small and Whisper large-v3

Experimental Setup 4.1. Models We conduct all experiments on two Whisper model variants: Whisper small and Whisper large-v3. This choice is motivated by the desire to evaluate our proposed methods across models of substantially different scales, allowing us to assess whether the observed effects generalize beyond a single model size. For inference, we use...

[5] [5]

Classification 5.1.1

Experimental Results 5.1. Classification 5.1.1. Whisper Activations Figure 1 (left column) presents the layer-wise AUC scores ob- tained by the logistic regression classifier trained on raw Whis- per encoder activations for Whisper large-v3. A trend is ob- served across all train and test datasets: classification perfor- mance improves with layer depth, w...

work page arXiv

[6] [6]

We showed that both raw activations and SAE latent representations encode linearly Table 5:Comparison of methods for Whisper large-v3

Conclusion In this paper, we investigated Whisper’s internal representations for hallucination detection and mitigation. We showed that both raw activations and SAE latent representations encode linearly Table 5:Comparison of methods for Whisper large-v3. HR (%) is reported on UrbanSound8K and on the FULL test set. WER is reported on LibriSpeech test-clea...

[7] [7]

Acknowledgments The work of Vadim Popov and Tasnima Sadekova was prepared within the framework of the research project HSE-BR-2025- 019 implemented as part of the Basic Research Program at HSE University

2025

[8] [8]

A Maximum Likeli- hood Approach to Continuous Speech Recognition,

L. R. Bahl, F. Jelinek, and R. L. Mercer, “A Maximum Likeli- hood Approach to Continuous Speech Recognition,”IEEE Trans- actions on Pattern Analysis and Machine Intelligence, vol. PAMI- 5, no. 2, pp. 179–190, 1983

1983

[9] [9]

Linguistic con- straints in hidden Markov model based speech recognition,

M. Weintraub, H. Murveit, M. Cohenet al., “Linguistic con- straints in hidden Markov model based speech recognition,” in International Conference on Acoustics, Speech, and Signal Pro- cessing (ICASSP), vol. 2, 1989, pp. 699–702

1989

[10] [10]

Weighted finite-state trans- ducers in speech recognition,

M. Mohri, F. Pereira, and M. Riley, “Weighted finite-state trans- ducers in speech recognition,”Comput. Speech Lang., vol. 16, no. 1, p. 69–88, jan. 2002

2002

[11] [11]

Phonetic context in hybrid HMM/MLP continuous speech recognition,

N. Morgan, H. Bourlard, C. Wooterset al., “Phonetic context in hybrid HMM/MLP continuous speech recognition,” in2nd Euro- pean Conference on Speech Communication and Technology (Eu- rospeech 1991), 1991, pp. 109–112

1991

[12] [12]

H. A. Bourlard and N. Morgan,Connectionist Speech Recogni- tion: A Hybrid Approach. USA: Kluwer Academic Publishers, 1993

1993

[13] [13]

Robust speech recogni- tion via large-scale weak supervision,

A. Radford, J. W. Kim, T. Xuet al., “Robust speech recogni- tion via large-scale weak supervision,” inProceedings of the 40th International Conference on Machine Learning, ser. ICML’23. JMLR.org, 2023

2023

[14] [14]

Streaming End-to- end Speech Recognition for Mobile Devices,

Y . He, T. N. Sainath, R. Prabhavalkaret al., “Streaming End-to- end Speech Recognition for Mobile Devices,” inICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 6381–6385

2019

[15] [15]

Emilia: An Extensive, Multilin- gual, and Diverse Speech Dataset For Large-Scale Speech Gen- eration,

H. He, Z. Shang, C. Wanget al., “Emilia: An Extensive, Multilin- gual, and Diverse Speech Dataset For Large-Scale Speech Gen- eration,” in2024 IEEE Spoken Language Technology Workshop (SLT), 2024, pp. 885–890

2024

[16] [16]

Kimi-Audio Technical Report,

K. Team, “Kimi-Audio Technical Report,”ArXiv, 2025

2025

[17] [17]

Step-Audio 2 Technical Report,

B. Wu, C. Yan, C. Huet al., “Step-Audio 2 Technical Report,” ArXiv, 2025

2025

[18] [18]

Qwen3-Omni Technical Report,

J. Xu, Z. Guo, H. Huet al., “Qwen3-Omni Technical Report,” ArXiv, 2025

2025

[19] [19]

Recogniz- ing Long-Form Speech Using Streaming End-to-End Models,

A. Narayanan, R. Prabhavalkar, C.-C. Chiuet al., “Recogniz- ing Long-Form Speech Using Streaming End-to-End Models,” in2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019, pp. 920–927

2019

[20] [20]

GigaAM: Ef- ficient Self-Supervised Learner for Speech Recognition,

A. Kutsakov, A. Maximenko, G. Gospodinovet al., “GigaAM: Ef- ficient Self-Supervised Learner for Speech Recognition,” inProc. INTERSPEECH 2025, 2025, pp. 1213–1217

2025

[21] [21]

VibeV oice-ASR Technical Re- port,

Z. Peng, J. Yu, Y . Changet al., “VibeV oice-ASR Technical Re- port,”ArXiv, 2026

2026

[22] [22]

Survey of Hallucination in Natural Language Generation,

Z. Ji, N. Lee, R. Frieskeet al., “Survey of Hallucination in Natural Language Generation,”ACM Comput. Surv., vol. 55, no. 12, mar. 2023

2023

[23] [23]

Artificial Hallucinations in ChatGPT: Implications in Scientific Writing,

H. Alkaissi and S. I. McFarlane, “Artificial Hallucinations in ChatGPT: Implications in Scientific Writing,”Cureus, vol. 15, 2023

2023

[24] [24]

Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Mod- els,

R. Frieske and B. E. Shi, “Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Mod- els,”ArXiv, 2024

2024

[25] [25]

Calm-Whisper: Reduce Whisper Hallucination On Non-Speech By Calming Crazy Heads Down,

Y . Wang, A. Alhmoud, S. Alsahlyet al., “Calm-Whisper: Reduce Whisper Hallucination On Non-Speech By Calming Crazy Heads Down,” inProc. INTERSPEECH 2025, 2025, pp. 3414–3418

2025

[26] [26]

Investigation of Whisper ASR Hallucinations Induced by Non-Speech Audio,

M. Bara ´nski, J. Jasi ´nski, J. Bartolewskaet al., “Investigation of Whisper ASR Hallucinations Induced by Non-Speech Audio,” in ICASSP 2025 - 2025 IEEE International Conference on Acous- tics, Speech and Signal Processing (ICASSP), 2025

2025

[27] [27]

AudioSAE: To- wards understanding of audio-processing models with sparse au- toencoders,

G. Aparin, T. Sadekova, A. Rukhovichet al., “AudioSAE: To- wards understanding of audio-processing models with sparse au- toencoders,” in19th Conference of the European Chapter of the Association for Computational Linguistics, 2026

2026

[28] [28]

Sparse Autoencoders Find Highly Interpretable Features in Language Models,

H. Cunningham, A. Ewart, L. Riggset al., “Sparse Autoencoders Find Highly Interpretable Features in Language Models,”ArXiv, 2023

2023

[29] [29]

Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2,

T. Lieberum, S. Rajamanoharan, A. Conmyet al., “Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2,” inProceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP. Associa- tion for Computational Linguistics, nov. 2024, pp. 278–300

2024

[30] [30]

Sparse autoencoders reveal selec- tive remapping of visual concepts during adaptation,

H. Lim, J. Choi, J. Chooet al., “Sparse autoencoders reveal selec- tive remapping of visual concepts during adaptation,” inThe Thir- teenth International Conference on Learning Representations, 2025

2025

[31] [31]

Sparse autoencoders learn monosemantic features in vision-language models,

M. Pach, S. Karthik, Q. Bouniotet al., “Sparse autoencoders learn monosemantic features in vision-language models,” inThe Thirty- ninth Annual Conference on Neural Information Processing Sys- tems, 2025

2025

[32] [32]

Discovering and Steering Interpretable Concepts in Large Generative Music Models,

N. Singh, M. Cherep, and P. Maes, “Discovering and Steering Interpretable Concepts in Large Generative Music Models,” in The Fourteenth International Conference on Learning Represen- tations, 2026

2026

[33] [33]

Steering Language Models With Activation Engineering

A. M. Turner, L. Thiergart, G. Leech, D. Udell, J. J. Vazquez, U. Mini, and M. MacDiarmid, “Steering language models with activation engineering, 2024,”URL https://arxiv. org/abs/2308.10248, vol. 2308, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[34] [34]

Representation Engineering: A Top-Down Approach to AI Transparency

A. Zou, L. Phan, S. Chen, J. Campbell, P. Guo, R. Ren, A. Pan, X. Yin, M. Mazeika, A.-K. Dombrowskiet al., “Representation engineering: A top-down approach to ai transparency, 2023,”URL https://arxiv. org/abs/2310.01405, vol. 97, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2023

[35] [35]

Inference-time intervention: Eliciting truthful answers from a language model,

K. Li, O. Patel, F. Vi ´egas, H. Pfister, and M. Wattenberg, “Inference-time intervention: Eliciting truthful answers from a language model,”Advances in Neural Information Processing Systems, vol. 36, pp. 41 451–41 530, 2023

2023

[36] [36]

Steering llama 2 via contrastive activation addition,

N. Rimsky, N. Gabrieli, J. Schulz, M. Tong, E. Hubinger, and A. Turner, “Steering llama 2 via contrastive activation addition,” inProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024, pp. 15 504–15 522

2024

[37] [37]

CASteer: Cross-attention steering for controllable concept erasure,

T. Gaintseva, A.-M. Oncescu, C. Ma, Z. Liu, M. Benning, G. Slabaugh, J. Deng, and I. Elezi, “CASteer: Cross-attention steering for controllable concept erasure,” inThe Fourteenth In- ternational Conference on Learning Representations, 2026. [On- line]. Available: https://openreview.net/forum?id=6D5Odqol1B

2026

[38] [38]

Activation patching for inter- pretable steering in music generation, 2025,

S. Facchiano, G. Strano, D. Crisostomi, I. Tallini, T. Mencat- tini, F. Galasso, and E. Rodola, “Activation patching for inter- pretable steering in music generation, 2025,”URL https://arxiv. org/abs/2504.04479

work page arXiv 2025

[39] [39]

Feature-level insights into artificial text detection with sparse autoencoders,

K. Kuznetsov, L. Kushnareva, A. Razzhigaev, P. Druzhinina, A. V oznyuk, I. Piontkovskaya, E. Burnaev, and S. Barannikov, “Feature-level insights into artificial text detection with sparse autoencoders,” inFindings of the Association for Computational Linguistics: ACL 2025, 2025, pp. 25 727–25 748

2025

[40] [40]

MUSAN: A Music, Speech, and Noise Corpus,

D. Snyder, G. Chen, and D. Povey, “MUSAN: A Music, Speech, and Noise Corpus,”ArXiv, 2015

2015

[41] [41]

WHAM!: Extending Speech Separation to Noisy Environments,

G. Wichern, J. Antognini, M. Flynnet al., “WHAM!: Extending Speech Separation to Noisy Environments,” inInterspeech 2019, 2019, pp. 1368–1372

2019

[42] [42]

FSD50K: An Open Dataset of Human-Labeled Sound Events,

E. Fonseca, X. Favory, J. Ponset al., “FSD50K: An Open Dataset of Human-Labeled Sound Events,”IEEE/ACM Trans. Audio, Speech and Lang. Proc., vol. 30, p. 829–852, dec. 2021

2021

[43] [43]

A Dataset and Taxon- omy for Urban Sound Research,

J. Salamon, C. Jacoby, and J. P. Bello, “A Dataset and Taxon- omy for Urban Sound Research,” in22nd ACM International Con- ference on Multimedia (ACM-MM’14), Orlando, FL, USA, Nov. 2014, pp. 1041–1044

2014

[44] [44]

Librispeech: An ASR corpus based on public domain audio books,

V . Panayotov, G. Chen, D. Poveyet al., “Librispeech: An ASR corpus based on public domain audio books,” in2015 IEEE Inter- national Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015, pp. 5206–5210

2015

[45] [45]

FLEURS: FEW-Shot Learning Evaluation of Universal Representations of Speech,

A. Conneau, M. Ma, S. Khanujaet al., “FLEURS: FEW-Shot Learning Evaluation of Universal Representations of Speech,” in 2022 IEEE Spoken Language Technology Workshop (SLT), 2023, pp. 798–805

2022

[46] [46]

AISHELL-1: An open-source Man- darin speech corpus and a speech recognition baseline,

H. Bu, J. Du, X. Naet al., “AISHELL-1: An open-source Man- darin speech corpus and a speech recognition baseline,” in2017 20th Conference of the Oriental Chapter of the International Co- ordinating Committee on Speech Databases and Speech I/O Sys- tems and Assessment (O-COCOSDA), 2017, pp. 1–5

2017