Hallucination information is linearly separable in Whisper activations and sparse autoencoder (SAE) latents; SAE steering reduces hallucination rates on non-speech audio from 72.63% to 14.11% (small) and from 86.88% to 27.33% (large-v3), with only a small impact on word error rate (WER).
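To put the two reported drops on a common scale, the short sketch below computes the absolute reduction (in percentage points) and the relative reduction for each model. It uses only the four rates quoted in the summary; the names `reported` and `reduction` are illustrative and do not come from the paper.

```python
# Hallucination rates (%) on non-speech audio, before and after SAE steering,
# copied from the summary. The dict and function names are hypothetical.
reported = {
    "small": (72.63, 14.11),
    "large-v3": (86.88, 27.33),
}


def reduction(before, after):
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = before - after
    relative = 100.0 * absolute / before
    return absolute, relative


for model, (before, after) in reported.items():
    absolute, relative = reduction(before, after)
    print(f"{model}: {absolute:.2f} points absolute, {relative:.1f}% relative")
```

By this arithmetic the absolute drops are similar (about 58.5 and 59.6 points), while the relative reduction is larger for small (about 80.6%) than for large-v3 (about 68.5%).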
1 Pith paper cites this work.
Fields: cs.SD (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders