
arXiv: 2602.07253 · v2 · pith:KFH6FBB6 · submitted 2026-02-06 · 💻 cs.AI · cs.CL

From Out-of-Distribution Detection to Hallucination Detection: A Geometric View

classification: 💻 cs.AI · cs.CL
keywords: detection, hallucination, language, models, tasks, large, out-of-distribution, problem

Detecting hallucinations in large language models is a critical open problem with significant implications for safety and reliability. While existing hallucination detection methods achieve strong performance in question-answering tasks, they remain less effective on tasks requiring reasoning. In this work, we revisit hallucination detection through the lens of out-of-distribution (OOD) detection, a well-studied problem in areas like computer vision. Treating next-token prediction in language models as a classification task allows us to apply OOD techniques, provided appropriate modifications are made to account for the structural differences in large language models. We show that OOD-based approaches yield training-free, single-sample-based detectors, achieving strong accuracy in hallucination detection for reasoning tasks. Overall, our work suggests that reframing hallucination detection as OOD detection provides a promising and scalable pathway toward language model safety.
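The abstract does not spell out which OOD techniques are used or what modifications the authors make for language models, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes access to the next-token logits at every decoding step of one generated answer, treats each step as a classification over the vocabulary, and applies two classic training-free classifier OOD scores: maximum softmax probability (MSP) and the energy score. The function names and the mean-over-tokens aggregation are hypothetical illustrations, not the paper's detector.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one next-token logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def msp_score(logits: Sequence[float]) -> float:
    """Maximum softmax probability; higher means more confident (more in-distribution)."""
    return math.exp(max(log_softmax(logits)))


def energy_score(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Negative free energy, T * logsumexp(logits / T); higher means more in-distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    return temperature * (m + math.log(sum(math.exp(x - m) for x in scaled)))


def sequence_hallucination_score(step_logits: Sequence[Sequence[float]]) -> float:
    """Hypothetical single-sample detector: mean (1 - MSP) over all generated tokens.

    Assumption: one logit vector per decoding step. Higher values flag the
    whole answer as more likely to be hallucinated.
    """
    if not step_logits:
        raise ValueError("need logits for at least one generated token")
    per_token = [1.0 - msp_score(logits) for logits in step_logits]
    return sum(per_token) / len(per_token)


if __name__ == "__main__":
    # Toy 3-token vocabulary, two decoding steps each (illustrative values only).
    confident = [[8.0, 0.0, 0.0], [7.0, 0.5, 0.0]]
    uncertain = [[1.0, 0.9, 0.8], [0.2, 0.1, 0.0]]
    # prints True: the flatter distributions receive the higher hallucination score.
    print(sequence_hallucination_score(confident) < sequence_hallucination_score(uncertain))
```

Both scores need only one forward pass per generated token and no extra training, which is the "training-free, single-sample" property the abstract highlights; in practice a threshold on the aggregated score (chosen on held-out data) would turn it into a binary detector.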



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. How Language Models Process Out-of-Distribution Inputs: A Two-Pathway Framework

    cs.CL 2026-04 unverdicted novelty 6.0

    LLM OOD detectors are length-confounded; a two-pathway embedding-plus-trajectory framework detects covert OOD inputs at 0.721 average AUROC and 0.850 on jailbreaks.