Recognition: unknown
A Geometric Taxonomy of Hallucinations in LLMs
read the original abstract
Hallucinations in deployed language models can have real consequences for downstream decisions in domains such as healthcare, legal, and financial services. In production, detection has to run on what the deployed system can see: the query, the response, and often a source document. White-box access to model internals and multi-sample querying are not generally available behind a third-party API. Within this setting - black-box, single-pass, only question/answer available - the dominant baseline is NLI, which returns a value but no diagnosis when it fails. We argue that operating directly on the geometry of the embedding space provides detection methods whose successes and failures are interpretable as structural properties of contrastive sentence-encoder training \citep{wang2020understanding}. The contribution is: given an operationally-motivated taxonomy, geometry predicts which types of hallucination are detectable and which are not - and the predictions hold. We propose three operational types organized by the relation of the response embedding to the plausibility region of grounded responses on the unit hypersphere, and derive from the alignment objective a prediction for each: (1)query-proximate unfaithfulness is detectable by an angular ratio; (2)confabulation outside the plausibility region produces a directional signature that outperforms NLI on expert-annotated error; (3)factual errors sharing vocabulary and frame with correct answers are not separable by angular geometry. To validate on content resembling deployment, we built a 212-pair human-confabulated dataset across nine domains using provoked confabulation.
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
-
Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement
Concept Fields model text corpora as local Gaussian drift fields in embedding space to score sentence transitions for hallucination detection and novelty via standardized deviation.
-
Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement
Concept Fields model text corpora as local Gaussian drift fields in embedding space to score sentence transitions for groundedness and novelty without model internals.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.