A Geometric Taxonomy of Hallucinations in LLMs

Javier Marín

classification cs.AI cs.CL

keywords geometry, angular, available, confabulation, deployed, detectable, detection, domains

Hallucinations in deployed language models can have real consequences for downstream decisions in domains such as healthcare, legal, and financial services. In production, detection has to run on what the deployed system can see: the query, the response, and often a source document. White-box access to model internals and multi-sample querying are generally not available behind a third-party API. In this setting (black-box, single-pass, with only the question and answer available), the dominant baseline is NLI, which returns a score but offers no diagnosis when it fails. We argue that operating directly on the geometry of the embedding space yields detection methods whose successes and failures are interpretable as structural properties of contrastive sentence-encoder training (Wang and Isola, 2020). The contribution is this: given an operationally motivated taxonomy, geometry predicts which types of hallucination are detectable and which are not, and the predictions hold. We propose three operational types, organized by the relation of the response embedding to the plausibility region of grounded responses on the unit hypersphere, and derive from the alignment objective a prediction for each: (1) query-proximate unfaithfulness is detectable by an angular ratio; (2) confabulation outside the plausibility region produces a directional signature that outperforms NLI on expert-annotated error; (3) factual errors that share vocabulary and frame with correct answers are not separable by angular geometry. To validate on content resembling deployment, we built a 212-pair human-confabulated dataset across nine domains using provoked confabulation.
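
To make the geometric setting concrete, the following is a minimal sketch of measuring angles between embeddings on the unit hypersphere. The abstract does not define the angular ratio, so the particular ratio used here (angle from response to source divided by angle from response to query), the function names, and the toy 3-dimensional vectors are all illustrative assumptions, not the paper's method.

```python
import numpy as np

def unit(v):
    """Project a vector onto the unit hypersphere."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0.0:
        raise ValueError("zero vector has no direction")
    return v / norm

def angle(a, b):
    """Angular distance in radians between two embeddings."""
    cos = float(np.clip(np.dot(unit(a), unit(b)), -1.0, 1.0))
    return float(np.arccos(cos))

def angular_ratio(query, response, source, eps=1e-12):
    """Hypothetical ratio (an assumption, not the paper's definition):
    angle(response, source) / angle(response, query).
    Values above 1 mean the response sits closer to the query than
    to the source document."""
    return angle(response, source) / (angle(response, query) + eps)

# Toy embeddings, chosen by hand for illustration only.
query = [1.0, 0.0, 0.0]
source = [0.0, 1.0, 0.0]
grounded_response = [0.2, 1.0, 0.0]   # points mostly toward the source
proximate_response = [1.0, 0.2, 0.0]  # points mostly toward the query

r_grounded = angular_ratio(query, grounded_response, source)
r_proximate = angular_ratio(query, proximate_response, source)
print(f"grounded: {r_grounded:.3f}, query-proximate: {r_proximate:.3f}")
```

In practice the vectors would come from a contrastively trained sentence encoder, and any decision threshold on such a ratio would have to be calibrated on labeled data.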

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement
cs.CL 2026-05 unverdicted novelty 7.0

Concept Fields model text corpora as local Gaussian drift fields in embedding space to score sentence transitions for hallucination detection and novelty via standardized deviation.

Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement
cs.CL 2026-05 unverdicted novelty 6.0

Concept Fields model text corpora as local Gaussian drift fields in embedding space to score sentence transitions for groundedness and novelty without model internals.