Semantic energy: Detecting LLM hallucination beyond entropy

2025 · arXiv 2508.14496

4 Pith papers cite this work.

representative citing papers

Calibrated Confidence Estimation for Tabular Question Answering

cs.CL · 2026-04-14 · unverdicted · novelty 7.0 · ref 30

LLMs answering questions over tables are overconfident. Multi-Format Agreement, which compares answers across Markdown, HTML, JSON, and CSV renderings of the same table, raises AUROC to 0.80 and cuts calibration error by 44-63% at lower cost than sampling.

Hallucination as Commitment Failure: Larger LLMs Misfire Despite Knowing the Answer

cs.CL · 2026-05-21 · unverdicted · novelty 6.0 · ref 55

Larger LLMs hallucinate more often even when the correct concept is available, because instruction tuning disperses probability mass across alternative surface forms instead of concentrating it on one.

Position: Uncertainty Quantification in LLMs is Just Unsupervised Clustering

cs.CL · 2026-05-19 · unverdicted · novelty 5.0 · ref 98

Mainstream uncertainty quantification (UQ) for LLMs reduces to unsupervised clustering of internal generation consistency, and therefore cannot detect confident hallucinations or provide reliable safety signals.

High-Entropy Tokens as Multimodal Failure Points in Vision-Language Models

cs.CV · 2025-12-26 · unreviewed · ref 26