arxiv: 2605.22007 · v1 · pith:3GHXP7WLnew · submitted 2026-05-21 · 💻 cs.CL

Hallucination as Commitment Failure: Larger LLMs Misfire Despite Knowing the Answer

Pith reviewed 2026-05-22 06:36 UTC · model grok-4.3

classification 💻 cs.CL
keywords hallucination · large language models · instruction tuning · probability distribution · answer commitment · scaling behavior · semantic availability

The pith

Larger LLMs hallucinate despite substantial probability already on the correct answer concept, with the rate rising as models scale.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether hallucinations simply reflect missing knowledge by defining a semantic measure of answer availability that groups different token sequences for the same concept. It finds that in instruction-tuned models from 0.8B to 72B parameters, 16 to 47 percent of hallucinations occur when the correct concept already holds notable probability mass, and this share grows steadily with size. Correct generations differ not by having the concept present but by concentrating that probability on one surface form, while hallucinations spread it across alternatives. The pattern holds over multi-token outputs and appears in hidden states before generation begins. The work concludes that instruction tuning produces sharper commitments at scale, so that both confident correct answers and confident wrong answers arise from the same sharpening process.

Core claim

Instruction tuning sharpens answer commitment with scale, making helpfulness and confident hallucination two consequences of the same underlying disposition. Across Qwen and Llama models, 16-47% of Instruct hallucinations occur with substantial probability mass already on the correct concept, and the rate rises monotonically with scale. Comparing such failures against correct generations with matched semantic support, the distinguishing factor is not whether the correct concept is represented, but how its probability is distributed: correct generations concentrate mass on a single surface form, hallucinations disperse it across alternatives. The same sharpening asymmetry extends across multi

What carries the argument

Semantic notion of answer availability that aggregates probability over all token-level variants expressing the same answer concept, used to separate presence of knowledge from the act of committing to one surface form.
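
The aggregation idea can be illustrated with a minimal sketch. It assumes the mapping from surface tokens to concepts is already given; the helper name `concept_mass`, the token strings, the concept labels, and all probabilities are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def concept_mass(next_token_probs, token_to_concept):
    """Sum next-token probability over all surface forms mapped to each concept."""
    mass = defaultdict(float)
    for token, p in next_token_probs.items():
        concept = token_to_concept.get(token)
        if concept is not None:  # tokens without a concept label are ignored
            mass[concept] += p
    return dict(mass)


# Hypothetical next-token distribution at the step where the answer is emitted.
probs = {"Paris": 0.22, " Paris": 0.18, "paris": 0.10, "Lyon": 0.35, "The": 0.15}
# Hypothetical grouping of surface forms into answer concepts.
mapping = {"Paris": "PARIS", " Paris": "PARIS", "paris": "PARIS", "Lyon": "LYON"}

mass = concept_mass(probs, mapping)
greedy_token = max(probs, key=probs.get)

print(mass)          # the PARIS concept holds about half of the total mass
print(greedy_token)  # greedy decoding still emits the single largest token
```

In this toy case the correct concept carries roughly 0.50 of the mass, yet no single variant beats the wrong token at 0.35, which is the situation the measure is meant to separate from missing knowledge.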

If this is right

  • The share of commitment failures increases steadily as model size grows from 0.8B to 72B parameters.
  • Correct and hallucinated outputs differ primarily in how sharply probability concentrates rather than in whether the concept is represented at all.
  • The concentration difference is already visible in hidden states before any tokens are generated.
  • The same dispersion pattern appears across both single-token and multi-token generations.
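
One way to make "how sharply probability concentrates" concrete is sketched below. The two statistics (`top_share` and `normalized_entropy`) are illustrative choices, not necessarily the ones used in the paper, and the numbers are invented so that both cases carry the same concept mass.

```python
import math


def top_share(form_probs):
    """Fraction of the concept's mass held by its single largest surface form."""
    total = sum(form_probs.values())
    return max(form_probs.values()) / total


def normalized_entropy(form_probs):
    """Entropy of the within-concept distribution, scaled to the range [0, 1]."""
    total = sum(form_probs.values())
    shares = [p / total for p in form_probs.values() if p > 0]
    if len(shares) < 2:
        return 0.0
    entropy = -sum(s * math.log(s) for s in shares)
    return entropy / math.log(len(shares))


# Both hypothetical cases put 0.50 total mass on the same concept.
concentrated = {"Paris": 0.46, " Paris": 0.03, "paris": 0.01}
dispersed = {"Paris": 0.18, " Paris": 0.17, "paris": 0.15}

for name, forms in [("concentrated", concentrated), ("dispersed", dispersed)]:
    print(name, round(top_share(forms), 3), round(normalized_entropy(forms), 3))
```

Matching the total mass while varying only its spread mirrors the comparison described above: the concept is equally available in both cases, and only the sharpness differs.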

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Methods that explicitly reward probability concentration on the highest-mass correct form could reduce these failures without sacrificing scale-driven gains in other capabilities.
  • The commitment mechanism may affect non-factual generation tasks such as creative writing or reasoning chains where multiple surface forms compete.
  • Pre-generation hidden-state probes for dispersion could serve as an early-warning signal during inference.
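
Whether such an early-warning score separates hallucinated from correct generations could be checked with a rank-based AUROC, sketched here in plain Python. The scores and labels are made up for illustration; the probe that would produce the scores from hidden states is assumed and not implemented.

```python
def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical pre-generation dispersion scores (higher means more dispersed).
dispersion_scores = [0.91, 0.85, 0.40, 0.77, 0.20, 0.35, 0.30, 0.15]
# Hypothetical outcome labels: 1 = the generation was a hallucination.
hallucinated = [1, 1, 0, 1, 0, 0, 1, 0]

print(auroc(dispersion_scores, hallucinated))
```

A value near 0.5 would mean the score carries no signal, so any deployment as a warning would need this kind of check on held-out data first.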

Load-bearing premise

Grouping different token sequences into the same semantic concept correctly identifies whether the model has the answer available at the moment it commits to a generation.

What would settle it

An intervention that forces probability mass onto the single highest-probability surface form for the correct concept, followed by measuring whether hallucination rates drop while overall accuracy on the same questions stays constant.
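
The proposed intervention can be sketched on a single next-token distribution. This is a diagnostic illustration only: it assumes the correct concept and its surface forms are known in advance, and the function name `collapse_concept` and all values are hypothetical.

```python
def collapse_concept(next_token_probs, token_to_concept, concept):
    """Move all of a concept's mass onto its highest-probability surface form."""
    forms = [t for t in next_token_probs if token_to_concept.get(t) == concept]
    if not forms:
        return dict(next_token_probs)  # nothing to collapse
    top_form = max(forms, key=next_token_probs.get)
    total = sum(next_token_probs[t] for t in forms)
    collapsed = dict(next_token_probs)
    for t in forms:
        collapsed[t] = 0.0
    collapsed[top_form] = total
    return collapsed


# Hypothetical distribution in which the correct concept is fragmented.
probs = {"Paris": 0.22, " Paris": 0.18, "paris": 0.10, "Lyon": 0.35, "The": 0.15}
mapping = {"Paris": "PARIS", " Paris": "PARIS", "paris": "PARIS", "Lyon": "LYON"}

before = max(probs, key=probs.get)
after_probs = collapse_concept(probs, mapping, "PARIS")
after = max(after_probs, key=after_probs.get)

print(before, "->", after)
```

The total probability is unchanged by the collapse, so any change in the decoded token comes from concentration alone rather than from adding knowledge.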

Figures

Figures reproduced from arXiv: 2605.22007 by Heejun Kim, Jaewon Sok, Jeongjae Park, Jewon Yeom, Seonghyeon Park, Taesup Kim.

Figure 1: Token entropy H(y_t | Q, y_<t) across a representative generation trajectory (Qwen3.5-9B Instruct). Entropy is near zero at most steps but spikes sharply at a small number of commitment steps. A natural follow-up is whether entropy at these spikes is itself a hallucination signal. Existing work has established that it is not, in a stronger form than we will need: Simhi et al. [2025] document hallucinations p…

Figure 2: Vocabulary fragmentation at the commitment step: the correct concept's mass (0.501 total)

Figure 3: 500 long-form Qwen3.5-9B Instruct responses aligned to each trajectory's commitment

Figure 4: Within first-token selection failures, mean wrong-token probability

Figure 5: Three-level Instruct–Base comparison at t = 1. Hidden Probe: 5-fold CV AUROC on last-layer hidden states (MCQA, N=1,000). Q-attn: fraction of last-layer attention on question tokens (Short-QA, N=500). Output AUROC: P(correct option) on MCQA. Average Instruct–Base gaps: +0.08 (Hidden Probe), +0.09 (Q-attn), +0.29 (Output AUROC). 4.4 When does the model "know" it is going to fail? A separate but related obse…

Figure 6 shows the resulting Pmass(t; c*) trajectories. Correct samples have a sharp peak in Pmass at t_c (typically 0.6–0.9) and near-zero mass before and after, consistent with Pmass measuring the model's commitment to the correct concept at the moment of emission. Hallucinated samples sit near zero at all aligned steps: the model never put substantial mass on c* in the trajectory. This is the long-form analog o…

Figure 7: Generated-token probability P(y_t) aligned to t_c in long-form generation. Both correct and hallucinated trajectories sit at ∼0.85–0.95 throughout, with only a small dip at t_c (∼0.78 vs. 0.89). Token-level confidence carries little of the correct/hallucinated signal that Pmass reveals (cf. Figure 3b).

Figure 8: Layer-wise probe AUROC (Qwen3.5). Instruct (blue)

Figure 9: Pmass(t = 1) calibration for all 14 models. Accuracy increases monotonically across Pmass bins (Instruct ECE 0.023–0.096).
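
The per-step token entropy plotted in Figure 1 can be computed as in the sketch below. The trajectory is a toy example, and the 1.0-nat cutoff used to flag high-entropy steps is an arbitrary assumption for illustration; the paper's own criterion for a commitment step is not given on this page.

```python
import math


def token_entropy(dist):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def high_entropy_steps(trajectory, threshold=1.0):
    """Indices of generation steps whose entropy exceeds the (assumed) threshold."""
    return [t for t, dist in enumerate(trajectory) if token_entropy(dist) > threshold]


# Hypothetical four-step trajectory: three near-deterministic steps, one contested.
trajectory = [
    {"The": 0.99, "A": 0.01},
    {"capital": 0.98, "city": 0.02},
    {"Paris": 0.30, "Lyon": 0.30, "Nice": 0.20, "Rome": 0.20},
    {".": 0.97, ",": 0.03},
]

print([round(token_entropy(d), 3) for d in trajectory])
print(high_entropy_steps(trajectory))
```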
Original abstract

Hallucination is often viewed as a direct consequence of missing knowledge: a model answers incorrectly when the correct answer is absent from its generation-time distribution, and correctly when it is present. We test this assumption by introducing a semantic notion of answer availability that aggregates token-level variants expressing the same answer concept, and asks whether the correct concept is already available at the moment the model commits to an answer. Across Qwen and Llama models from 0.8B to 72B in both Instruct and Base variants, 16-47% of Instruct hallucinations occur with substantial probability mass already on the correct concept, and the rate rises monotonically with scale. Comparing such failures against correct generations with matched semantic support, the distinguishing factor is not whether the correct concept is represented, but how its probability is distributed: correct generations concentrate mass on a single surface form, hallucinations disperse it across alternatives. The same sharpening asymmetry extends across multi-token generation and is detectable in pre-generation hidden states. Together, these results identify a single mechanism: instruction tuning sharpens answer commitment with scale, making helpfulness and confident hallucination two consequences of the same underlying disposition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that hallucinations are not primarily due to missing knowledge but result from a commitment failure: models often have the correct answer concept available in their generation-time distribution yet commit to an incorrect surface form. Using a semantic aggregation of token variants to measure 'answer availability,' the authors report that 16-47% of Instruct hallucinations across Qwen and Llama models (0.8B–72B, base and instruct) occur with substantial probability mass on the correct concept, with this rate increasing monotonically with scale. Correct outputs concentrate mass on one form while hallucinations disperse it; the pattern holds for multi-token sequences and is detectable in pre-generation hidden states. The conclusion is that instruction tuning sharpens commitment with scale, linking helpfulness and hallucination as consequences of the same mechanism.

Significance. If the semantic aggregation accurately identifies concept availability without introducing selection effects, the work offers a useful empirical reframing of hallucinations as a distributional sharpening issue rather than a knowledge gap. The scale of the study across two model families, multiple sizes, and both base/instruct variants, together with the extension to hidden-state signals, provides concrete evidence that could guide interventions focused on probability concentration. The monotonic scaling observation is particularly noteworthy if it survives controls for generation diversity.

major comments (3)
  1. [§3] §3 (Semantic Answer Availability definition): The aggregation procedure that maps token variants to a shared 'concept' is load-bearing for the 16-47% claim and the subsequent distribution-sharpness comparison. The manuscript must specify the exact similarity metric, embedding model, or normalization used, and demonstrate that it does not group merely topically related expressions. Without this, the distinction between 'knowing but misfiring' and partial knowledge remains unverified, especially as larger models produce more varied paraphrases.
  2. [§4.1] §4.1 and results tables (substantial probability mass threshold): The headline percentages and monotonic scaling depend on an unspecified cutoff for 'substantial' mass. No sensitivity analysis or justification for the chosen threshold is reported; altering it could change both the fraction of 'available' hallucinations and the claimed distinction from matched correct generations.
  3. [Results] Results section (monotonicity with scale): The claim that the hallucination-with-available-answer rate rises monotonically from 0.8B to 72B requires statistical support. The manuscript should report per-size confidence intervals, a trend test, and controls for confounds such as average generation length or prompt-specific effects before the scaling conclusion can be treated as robust.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'commitment failure' is used without a concise operational definition; adding one sentence would improve accessibility for readers outside the immediate subfield.
  2. [Figures] Figure captions and methods: Ensure all probability-mass figures include the exact aggregation window (top-k tokens or full vocabulary) and any exclusion rules applied to the data.
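
The threshold sensitivity check requested in major comment 2 could look like the following sketch. The Pmass values and the threshold grid are invented for illustration; only the idea of sweeping the cutoff comes from the report.

```python
def available_fraction(pmass_values, threshold):
    """Share of hallucinations whose correct-concept mass meets the threshold."""
    if not pmass_values:
        raise ValueError("no samples")
    return sum(1 for m in pmass_values if m >= threshold) / len(pmass_values)


# Hypothetical correct-concept mass for ten hallucinated generations.
hallucination_pmass = [0.02, 0.41, 0.07, 0.26, 0.00, 0.12, 0.33, 0.04, 0.18, 0.01]
# Hypothetical grid of cutoffs for "substantial" mass.
thresholds = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]

sweep = {t: available_fraction(hallucination_pmass, t) for t in thresholds}
for t, frac in sweep.items():
    print(f"threshold={t:.2f}  fraction available={frac:.2f}")
```

The fraction can only fall as the cutoff rises, so the informative question is how quickly it falls and whether the ordering across model sizes survives at every cutoff.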

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below. Where revisions are needed to improve clarity and robustness, we will incorporate them in the next version of the manuscript.

Point-by-point responses
  1. Referee: [§3] §3 (Semantic Answer Availability definition): The aggregation procedure that maps token variants to a shared 'concept' is load-bearing for the 16-47% claim and the subsequent distribution-sharpness comparison. The manuscript must specify the exact similarity metric, embedding model, or normalization used, and demonstrate that it does not group merely topically related expressions. Without this, the distinction between 'knowing but misfiring' and partial knowledge remains unverified, especially as larger models produce more varied paraphrases.

    Authors: We agree that precise implementation details are essential for reproducibility and to rule out conflation of related but distinct expressions. The current manuscript describes the procedure at a conceptual level but omits the exact metric. In the revision we will add a subsection specifying cosine similarity on embeddings from all-MiniLM-L6-v2, the normalization applied, and a validation experiment (with examples of grouped vs. non-grouped variants) confirming that clusters reflect semantic equivalence rather than topical relatedness. This addresses the concern about larger models' increased paraphrase diversity. revision: yes

  2. Referee: [§4.1] §4.1 and results tables (substantial probability mass threshold): The headline percentages and monotonic scaling depend on an unspecified cutoff for 'substantial' mass. No sensitivity analysis or justification for the chosen threshold is reported; altering it could change both the fraction of 'available' hallucinations and the claimed distinction from matched correct generations.

    Authors: We acknowledge the lack of sensitivity analysis and justification. We will add a new figure and table in the revised §4.1 showing results across thresholds from 0.05 to 0.30. The main claims (availability rates and the concentration-vs-dispersion distinction) remain stable within this range. Justification will be tied to the empirical distribution observed in correct generations, where primary surface forms typically receive >0.20 mass. revision: yes

  3. Referee: [Results] Results section (monotonicity with scale): The claim that the hallucination-with-available-answer rate rises monotonically from 0.8B to 72B requires statistical support. The manuscript should report per-size confidence intervals, a trend test, and controls for confounds such as average generation length or prompt-specific effects before the scaling conclusion can be treated as robust.

    Authors: We agree that additional statistical support will strengthen the scaling claim. The revision will include bootstrap confidence intervals per model size, a linear trend test on log(model size), and a regression controlling for generation length and prompt ID. The monotonic pattern holds after these controls across both model families, but we will report the full controlled analysis. revision: yes
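
Two of the analyses named in the third response, a bootstrap confidence interval per model size and a trend on log model size, can be sketched with the standard library alone. All data below are hypothetical, the percentile bootstrap is one common choice among several, and the slope is a plain least-squares estimate rather than a full significance test or the controlled regression the authors describe.

```python
import math
import random


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    means = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical: 100 hallucinations at one model size, 30 with the concept available.
outcomes = [1] * 30 + [0] * 70
lower, upper = bootstrap_ci(outcomes)

# Hypothetical sizes (billions of parameters) and availability rates.
sizes_b = [0.8, 9.0, 72.0]
rates = [0.20, 0.30, 0.40]
slope = ols_slope([math.log10(s) for s in sizes_b], rates)

print(f"95% CI: [{lower:.2f}, {upper:.2f}]  slope per decade: {slope:.3f}")
```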

Circularity Check

0 steps flagged

No circularity: empirical measurements of semantic availability

Full rationale

The paper defines a semantic aggregation procedure for answer availability and reports direct empirical counts (16-47% of hallucinations with substantial mass on the correct concept) across model scales. These percentages are obtained by comparing hallucinated outputs against matched correct generations; no equations, fitted parameters, or self-citations are used to derive the reported rates or the monotonic scaling observation. The distinguishing factor (probability concentration vs. dispersion) is likewise measured rather than predicted from prior fits. The work is self-contained against external benchmarks and contains no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on one domain assumption about semantic aggregation and an implicit threshold for substantial probability mass; no new physical entities or heavy free parameters are introduced.

free parameters (1)
  • substantial probability mass threshold
    Used to classify whether the correct concept is available; value not specified in abstract but required for the 16-47% statistic.
axioms (1)
  • domain assumption Token-level variants can be reliably grouped into semantic answer concepts that reflect model knowledge.
    Invoked when defining answer availability and comparing hallucinated versus correct generations.
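
How much the grouping axiom depends on a similarity cutoff can be seen in a small sketch. It uses hand-written three-dimensional vectors in place of real sentence embeddings, a simple greedy grouping rule, and arbitrary cutoffs; none of these are the paper's procedure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def group_variants(embeddings, threshold):
    """Greedy grouping: a variant joins the first group whose seed it matches."""
    groups = []  # each entry is (seed_vector, [variant names])
    for name, vec in embeddings.items():
        for seed, members in groups:
            if cosine(seed, vec) >= threshold:
                members.append(name)
                break
        else:
            groups.append((vec, [name]))
    return [members for _, members in groups]


# Hypothetical toy embeddings; real ones would come from an embedding model.
toy = {
    "Paris": (1.0, 0.0, 0.1),
    "paris": (0.98, 0.05, 0.1),
    "City of Light": (0.9, 0.1, 0.3),
    "Lyon": (0.1, 1.0, 0.0),
}

print(group_variants(toy, threshold=0.90))
print(group_variants(toy, threshold=0.98))
```

At the looser cutoff the paraphrase joins the answer concept, and at the stricter one it does not, which changes how much mass would be counted as available for the same distribution.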

Reference graph

Works this paper leans on

65 extracted references · 65 canonical work pages · 7 internal anchors

  1. [1] Ji, Ziwei; Lee, Nayeon; Frieske, Rita; Yu, Tiezheng; Su, Dan; Xu, Yan; Ishii, Etsuko; Bang, Yejin; Chen, Delong; Dai, Wenliang; Chan, Ho Shu; Madotto, Andrea; Fung, Pascale. ACM Computing Surveys.
  2. [2] Wang, Shenzhi; Yu, Le; Gao, Chang; Zheng, Chujie; Liu, Shixuan; Lu, Rui; Dang, Kai; Chen, Xiong-Hui; Yang, Jianxin; Zhang, Zhenru; Liu, Yuqiong; Yang, An; Zhao, Andrew; Yue, Yang; Song, Shiji; Yu, Bowen; Huang, Gao; Lin, Junyang. Advances in Neural Information Processing Systems (NeurIPS).
  3. [3] Vassoyan, Jean; Beau, Nathana. Ignore the. Findings of the North American Chapter of the Association for Computational Linguistics (NAACL).
  4. [4] Ren, Jie; Luo, Jiaming; Zhao, Yao; Krishna, Kundan; Saleh, Mohammad; Lakshminarayanan, Balaji; Liu, Peter J. International Conference on Learning Representations (ICLR).
  5. [5] Malinin, Andrey; Gales, Mark. International Conference on Learning Representations (ICLR).
  6. [7] Bakman, Yavuz Faruk; Yaldiz, Duygu Nur; Buyukates, Baturalp; Tao, Chenyang; Dimitriadis, Dimitrios; Avestimehr, Salman. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL).
  7. [8] Xiong, Miao; Hu, Zhiyuan; Lu, Xinyang; Li, Yifei; Fu, Jie; He, Junxian; Hooi, Bryan. International Conference on Learning Representations (ICLR).
  8. [9] Kuhn, Lorenz; Gal, Yarin; Farquhar, Sebastian. International Conference on Learning Representations (ICLR).
  9. [10] Farquhar, Sebastian; Kossen, Jannik; Kuhn, Lorenz; Gal, Yarin. Nature.
  10. [11] Manakul, Potsawee; Liusie, Adian; Gales, Mark J. F. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP).
  11. [13] Vashurin, Roman; Goloburda, Maiya; Ilina, Albina; Rubashevskii, Aleksandr; Nakov, Preslav; Shelmanov, Artem; Panov, Maxim. Advances in Neural Information Processing Systems (NeurIPS).
  12. [14] Simhi, Adi; Itzhak, Itay; Barez, Fazl; Stanovsky, Gabriel; Belinkov, Yonatan. Findings of the Association for Computational Linguistics: EMNLP 2025. 2025.
  13. [15] Xu, Hongshen; Yang, Zixv; Zhu, Zichen; Lan, Kunyao; Wang, Zihan; Wu, Mengyue; Ji, Ziwei; Chen, Lu; Fung, Pascale; Yu, Kai. arXiv preprint arXiv:2503.06709.
  14. [16] Calderon, Nitay; Ben-David, Eyal; Gekhman, Zorik; Ofek, Eran; Yona, Gal. arXiv preprint arXiv:2602.14080.
  15. [17] Burns, Collin; Ye, Haotian; Klein, Dan; Steinhardt, Jacob. International Conference on Learning Representations (ICLR).
  16. [18] Azaria, Amos; Mitchell, Tom. Findings of the Association for Computational Linguistics: EMNLP 2023.
  17. [19] Marks, Samuel; Tegmark, Max. Conference on Language Modeling (COLM).
  18. [20] Chuang, Yung-Sung; Xie, Yujia; Luo, Hongyin; Kim, Yoon; Glass, James; He, Pengcheng. International Conference on Learning Representations (ICLR).
  19. [21] Li, Kenneth; Patel, Oam; Vi. Inference-Time Intervention: Eliciting Truthful Answers from a Language Model. 2023.
  20. [22] Lin, Stephanie; Hilton, Jacob; Evans, Owain. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL).
  21. [23] Kossen, Jannik; Han, Jiatong; Razzak, Muhammed; Schut, Lisa; Malik, Shreshth; Gal, Yarin. Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs. arXiv preprint arXiv:2406.15927.
  22. [25] Zhao, Qinyu; Xu, Ming; Gupta, Kartik; Asthana, Akshay; Zheng, Liang; Gould, Stephen. European Conference on Computer Vision (ECCV).
  23. [26] Niu, Mengjia; Haddadi, Hamed; Pang, Guansong. Advances in Neural Information Processing Systems (NeurIPS).
  24. [27] Joshi, Mandar; Choi, Eunsol; Weld, Daniel S.; Zettlemoyer, Luke. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL).
  25. [28] Kwiatkowski, Tom; Palomaki, Jennimaria; Redfield, Olivia; Collins, Michael; Parikh, Ankur; Alberti, Chris; Epstein, Danielle; Polosukhin, Illia; Devlin, Jacob; Lee, Kenton; Toutanova, Kristina; Jones, Llion; Kelcey, Matthew; Chang, Ming-Wei; Dai, Andrew M.; Uszkoreit, Jakob; Le, Quoc; Petrov, Slav. ...
  26. [29] Hendrycks, Dan; Burns, Collin; Basart, Steven; Zou, Andy; Mazeika, Mantas; Song, Dawn; Steinhardt, Jacob. International Conference on Learning Representations (ICLR).
  27. [31] Yang, An; Yang, Baosong; Zhang, Beichen; Hui, Binyuan; Zheng, Bo; Yu, Bowen; Li, Chengyuan; Liu, Dayiheng; Huang, Fei; Wei, Haoran; Lin, Huan; Yang, Jian; Tu, Jianhong; Zhang, Jianwei; Yang, Jianxin; Yang, Jiaxi; Zhou, Jingren; Lin, Junyang; Dang, Kai; Lu, Keming; and others. Qwen2.5 Technical Report. arXiv preprint...
  28. [34] Guo, Chuan; Pleiss, Geoff; Sun, Yu; Weinberger, Kilian Q. International Conference on Machine Learning (ICML).
  29. [35] arXiv preprint arXiv:2303.08774.
  30. [36] Xie, Johnathan; Chen, Annie S.; Lee, Yoonho; Mitchell, Eric; Finn, Chelsea. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP).
  31. [37] Chhikara, Prateek. Transactions on Machine Learning Research (TMLR).
  32. [38] Ouyang, Long; Wu, Jeffrey; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll L.; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Ray, Alex; and others. Advances in Neural Information Processing Systems (NeurIPS).
  33. [39] Hu, Tiancheng; Minixhofer, Benjamin; Collier, Nigel. arXiv preprint arXiv:2510.17426.
  34. [40] Cohen, Jacob.
  35. [41] Welch, B. L. Biometrika.
  36. [42] Mann, Henry B.; Whitney, Donald R. The Annals of Mathematical Statistics.
  37. [43] Fisher, Ronald A.
  38. [44] MARS: Meaning-aware response scoring for uncertainty estimation in generative LLMs. ACL.
  39. [45] Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. arXiv preprint arXiv:1803.05457.
  40. [46] Detecting hallucinations in large language models using semantic entropy. Nature.
  41. [47] Measuring massive multitask language understanding. ICLR.
  42. [48] Survey of hallucination in natural language generation. ACM Computing Surveys.
  43. [49] TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. ACL.
  44. [50] Language Models (Mostly) Know What They Know. arXiv preprint arXiv:2207.05221.
  45. [51] Semantic entropy probes: Robust and cheap hallucination detection in LLMs. NeurIPS.
  46. [52] Inference-time intervention: Eliciting truthful answers from a language model. NeurIPS.
  47. [53] The Llama 3 Herd of Models. arXiv preprint arXiv:2407.21783.
  48. [54] Natural questions: A benchmark for question answering research. TACL.
  49. [55] Semantic energy: Detecting LLM hallucination beyond entropy. arXiv preprint arXiv:2508.14496.
  50. [56] Uncertainty estimation in autoregressive structured prediction. ICLR.
  51. [57] Robust hallucination detection in LLMs via adaptive token selection. arXiv preprint arXiv:2504.07863.
  52. [58] Qwen3 Technical Report. arXiv preprint arXiv:2505.09388.
  53. [59] Out-of-distribution detection and selective generation for conditional language models. ICLR.
  54. [60] First hallucination tokens are different from conditional ones. arXiv preprint arXiv:2507.20836.
  55. [61] The first to know: How token distributions reveal hidden knowledge in large vision-language models. arXiv preprint arXiv:2403.09037.
  56. [62] On calibration of modern neural networks. ICML.
  57. [63] Xiong, M.; Hu, Z.; Lu, X.; Li, Y.; Fu, J.; He, J.; Hooi, B. Can
  58. [64] Chuang, Y.-S.; Xie, Y.; Luo, H.; Kim, Y.; Glass, J.; He, P. Do
  59. [65] Discovering latent knowledge in language models without supervision. ICLR.
  60. [66] Azaria, A.; Mitchell, T. The internal state of an
  61. [67] The geometry of truth: Emergent linear structure in large language model representations of true/false datasets. COLM.
  62. [68] TruthfulQA: Measuring how models mimic human falsehoods. ACL.
  63. [69] SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models. EMNLP.
  64. [70] Semantic uncertainty: Linguistic invariances for uncertainty estimation in natural language generation. ICLR.
  65. [71] CoCoA: A minimum Bayes risk framework bridging confidence and consistency for uncertainty quantification in LLMs. NeurIPS.