Detecting Hallucinations in Large Language Models via Internal Attention Divergence Signals

Gijs van Dijk

arXiv: 2605.05025 · v1 · submitted 2026-05-06 · cs.CL

Pith reviewed 2026-05-08 17:18 UTC · model grok-4.3

keywords: hallucination detection, attention mechanisms, uncertainty quantification, large language models, KL divergence, white-box detection

The pith

Attention divergence from uniform distribution predicts LLM answer correctness

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors introduce a method to detect hallucinations in large language models by analyzing internal attention patterns in a single forward pass. They compute the Kullback-Leibler divergence between each attention head's attention distribution and a uniform distribution, then feed these values as features to a logistic regression probe that estimates uncertainty. The approach needs neither repeated sampling nor an external verifier. Across multiple datasets, task types, and model families, the authors report that it performs competitively with established uncertainty estimation techniques. The predictive signal is strongest in middle layers and on factual tokens such as named entities and numbers.
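The per-head feature described above can be sketched for a single attention row as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, head names, and toy weights are invented here, and how the paper aggregates per-token values into one feature per head is not stated on this page.

```python
import math

def kl_to_uniform(weights):
    """KL(p || u) for one attention row p and the uniform distribution u
    over the same positions. Assumes the weights sum to 1."""
    n = len(weights)
    # Terms with p_i == 0 contribute nothing (0 * log 0 is taken as 0).
    return sum(p * math.log(p * n) for p in weights if p > 0.0)

def head_features(attention_rows_per_head):
    """Map {head_name: attention row} to {head_name: KL divergence from uniform}."""
    return {head: kl_to_uniform(row) for head, row in attention_rows_per_head.items()}

# Hypothetical toy input: two heads attending over four key positions.
rows = {
    "layer12.head3": [0.25, 0.25, 0.25, 0.25],  # diffuse attention
    "layer12.head7": [0.97, 0.01, 0.01, 0.01],  # sharply peaked attention
}
features = head_features(rows)
```

A perfectly diffuse head scores 0 and a one-hot head scores log(n), so the feature grows as a head focuses on fewer positions.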

Core claim

The Kullback-Leibler divergence between each attention head's distribution and a uniform reference, used as input features to a logistic regression classifier, is highly predictive of whether the model's answer is correct. The method performs competitively with existing uncertainty estimation methods at the cost of a single forward pass, and its signal is concentrated in middle layers and on factual tokens.

What carries the argument

KL divergence of attention heads to uniform reference distribution, serving as input features to a logistic regression probe for uncertainty quantification
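For reference, the divergence from a uniform distribution has a closed form in terms of entropy. The notation is an assumption made here, not taken from the paper: a is one attention row over n positions, u is the uniform distribution over those positions, and H is Shannon entropy.

```latex
D_{\mathrm{KL}}(a \,\|\, u)
  = \sum_{i=1}^{n} a_i \log \frac{a_i}{1/n}
  = \log n + \sum_{i=1}^{n} a_i \log a_i
  = \log n - H(a),
\qquad 0 \le D_{\mathrm{KL}}(a \,\|\, u) \le \log n .
```

Under this reading, the feature is the entropy of the attention row with its sign flipped and shifted by log n, so low-entropy (focused) heads produce large divergence values.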

If this is right

Attention divergence provides a white-box, interpretable signal for model uncertainty.
The method works without repeated sampling or external models, making it lightweight.
Performance holds across multiple datasets, task types, and model families.
The signal concentrates in middle layers and on factual tokens such as named entities and numbers.
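The probe named in the points above could look roughly like the sketch below. It is a plain logistic regression fitted by batch gradient descent using only the standard library. The learning rate, epoch count, and toy data are assumptions for illustration, and the paper's actual training setup (regularization, solver, feature scaling) is not given on this page.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_probe(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on the mean log loss.
    X: list of feature vectors (one KL value per head).
    y: labels, 1 = answer correct, 0 = answer incorrect."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_correct_prob(w, b, x):
    """Estimated probability that the answer is correct."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical toy data: two heads; in this invented example, higher
# divergence on head 0 goes with correct answers.
X = [[1.2, 0.3], [1.1, 0.4], [0.2, 0.35], [0.3, 0.3]]
y = [1, 1, 0, 0]
w, b = fit_probe(X, y)
```

Once `w` and `b` are fitted, scoring a new answer is a single dot product on features from one forward pass, which is the sense in which the method is lightweight at inference time.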

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If reliable, this could enable real-time filtering of uncertain outputs during generation.
Combining divergence signals with other internal metrics might improve detection robustness.
Minimal retraining of the probe could allow adaptation to new domains with little data.

Load-bearing premise

That the logistic regression probe trained on divergence features will generalize reliably to new models, tasks, and domains without significant retraining or overfitting to the tested datasets.

What would settle it

If a probe trained on one set of models and tasks shows near-random accuracy when tested on a new model family or unseen task domain, this would indicate the method does not generalize as claimed.
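The test described above can be made concrete with a ranking metric. The sketch below uses AUROC, one common choice for binary correctness prediction, where 0.5 corresponds to chance. The function names and score values are invented for illustration. How features would be aligned between models with different numbers of layers and heads is a separate question that this page does not answer.

```python
def auroc(scores, labels):
    """Probability that a random positive outranks a random negative
    (ties count as half a win)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def transfer_gap(in_domain, out_of_domain):
    """Each argument is a (scores, labels) pair produced by the same frozen probe.
    Returns (in-domain AUROC, transfer AUROC, drop)."""
    a_in = auroc(*in_domain)
    a_out = auroc(*out_of_domain)
    return a_in, a_out, a_in - a_out

# Hypothetical probe outputs, invented for illustration only.
same_family = ([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])
new_family = ([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
a_in, a_out, drop = transfer_gap(same_family, new_family)
```

A transfer AUROC close to 0.5 on the unseen model family or task would be the near-random outcome described above.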

Figures

Figures reproduced from arXiv: 2605.05025 by Gijs van Dijk.

**Figure 1.** Intuition of attention patterns with low KL

**Figure 2.** Heatmap of the difference in mean attention

**Figure 3.** Empirical cumulative distribution functions

Original abstract

We propose a lightweight and single-pass uncertainty quantification method for detecting hallucinations in Large Language Models. The method uses attention matrices to estimate uncertainty without requiring repeated sampling or external models. Specifically, we measure the Kullback-Leibler divergence between each attention head's distribution and a uniform reference distribution, and use these features in a logistic regression probe. Across multiple datasets, task types, and model families, attention divergence is highly predictive of answer correctness and performs competitively with existing uncertainty estimation methods. We find that this signal is concentrated in middle layers and on factual tokens such as named entities and numbers, suggesting that attention dynamics provides an efficient and interpretable white-box signal of model uncertainty.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a simple attention-based way to flag hallucinations, but its supervised probe likely needs retraining for new models or tasks.

The main takeaway is that this work turns per-head attention KL divergence from uniform into input features for a logistic regression probe that predicts whether an LLM answer is correct. It is a clean, single-pass internal signal that avoids sampling or external models, and the authors locate the useful signal in middle layers and on factual tokens like entities and numbers. That part is straightforward and matches what one would expect from attention dynamics under uncertainty. They also run the approach across several datasets, task types, and model families, which is better than many single-model studies. The logistic probe itself is standard and the core divergence measure is defined without circularity to the labels, so the method is at least reproducible in principle.

The soft spot is exactly the one the stress-test flags: the probe must be fit on labeled correctness data from the target model and domain. The abstract does not report cross-model transfer tests or training-set size ablations, so it is unclear whether the method stays lightweight when you move to a new model or task without retraining. If the probe overfits to the training distribution, the claimed advantage over unsupervised baselines shrinks. The abstract also asserts competitive performance without giving numbers, dataset sizes, or error bars, which leaves the strength of the result hard to judge from the available text.

This paper is for people already working on white-box uncertainty estimation or hallucination detection in LLMs. A reader who wants a new internal feature to try in their own pipeline would get something concrete to experiment with. It deserves a serious referee because the core proposal is well-defined and the layer/token analysis adds interpretability value, even if the generalization claims need more evidence. I would send it for review and ask specifically for cross-model results and the raw performance numbers.

Referee Report

3 major / 2 minor

Summary. The paper proposes a lightweight, single-pass uncertainty quantification method for detecting hallucinations in LLMs. It computes the Kullback-Leibler divergence between each attention head's distribution and a uniform reference distribution, uses these as features in a logistic regression probe to predict answer correctness, and reports that the signal is highly predictive across datasets, tasks, and model families while being competitive with existing methods. The signal is said to concentrate in middle layers and on factual tokens such as named entities and numbers.

Significance. If validated with rigorous quantitative evidence and generalization tests, the approach could offer an efficient, interpretable internal signal for LLM uncertainty that avoids repeated sampling or external models. This would be a useful addition to white-box hallucination detection techniques, particularly if the attention divergence features prove robust without per-model retraining.

major comments (3)

Abstract: The central claims of 'highly predictive' performance and 'competitive' results with existing methods are asserted without any quantitative metrics, error bars, dataset sizes, ablation details, or baseline comparisons, making the soundness of the contribution impossible to assess from the provided text.
Method section (logistic probe description): The method relies on a supervised logistic regression probe fitted to labeled correctness data; this introduces free parameters (probe coefficients) and requires training data from the target domain, which directly challenges the 'lightweight and single-pass' framing unless cross-model transfer is explicitly demonstrated.
Experiments section: The assertion of results 'across multiple datasets, task types, and model families' lacks any mention of cross-model transfer experiments (e.g., training the probe on Llama and testing on Mistral) or training-set-size ablations, which is load-bearing for the generalization claim and leaves open the possibility that the probe captures dataset-specific correlations rather than a robust internal signal.

minor comments (2)

Abstract: Specify whether the uniform reference distribution is fixed or adjusted as the number of attended positions varies with sequence length.
Provide the exact definition of 'factual tokens' used for the token-level concentration analysis and how they were identified.
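The sequence-length question in the first minor comment matters because the maximum divergence from uniform over n positions is log(n), so raw values from short and long rows are not directly comparable. One possible adjustment, offered here as an assumption and not as the paper's choice, is to divide by log(n). The function name and example rows are invented.

```python
import math

def normalized_kl_to_uniform(weights):
    """KL(p || u) divided by its maximum log(n), giving a value in [0, 1].
    Assumes the weights sum to 1."""
    n = len(weights)
    if n < 2:
        return 0.0  # a single position is trivially uniform
    kl = sum(p * math.log(p * n) for p in weights if p > 0.0)
    return kl / math.log(n)

short_row = [1.0, 0.0]          # one-hot over 2 positions
long_row = [1.0] + [0.0] * 63   # one-hot over 64 positions
```

With this scaling, both one-hot rows score 1 regardless of length, whereas their raw divergences would be log(2) and log(64).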

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, indicating the changes we will make to the manuscript.

Referee: Abstract: The central claims of 'highly predictive' performance and 'competitive' results with existing methods are asserted without any quantitative metrics, error bars, dataset sizes, ablation details, or baseline comparisons, making the soundness of the contribution impossible to assess from the provided text.

Authors: We agree that the abstract is too high-level and does not include quantitative details, which limits immediate assessment of the claims. The experiments section of the manuscript does report specific metrics, error bars from repeated runs, dataset sizes, and baseline comparisons. We will revise the abstract to incorporate key quantitative highlights drawn from those results.
Revision: yes

Referee: Method section (logistic probe description): The method relies on a supervised logistic regression probe fitted to labeled correctness data; this introduces free parameters (probe coefficients) and requires training data from the target domain, which directly challenges the 'lightweight and single-pass' framing unless cross-model transfer is explicitly demonstrated.

Authors: The referee is correct that the logistic regression probe is trained in a supervised fashion on labeled data, introducing coefficients and requiring a training set from the domain. This means the overall pipeline is not training-free. The 'lightweight and single-pass' description in the paper refers specifically to inference on new inputs, where attention divergences are extracted in one forward pass and the already-trained probe is applied. We will revise the method section to clarify the distinction between the one-time probe training cost and the inference procedure, and we will add explicit discussion of the training data requirements.
Revision: partial

Referee: Experiments section: The assertion of results 'across multiple datasets, task types, and model families' lacks any mention of cross-model transfer experiments (e.g., training the probe on Llama and testing on Mistral) or training-set-size ablations, which is load-bearing for the generalization claim and leaves open the possibility that the probe captures dataset-specific correlations rather than a robust internal signal.

Authors: We acknowledge that while the experiments evaluate the approach on multiple model families, the probe is trained and tested within each family rather than demonstrating explicit cross-model transfer or training-set-size ablations. This leaves the generalization claim open to the concern raised. We will add cross-model transfer experiments and training-set-size ablations to the revised experiments section to directly address this point.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity: core signal defined independently of labels

The paper defines attention divergence via standard KL divergence to a uniform distribution (an unsupervised, fixed reference) and feeds the resulting features into a logistic regression probe trained on external correctness labels. This is a conventional feature-based classifier; the divergence metric itself is not defined in terms of the target labels, nor does any equation reduce the claimed predictiveness to a fitted parameter by construction. No self-citations, uniqueness theorems, or ansatzes are invoked in the provided text to support the central claim. The evaluation of probe performance on held-out data is an empirical measurement rather than a tautological renaming or self-prediction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on the assumption that attention distributions can be meaningfully compared to uniform via KL divergence and that a linear probe on these scalars captures uncertainty. No new physical entities are introduced.

free parameters (1)

logistic regression coefficients
The probe weights are fitted to labeled correctness data on the chosen datasets.

axioms (1)

standard math: KL divergence is a valid measure of difference between attention distributions and a uniform reference
Invoked when defining the divergence features from attention matrices.


work page 2022

[42] [42]

2024 , url =

Fadeeva, Ekaterina and Rubashevskii, Aleksandr and Shelmanov, Artem and Petrakov, Sergey and Li, Haonan and Mubarak, Hamdy and Tsymbalov, Evgenii and Kuzmin, Gleb and Panchenko, Alexander and Baldwin, Timothy and Nakov, Preslav and Panov, Maxim , month =. 2024 , url =

work page 2024

[43] [43]

2024 , url =

Sun, Zhongxiang and Zang, Xiaoxue and Zheng, Kai and Song, Yang and Xu, Jun and Zhang, Xiao and Yu, Weijie and Song, Yang and Li, Han , month =. 2024 , url =

work page 2024

[44] [44]

2025 , url =

Binkowski, Jakub and Janiak, Denis and Sawczyn, Albert and Gabrys, Bogdan and Kajdanowicz, Tomasz , month =. 2025 , url =

work page 2025

[45] [45]

2024 , url =

Peng, Binghui and Narayanan, Srini and Papadimitriou, Christos , month =. 2024 , url =

work page 2024

[46] [46]

2024 , url =

Orgad, Hadas and Toker, Michael and Gekhman, Zorik and Reichart, Roi and Szpektor, Idan and Kotek, Hadas and Belinkov, Yonatan , month =. 2024 , url =

work page 2024

[47] [47]

2025 , url =

Gao, Cheng and Chen, Huimin and Xiao, Chaojun and Chen, Zhiyi and Liu, Zhiyuan and Sun, Maosong , month =. 2025 , url =

work page 2025

[48] [48]

2025 , url =

Sun, Yiyou and Gai, Yu and Chen, Lijie and Ravichander, Abhilasha and Choi, Yejin and Song, Dawn , month =. 2025 , url =

work page 2025

[49] [49]

2022 , doi =

Ji, Ziwei and Lee, Nayeon and Frieske, Rita and Yu, Tiezheng and Su, Dan and Xu, Yan and Ishii, Etsuko and Bang, Ye Jin and Madotto, Andrea and Fung, Pascale , journal =. 2022 , doi =

work page 2022

[50] [50]

2025 , url =

Skean, Oscar and Arefin, Md Rifat and Zhao, Dan and Patel, Niket and Naghiyev, Jalal and LeCun, Yann and Shwartz-Ziv, Ravid , month =. 2025 , url =

work page 2025

[51] [51]

2021 , journal=

A Mathematical Framework for Transformer Circuits , author=. 2021 , journal=

work page 2021

[52] [52]

2020 , url =

Geva, Mor and Schuster, Roei and Berant, Jonathan and Levy, Omer , month =. 2020 , url =

work page 2020

[53] [53]

2019 , url =

Tenney, Ian and Das, Dipanjan and Pavlick, Ellie , month =. 2019 , url =

work page 2019

[54] [54]

, title =

Bernardo, Jose M. , title =. Journal of the Royal Statistical Society: Series B (Methodological) , volume =. 2018 , month =. doi:10.1111/j.2517-6161.1979.tb01066.x , url =

work page doi:10.1111/j.2517-6161.1979.tb01066.x 2018

[55] [55]

2020 , url =

Xu, Jiacheng and Desai, Shrey and Durrett, Greg , month =. 2020 , url =

work page 2020

[56] [56]

2018 , url =

Ott, Myle and Auli, Michael and Grangier, David and Ranzato, Marc'Aurelio , month =. 2018 , url =

work page 2018

[57] [57]

2021 , url =

Xiao, Yijun and Wang, William Yang , month =. 2021 , url =

work page 2021

[58] [58]

2024 , url =

Stolfo, Alessandro and Wu, Ben and Gurnee, Wes and Belinkov, Yonatan and Song, Xingyi and Sachan, Mrinmaya and Nanda, Neel , month =. 2024 , url =

work page 2024

[59] [59]

2025 , url =

Ogasa, Yuya and Arase, Yuki , month =. 2025 , url =

work page 2025

[60] [60]

2024 , url =

Ferrando, Javier and Obeso, Oscar and Rajamanoharan, Senthooran and Nanda, Neel , month =. 2024 , url =

work page 2024

[61] [61]

2023 , url =

Zhang, Shengyu and Dong, Linfeng and Li, Xiaoya and Zhang, Sen and Sun, Xiaofei and Wang, Shuhe and Li, Jiwei and Hu, Runyi and Zhang, Tianwei and Wu, Fei and Wang, Guoyin , month =. 2023 , url =

work page 2023

[62] [62]

2024 , url =

Chuang, Yung-Sung and Qiu, Linlu and Hsieh, Cheng-Yu and Krishna, Ranjay and Kim, Yoon and Glass, James , month =. 2024 , url =

work page 2024

[63] [63]

Do Androids Know They’re Only Dreaming of Electric Sheep?

CH-Wang, Sky and Van Durme, Benjamin and Eisner, Jason and Kedzie, Chris. Do Androids Know They ' re Only Dreaming of Electric Sheep?. Findings of the Association for Computational Linguistics: ACL 2024. 2024. doi:10.18653/v1/2024.findings-acl.260

work page doi:10.18653/v1/2024.findings-acl.260 2024