arxiv: 2603.29693 · v2 · submitted 2026-03-31 · 💻 cs.AI

Recognition: 1 theorem link

· Lean Theorem

Measuring the metacognition of AI

Richard Servajean, Philippe Servajean

Pith reviewed 2026-05-13 23:39 UTC · model grok-4.3

classification 💻 cs.AI

keywords metacognition · AI evaluation · signal detection theory · meta-d' · large language models · confidence calibration · uncertainty estimation · decision regulation

The pith

The meta-d' framework should serve as the standard measure of metacognitive sensitivity in AI systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper argues that AI systems making decisions need reliable ways to judge their own uncertainty, and that the meta-d' framework from signal detection theory provides the best tool for measuring how well their confidence ratings separate right answers from wrong ones. It demonstrates this approach on large language models by having them answer questions and then rate their confidence, allowing direct comparisons to optimal performance, between different models on the same task, and across tasks for one model. The work further applies signal detection theory to test whether models adjust their choices when risks of error increase. Experiments cover three models including GPT-5. If adopted widely, this would give a consistent method for tracking and improving how AI systems handle uncertainty in real decisions.

Core claim

The meta-d' framework quantifies metacognitive sensitivity in AIs by measuring how effectively confidence ratings distinguish correct from incorrect primary judgments, enabling comparisons to optimality, across models, and across tasks, while signal detection theory separately assesses whether AIs spontaneously become more conservative in high-risk decision settings.

What carries the argument

The meta-d' framework, which applies signal detection theory to separate metacognitive sensitivity from response bias in confidence ratings after a primary judgment.

If this is right

LLMs can be ranked by how close their metacognition comes to optimality on any given task.
The same LLM can be compared to itself across tasks or versions to track metacognitive changes.
Signal detection theory reveals whether models regulate decisions more conservatively under higher risk.
Developers gain a standardized metric to evaluate uncertainty handling separate from raw accuracy.
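The risk-regulation comparison in the list above can be illustrated with the standard equal-variance SDT indices, d' = z(H) - z(F) and c = -(z(H) + z(F)) / 2, where H is the hit rate and F the false-alarm rate. What follows is a minimal sketch, not the authors' analysis code: the function name `sdt_indices`, the log-linear correction, and all trial counts are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) under the equal-variance Gaussian SDT model.

    A log-linear correction (add 0.5 to each cell) keeps rates away from 0 and 1.
    That correction is an assumption of this sketch, not something the paper states.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    d_prime = z(hit_rate) - z(fa_rate)            # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # response bias
    return d_prime, criterion


# Hypothetical counts for one model giving "yes"/"no" answers under two risk framings.
low_risk = sdt_indices(hits=80, misses=20, false_alarms=30, correct_rejections=70)
high_risk = sdt_indices(hits=60, misses=40, false_alarms=10, correct_rejections=90)

print(f"low risk:  d'={low_risk[0]:.2f}, c={low_risk[1]:.2f}")
print(f"high risk: d'={high_risk[0]:.2f}, c={high_risk[1]:.2f}")
```

In these made-up counts the criterion is higher under the high-risk framing, which is the pattern a more conservative "yes" policy would produce, while d' tracks discrimination separately from that shift.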

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same measurement approach could be applied to non-language AI systems such as image classifiers to test consistency across modalities.
Training objectives might be designed to directly maximize meta-d' rather than only accuracy or calibration loss.
If meta-d' proves stable, it could inform safety standards requiring minimum metacognitive thresholds for high-stakes AI deployment.

Load-bearing premise

Psychophysical tools developed for human perceptual judgments transfer directly to large language models without AI-specific adjustments to their assumptions.

What would settle it

An experiment showing that meta-d' scores for a given LLM fail to predict its actual accuracy differences between high- and low-confidence responses on held-out tasks.
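The held-out check described here can be sketched with two model-free quantities: the accuracy gap between high- and low-confidence responses, and the type-2 area under the ROC curve (the probability that a correct response carries higher confidence than an incorrect one). This is a rough illustration under stated assumptions; the function names, the confidence threshold, and the trial data are hypothetical, and the type-2 AUROC is a stand-in here, not meta-d' itself.

```python
def accuracy_gap(trials, threshold):
    """Accuracy on high-confidence trials minus accuracy on low-confidence trials.

    `trials` is a list of (is_correct, confidence) pairs; `threshold` splits
    confidence into high (>= threshold) and low (< threshold).
    """
    high = [ok for ok, conf in trials if conf >= threshold]
    low = [ok for ok, conf in trials if conf < threshold]
    if not high or not low:
        raise ValueError("both confidence bins must contain at least one trial")
    return sum(high) / len(high) - sum(low) / len(low)


def type2_auroc(trials):
    """Probability that a random correct trial has higher confidence than a
    random incorrect trial, counting ties as one half."""
    correct = [conf for ok, conf in trials if ok]
    incorrect = [conf for ok, conf in trials if not ok]
    if not correct or not incorrect:
        raise ValueError("need at least one correct and one incorrect trial")
    wins = sum(
        1.0 if c > i else 0.5 if c == i else 0.0
        for c in correct
        for i in incorrect
    )
    return wins / (len(correct) * len(incorrect))


# Hypothetical held-out trials: (is_correct, confidence on a 1-4 scale).
held_out = [
    (True, 4), (True, 4), (True, 3), (True, 2),
    (False, 1), (False, 2), (False, 3), (True, 1),
]

gap = accuracy_gap(held_out, threshold=3)
auroc = type2_auroc(held_out)
print(f"accuracy gap = {gap:.2f}, type-2 AUROC = {auroc:.2f}")
```

Unlike meta-d', the type-2 AUROC is known to vary with primary-task performance, so a comparison of this kind would be most informative when task accuracy is matched across the conditions being compared.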

Original abstract

A robust decision-making process must take into account uncertainty, especially when the choice involves inherent risks. Because artificial intelligence (AI) systems are increasingly integrated into decision-making workflows, managing uncertainty relies more and more on the metacognitive capabilities of these systems; i.e., their ability to assess the reliability of and regulate their own decisions. Hence, it is crucial to employ robust methods to measure the metacognitive abilities of AI. This paper is primarily a methodological contribution arguing for the adoption of the meta-d' framework as the gold standard for assessing the metacognitive sensitivity of AIs, that is, the ability to generate confidence ratings that distinguish correct from incorrect responses. Moreover, we propose to leverage signal detection theory (SDT) to measure the ability of AIs to spontaneously regulate their decisions based on uncertainty and risk. To demonstrate the practical utility of these psychophysical frameworks, we conduct two series of experiments on three large language models (LLMs): GPT-5, DeepSeek-V3.2-Exp, and Mistral-Medium-2508. In the first experiments, LLMs performed a primary judgment followed by a confidence rating. In the second, LLMs only performed the primary judgment, while we manipulated the risk associated with either response. On the one hand, applying the meta-d' framework allows us to conduct comparisons along three axes: comparing an LLM to optimality, comparing different LLMs on a given task, and comparing the same LLM across different tasks. On the other hand, SDT allows us to assess whether LLMs become more conservative when risk is high.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This paper proposes meta-d' and SDT as standard tools for measuring LLM metacognition and risk sensitivity, but the abstract shows only the plan with no results or assumption checks.

The main takeaway is that the authors want to borrow meta-d' from psychology as the benchmark for how well an AI can rate its own accuracy, and to use signal detection theory to see whether models adjust their decisions when risk changes. They outline experiments on GPT-5, DeepSeek-V3.2-Exp, and Mistral-Medium-2508 that combine a primary judgment with a confidence rating, plus a separate risk-manipulation condition on the judgment alone. This lets them compare a model to optimality, pit models against each other on the same task, and track one model across tasks, while also testing spontaneous conservatism under higher risk. The framing is straightforward and gives a consistent language for these comparisons instead of relying on raw accuracy or uncalibrated confidence scores. That part is useful for anyone building evaluation protocols.

The clear limitation is that we see only the method: no numbers, no error bars, no statistical tests, and no check on whether the equal-variance Gaussian assumptions behind meta-d' actually fit LLM-generated confidence ratings. If those ratings are skewed or otherwise non-normal, the derived sensitivity scores could be off. The paper also does not address whether the human-perception origins of these tools require any adjustment for language models. This is aimed at researchers who work on AI evaluation and safety metrics. A reader who needs a ready benchmark will get the idea but will still need the data. The paper deserves a serious referee who will require the actual results and the assumption validation before any wider adoption.

Referee Report

2 major / 1 minor

Summary. The paper is a methodological contribution proposing the meta-d' framework from signal detection theory (SDT) as the gold standard for assessing metacognitive sensitivity in AI, defined as the ability to generate confidence ratings that distinguish correct from incorrect responses. It further suggests using SDT to measure the spontaneous regulation of decisions based on uncertainty and risk. Two series of experiments are described on three LLMs (GPT-5, DeepSeek-V3.2-Exp, and Mistral-Medium-2508), involving primary judgments with confidence ratings and risk manipulations to demonstrate comparisons to optimality, across models, and across tasks.

Significance. If validated, this approach could provide a standardized, psychophysically grounded method for evaluating metacognition in AI systems, facilitating rigorous comparisons that go beyond simple accuracy metrics. The strength lies in leveraging established frameworks for optimality assessments and risk sensitivity, though the direct transfer from human perceptual tasks to LLM outputs needs empirical support.

major comments (2)

[Abstract] The abstract describes the intended experiments and claims but provides no numerical results, error bars, task details, or statistical tests, making it impossible to verify the data support for the meta-d' comparisons and SDT regulation claims.
[Experimental setup (as described)] The core SDT assumptions, including equal-variance Gaussians for confidence ratings, are applied to LLM outputs without any reported validation or checks for normality and variance equality; violation of these would undermine the reliability of meta-d' estimates and the claimed superiority for assessing metacognitive sensitivity.

minor comments (1)

[Abstract] The model names (e.g., GPT-5, Mistral-Medium-2508) appear to be placeholders or future versions; clarify the exact models used if they are not standard releases.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments, which highlight important areas for improving the clarity and rigor of our methodological contribution. We address each major comment point by point below and outline the corresponding revisions.

Referee: [Abstract] The abstract describes the intended experiments and claims but provides no numerical results, error bars, task details, or statistical tests, making it impossible to verify the data support for the meta-d' comparisons and SDT regulation claims.

Authors: We agree that the abstract should provide a more complete summary of the empirical findings to allow readers to assess the support for our claims. In the revised version, we will expand the abstract to include key quantitative results (e.g., meta-d' estimates with confidence intervals or standard errors), task specifications, and statistical outcomes from the comparisons across models, tasks, and optimality benchmarks.

revision: yes

Referee: [Experimental setup (as described)] The core SDT assumptions, including equal-variance Gaussians for confidence ratings, are applied to LLM outputs without any reported validation or checks for normality and variance equality; violation of these would undermine the reliability of meta-d' estimates and the claimed superiority for assessing metacognitive sensitivity.

Authors: We acknowledge that the current manuscript does not report explicit checks for the equal-variance Gaussian assumption underlying meta-d'. In the revision, we will add supplementary analyses examining the distributions of confidence ratings (e.g., normality tests and variance comparisons between correct and incorrect trials) for each model and task. Where the assumptions are approximately met, we will report this support; where deviations occur, we will discuss their potential impact on meta-d' estimates and consider robustness checks or alternative metrics.

revision: yes
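One conventional way to probe the equal-variance assumption, offered here as an illustration and not as the analysis the simulated rebuttal promises, is the slope of the z-transformed ROC built from confidence-graded responses. Under the equal-variance Gaussian model that slope is 1. The sketch below assumes a hypothetical two-stimulus task (labelled S1 and S2) with four ordered rating categories; the function name `zroc_slope` and all counts are invented for the example.

```python
from statistics import NormalDist


def zroc_slope(counts_s1, counts_s2):
    """Least-squares slope of the z-transformed ROC.

    counts_s1 / counts_s2: response counts per ordered rating category, from
    "sure S1" to "sure S2", for S1 and S2 stimuli respectively.
    """
    z = NormalDist().inv_cdf
    n1, n2 = sum(counts_s1), sum(counts_s2)
    xs, ys = [], []
    # Sweep the criterion across the category boundaries (k = 1 .. K-1).
    for k in range(1, len(counts_s1)):
        fa = sum(counts_s1[k:]) / n1   # P(rating >= k | S1)
        hit = sum(counts_s2[k:]) / n2  # P(rating >= k | S2)
        if 0.0 < fa < 1.0 and 0.0 < hit < 1.0:
            xs.append(z(fa))
            ys.append(z(hit))
    if len(xs) < 2:
        raise ValueError("need at least two usable ROC points")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical, symmetric rating counts for 100 S1 and 100 S2 trials.
slope = zroc_slope(counts_s1=[40, 30, 20, 10], counts_s2=[10, 20, 30, 40])
print(f"zROC slope = {slope:.3f}")
```

A slope well away from 1 would suggest unequal variances, in which case an unequal-variance SDT model may be worth considering; with only a handful of ROC points the estimate is noisy, so it is a diagnostic hint and not a formal test.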

Circularity Check

0 steps flagged

No circularity: applies externally established meta-d' and SDT frameworks to LLMs

The paper is a methodological proposal that imports the meta-d' framework and signal detection theory (SDT) from psychology to assess LLM metacognition and risk regulation. No equations, predictions, or optimality comparisons are derived from the paper's own data or definitions in a self-referential manner. The experiments apply the pre-existing frameworks directly to LLM confidence ratings and judgments without fitting parameters that are then renamed as predictions, without self-citation load-bearing steps, and without any ansatz or uniqueness claims that reduce to the authors' prior work. The central claims rest on the external validity of SDT assumptions rather than internal construction, making the derivation chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the transferability of human psychophysical tools to AI without additional justification or AI-specific axioms stated in the abstract.

axioms (1)

domain assumption: The meta-d' framework developed for human metacognition is the appropriate and optimal measure for AI metacognitive sensitivity.
The paper positions meta-d' as the gold standard for AIs, implying direct applicability without deriving or validating the transfer.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean (Jcost, washburn_uniqueness_aczel); Foundation/AlexanderDuality.lean (D=3 forcing) reality_from_one_distinction; Jcost uniqueness unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

arguing for the adoption of the meta-d' framework as the gold standard for assessing the metacognitive sensitivity of AIs... leverage signal detection theory (SDT) to measure the ability of AIs to spontaneously regulate their decisions based on uncertainty and risk

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MEDLEY-BENCH: Scale Buys Evaluation but Not Control in AI Metacognition
cs.AI 2026-04 unverdicted novelty 6.0

MEDLEY-BENCH reveals an evaluation/control dissociation in AI metacognition where scale improves reflective scoring but not proportional belief revision, with a consistent knowing/doing gap across 35 models.
