Measuring Progress Toward AGI: A Cognitive Framework

Ryan Burnell; Yumeya Yamamori; Orhan Firat; Kate Olszewska; Steph Hughes-Fitt; Oran Kelly; Isaac R. Galatzer-Levy; Meredith Ringel Morris; Allan Dafoe; Alison M. Snyder; Noah D. Goodman; Matthew Botvinick; Shane Legg

arXiv: 2605.28405 · v1 · pith:YWNZIN7Snew · submitted 2026-05-27 · cs.AI

Pith reviewed 2026-06-29 11:58 UTC · model grok-4.3

classification: cs.AI

keywords: cognitive taxonomy, AGI evaluation, cognitive profiles, evaluation protocol, human cognitive faculties, AI progress measurement, intelligence assessment

The pith

A cognitive taxonomy of 10 faculties from psychology enables empirical profiles that track AI progress toward AGI.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a framework that relates AI system capabilities to human cognitive abilities, with the aim of replacing subjective AGI claims with measurable data. Drawing on psychology and cognitive science, it defines a taxonomy that breaks general intelligence into 10 specific faculties. An evaluation protocol then tests systems on targeted, held-out tasks to produce a cognitive profile showing where each system stands on each faculty. If the approach holds up, researchers and policymakers could track development empirically and compare systems on a common basis rather than through vague assertions.

Core claim

We present a framework for understanding system capabilities in relation to human cognitive abilities. Drawing from decades of research in psychology, neuroscience, and cognitive science, we introduce a Cognitive Taxonomy that deconstructs general intelligence into 10 key cognitive faculties. We then propose a rigorous evaluation protocol in which a system's performance is measured across a suite of targeted, held-out cognitive tasks, generating a 'cognitive profile' that can be used to understand a system's strengths and weaknesses.

What carries the argument

The Cognitive Taxonomy of 10 faculties drawn from psychology literature, which underpins the evaluation protocol that tests systems on held-out tasks to generate comparable cognitive profiles.
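As a rough illustration of how held-out task scores might roll up into a profile, here is a minimal Python sketch. It is not the authors' procedure. It assumes that each task is tagged with exactly one faculty, that each task is scored in [0, 1], and that a faculty score is the unweighted mean of its tasks. The task ids and scores are hypothetical. Only three of the ten faculties appear: Perception, Generation, and Attention, the ones named in Figure 1's caption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical held-out results: (task_id, faculty, score in [0, 1]).
# Faculty labels are three of the ten in Figure 1; task ids and scores are invented.
TASK_RESULTS = [
    ("scene_description_01", "Perception", 0.82),
    ("scene_description_02", "Perception", 0.74),
    ("story_writing_01", "Generation", 0.91),
    ("visual_search_01", "Attention", 0.40),
    ("sustained_monitoring_01", "Attention", 0.50),
]


def cognitive_profile(results):
    """Collapse per-task scores into one score per faculty (unweighted mean)."""
    by_faculty = defaultdict(list)
    for _task_id, faculty, score in results:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        by_faculty[faculty].append(score)
    return {faculty: mean(scores) for faculty, scores in by_faculty.items()}


if __name__ == "__main__":
    for faculty, score in sorted(cognitive_profile(TASK_RESULTS).items()):
        print(f"{faculty:<12} {score:.2f}")
```

Run as a script, this prints Attention 0.45, Generation 0.91, and Perception 0.78. A plain mean treats every task as equally informative. Weighting tasks by their reliability is an obvious alternative.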

If this is right

Different AI systems can be compared on a shared scale of cognitive strengths and weaknesses.
Development progress can be tracked by changes in the cognitive profile over successive versions.
Targeted improvements can focus on specific faculties that show gaps in the profile.
Governance decisions can rest on standardized empirical profiles instead of unsubstantiated claims.
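Once profiles exist, tracking progress across versions and targeting gaps reduce to simple operations. The following minimal sketch assumes profiles are Python dicts that map a faculty name to a score on a shared [0, 1] scale. The two version profiles and the 0.5 gap threshold are hypothetical and do not come from the paper.

```python
# Hypothetical profiles for two versions of one system (faculty -> score in [0, 1]).
PROFILE_V1 = {"Perception": 0.70, "Generation": 0.85, "Attention": 0.40}
PROFILE_V2 = {"Perception": 0.78, "Generation": 0.91, "Attention": 0.45}


def profile_change(old, new):
    """Per-faculty change between two profiles; both must cover the same faculties."""
    if old.keys() != new.keys():
        raise ValueError("profiles must cover the same faculties")
    return {faculty: new[faculty] - old[faculty] for faculty in old}


def gaps(profile, threshold=0.5):
    """Faculties scoring below a (hypothetical) threshold, weakest first."""
    weak = [faculty for faculty, score in profile.items() if score < threshold]
    return sorted(weak, key=lambda faculty: profile[faculty])


if __name__ == "__main__":
    delta = profile_change(PROFILE_V1, PROFILE_V2)
    print({faculty: round(d, 2) for faculty, d in delta.items()})
    print(gaps(PROFILE_V2))  # ['Attention']
```

Subtracting profiles only makes sense if the task suite and scoring stay fixed between versions. If the tasks change, the differences mix changes in the system with changes in the test.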

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the taxonomy proves incomplete for non-human forms of intelligence, profiles could systematically undervalue certain AI approaches.
The held-out task requirement could push benchmark creators toward more isolated tests that reduce overlap between faculties.
Widespread use might shift evaluation priorities from single-task leaderboards toward breadth across all 10 faculties.

Load-bearing premise

That a decomposition of intelligence into these 10 human cognitive faculties drawn from existing psychology literature supplies the right and sufficient basis for measuring progress toward AGI.

What would settle it

An AI system that demonstrates broad intelligent behavior whose capabilities cannot be accounted for or predicted by its measured performance across the 10 faculties in the taxonomy.

Figures

Figures reproduced from arXiv: 2605.28405 by Alison M. Snyder, Allan Dafoe, Isaac R. Galatzer-Levy, Kate Olszewska, Matthew Botvinick, Meredith Ringel Morris, Noah D. Goodman, Oran Kelly, Orhan Firat, Ryan Burnell, Shane Legg, Steph Hughes-Fitt, Yumeya Yamamori.

**Figure 1.** Overview of the 10 cognitive faculties. Faculties outlined in orange represent composite faculties. We begin with eight faculties capturing the basic building blocks of human cognition: Perception: The ability to extract and process sensory information from the environment. Generation: The ability to produce outputs such as speech, text, motor movements, and computer control actions. Attention: The ability…

**Figure 2.** Cognitive profiles for three hypothetical systems. Panel A: A hypothetical system that shows significant cognitive weaknesses relative to the human sample. Panel B: A hypothetical system that outperforms the human sample median across all cognitive faculties. Panel C: A hypothetical system that outperforms the human sample maximum across all cognitive faculties. The hypothetical human sample scores are sta…
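Figure 2 compares systems against the median and maximum of a human sample. The sketch below shows one way to make that comparison. It assumes each faculty has a list of human scores on the same scale as the system's score. The human scores are invented. Percentile rank is defined here as the share of human scores strictly below the system's score. The three labels loosely mirror Panels A to C; they do not reproduce categories defined in the paper.

```python
from statistics import median

# Hypothetical human sample: faculty -> scores from individual participants.
HUMAN_SAMPLE = {
    "Perception": [0.55, 0.62, 0.70, 0.74, 0.88],
    "Generation": [0.48, 0.60, 0.66, 0.71, 0.80],
    "Attention": [0.52, 0.58, 0.63, 0.69, 0.77],
}


def percentile_rank(score, sample):
    """Percentage of human scores strictly below the system's score."""
    return 100.0 * sum(s < score for s in sample) / len(sample)


def compare_to_humans(profile, human_sample):
    """Compare a profile with the human median and maximum on every faculty.

    The profile must contain every faculty present in human_sample.
    """
    above_median = all(profile[f] > median(xs) for f, xs in human_sample.items())
    above_max = all(profile[f] > max(xs) for f, xs in human_sample.items())
    if above_max:
        label = "exceeds human maximum on every faculty"  # cf. Figure 2, Panel C
    elif above_median:
        label = "exceeds human median on every faculty"  # cf. Figure 2, Panel B
    else:
        label = "at or below human median on at least one faculty"
    ranks = {f: percentile_rank(profile[f], xs) for f, xs in human_sample.items()}
    return label, ranks


if __name__ == "__main__":
    system = {"Perception": 0.78, "Generation": 0.91, "Attention": 0.45}
    print(compare_to_humans(system, HUMAN_SAMPLE))
```

Consider a system that scores 0.78, 0.91, and 0.45 on Perception, Generation, and Attention. Its Attention score falls below the human median, so the sketch reports a weakness even though its Generation score beats every human in the sample.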

Original abstract

Despite widespread discussion of AGI, there is no clear framework for measuring progress toward it. This ambiguity fuels subjective claims, makes it difficult to track progress, and risks hindering responsible governance. As a starting point to address this gap, we present a framework for understanding system capabilities in relation to human cognitive abilities. Drawing from decades of research in psychology, neuroscience, and cognitive science, we introduce a Cognitive Taxonomy that deconstructs general intelligence into 10 key cognitive faculties. We then propose a rigorous evaluation protocol in which a system's performance is measured across a suite of targeted, held-out cognitive tasks, generating a 'cognitive profile' that can be used to understand a system's strengths and weaknesses. We hope this framework will provide a practical roadmap and an initial step toward more rigorous, empirical evaluation of AGI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a clean synthesis of psychology into an AGI evaluation framework, but it does not justify the 10-faculty split or test whether the protocol actually tracks progress better than alternatives.

The paper's core move is to take established cognitive science work, split general intelligence into ten faculties, and outline a protocol that scores AI systems on held-out tasks to produce a profile. That structure is the main thing to know.

It does a solid job of making the case that current AGI talk is too loose and that borrowing from psychology could give evaluators a shared language. The proposal for targeted tasks rather than broad benchmarks is a reasonable direction, and the authors are clear that this is a starting point rather than a finished measurement system.

The weak point is the lack of argument for why these particular faculties are the right ones or why a human-derived taxonomy should be preferred over other decompositions. The abstract states the taxonomy without showing independence, exhaustiveness, or predictive power for AGI-relevant behavior, and the protocol is described but not run on any systems or checked for robustness. No data or validation appears.

This is aimed at people building or critiquing AGI benchmarks who already care about cognitive science. A reader looking for new empirical results or a defended alternative to scaling-law approaches will not find them here.

It deserves peer review as a framework paper. Reviewers can press on the justification for the taxonomy and ask for at least a small pilot of the protocol, which would turn the proposal into something more testable.

Referee Report

2 major / 1 minor

Summary. The paper claims there is no clear framework for measuring AGI progress and proposes a Cognitive Framework that introduces a Cognitive Taxonomy deconstructing general intelligence into 10 key faculties drawn from psychology, neuroscience, and cognitive science literature, paired with a rigorous evaluation protocol that measures system performance on targeted held-out tasks to produce a 'cognitive profile' for tracking strengths, weaknesses, and progress toward AGI.

Significance. If adopted and empirically validated, the framework could supply a structured, literature-grounded approach to AGI evaluation that reduces subjective claims and supports governance; the explicit use of held-out tasks and the proposal of cognitive profiles are constructive elements that could enable falsifiable comparisons across systems.

major comments (2)

[Cognitive Taxonomy] Cognitive Taxonomy section: the manuscript asserts that the 10 faculties provide a sufficient basis for measuring AGI progress but supplies no argument or evidence that this decomposition is exhaustive, that the faculties are independent, or that it is preferable to non-human-centric alternatives (e.g., scaling-law clusters or emergent capability taxonomies). This choice is load-bearing for the central claim that the resulting profiles constitute a useful measure of AGI progress.
[Evaluation Protocol] Evaluation Protocol section: the protocol is described as 'rigorous' yet the text contains no validation experiments, inter-task correlation analysis, or sensitivity checks demonstrating that the held-out tasks yield stable, taxonomy-independent profiles. Without such evidence the claim that the protocol enables reliable empirical tracking of AGI progress remains untested.
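One concrete version of the inter-task correlation analysis the referee asks for borrows the idea of convergent and discriminant validity from psychometrics. Across many evaluated systems, two tasks assigned to the same faculty should correlate more strongly than two tasks assigned to different faculties. This is a swapped-in illustration, not a procedure from the paper. The score matrix is hypothetical: four systems, two Perception tasks, and two Attention tasks.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

# Hypothetical scores: task_id -> one score per evaluated system (same system order).
TASK_SCORES = {
    "perception_task_1": [0.20, 0.40, 0.60, 0.80],
    "perception_task_2": [0.25, 0.45, 0.55, 0.85],
    "attention_task_1": [0.90, 0.30, 0.70, 0.10],
    "attention_task_2": [0.80, 0.35, 0.60, 0.20],
}
TASK_FACULTY = {
    "perception_task_1": "Perception",
    "perception_task_2": "Perception",
    "attention_task_1": "Attention",
    "attention_task_2": "Attention",
}


def pearson(xs, ys):
    """Pearson correlation of two equal-length score vectors."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("constant score vector; correlation undefined")
    return cov / (sx * sy)


def within_vs_between(task_scores, task_faculty):
    """Mean correlation over same-faculty task pairs vs. cross-faculty task pairs."""
    within, between = [], []
    for a, b in combinations(sorted(task_scores), 2):
        r = pearson(task_scores[a], task_scores[b])
        (within if task_faculty[a] == task_faculty[b] else between).append(r)
    if not within or not between:
        raise ValueError("need both same-faculty and cross-faculty task pairs")
    return mean(within), mean(between)


if __name__ == "__main__":
    print(within_vs_between(TASK_SCORES, TASK_FACULTY))
```

The within-faculty average may not come out clearly higher than the between-faculty average. In that case the task-to-faculty assignment is doing little work, and the referee's concern about whether the faculties are independent applies directly.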

minor comments (1)

[Abstract] Abstract: the phrasing 'we hope this framework will provide a practical roadmap' could be tightened to clarify that the contribution is a proposal rather than a demonstrated method.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the intended scope of our proposed framework. We address each major comment below, indicating planned revisions to avoid overstating claims while preserving the manuscript's focus as an initial step toward structured AGI evaluation.

Referee: [Cognitive Taxonomy] Cognitive Taxonomy section: the manuscript asserts that the 10 faculties provide a sufficient basis for measuring AGI progress but supplies no argument or evidence that this decomposition is exhaustive, that the faculties are independent, or that it is preferable to non-human-centric alternatives (e.g., scaling-law clusters or emergent capability taxonomies). This choice is load-bearing for the central claim that the resulting profiles constitute a useful measure of AGI progress.

Authors: We agree that the manuscript should more explicitly bound the claims of the Cognitive Taxonomy. The 10 faculties are selected from established literature in psychology, neuroscience, and cognitive science to provide a human-centric starting point for evaluation, not as an exhaustive or mutually independent decomposition. We make no assertion of superiority to alternative approaches such as scaling-law clusters. In the revised manuscript, we will add explicit language in the Cognitive Taxonomy section stating these limitations and framing the taxonomy as a practical, literature-grounded proposal rather than a definitive basis. revision: yes
Referee: [Evaluation Protocol] Evaluation Protocol section: the protocol is described as 'rigorous' yet the text contains no validation experiments, inter-task correlation analysis, or sensitivity checks demonstrating that the held-out tasks yield stable, taxonomy-independent profiles. Without such evidence the claim that the protocol enables reliable empirical tracking of AGI progress remains untested.

Authors: We acknowledge that the evaluation protocol is a design proposal without empirical validation or sensitivity analysis in the current manuscript. The description as 'rigorous' refers to its use of targeted, held-out tasks rather than demonstrated reliability. We will revise the Evaluation Protocol section to refer to a 'proposed evaluation protocol' and include a forward-looking discussion noting that empirical validation, including inter-task correlations and stability checks, is required in future work to support reliable tracking of progress. revision: yes
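The stability checks deferred to future work could take many forms. One simple option is a percentile bootstrap over a faculty's held-out tasks: resample the tasks with replacement, recompute the faculty score each time, and report the spread. The sketch below makes that assumption explicit. The resample count, seed, 95% level, and scores are illustrative choices, not part of the proposed protocol.

```python
import random
from statistics import mean


def bootstrap_interval(task_scores, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap interval for a faculty score (mean over tasks).

    Tasks are resampled with replacement, so the interval reflects sensitivity
    to which tasks were included, not run-to-run noise in the system itself.
    """
    if not task_scores:
        raise ValueError("need at least one task score")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    k = len(task_scores)
    stats = sorted(mean(rng.choices(task_scores, k=k)) for _ in range(n_resamples))
    alpha = (1.0 - level) / 2.0
    lo = stats[int(alpha * n_resamples)]
    hi = stats[min(n_resamples - 1, int((1.0 - alpha) * n_resamples))]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical scores on five held-out tasks for one faculty.
    print(bootstrap_interval([0.40, 0.50, 0.45, 0.55, 0.60]))
```

A wide interval would signal that a faculty score depends heavily on which tasks were chosen. That would make small version-to-version changes in a profile hard to interpret.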

Circularity Check

0 steps flagged

No circularity: proposal draws taxonomy and protocol from external literature without self-referential reductions

The paper presents a framework proposal that deconstructs intelligence into 10 faculties drawn from existing psychology/neuroscience literature and outlines a held-out task evaluation protocol. No equations, fitted parameters, or 'predictions' appear that reduce by construction to the paper's own inputs. The taxonomy is explicitly sourced from decades of independent prior research rather than defined in terms of the proposed profile or any self-citation chain. No uniqueness theorems, ansatzes, or renamings of known results are invoked in a load-bearing way. The central claim is therefore a self-contained definitional proposal, not a derivation that collapses into its premises.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper rests on the domain assumption that human cognitive faculties are the appropriate reference class for AGI measurement and introduces no free parameters, new physical entities, or formal derivations.

axioms (1)

Domain assumption: human cognitive abilities, as studied in psychology and neuroscience, supply the correct decomposition for measuring general intelligence in artificial systems.
Invoked in the abstract when the authors state they draw from decades of research in psychology, neuroscience, and cognitive science to define the 10 faculties.

[1] [1]

P. A. Alexander, D. Dumas, E. M. Grossnickle, A. List, and C. M. Firetto. Measuring relational reasoning. The Journal of Experimental Education, 84 0 (1): 0 119--151, 2016. doi:10.1080/00220973.2014.963216. URL https://doi.org/10.1080/00220973.2014.963216

work page doi:10.1080/00220973.2014.963216 2016

[2] [2]

T. Y. Arbuckle and L. L. Cuddy. Discrimination of item strength at time of presentation. Journal of experimental psychology, 81 0 (1): 0 126, 1969

1969

[3] [3]

E. Awh, A. V. Belopolsky, and J. Theeuwes. Top-down versus bottom-up attentional control: A failed theoretical dichotomy. Trends in cognitive sciences, 16 0 (8): 0 437--443, 2012

2012

[4] [4]

C. Baber. Cognition and tool use: Forms of engagement in human and animal use of tools. CRC Press, 2003

2003

[5] [5]

Baddeley

A. Baddeley. Working memory. Science, 255 0 (5044): 0 556, 1992

1992

[6] [6]

A. Bandura. Observational learning. In The international encyclopedia of communication. Wiley Online Library, 2008

2008

[7] [7]

Bari and T

A. Bari and T. W. Robbins. Inhibition and impulsivity: behavioral and neural basis of response control. Progress in neurobiology, 108: 0 44--79, 2013

2013

[8] [8]

E. Bates. Language and context: The acquisition of pragmatics. Academic Press, 1976

1976

[9] [9]

M. H. Bazerman, J. R. Curhan, D. A. Moore, and K. L. Valley. Negotiation. Annual review of psychology, 51 0 (1): 0 279--314, 2000

2000

[10] [10]

Bhagavatula, R

C. Bhagavatula, R. L. Bras, C. Malaviya, K. Sakaguchi, A. Holtzman, H. Rashkin, D. Downey, S. W.-t. Yih, and Y. Choi. Abductive commonsense reasoning, 2020. URL https://arxiv.org/abs/1908.05739

work page arXiv 2020

[11] [11]

Biederman

I. Biederman. Recognition-by-components: a theory of human image understanding. Psychological review, 94 0 (2): 0 115, 1987

1987

[12] [12]

M. Binz, I. Dasgupta, A. K. Jagadish, M. Botvinick, J. X. Wang, and E. Schulz. Meta-learned models of cognition. Behavioral and Brain Sciences, 47: 0 e147, 2024

2024

[13] [13]

J. K. Bizley and Y. E. Cohen. The what, where and how of auditory-object perception. Nature Reviews Neuroscience, 14 0 (10): 0 693--707, Oct 2013. ISSN 1471-0048. doi:10.1038/nrn3565. URL https://doi.org/10.1038/nrn3565

work page doi:10.1038/nrn3565 2013

[14]

R. A. Bjork. Retrieval inhibition as an adaptive mechanism in human memory. In Varieties of Memory and Consciousness: Essays in Honour of Endel Tulving, pages 309–330. Lawrence Erlbaum Associates, Inc., Hillsdale, NJ, US, 1989. ISBN 0-89859-935-0 (hardcover); 0-8058-0546-X (paperback)

[15]

J. Blauert. Spatial Hearing: The Psychophysics of Human Sound Localization. The MIT Press, Oct. 1996. ISBN 9780262268684. doi:10.7551/mitpress/6391.001.0001. URL https://doi.org/10.7551/mitpress/6391.001.0001

[16]

M. A. Boden. What is creativity? In M. A. Boden, editor, Dimensions of Creativity, pages 75–117. The MIT Press, Cambridge, MA, 1994. ISBN 9780262023689. doi:10.7551/mitpress/2437.003.0006

[17]

A. Borst and M. Egelhaaf. Principles of visual motion detection. Trends in Neurosciences, 12(8): 297–306, 1989

[18]

M. M. Botvinick. Conflict monitoring and decision making: Reconciling two perspectives on anterior cingulate function. Cognitive, Affective, & Behavioral Neuroscience, 7(4): 356–366, 2007

[19]

M. M. Botvinick, T. S. Braver, D. M. Barch, C. S. Carter, and J. D. Cohen. Conflict monitoring and cognitive control. Psychological Review, 108(3): 624, 2001

[20]

R. J. Brachman and H. J. Levesque. Machines like Us: Toward AI with Common Sense. The MIT Press, Cambridge, MA, 2022. ISBN 9780262369237. doi:10.7551/mitpress/14299.001.0001. URL https://doi.org/10.7551/mitpress/14299.001.0001

[21]

S. Braem and T. Egner. Getting a grip on cognitive flexibility. Current Directions in Psychological Science, 27(6): 470–476, 2018

[22]

A. S. Bregman. Auditory scene analysis: The perceptual organization of sound. MIT Press, 1994

[23]

M. R. Brent. Speech segmentation and word discovery: A computational perspective. Trends in Cognitive Sciences, 3(8): 294–301, 1999

[24]

V. J. Brown and D. S. Tait. Attentional Set-Shifting Across Species, volume 28 of Current Topics in Behavioral Neurosciences, pages 363–395. Springer International Publishing, Cham, 2016. ISBN 978-3-319-33913-9. doi:10.1007/7854_2015_5002. URL https://doi.org/10.1007/7854_2015_5002

[25]

J. S. Bruner. A Study of Thinking. Routledge, 2nd edition, 1986. doi:10.4324/9781315083223

[26]

A. P. Burgoyne and R. W. Engle. Attention control: A cornerstone of higher-order cognition. Current Directions in Psychological Science, 29(6): 624–630, 2020

[27]

T. J. Buschman and E. K. Miller. Goal-direction and top-down control. Philosophical Transactions of the Royal Society B: Biological Sciences, 369(1655): 20130471, 2014

[28]

R. B. Cattell. The measurement of adult intelligence. Psychological Bulletin, 40(3): 153, 1943

[29]

A. Cheng, A. Jacovi, A. Globerson, B. Golan, C. Kwong, C. Alberti, C. Tao, E. Ben-David, G. S. Tomar, L. Haas, Y. Bitton, A. Bloniarz, A. Bai, A. Wang, A. Siddiqui, A. B. Castillo, A. Atias, C. Liu, C. Fry, D. Balle, D. Ghosal, D. Kukliansky, D. Marcus, E. Gribovskaya, E. Ofek, H. Zhuang, I. Laish, J. Ackermann, L. Wang, M. Risdal, M. Barnes, M. Fink, M. ... arXiv, 2025

[30]

P. W. Cheng and K. J. Holyoak. Pragmatic reasoning schemas. Cognitive Psychology, 17(4): 391–416, 1985. ISSN 0010-0285. doi:10.1016/0010-0285(85)90014-3. URL https://www.sciencedirect.com/science/article/pii/0010028585900143

[31]

F. Chollet. On the measure of intelligence. arXiv preprint arXiv:1911.01547, 2019

[32]

F. Chollet, M. Knoop, G. Kamradt, B. Landers, and H. Pinkard. ARC-AGI-2: A new challenge for frontier AI reasoning systems. arXiv preprint arXiv:2505.11831, 2025

[33]

N. Chomsky. Aspects of the theory of syntax. MIT Press, 1965

[34]

A. Chung and R. N. Rimal. Social norms: A review. Review of Communication Research, 4: 1–28, 2016. ISSN 2255-4165. doi:10.12840/issn.2255-4165.2016.04.01.008

[35]

N. J. Cohen and L. R. Squire. Preserved learning and retention of pattern-analyzing skill in amnesia: Dissociation of knowing how and knowing that. Science, 210(4466): 207–210, 1980

[36]

G. Comanici, E. Bieber, M. Schaekermann, I. Pasupat, N. Sachdeva, I. Dhillon, M. Blistein, O. Ram, D. Zhang, E. Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261, 2025

[37]

M. A. Conway. Episodic memories. Neuropsychologia, 47(11): 2305–2313, 2009

[38]

T. N. Cornsweet. Visual Perception. Academic Press, 1970. ISBN 978-0-12-189750-5

[39]

N. Cowan, E. M. Elliott, J. S. Saults, C. C. Morey, S. Mattox, A. Hismjatullina, and A. R. Conway. On the capacity of attention: Its estimation and its role in working memory and cognitive aptitudes. Cognitive Psychology, 51(1): 42–100, 2005

[40]

V. Danthiir, R. D. Roberts, R. Schulze, and O. Wilhelm. Mental speed: On frameworks, paradigms, and a platform for the future. In O. Wilhelm and R. W. Engle, editors, Handbook of Understanding and Measuring Intelligence, pages 27–46. Sage Publications, Inc., Thousand Oaks, CA, 2005. doi:10.4135/9781452233529.n3

[41]

I. Dasgupta, A. K. Lampinen, S. C. Y. Chan, H. R. Sheahan, A. Creswell, D. Kumaran, J. L. McClelland, and F. Hill. Language models show human-like content effects on reasoning tasks, 2024. URL https://arxiv.org/abs/2207.07051

[42]

E. H. de Haan and H. C. Dijkerman. Somatosensation in the brain: A theoretical re-evaluation and a new model. Trends in Cognitive Sciences, 24(7): 529–541, 2020

[43]

A. Diamond. Executive functions. Annual Review of Psychology, 64(1): 135–168, 2013

[44]

A. Dickinson and B. Balleine. Motivational control of goal-directed action. Animal Learning & Behavior, 22(1): 1–18, 1994

[45]

R. L. Diehl, A. J. Lotto, and L. L. Holt. Speech perception. Annual Review of Psychology, 55(1): 149–179, 2004

[46]

K. Dunbar. What scientific thinking reveals about the nature of cognition. In K. Crowley, C. D. Schunn, and T. Okada, editors, Designing for Science: Implications from Everyday, Classroom, and Professional Settings, pages 115–140. Lawrence Erlbaum Associates, 2001

[47]

J. Dunlosky and J. Metcalfe. Metacognition. Sage Publications, Inc., Thousand Oaks, CA, US, 2009. ISBN 978-1-4129-3972-0 (paperback)

[48]

J. Dunlosky, K. A. Rawson, E. J. Marsh, M. J. Nathan, and D. T. Willingham. Improving students’ learning with effective learning techniques: Promising directions from cognitive and educational psychology. Psychological Science in the Public Interest, 14(1): 4–58, 2013

[49]

R. W. Engle. Working memory capacity as executive attention. Current Directions in Psychological Science, 11(1): 19–23, 2002

[50]

R. W. Engle. Working memory and executive attention: A revisit. Perspectives on Psychological Science, 13(2): 190–193, 2018

[51]

R. A. Epstein and C. I. Baker. Scene perception in the human brain. Annual Review of Vision Science, 5(1): 373–397, 2019

[52]

K. A. Ericsson, R. R. Hoffman, A. Kozbelt, and A. M. Williams, editors. The Cambridge Handbook of Expertise and Expert Performance. Cambridge Handbooks in Psychology. Cambridge University Press, 2nd edition, 2018

[53]

M. Esterman and D. Rothlein. Models of sustained attention. Current Opinion in Psychology, 29: 174–180, 2019

[54]

M. J. Farah. Visual agnosia: Disorders of object recognition and what they tell us about normal vision. The MIT Press, 1990

[55]

G. T. Fechner. Elemente der Psychophysik, volume 2. Breitkopf u. Härtel, 1860

[56]

E. Fedorenko, A. Ivanova, R. Dhamala, and M. U. Bers. The language of programming: A cognitive perspective. Trends in Cognitive Sciences, 23(7): 525–528, 2019. ISSN 1364-6613. doi:10.1016/j.tics.2019.04.010. URL https://www.sciencedirect.com/science/article/pii/S1364661319301020

[57]

J. H. Flavell. Metacognition and cognitive monitoring: A new area of cognitive–developmental inquiry. American Psychologist, 34(10): 906, 1979

[58]

S. M. Fleming and N. D. Daw. Self-evaluation of decision-making: A general Bayesian framework for metacognitive computation. Psychological Review, 124(1): 91, 2017

[59]

S. M. Fleming and H. C. Lau. How to measure metacognition. Frontiers in Human Neuroscience, 8: 443, 2014

[60]

A. D. Friederici. Towards a neural basis of auditory sentence processing. Trends in Cognitive Sciences, 6(2): 78–84, 2002

[61]

C. Frith and U. Frith. Theory of mind. Current Biology, 15(17): R644–R645, 2005

[62]

J. B. Fritz, M. Elhilali, S. V. David, and S. A. Shamma. Auditory attention—focusing the searchlight on sound. Current Opinion in Neurobiology, 17(4): 437–455, 2007

[63]

M. F. Garrett. Processes in language production. In F. J. Newmeyer, editor, Language: Psychological and Biological Aspects, pages 69–96. Cambridge University Press, 1988

[64]

K. R. Gegenfurtner. Cortical mechanisms of colour vision. Nature Reviews Neuroscience, 4(7): 563–572, 2003

[65]

D. Gentner and F. Maravilla. Analogical reasoning, pages 186–203. The Routledge International Handbook Series. Routledge/Taylor & Francis Group, New York, NY, US, 2018

[66]

S. J. Gershman and Y. Niv. Learning latent structure: Carving nature at its joints. Current Opinion in Neurobiology, 20(2): 251–256, 2010

[67]

A. Gilchrist, C. Kossyfidis, F. Bonato, T. Agostini, J. Cataliotti, X. Li, B. Spehar, V. Annan, and E. Economou. An anchoring theory of lightness perception. Psychological Review, 106(4): 795, 1999

[68]

C. Gilmore, S. M. Göbel, and M. Inglis. An introduction to mathematical cognition. Routledge, 2018

[69]

S. Ginsburg and E. Jablonka. The evolution of associative learning: A factor in the Cambrian explosion. Journal of Theoretical Biology, 266(1): 11–20, 2010

[70]

B. Goertzel and C. Pennachin, editors. Artificial General Intelligence. Cognitive Technologies. Springer Berlin, Heidelberg, 2007. ISBN 978-3-540-23733-4. doi:10.1007/978-3-540-68677-4

[71]

N. D. Goodman, J. B. Tenenbaum, J. Feldman, and T. L. Griffiths. A rational analysis of rule-based concept learning. Cognitive Science, 32(1): 108–154, 2008

[72]

J. A. Grahn. Neural mechanisms of rhythm perception: Current findings and future perspectives. Topics in Cognitive Science, 4(4): 585–606, 2012

[73]

A. M. Grant. Rethinking psychological mindedness: Metacognition, self-reflection, and insight. Behaviour Change, 18(1): 8–17, 2001

[74]

M. Gubrud. Nanotechnology and international security. In Fifth Foresight Conference on Molecular Nanotechnology, Nov. 1997

[75]

L. Haas, G. Yona, G. D'Antonio, S. Goldshtein, and D. Das. SimpleQA Verified: A reliable factuality benchmark to measure parametric knowledge. arXiv preprint arXiv:2509.07968, 2025

[76]

R. N. Haber and M. Hershenson. The psychology of visual perception. Holt, Rinehart & Winston, 1973

[77]

J. Harris and J. Smith. Sensation and perception. SAGE Publications Ltd, 2022

[78]

N. Harvey. Confidence in judgment. Trends in Cognitive Sciences, 1(2): 78–82, 1997

[79]

E. Heit. Properties of inductive reasoning. Psychonomic Bulletin & Review, 7(4): 569–592, 2000

[80]

D. Hendrycks, D. Song, C. Szegedy, H. Lee, Y. Gal, E. Brynjolfsson, S. Li, A. Zou, L. Levine, B. Han, J. Fu, Z. Liu, J. Shin, K. Lee, M. Mazeika, L. Phan, G. Ingebretsen, A. Khoja, C. Xie, O. Salaudeen, M. Hein, K. Zhao, A. Pan, D. Duvenaud, B. Li, S. Omohundro, G. Alfour, M. Tegmark, K. McGrew, G. Marcus, J. Tallinn, E. Schmidt, and Y. Bengio. A definiti... 2025