Measuring Progress Toward AGI: A Cognitive Framework
Pith reviewed 2026-06-29 11:58 UTC · model grok-4.3
The pith
A cognitive taxonomy of 10 faculties from psychology enables empirical profiles that track AI progress toward AGI.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present a framework for understanding system capabilities in relation to human cognitive abilities. Drawing from decades of research in psychology, neuroscience, and cognitive science, we introduce a Cognitive Taxonomy that deconstructs general intelligence into 10 key cognitive faculties. We then propose a rigorous evaluation protocol in which a system's performance is measured across a suite of targeted, held-out cognitive tasks, generating a 'cognitive profile' that can be used to understand a system's strengths and weaknesses.
What carries the argument
The Cognitive Taxonomy of 10 faculties drawn from psychology literature, which underpins the evaluation protocol that tests systems on held-out tasks to generate comparable cognitive profiles.
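The summary does not say how held-out task scores become a profile. The sketch below rests on three assumptions, none of which is the paper's stated method:

- each task is tagged with exactly one faculty;
- scores are normalized to [0, 1];
- a faculty's score is the plain mean of its tasks' scores.

The task ids and faculty labels are hypothetical placeholders. The taxonomy's actual 10 faculties are not listed here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical task-to-faculty tags; placeholders, not the paper's taxonomy.
TASK_FACULTY = {
    "task_01": "faculty_A",
    "task_02": "faculty_A",
    "task_03": "faculty_B",
    "task_04": "faculty_C",
}


def cognitive_profile(scores, task_faculty=TASK_FACULTY):
    """Aggregate held-out task scores into a per-faculty profile.

    scores maps task id -> normalized score in [0, 1] (an assumed scale).
    Returns faculty -> mean score over that faculty's tasks.
    """
    by_faculty = defaultdict(list)
    for task, score in scores.items():
        if task not in task_faculty:
            raise KeyError(f"task {task!r} has no faculty tag")
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {task!r} is outside [0, 1]: {score}")
        by_faculty[task_faculty[task]].append(score)
    return {faculty: mean(vals) for faculty, vals in sorted(by_faculty.items())}
```

For example, `cognitive_profile({"task_01": 0.75, "task_02": 0.25, "task_03": 0.5, "task_04": 1.0})` returns `{"faculty_A": 0.5, "faculty_B": 0.5, "faculty_C": 1.0}`. A plain mean weights every task equally. Weighting tasks or norming scores against human baselines would be a different design choice.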
If this is right
- Different AI systems can be compared on a shared scale of cognitive strengths and weaknesses.
- Development progress can be tracked by changes in the cognitive profile over successive versions.
- Targeted improvements can focus on specific faculties that show gaps in the profile.
- Governance decisions can rest on standardized empirical profiles instead of unsubstantiated claims.
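The first three consequences reduce to simple operations on profiles. This sketch assumes profiles are dicts mapping each faculty to a score on one shared, normalized scale. Comparisons are only meaningful under that assumption. The function names and faculty labels are hypothetical.

```python
def profile_delta(old, new):
    """Per-faculty change between two profiles on the same scale."""
    if old.keys() != new.keys():
        raise ValueError("profiles must cover the same faculties")
    return {faculty: new[faculty] - old[faculty] for faculty in sorted(old)}


def weakest_faculties(profile, k=2):
    """The k lowest-scoring faculties (ties broken by name), as improvement targets."""
    return sorted(profile, key=lambda faculty: (profile[faculty], faculty))[:k]


# Hypothetical profiles for two successive versions of one system.
v1 = {"faculty_A": 0.5, "faculty_B": 0.75, "faculty_C": 0.25}
v2 = {"faculty_A": 0.75, "faculty_B": 0.75, "faculty_C": 0.5}
```

Here `profile_delta(v1, v2)` tracks change across versions, and `weakest_faculties(v2, k=1)` picks out `faculty_C` as the gap to target. Passing two different systems' profiles to `profile_delta` gives a cross-system comparison instead.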
Where Pith is reading between the lines
- If the taxonomy proves incomplete for non-human forms of intelligence, profiles could systematically undervalue certain AI approaches.
- The held-out task requirement could push benchmark creators toward more isolated tests that reduce overlap between faculties.
- Widespread use might shift evaluation priorities from single-task leaderboards toward breadth across all 10 faculties.
Load-bearing premise
That a decomposition of intelligence into these 10 human cognitive faculties drawn from existing psychology literature supplies the right and sufficient basis for measuring progress toward AGI.
What would settle it
An AI system that demonstrates broad intelligent behavior whose capabilities cannot be accounted for or predicted by its measured performance across the 10 faculties in the taxonomy.
Original abstract
Despite widespread discussion of AGI, there is no clear framework for measuring progress toward it. This ambiguity fuels subjective claims, makes it difficult to track progress, and risks hindering responsible governance. As a starting point to address this gap, we present a framework for understanding system capabilities in relation to human cognitive abilities. Drawing from decades of research in psychology, neuroscience, and cognitive science, we introduce a Cognitive Taxonomy that deconstructs general intelligence into 10 key cognitive faculties. We then propose a rigorous evaluation protocol in which a system's performance is measured across a suite of targeted, held-out cognitive tasks, generating a 'cognitive profile' that can be used to understand a system's strengths and weaknesses. We hope this framework will provide a practical roadmap and an initial step toward more rigorous, empirical evaluation of AGI.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper argues that there is no clear framework for measuring progress toward AGI and proposes a Cognitive Framework with two parts. The first is a Cognitive Taxonomy that deconstructs general intelligence into 10 key faculties drawn from the psychology, neuroscience, and cognitive science literature. The second is a rigorous evaluation protocol that measures system performance on targeted, held-out tasks to produce a 'cognitive profile' for tracking strengths, weaknesses, and progress toward AGI.
Significance. If adopted and empirically validated, the framework could supply a structured, literature-grounded approach to AGI evaluation that reduces subjective claims and supports governance; the explicit use of held-out tasks and the proposal of cognitive profiles are constructive elements that could enable falsifiable comparisons across systems.
major comments (2)
- [Cognitive Taxonomy] Cognitive Taxonomy section: the manuscript asserts that the 10 faculties provide a sufficient basis for measuring AGI progress but supplies no argument or evidence that this decomposition is exhaustive, that the faculties are independent, or that it is preferable to non-human-centric alternatives (e.g., scaling-law clusters or emergent capability taxonomies). This choice is load-bearing for the central claim that the resulting profiles constitute a useful measure of AGI progress.
- [Evaluation Protocol] Evaluation Protocol section: the protocol is described as 'rigorous' yet the text contains no validation experiments, inter-task correlation analysis, or sensitivity checks demonstrating that the held-out tasks yield stable, taxonomy-independent profiles. Without such evidence the claim that the protocol enables reliable empirical tracking of AGI progress remains untested.
minor comments (1)
- [Abstract] Abstract: the phrasing 'we hope this framework will provide a practical roadmap' could be tightened to clarify that the contribution is a proposal rather than a demonstrated method.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the intended scope of our proposed framework. We address each major comment below, indicating planned revisions to avoid overstating claims while preserving the manuscript's focus as an initial step toward structured AGI evaluation.
Point-by-point responses
- Referee: [Cognitive Taxonomy] Cognitive Taxonomy section: the manuscript asserts that the 10 faculties provide a sufficient basis for measuring AGI progress but supplies no argument or evidence that this decomposition is exhaustive, that the faculties are independent, or that it is preferable to non-human-centric alternatives (e.g., scaling-law clusters or emergent capability taxonomies). This choice is load-bearing for the central claim that the resulting profiles constitute a useful measure of AGI progress.
Authors: We agree that the manuscript should more explicitly bound the claims of the Cognitive Taxonomy. The 10 faculties are selected from established literature in psychology, neuroscience, and cognitive science to provide a human-centric starting point for evaluation, not as an exhaustive or mutually independent decomposition. We make no assertion of superiority to alternative approaches such as scaling-law clusters. In the revised manuscript, we will add explicit language in the Cognitive Taxonomy section stating these limitations and framing the taxonomy as a practical, literature-grounded proposal rather than a definitive basis. revision: yes
- Referee: [Evaluation Protocol] Evaluation Protocol section: the protocol is described as 'rigorous' yet the text contains no validation experiments, inter-task correlation analysis, or sensitivity checks demonstrating that the held-out tasks yield stable, taxonomy-independent profiles. Without such evidence the claim that the protocol enables reliable empirical tracking of AGI progress remains untested.
Authors: We acknowledge that the evaluation protocol is a design proposal without empirical validation or sensitivity analysis in the current manuscript. The description as 'rigorous' refers to its use of targeted, held-out tasks rather than demonstrated reliability. We will revise the Evaluation Protocol section to refer to a 'proposed evaluation protocol' and include a forward-looking discussion noting that empirical validation, including inter-task correlations and stability checks, is required in future work to support reliable tracking of progress. revision: yes
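A stability check of the kind this response defers to future work could start with a leave-one-task-out test. Drop each held-out task in turn and measure how far each faculty's score moves.

The sketch assumes mean-per-faculty aggregation over scores in [0, 1]; this is a hypothetical choice, not the manuscript's protocol. Large shifts flag faculties whose profile entry hinges on a single task.

```python
from statistics import mean


def leave_one_task_out_shift(scores, task_faculty):
    """Largest change in each faculty's mean score when any single task is dropped.

    Faculties backed by only one task are reported as inf, because dropping
    that task leaves the faculty unmeasured.
    """
    by_faculty = {}
    for task, score in scores.items():
        by_faculty.setdefault(task_faculty[task], []).append(score)
    shifts = {}
    for faculty, vals in sorted(by_faculty.items()):
        if len(vals) < 2:
            shifts[faculty] = float("inf")
            continue
        full = mean(vals)
        shifts[faculty] = max(abs(mean(vals[:i] + vals[i + 1:]) - full)
                              for i in range(len(vals)))
    return shifts
```

With hypothetical scores `{"task_01": 1.0, "task_02": 0.5, "task_03": 0.0}` all tagged `faculty_A`, the shift is 0.25. Dropping either extreme task moves the faculty mean from 0.5 to 0.25 or 0.75.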
Circularity Check
No circularity: proposal draws taxonomy and protocol from external literature without self-referential reductions
Full rationale
The paper presents a framework proposal that deconstructs intelligence into 10 faculties drawn from existing psychology/neuroscience literature and outlines a held-out task evaluation protocol. No equations, fitted parameters, or 'predictions' appear that reduce by construction to the paper's own inputs. The taxonomy is explicitly sourced from decades of independent prior research rather than defined in terms of the proposed profile or any self-citation chain. No uniqueness theorems, ansatzes, or renamings of known results are invoked in a load-bearing way. The central claim is therefore a self-contained definitional proposal, not a derivation that collapses into its premises.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Human cognitive abilities, as studied in psychology and neuroscience, supply the correct decomposition for measuring general intelligence in artificial systems.