Definitional alignment before capability alignment: a Design-Science framework for adjudicating claims about AGI
Pith reviewed 2026-06-27 09:39 UTC · model grok-4.3
The pith
Whether current generative systems count as AGI depends on which definition family is used, per a design-science adjudication framework.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DAF-AGI consists of five ordinal criteria for assessing the adjudicative fitness of candidate AGI definitions together with a structured governance audit of authorship, interest, certification, external verification, and revision authority. Demonstrated on a documented corpus of five measurement families and one deflationary position, the artifact shows that the claim that current generative systems constitute AGI is certifiable only under a performance-based operationalization; capability-ontology, psychometric, and skill-acquisition approaches do not certify it, the economic family remains indeterminate, and the deflationary position refuses binary adjudication. The contribution is presented as a novel integration and operationalization, not an empirical validation.
What carries the argument
DAF-AGI, the second-order conceptual artifact that pairs five ordinal adjudicative-fitness criteria with a governance audit of authorship, interest, certification, external verification, and revision authority.
If this is right
- Performance-based definitions alone certify the arrival claim for current generative systems while other families do not.
- The economic measurement family yields an indeterminate result under the framework.
- A deflationary boundary position refuses binary yes/no adjudication of the claim.
- Definitional sovereignty is positioned as an enabling component of algorithmic sovereignty under public accountability.
- The framework is offered for independent application, inter-rater testing, and author-external cases rather than as a finished empirical instrument.
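The verdict pattern listed above can be written down as a small lookup table. The following is an illustrative Python sketch, not code from the paper: the names `Verdict`, `REPORTED_VERDICTS`, and `families_with` are hypothetical, and the entries only restate the outcomes the abstract reports for the stylized arrival claim.

```python
from enum import Enum


class Verdict(Enum):
    """Outcome labels as reported in the abstract (the label names are assumptions)."""
    CERTIFIES = "certifies"
    DOES_NOT_CERTIFY = "does not certify"
    INDETERMINATE = "indeterminate"
    REFUSES_BINARY = "refuses binary adjudication"


# Reported outcome for the claim "current generative systems constitute AGI",
# keyed by definition family (five measurement families plus the deflationary position).
REPORTED_VERDICTS = {
    "performance-based": Verdict.CERTIFIES,
    "capability-ontology": Verdict.DOES_NOT_CERTIFY,
    "psychometric": Verdict.DOES_NOT_CERTIFY,
    "skill-acquisition": Verdict.DOES_NOT_CERTIFY,
    "economic": Verdict.INDETERMINATE,
    "deflationary": Verdict.REFUSES_BINARY,
}


def families_with(verdict):
    """Return the definition families that produced the given verdict."""
    return [name for name, v in REPORTED_VERDICTS.items() if v is verdict]


print(families_with(Verdict.CERTIFIES))  # ['performance-based']
```

Keeping four distinct labels rather than a yes/no flag matters here, because the indeterminate and refusing positions are not the same as a negative verdict.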
Where Pith is reading between the lines
- Public institutions could adopt the five-criteria audit as a standing procedure before incorporating any imported AGI category into regulation or funding decisions.
- Repeated application of DAF-AGI to successive generations of systems could map when capability gains cross from one definition family into another.
- The governance-audit component supplies a template for contesting technological categories beyond AGI, such as definitions of autonomy or risk in other domains.
- A natural next test is to run the framework on non-generative systems or on claims about near-term rather than current AGI to check whether the differential pattern persists.
Load-bearing premise
The five ordinal criteria for adjudicative fitness can be applied consistently enough across definition families to produce stable differences in certification outcomes.
What would settle it
An inter-rater study in which independent evaluators apply the five criteria and governance audit to the same corpus and produce substantially different certification verdicts for the generative-systems claim.
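One way such a study could quantify agreement is Cohen's kappa over the verdicts two raters assign to the same six positions. The sketch below is an assumption about how the test might be scored, not a procedure from the paper; the function name `cohens_kappa` and both rating lists are hypothetical, and the paper does not state what level of disagreement would count as substantial.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


# Hypothetical verdicts for the six positions, in the order: performance-based,
# capability-ontology, psychometric, skill-acquisition, economic, deflationary.
rater_a = ["certifies", "no", "no", "no", "indeterminate", "refuses"]
rater_b = ["certifies", "no", "certifies", "no", "indeterminate", "refuses"]

print(round(cohens_kappa(rater_a, rater_b), 3))  # 0.769
```

In this made-up case the raters disagree on one family out of six, giving kappa = 10/13. A value near zero would indicate agreement no better than chance, which is the kind of outcome that would undercut the load-bearing premise.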
Original abstract
Claims that artificial general intelligence has already arrived and claims that it remains decades away are often defended from overlapping evidence. "AGI" lacks a single shared and stable referent and competing operationalizations can return different verdicts on the same system. This article treats that under-specification as a design and governance problem. Following Design Science Research Methodology, it develops DAF-AGI, a second-order conceptual artifact with two coupled components: five ordinal criteria for assessing the adjudicative fitness of candidate definitions and a structured governance audit of authorship, interest, certification, external verification and revision authority. The artifact is demonstrated on five prominent measurement families and one deflationary boundary position in a documented corpus and then stress-tested against a stylized strong arrival claim: that current generative systems constitute AGI because they outperform a well-educated adult on many cognitive tasks. On evidence from the cited 2024-2025 sources, the claim was certifiable only under a performance-based operationalization; capability-ontology, psychometric and skill-acquisition approaches did not certify it, the economic family remains indeterminate and the deflationary position refuses binary adjudication. The contribution is a novel integration and operationalization, not an empirical validation: independent application, inter-rater testing and author-external cases remain necessary. The paper further proposes definitional sovereignty as an enabling component of algorithmic sovereignty: the institutional capacity to contest, certify and revise imported technological categories under public accountability.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops DAF-AGI, a second-order conceptual artifact via Design Science Research Methodology, consisting of five ordinal criteria for assessing the adjudicative fitness of candidate AGI definitions together with a structured governance audit (authorship, interest, certification, external verification, revision authority). It demonstrates the artifact on five measurement families (performance-based, capability-ontology, psychometric, skill-acquisition, economic) plus a deflationary boundary position drawn from a 2024-2025 corpus, then stress-tests it against the claim that current generative systems constitute AGI because they outperform a well-educated adult on many cognitive tasks. The demonstration reports that this claim receives certification only under the performance-based operationalization; the other families do not certify it, the economic family is indeterminate, and the deflationary position refuses binary adjudication. The work explicitly frames itself as a novel integration and operationalization rather than an empirical validation and calls for independent application, inter-rater testing, and author-external cases. It further proposes definitional sovereignty as an enabling component of algorithmic sovereignty.
Significance. If the five ordinal criteria prove consistently applicable, the framework supplies a structured, second-order method for adjudicating competing operationalizations of AGI, which would be significant for AI governance, policy, and the emerging literature on definitional alignment. The explicit separation of the artifact from its demonstration, the call for external validation, and the linkage to algorithmic sovereignty are constructive features that position the contribution as a foundation rather than a finished instrument. The absence of fitted parameters or self-referential equations further supports its status as an independent proposal.
major comments (2)
- [Demonstration and stress-test sections] Demonstration on measurement families and stress-test against the stylized strong arrival claim: the reported differential certification (only performance-based certifies the generative-systems claim; capability-ontology, psychometric, and skill-acquisition do not) is load-bearing for the claim that the framework can adjudicate competing operationalizations, yet the manuscript supplies neither detailed scoring rubrics for the five ordinal criteria nor the raw per-criterion scores or threshold justifications used in the application. Without these, the stability of the observed pattern cannot be assessed independently of a single rater's interpretation.
- [Criteria definition and application] Application of the five ordinal adjudicative-fitness criteria: the central empirical-style result (differential outcomes across definition families) presupposes that the ordinal criteria can be applied with sufficient consistency to produce stable certification differences, but no inter-rater reliability data, test-retest results, or even illustrative scoring examples are provided. This directly affects the defensibility of the framework's adjudicative utility as presented.
minor comments (2)
- [Abstract and §1] The abstract and introduction could more clearly separate the description of the artifact's two components from the results of its demonstration to prevent readers from conflating the proposal with an empirical finding.
- [Governance audit description] The governance audit component is described at a high level; a brief table or enumerated checklist would improve traceability when the audit is later applied to new cases.
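The checklist suggested in the last minor comment could take a form like the following. This is a hypothetical Python sketch, not the authors' instrument: only the five dimension names come from the paper, while the class name `GovernanceAudit`, the free-text answer format, and the `open_items` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class GovernanceAudit:
    """One audit record per candidate definition; None marks an unanswered dimension."""
    authorship: Optional[str] = None
    interest: Optional[str] = None
    certification: Optional[str] = None
    external_verification: Optional[str] = None
    revision_authority: Optional[str] = None

    def open_items(self):
        """List the audit dimensions that still have no recorded answer."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Hypothetical, partially completed audit for an unnamed definition.
audit = GovernanceAudit(authorship="hypothetical entry: drafted by a vendor research team")
print(audit.open_items())
# ['interest', 'certification', 'external_verification', 'revision_authority']
```

A fixed set of fields makes it visible when an audit has skipped a dimension, which is the traceability concern the comment raises.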
Simulated Author's Rebuttal
We thank the referee for the constructive comments and for recognizing the framework's potential role in definitional alignment. We address the two major comments point by point below.
- Referee: [Demonstration and stress-test sections] Demonstration on measurement families and stress-test against the stylized strong arrival claim: the reported differential certification (only performance-based certifies the generative-systems claim; capability-ontology, psychometric, and skill-acquisition do not) is load-bearing for the claim that the framework can adjudicate competing operationalizations, yet the manuscript supplies neither detailed scoring rubrics for the five ordinal criteria nor the raw per-criterion scores or threshold justifications used in the application. Without these, the stability of the observed pattern cannot be assessed independently of a single rater's interpretation.
  Authors: The demonstration is presented as an illustration of the artifact rather than a validated empirical result, consistent with the manuscript's framing as a Design Science proposal that explicitly calls for independent application. We agree that additional transparency on the application process would strengthen the section. In revision we will add detailed scoring rubrics for the five ordinal criteria together with illustrative per-criterion scores and threshold justifications for at least two measurement families. Full raw scores across all families and formal stability assessments remain outside the scope of this initial conceptual contribution.
  Revision: partial
- Referee: [Criteria definition and application] Application of the five ordinal adjudicative-fitness criteria: the central empirical-style result (differential outcomes across definition families) presupposes that the ordinal criteria can be applied with sufficient consistency to produce stable certification differences, but no inter-rater reliability data, test-retest results, or even illustrative scoring examples are provided. This directly affects the defensibility of the framework's adjudicative utility as presented.
  Authors: The manuscript deliberately positions the work as a novel integration and operationalization rather than an empirical validation, and states that inter-rater testing and author-external cases remain necessary. We concur that illustrative scoring examples would improve accessibility. The revised manuscript will incorporate such examples for the criteria. Comprehensive inter-rater reliability or test-retest data would require a separate empirical study, which the paper identifies as future work rather than part of the current contribution.
  Revision: partial
Circularity Check
No circularity: new definitional framework with independent criteria applied to external sources.
The paper proposes DAF-AGI as a novel conceptual artifact consisting of five ordinal adjudicative-fitness criteria and a governance audit, both defined within the work itself. These are then applied to a documented corpus of 2024-2025 sources on AGI claims. No equations, fitted parameters, or self-referential reductions are described. The differential certification outcomes (performance-based vs. other families) follow directly from the independently stated criteria rather than from any prior self-citation or input that is redefined as output. The contribution is explicitly framed as integration and operationalization, not empirical validation or derivation from fitted data. This matches the default expectation of no significant circularity for a design-science proposal.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Design Science Research Methodology provides a suitable process for developing governance artifacts that adjudicate contested technical categories such as AGI.
invented entities (2)
- DAF-AGI: no independent evidence
- definitional sovereignty: no independent evidence