pith.

arxiv: 2606.12713 · v1 · pith:4TZ2XLAG (new) · submitted 2026-06-10 · 💻 cs.AI

Definitional alignment before capability alignment: a Design-Science framework for adjudicating claims about AGI

Pith reviewed 2026-06-27 09:39 UTC · model grok-4.3

classification 💻 cs.AI
keywords: AGI definitions · definitional alignment · design science · governance audit · adjudicative fitness · capability alignment · algorithmic sovereignty · measurement families

The pith

Whether current generative systems count as AGI depends on which definition family is used, per a design-science adjudication framework.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats the absence of a stable shared referent for AGI as a design and governance problem rather than a purely technical one. It builds DAF-AGI, a second-order artifact that couples five ordinal criteria, which judge how well a candidate definition can settle claims, with a governance audit covering authorship, interests, certification, external verification, and revision authority. When the artifact is applied to five prominent measurement families plus a deflationary boundary position, the stylized claim that current generative systems already constitute AGI is certified only under performance-based operationalizations; the capability-ontology, psychometric, and skill-acquisition families withhold certification, the economic family stays indeterminate, and the deflationary stance declines binary adjudication. A sympathetic reader would care because, once operationalizations differ, the same body of 2024-2025 evidence can support contradictory arrival verdicts, leaving governance and alignment efforts without a shared decision procedure.

Core claim

DAF-AGI consists of five ordinal criteria for assessing the adjudicative fitness of candidate AGI definitions together with a structured governance audit of authorship, interest, certification, external verification, and revision authority. Demonstrated on a documented corpus of five measurement families and one deflationary position, the artifact shows that the claim that current generative systems constitute AGI is certifiable only under a performance-based operationalization; capability-ontology, psychometric, and skill-acquisition approaches do not certify it, the economic family remains indeterminate, and the deflationary position refuses binary adjudication. The contribution is presented as a novel integration and operationalization, not an empirical validation.

What carries the argument

DAF-AGI, the second-order conceptual artifact that pairs five ordinal adjudicative-fitness criteria with a governance audit of authorship, interest, certification, external verification, and revision authority.
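As a reading aid, the two coupled components can be pictured as a small data model. This is a minimal sketch under stated assumptions, not the paper's instrument: the page does not name the five criteria, their ordinal scale, or the certification rule, so the labels `c1` to `c5`, the 0 to 3 scale, the threshold of 2, and the `adjudicate` rule are hypothetical. Only the five governance-audit items come from the paper's description. The sketch keeps the audit as a separate report and does not model a position that declines binary adjudication.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

# The five governance-audit items named in the paper's description.
AUDIT_ITEMS = (
    "authorship",
    "interest",
    "certification",
    "external_verification",
    "revision_authority",
)


class Verdict(Enum):
    CERTIFIED = "certified"
    NOT_CERTIFIED = "not certified"
    INDETERMINATE = "indeterminate"


@dataclass
class DefinitionAssessment:
    """HYPOTHETICAL record for one candidate AGI definition family."""

    family: str
    # Ordinal scores for five criteria (hypothetical labels c1..c5, assumed 0-3 scale).
    # None means the evidence was insufficient to score that criterion.
    scores: dict[str, Optional[int]]
    # Governance audit: item -> documented answer (missing or empty if undocumented).
    audit: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject scores outside the assumed 0-3 ordinal scale.
        for name, s in self.scores.items():
            if s is not None and not 0 <= s <= 3:
                raise ValueError(f"score for {name} must be in 0..3, got {s}")

    def open_audit_items(self) -> list[str]:
        """Return the audit items that have no documented answer."""
        return [item for item in AUDIT_ITEMS if not self.audit.get(item)]


def adjudicate(a: DefinitionAssessment, threshold: int = 2) -> Verdict:
    """HYPOTHETICAL rule, not the paper's: certify only if all five criteria
    reach `threshold`; an unscored criterion leaves the verdict open."""
    values = list(a.scores.values())
    if len(values) != 5:
        raise ValueError("expected exactly five ordinal criteria")
    # Any scored criterion below the threshold is enough to withhold certification.
    if any(s is not None and s < threshold for s in values):
        return Verdict.NOT_CERTIFIED
    # Otherwise a missing score means the evidence cannot settle the claim.
    if any(s is None for s in values):
        return Verdict.INDETERMINATE
    return Verdict.CERTIFIED
```

Under this illustrative rule, a family scored {c1: 3, c2: 2, c3: None, c4: 3, c5: 2} would be reported as indeterminate, while any score of 0 or 1 would withhold certification outright.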

If this is right

  • Performance-based definitions alone certify the arrival claim for current generative systems while other families do not.
  • The economic measurement family yields an indeterminate result under the framework.
  • A deflationary boundary position refuses binary yes/no adjudication of the claim.
  • Definitional sovereignty is positioned as an enabling component of algorithmic sovereignty under public accountability.
  • The framework is offered for independent application, inter-rater testing, and author-external cases rather than as a finished empirical instrument.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Public institutions could adopt the five-criteria audit as a standing procedure before incorporating any imported AGI category into regulation or funding decisions.
  • Repeated application of DAF-AGI to successive generations of systems could map when capability gains cross from one definition family into another.
  • The governance-audit component supplies a template for contesting technological categories beyond AGI, such as definitions of autonomy or risk in other domains.
  • A natural next test is to run the framework on non-generative systems or on claims about near-term rather than current AGI to check whether the differential pattern persists.

Load-bearing premise

The five ordinal criteria for adjudicative fitness can be applied consistently enough across definition families to produce stable differences in certification outcomes.

What would settle it

An inter-rater study in which independent evaluators apply the five criteria and governance audit to the same corpus and produce substantially different certification verdicts for the generative-systems claim.
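One way such a study could summarize agreement on the categorical verdicts is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The paper does not prescribe a statistic, so this is a sketch of one standard choice. In the example, rater A reproduces the verdict pattern reported on this page for the six positions; rater B is an invented second rater who differs only on the economic family. With only six items, kappa estimates are unstable, and with more than two raters Fleiss' kappa or Krippendorff's alpha would be the usual generalizations.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters assigning nominal labels to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters give the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is undefined.
        raise ValueError("kappa undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Order: performance-based, capability-ontology, psychometric,
# skill-acquisition, economic, deflationary.
rater_a = ["certified", "not certified", "not certified", "not certified",
           "indeterminate", "declines"]
# Invented second rater for illustration: disagrees only on the economic family.
rater_b = ["certified", "not certified", "not certified", "not certified",
           "not certified", "declines"]
print(round(cohens_kappa(rater_a, rater_b), 3))  # prints 0.727
```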

Original abstract

Claims that artificial general intelligence has already arrived and claims that it remains decades away are often defended from overlapping evidence. "AGI" lacks a single shared and stable referent and competing operationalizations can return different verdicts on the same system. This article treats that under-specification as a design and governance problem. Following Design Science Research Methodology, it develops DAF-AGI, a second-order conceptual artifact with two coupled components: five ordinal criteria for assessing the adjudicative fitness of candidate definitions and a structured governance audit of authorship, interest, certification, external verification and revision authority. The artifact is demonstrated on five prominent measurement families and one deflationary boundary position in a documented corpus and then stress-tested against a stylized strong arrival claim: that current generative systems constitute AGI because they outperform a well-educated adult on many cognitive tasks. On evidence from the cited 2024-2025 sources, the claim was certifiable only under a performance-based operationalization; capability-ontology, psychometric and skill-acquisition approaches did not certify it, the economic family remains indeterminate and the deflationary position refuses binary adjudication. The contribution is a novel integration and operationalization, not an empirical validation: independent application, inter-rater testing and author-external cases remain necessary. The paper further proposes definitional sovereignty as an enabling component of algorithmic sovereignty: the institutional capacity to contest, certify and revise imported technological categories under public accountability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops DAF-AGI, a second-order conceptual artifact via Design Science Research Methodology, consisting of five ordinal criteria for assessing the adjudicative fitness of candidate AGI definitions together with a structured governance audit (authorship, interest, certification, external verification, revision authority). It demonstrates the artifact on five measurement families (performance-based, capability-ontology, psychometric, skill-acquisition, economic) plus a deflationary boundary position drawn from a 2024-2025 corpus, then stress-tests it against the claim that current generative systems constitute AGI because they outperform a well-educated adult on many cognitive tasks. The demonstration reports that this claim receives certification only under the performance-based operationalization; the other families do not certify it, the economic family is indeterminate, and the deflationary position refuses binary adjudication. The work explicitly frames itself as a novel integration and operationalization rather than an empirical validation and calls for independent application, inter-rater testing, and author-external cases. It further proposes definitional sovereignty as an enabling component of algorithmic sovereignty.

Significance. If the five ordinal criteria prove consistently applicable, the framework supplies a structured, second-order method for adjudicating competing operationalizations of AGI, which would be significant for AI governance, policy, and the emerging literature on definitional alignment. The explicit separation of the artifact from its demonstration, the call for external validation, and the linkage to algorithmic sovereignty are constructive features that position the contribution as a foundation rather than a finished instrument. The absence of fitted parameters or self-referential equations further supports its status as an independent proposal.

major comments (2)
  1. [Demonstration and stress-test sections] Demonstration on measurement families and stress-test against the stylized strong arrival claim: the reported differential certification (only performance-based certifies the generative-systems claim; capability-ontology, psychometric, and skill-acquisition do not) is load-bearing for the claim that the framework can adjudicate competing operationalizations, yet the manuscript supplies neither detailed scoring rubrics for the five ordinal criteria nor the raw per-criterion scores or threshold justifications used in the application. Without these, the stability of the observed pattern cannot be assessed independently of a single rater's interpretation.
  2. [Criteria definition and application] Application of the five ordinal adjudicative-fitness criteria: the central empirical-style result (differential outcomes across definition families) presupposes that the ordinal criteria can be applied with sufficient consistency to produce stable certification differences, but no inter-rater reliability data, test-retest results, or even illustrative scoring examples are provided. This directly affects the defensibility of the framework's adjudicative utility as presented.
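Because the criteria are ordinal, a consistency check should treat a one-step difference (2 versus 3) as milder than a 0-versus-3 split. Quadratic-weighted kappa is one common statistic with that property. The sketch below computes it for a single criterion scored by two raters across a set of definition families; the 0 to 3 scale (`n_levels=4`) is an assumption, since the page does not state the criteria's scale, and the paper does not specify this or any other reliability statistic.

```python
from typing import Sequence


def quadratic_weighted_kappa(
    rater_a: Sequence[int], rater_b: Sequence[int], n_levels: int = 4
) -> float:
    """Quadratic-weighted Cohen's kappa for ordinal scores 0..n_levels-1."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("raters must score the same, non-empty set of items")
    if n_levels < 2:
        raise ValueError("need at least two ordinal levels")
    if any(not 0 <= s < n_levels for s in (*rater_a, *rater_b)):
        raise ValueError("scores must lie in 0..n_levels-1")
    k = n_levels
    # Observed joint distribution of (rater_a score, rater_b score).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1 / n
    # Marginal score distributions for each rater.
    marg_a = [sum(observed[i]) for i in range(k)]
    marg_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagreement_obs = disagreement_exp = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic penalty: on a 0-3 scale a 0-vs-3 split costs 9x a one-step miss.
            w = (i - j) ** 2 / (k - 1) ** 2
            disagreement_obs += w * observed[i][j]
            disagreement_exp += w * marg_a[i] * marg_b[j]
    if disagreement_exp == 0:
        raise ValueError("kappa undefined: expected disagreement is zero")
    return 1 - disagreement_obs / disagreement_exp
```

Run once per criterion, this would yield five agreement values, one for each ordinal criterion; a value near 1 indicates consistent application, while values near 0 or below indicate agreement no better than chance.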
minor comments (2)
  1. [Abstract and §1] The abstract and introduction could more clearly separate the description of the artifact's two components from the results of its demonstration to prevent readers from conflating the proposal with an empirical finding.
  2. [Governance audit description] The governance audit component is described at a high level; a brief table or enumerated checklist would improve traceability when the audit is later applied to new cases.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and for recognizing the framework's potential role in definitional alignment. We address the two major comments point by point below.

  1. Referee: [Demonstration and stress-test sections] Demonstration on measurement families and stress-test against the stylized strong arrival claim: the reported differential certification (only performance-based certifies the generative-systems claim; capability-ontology, psychometric, and skill-acquisition do not) is load-bearing for the claim that the framework can adjudicate competing operationalizations, yet the manuscript supplies neither detailed scoring rubrics for the five ordinal criteria nor the raw per-criterion scores or threshold justifications used in the application. Without these, the stability of the observed pattern cannot be assessed independently of a single rater's interpretation.

    Authors: The demonstration is presented as an illustration of the artifact rather than a validated empirical result, consistent with the manuscript's framing as a Design Science proposal that explicitly calls for independent application. We agree that additional transparency on the application process would strengthen the section. In revision we will add detailed scoring rubrics for the five ordinal criteria together with illustrative per-criterion scores and threshold justifications for at least two measurement families. Full raw scores across all families and formal stability assessments remain outside the scope of this initial conceptual contribution. revision: partial

  2. Referee: [Criteria definition and application] Application of the five ordinal adjudicative-fitness criteria: the central empirical-style result (differential outcomes across definition families) presupposes that the ordinal criteria can be applied with sufficient consistency to produce stable certification differences, but no inter-rater reliability data, test-retest results, or even illustrative scoring examples are provided. This directly affects the defensibility of the framework's adjudicative utility as presented.

    Authors: The manuscript deliberately positions the work as a novel integration and operationalization rather than an empirical validation, and states that inter-rater testing and author-external cases remain necessary. We concur that illustrative scoring examples would improve accessibility. The revised manuscript will incorporate such examples for the criteria. Comprehensive inter-rater reliability or test-retest data would require a separate empirical study, which the paper identifies as future work rather than part of the current contribution. revision: partial

Circularity Check

0 steps flagged

No circularity: new definitional framework with independent criteria applied to external sources.


The paper proposes DAF-AGI as a novel conceptual artifact consisting of five ordinal adjudicative-fitness criteria and a governance audit, both defined within the work itself. These are then applied to a documented corpus of 2024-2025 sources on AGI claims. No equations, fitted parameters, or self-referential reductions are described. The differential certification outcomes (performance-based vs. other families) follow directly from the independently stated criteria rather than from any prior self-citation or input that is redefined as output. The contribution is explicitly framed as integration and operationalization, not empirical validation or derivation from fitted data. This matches the default expectation of no significant circularity for a design-science proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The framework rests on the domain assumption that design-science research methodology is appropriate for adjudicating AGI definitions and introduces two new conceptual entities without external falsifiable evidence.

axioms (1)
  • Domain assumption: Design Science Research Methodology provides a suitable process for developing governance artifacts that adjudicate contested technical categories such as AGI.
    The paper states it follows DSRM to develop DAF-AGI.
invented entities (2)
  • DAF-AGI (no independent evidence)
    Purpose: second-order conceptual artifact for assessing AGI definition fitness and governance.
    New framework constructed in the paper.
  • definitional sovereignty (no independent evidence)
    Purpose: institutional capacity to contest, certify and revise imported technological categories under public accountability.
    Proposed as an enabling component of algorithmic sovereignty.

pith-pipeline@v0.9.1-grok · 5782 in / 1473 out tokens · 23558 ms · 2026-06-27T09:39:35.483931+00:00


Reference graph

Works this paper leans on

35 extracted references · 10 canonical work pages · 3 internal anchors

  1. Morris, M. R., Sohl-Dickstein, J., Fiedel, N., Warkentin, T., Dafoe, A., Faust, A., Farabet, C., & Legg, S. (2024). Levels of AGI for operationalizing progress on the path to AGI. Proceedings of the 41st International Conference on Machine Learning (ICML), PMLR 235, 36308–36321. arXiv:2311.02462. https://arxiv.org/abs/2311.02462

  2. Hendrycks, D., Song, D., Szegedy, C., Lee, H., Gal, Y., Brynjolfsson, E., Li, S., Marcus, G., Tegmark, M., Bengio, Y., et al. (2025). A definition of AGI. arXiv:2510.18212. https://arxiv.org/abs/2510.18212 ; project site https://www.agidefinition.ai/

  3. Chollet, F. (2019). On the measure of intelligence. arXiv:1911.01547. https://arxiv.org/abs/1911.01547

  4. Chollet, F., Knoop, M., Kamradt, G., & Landers, B. (2025). ARC-AGI-2: A new challenge for frontier AI reasoning systems. arXiv:2505.11831. https://arxiv.org/abs/2505.11831

  5. Chollet, F., Knoop, M., Kamradt, G., & Landers, B. (2026). ARC Prize 2025: Technical report. arXiv:2601.10904. https://arxiv.org/abs/2601.10904

  6. Grace, K., Stewart, H., Sandkühler, J. F., Thomas, S., Weinstein-Raun, B., & Brauner, J. (2024). Thousands of AI authors on the future of AI. arXiv:2401.02843. https://doi.org/10.48550/arXiv.2401.02843

  7. Blili-Hamelin, B., Hancox-Li, L., & Smart, A. (2024). Unsocial intelligence: An investigation of the assumptions of AGI discourse. Proceedings of the AAAI/ACM Conference on AI, Ethics and Society, 7(1), 141–155. https://doi.org/10.1609/aies.v7i1.31625 ; arXiv:2401.13142. Developing the value-laden critique of intelligence measurement in Blili-Hamelin, B., ...

  8. March, S. T., & Smith, G. F. (1995). Design and natural science research on information technology. Decision Support Systems, 15(4), 251–266.

  9. Hevner, A. R., March, S. T., Park, J., & Ram, S. (2004). Design science in information systems research. MIS Quarterly, 28(1), 75–105.

  10. Peffers, K., Tuunanen, T., Rothenberger, M. A., & Chatterjee, S. (2007). A design science research methodology for information systems research. Journal of Management Information Systems, 24(3), 45–77.

  11. Carnap, R. (1950). Logical foundations of probability. University of Chicago Press. (Notion of explication.)

  12. Chalmers, D. J. (2025). What is conceptual engineering and what should it be? Inquiry: An Interdisciplinary Journal of Philosophy, 68(9), 2902–2919. https://doi.org/10.1080/0020174X.2020.1817141 (First published online 2020.)

  13. Cappelen, H. (2018). Fixing language: An essay on conceptual engineering. Oxford University Press.

  14. Aguilera Briones, J. E. (2026). Sociedad Algorítmica Autónoma en los Estados Unidos Mexicanos [Autonomous Algorithmic Society in the United Mexican States] [Postdoctoral research]. Zenodo. https://doi.org/10.5281/zenodo.19232168 (Constructs of algorithmic sovereignty, critical dependency and algorithmic governance.)

  15. OpenAI (2018). OpenAI charter. (AGI as highly autonomous systems that outperform humans at most economically valuable work.)

  16. Sarma, G. P., Bhatt, S. D., Jacob, M., & Steratore, R. (2025). AGI forecasting [Research Report RR-A4692-1]. RAND Corporation.

  17. Schneider, W. J., & McGrew, K. S. (2018). The Cattell–Horn–Carroll theory of cognitive abilities. In D. P. Flanagan & E. M. McDonough (Eds.), Contemporary Intellectual Assessment: Theories, Tests and Issues (4th ed., pp. 73–163). Guilford Press.

  18. Venable, J., Pries-Heje, J., & Baskerville, R. (2016). FEDS: A framework for evaluation in design science research. European Journal of Information Systems, 25(1), 77–89.

  19. Gregor, S., & Hevner, A. R. (2013). Positioning and presenting design science research for maximum impact. MIS Quarterly, 37(2), 337–355.

  20. Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433–460.

  21. Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y. T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M. T., & Zhang, Y. (2023). Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv:2303.12712.

  22. Legg, S., & Hutter, M. (2007). A collection of definitions of intelligence. In B. Goertzel & P. Wang (Eds.), Advances in Artificial General Intelligence (Frontiers in Artificial Intelligence and Applications, Vol. 157, pp. 17–24). IOS Press.

  23. Gallie, W. B. (1956). Essentially contested concepts. Proceedings of the Aristotelian Society, 56, 167–198.

  24. Espeland, W. N., & Stevens, M. L. (1998). Commensuration as a social process. Annual Review of Sociology, 24, 313–343.

  25. Porter, T. M. (1995). Trust in numbers: The pursuit of objectivity in science and public life. Princeton University Press.

  26. Espeland, W. N., & Sauder, M. (2007). Rankings and reactivity: How public measures recreate social worlds. American Journal of Sociology, 113(1), 1–40.

  27. Soto, Á., Durán, R., Moreno, A., Adasme, S., Rovira, S., Jordán, V., & Poveda, L. (Coords.). (2025). Índice Latinoamericano de Inteligencia Artificial (ILIA) 2025 [Latin American Artificial Intelligence Index (ILIA) 2025] (Documentos de Proyectos LC/TS.2025/68/Rev.1). Comisión Económica para América Latina y el Caribe (CEPAL) & Centro Nacional de Inteligencia Artificial (CENIA).

  28. Stanford Institute for Human-Centered Artificial Intelligence (HAI). (2025). Artificial intelligence index report 2025. Stanford University.

  29. Microsoft (2025, October 28). The next chapter of the Microsoft–OpenAI partnership. Official Microsoft Blog. https://blogs.microsoft.com/blog/2025/10/28/the-next-chapter-of-the-microsoft-openai-partnership/

  30. Anthropic (2026, February 24). Responsible Scaling Policy, Version 3.0. https://www.anthropic.com/responsible-scaling-policy/rsp-v3-0

  31. OpenAI (2025, April 15). Preparedness Framework, Version 2. https://openai.com/index/updating-our-preparedness-framework/

  32. Microsoft (2026, April 27). The next phase of the Microsoft–OpenAI partnership. Official Microsoft Blog. https://blogs.microsoft.com/blog/2026/04/27/the-next-phase-of-the-microsoft-openai-partnership/

  33. Bowker, G. C., & Star, S. L. (1999). Sorting things out: Classification and its consequences. MIT Press.

  34. Jasanoff, S. (Ed.). (2004). States of knowledge: The co-production of science and the social order. Routledge.

  35. Büthe, T., & Mattli, W. (2011). The new global rulers: The privatization of regulation in the world economy. Princeton University Press.