
arxiv: 2606.03019 · v1 · pith:S5TAVCFPnew · submitted 2026-06-02 · 💻 cs.CY · cs.AI

Reproducibility is the New Copyleft: Defining AGI-oriented Reproducible Builds

Pith reviewed 2026-06-28 08:29 UTC · model grok-4.3

classification 💻 cs.CY cs.AI
keywords reproducible builds · copyleft · AGI · open source AI · model reconstruction · software licensing · AI governance

The pith

Reproducible builds must replace copyleft to secure freedoms in AGI systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper contends that copyleft licenses fail for large language models and AGI because the relationship between source code and the resulting system is no longer well-defined or enforceable through share-alike rules. It proposes instead that reproducible builds, which guarantee bit-exact reconstruction from declared inputs, provide the necessary replacement mechanism. This matters because AI systems can generate equivalent derivatives that evade original license terms while data, weights, and other components face separate restrictions that current frameworks do not address. A sympathetic reader would care because it reframes how openness can be maintained as AI capabilities grow beyond traditional software. If the argument holds, efforts to keep AGI development free must prioritize technical reproducibility standards over license clauses.

Core claim

The central claim is that a functional analogue of copyleft for AGI must be grounded in reproducible builds guaranteeing bit-exact reconstructability from declared inputs rather than share-alike clauses over code, because the artifacts required to reconstruct a model face independent legal, technical, and economic constraints and because sufficiently capable systems can rewrite licensed source into functionally equivalent derivatives stripped of obligations.

What carries the argument

Reproducible builds: the practice of guaranteeing bit-exact reconstructability from declared inputs. This carries the argument by supplying the enforcement mechanism where copyleft's source-to-object premise collapses.

If this is right

  • Current open-source frameworks leave independent constraints on reconstruction artifacts unresolved.
  • Sufficiently capable AI systems can rewrite licensed source into derivatives that evade original obligations.
  • Seven requirements must be met to achieve AGI-oriented reproducible builds.
  • AI-to-AI coupling mechanisms form a dynamic linking layer for which copyleft-style licensing is ill-suited.
  • Protocol-based governance offers a more suitable template than platform-based approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Adoption of reproducible builds could allow independent verification of AI behavior without requiring full public release of every training component.
  • The same reproducibility focus might apply to other complex, non-deterministic computational artifacts beyond current AI models.
  • Standardized input declarations could become a practical requirement for any public AI release aiming to preserve long-term reconstructability.
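
As a rough illustration of what a standardized input declaration might look like, the sketch below hashes a manifest of the six artifact classes the abstract lists (code, data, weights, hyperparameters, toolchain, hardware configuration). The class name `InputManifest`, the field types, and the canonical-JSON-plus-SHA-256 scheme are assumptions made for this example; the paper does not prescribe a format.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class InputManifest:
    """Hypothetical declaration of everything needed to rebuild a model.

    The six fields mirror the artifact classes named in the abstract;
    the field types and the digest scheme are assumptions of this sketch.
    """
    code: str              # e.g. a source revision identifier
    data: str              # e.g. a digest of the training corpus
    weights: str           # e.g. a digest of the released checkpoint
    hyperparameters: dict  # training settings, including any random seed
    toolchain: str         # e.g. pinned framework and compiler versions
    hardware: str          # e.g. accelerator model and driver version

    def canonical_bytes(self) -> bytes:
        # Sorted keys and fixed separators so that logically equal
        # manifests always serialize to the same bytes.
        return json.dumps(asdict(self), sort_keys=True,
                          separators=(",", ":")).encode("utf-8")

    def digest(self) -> str:
        # SHA-256 over the canonical serialization identifies the inputs.
        return hashlib.sha256(self.canonical_bytes()).hexdigest()
```

Two manifests that differ only in the ordering of hyperparameter keys produce the same digest, while changing any declared field produces a different one. That property is what lets a third party confirm they are rebuilding from the same declared inputs.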

Load-bearing premise

The premise that source code and the resulting system stand in a well-defined, humanly auditable, and reproducible relationship that open frameworks can satisfy no longer holds for advanced AI systems.

What would settle it

A controlled test that either succeeds or fails at reconstructing a published AI model to identical bit-level outputs and behavior using only its publicly declared code, data, weights, hyperparameters, toolchain, and hardware configuration.
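
The shape of such a test can be sketched in a few lines. In the toy below, a seeded pseudo-random generator stands in for a full training run, and the check compares SHA-256 digests of the serialized result. The function names `toy_build` and `reconstructs_bit_exactly` are hypothetical, and a real reconstruction would involve the declared code, data, toolchain, and hardware rather than a single seed.

```python
import hashlib
import random
import struct


def toy_build(seed: int, steps: int = 1000) -> bytes:
    """Stand-in for a training run: returns serialized 'weights'.

    A real build would run the declared code on the declared data with
    the declared toolchain and hardware; here a seeded PRNG plays that role.
    """
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0) for _ in range(steps)]
    # Fixed-width little-endian doubles give a byte-stable encoding.
    return struct.pack(f"<{steps}d", *weights)


def sha256_hex(artifact: bytes) -> str:
    return hashlib.sha256(artifact).hexdigest()


def reconstructs_bit_exactly(published_digest: str, seed: int) -> bool:
    """True if an independent rebuild matches the published artifact."""
    return sha256_hex(toy_build(seed)) == published_digest


# The publisher releases a digest alongside the declared inputs.
published = sha256_hex(toy_build(seed=42))

print(reconstructs_bit_exactly(published, seed=42))  # same declared inputs
print(reconstructs_bit_exactly(published, seed=43))  # one undeclared change
```

The comparison is on bytes, not on approximate behavior: a rebuild either reproduces the published digest or it does not, which is what makes the test decisive in the sense described above.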

Original abstract

Copyleft, as implemented in licenses such as the GNU General Public License, was a legal hack that used copyright to guarantee user freedom by tying the availability of source code to every act of distribution. Its normative force rested on an implicit technical premise: that source code and object code stand in a well-defined, humanly auditable, and reproducible relationship. Large language models and, prospectively, Artificial General Intelligence (AGI) systems systematically violate this premise. The artifacts jointly required to reconstruct a model -- code, data, weights, hyperparameters, toolchain, and hardware configuration -- are each subject to independent legal, technical, and economic constraints that no current open-source framework fully resolves. Sufficiently capable AI systems can also rewrite licensed source into functionally equivalent derivatives stripped of their original obligations, a form of laundering against which copyleft has no effective defense. This paper argues that a functional analogue of copyleft for AGI must be grounded not in share-alike clauses over code, but in reproducible builds: a practice guaranteeing bit-exact reconstructability from declared inputs. We review the logic of copyleft, critically examine Maffulli's Second Liberation thesis according to which AI fulfills Stallman's dream, and show that the argument collapses unless AGI systems are themselves reproducible. Drawing on the Open Source AI Definition (OSAID), the Model Openness Framework (MOF), OpenMDW, and deterministic-inference research, we define seven requirements for AGI-oriented reproducible builds. We further argue that the Model Context Protocol (MCP) and analogous AI-to-AI coupling mechanisms constitute a new dynamic linking layer for which copyleft-style licensing is ill-suited, and that Masnick's "protocols, not platforms" framework offers a more promising governance template.
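
The abstract points to deterministic-inference research without saying why bit-exactness is hard for numerical workloads. One commonly cited obstacle, offered here as background and not as a claim taken from the paper, is that floating-point addition is not associative, so the order in which partial sums are combined can change the final bits. A minimal sketch:

```python
import struct


def bits(x: float) -> str:
    """Hex encoding of the IEEE 754 double representing x."""
    return struct.pack(">d", x).hex()


left = (0.1 + 0.2) + 0.3   # one reduction order
right = 0.1 + (0.2 + 0.3)  # another reduction order

# Mathematically equal, but the two doubles differ in their last bit.
print(left == right)
print(bits(left), bits(right))
```

If a parallel reduction combines partial results in an order that varies between runs, discrepancies of this kind can accumulate, which is one reason a bit-exact guarantee has to pin down more than the source code.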

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper claims that copyleft's normative force depends on a source/object code relationship that AGI systems violate through complex, independently constrained artifacts (code, data, weights, hyperparameters, toolchain, hardware) and through AI-driven rewriting that evades share-alike obligations. It argues that a functional analogue must instead rest on reproducible builds guaranteeing bit-exact reconstructability, defines seven requirements for AGI-oriented reproducible builds after reviewing OSAID, MOF, OpenMDW and deterministic-inference work, shows that Maffulli's Second Liberation thesis collapses without reproducibility, and proposes that dynamic mechanisms such as the Model Context Protocol are better addressed via protocols-not-platforms governance than via copyleft licensing.

Significance. If the premises hold, the work supplies a coherent conceptual reframing that relocates open-source protection for advanced AI from legal clauses to technical reproducibility practices. It explicitly credits and extends prior frameworks (OSAID, MOF, OpenMDW) by distilling seven requirements and identifies a new dynamic-linking layer (MCP-style coupling) for which existing licensing templates are ill-suited. The argument is internally consistent and follows directly from the stated premises without circularity or hidden empirical claims.

minor comments (2)
  1. [Abstract] The statement that 'no current open-source framework fully resolves' the listed constraints would be strengthened by a concise enumeration of the specific gaps each framework leaves unaddressed, even if only in summary form.
  2. The seven requirements are introduced as the core technical contribution; a short table or numbered list with one-sentence justification for each would improve readability without altering the argument.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the detailed summary of the manuscript, the positive assessment of its significance, and the recommendation for minor revision. No specific major comments or requested changes were enumerated in the report.

Circularity Check

0 steps flagged

No significant circularity identified

Full rationale

The paper advances a normative argument redefining copyleft-style obligations for AGI around reproducible builds rather than share-alike code. Its premises rest on external citations (OSAID, MOF, OpenMDW, Maffulli thesis, Masnick framework) and stated technical observations about AI artifacts; none of the seven requirements or the central claim reduces by definition, fitted parameter, or self-citation chain to the paper's own inputs. No equations, predictions, or uniqueness theorems appear that would trigger the enumerated circularity patterns. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The argument depends on domain assumptions about AI's ability to rewrite code and the inadequacy of existing frameworks; no free parameters or invented entities are introduced.

axioms (2)
  • domain assumption: Sufficiently capable AI systems can rewrite licensed source into functionally equivalent derivatives stripped of original obligations.
    Invoked to show copyleft has no effective defense; stated in the abstract as a form of laundering.
  • domain assumption: No current open-source framework fully resolves the independent legal, technical, and economic constraints on artifacts required to reconstruct a model.
    Used to justify the need for a new reproducible-build approach.

pith-pipeline@v0.9.1-grok · 5842 in / 1386 out tokens · 27564 ms · 2026-06-28T08:29:43.545098+00:00 · methodology


Reference graph

Works this paper leans on

38 extracted references · 9 canonical work pages · 1 internal anchor

  1. [1]

    Anthropic: Introducing the Model Context Protocol (2024). https://www.anthropic.com/news/model-context-protocol

  2. [2]

    Benhamou, Y., Reymond, M.: Open Source Artificial Intelligence Definition 1.0 – A “Take It or Leave It” Approach for Open Source AI Systems? Kluwer Copyright Blog, March 4 (2025). https://legalblogs.wolterskluwer.com/copyright-blog/open-source-artificial-intelligence-definition-10-a-take-it-or-leave-it-approach-for-open-source-ai-systems/

  3. [3]

    Carlini, N., Tramèr, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., Roberts, A., Brown, T., Song, D., Erlingsson, U., Oprea, A., Raffel, C.: Extracting Training Data from Large Language Models. In: 30th USENIX Security Symposium, pp. 2633–2650 (2021). https://www.usenix.org/conference/usenixsecurity21/presentation/carlini-extracting

  4. [4]

    Carlini, N., Jagielski, M., Choquette-Choo, C.A., Paleka, D., Pearce, W., Anderson, H., Terzis, A., Thomas, K., Tramèr, F.: Poisoning Web-Scale Training Datasets is Practical. arXiv:2302.10149 (2023). https://doi.org/10.48550/arXiv.2302.10149

  5. [5]

    Chen, B., Wen, M., Shi, Y., Lin, D., Rajbahadur, G.K., Jiang, Z.M.: Towards Training Reproducible Deep Learning Models. In: Proceedings of the 44th International Conference on Software Engineering (ICSE ’22), pp. 2202–2214 (2022). https://doi.org/10.1145/3510003.3510163

  6. [6]

    Claburn, T.: Chardet Dispute Shows How AI Will Kill Software Licensing, Argues Bruce Perens. The Register, March 6 (2026). https://www.theregister.com/2026/03/06/ai_kills_software_licensing/

  7. [7]

    European Parliament and Council: Regulation (EU) 2024/1689 of 13 June 2024 Laying Down Harmonized Rules on Artificial Intelligence (AI Act). Official Journal of the European Union (2024)

  8. [8]

    Free Software Foundation: GNU General Public License, Version 3 (2007). https://www.gnu.org/licenses/gpl-3.0.html

  9. [9]

    Free Software Foundation: What Is Free Software? (2024). https://www.gnu.org/philosophy/free-sw.html

  10. [10]

    Future of Life Institute: Asilomar AI Principles (2017). https://futureoflife.org/2017/08/11/ai-principles/

  11. [11]

    Generative AI Commons: Model Openness Tool (2025). https://isitopen.ai/

  12. [12]

    Gond, R., Kamath, A.K., Ramjee, R., Panwar, A.: LLM-42: Enabling Determinism in LLM Inference with Verified Speculation. arXiv:2601.17768 (2026). https://doi.org/10.48550/arXiv.2601.17768

  13. [13]

    Google: Announcing the Agent2Agent Protocol (A2A). Google Developers Blog (2025). https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/

  14. [14]

    Hatta, M.: “Copyleft” in the Context of GenAI. Hack or Be Hacked (Substack), October 21 (2024). https://mhatta.substack.com/p/copyleft-in-the-context-of-genai

  15. [15]

    Hatta, M.: Several Issues Regarding Data Governance in AGI. In: Iklé, M., Kolonin, A., Bennett, M. (eds.) Artificial General Intelligence. AGI 2025. Lecture Notes in Computer Science, vol. 16057, pp. 239–249. Springer, Cham (2026). https://doi.org/10.1007/978-3-032-00686-8_22

  16. [16]

    He, H., Thinking Machines Lab: Defeating Nondeterminism in LLM Inference. Thinking Machines Lab: Connectionism, September (2025). https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/

  17. [17]

    Kuhn, B.M.: Open Source AI Definition Erodes the Meaning of “Open Source”. Software Freedom Conservancy Blog, October 31 (2024). https://sfconservancy.org/blog/2024/oct/31/open-source-ai-definition-osaid-erodes-foss/

  18. [18]

    Lamb, C., Zacchiroli, S.: Reproducible Builds: Increasing the Integrity of Software Supply Chains. IEEE Software 39(2), 62–70 (2022). https://doi.org/10.1109/MS.2021.3073045. IEEE Software Best Paper Award 2022

  19. [19]

    LMSYS: Towards Deterministic Inference in SGLang and Reproducible RL Training. LMSYS Blog, September 22 (2025). https://www.lmsys.org/blog/2025-09-22-sglang-deterministic/

  20. [20]

    Maffulli, S.: The Second Liberation: AI Is the Final Frontier of Copyleft. Personal blog, March 16 (2026). https://www.maffulli.net/2026/03/16/ai-final-frontier-of-copyleft/

  21. [21]

    Masnick, M.: Protocols, Not Platforms: A Technological Approach to Free Speech. Knight First Amendment Institute, Columbia University, 19–05 (2019). https://knightcolumbia.org/content/protocols-not-platforms-a-technological-approach-to-free-speech

  22. [22]

    Mellinger, A., Justice, D., Connor, M., Gallagher, S., Brooks, T.: The Myth of Machine Learning Non-Reproducibility and Randomness for Acquisitions and Testing, Evaluation, Verification, and Validation. Software Engineering Institute, Carnegie Mellon University, Insights Blog, January 13 (2025). https://doi.org/10.58012/g17y-gp09

  23. [23]

    OpenAI: Function Calling and Other API Updates. OpenAI Blog, June 13 (2023). https://openai.com/blog/function-calling-and-other-api-updates

  24. [24]

    Open Future: The AI Act and Open Source AI. Open Future Observatory (2024). https://openfuture.eu/observatory/aia-open-source/

  25. [25]

    Open Source Initiative: The Open Source AI Definition v1.0 (2024). https://opensource.org/ai/open-source-ai-definition

  26. [26]

    Open Source Initiative: Deep Dive: Data Governance (Online Event, October 1–3, 2025). https://opensource.org/events/deep-dive-data-governance

  27. [27]

    Open Source Initiative: OSAID FAQs (2025). https://opensource.org/ai/faq

  28. [28]

    Open Source Initiative: Report from OSS EU 2025 and AI_dev: What’s Next for OSAID (2025). https://opensource.org/blog/report-from-oss-eu-2025-and-ai_dev-whats-next-for-osaid

  29. [29]

    The PyTorch Project: Reproducibility - PyTorch Documentation (2024). https://docs.pytorch.org/docs/stable/notes/randomness.html

  30. [30]

    The Reproducible Builds Project: Reproducible Builds - A Set of Software Development Practices That Create an Independently-Verifiable Path from Source to Binary Code (2024). https://reproducible-builds.org/

  31. [31]

    Samuelson, P.: Generative AI Meets Copyright. Science 381(6654), 158–161 (2023). https://doi.org/10.1126/science.adi0656

  32. [32]

    Semmelrock, H., Ross-Hellauer, T., Kopeinik, S., Theiler, D., Haberl, A., Thalmann, S., Kowald, D.: Reproducibility in Machine-Learning-Based Research: Overview, Barriers, and Drivers. AI Magazine 46(2), e70002 (2025). https://doi.org/10.1002/aaai.70002

  33. [33]

    Stallman, R.M.: Free Software, Free Society: Selected Essays of Richard M. Stallman. GNU Press, Boston (2002). https://www.gnu.org/philosophy/fsfs/rms-essays.pdf

  34. [34]

    White, M., Haddad, I., Osborne, C., Liu, X.-Y. (Yanglet), Abdelmonsef, A., Varghese, S., Le Hors, A.: The Model Openness Framework: Promoting Completeness and Openness for Reproducibility, Transparency, and Usability in Artificial Intelligence. arXiv:2403.13784 (2024). https://doi.org/10.48550/arXiv.2403.13784

  35. [35]

    White, M.: The Open Source Legacy and AI’s Licensing Challenge. Linux Foundation Blog, May 22 (2025). https://www.linuxfoundation.org/blog/the-open-source-legacy-and-ais-licensing-challenge

  36. [36]

    Widder, D.G., West, S., Whittaker, M.: Open (for Business): Big Tech, Concentrated Power, and the Political Economy of Open AI. SSRN preprint (2023). https://doi.org/10.2139/ssrn.4543807

  37. [37]

    Yampolskiy, R.V.: Artificial Superintelligence: A Futuristic Approach. Chapman and Hall/CRC (2015). https://doi.org/10.1201/b18612

  38. [38]

    Yudkowsky, E., Herreshoff, M.: Tiling Agents for Self-Modifying AI, and the Löbian Obstacle. Machine Intelligence Research Institute Technical Report (2013). https://intelligence.org/files/TilingAgents.pdf