pith. machine review for the scientific record.

arxiv: 2605.10425 · v1 · submitted 2026-05-11 · 💻 cs.CY · cs.AI

Recognition: 2 Lean theorem links

Toward an Engineering of Science: Rebalancing Generation and Verification in the Age of AI


Pith reviewed 2026-05-12 05:04 UTC · model grok-4.3

classification 💻 cs.CY cs.AI
keywords epistemic pollution · AI-generated science · scientific verification · blueprints · epistemic infrastructure · research artifacts · structured graphs

The pith

Science should redesign its epistemic infrastructure to rebalance the costs of generating and verifying research in the age of AI.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that AI now generates plausible scientific papers, reviews, and surveys at low cost, creating a risk that unreliable content will outpace the system's ability to check it. The existing paper format hides the full logic of an argument in prose, so reviewers must reconstruct the structure before they can assess it, keeping verification expensive. Treating this as an engineering task, the authors introduce blueprints as structured artifacts that break claims, evidence, assumptions, and definitions into typed graph components. These artifacts accept higher initial effort in exchange for faster, more modular, and more distributed checking later. A working prototype demonstrates that the approach can be built in practice.

Core claim

AI systems can now cheaply generate plausible scientific artifacts such as papers, reviews, and surveys. This creates a risk of epistemic pollution in our scientific systems, where unreliable but plausible-looking artifacts can accumulate faster than the system can filter them out. The problem is structural: the epistemic infrastructure of science was calibrated to a world where producing a plausible artifact required substantial expertise, labor, and time, so generation cost itself served as a rough filter; AI weakens that filter without comparably lowering verification cost. The current paper-centered system makes verification expensive: papers compress long-context scientific logic into prose, forcing reviewers, human or AI, to reconstruct the underlying argument structure before they can evaluate it.

What carries the argument

Blueprints: structured, decomposed research artifacts that represent claims, evidence, assumptions, and definitions as typed graph components, trading upfront generation cost for cheaper and more local verification downstream.
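The paper does not publish a schema, but the idea of a blueprint as a heterogeneous graph with typed nodes and edges can be sketched in a few lines. Everything below — the type vocabularies, field names, and the `local_context` helper — is an illustrative assumption, not the authors' implementation:

```python
from dataclasses import dataclass, field

# Illustrative type vocabularies; the paper's actual ontology may differ.
NODE_TYPES = {"claim", "evidence", "assumption", "definition"}
EDGE_TYPES = {"supports", "depends_on", "defines"}

@dataclass
class Node:
    id: str
    kind: str   # one of NODE_TYPES
    text: str

@dataclass
class Blueprint:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (src, edge_kind, dst)

    def add_node(self, nid: str, kind: str, text: str) -> None:
        assert kind in NODE_TYPES, f"unknown node type: {kind}"
        self.nodes[nid] = Node(nid, kind, text)

    def add_edge(self, src: str, kind: str, dst: str) -> None:
        assert kind in EDGE_TYPES and src in self.nodes and dst in self.nodes
        self.edges.append((src, kind, dst))

    def local_context(self, nid: str) -> list[Node]:
        """The unit of 'local verification': one node plus its direct
        neighbors, checkable without reconstructing the whole argument."""
        nbrs = {s for s, _, d in self.edges if d == nid}
        nbrs |= {d for s, _, d in self.edges if s == nid}
        return [self.nodes[i] for i in {nid} | nbrs]
```

Under this sketch, a reviewer (human or AI) assigned one claim node would receive only its local context — the claim, its supporting evidence, and the assumptions it depends on — which is the modularity the proposal buys with upfront structuring effort.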

Load-bearing premise

That structured blueprints will deliver net cheaper and more reliable verification in practice, without creating new adoption barriers, coordination costs, or verification overhead that outweigh the benefits.

What would settle it

A controlled experiment that measures total reviewer time and error rate when checking the same research claim presented once as a standard paper and once as a blueprint.
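As a sketch of how that experiment's primary outcome might be analyzed — with invented numbers, purely for illustration — paired measurements per reviewer could be compared directly:

```python
import statistics

# Hypothetical paired data: minutes each reviewer needed to check the
# same claim, once as a prose paper and once as a blueprint.
# All numbers are invented for illustration only.
minutes_paper     = [62, 55, 71, 48, 66, 59]
minutes_blueprint = [41, 50, 44, 39, 52, 47]

diffs = [p - b for p, b in zip(minutes_paper, minutes_blueprint)]
mean_saving = statistics.mean(diffs)   # average minutes saved per review

# A real study would also record error rates and apply a paired test
# (e.g. Wilcoxon signed-rank) rather than reading off the mean.
```

A fair design would also count the extra authoring time spent producing the blueprint in the first place, since the paper's claim concerns net cost across generation and verification.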

Figures

Figures reproduced from arXiv: 2605.10425 by Jiaqi W. Ma.

Figure 1: An illustration of the blueprint as a heterogeneous graph with typed nodes and edges.
Figure 2: Dual interfaces over a shared blueprint document. Left: the browser canvas, showing a blueprint as an…
read the original abstract

AI systems can now cheaply generate plausible scientific artifacts such as papers, reviews, and surveys. This creates a risk of epistemic pollution in our scientific systems, where unreliable but plausible-looking artifacts can accumulate faster than the system can filter them out. The problem is structural: the epistemic infrastructure of science was calibrated to a world where producing a plausible artifact required substantial expertise, labor, and time, so generation cost itself served as a rough filter; AI weakens that filter without comparably lowering verification cost. We argue that AI-era science should treat this as an engineering problem: redesigning epistemic infrastructure to rebalance the costs of generation and verification. The current paper-centered system makes verification expensive: papers compress long-context scientific logic into prose, forcing reviewers, human or AI, to reconstruct underlying argument structure before they can evaluate it. As one step in this direction, we propose blueprints as preliminary epistemic infrastructure: structured, decomposed research artifacts that represent claims, evidence, assumptions, and definitions as typed graph components. Blueprints are designed to trade an upfront generation cost for cheaper, more local, more distributed verification downstream. We have instantiated the proposal in a proof-of-concept prototype.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper argues that AI has drastically lowered the cost of generating plausible scientific artifacts (papers, reviews), risking epistemic pollution because verification costs have not decreased in tandem. The traditional paper format compresses complex arguments into prose, making verification expensive for both humans and AI. It proposes treating this as an engineering problem by redesigning epistemic infrastructure around 'blueprints': structured, typed-graph artifacts that decompose claims, evidence, assumptions, and definitions into modular components. This is intended to trade modest upfront structuring effort for cheaper, more local, and distributed verification downstream. A proof-of-concept prototype is presented to demonstrate syntactic feasibility.

Significance. If the blueprint approach can be shown to produce net reductions in verification effort and error rates without prohibitive adoption or coordination costs, the work could have substantial significance for metascience and the evolution of scientific publishing in the AI era. It offers a concrete, implementable direction for addressing a structural mismatch between generation and verification. The inclusion of a working prototype is a strength, as it moves beyond pure speculation to demonstrate technical viability.

major comments (1)
  1. Proof-of-concept prototype section: The manuscript demonstrates that blueprints can be instantiated as typed graphs but supplies no measurements, user studies, or comparisons of verification time, accuracy, or total effort relative to standard prose papers. This is load-bearing for the central claim, as the rebalancing of generation and verification costs is asserted without evidence that the new format yields net savings rather than shifting or increasing epistemic costs.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review and for acknowledging the potential significance of treating epistemic infrastructure as an engineering problem in the AI era. We address the major comment point by point below.

read point-by-point responses
  1. Referee: Proof-of-concept prototype section: The manuscript demonstrates that blueprints can be instantiated as typed graphs but supplies no measurements, user studies, or comparisons of verification time, accuracy, or total effort relative to standard prose papers. This is load-bearing for the central claim, as the rebalancing of generation and verification costs is asserted without evidence that the new format yields net savings rather than shifting or increasing epistemic costs.

    Authors: We agree that the prototype establishes only syntactic and structural feasibility through a working implementation of typed-graph blueprints, without supplying quantitative measurements, user studies, or direct comparisons of verification effort against prose papers. The manuscript is framed as a conceptual proposal whose core claim—that modular decomposition into claims, evidence, assumptions, and definitions enables cheaper downstream verification—is grounded in the structural properties of the representation rather than in empirical data. The argument is that local verification of individual typed components avoids the need to reconstruct full argument chains from compressed prose, but we recognize that net cost savings versus potential increases in upfront structuring effort remain unmeasured. In the revised version we will add an explicit limitations subsection that states this gap, provides a qualitative discussion of the hypothesized trade-offs, and outlines a concrete plan for future controlled evaluations (e.g., metrics for verification time, error rates, and total effort). revision: yes

Circularity Check

0 steps flagged

No significant circularity; the proposal is a forward-looking design argument.

full rationale

The paper advances a conceptual proposal for epistemic infrastructure redesign via blueprints, without equations, fitted parameters, predictions, or derivations that reduce to inputs by construction. No self-citations are load-bearing for the core claim, and the argument does not rename known results or smuggle in ansatzes. The rebalancing claim is presented as an engineering hypothesis rather than a self-referential or fitted result, so its support must come from external evidence rather than from its own construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The proposal rests on the untested premise that current prose papers are the main source of verification cost and that adding structured decomposition will reduce net cost without new frictions. There are no free parameters and no invented physical entities; the one invented entity is the blueprint artifact itself.

axioms (2)
  • domain assumption Generation cost has historically served as a rough epistemic filter in science
    Stated directly in the abstract as the reason AI creates a structural problem.
  • domain assumption Reconstructing argument structure from prose is the dominant verification cost
    Presented as the reason paper-centered systems are expensive to verify.
invented entities (1)
  • blueprints no independent evidence
    purpose: Structured, decomposed research artifacts using typed graph components for claims, evidence, assumptions, and definitions
    New epistemic infrastructure proposed to trade upfront generation cost for cheaper downstream verification; no independent evidence of effectiveness is supplied.

pith-pipeline@v0.9.0 · 5513 in / 1463 out tokens · 37731 ms · 2026-05-12T05:04:50.670353+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 3 internal anchors

  1. [1] Xiaoyan Bai, Alexander Baumgartner, Haojia Sun, Ari Holtzman, and Chenhao Tan. The story is not the science: Execution-grounded evaluation of mechanistic interpretability research. arXiv preprint arXiv:2602.18458, 2026.
  2. [2] Cristina-Iulia Bucur, Tobias Kuhn, Davide Ceolin, and Jacco van Ossenbruggen. Nanopublication-based semantic publishing and reviewing: a field study with formalization papers. PeerJ Computer Science, 9:e1159, 2023.
  3. [3] Tim Clark, Paolo N Ciccarese, and Carole A Goble. Micropublications: a semantic model for claims, evidence, arguments and annotations in biomedical communications. Journal of Biomedical Semantics, 5(1):28, 2014.
  4. [4] Leonardo De Moura, Soonho Kong, Jeremy Avigad, Floris Van Doorn, and Jakob von Raumer. The Lean theorem prover (system description). In International Conference on Automated Deduction, pages 378–388. Springer, 2015.
  5. [5] Brian D Earp, Haotian Yuan, Julian Koplin, and Sebastian Porsdam Mann. LLM use in scholarly writing poses a provenance problem. Nature Machine Intelligence, pages 1–2, 2025.
  6. [6] Alexandra Freeman, Pen-Yuan Hsing, Mariia Tukanova, Jacqueline Thompson, and Marcus Munafo. Could Octopus help researchers produce high quality, open research? Octopus, 2023. URL https://doi.org/10.57874/cyz8-es44. Rationale / Hypothesis, published 26 October 2023.
  7. [7] Satrajit Ghosh. An intelligent infrastructure as a foundation for modern science. ArXiv, 2025. URL https://api.semanticscholar.org/CorpusID:280649886.
  8. [8] Paul Groth, Andrew Gibson, and Jan Velterop. The anatomy of a nanopublication. Information Services and Use, 30(1-2):51–56, 2010.
  9. [9] Jutta Haider, Kristofer Rolf Söderström, Björn Ekström, and Malte Rödl. GPT-fabricated scientific papers on Google Scholar: Key features, spread, and implications for preempting evidence manipulation. Harvard Kennedy School Misinformation Review, 5(5), 2024.
  10. [10] Yongyuan He and Yi Bu. Academic journals' AI policies fail to curb the surge in AI-assisted academic writing. Proceedings of the National Academy of Sciences, 123(9):e2526734123, 2026.
  11. [11] Diana Hicks, Paul Wouters, Ludo Waltman, Sarah De Rijcke, and Ismael Rafols. Bibliometrics: the Leiden Manifesto for research metrics. Nature, 520(7548):429–431, 2015.
  12. [12] Chuxuan Hu, Liyun Zhang, Yeji Lim, Aum Wadhwani, Austin Peters, and Daniel Kang. Repro-Bench: Can agentic AI systems assess the reproducibility of social science research? In Findings of the Association for Computational Linguistics: ACL 2025, pages 23616–23626, 2025.
  13. [13] Mohamad Yaser Jaradeh, Allard Oelen, Kheir Eddine Farfar, Manuel Prinz, Jennifer D'Souza, Gábor Kismihók, Markus Stocker, and Sören Auer. Open Research Knowledge Graph: Next generation infrastructure for semantic scholarly knowledge. In Proceedings of the 10th International Conference on Knowledge Capture, pages 243–246, 2019.
  14. [14] Shashidhar Reddy Javaji, Yupeng Cao, Haohang Li, Yangyang Yu, Nikhil Muralidhar, and Zining Zhu. Can AI validate science? Benchmarking LLMs on claim–evidence reasoning in AI papers. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational…
  15. [15] Nicola Jones. Leading preprint server clamps down on 'AI slop'. Science, 391(6784):432–433, 2026. URL https://api.semanticscholar.org/CorpusID:285174241.
  16. [16] Jaeho Kim, Yunseok Lee, and Seulki Lee. Position: The AI conference peer review crisis demands author feedback and reviewer rewards. arXiv preprint arXiv:2505.04966, 2025.
  17. [17] Tobias Kuhn and Michel Dumontier. Genuine semantic publishing. Data Science, 1(1-2):139–154, 2017.
  18. [18] W. Kunz and H.W.J. Rittel. Issues as elements of information systems. Working Paper 131, Institute of Urban and Regional Development, University of California, 1970. URL https://books.google.com/books?id=B-MaAQAAMAAJ.
  19. [19] Keigo Kusumegi, Xinyu Yang, Paul Ginsparg, Mathijs de Vaan, Toby Stuart, and Yian Yin. Scientific production in the era of large language models. Science, 390(6779):1240–1243, 2025. doi: 10.1126/science.adw3000. URL https://www.science.org/doi/abs/10.1126/science.adw3000.
  20. [20] Weixin Liang, Yuhui Zhang, Hancheng Cao, Binglu Wang, Daisy Yi Ding, Xinyu Yang, Kailas Vodrahalli, Siyu He, Daniel Scott Smith, Yian Yin, et al. Can large language models provide useful feedback on research papers? A large-scale empirical analysis. NEJM AI, 1(8):AIoa2400196, 2024.
  21. [21] Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, et al. Quantifying large language model usage in scientific papers. Nature Human Behaviour, pages 1–11, 2025.
  22. [22] Jianghao Lin, Rong Shan, Jiachen Zhu, Yunjia Xi, Yong Yu, and Weinan Zhang. Stop DDoS attacking the research community with AI-generated survey papers. arXiv preprint arXiv:2510.09686, 2025.
  23. [23] Jiachen Liu, Jiaxin Pei, Jintao Huang, Chenglei Si, Ao Qu, Xiangru Tang, Runyu Lu, Lichang Chen, Xiaoyan Bai, Haizhong Zheng, Carl Chen, Zhiyang Chen, Haojie Ye, Yujuan Fu, Zexue He, Zijian Jin, Zhenyu Zhang, Shangquan Sun, Maestro Harmon, John Dianzhuo Wang, Jianqiao Zeng, Jiachen Sun, Mingyuan Wu, Baoyu Zhou, Chenyu You, Shijian Lu, Yiming Qiu, Fan Lai, et al. The last human-written paper: Agent-native research artifacts.
  24. [24] Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. The AI Scientist: Towards fully automated open-ended scientific discovery. ArXiv, abs/2408.06292, 2024. URL https://api.semanticscholar.org/CorpusID:271854887.
  25. [25] Robert K Merton. The Sociology of Science: Theoretical and Empirical Investigations. University of Chicago Press, 1973.
  26. [26] Miryam Naddaf. Major AI conference flooded with peer reviews written fully by AI. Nature, 648(8093):256–257, 2025.
  27. [27] Miryam Naddaf and Elizabeth Quill. Hallucinated citations are polluting the scientific literature. What can be done? Nature, 652(8108):26–29, 2026.
  28. [28] Guanxiong Pei and Huajian Huang. Open science falling behind in the era of artificial intelligence. Frontiers in Research Metrics and Analytics, 10:1595824, 2025.
  29. [29] Yusuke Sakai, Hidetaka Kamigaito, and Taro Watanabe. Hallucitation matters: Revealing the impact of hallucinated references with 300 hallucinated papers in ACL conferences. arXiv preprint arXiv:2601.18724, 2026.
  30. [30] David Shotton. Semantic publishing: the coming revolution in scientific journal publishing. Learned Publishing, 22(2):85–94, 2009.
  31. [31] David Shotton. CiTO, the Citation Typing Ontology. Journal of Biomedical Semantics, 1(Suppl 1):S6, 2010.
  32. [32] Guijin Son, Jiwoo Hong, Honglu Fan, Heejeong Nam, Hyunwoo Ko, Seungwon Lim, Jinyeop Song, Jinhang Choi, Gonçalo Paulo, Youngjae Yu, and Stella Biderman. When AI co-scientists fail: SPOT, a benchmark for automated verification of scientific research. ArXiv, abs/2505.11855, 2025. URL https://api.semanticscholar.org/CorpusID:278740501.
  33. [33] Diomidis Spinellis. False authorship: an explorative case study around an AI-generated article published under my name. Research Integrity and Peer Review, 10(1):8, 2025.
  34. [34] Jiabin Tang, Lianghao Xia, Zhonghang Li, and Chao Huang. AI-Researcher: Autonomous scientific innovation. arXiv preprint arXiv:2505.18705, 2025.
  35. [35] Nitya Thakkar, Mert Yuksekgonul, Jake Silberg, Animesh Garg, Nanyun Peng, Fei Sha, Rose Yu, Carl Vondrick, and James Zou. A large-scale randomized study of large language model feedback in peer review. Nature Machine Intelligence, pages 1–11, 2026.
  36. [36] Stephen E Toulmin. The Uses of Argument. Cambridge University Press, 2003.
  37. [37] David Wadden, Shanchuan Lin, Kyle Lo, Lucy Lu Wang, Madeleine van Zuylen, Arman Cohan, and Hannaneh Hajishirzi. Fact or fiction: Verifying scientific claims. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7534–7550, 2020.
  38. [38] David Wadden, Kyle Lo, Bailey Kuehl, Arman Cohan, Iz Beltagy, Lucy Lu Wang, and Hannaneh Hajishirzi. SciFact-Open: Towards open-domain scientific claim verification. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 4719–4734, 2022.
  39. [39] Yidong Wang, Qi Guo, Wenjin Yao, Hongbo Zhang, Xin Zhang, Zhen Wu, Meishan Zhang, Xinyu Dai, Min Zhang, Qingsong Wen, et al. AutoSurvey: Large language models can automatically write surveys. Advances in Neural Information Processing Systems, 37:115119–115145, 2024.
  40. [40] Qiyao Wei, Samuel Holt, Jing Yang, Markus Wulfmeier, and Mihaela van der Schaar. The AI imperative: Scaling high-quality peer review in machine learning. arXiv preprint arXiv:2506.08134, 2025.
  41. [41] Yutaro Yamada, Robert Tjarko Lange, Cong Lu, Shengran Hu, Chris Lu, Jakob Foerster, Jeff Clune, and David Ha. The AI Scientist-v2: Workshop-level automated scientific discovery via agentic tree search. ArXiv, abs/2504.08066. URL https://api.semanticscholar.org/CorpusID:277741107.
  43. [43] Kaiyu Yang, Aidan Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan J Prenger, and Animashree Anandkumar. LeanDojo: Theorem proving with retrieval-augmented language models. Advances in Neural Information Processing Systems, 36:21573–21612, 2023.