Recognition: 2 theorem links · Lean Theorem
Toward an Engineering of Science: Rebalancing Generation and Verification in the Age of AI
Pith reviewed 2026-05-12 05:04 UTC · model grok-4.3
The pith
Science should redesign its epistemic infrastructure to rebalance the costs of generating and verifying research in the age of AI.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AI systems can now cheaply generate plausible scientific artifacts such as papers, reviews, and surveys. This creates a risk of epistemic pollution in our scientific systems, where unreliable but plausible-looking artifacts can accumulate faster than the system can filter them out. The problem is structural: the epistemic infrastructure of science was calibrated to a world where producing a plausible artifact required substantial expertise, labor, and time, so generation cost itself served as a rough filter; AI weakens that filter without comparably lowering verification cost. The current paper-centered system makes verification expensive: papers compress long-context scientific logic into prose, forcing reviewers, human or AI, to reconstruct the underlying argument structure before they can evaluate it.
What carries the argument
Blueprints: structured, decomposed research artifacts that represent claims, evidence, assumptions, and definitions as typed graph components, trading upfront generation cost for cheaper and more local verification downstream.
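The typed-graph idea can be sketched concretely. A minimal Python sketch, assuming a hypothetical schema: the node kinds, field names, and the "supports" edge semantics are illustrative choices, not taken from the paper's prototype.

```python
from dataclasses import dataclass, field

# Hypothetical node vocabulary; the paper's prototype may differ.
NODE_TYPES = {"claim", "evidence", "assumption", "definition"}

@dataclass
class Node:
    node_id: str
    kind: str   # one of NODE_TYPES
    text: str   # the statement itself

@dataclass
class Blueprint:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (src, dst) read as "src supports dst"

    def add(self, node_id, kind, text):
        assert kind in NODE_TYPES, f"unknown node type: {kind}"
        self.nodes[node_id] = Node(node_id, kind, text)

    def link(self, src, dst):
        assert src in self.nodes and dst in self.nodes
        self.edges.append((src, dst))

    def local_context(self, node_id):
        """Everything needed to check one node: the node plus its
        direct supports, with no full-argument reconstruction."""
        supports = [s for s, d in self.edges if d == node_id]
        return [self.nodes[node_id]] + [self.nodes[s] for s in supports]

bp = Blueprint()
bp.add("D1", "definition", "Verification cost: reviewer effort to check a claim.")
bp.add("A1", "assumption", "Generation cost historically filtered low-quality work.")
bp.add("C1", "claim", "AI lowers generation cost without lowering verification cost.")
bp.link("D1", "C1")
bp.link("A1", "C1")
print(len(bp.local_context("C1")))  # prints 3: the claim plus its two direct supports
```

On this reading, verification localizes: checking C1 means reading three typed statements rather than reconstructing the chain from surrounding prose.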
Load-bearing premise
That structured blueprints will deliver net cheaper and more reliable verification in practice, without creating new adoption barriers, coordination costs, or verification overhead that outweigh the benefits.
What would settle it
A controlled experiment that measures total reviewer time and error rate when checking the same research claim presented once as a standard paper and once as a blueprint.
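The shape of that experiment can be sketched as a simulation of the paired design. All distributions and numbers below are placeholders, not measurements from the paper; a real study would randomize presentation order and also track error rates and total effort.

```python
import random
import statistics

# Each simulated reviewer checks the same claim in both conditions:
# once as a prose paper, once as a blueprint.
random.seed(0)

def simulate_reviewer():
    prose_minutes = random.gauss(60, 10)      # assumed prose baseline
    blueprint_minutes = random.gauss(45, 10)  # assumed blueprint condition
    return prose_minutes, blueprint_minutes

pairs = [simulate_reviewer() for _ in range(50)]
diffs = [p - b for p, b in pairs]  # positive = blueprint was faster
mean_saving = statistics.mean(diffs)

# A real analysis would test this paired difference (e.g. a paired
# t-test) rather than just reporting the mean.
print(f"mean saving: {mean_saving:.1f} min")
```

The point of the paired design is that each reviewer serves as their own control, so between-reviewer variation in speed cancels out of the difference scores.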
Figures
Original abstract
AI systems can now cheaply generate plausible scientific artifacts such as papers, reviews, and surveys. This creates a risk of epistemic pollution in our scientific systems, where unreliable but plausible-looking artifacts can accumulate faster than the system can filter them out. The problem is structural: the epistemic infrastructure of science was calibrated to a world where producing a plausible artifact required substantial expertise, labor, and time, so generation cost itself served as a rough filter; AI weakens that filter without comparably lowering verification cost. We argue that AI-era science should treat this as an engineering problem: redesigning epistemic infrastructure to rebalance the costs of generation and verification. The current paper-centered system makes verification expensive: papers compress long-context scientific logic into prose, forcing reviewers, human or AI, to reconstruct underlying argument structure before they can evaluate it. As one step in this direction, we propose blueprints as preliminary epistemic infrastructure: structured, decomposed research artifacts that represent claims, evidence, assumptions, and definitions as typed graph components. Blueprints are designed to trade an upfront generation cost for cheaper, more local, more distributed verification downstream. We have instantiated the proposal in a proof-of-concept prototype.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper argues that AI has drastically lowered the cost of generating plausible scientific artifacts (papers, reviews), risking epistemic pollution because verification costs have not decreased in tandem. The traditional paper format compresses complex arguments into prose, making verification expensive for both humans and AI. It proposes treating this as an engineering problem by redesigning epistemic infrastructure around 'blueprints': structured, typed-graph artifacts that decompose claims, evidence, assumptions, and definitions into modular components. This is intended to trade modest upfront structuring effort for cheaper, more local, and distributed verification downstream. A proof-of-concept prototype is presented to demonstrate syntactic feasibility.
Significance. If the blueprint approach can be shown to produce net reductions in verification effort and error rates without prohibitive adoption or coordination costs, the work could have substantial significance for metascience and the evolution of scientific publishing in the AI era. It offers a concrete, implementable direction for addressing a structural mismatch between generation and verification. The inclusion of a working prototype is a strength, as it moves beyond pure speculation to demonstrate technical viability.
major comments (1)
- Proof-of-concept prototype section: The manuscript demonstrates that blueprints can be instantiated as typed graphs but supplies no measurements, user studies, or comparisons of verification time, accuracy, or total effort relative to standard prose papers. This is load-bearing for the central claim, as the rebalancing of generation and verification costs is asserted without evidence that the new format yields net savings rather than shifting or increasing epistemic costs.
Simulated Author's Rebuttal
We thank the referee for their constructive review and for acknowledging the potential significance of treating epistemic infrastructure as an engineering problem in the AI era. We address the major comment point by point below.
Point-by-point responses
- Referee: Proof-of-concept prototype section: The manuscript demonstrates that blueprints can be instantiated as typed graphs but supplies no measurements, user studies, or comparisons of verification time, accuracy, or total effort relative to standard prose papers. This is load-bearing for the central claim, as the rebalancing of generation and verification costs is asserted without evidence that the new format yields net savings rather than shifting or increasing epistemic costs.
Authors: We agree that the prototype establishes only syntactic and structural feasibility through a working implementation of typed-graph blueprints, without supplying quantitative measurements, user studies, or direct comparisons of verification effort against prose papers. The manuscript is framed as a conceptual proposal whose core claim—that modular decomposition into claims, evidence, assumptions, and definitions enables cheaper downstream verification—is grounded in the structural properties of the representation rather than in empirical data. The argument is that local verification of individual typed components avoids the need to reconstruct full argument chains from compressed prose, but we recognize that net cost savings versus potential increases in upfront structuring effort remain unmeasured. In the revised version we will add an explicit limitations subsection that states this gap, provides a qualitative discussion of the hypothesized trade-offs, and outlines a concrete plan for future controlled evaluations (e.g., metrics for verification time, error rates, and total effort).
revision: yes
Circularity Check
No significant circularity; proposal is forward-looking design argument
full rationale
The paper advances a conceptual proposal for epistemic infrastructure redesign via blueprints, without equations, fitted parameters, predictions, or derivations that reduce to inputs by construction. No self-citations are load-bearing for the core claim, and the argument does not rename known results or smuggle ansatzes. The rebalancing claim is presented as an engineering hypothesis rather than a self-referential or fitted result, making the derivation self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Generation cost has historically served as a rough epistemic filter in science
- domain assumption Reconstructing argument structure from prose is the dominant verification cost
invented entities (1)
- blueprints: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (unclear)
Relation between the paper passage and the cited Recognition theorem is unclear.
Passage: "blueprints as structured, decomposed research artifacts that represent claims, evidence, assumptions, and definitions as typed graph components... trade an upfront generation cost for cheaper, more local, more distributed verification downstream"
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, LogicNat recovery theorem (unclear)
Relation between the paper passage and the cited Recognition theorem is unclear.
Passage: "leanblueprint... typed graph: nodes are mathematical statements... edges record dependency"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Xiaoyan Bai, Alexander Baumgartner, Haojia Sun, Ari Holtzman, and Chenhao Tan. The story is not the science: Execution-grounded evaluation of mechanistic interpretability research. arXiv preprint arXiv:2602.18458, 2026.
- [2] Cristina-Iulia Bucur, Tobias Kuhn, Davide Ceolin, and Jacco van Ossenbruggen. Nanopublication-based semantic publishing and reviewing: a field study with formalization papers. PeerJ Computer Science, 9:e1159, 2023.
- [3] Tim Clark, Paolo N Ciccarese, and Carole A Goble. Micropublications: a semantic model for claims, evidence, arguments and annotations in biomedical communications. Journal of Biomedical Semantics, 5(1):28, 2014.
- [4] Leonardo De Moura, Soonho Kong, Jeremy Avigad, Floris Van Doorn, and Jakob von Raumer. The Lean theorem prover (system description). In International Conference on Automated Deduction, pages 378–388. Springer, 2015.
- [5] Brian D Earp, Haotian Yuan, Julian Koplin, and Sebastian Porsdam Mann. LLM use in scholarly writing poses a provenance problem. Nature Machine Intelligence, pages 1–2, 2025.
- [6] Alexandra Freeman, Pen-Yuan Hsing, Mariia Tukanova, Jacqueline Thompson, and Marcus Munafo. Could Octopus help researchers produce high quality, open research? Octopus, 2023. URL https://doi.org/10.57874/cyz8-es44. Rationale / Hypothesis, published 26 October 2023.
- [7] Satrajit Ghosh. An intelligent infrastructure as a foundation for modern science. ArXiv, 2025. URL https://api.semanticscholar.org/CorpusID:280649886.
- [8] Paul Groth, Andrew Gibson, and Jan Velterop. The anatomy of a nanopublication. Information Services and Use, 30(1-2):51–56, 2010.
- [9] Jutta Haider, Kristofer Rolf Söderström, Björn Ekström, and Malte Rödl. GPT-fabricated scientific papers on Google Scholar: Key features, spread, and implications for preempting evidence manipulation. Harvard Kennedy School Misinformation Review, 5(5), 2024.
- [10] Yongyuan He and Yi Bu. Academic journals' AI policies fail to curb the surge in AI-assisted academic writing. Proceedings of the National Academy of Sciences, 123(9):e2526734123, 2026.
- [11] Diana Hicks, Paul Wouters, Ludo Waltman, Sarah De Rijcke, and Ismael Rafols. Bibliometrics: the Leiden Manifesto for research metrics. Nature, 520(7548):429–431, 2015.
- [12] Chuxuan Hu, Liyun Zhang, Yeji Lim, Aum Wadhwani, Austin Peters, and Daniel Kang. Repro-Bench: Can agentic AI systems assess the reproducibility of social science research? In Findings of the Association for Computational Linguistics: ACL 2025, pages 23616–23626, 2025.
- [13] Mohamad Yaser Jaradeh, Allard Oelen, Kheir Eddine Farfar, Manuel Prinz, Jennifer D'Souza, Gábor Kismihók, Markus Stocker, and Sören Auer. Open Research Knowledge Graph: Next generation infrastructure for semantic scholarly knowledge. In Proceedings of the 10th International Conference on Knowledge Capture, pages 243–246, 2019.
- [14] Shashidhar Reddy Javaji, Yupeng Cao, Haohang Li, Yangyang Yu, Nikhil Muralidhar, and Zining Zhu. Can AI validate science? Benchmarking LLMs on claim–evidence reasoning in AI papers. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational..., 2025.
- [15] Nicola Jones. Leading preprint server clamps down on 'AI slop'. Science, 391(6784):432–433, 2026. URL https://api.semanticscholar.org/CorpusID:285174241.
- [16] Jaeho Kim, Yunseok Lee, and Seulki Lee. Position: The AI conference peer review crisis demands author feedback and reviewer rewards. arXiv preprint arXiv:2505.04966, 2025.
- [17] Tobias Kuhn and Michel Dumontier. Genuine semantic publishing. Data Science, 1(1-2):139–154, 2017.
- [18] W. Kunz and H.W.J. Rittel. Issues as elements of information systems. Working Paper 131, Institute of Urban and Regional Development, University of California, 1970. URL https://books.google.com/books?id=B-MaAQAAMAAJ.
- [19] Keigo Kusumegi, Xinyu Yang, Paul Ginsparg, Mathijs de Vaan, Toby Stuart, and Yian Yin. Scientific production in the era of large language models. Science, 390(6779):1240–1243, 2025. doi: 10.1126/science.adw3000. URL https://www.science.org/doi/abs/10.1126/science.adw3000.
- [20] Weixin Liang, Yuhui Zhang, Hancheng Cao, Binglu Wang, Daisy Yi Ding, Xinyu Yang, Kailas Vodrahalli, Siyu He, Daniel Scott Smith, Yian Yin, et al. Can large language models provide useful feedback on research papers? A large-scale empirical analysis. NEJM AI, 1(8):AIoa2400196, 2024.
- [21] Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, et al. Quantifying large language model usage in scientific papers. Nature Human Behaviour, pages 1–11, 2025.
- [22] Jianghao Lin, Rong Shan, Jiachen Zhu, Yunjia Xi, Yong Yu, and Weinan Zhang. Stop DDoS attacking the research community with AI-generated survey papers. arXiv preprint arXiv:2510.09686, 2025.
- [23] Jiachen Liu, Jiaxin Pei, Jintao Huang, Chenglei Si, Ao Qu, Xiangru Tang, Runyu Lu, Lichang Chen, Xiaoyan Bai, Haizhong Zheng, Carl Chen, Zhiyang Chen, Haojie Ye, Yujuan Fu, Zexue He, Zijian Jin, Zhenyu Zhang, Shangquan Sun, Maestro Harmon, John Dianzhuo Wang, Jianqiao Zeng, Jiachen Sun, Mingyuan Wu, Baoyu Zhou, Chenyu You, Shijian Lu, Yiming Qiu, Fan Lai, ... The Last Human-Written Paper: Agent-Native Research Artifacts. arXiv, 2026.
- [24] Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. The AI Scientist: Towards fully automated open-ended scientific discovery. ArXiv, abs/2408.06292, 2024. URL https://api.semanticscholar.org/CorpusID:271854887.
- [25] Robert K Merton. The Sociology of Science: Theoretical and Empirical Investigations. University of Chicago Press, 1973.
- [26] Miryam Naddaf. Major AI conference flooded with peer reviews written fully by AI. Nature, 648(8093):256–257, 2025.
- [27] Miryam Naddaf and Elizabeth Quill. Hallucinated citations are polluting the scientific literature. What can be done? Nature, 652(8108):26–29, 2026.
- [28] Guanxiong Pei and Huajian Huang. Open science falling behind in the era of artificial intelligence. Frontiers in Research Metrics and Analytics, 10:1595824, 2025.
- [29] Yusuke Sakai, Hidetaka Kamigaito, and Taro Watanabe. Hallucitation matters: Revealing the impact of hallucinated references with 300 hallucinated papers in ACL conferences. arXiv preprint arXiv:2601.18724, 2026.
- [30] David Shotton. Semantic publishing: the coming revolution in scientific journal publishing. Learned Publishing, 22(2):85–94, 2009.
- [31] David Shotton. CiTO, the Citation Typing Ontology. Journal of Biomedical Semantics, 1(Suppl 1):S6, 2010.
- [32] Guijin Son, Jiwoo Hong, Honglu Fan, Heejeong Nam, Hyunwoo Ko, Seungwon Lim, Jinyeop Song, Jinhang Choi, Gonçalo Paulo, Youngjae Yu, and Stella Biderman. When AI co-scientists fail: SPOT, a benchmark for automated verification of scientific research. ArXiv, abs/2505.11855, 2025. URL https://api.semanticscholar.org/CorpusID:278740501.
- [33] Diomidis Spinellis. False authorship: an explorative case study around an AI-generated article published under my name. Research Integrity and Peer Review, 10(1):8, 2025.
- [34] Jiabin Tang, Lianghao Xia, Zhonghang Li, and Chao Huang. AI-Researcher: Autonomous scientific innovation. arXiv preprint arXiv:2505.18705, 2025.
- [35] Nitya Thakkar, Mert Yuksekgonul, Jake Silberg, Animesh Garg, Nanyun Peng, Fei Sha, Rose Yu, Carl Vondrick, and James Zou. A large-scale randomized study of large language model feedback in peer review. Nature Machine Intelligence, pages 1–11, 2026.
- [36] Stephen E Toulmin. The Uses of Argument. Cambridge University Press, 2003.
- [37] David Wadden, Shanchuan Lin, Kyle Lo, Lucy Lu Wang, Madeleine van Zuylen, Arman Cohan, and Hannaneh Hajishirzi. Fact or fiction: Verifying scientific claims. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7534–7550, 2020.
- [38] David Wadden, Kyle Lo, Bailey Kuehl, Arman Cohan, Iz Beltagy, Lucy Lu Wang, and Hannaneh Hajishirzi. SciFact-Open: Towards open-domain scientific claim verification. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 4719–4734, 2022.
- [39] Yidong Wang, Qi Guo, Wenjin Yao, Hongbo Zhang, Xin Zhang, Zhen Wu, Meishan Zhang, Xinyu Dai, Min Zhang, Qingsong Wen, et al. AutoSurvey: Large language models can automatically write surveys. Advances in Neural Information Processing Systems, 37:115119–115145, 2024.
- [40] Qiyao Wei, Samuel Holt, Jing Yang, Markus Wulfmeier, and Mihaela van der Schaar. The AI imperative: Scaling high-quality peer review in machine learning. arXiv preprint arXiv:2506.08134, 2025.
- [41] Yutaro Yamada, Robert Tjarko Lange, Cong Lu, Shengran Hu, Chris Lu, Jakob Foerster, Jeff Clune, and David Ha. The AI Scientist-v2: Workshop-level automated scientific discovery via agentic tree search. ArXiv, abs/2504.08066. URL https://api.semanticscholar.org/CorpusID:277741107.
- [43] Kaiyu Yang, Aidan Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan J Prenger, and Animashree Anandkumar. LeanDojo: Theorem proving with retrieval-augmented language models. Advances in Neural Information Processing Systems, 36:21573–21612, 2023.
discussion (0)