pith. machine review for the scientific record.

arxiv: 2605.10425 · v1 · submitted 2026-05-11 · 💻 cs.CY · cs.AI

Recognition: 2 Lean theorem links

Toward an Engineering of Science: Rebalancing Generation and Verification in the Age of AI


Pith reviewed 2026-05-12 05:04 UTC · model grok-4.3

classification 💻 cs.CY cs.AI
keywords epistemic pollution · AI-generated science · scientific verification · blueprints · epistemic infrastructure · research artifacts · structured graphs

The pith

Science should redesign its epistemic infrastructure to rebalance the costs of generating and verifying research in the age of AI.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that AI now generates plausible scientific papers, reviews, and surveys at low cost, creating a risk that unreliable content will outpace the system's ability to check it. The existing paper format hides the full logic of an argument in prose, so reviewers must reconstruct the structure before they can assess it, keeping verification expensive. Treating this as an engineering task, the authors introduce blueprints as structured artifacts that break claims, evidence, assumptions, and definitions into typed graph components. These artifacts accept higher initial effort in exchange for faster, more modular, and more distributed checking later. A working prototype demonstrates that the approach can be built in practice.

Core claim

AI systems can now cheaply generate plausible scientific artifacts such as papers, reviews, and surveys. This creates a risk of epistemic pollution in our scientific systems, where unreliable but plausible-looking artifacts can accumulate faster than the system can filter them out. The problem is structural: the epistemic infrastructure of science was calibrated to a world where producing a plausible artifact required substantial expertise, labor, and time, so generation cost itself served as a rough filter; AI weakens that filter without comparably lowering verification cost. The current paper-centered system makes verification expensive: papers compress long-context scientific logic into prose, forcing reviewers, human or AI, to reconstruct the underlying argument structure before they can evaluate it.

What carries the argument

Blueprints: structured, decomposed research artifacts that represent claims, evidence, assumptions, and definitions as typed graph components, trading upfront generation cost for cheaper and more local verification downstream.
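The paper does not publish a schema, but the idea of a blueprint as a heterogeneous graph with typed nodes and edges can be sketched in a few lines. Everything below — the type vocabularies, field names, and the `local_context` helper — is an illustrative assumption, not the authors' implementation:

```python
from dataclasses import dataclass, field

# Illustrative type vocabularies; the paper's actual ontology may differ.
NODE_TYPES = {"claim", "evidence", "assumption", "definition"}
EDGE_TYPES = {"supports", "depends_on", "defines"}

@dataclass
class Node:
    id: str
    kind: str   # one of NODE_TYPES
    text: str

@dataclass
class Blueprint:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (src, edge_kind, dst)

    def add_node(self, nid: str, kind: str, text: str) -> None:
        assert kind in NODE_TYPES, f"unknown node type: {kind}"
        self.nodes[nid] = Node(nid, kind, text)

    def add_edge(self, src: str, kind: str, dst: str) -> None:
        assert kind in EDGE_TYPES and src in self.nodes and dst in self.nodes
        self.edges.append((src, kind, dst))

    def local_context(self, nid: str) -> list[Node]:
        """The unit of 'local verification': one node plus its direct
        neighbors, checkable without reconstructing the whole argument."""
        nbrs = {s for s, _, d in self.edges if d == nid}
        nbrs |= {d for s, _, d in self.edges if s == nid}
        return [self.nodes[i] for i in {nid} | nbrs]
```

Under this sketch, a reviewer (human or AI) assigned one claim node would receive only its local context — the claim, its supporting evidence, and the assumptions it depends on — which is the modularity the proposal buys with upfront structuring effort.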

Load-bearing premise

That structured blueprints will deliver net cheaper and more reliable verification in practice, without creating new adoption barriers, coordination costs, or verification overhead that outweigh the benefits.

What would settle it

A controlled experiment that measures total reviewer time and error rate when checking the same research claim presented once as a standard paper and once as a blueprint.
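As a sketch of how that experiment's primary outcome might be analyzed — with invented numbers, purely for illustration — paired measurements per reviewer could be compared directly:

```python
import statistics

# Hypothetical paired data: minutes each reviewer needed to check the
# same claim, once as a prose paper and once as a blueprint.
# All numbers are invented for illustration only.
minutes_paper     = [62, 55, 71, 48, 66, 59]
minutes_blueprint = [41, 50, 44, 39, 52, 47]

diffs = [p - b for p, b in zip(minutes_paper, minutes_blueprint)]
mean_saving = statistics.mean(diffs)   # average minutes saved per review

# A real study would also record error rates and apply a paired test
# (e.g. Wilcoxon signed-rank) rather than reading off the mean.
```

A fair design would also count the extra authoring time spent producing the blueprint in the first place, since the paper's claim concerns net cost across generation and verification.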

Figures

Figures reproduced from arXiv: 2605.10425 by Jiaqi W. Ma.

Figure 1: An illustration of the blueprint as a heterogeneous graph with typed nodes and edges.
Figure 2: Dual interfaces over a shared blueprint document. Left: the browser canvas, showing a blueprint as an…
read the original abstract

AI systems can now cheaply generate plausible scientific artifacts such as papers, reviews, and surveys. This creates a risk of epistemic pollution in our scientific systems, where unreliable but plausible-looking artifacts can accumulate faster than the system can filter them out. The problem is structural: the epistemic infrastructure of science was calibrated to a world where producing a plausible artifact required substantial expertise, labor, and time, so generation cost itself served as a rough filter; AI weakens that filter without comparably lowering verification cost. We argue that AI-era science should treat this as an engineering problem: redesigning epistemic infrastructure to rebalance the costs of generation and verification. The current paper-centered system makes verification expensive: papers compress long-context scientific logic into prose, forcing reviewers, human or AI, to reconstruct underlying argument structure before they can evaluate it. As one step in this direction, we propose blueprints as preliminary epistemic infrastructure: structured, decomposed research artifacts that represent claims, evidence, assumptions, and definitions as typed graph components. Blueprints are designed to trade an upfront generation cost for cheaper, more local, more distributed verification downstream. We have instantiated the proposal in a proof-of-concept prototype.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper argues that AI has drastically lowered the cost of generating plausible scientific artifacts (papers, reviews), risking epistemic pollution because verification costs have not decreased in tandem. The traditional paper format compresses complex arguments into prose, making verification expensive for both humans and AI. It proposes treating this as an engineering problem by redesigning epistemic infrastructure around 'blueprints': structured, typed-graph artifacts that decompose claims, evidence, assumptions, and definitions into modular components. This is intended to trade modest upfront structuring effort for cheaper, more local, and distributed verification downstream. A proof-of-concept prototype is presented to demonstrate syntactic feasibility.

Significance. If the blueprint approach can be shown to produce net reductions in verification effort and error rates without prohibitive adoption or coordination costs, the work could have substantial significance for metascience and the evolution of scientific publishing in the AI era. It offers a concrete, implementable direction for addressing a structural mismatch between generation and verification. The inclusion of a working prototype is a strength, as it moves beyond pure speculation to demonstrate technical viability.

major comments (1)
  1. Proof-of-concept prototype section: The manuscript demonstrates that blueprints can be instantiated as typed graphs but supplies no measurements, user studies, or comparisons of verification time, accuracy, or total effort relative to standard prose papers. This is load-bearing for the central claim, as the rebalancing of generation and verification costs is asserted without evidence that the new format yields net savings rather than shifting or increasing epistemic costs.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review and for acknowledging the potential significance of treating epistemic infrastructure as an engineering problem in the AI era. We address the major comment point by point below.

read point-by-point responses
  1. Referee: Proof-of-concept prototype section: The manuscript demonstrates that blueprints can be instantiated as typed graphs but supplies no measurements, user studies, or comparisons of verification time, accuracy, or total effort relative to standard prose papers. This is load-bearing for the central claim, as the rebalancing of generation and verification costs is asserted without evidence that the new format yields net savings rather than shifting or increasing epistemic costs.

    Authors: We agree that the prototype establishes only syntactic and structural feasibility through a working implementation of typed-graph blueprints, without supplying quantitative measurements, user studies, or direct comparisons of verification effort against prose papers. The manuscript is framed as a conceptual proposal whose core claim—that modular decomposition into claims, evidence, assumptions, and definitions enables cheaper downstream verification—is grounded in the structural properties of the representation rather than in empirical data. The argument is that local verification of individual typed components avoids the need to reconstruct full argument chains from compressed prose, but we recognize that net cost savings versus potential increases in upfront structuring effort remain unmeasured. In the revised version we will add an explicit limitations subsection that states this gap, provides a qualitative discussion of the hypothesized trade-offs, and outlines a concrete plan for future controlled evaluations (e.g., metrics for verification time, error rates, and total effort). revision: yes

Circularity Check

0 steps flagged

No significant circularity; the proposal is a forward-looking design argument.

full rationale

The paper advances a conceptual proposal for epistemic infrastructure redesign via blueprints, without equations, fitted parameters, predictions, or derivations that reduce to inputs by construction. No self-citations are load-bearing for the core claim, and the argument does not rename known results or smuggle in ansatzes. The rebalancing claim is presented as an engineering hypothesis rather than a self-referential or fitted result, so its support must come from external evidence rather than from its own construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The proposal rests on the untested premise that current prose papers are the main source of verification cost and that adding structured decomposition will reduce net cost without new frictions. There are no free parameters and no invented physical entities; the one invented entity is the blueprint artifact itself.

axioms (2)
  • domain assumption Generation cost has historically served as a rough epistemic filter in science
    Stated directly in the abstract as the reason AI creates a structural problem.
  • domain assumption Reconstructing argument structure from prose is the dominant verification cost
    Presented as the reason paper-centered systems are expensive to verify.
invented entities (1)
  • blueprints no independent evidence
    purpose: Structured, decomposed research artifacts using typed graph components for claims, evidence, assumptions, and definitions
    New epistemic infrastructure proposed to trade upfront generation cost for cheaper downstream verification; no independent evidence of effectiveness is supplied.

pith-pipeline@v0.9.0 · 5513 in / 1463 out tokens · 37731 ms · 2026-05-12T05:04:50.670353+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 3 internal anchors

  1. [1] Xiaoyan Bai, Alexander Baumgartner, Haojia Sun, Ari Holtzman, and Chenhao Tan. The story is not the science: Execution-grounded evaluation of mechanistic interpretability research. arXiv preprint arXiv:2602.18458, 2026.
  2. [2] Cristina-Iulia Bucur, Tobias Kuhn, Davide Ceolin, and Jacco van Ossenbruggen. Nanopublication-based semantic publishing and reviewing: a field study with formalization papers. PeerJ Computer Science, 9:e1159, 2023.
  3. [3] Tim Clark, Paolo N Ciccarese, and Carole A Goble. Micropublications: a semantic model for claims, evidence, arguments and annotations in biomedical communications. Journal of Biomedical Semantics, 5(1):28, 2014.
  4. [4] Leonardo De Moura, Soonho Kong, Jeremy Avigad, Floris Van Doorn, and Jakob von Raumer. The Lean theorem prover (system description). In International Conference on Automated Deduction, pages 378–388. Springer, 2015.
  5. [5] Brian D Earp, Haotian Yuan, Julian Koplin, and Sebastian Porsdam Mann. LLM use in scholarly writing poses a provenance problem. Nature Machine Intelligence, pages 1–2, 2025.
  6. [6] Alexandra Freeman, Pen-Yuan Hsing, Mariia Tukanova, Jacqueline Thompson, and Marcus Munafo. Could Octopus help researchers produce high quality, open research? Octopus, 2023. URL https://doi.org/10.57874/cyz8-es44. Rationale / Hypothesis, published 26 October 2023.
  7. [7] Satrajit Ghosh. An intelligent infrastructure as a foundation for modern science. ArXiv, 2025. URL https://api.semanticscholar.org/CorpusID:280649886.
  8. [8] Paul Groth, Andrew Gibson, and Jan Velterop. The anatomy of a nanopublication. Information Services and Use, 30(1-2):51–56, 2010.
  9. [9] Jutta Haider, Kristofer Rolf Söderström, Björn Ekström, and Malte Rödl. GPT-fabricated scientific papers on Google Scholar: Key features, spread, and implications for preempting evidence manipulation. Harvard Kennedy School Misinformation Review, 5(5), 2024.
  10. [10] Yongyuan He and Yi Bu. Academic journals' AI policies fail to curb the surge in AI-assisted academic writing. Proceedings of the National Academy of Sciences, 123(9):e2526734123, 2026.
  11. [11] Diana Hicks, Paul Wouters, Ludo Waltman, Sarah De Rijcke, and Ismael Rafols. Bibliometrics: the Leiden Manifesto for research metrics. Nature, 520(7548):429–431, 2015.
  12. [12] Chuxuan Hu, Liyun Zhang, Yeji Lim, Aum Wadhwani, Austin Peters, and Daniel Kang. Repro-Bench: Can agentic AI systems assess the reproducibility of social science research? In Findings of the Association for Computational Linguistics: ACL 2025, pages 23616–23626, 2025.
  13. [13] Mohamad Yaser Jaradeh, Allard Oelen, Kheir Eddine Farfar, Manuel Prinz, Jennifer D'Souza, Gábor Kismihók, Markus Stocker, and Sören Auer. Open Research Knowledge Graph: Next generation infrastructure for semantic scholarly knowledge. In Proceedings of the 10th International Conference on Knowledge Capture, pages 243–246, 2019.
  14. [14] Shashidhar Reddy Javaji, Yupeng Cao, Haohang Li, Yangyang Yu, Nikhil Muralidhar, and Zining Zhu. Can AI validate science? Benchmarking LLMs on claim–evidence reasoning in AI papers. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational…
  15. [15] Nicola Jones. Leading preprint server clamps down on 'AI slop'. Science, 391(6784):432–433, 2026. URL https://api.semanticscholar.org/CorpusID:285174241.
  16. [16] Jaeho Kim, Yunseok Lee, and Seulki Lee. Position: The AI conference peer review crisis demands author feedback and reviewer rewards. arXiv preprint arXiv:2505.04966, 2025.
  17. [17] Tobias Kuhn and Michel Dumontier. Genuine semantic publishing. Data Science, 1(1-2):139–154, 2017.
  18. [18] W. Kunz and H.W.J. Rittel. Issues as elements of information systems. Working Paper 131, Institute of Urban and Regional Development, University of California, 1970. URL https://books.google.com/books?id=B-MaAQAAMAAJ.
  19. [19] Keigo Kusumegi, Xinyu Yang, Paul Ginsparg, Mathijs de Vaan, Toby Stuart, and Yian Yin. Scientific production in the era of large language models. Science, 390(6779):1240–1243, 2025. doi: 10.1126/science.adw3000. URL https://www.science.org/doi/abs/10.1126/science.adw3000.
  20. [20] Weixin Liang, Yuhui Zhang, Hancheng Cao, Binglu Wang, Daisy Yi Ding, Xinyu Yang, Kailas Vodrahalli, Siyu He, Daniel Scott Smith, Yian Yin, et al. Can large language models provide useful feedback on research papers? A large-scale empirical analysis. NEJM AI, 1(8):AIoa2400196, 2024.
  21. [21] Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, et al. Quantifying large language model usage in scientific papers. Nature Human Behaviour, pages 1–11, 2025.
  22. [22] Jianghao Lin, Rong Shan, Jiachen Zhu, Yunjia Xi, Yong Yu, and Weinan Zhang. Stop DDoS attacking the research community with AI-generated survey papers. arXiv preprint arXiv:2510.09686, 2025.
  23. [23] Jiachen Liu, Jiaxin Pei, Jintao Huang, Chenglei Si, Ao Qu, Xiangru Tang, Runyu Lu, Lichang Chen, Xiaoyan Bai, Haizhong Zheng, Carl Chen, Zhiyang Chen, Haojie Ye, Yujuan Fu, Zexue He, Zijian Jin, Zhenyu Zhang, Shangquan Sun, Maestro Harmon, John Dianzhuo Wang, Jianqiao Zeng, Jiachen Sun, Mingyuan Wu, Baoyu Zhou, Chenyu You, Shijian Lu, Yiming Qiu, Fan Lai, et al. The last human-written paper: Agent-native research artifacts.
  24. [24] Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. The AI Scientist: Towards fully automated open-ended scientific discovery. ArXiv, abs/2408.06292, 2024. URL https://api.semanticscholar.org/CorpusID:271854887.
  25. [25] Robert K Merton. The Sociology of Science: Theoretical and Empirical Investigations. University of Chicago Press, 1973.
  26. [26] Miryam Naddaf. Major AI conference flooded with peer reviews written fully by AI. Nature, 648(8093):256–257, 2025.
  27. [27] Miryam Naddaf and Elizabeth Quill. Hallucinated citations are polluting the scientific literature. What can be done? Nature, 652(8108):26–29, 2026.
  28. [28] Guanxiong Pei and Huajian Huang. Open science falling behind in the era of artificial intelligence. Frontiers in Research Metrics and Analytics, 10:1595824, 2025.
  29. [29] Yusuke Sakai, Hidetaka Kamigaito, and Taro Watanabe. Hallucitation matters: Revealing the impact of hallucinated references with 300 hallucinated papers in ACL conferences. arXiv preprint arXiv:2601.18724, 2026.
  30. [30] David Shotton. Semantic publishing: the coming revolution in scientific journal publishing. Learned Publishing, 22(2):85–94, 2009.
  31. [31] David Shotton. CiTO, the Citation Typing Ontology. Journal of Biomedical Semantics, 1(Suppl 1):S6, 2010.
  32. [32] Guijin Son, Jiwoo Hong, Honglu Fan, Heejeong Nam, Hyunwoo Ko, Seungwon Lim, Jinyeop Song, Jinhang Choi, Gonçalo Paulo, Youngjae Yu, and Stella Biderman. When AI co-scientists fail: SPOT, a benchmark for automated verification of scientific research. ArXiv, abs/2505.11855, 2025. URL https://api.semanticscholar.org/CorpusID:278740501.
  33. [33] Diomidis Spinellis. False authorship: an explorative case study around an AI-generated article published under my name. Research Integrity and Peer Review, 10(1):8, 2025.
  34. [34] Jiabin Tang, Lianghao Xia, Zhonghang Li, and Chao Huang. AI-Researcher: Autonomous scientific innovation. arXiv preprint arXiv:2505.18705, 2025.
  35. [35] Nitya Thakkar, Mert Yuksekgonul, Jake Silberg, Animesh Garg, Nanyun Peng, Fei Sha, Rose Yu, Carl Vondrick, and James Zou. A large-scale randomized study of large language model feedback in peer review. Nature Machine Intelligence, pages 1–11, 2026.
  36. [36] Stephen E Toulmin. The Uses of Argument. Cambridge University Press, 2003.
  37. [37] David Wadden, Shanchuan Lin, Kyle Lo, Lucy Lu Wang, Madeleine van Zuylen, Arman Cohan, and Hannaneh Hajishirzi. Fact or fiction: Verifying scientific claims. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7534–7550, 2020.
  38. [38] David Wadden, Kyle Lo, Bailey Kuehl, Arman Cohan, Iz Beltagy, Lucy Lu Wang, and Hannaneh Hajishirzi. SciFact-Open: Towards open-domain scientific claim verification. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 4719–4734, 2022.
  39. [39] Yidong Wang, Qi Guo, Wenjin Yao, Hongbo Zhang, Xin Zhang, Zhen Wu, Meishan Zhang, Xinyu Dai, Min Zhang, Qingsong Wen, et al. AutoSurvey: Large language models can automatically write surveys. Advances in Neural Information Processing Systems, 37:115119–115145, 2024.
  40. [40] Qiyao Wei, Samuel Holt, Jing Yang, Markus Wulfmeier, and Mihaela van der Schaar. The AI imperative: Scaling high-quality peer review in machine learning. arXiv preprint arXiv:2506.08134, 2025.
  41. [41] Yutaro Yamada, Robert Tjarko Lange, Cong Lu, Shengran Hu, Chris Lu, Jakob Foerster, Jeff Clune, and David Ha. The AI Scientist-v2: Workshop-level automated scientific discovery via agentic tree search. ArXiv, abs/2504.08066. URL https://api.semanticscholar.org/CorpusID:277741107.
  43. [43] Kaiyu Yang, Aidan Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan J Prenger, and Animashree Anandkumar. LeanDojo: Theorem proving with retrieval-augmented language models. Advances in Neural Information Processing Systems, 36:21573–21612, 2023.