The Rise of Large Language Models and the Direction and Impact of US Federal Research Funding
Pith reviewed 2026-05-16 11:37 UTC · model grok-4.3
The pith
Greater LLM use in US federal grant proposals is associated with lower distinctiveness from recent awards and with higher success rates at NIH, but not at NSF.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across both private submissions and public awards, higher LLM involvement is consistently associated with lower semantic distinctiveness, positioning projects closer to recently funded work within the same agency. LLM use is positively associated with proposal success and higher subsequent publication output at NIH, whereas no comparable associations are observed at NSF. The productivity gains at NIH concentrate in non-hit papers rather than the most highly cited work.
What carries the argument
Automated detection of LLM involvement in proposal text paired with a measure of semantic distinctiveness that places each proposal relative to the agency’s recently funded portfolio.
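A minimal sketch of such a distinctiveness score follows. It assumes each proposal and each recently funded award has already been embedded as a vector; the passage quoted in the ledger names SPECTER2 embeddings, average cosine distance, and within-year percentiles. The function names, the toy 2-d vectors, and the percentile convention (share of same-year proposals with a strictly lower score) are illustrative assumptions, not the paper's code.

```python
import math
from statistics import mean


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def distinctiveness(proposal_vec, recent_award_vecs):
    """Average cosine distance from one proposal to the agency's recent awards."""
    return mean(cosine_distance(proposal_vec, v) for v in recent_award_vecs)


def within_year_percentiles(scores, years):
    """Percent of same-year proposals whose score is strictly lower (assumed convention)."""
    result = {}
    for pid, score in scores.items():
        peers = [s for other, s in scores.items() if years[other] == years[pid]]
        below = sum(1 for s in peers if s < score)
        result[pid] = 100.0 * below / len(peers)
    return result


# Toy example: two recent awards and two new proposals, all from 2023.
recent_awards = [[1.0, 0.0], [0.8, 0.6]]
scores = {
    "A": distinctiveness([1.0, 0.0], recent_awards),  # close to existing work (~0.1)
    "B": distinctiveness([0.0, 1.0], recent_awards),  # farther from existing work (~0.7)
}
print(within_year_percentiles(scores, {"A": 2023, "B": 2023}))  # {'A': 0.0, 'B': 50.0}
```

Converting raw distances to within-year percentiles, as the quoted passage describes, keeps scores comparable across years whose award portfolios differ in size and spread.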
If this is right
- Proposals with more LLM content cluster around existing research lines inside each agency.
- At NIH, LLM-assisted proposals show higher funding rates and generate more follow-on papers.
- The extra papers produced at NIH from LLM use are concentrated among average rather than top-cited work.
- The shift may narrow the range of ideas that enter the federal research portfolio over time.
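To make the hit/non-hit distinction concrete, here is a hypothetical sketch that labels a grant's papers as hits when their citation count reaches the top share of a comparison cohort. The 5% default cutoff, the cohort definition, and the rule that ties count as hits are all assumptions; this page does not say how the paper defines a hit.

```python
import math


def hit_threshold(cohort_citations, top_share=0.05):
    """Citation count needed to land in the top `top_share` of a comparison cohort."""
    ranked = sorted(cohort_citations, reverse=True)
    k = max(1, math.ceil(top_share * len(ranked)))  # size of the top slice
    return ranked[k - 1]


def split_hits(grant_paper_citations, cohort_citations, top_share=0.05):
    """Return (hits, non_hits); papers tied with the threshold count as hits."""
    threshold = hit_threshold(cohort_citations, top_share)
    hits = sum(1 for c in grant_paper_citations if c >= threshold)
    return hits, len(grant_paper_citations) - hits


# Hypothetical cohort of 20 papers cited 0..19 times; top 25% means >= 15 citations.
cohort = list(range(20))
print(split_hits([3, 15, 18, 7], cohort, top_share=0.25))  # (2, 2)
```

Comparing how a grant's additional papers fall into these two bins is one way to read the claim that the NIH gains sit in non-hit work.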
Where Pith is reading between the lines
- Agencies could develop new review criteria focused on originality if LLM homogenization continues.
- NSF researchers might need different guidance on LLM use than NIH researchers to capture any benefits.
- Tracking whether distinctiveness continues to decline would test whether the pattern persists beyond 2023.
- Extending the same analysis to other federal agencies would show whether the NIH-only productivity link is field-specific.
Load-bearing premise
The automated measure of LLM involvement accurately reflects real use in proposal writing, and the observed links to success and output are not mainly driven by differences in topic, researcher experience, or overall proposal quality.
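The passage quoted in the ledger says the paper leverages the detection method of Liang et al. to estimate the fraction α of LLM-modified sentences, rather than labelling individual sentences one by one. A minimal sketch of that style of estimator, assuming the per-sentence likelihoods under a human-written reference distribution P and an LLM-generated reference distribution Q are already available: the text is modelled as the mixture (1 - α)P + αQ, and α is chosen to maximise the log-likelihood. How P and Q are estimated from reference corpora is outside this sketch, and the function names are hypothetical.

```python
def log_likelihood_slope(alpha, p, q):
    """Derivative of sum_i log((1 - alpha) * p_i + alpha * q_i) with respect to alpha."""
    return sum((qi - pi) / ((1 - alpha) * pi + alpha * qi) for pi, qi in zip(p, q))


def estimate_alpha(p, q, tol=1e-9):
    """Maximum-likelihood estimate of the LLM-modified fraction alpha in [0, 1].

    p[i] and q[i] are the (positive) likelihoods of sentence i under the
    human-written distribution P and the LLM-generated distribution Q.
    The log-likelihood is concave in alpha, so its derivative is non-increasing
    and bisection on the derivative finds the maximiser.
    """
    if log_likelihood_slope(0.0, p, q) <= 0:  # likelihood already falls at alpha = 0
        return 0.0
    if log_likelihood_slope(1.0, p, q) >= 0:  # likelihood still rises at alpha = 1
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if log_likelihood_slope(mid, p, q) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Toy corpus with two sentence types: type A is likelier under P (0.8 vs 0.2),
# type B under Q. With 13 A-type and 7 B-type sentences the MLE is alpha = 0.25.
p = [0.8] * 13 + [0.2] * 7
q = [0.2] * 13 + [0.8] * 7
print(round(estimate_alpha(p, q), 4))  # 0.25
```

Bisection is used only because the derivative is monotone in α; any bounded one-dimensional optimiser would give the same estimate.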
What would settle it
If re-running the analysis with controls for proposal topic, researcher publication history, and proposal length eliminated the positive association between detected LLM use and NIH funding success, the central claim would be falsified.
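The falsification test above amounts to asking whether the LLM coefficient survives fixed effects. A minimal sketch with a single regressor and one set of group fixed effects, on toy numbers: by the Frisch–Waugh–Lovell result, the fixed-effects slope equals the OLS slope after demeaning both variables within each group. With a 0/1 funding outcome this is a linear probability model. The data, the field labels, and the single fixed effect are hypothetical; the ledger passage describes investigator, field, and year fixed effects, which this one-way sketch does not reproduce.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope of y on x from OLS with an intercept and no other controls."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def group_means(values, groups):
    """Mean of `values` within each group label."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}


def within_slope(x, y, groups):
    """Slope of y on x with group fixed effects, via within-group demeaning."""
    mx, my = group_means(x, groups), group_means(y, groups)
    xd = [a - mx[g] for a, g in zip(x, groups)]
    yd = [b - my[g] for b, g in zip(y, groups)]
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)


# Toy data: LLM share (x) and funding outcome (y, 1 = funded) in two fields.
# Field "F2" uses LLMs more and is funded more, but within each field the
# outcome does not vary with LLM share.
x = [0.0, 0.2, 0.6, 0.8]
y = [0, 0, 1, 1]
fields = ["F1", "F1", "F2", "F2"]
print(round(ols_slope(x, y), 3))             # 1.5: pooled association
print(round(within_slope(x, y, fields), 3))  # 0.0: vanishes with field fixed effects
```

In this toy case the pooled association is driven entirely by differences between fields, which is the kind of confounding the test is designed to detect.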
Original abstract
Federal research funding shapes the direction, diversity, and impact of the US scientific enterprise. Large language models (LLMs) are rapidly diffusing into scientific practice, holding substantial promise while raising widespread concerns. Despite growing attention to AI use in scientific writing and evaluation, little is known about how the rise of LLMs is reshaping the public funding landscape. Here, we examine LLM involvement at key stages of the federal funding pipeline by combining two complementary data sources: confidential National Science Foundation (NSF) and National Institutes of Health (NIH) proposal submissions from two large US R1 universities, including funded, unfunded, and pending proposals, and the full population of publicly released NSF and NIH awards. We find that LLM use rises sharply beginning in 2023 and exhibits a bimodal distribution, indicating a clear split between minimal and substantive use. Across both private submissions and public awards, higher LLM involvement is consistently associated with lower semantic distinctiveness, positioning projects closer to recently funded work within the same agency. The consequences of this shift are agency-dependent. LLM use is positively associated with proposal success and higher subsequent publication output at NIH, whereas no comparable associations are observed at NSF. Notably, the productivity gains at NIH are concentrated in non-hit papers rather than the most highly cited work. Together, these findings provide large-scale evidence that the rise of LLMs is reshaping how scientific ideas are positioned, selected, and translated into publicly funded research, with implications for portfolio governance, research diversity, and the long-run impact of science.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper examines the diffusion of large language models into US federal research funding by analyzing confidential proposal submissions from two R1 universities (funded, unfunded, and pending) alongside the full population of public NSF and NIH awards. It reports a sharp rise in LLM involvement beginning in 2023 with a bimodal distribution, consistent associations between higher LLM use and lower semantic distinctiveness (positioning proposals closer to recently funded work), and agency-specific downstream effects: positive associations with proposal success and subsequent publication output at NIH (concentrated in non-hit papers) but no comparable associations at NSF.
Significance. If the central measurement of LLM involvement proves valid, the study supplies large-scale observational evidence that LLMs are altering how research ideas are positioned and selected in the federal funding pipeline, with implications for research diversity, portfolio governance, and long-run scientific impact. The NIH–NSF contrast offers a concrete basis for agency-specific policy discussion.
major comments (2)
- [Methods] Methods section: The automated detection of LLM involvement is described only at a high level. No details are supplied on the classifier architecture, training data, decision thresholds, human validation metrics (precision/recall/F1), or tests distinguishing substantive content generation from low-perplexity or formulaic writing. Because every reported association (distinctiveness, success, output) rests on this measure, the absence of these diagnostics prevents evaluation of whether the detector captures LLM use or proxies for proposal quality or topic conventionality.
- [Results] Results section (associations with success and output): The manuscript reports positive associations at NIH but provides no information on sample sizes, regression specifications, controls for confounders (topic fixed effects, PI experience, proposal length, prior funding), or robustness checks (alternative specifications, subsample analyses). Without these, it is impossible to determine whether the NIH-specific effects are attributable to LLM use or to unmeasured selection.
minor comments (1)
- [Abstract] Abstract: The time window and exact number of proposals/awards analyzed are not stated, making it difficult for readers to gauge the scale and recency of the data.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments, which identify key areas where additional detail will improve the transparency and interpretability of our findings. We address each major comment below and have prepared revisions to incorporate the requested information.
Point-by-point responses
-
Referee: [Methods] Methods section: The automated detection of LLM involvement is described only at a high level. No details are supplied on the classifier architecture, training data, decision thresholds, human validation metrics (precision/recall/F1), or tests distinguishing substantive content generation from low-perplexity or formulaic writing. Because every reported association (distinctiveness, success, output) rests on this measure, the absence of these diagnostics prevents evaluation of whether the detector captures LLM use or proxies for proposal quality or topic conventionality.
Authors: We agree that the methods section would benefit from greater specificity on the LLM detection procedure. The original manuscript intentionally kept this description concise to focus on the substantive results, but we recognize this limits evaluation of the measure. In the revised manuscript we will add a new subsection that specifies the classifier architecture (a fine-tuned RoBERTa model), the composition of the training data (a hand-annotated corpus drawn from the same university proposal pool), the probability threshold used for classification, and the human validation metrics obtained on a held-out test set. We will also report auxiliary checks that compare perplexity and embedding distances between high- and low-scoring proposals to help distinguish substantive LLM assistance from formulaic or low-perplexity writing. These additions directly address the referee’s concern about the validity of the central measure. revision: yes
-
Referee: [Results] Results section (associations with success and output): The manuscript reports positive associations at NIH but provides no information on sample sizes, regression specifications, controls for confounders (topic fixed effects, PI experience, proposal length, prior funding), or robustness checks (alternative specifications, subsample analyses). Without these, it is impossible to determine whether the NIH-specific effects are attributable to LLM use or to unmeasured selection.
Authors: We acknowledge that the results section as currently written omits several standard reporting elements that would allow readers to assess the regression analyses. In the revision we will expand the relevant tables and text to report exact sample sizes for each agency-specific analysis, the full set of covariates (including topic fixed effects, PI career stage, proposal length, and prior funding history), and the complete regression specifications. We will also add a dedicated robustness subsection that presents alternative specifications, subsample results, and checks for selection on observables. These changes will make it possible to evaluate whether the reported NIH associations are robust to the confounders the referee identifies. revision: yes
Circularity Check
No circularity: purely observational empirical analysis
Full rationale
The paper reports statistical associations between LLM involvement (detected via automated methods) and outcomes like semantic distinctiveness, proposal success, and publication output using external proposal and award data. No derivations, equations, fitted parameters renamed as predictions, or self-citations are invoked to justify central claims. The analysis relies on direct measurement from data sources rather than any self-referential construction, satisfying the criteria for a self-contained empirical study with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
unclear: relation between the paper passage and the cited Recognition theorem.
We leverage an established detection method proposed by Liang et al. ... estimate the fraction of LLM-modified sentences (α) ... SPECTER2 embeddings ... average cosine distance ... within-year percentiles ... OLS regressions with investigator, field, and year fixed effects
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
unclear: relation between the paper passage and the cited Recognition theorem.
higher LLM involvement is consistently associated with lower semantic distinctiveness ... agency-dependent effects on proposal success and publication output
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Bush, V. Science, the endless frontier: A report to the president on a program for postwar scientific research (1945)
- [2] Stephan, P. How Economics Shapes Science (Harvard University Press, 2012)
- [3] Li, D., Azoulay, P. & Sampat, B. N. The applied value of public investments in biomedical research. Science 356, 78–81 (2017)
- [4] Galkina Cleary, E., Beierlein, J. M., Khanuja, N. S., McNamee, L. M. & Ledley, F. D. Contribution of NIH funding to new drug approvals 2010–2016. Proceedings of the National Academy of Sciences 115, 2329–2334 (2018)
- [5]
- [6]
- [7] Yin, Y., Dong, Y., Wang, K., Wang, D. & Jones, B. F. Public use and public funding of science. Nature Human Behaviour 6, 1344–1350 (2022)
- [8] Azoulay, P., Clancy, M., Li, D. & Sampat, B. N. What if NIH had been 40% smaller? Science 389, 1303–1305 (2025)
- [9] Furnas, A. C., Fishman, N., Rosenstiel, L. & Wang, D. Partisan disparities in the funding of science in the United States. Science 389, 1195–1200 (2025)
- [10]
- [11] Yin, Y., Wang, Y., Evans, J. A. & Wang, D. Quantifying the dynamics of failure across science, startups and security. Nature 575, 190–194 (2019)
- [12] Wang, Y., Jones, B. F. & Wang, D. Early-career setback and future career impact. Nature Communications 10, 4331 (2019)
- [13]
- [14] Peng, H., Qiu, H. S., Fosse, H. B. & Uzzi, B. Promotional language and the adoption of innovative ideas in science. Proceedings of the National Academy of Sciences 121, e2320066121 (2024)
- [15] Liang, W. et al. Quantifying large language model usage in scientific papers. Nature Human Behaviour 1–11 (2025)
- [16] Kusumegi, K. et al. Scientific production in the era of large language models. Science 390, 1240–1243 (2025)
- [17] Liang, W. et al. Monitoring AI-modified content at scale: A case study on the impact of ChatGPT on AI conference peer reviews. In International Conference on Machine Learning (ICML) (2024)
- [18] Liang, W. et al. The widespread adoption of large language model-assisted writing across society. Patterns (2025)
- [19] Kobak, D., González-Márquez, R., Horvát, E.-Á. & Lause, J. Delving into LLM-assisted writing in biomedical publications through excess vocabulary. Science Advances 11, eadt3813 (2025)
- [20]
- [21] Bao, H., Sun, M. & Teplitskiy, M. Where there’s a will there’s a way: ChatGPT is used more for science in countries where it is prohibited. Quantitative Science Studies 1–16 (2025)
- [22] Wang, H. et al. Scientific discovery in the age of artificial intelligence. Nature 620, 47–60 (2023)
- [23]
- [24]
- [25] Swanson, K., Wu, W., Bulaong, N. L., Pak, J. E. & Zou, J. The virtual lab of AI agents designs new SARS-CoV-2 nanobodies. Nature 646, 716–723 (2025)
- [26] Hao, Q., Xu, F., Li, Y. & Evans, J. Artificial intelligence tools expand scientists’ impact but contract science’s focus. Nature 1–7 (2026)
- [27] Jones, B. F. The burden of knowledge and the “death of the renaissance man”: Is innovation getting harder? The Review of Economic Studies 76, 283–317 (2009)
- [28] Hill, R. et al. The pivot penalty in research. Nature 1–8 (2025)
- [29] Liu, L., Dehmamy, N., Chown, J., Giles, C. L. & Wang, D. Understanding the onset of hot streaks across artistic, cultural, and scientific careers. Nature Communications 12, 5392 (2021)
- [30] Tripodi, G. et al. Tenure and research trajectories. Proceedings of the National Academy of Sciences 122, e2500322122 (2025)
- [31] Shao, E. et al. SciSciGPT: advancing human–AI collaboration in the science of science. Nature Computational Science 1–15 (2025)
- [32] Bail, C. A. Can generative AI improve social science? Proceedings of the National Academy of Sciences 121, e2314021121 (2024)
- [33] Musslick, S. et al. Automating the practice of science: Opportunities, challenges, and implications. Proceedings of the National Academy of Sciences 122, e2401238121 (2025)
- [34] Doshi, A. R. & Hauser, O. P. Generative AI enhances individual creativity but reduces the collective diversity of novel content. Science Advances 10, eadn5290 (2024)
- [35] March, J. G. Exploration and exploitation in organizational learning. Organization Science 2, 71–87 (1991)
- [36] Jones, B. Artificial intelligence in research and development. National Bureau of Economic Research (2025)
- [37] Scharfmann, E., Marx, M. & Fleming, L. Pasteur’s quadrant researchers bring novelty, impact to publishing, and patenting. Science 390, 891–893 (2025)
- [38] Singh, A., D’Arcy, M., Cohan, A., Downey, D. & Feldman, S. SciRepEval: A multi-format benchmark for scientific document representations. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 5548–5566 (2023)
- [39] Herzog, C., Hook, D. & Konkiel, S. Dimensions: Bringing down barriers between scientometricians and data. Quantitative Science Studies 1, 387–395 (2020)
- [40] Singh, A., D’Arcy, M., Cohan, A., Downey, D. & Feldman, S. SciRepEval: A multi-format benchmark for scientific document representations. In Conference on Empirical Methods in Natural Language Processing (2022)
- [41] Flesch, R. A new readability yardstick. Journal of Applied Psychology 32, 221 (1948)
- [42] Millar, N., Batalo, B. & Budgell, B. Trends in the use of promotional language (hype) in abstracts of successful National Institutes of Health grant applications, 1985–2020. JAMA Network Open 5, e2228676 (2022)