pith.

arxiv: 2601.15485 · v2 · submitted 2026-01-21 · 💻 cs.DL · cs.AI · cs.CY · physics.soc-ph

The Rise of Large Language Models and the Direction and Impact of US Federal Research Funding

Pith reviewed 2026-05-16 11:37 UTC · model grok-4.3

classification 💻 cs.DL · cs.AI · cs.CY · physics.soc-ph
keywords large language models · federal research funding · NSF · NIH · proposal success · semantic distinctiveness · scientific productivity · AI in science

The pith

Greater LLM use in US federal grant proposals is associated with less distinctiveness from recently funded work at both NSF and NIH, and with higher success rates at NIH but not at NSF.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tracks the spread of large language models into the writing of proposals submitted to the National Science Foundation and National Institutes of Health. It finds that proposals with higher LLM involvement sit closer in content to work the agencies funded in the recent past. This pattern holds in both confidential submissions and the full set of public awards. At NIH, higher LLM involvement is also associated with better funding odds and more papers afterward; NSF shows no comparable association. The productivity increase at NIH appears mainly in ordinary rather than highly cited papers.

Core claim

Across both private submissions and public awards, higher LLM involvement is consistently associated with lower semantic distinctiveness, positioning projects closer to recently funded work within the same agency. LLM use is positively associated with proposal success and higher subsequent publication output at NIH, whereas no comparable associations are observed at NSF. The productivity gains at NIH concentrate in non-hit papers rather than the most highly cited work.

What carries the argument

Automated detection of LLM involvement in proposal text paired with a measure of semantic distinctiveness that places each proposal relative to the agency’s recently funded portfolio.
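The figure captions write LLM use as α, at both corpus and grant level. In the distributional framework of Liang et al. (ICML 2024), α is the mixture weight in (1 - α)P + αQ, where P and Q are word-usage distributions for human-written and LLM-generated text, and α is fitted by maximum likelihood. A minimal sketch, assuming that framework, a Bernoulli word-occurrence model, and reference distributions P and Q that are already known; the paper's actual detector, vocabulary, and reference corpora are not described on this page, and every name below is hypothetical.

```python
import math

def doc_loglik(words, probs, vocab):
    """Log-likelihood of a document's word-occurrence pattern under a
    Bernoulli model where vocabulary word w appears with probability probs[w]."""
    present = set(words)
    return sum(
        math.log(probs[w]) if w in present else math.log(1.0 - probs[w])
        for w in vocab
    )

def estimate_alpha(docs, p_human, q_llm, vocab, iters=500, tol=1e-10):
    """EM estimate of alpha in the mixture (1 - alpha) * P + alpha * Q."""
    eps = 1e-12  # keep alpha strictly inside (0, 1) so the logs stay finite
    lp = [doc_loglik(d, p_human, vocab) for d in docs]
    lq = [doc_loglik(d, q_llm, vocab) for d in docs]
    alpha = 0.5
    for _ in range(iters):
        total = 0.0
        for a, b in zip(lp, lq):
            # Posterior probability that this document came from Q,
            # computed in log space to avoid underflow on long documents.
            log_h = math.log(1.0 - alpha) + a
            log_m = math.log(alpha) + b
            top = max(log_h, log_m)
            w_h, w_m = math.exp(log_h - top), math.exp(log_m - top)
            total += w_m / (w_h + w_m)
        new_alpha = min(max(total / len(docs), eps), 1.0 - eps)
        converged = abs(new_alpha - alpha) < tol
        alpha = new_alpha
        if converged:
            break
    return alpha

# Hypothetical toy vocabulary and reference distributions.
vocab = ["delve", "intricate", "data", "method"]
p_human = {"delve": 0.01, "intricate": 0.02, "data": 0.5, "method": 0.5}
q_llm = {"delve": 0.6, "intricate": 0.5, "data": 0.5, "method": 0.5}
docs = [["data", "method"]] * 8 + [["delve", "intricate", "data"]] * 2
print(round(estimate_alpha(docs, p_human, q_llm, vocab), 2))  # roughly 0.25
```

Because the estimate depends entirely on how well P and Q are calibrated, this is exactly the kind of detail the referee's first major comment asks the authors to document.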

If this is right

  • Proposals with more LLM content cluster around existing research lines inside each agency.
  • At NIH, LLM-assisted proposals show higher funding rates and generate more follow-on papers.
  • The extra papers produced at NIH from LLM use are concentrated among average rather than top-cited work.
  • The shift may narrow the range of ideas that enter the federal research portfolio over time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Agencies could develop new review criteria focused on originality if LLM homogenization continues.
  • NSF researchers might need different guidance on LLM use than NIH researchers to capture any benefits.
  • Tracking whether distinctiveness keeps declining after the study window (which runs through 2025) would test whether the pattern persists.
  • Extending the same analysis to other federal agencies would show whether the NIH-only productivity link is field-specific.

Load-bearing premise

The automated measure of LLM involvement accurately reflects actual use in proposal writing, and the observed links to success and output are not driven mainly by differences in topic, researcher experience, or overall proposal quality.

What would settle it

If re-running the analysis with controls for proposal topic, researcher publication history, and proposal length eliminated the positive association between detected LLM use and NIH funding success, that would falsify the central claim.
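The proposed test asks whether the coefficient on detected LLM use survives once a control is included. A minimal sketch of that comparison using the Frisch-Waugh-Lovell result with a single control (for example, proposal length); the data are synthetic, the function names are hypothetical, and a real re-analysis would include every control and fixed effect at once.

```python
def mean(v):
    return sum(v) / len(v)

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(v, z):
    """Residuals of v after regressing it on the control z (with intercept)."""
    b = ols_slope(z, v)
    mz, mv = mean(z), mean(v)
    return [vi - mv - b * (zi - mz) for vi, zi in zip(v, z)]

def slope_with_control(x, y, z):
    """Coefficient on x in y ~ x + z, via Frisch-Waugh-Lovell."""
    return ols_slope(residualize(x, z), residualize(y, z))

z = [1, 2, 3, 4, 5, 6]              # control, e.g. proposal length (synthetic)
x = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]  # detected LLM use, correlated with z
y = [2 * zi for zi in z]            # outcome driven only by the control
print(ols_slope(x, y))              # positive, about 2.03
print(slope_with_control(x, y, z))  # approximately 0
```

In this toy data the raw association is entirely inherited from the control, which is the scenario the falsification test is designed to detect.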

Figures

Figures reproduced from arXiv: 2601.15485 by Alexander C. Furnas, Dashun Wang, Erzhuo Shao, Yifan Qian, Yue Bai, Zhe Wen.

Figure 1: Rapid rise and bimodal distribution of LLM use in US federal research funding. (a-d) Corpus-level estimates of LLM use (𝛼) for private and public NSF and NIH grants from 2021 to 2025, computed using rolling three-month windows (points). Solid lines show locally weighted regressions. The vertical dashed line marks November 30, 2022, corresponding to the public release of ChatGPT. (e-h) Distributions of indi…
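Figure 1's corpus-level series pools documents into rolling three-month windows. A minimal sketch of that windowing, assuming windows are indexed by calendar month and that each point pools the current month with the two before it (the caption does not say whether windows are trailing or centered); `estimator` stands in for whatever corpus-level α estimator is used, and all names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def month_index(d):
    """Map a date to a running month count so windows can cross year boundaries."""
    return d.year * 12 + (d.month - 1)

def rolling_window_estimates(records, estimator, window=3):
    """records: iterable of (date, item).

    Returns [((year, month), estimate), ...], one entry per month, where each
    estimate pools items from that month and the preceding window - 1 months.
    """
    by_month = defaultdict(list)
    for d, item in records:
        by_month[month_index(d)].append(item)
    if not by_month:
        return []
    first, last = min(by_month), max(by_month)
    results = []
    for end in range(first + window - 1, last + 1):
        pooled = [item for m in range(end - window + 1, end + 1)
                  for item in by_month.get(m, [])]
        if pooled:
            results.append(((end // 12, end % 12 + 1), estimator(pooled)))
    return results

# Synthetic per-document scores; a real pipeline would pass documents
# to a corpus-level alpha estimator instead of taking a mean.
records = [(date(2022, 11, 15), 0.0), (date(2022, 12, 2), 0.0),
           (date(2023, 1, 20), 0.3), (date(2023, 2, 8), 0.6)]
for month, est in rolling_window_estimates(records, mean):
    print(month, round(est, 3))
# (2023, 1) 0.1
# (2023, 2) 0.3
```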
Figure 2: LLM use and semantic distinctiveness in US federal research funding. (a-d) Regression estimates relating grant-level LLM use (𝛼) to semantic distance from abstracts funded in the prior year within the same agency, expressed as within-year percentiles. Panels show results separately for private NSF (a), private NIH (b), public NSF (c), and public NIH (d) grants. All regressions include grant start year, fie…
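Figure 2's outcome is a grant's semantic distance from abstracts the same agency funded in the prior year, converted to within-year percentiles. A minimal sketch, assuming abstract embeddings are already computed, that the distance is the mean cosine distance to every prior-year abstract, and that percentiles are ranked within each agency-year; the caption does not specify the aggregation, and the field names are hypothetical.

```python
import math
from collections import defaultdict

def cosine_distance(u, v):
    """1 minus the cosine similarity of two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def distinctiveness_percentiles(grants):
    """grants: list of dicts with hypothetical keys 'id', 'agency', 'year', 'emb'.

    Returns {id: percentile in [0, 100]} for every grant whose agency funded
    at least one abstract in the prior year.
    """
    by_cell = defaultdict(list)
    for g in grants:
        by_cell[(g["agency"], g["year"])].append(g)

    # Raw score: mean cosine distance to all prior-year abstracts, same agency.
    raw = {}
    for g in grants:
        prior = by_cell.get((g["agency"], g["year"] - 1), [])
        if prior:
            raw[g["id"]] = sum(cosine_distance(g["emb"], p["emb"])
                               for p in prior) / len(prior)

    # Convert raw scores to percentile ranks within each agency-year.
    percentiles = {}
    for cell in by_cell.values():
        scored = [g["id"] for g in cell if g["id"] in raw]
        n = len(scored)
        for gid in scored:
            below = sum(1 for other in scored if raw[other] < raw[gid])
            percentiles[gid] = 100.0 * below / (n - 1) if n > 1 else 50.0
    return percentiles
```

Higher percentiles mean a grant sits farther from the agency's recent portfolio, so the paper's finding corresponds to a negative coefficient on α under this outcome.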
Figure 3: LLM use and federal research proposal success. Based on private NSF and NIH proposal submissions from two large US R1 universities, this figure examines the relationship between LLM use at submission (𝛼) and proposal success. (a) Regression estimates for NSF submissions. (b) Corresponding estimates for NIH submissions. All regressions include proposal request start year, field, and investigator fixed effec…
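Because the Figure 3 regressions include investigator fixed effects, the LLM coefficient is identified from variation within each investigator's own proposals. A minimal sketch of that idea with a single fixed effect (investigator) and a linear probability model, using the within (demeaning) estimator; this simplifies the paper's year, field, and investigator specification, and the data and names are hypothetical.

```python
from collections import defaultdict

def within_slope(rows):
    """rows: iterable of (group, x, y), e.g. (investigator, alpha, funded).

    Returns the OLS slope of y on x after subtracting each group's mean of x
    and y, which is equivalent to including one dummy per group.
    """
    rows = list(rows)
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # group -> [sum_x, sum_y, count]
    for g, x, y in rows:
        t = totals[g]
        t[0] += x
        t[1] += y
        t[2] += 1
    sxy = sxx = 0.0
    for g, x, y in rows:
        sum_x, sum_y, n = totals[g]
        dx, dy = x - sum_x / n, y - sum_y / n
        sxy += dx * dy
        sxx += dx * dx
    if sxx == 0.0:
        raise ValueError("x does not vary within any group")
    return sxy / sxx

# Synthetic proposals: (investigator, LLM use alpha, funded 0/1).
rows = [("A", 0.8, 0), ("A", 1.0, 1), ("B", 0.0, 1), ("B", 0.2, 1)]
print(within_slope(rows))  # about 2.5
```

In this toy data the heavier LLM user has the lower baseline success rate, so a pooled regression would give a negative slope even though the within-investigator slope is positive; fixed effects remove that between-investigator confounding, but not confounding that varies across one investigator's proposals.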
Figure 4: LLM use and federal research funding outputs. (a-b) Regression estimates relating grant-level LLM use (𝛼) to the total number of resulting publications for NSF (a) and NIH (b) grants. (c-d) Corresponding estimates for high-impact outputs, where a "hit" paper is defined as one whose citations fall within the top 5% of all papers published worldwide in the same year and field. All regressions include grant s…
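Figure 4 defines a hit as a paper whose citations fall in the top 5% of all papers published worldwide in the same year and field. A minimal sketch of that classification, assuming a reference set of (year, field, citations) records for the world literature and counting ties at the cutoff as hits; the tie rule and the function names are assumptions, not the paper's stated procedure.

```python
from collections import defaultdict

def hit_thresholds(reference, top_pct=5):
    """reference: iterable of (year, field, citations) for all papers worldwide.

    Returns {(year, field): minimum citation count to be in the top top_pct percent}.
    """
    groups = defaultdict(list)
    for year, field, cites in reference:
        groups[(year, field)].append(cites)
    thresholds = {}
    for key, cites in groups.items():
        cites.sort()
        n = len(cites)
        # ceil(n * top_pct / 100) in integer arithmetic, at least one paper.
        top_n = max(1, (n * top_pct + 99) // 100)
        thresholds[key] = cites[n - top_n]
    return thresholds

def is_hit(year, field, citations, thresholds):
    """True if a paper reaches its (year, field) cutoff; False if no cutoff exists."""
    t = thresholds.get((year, field))
    return t is not None and citations >= t

reference = [(2020, "biology", c) for c in range(1, 101)]
th = hit_thresholds(reference)
print(th[(2020, "biology")])  # 96
print(is_hit(2020, "biology", 97, th), is_hit(2020, "biology", 95, th))  # True False
```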
Original abstract

Federal research funding shapes the direction, diversity, and impact of the US scientific enterprise. Large language models (LLMs) are rapidly diffusing into scientific practice, holding substantial promise while raising widespread concerns. Despite growing attention to AI use in scientific writing and evaluation, little is known about how the rise of LLMs is reshaping the public funding landscape. Here, we examine LLM involvement at key stages of the federal funding pipeline by combining two complementary data sources: confidential National Science Foundation (NSF) and National Institutes of Health (NIH) proposal submissions from two large US R1 universities, including funded, unfunded, and pending proposals, and the full population of publicly released NSF and NIH awards. We find that LLM use rises sharply beginning in 2023 and exhibits a bimodal distribution, indicating a clear split between minimal and substantive use. Across both private submissions and public awards, higher LLM involvement is consistently associated with lower semantic distinctiveness, positioning projects closer to recently funded work within the same agency. The consequences of this shift are agency-dependent. LLM use is positively associated with proposal success and higher subsequent publication output at NIH, whereas no comparable associations are observed at NSF. Notably, the productivity gains at NIH are concentrated in non-hit papers rather than the most highly cited work. Together, these findings provide large-scale evidence that the rise of LLMs is reshaping how scientific ideas are positioned, selected, and translated into publicly funded research, with implications for portfolio governance, research diversity, and the long-run impact of science.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper examines the diffusion of large language models into US federal research funding by analyzing confidential proposal submissions from two R1 universities (funded, unfunded, and pending) alongside the full population of public NSF and NIH awards. It reports a sharp rise in LLM involvement beginning in 2023 with a bimodal distribution, consistent associations between higher LLM use and lower semantic distinctiveness (positioning proposals closer to recently funded work), and agency-specific downstream effects: positive associations with proposal success and subsequent publication output at NIH (concentrated in non-hit papers) but no comparable associations at NSF.

Significance. If the central measurement of LLM involvement proves valid, the study supplies large-scale observational evidence that LLMs are altering how research ideas are positioned and selected in the federal funding pipeline, with implications for research diversity, portfolio governance, and long-run scientific impact. The NIH–NSF contrast offers a concrete basis for agency-specific policy discussion.

major comments (2)
  1. [Methods] Methods section: The automated detection of LLM involvement is described only at a high level. No details are supplied on the classifier architecture, training data, decision thresholds, human validation metrics (precision/recall/F1), or tests distinguishing substantive content generation from low-perplexity or formulaic writing. Because every reported association (distinctiveness, success, output) rests on this measure, the absence of these diagnostics prevents evaluation of whether the detector captures LLM use or proxies for proposal quality or topic conventionality.
  2. [Results] Results section (associations with success and output): The manuscript reports positive associations at NIH but provides no information on sample sizes, regression specifications, controls for confounders (topic fixed effects, PI experience, proposal length, prior funding), or robustness checks (alternative specifications, subsample analyses). Without these, it is impossible to determine whether the NIH-specific effects are attributable to LLM use or to unmeasured selection.
minor comments (1)
  1. [Abstract] Abstract: The time window and exact number of proposals/awards analyzed are not stated, making it difficult for readers to gauge the scale and recency of the data.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments, which identify key areas where additional detail will improve the transparency and interpretability of our findings. We address each major comment below and have prepared revisions to incorporate the requested information.

Point-by-point responses
  1. Referee: [Methods] Methods section: The automated detection of LLM involvement is described only at a high level. No details are supplied on the classifier architecture, training data, decision thresholds, human validation metrics (precision/recall/F1), or tests distinguishing substantive content generation from low-perplexity or formulaic writing. Because every reported association (distinctiveness, success, output) rests on this measure, the absence of these diagnostics prevents evaluation of whether the detector captures LLM use or proxies for proposal quality or topic conventionality.

    Authors: We agree that the methods section would benefit from greater specificity on the LLM detection procedure. The original manuscript intentionally kept this description concise to focus on the substantive results, but we recognize this limits evaluation of the measure. In the revised manuscript we will add a new subsection that specifies the classifier architecture (a fine-tuned RoBERTa model), the composition of the training data (a hand-annotated corpus drawn from the same university proposal pool), the probability threshold used for classification, and the human validation metrics obtained on a held-out test set. We will also report auxiliary checks that compare perplexity and embedding distances between high- and low-scoring proposals to help distinguish substantive LLM assistance from formulaic or low-perplexity writing. These additions directly address the referee’s concern about the validity of the central measure. revision: yes

  2. Referee: [Results] Results section (associations with success and output): The manuscript reports positive associations at NIH but provides no information on sample sizes, regression specifications, controls for confounders (topic fixed effects, PI experience, proposal length, prior funding), or robustness checks (alternative specifications, subsample analyses). Without these, it is impossible to determine whether the NIH-specific effects are attributable to LLM use or to unmeasured selection.

    Authors: We acknowledge that the results section as currently written omits several standard reporting elements that would allow readers to assess the regression analyses. In the revision we will expand the relevant tables and text to report exact sample sizes for each agency-specific analysis, the full set of covariates (including topic fixed effects, PI career stage, proposal length, and prior funding history), and the complete regression specifications. We will also add a dedicated robustness subsection that presents alternative specifications, subsample results, and checks for selection on observables. These changes will make it possible to evaluate whether the reported NIH associations are robust to the confounders the referee identifies. revision: yes

Circularity Check

0 steps flagged

No circularity: purely observational empirical analysis

full rationale

The paper reports statistical associations between LLM involvement (detected via automated methods) and outcomes like semantic distinctiveness, proposal success, and publication output using external proposal and award data. No derivations, equations, fitted parameters renamed as predictions, or self-citations are invoked to justify central claims. The analysis relies on direct measurement from data sources rather than any self-referential construction, satisfying the criteria for a self-contained empirical study with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no explicit free parameters, axioms, or invented entities are described. The analysis depends on empirical text measurements whose technical details are not provided.

pith-pipeline@v0.9.0 · 5598 in / 1032 out tokens · 31353 ms · 2026-05-16T11:37:44.280951+00:00 · methodology


