arxiv: 2508.06445 · v3 · submitted 2025-08-08 · 💻 cs.CL · cs.AI

Echoes of Automation: The Increasing Use of LLMs in Newsmaking

Pith reviewed 2026-05-18 23:55 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords generative AI · LLM detection · news media · AI in journalism · text analysis · local news · college media

The pith

The use of LLMs in news articles has increased substantially in recent years, especially in local and college media.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines over 40,000 news articles from major, local, and college outlets using three AI-text detectors to measure generative AI adoption in journalism. It reports a clear rise in LLM use over time, strongest in local and college news. The analysis also shows LLMs are often applied to introductions but not conclusions, while producing text that is richer in vocabulary and easier to read yet less formal and more uniform in style across outlets. A reader would care because the findings point to shifting authorship practices that affect how news is created and perceived.

Core claim

Applying the detectors Binoculars, Fast-Detect GPT, and GPTZero to a large corpus of news articles, the authors document a substantial increase in GenAI content in recent years, most pronounced in local and college news. LLMs are commonly used for introductions, while conclusions usually remain human-written. The resulting text gains word richness and readability but loses formality and stylistic variety.
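The introduction-versus-conclusion finding can be pictured by averaging per-sentence detector scores over contiguous segments of an article. The sketch below is illustrative only: the function name `segment_means`, the three-way split, and the example scores are assumptions, not the authors' procedure or data.

```python
from statistics import mean
from typing import List, Sequence


def segment_means(sentence_scores: Sequence[float], n_segments: int = 3) -> List[float]:
    """Split per-sentence AI scores into contiguous segments and average each one."""
    n = len(sentence_scores)
    if n_segments < 1 or n < n_segments:
        raise ValueError("need at least one sentence per segment")
    # Integer boundaries guarantee that every segment holds at least one sentence.
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    return [mean(sentence_scores[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]


# Hypothetical per-sentence scores (higher = more AI-like), invented for illustration.
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
intro, body, conclusion = segment_means(scores)
print(f"intro={intro:.2f} body={body:.2f} conclusion={conclusion:.2f}")
```

A pattern like the one reported would show up as a higher mean in the first segment than in the last.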

What carries the argument

Three AI-text detectors (Binoculars, Fast-Detect GPT, and GPTZero) applied to over 40,000 news articles from major, local, and college media to classify and analyze LLM-generated portions.
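Figure 3's caption counts content "detected by ≥2 models". A minimal sketch of such a majority rule follows, assuming each detector has already produced a boolean flag for a text. The dictionary keys and the function `flagged_by_majority` are hypothetical names, and how the authors combine detector outputs elsewhere in the paper is not stated here.

```python
from typing import Mapping

# Hypothetical keys for the three detectors named in the paper.
DETECTORS = ("binoculars", "fast_detect_gpt", "gptzero")


def flagged_by_majority(flags: Mapping[str, bool], min_votes: int = 2) -> bool:
    """Return True when at least `min_votes` detectors flag the text as AI-generated."""
    votes = sum(bool(flags[name]) for name in DETECTORS)
    return votes >= min_votes


# Invented example: two of the three detectors flag the article.
example = {"binoculars": True, "fast_detect_gpt": False, "gptzero": True}
print(flagged_by_majority(example))
```

Requiring agreement between detectors trades some recall for fewer false positives from any single tool, but it does not remove errors that all three share.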

If this is right

  • Journalistic integrity faces growing pressure as AI assistance becomes routine in news production.
  • Local and college news outlets integrate LLMs at higher rates than major national sources.
  • News writing styles are shifting toward greater readability paired with reduced formality and distinctiveness.
  • Human authors retain primary control over article conclusions even when AI drafts other sections.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Media organizations may need new policies on disclosing AI involvement to maintain audience trust.
  • Similar detection methods could be applied to other content areas such as opinion writing or reports.
  • Improvements in LLM capabilities could make current detectors less effective over time.

Load-bearing premise

The three AI-text detectors accurately distinguish LLM-generated news text from human-written text with low error rates across media formats and outlets.

What would settle it

A hand-checked sample of articles the detectors flag as AI-generated that turns out to consist mostly of human-written text would undermine the reported increase.

Figures

Figures reproduced from arXiv: 2508.06445 by Abolfazl Ansari, Delvin Ce Zhang, Dongwon Lee, Nafis Irtiza Tripto.

Figure 1: Different GenAI policy examples in newspapers.
Figure 2: Temporal trends in the number of AI-written opinion articles (2020–2024)
Figure 3: AI-generated content detected by ≥2 models increased from pre-GPT (2020–2022) to post-GPT (2023–2024), especially in local news and newspapers
Figure 4: Average AI-generated probability across segmented portions of AI-written
Original abstract

The rapid rise of Generative AI (GenAI), particularly LLMs, poses concerns for journalistic integrity and authorship. This study examines AI-generated content across over 40,000 news articles from major, local, and college news media, in various media formats. Using three advanced AI-text detectors (Binoculars, Fast-Detect GPT, and GPTZero), we find a substantial increase in GenAI use in recent years, especially in local and college news. Sentence-level analysis reveals that LLMs are often used in the introduction of news articles, while conclusions are usually written manually. Linguistic analysis shows that GenAI boosts word richness and readability but lowers formality, leading to more uniform writing styles, particularly in local media.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes a corpus of over 40,000 news articles from major, local, and college outlets using three off-the-shelf AI-text detectors (Binoculars, Fast-Detect GPT, GPTZero) to document a substantial rise in detected LLM-generated content in recent years, especially in local and college media. Sentence-level breakdowns indicate preferential LLM use in introductions versus manual conclusions, while linguistic comparisons show GenAI text increases word richness and readability but reduces formality and increases stylistic uniformity.

Significance. If the detector outputs can be shown to reliably separate LLM from human news prose, the scale of the corpus and multi-detector consistency would offer useful observational evidence on differential GenAI adoption across outlet types and on resulting changes in writing style. The work addresses a timely question at the intersection of NLP and journalism studies, but its interpretive weight is limited by the absence of domain-specific detector validation.

major comments (2)
  1. [Methods] Methods section: No domain-specific validation, calibration, or false-positive analysis is reported for the three detectors on human-written articles drawn from the same major/local/college outlets and time periods. Because the headline claim of a temporal increase rests on these detectors producing low error rates on news prose, the lack of such checks leaves open the possibility that observed trends partly reflect detector sensitivity to journalistic conventions rather than genuine LLM adoption.
  2. [Results] Results, temporal and outlet-type comparisons: Prevalence estimates are presented without error bars, confidence intervals, or sensitivity analyses to detector threshold choices. This weakens support for statements about the precise magnitude of the increase and for cross-outlet differences.
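The request for confidence intervals in major comment 2 can be illustrated with a Wilson score interval for a detected-prevalence proportion. This is a sketch under stated assumptions: the function name `wilson_interval` and the counts are invented for illustration, and the paper's actual counts are not given in this review.

```python
import math
from typing import Tuple


def wilson_interval(flagged: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if total <= 0 or not 0 <= flagged <= total:
        raise ValueError("require 0 <= flagged <= total and total > 0")
    p = flagged / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


# Hypothetical counts: 50 flagged articles out of 1,000 in one outlet type.
low, high = wilson_interval(50, 1000)
print(f"prevalence 5.0%, interval {low:.3%} to {high:.3%}")
```

The interval reflects sampling uncertainty only. It says nothing about detector error, which is the separate concern raised in major comment 1.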
minor comments (2)
  1. [Abstract] Abstract: The phrase 'substantial increase' is used without any accompanying effect size, baseline rate, or time window; a brief quantitative anchor would improve clarity.
  2. [Linguistic analysis] Linguistic analysis: The specific metrics and statistical tests used for word richness, readability, and formality should be stated explicitly, including whether outlet or period fixed effects were included.
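To make minor comment 2 concrete, two standard measures of the kind it asks the authors to name are sketched below: type-token ratio for word richness and Flesch Reading Ease for readability. Which metrics the authors actually used is not stated in this review, and the syllable counter is a crude vowel-group heuristic chosen only to keep the example self-contained.

```python
import re


def type_token_ratio(text: str) -> float:
    """Distinct word forms divided by total words (a simple word-richness proxy)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def count_syllables(word: str) -> int:
    """Crude heuristic: count groups of consecutive vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher values indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))


sample = "The cat sat. The dog ran."
print(type_token_ratio(sample), flesch_reading_ease(sample))
```

Type-token ratio is sensitive to text length, so comparisons between outlets or periods would need length-matched samples or a length-corrected index.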

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which highlight important areas for strengthening the methodological transparency and statistical robustness of our work. We address each major comment below and outline the revisions we will make.

Point-by-point responses
  1. Referee: [Methods] Methods section: No domain-specific validation, calibration, or false-positive analysis is reported for the three detectors on human-written articles drawn from the same major/local/college outlets and time periods. Because the headline claim of a temporal increase rests on these detectors producing low error rates on news prose, the lack of such checks leaves open the possibility that observed trends partly reflect detector sensitivity to journalistic conventions rather than genuine LLM adoption.

    Authors: We agree that domain-specific validation on news prose is a valuable addition. The three detectors are established off-the-shelf tools, but the original manuscript did not include explicit false-positive analysis on human-written articles from our exact outlets and pre-LLM time periods. In the revised manuscript we will add a dedicated validation subsection: we will sample articles published in 2020–2021 from the same major, local, and college sources (when LLM use was negligible) and report per-detector false-positive rates. This will directly test whether journalistic conventions alone trigger high detection scores and will allow us to qualify the temporal trends accordingly. We will also retain a limitations paragraph discussing any residual uncertainty. revision: yes

  2. Referee: [Results] Results, temporal and outlet-type comparisons: Prevalence estimates are presented without error bars, confidence intervals, or sensitivity analyses to detector threshold choices. This weakens support for statements about the precise magnitude of the increase and for cross-outlet differences.

    Authors: We accept this critique. The current version reports point estimates without accompanying uncertainty measures or threshold robustness checks. In revision we will (1) add binomial or bootstrap-derived confidence intervals to all prevalence figures and (2) include a sensitivity analysis that varies each detector’s decision threshold across a plausible range and demonstrates that the reported temporal increase and outlet-type differences remain directionally consistent. These additions will be placed in the Results section with corresponding figures or tables. revision: yes
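The validation proposed in the first response can be sketched as a per-detector false-positive rate on articles assumed to be human-written, for example ones published before LLM use was common. The function `false_positive_rates`, the record layout, and the toy flags are all hypothetical and are not results from the paper.

```python
from typing import Dict, Iterable, Mapping


def false_positive_rates(pre_llm_flags: Iterable[Mapping[str, bool]]) -> Dict[str, float]:
    """Share of assumed-human articles that each detector flags as AI-generated.

    Every record maps detector name -> flag, and every record is assumed to
    contain the same detectors. Because all articles are taken to be
    human-written, each positive flag counts as a false positive.
    """
    counts: Dict[str, int] = {}
    total = 0
    for flags in pre_llm_flags:
        total += 1
        for detector, flagged in flags.items():
            counts[detector] = counts.get(detector, 0) + int(bool(flagged))
    if total == 0:
        raise ValueError("no articles supplied")
    return {detector: count / total for detector, count in counts.items()}


# Invented flags for four pre-LLM articles.
toy_sample = [
    {"binoculars": False, "fast_detect_gpt": False, "gptzero": False},
    {"binoculars": True, "fast_detect_gpt": False, "gptzero": False},
    {"binoculars": False, "fast_detect_gpt": False, "gptzero": True},
    {"binoculars": True, "fast_detect_gpt": False, "gptzero": False},
]
print(false_positive_rates(toy_sample))
```

Reporting these rates separately for major, local, and college outlets would show whether the baseline differs by outlet type, which matters for the cross-outlet comparison.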

Circularity Check

0 steps flagged

No significant circularity; purely observational application of external detectors

The paper conducts an empirical corpus study by running three pre-existing, off-the-shelf AI-text detectors (Binoculars, Fast-Detect GPT, GPTZero) over a collection of news articles and reporting observed trends in detected AI content. No derivation, first-principles prediction, parameter fitting, or self-referential definition is claimed or present; the central result is a direct measurement that depends on the external detectors' behavior rather than any quantity defined or fitted inside the paper itself. The analysis is therefore self-contained against external benchmarks and exhibits none of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

The central prevalence claims rest on the untested transferability of the three chosen detectors to news text and on the assumption that detected AI signals reflect actual production practices rather than stylistic mimicry.

free parameters (1)
  • AI detection thresholds
    Implicit cutoffs used by Binoculars, Fast-Detect GPT, and GPTZero to label text as AI-generated; these are treated as fixed but can shift prevalence estimates.
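A small sweep shows how the choice of cutoff can move a prevalence estimate. The sketch assumes a detector score where higher means more AI-like; some detectors use the opposite convention, in which case the comparison would be flipped. The function name, scores, and thresholds are invented for illustration.

```python
from typing import Dict, Sequence


def prevalence_by_threshold(scores: Sequence[float], thresholds: Sequence[float]) -> Dict[float, float]:
    """Fraction of texts labelled AI-generated at each candidate threshold."""
    if not scores:
        raise ValueError("no scores supplied")
    return {t: sum(s >= t for s in scores) / len(scores) for t in thresholds}


# Hypothetical detector scores for two periods.
pre_gpt = [0.05, 0.10, 0.20, 0.55]
post_gpt = [0.10, 0.45, 0.60, 0.90]
thresholds = (0.4, 0.5, 0.7)

before = prevalence_by_threshold(pre_gpt, thresholds)
after = prevalence_by_threshold(post_gpt, thresholds)
for t in thresholds:
    print(f"threshold={t}: pre={before[t]:.2f} post={after[t]:.2f}")
```

If the post-period prevalence stays above the pre-period prevalence across a plausible range of thresholds, the direction of the trend does not hinge on one cutoff, even though its size does.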

pith-pipeline@v0.9.0 · 5658 in / 1114 out tokens · 60907 ms · 2026-05-18T23:55:01.565885+00:00 · methodology
