pith. machine review for the scientific record.

arxiv: 2604.26965 · v1 · submitted 2026-04-14 · 💻 cs.CY · cs.AI · cs.SI

Recognition: unknown

The Impact of AI-Generated Text on the Internet

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 13:57 UTC · model grok-4.3

classification 💻 cs.CY · cs.AI · cs.SI
keywords AI-generated text · web content analysis · semantic diversity · public perception of AI · AI text detection · internet archive sampling · sentiment analysis

The pith

By mid-2025, roughly 35 percent of newly published websites were classified as AI-generated or AI-assisted, up from none before late 2022.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper measures the actual share of AI-generated or AI-assisted text among newly published websites by drawing a representative sample from the Internet Archive and running it through a current AI text detector. This scale had been unknown, leaving open whether concerns about reduced diversity, accuracy, and other effects on the web were grounded in real volume. The results show a steep climb to 35 percent accompanied by measurable drops in semantic diversity and rises in positive sentiment, yet no detectable decline in factual accuracy or stylistic variety. Public opinion, assessed in a separate survey, assumes stronger negative effects across all four dimensions than the data support.

Core claim

By mid-2025, 35 percent of newly published websites were classified as AI-generated or AI-assisted, up from zero before ChatGPT launched in late 2022. Increases in this share correlate negatively with semantic diversity and positively with positive sentiment. No statistically significant evidence appears for reduced factual accuracy or stylistic diversity. A user study finds that most US adults believe all four negative effects are occurring, with stronger belief among infrequent AI users and those holding negative views of the technology.

What carries the argument

A state-of-the-art AI text detector applied to a representative sample of websites archived between 2022 and 2025.
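The paper's sampling and detection code is not reproduced here; the following is a minimal sketch of the shape of such a pipeline, in which `detect_ai_prob` is a hypothetical keyword-based stub, not the detector the paper actually uses:

```python
# Sketch of the measurement pipeline: score each sampled page with a detector,
# then aggregate the flagged share per month. `detect_ai_prob` is a hypothetical
# stand-in; the paper's real detector is a trained classifier.
from collections import defaultdict

def detect_ai_prob(text: str) -> float:
    """Hypothetical detector stub returning P(text is AI-generated).
    Trivial placeholder rule, for illustration only."""
    return 0.9 if "as an ai language model" in text.lower() else 0.1

def monthly_ai_share(pages, threshold=0.5):
    """pages: iterable of (year_month, text) pairs.
    Returns {year_month: fraction of pages flagged as AI-generated}."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for ym, text in pages:
        total[ym] += 1
        if detect_ai_prob(text) >= threshold:
            flagged[ym] += 1
    return {ym: flagged[ym] / total[ym] for ym in total}

sample = [
    ("2022-06", "Hand-written travel notes from a local blogger."),
    ("2025-06", "As an AI language model, I can summarize this topic."),
    ("2025-06", "Recipe archive maintained since 2009."),
]
print(monthly_ai_share(sample))  # → {'2022-06': 0.0, '2025-06': 0.5}
```

A real pipeline would substitute a trained classifier and feed it text extracted from the archived pages (e.g. with a tool such as Trafilatura, which appears in the paper's references).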

Load-bearing premise

The AI text detector accurately labels real-world web pages as AI-generated or AI-assisted without large numbers of false positives or negatives across varied site types and languages.
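Because the headline number rides on this premise, a standard check is to ask how the observed share would move under assumed detector error rates. The Rogan-Gladen correction inverts the relation between observed and true prevalence; the error rates below are illustrative assumptions, not estimates from the paper:

```python
# Rogan-Gladen prevalence correction: invert
#   p_obs = p_true * (1 - fnr) + (1 - p_true) * fpr
# to recover p_true from an imperfect classifier's observed positive rate.
def corrected_prevalence(p_obs: float, fpr: float, fnr: float) -> float:
    """Corrected prevalence, clamped to [0, 1]. Undefined when
    fpr + fnr >= 1 (detector no better than chance)."""
    denom = 1.0 - fpr - fnr
    if denom <= 0:
        raise ValueError("detector is uninformative (fpr + fnr >= 1)")
    return min(1.0, max(0.0, (p_obs - fpr) / denom))

# How an observed 35% share shifts under illustrative error rates:
for fpr, fnr in [(0.00, 0.00), (0.05, 0.10), (0.10, 0.20)]:
    print(fpr, fnr, round(corrected_prevalence(0.35, fpr, fnr), 3))
```

Under these particular assumptions the corrected figure stays near 35 percent, but larger or asymmetric error rates would move it substantially, which is why domain-specific validation matters.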

What would settle it

A human review of several hundred randomly selected sites from the same 2022-2025 archive sample that finds a materially different proportion labeled AI-generated or AI-assisted than the detector reports.
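To give "materially different" a statistical shape, such an audit could be compared against the detector's share with a two-proportion z-test. A minimal stdlib sketch, in which the sample sizes and the 28 percent audit figure are invented for illustration:

```python
import math

def two_prop_z(p1: float, n1: int, p2: float, n2: int):
    """Two-sided z-test for a difference of two proportions (pooled SE).
    Normal approximation; adequate for samples in the hundreds."""
    p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal CDF, via math.erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical: detector flags 35% of 10,000 sampled pages;
# a human audit of 400 pages from the same sample finds 28%.
z, p = two_prop_z(0.35, 10_000, 0.28, 400)
print(round(z, 2), round(p, 4))  # z near 2.9, p below 0.01
```

With these invented numbers the difference would be significant at conventional levels, i.e. a 7-point gap on a 400-page audit would already count as "materially different."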

read the original abstract

The proliferation of AI-generated and AI-assisted text on the internet is feared to contribute to a degradation in semantic and stylistic diversity, factual accuracy, and other negative developments (sometimes subsumed under the Dead Internet Theory). What has hindered answering these questions is that it has not been understood just how much of the internet is actually AI-generated or AI-edited. To this end, we construct a representative sample of websites published on the internet between 2022 and 2025 using the Internet Archive, and apply a state-of-the-art AI text detector on them. We find that by mid-2025, roughly 35% of newly published websites were classified as AI-generated or AI-assisted, up from zero before ChatGPT's launch in late 2022. We also find statistically significant evidence for some of the identified hypotheses; for example, that increases in AI-generated text on the internet correlate negatively with semantic diversity and positively with the prevalence of positive sentiment. We do not, however, find statistically significant evidence supporting the hypothesis that an increased rate of AI-generated text on the internet decreases factual accuracy or stylistic diversity. Notably, this diverges from public perception, which we measure in a user study, where the majority of US adults turned out to believe in all four of the above-mentioned hypotheses. Individuals who do not use AI or use it infrequently tend to believe in these negative impacts more than those who use it frequently; similarly, individuals who hold negative views of AI tend to believe in these hypotheses more than those with favorable views of the technology.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript constructs a representative sample of websites from the Internet Archive published between 2022 and 2025, applies a state-of-the-art AI text detector to classify content as AI-generated or AI-assisted, and reports that this proportion reached roughly 35% by mid-2025 (up from zero pre-ChatGPT). It identifies statistically significant correlations between rising AI text prevalence and reduced semantic diversity plus increased positive sentiment, but not for factual accuracy or stylistic diversity. A separate user study of US adults shows that public beliefs overestimate these negative effects, with stronger beliefs among infrequent AI users and those holding negative views of the technology.

Significance. If the detector classifications prove reliable on real-world web data, the work supplies a much-needed empirical baseline for AI text prevalence and its measurable impacts, moving discussions of the Dead Internet Theory from speculation to data. The temporal sampling via the Internet Archive and the direct comparison to public perceptions are clear strengths that could inform policy and further research on content quality. The absence of domain-specific validation and statistical details, however, prevents the central prevalence and correlation claims from being evaluated at present.

major comments (3)
  1. [Methods] Methods section: The sampling procedure from the Internet Archive lacks any reported total sample size, selection criteria for 'newly published' websites, temporal binning details, or handling of non-English/mixed-content pages. Without these, the representativeness of the 35% figure and its trend cannot be assessed.
  2. [AI text detection] AI text detection subsection: No validation, calibration, or performance metrics (precision, recall, confusion matrix) are supplied for the detector on the actual archived web corpus, which contains short snippets, boilerplate, HTML artifacts, and multilingual text. This is load-bearing for the headline prevalence claim, as unquantified error rates could materially change the 35% estimate and all downstream correlations.
  3. [Results] Results section: Claims of 'statistically significant evidence' for correlations with semantic diversity and positive sentiment provide no sample sizes, p-values, effect sizes, or measurement details for diversity/accuracy metrics. This prevents evaluation of whether the reported correlations are robust or driven by the classification step.
minor comments (2)
  1. [Abstract] Abstract: Adding the overall sample size, confidence intervals around the 35% figure, or a brief note on detector validation status would give readers immediate context for the central result.
  2. [User study] User study description: The number of participants, recruitment method, exact question wording, and demographic breakdown are not reported, limiting assessment of the perception findings.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough and constructive review. We address each major comment point by point below. Where the manuscript was incomplete, we have revised it to incorporate the requested details and clarifications.

read point-by-point responses
  1. Referee: [Methods] Methods section: The sampling procedure from the Internet Archive lacks any reported total sample size, selection criteria for 'newly published' websites, temporal binning details, or handling of non-English/mixed-content pages. Without these, the representativeness of the 35% figure and its trend cannot be assessed.

    Authors: We agree that the Methods section requires additional detail to allow evaluation of representativeness. In the revised manuscript we have added the total sample size, explicit selection criteria for newly published websites (first archived between 2022 and 2025 with minimum text length), quarterly temporal binning specifications, and the language-filtering procedure used for non-English and mixed-content pages. These changes directly address the concern. revision: yes

  2. Referee: [AI text detection] AI text detection subsection: No validation, calibration, or performance metrics (precision, recall, confusion matrix) are supplied for the detector on the actual archived web corpus, which contains short snippets, boilerplate, HTML artifacts, and multilingual text. This is load-bearing for the headline prevalence claim, as unquantified error rates could materially change the 35% estimate and all downstream correlations.

    Authors: We acknowledge that domain-specific validation on the archived corpus is absent and that this limits confidence in the prevalence estimate. Ground-truth labels for real-world web pages do not exist, so a full confusion matrix on this corpus cannot be produced. In the revision we have added the detector's published benchmark metrics, a new limitations subsection discussing error sources relevant to web data (short text, boilerplate, multilingual content), and a sensitivity analysis showing how the 35% figure and correlations shift under plausible false-positive and false-negative rates. We have also tempered the interpretation of the headline claim accordingly. revision: partial

  3. Referee: [Results] Results section: Claims of 'statistically significant evidence' for correlations with semantic diversity and positive sentiment provide no sample sizes, p-values, effect sizes, or measurement details for diversity/accuracy metrics. This prevents evaluation of whether the reported correlations are robust or driven by the classification step.

    Authors: We agree that the statistical reporting was insufficient. The revised Results section now reports the exact sample sizes for each correlation test, the p-values, effect sizes, and the precise operational definitions and tools used to compute semantic diversity, positive sentiment, factual accuracy, and stylistic diversity. These additions allow readers to assess both statistical robustness and whether the classification step drives the observed relationships. revision: yes
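As an illustration of the kind of reporting the referee requests, a minimal stdlib sketch computing Pearson's r with a Fisher-z p-value on invented monthly data (the numbers are not the paper's):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def fisher_p(r: float, n: int) -> float:
    """Two-sided p-value via the Fisher z-transform (normal approx., n > 3)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Invented monthly series: AI share vs. a semantic-diversity score.
ai_share  = [0.05, 0.10, 0.18, 0.25, 0.30, 0.35]
diversity = [0.82, 0.80, 0.74, 0.71, 0.69, 0.66]
r = pearson_r(ai_share, diversity)
p_val = fisher_p(r, len(ai_share))
print(round(r, 3), round(p_val, 6))
```

Reporting r, n, and p together, plus the operational definition of the diversity metric, is what lets a reader judge whether a correlation like this is robust.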

Circularity Check

0 steps flagged

No significant circularity; purely empirical measurement and correlation analysis

full rationale

The paper constructs a sample of archived websites, applies an external state-of-the-art AI text detector to classify content as AI-generated or assisted, and reports direct empirical percentages plus statistical correlations with diversity and sentiment metrics. No equations, fitted parameters, self-referential predictions, or derivations are present that reduce to the input data by construction. The 35% figure and trend are outputs of the classification step on the sampled corpus, not inputs renamed as results. Self-citations are absent from the load-bearing claims, and the study remains self-contained against external benchmarks without internal circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The prevalence and correlation claims rest on the unverified accuracy of the AI detector applied to heterogeneous web text and on the representativeness of the Internet Archive sample for all newly published sites.

axioms (1)
  • domain assumption AI text detectors can reliably classify web content as AI-generated or AI-assisted
    Detector output is treated as ground truth for the 35% figure and all downstream correlations.

pith-pipeline@v0.9.0 · 5585 in / 1233 out tokens · 49837 ms · 2026-05-10T13:57:56.526340+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and pith papers without signing in.

Reference graph

Works this paper leans on

39 extracted references · 28 canonical work pages · 4 internal anchors

  1. [1]

    Agarwal, M

    D. Agarwal, M. Naaman, and A. Vashistha. AI suggestions homogenize writing toward western styles and diminish cultural nuances. In Proceedings of the 2025 CHI conference on human factors in computing systems, pages 1–21,

  2. [2]

    Self-consuming generative models go mad

    S. Alemohammad, J. Casco-Rodriguez, L. Luzi, A. I. Humayun, H. Babaei, D. LeJeune, A. Siahkoohi, and R. Baraniuk. Self-consuming generative models go MAD. arXiv preprint arXiv:2307.01850,

  3. [3]

    Barbaresi

    A. Barbaresi. Trafilatura: A web scraping library and command-line tool for text discovery and extraction. In Proceedings of the ACL 2021 System Demonstrations, pages 122–131,

  4. [4]

    A. R. Basani and P.-Y. Chen. Diversity boosts AI-generated text detection. arXiv preprint arXiv:2509.18880,

  5. [5]

    URL https://spec.c2pa.org/specifications/specifications/2.1/specs/_attachments/C2PA_Specification.pdf. N. A. Chandra, R. Murtfeldt, L. Qiu, A. Karmakar, H. Lee, E. Tanumihardja, K. Farhat, B. Caffee, S. Paik, C. Lee, et al. Deepfake-Eval-2024: A multi-modal in-the-wild benchmark of deepfakes circulated in 2024. arXiv preprint arXiv:2503.02857,

  6. [6]

    Revealing weaknesses in text watermarking through self-information rewrite attacks. arXiv preprint arXiv:2505.05190, 2025

    Y. Cheng, H. Guo, Y. Li, and L. Sigal. Revealing weaknesses in text watermarking through self-information rewrite attacks. arXiv preprint arXiv:2505.05190,

  7. [7]

    Croitoru, A.-I

    F.-A. Croitoru, A.-I. Hiji, V. Hondru, N. C. Ristea, P. Irofti, M. Popescu, C. Rusu, R. T. Ionescu, F. S. Khan, and M. Shah. Deepfake media generation and detection in the generative AI era: A survey and outlook. arXiv preprint arXiv:2411.19537,

  8. [8]

    Dawkins, K

    H. Dawkins, K. C. Fraser, and S. Kiritchenko. When detection fails: The power of fine-tuned models to generate human-like social media text. In Findings of the Association for Computational Linguistics: ACL 2025, pages 13494–13527,

  9. [9]

    URL https://huggingface.co/desklib/ai-text-detector-v1.01. E. Dohmatob, Y. Feng, and J. Kempe. Model collapse demystified: The case of regression. Advances in Neural Information Processing Systems, 37:46979–47013, 2024a. E. Dohmatob, Y. Feng, A. Subramonian, and J. Kempe. Strong model collapse. arXiv preprint arXiv:2410.04840, 2024b. L. Dugan, A. Hwang...

  10. [10]

    Gan and Y

    Z. Gan and Y. Liu. Towards a theoretical understanding of synthetic data in LLM post-training: A reverse-bottleneck perspective. arXiv preprint arXiv:2410.01720,

  11. [11]

    K. Garg, S. Alam, D. Ayala, M. Graham, M. C. Weigle, and M. L. Nelson. Longitudinal sampling of URLs from the Wayback Machine. arXiv preprint arXiv:2507.14752,

  12. [12]

    Preprint at arXiv:2404.01413 (2024)

    M. Gerstgrasser, R. Schaeffer, A. Dey, R. Rafailov, H. Sleight, J. Hughes, T. Korbak, R. Agrawal, D. Pai, A. Gromov, D. A. Roberts, D. Yang, D. L. Donoho, and O. Koyejo. Is model collapse inevitable? Breaking the curse of recursion by accumulating real and synthetic data. arXiv preprint arXiv:2404.01413,

  13. [13]

    A. Hans, A. Schwarzschild, V. Cherepanova, H. Kazemi, A. Saha, M. Goldblum, J. Geiping, and T. Goldstein. Spotting LLMs with Binoculars: Zero-shot detection of machine-generated text. arXiv preprint arXiv:2401.12070,

  14. [14]

    M. S. Hee, S. Sharma, R. Cao, P. Nandi, P. Nakov, T. Chakraborty, and R. K.-W. Lee. Recent advances in online hate speech moderation: Multimodality and the role of large models. Findings of the Association for Computational Linguistics: EMNLP 2024, pages 4407–4419,

  15. [15]

    GPT-4o System Card

    A. Hurst, A. Lerer, A. P. Goucher, A. Perelman, A. Ramesh, A. Clark, A. Ostrow, A. Welihinda, A. Hayes, A. Radford, et al. GPT-4o system card. arXiv preprint arXiv:2410.21276,

  16. [16]

    Machines in the crowd? Measuring the footprint of machine-generated text on Reddit. arXiv preprint arXiv:2510.07226, 2025

    L. La Cava, L. M. Aiello, and A. Tagarelli. Machines in the crowd? Measuring the footprint of machine-generated text on Reddit. arXiv preprint arXiv:2510.07226,

  17. [17]

    Monitoring AI-modified content at scale: A case study on the impact of ChatGPT on AI conference peer reviews

    W. Liang, Z. Izzo, Y. Zhang, H. Lepp, H. Cao, X. Zhao, L. Chen, H. Ye, S. Liu, Z. Huang, et al. Monitoring AI-modified content at scale: A case study on the impact of ChatGPT on AI conference peer reviews. arXiv preprint arXiv:2403.07183,

  18. [18]

    K. C. Marturi and H. H. Elwazzan. LLM-guided planning and summary-based scientific text simplification: DS@GT at CLEF 2025 SimpleText. arXiv preprint arXiv:2508.11816,

  19. [19]

    Examining the prevalence and dynamics of AI-generated media in art subreddits,

    H. Matatov, M. A. L. Quéré, O. Amir, and M. Naaman. Examining the prevalence and dynamics of AI-generated media in art subreddits. arXiv preprint arXiv:2410.07302,

  20. [20]

    R. Merx, H. Suominen, A. J. G. Correia, T. Cohn, and E. Vylomova. Low-resource machine translation: what for? who for? An observational study on a dedicated Tetun language translation service. arXiv preprint arXiv:2411.12262,

  21. [21]

    Muzumdar, S

    P. Muzumdar, S. Cheemalapati, S. R. RamiReddy, K. Singh, G. Kurian, and A. Muley. The dead internet theory: a survey on artificial interactions and the future of social media. arXiv preprint arXiv:2502.00007,

  22. [22]

    Nemecek, Y

    A. Nemecek, Y. Jiang, and E. Ayday. Watermarking without standards is not AI governance. arXiv preprint arXiv:2505.23814,

  23. [23]

    Palla, J

    K. Palla, J. L. R. García, C. Hauff, F. Fabbri, A. Damianou, H. Lindström, D. Taber, and M. Lalmas. Policy-as-prompt: Rethinking content moderation in the age of large language models. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency, pages 840–854,

  24. [24]

    Reimers and I

    N. Reimers and I. Gurevych. Sentence-BERT: Sentence embeddings using siamese BERT-networks. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pages 3982–3992,

  25. [25]

    Rijsbosch, G

    B. Rijsbosch, G. van Dijck, and K. Kollnig. Adoption of watermarking for generative AI systems in practice and implications under the new EU AI Act. arXiv preprint arXiv:2503.18156,

  26. [26]

    AI use in American newspapers is widespread, uneven, and rarely disclosed

    J. Russell, M. Karpinska, D. Akinode, K. Thai, B. Emi, M. Spero, and M. Iyyer. AI use in American newspapers is widespread, uneven, and rarely disclosed. arXiv preprint arXiv:2510.18774,

  27. [27]

    Verbosity bias in preference labeling by large language models

    K. Saito, A. Wachi, K. Wataoka, and Y. Akimoto. Verbosity bias in preference labeling by large language models. arXiv preprint arXiv:2310.10076,

  28. [28]

    Santy, P

    S. Santy, P. Bhattacharya, M. H. Ribeiro, K. Allen, and S. Oh. When incentives backfire, data stops being human. arXiv preprint arXiv:2502.07732,

  29. [29]

    Schaeffer, J

    R. Schaeffer, J. Kazdan, A. C. Arulandu, and S. Koyejo. Position: Model collapse does not mean what you think. arXiv preprint arXiv:2503.03150,

  30. [30]

    Towards Understanding Sycophancy in Language Models

    M. Sharma, M. Tong, T. Korbak, D. Duvenaud, A. Askell, S. R. Bowman, N. Cheng, E. Durmus, Z. Hatfield-Dodds, S. R. Johnston, et al. Towards understanding sycophancy in language models. arXiv preprint arXiv:2310.13548,

  31. [31]

    D. H. Spennemann. Delving into: the quantification of AI-generated content on the internet (synthetic data). arXiv preprint arXiv:2504.08755,

  32. [32]

    Thompson, M

    B. Thompson, M. Dhaliwal, P. Frisch, T. Domhan, and M. Federico. A shocking amount of the web is machine translated: Insights from multi-way parallelism. In Findings of the Association for Computational Linguistics ACL 2024, pages 1763–1775,

  33. [33]

    Thorne, A

    J. Thorne, A. Vlachos, C. Christodoulopoulos, and A. Mittal. FEVER: a large-scale dataset for Fact Extraction and VERification. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 809–819,

  34. [34]

    LLMs Can Get "Brain Rot": A Pilot Study on Twitter/X

    S. Xing, J. Hong, Y. Wang, R. Chen, Z. Zhang, A. Grama, Z. Tu, and Z. Wang. LLMs can get "brain rot"! arXiv preprint arXiv:2510.13928,

  35. [35]

    Tomz, Christopher D

    J. Zhang, S. Yu, D. Chong, A. Sicilia, M. R. Tomz, C. D. Manning, and W. Shi. Verbalized sampling: How to mitigate mode collapse and unlock LLM diversity. arXiv preprint arXiv:2510.01171,

  36. [36]

    Zhang and T

    Y. Zhang and T. Zhang. The impact of generative AI on content platforms: A two-sided market analysis with multi-dimensional quality heterogeneity. arXiv preprint arXiv:2410.13101,

  37. [37]

    A. Robustness Analysis of AI-Generated Text Detectors. We conducted a systematic robustness analysis comparing four detectors of AI-generated text—Binoculars (Hans et al., 2024), Desklib (Desklib, 2025), DivEye (Basani and Chen, 2025), and the Pangram v3 commercial API (Pangram Labs, 2026)—across five exp...

  38. [38]

    annotation frameworks. Claim assignments were managed by the application to ensure approximately 20% overlap across annotators, enabling computation of inter-annotator agreement via Krippendorff's alpha (Krippendorff, 2004), Fleiss' kappa, and Cohen's kappa for pairwise comparisons. Figure 9 | Welcome Scr...

  39. [39]

    but whose figures are omitted from the main body for space. For each hypothesis, we show (a) the correlation between the measurable signal and the aggregate AI likelihood score across monthly samples, (b) the overall distribution of participant survey responses, (c) responses stratified by AI usage frequency, and (d) responses stratified by general view...