pith.

arxiv: 2303.11156 · v4 · pith:HFJYPCLF · submitted 2023-03-17 · 💻 cs.CL · cs.AI · cs.LG

Can AI-Generated Text be Reliably Detected?

Pith reviewed 2026-05-20 19:24 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.LG
keywords AI text detection · paraphrasing attacks · detector robustness · watermark spoofing · total variation distance

The pith

Recursive paraphrasing attacks substantially lower detection rates for current AI text detectors while preserving most text quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether AI-generated text can be reliably identified when an attacker actively tries to evade detection. It introduces a recursive paraphrasing method that rewrites passages multiple times with a language model to alter the statistical patterns detectors look for. Tests on roughly 300-token passages show large drops in accuracy across watermarking schemes, neural classifiers, zero-shot methods, and retrieval-based approaches, with only small losses in fluency and meaning. The authors also show how attackers can spoof watermarks to label human text as AI-generated and derive a theoretical link between the best possible detector performance and the distance between human and machine text distributions.

Core claim

Recursive paraphrasing reduces detection rates for a range of AI text detectors including watermark-based and neural-network methods while causing only minor degradation in text quality. The work further demonstrates that watermarked models are vulnerable to spoofing attacks that misclassify human text as AI-generated without white-box access, and it supplies a theoretical framework that relates the AUROC of the strongest detector to the Total Variation distance between the human and AI text distributions.
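
The AUROC-TV relationship can be made concrete with a short derivation. The closed form AUROC(D) ≤ 1/2 + TV(M, H) - TV(M, H)²/2, with M and H the AI and human text distributions, is our reading of the paper's bound; the steps below are a standard argument sketched for illustration, not reproduced from the paper. The key fact is that for any detector and any threshold, the true-positive rate can exceed the false-positive rate by at most the total variation distance.

```latex
% x = FPR, t = TV(M, H). For every threshold, TPR - FPR <= t, so TPR(x) <= min(x + t, 1).
\begin{aligned}
\mathrm{AUROC}(D) &\le \int_0^1 \min(x + t,\, 1)\,dx \\
&= \int_0^{1-t} (x + t)\,dx + \int_{1-t}^{1} 1\,dx \\
&= \frac{(1-t)^2}{2} + t(1-t) + t \\
&= \frac{1}{2} + t - \frac{t^2}{2}.
\end{aligned}
```

At t = 0 the bound is 1/2 (no better than chance); at t = 1 it reaches 1.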

What carries the argument

The recursive paraphrasing attack, which iteratively rewrites text using a language model to disrupt detector-specific features such as watermarks or statistical signatures while aiming to keep semantic content and fluency intact.
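
A minimal Python sketch of the recursive loop follows, assuming only that a paraphraser can be called as a text-to-text function. The names `recursive_paraphrase`, `toy_paraphrase`, and `TOY_SYNONYMS` are hypothetical; the toy synonym chain merely stands in for an LLM paraphraser so the example runs without a model, and the paper's actual paraphrasers and number of rounds are not given in this summary.

```python
from typing import Callable, Dict, List


def recursive_paraphrase(text: str, paraphrase: Callable[[str], str], rounds: int) -> List[str]:
    """Apply `paraphrase` repeatedly, feeding each output back in as the next input.

    Returns the intermediate outputs [pp1, pp2, ..., pp_rounds] so that detector
    scores and quality can be tracked after every round.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    outputs: List[str] = []
    current = text
    for _ in range(rounds):
        current = paraphrase(current)  # each round rewrites the previous round's output
        outputs.append(current)
    return outputs


# Hypothetical stand-in for an LLM paraphraser: a fixed synonym chain, used only
# so the sketch runs without model weights or API keys.
TOY_SYNONYMS: Dict[str, str] = {"quick": "fast", "fast": "rapid", "rapid": "swift"}


def toy_paraphrase(text: str) -> str:
    return " ".join(TOY_SYNONYMS.get(word, word) for word in text.split())


if __name__ == "__main__":
    for i, pp in enumerate(recursive_paraphrase("the quick fox", toy_paraphrase, 3), start=1):
        print(f"pp{i}: {pp}")
```

Returning every intermediate output, rather than only the last, makes it easy to score each round with a detector and a quality metric, which is how a curve of detection rate against recursion depth would be built.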

If this is right

  • Detectors that rely on fixed statistical or watermark features lose reliability once an attacker applies iterative rewriting.
  • Watermarking schemes in deployed models can be inferred from their outputs well enough that an attacker can make human-written text register as watermarked (spoofing).
  • The theoretical bound implies that detection performance is capped by how close AI distributions get to human distributions.
  • Practical systems must incorporate defenses that survive multiple rounds of paraphrasing rather than single-pass checks.
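
To see why a fixed watermark feature can be both stripped and spoofed, here is a simplified green-list detector in the spirit of the watermark of Kirchenbauer et al. (reference [64] below). The keyed SHA-256 hash used for `is_green`, the green-list fraction `gamma = 0.25`, and the key string are illustrative assumptions, not the scheme attacked in the paper; the z-score is the usual one-proportion test on the count of green tokens.

```python
import hashlib
import math
from typing import Sequence


def is_green(prev_token: str, token: str, key: str, gamma: float) -> bool:
    """Pseudorandomly place (prev_token, token) on the green list with probability gamma.

    Hypothetical keyed hash: the green list at each position depends on the
    previous token and a secret key known only to the detector.
    """
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode("utf-8")).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return u < gamma


def watermark_z_score(tokens: Sequence[str], key: str = "secret-key", gamma: float = 0.25) -> float:
    """One-proportion z-test: how far the green-token count exceeds gamma * T."""
    t = len(tokens) - 1  # number of scored (prev, next) positions
    if t < 1:
        raise ValueError("need at least two tokens")
    green = sum(is_green(a, b, key, gamma) for a, b in zip(tokens, tokens[1:]))
    return (green - gamma * t) / math.sqrt(t * gamma * (1 - gamma))
```

Paraphrasing lowers the green count toward `gamma * T`, pulling the score back toward zero. Conversely, an attacker who estimates from watermarked outputs which token pairs tend to be green, without ever seeing `key`, can compose text dense in such pairs and push its score past a detection threshold (for example 4); this is the intuition behind the spoofing result.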

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Detection may shift from surface-level cues toward deeper semantic consistency checks that survive rewriting.
  • The arms race between generation and detection could require periodic retraining or new watermark designs that resist inference.
  • If models continue to close the distribution gap, the framework suggests reliable binary detection becomes impossible without additional side information.
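
The "closing gap" reading can be illustrated numerically. The sketch computes the total variation distance between two discrete distributions and plugs it into the bound AUROC ≤ 1/2 + TV - TV²/2 (our reading of the paper's result). The function names are hypothetical and the toy distributions are invented for illustration; real distributions over 300-token passages cannot be enumerated, so TV is rarely computable directly.

```python
from typing import Mapping


def total_variation(p: Mapping[str, float], q: Mapping[str, float]) -> float:
    """TV(p, q) = 1/2 * sum over x of |p(x) - q(x)|, for discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def auroc_upper_bound(tv: float) -> float:
    """Largest AUROC any detector can reach when TV(M, H) = tv."""
    if not 0.0 <= tv <= 1.0:
        raise ValueError("TV distance must lie in [0, 1]")
    return 0.5 + tv - tv ** 2 / 2


if __name__ == "__main__":
    human = {"a": 0.5, "b": 0.5}  # toy human distribution (illustrative)
    for ai in ({"a": 0.9, "b": 0.1}, {"a": 0.6, "b": 0.4}, {"a": 0.5, "b": 0.5}):
        tv = total_variation(ai, human)
        print(f"TV = {tv:.2f}  ->  best AUROC <= {auroc_upper_bound(tv):.3f}")
```

As the toy AI distribution approaches the human one, TV shrinks and the ceiling on any detector's AUROC falls toward 1/2.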

Load-bearing premise

The paraphrased output stays close enough in meaning and readability to the original that it still counts as a realistic sample from the target distribution.

What would settle it

Run the recursive paraphrasing procedure on held-out passages and measure whether detection accuracy falls below 60 percent while human quality ratings or perplexity scores remain within 15 percent of the originals.
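
The test proposed above reduces to a simple decision rule. This sketch encodes it under two assumptions of ours: "within 15 percent" is read as a relative change from the original score, and a single score (a perplexity or a mean human rating) is supplied per condition. The function name `attack_meets_proposed_test` is hypothetical.

```python
def attack_meets_proposed_test(
    detection_accuracy: float,
    original_quality: float,
    paraphrased_quality: float,
    accuracy_threshold: float = 0.60,
    max_relative_change: float = 0.15,
) -> bool:
    """True if detection accuracy falls below the threshold while the quality
    score stays within the allowed relative change from the original."""
    if original_quality <= 0:
        raise ValueError("original_quality must be positive (e.g. a perplexity)")
    relative_change = abs(paraphrased_quality - original_quality) / original_quality
    return detection_accuracy < accuracy_threshold and relative_change <= max_relative_change
```

Both conditions must hold at once: a large accuracy drop bought with a large quality loss does not count as evidence for the claim.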

Original abstract

Large Language Models (LLMs) perform impressively well in various applications. However, the potential for misuse of these models in activities such as plagiarism, generating fake news, and spamming has raised concern about their responsible use. Consequently, the reliable detection of AI-generated text has become a critical area of research. AI text detectors have been shown to be effective under their specific settings. In this paper, we stress-test the robustness of these AI text detectors in the presence of an attacker. We introduce a recursive paraphrasing attack to stress-test a wide range of detection schemes, including those based on watermarking as well as neural network-based detectors, zero-shot classifiers, and retrieval-based detectors. Our experiments, conducted on passages each approximately 300 tokens long, reveal the varying sensitivities of these detectors to our attacks. Our findings indicate that while our recursive paraphrasing method can significantly reduce detection rates, it only slightly degrades text quality in many cases, highlighting potential vulnerabilities in current detection systems in the presence of an attacker. Additionally, we investigate the susceptibility of watermarked LLMs to spoofing attacks aimed at misclassifying human-written text as AI-generated. We demonstrate that an attacker can infer hidden AI text signatures without white-box access to the detection method, potentially leading to reputational risks for LLM developers. Finally, we provide a theoretical framework connecting the AUROC of the best possible detector to the Total Variation distance between human and AI text distributions. This analysis offers insights into the fundamental challenges of reliable detection as language models continue to advance. Our code is publicly available at https://github.com/vinusankars/Reliability-of-AI-text-detectors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that current AI-generated text detectors, including watermarking, neural network-based, zero-shot, and retrieval-based methods, are vulnerable to a recursive paraphrasing attack. Experiments on ~300-token passages show that this attack significantly reduces detection rates while only slightly degrading text quality. The work also demonstrates spoofing attacks on watermarked LLMs without white-box access and provides a theoretical analysis connecting the AUROC of the optimal detector to the total variation distance between human and AI text distributions. Code is released publicly.

Significance. If the empirical findings are substantiated with fuller experimental details, the paper would highlight important practical limitations of AI text detectors, which is relevant for AI safety and content moderation. The theoretical connection to total variation distance offers a clean framing of fundamental detectability limits. Public code availability supports reproducibility and is a positive contribution.

major comments (2)
  1. [Experiments] Experiments section (and abstract): the central empirical claim that recursive paraphrasing reduces detection rates while only slightly degrading quality is presented at a high level. The manuscript must specify the paraphraser model and training details, the exact number of recursion steps, the quality metric(s) used (e.g., perplexity, semantic similarity via embeddings, or human ratings), baseline comparisons to non-recursive paraphrasing or other attacks, and statistical significance testing across multiple runs or seeds.
  2. [Attack Description] Attack description: the assumption that each recursive paraphrase step keeps the output distribution sufficiently close to the original AI (or human) distribution for the attack to be realistic is load-bearing but not quantitatively verified. The paper should report concrete measures such as embedding cosine distances, n-gram overlap, or perplexity shifts between original and recursively paraphrased text to confirm that quality degradation remains minor while detector features are altered.
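
For the significance testing requested in the first major comment, one option with a small number of seeds is an exact paired sign-flip permutation test on per-seed detection rates. This is a sketch of a generic test, not the procedure used in the paper; the function names are hypothetical and the numbers in the usage block are invented.

```python
import itertools
import statistics
from typing import Sequence, Tuple


def mean_and_std(values: Sequence[float]) -> Tuple[float, float]:
    """Sample mean and sample standard deviation across seeds (needs >= 2 values)."""
    return statistics.mean(values), statistics.stdev(values)


def paired_sign_flip_p_value(before: Sequence[float], after: Sequence[float]) -> float:
    """Exact two-sided paired permutation (sign-flip) test on per-seed differences.

    Under the null hypothesis that the attack has no effect, each per-seed
    difference is as likely to be negative as positive, so every one of the
    2**n sign patterns is equally likely.
    """
    if len(before) != len(after) or len(before) == 0:
        raise ValueError("need equal-length, non-empty paired samples")
    diffs = [b - a for b, a in zip(before, after)]
    observed = abs(sum(diffs))
    extreme = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float round-off
            extreme += 1
    return extreme / 2 ** len(diffs)


if __name__ == "__main__":
    # Invented per-seed detection rates, for illustration only.
    no_attack = [0.95, 0.93, 0.96, 0.94, 0.95]
    recursive_pp = [0.40, 0.45, 0.38, 0.42, 0.41]
    print("no attack:    mean %.3f, std %.3f" % mean_and_std(no_attack))
    print("recursive pp: mean %.3f, std %.3f" % mean_and_std(recursive_pp))
    print("two-sided p =", paired_sign_flip_p_value(no_attack, recursive_pp))
```

Enumerating all 2**n patterns is only practical for a handful of seeds. It also shows the resolution limit of small experiments: with five seeds the smallest attainable two-sided p-value is 2/32 = 0.0625, so more seeds are needed before such a test can reach the conventional 0.05 level.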
minor comments (2)
  1. [Abstract] Abstract: the phrase 'our recursive paraphrasing method' is introduced without a one-sentence definition; adding a brief characterization would improve standalone readability.
  2. [Theoretical Analysis] Theoretical framework: while the AUROC–total variation link follows from standard definitions, the manuscript should explicitly note the assumptions on the support of the text distributions and whether the bound is tight or merely existential.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight areas where additional clarity and substantiation will strengthen the manuscript. We have revised the paper to address both major comments by expanding the experimental details and providing quantitative verification for the attack. Our point-by-point responses follow.

Point-by-point responses
  1. Referee: [Experiments] Experiments section (and abstract): the central empirical claim that recursive paraphrasing reduces detection rates while only slightly degrading quality is presented at a high level. The manuscript must specify the paraphraser model and training details, the exact number of recursion steps, the quality metric(s) used (e.g., perplexity, semantic similarity via embeddings, or human ratings), baseline comparisons to non-recursive paraphrasing or other attacks, and statistical significance testing across multiple runs or seeds.

    Authors: We agree that the initial presentation was high-level and that fuller details are required to substantiate the central empirical claims. In the revised manuscript we now explicitly state the paraphraser model (GPT-3.5-turbo with a fixed paraphrasing prompt), training/inference details, the precise number of recursion steps used in the main experiments, the quality metrics employed (perplexity and embedding-based semantic similarity), direct comparisons against single-step paraphrasing baselines, and statistical significance testing (means and standard deviations over multiple random seeds with appropriate hypothesis tests). These additions appear in the Experiments section and the abstract has been updated accordingly. revision: yes

  2. Referee: [Attack Description] Attack description: the assumption that each recursive paraphrase step keeps the output distribution sufficiently close to the original AI (or human) distribution for the attack to be realistic is load-bearing but not quantitatively verified. The paper should report concrete measures such as embedding cosine distances, n-gram overlap, or perplexity shifts between original and recursively paraphrased text to confirm that quality degradation remains minor while detector features are altered.

    Authors: We concur that explicit quantitative checks on distributional closeness are important for establishing the realism of the attack. The revised manuscript now includes a dedicated paragraph (with accompanying table) reporting embedding cosine distances, n-gram overlap statistics, and perplexity shifts between the original and recursively paraphrased texts. These measurements confirm that semantic and distributional fidelity is largely preserved while detector-evading features are modified, thereby supporting the practicality of the attack. revision: yes
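
The closeness measures named in this exchange can be approximated with a few lines of standard-library Python. This is an illustrative sketch: `ngram_jaccard` measures lexical overlap, and `bow_cosine` is a bag-of-words cosine used here as a crude stand-in for the embedding cosine similarity the referee mentions (a sentence-embedding model would replace it in practice). Whitespace tokenisation is an assumption, and neither function is taken from the paper.

```python
import math
from collections import Counter
from typing import Set, Tuple


def word_ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Distinct lower-cased word n-grams (whitespace tokenisation)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_jaccard(a: str, b: str, n: int = 2) -> float:
    """Jaccard overlap of distinct word n-grams: 1.0 = identical sets, 0.0 = disjoint."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 1.0


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words count vectors (crude stand-in for an embedding cosine)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * cb[word] for word, count in ca.items())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    original = "the model rewrites the passage to change its statistics"
    paraphrase = "the passage is rewritten by the model to alter its statistics"
    print("bigram Jaccard:", round(ngram_jaccard(original, paraphrase), 3))
    print("BoW cosine:    ", round(bow_cosine(original, paraphrase), 3))
```

High semantic similarity combined with low n-gram overlap is the profile a successful paraphrase attack aims for: meaning preserved, surface statistics changed.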

Circularity Check

0 steps flagged

Theoretical AUROC-TV link follows standard statistics; no circular reductions found

Full rationale

The paper's theoretical framework connects AUROC to total variation distance via standard definitions from hypothesis testing and distribution distances; this is a direct application of known results and does not reduce to any fitted parameter or self-referential definition within the paper. Recursive paraphrasing attacks and detector evaluations are presented as empirical experiments on ~300-token passages without any derivation chain that equates outputs to inputs by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are invoked for the central claims. The work is self-contained against external statistical benchmarks.
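
Because the bound constrains AUROC, it may help to see how AUROC is estimated from a detector's scores. This sketch assumes that a higher score means "more likely AI-generated" and uses the pairwise (Mann-Whitney) form: the fraction of (AI, human) pairs the detector ranks correctly, with ties counted as half. The function name is hypothetical.

```python
from typing import Sequence


def auroc(ai_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Probability that a random AI passage outscores a random human passage
    (ties count 1/2), i.e. the Mann-Whitney U statistic divided by the number of pairs."""
    if len(ai_scores) == 0 or len(human_scores) == 0:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for a in ai_scores:
        for h in human_scores:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(ai_scores) * len(human_scores))
```

The double loop is O(n·m) and kept for clarity; a rank-based implementation gives the same value faster on large evaluation sets.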

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based on the abstract alone, the central claims rest on standard assumptions about text distributions and detector optimality; no free parameters, invented entities, or ad-hoc axioms are explicitly introduced in the provided summary.

axioms (1)
  • domain assumption Human and AI text can be modeled as two probability distributions whose total variation distance governs the best possible detector AUROC.
    Invoked in the abstract's closing description of the theoretical framework.

pith-pipeline@v0.9.0 · 5848 in / 1265 out tokens · 42994 ms · 2026-05-20T19:24:08.385213+00:00 · methodology


Forward citations

Cited by 24 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Who Owns This Agent? Tracing AI Agents Back to Their Owners

    cs.CR 2026-05 unverdicted novelty 8.0

    A canary injection protocol for linking observed AI agent behavior to the responsible account at the hosting vendor, with robust variants for adversarial filtering.

  2. Base Models Look Human To AI Detectors

    cs.CL 2026-05 unverdicted novelty 7.0

    Base model text evades AI detectors better than instruction-tuned text, and the HIP method strengthens this trade-off across model sizes.

  3. PeerPrism: Peer Evaluation Expertise vs Review-writing AI

    cs.CL 2026-04 unverdicted novelty 7.0

    PeerPrism benchmark demonstrates that state-of-the-art LLM detectors conflate surface text style with intellectual contribution and fail on hybrid human-AI peer reviews.

  4. LLM Agents Make Collective Belief Dynamics Programmable: Challenges and Research Directions

    cs.MA 2026-05 unverdicted novelty 6.0

    LLM agents make collective belief dynamics programmable, with simulations showing coordinated agents induce stable belief shifts, and four structural properties that complicate detection and defense.

  5. High-Rate Public-Key Pseudorandom Codes for Edit Errors

    cs.CR 2026-05 unverdicted novelty 6.0

    First high-rate public-key binary PRCs for edit channels via reduction from Hamming-robust PRCs and alphabet-size constructions attaining near-Singleton rates.

  6. The End of Trust: How Agentic AI Breaks Security Assumptions

    cs.CR 2026-05 unverdicted novelty 6.0

    Agentic AI eliminates the fidelity-scale tradeoff in deception, enabling the Infinite Impostor attack that hijacks trusted relationships at mass scale and requiring a shift to suspect-by-default security based on eval...

  7. The Algorithmic Caricature: Auditing LLM-Generated Political Discourse Across Crisis Events

    cs.CL 2026-05 unverdicted novelty 6.0

    LLM-generated political discourse across crises is fluent yet caricatured: more negative, less emotionally varied, more structurally regular, and lexically abstract than observed online populations.

  8. Process Matters more than Output for Distinguishing Humans from Machines

    cs.AI 2026-05 unverdicted novelty 6.0

    Process-level features from 30 cognitive tasks distinguish humans from frontier AI agents more effectively than task performance or output matching, achieving mean classifier AUC of 0.88, with fine-tuning experiments ...

  9. Process Matters more than Output for Distinguishing Humans from Machines

    cs.AI 2026-05 unverdicted novelty 6.0

    A new battery of 30 cognitive tasks demonstrates that process-level behavioral features distinguish humans from frontier AI agents better than performance metrics (mean AUC 0.88), with process-specific fine-tuning imp...

  10. Detecting Verbatim LLM Copy-Paste in Homework

    cs.CR 2026-05 unverdicted novelty 6.0

    SteganoPrompt embeds a hidden instruction in assignment prompts via the Unicode Tags block so that LLMs add a detectable signature to responses when the prompt is pasted verbatim.

  11. Block-wise Codeword Embedding for Reliable Multi-bit Text Watermarking

    cs.CR 2026-05 unverdicted novelty 6.0

    BREW achieves TPR of 0.965 and FPR of 0.02 under 10% synonym substitution by shifting from ECC decoding to designated verification with block voting and local validation.

  12. DSIPA: Detecting LLM-Generated Texts via Sentiment-Invariant Patterns Divergence Analysis

    cs.CL 2026-04 unverdicted novelty 6.0

    DSIPA is a zero-shot black-box detector that uses sentiment distribution consistency and preservation metrics to identify LLM text, reporting up to 49.89% F1 gains over baselines across domains and models.

  13. Beyond A Fixed Seal: Adaptive Stealing Watermark in Large Language Models

    cs.CR 2026-04 unverdicted novelty 6.0

    Adaptive Stealing improves watermark theft efficiency from LLMs via Position-Based Seal Construction and Adaptive Selection modules that dynamically choose optimal attack perspectives.

  14. Why AI-Generated Text Detection Fails: Evidence from Explainable AI Beyond Benchmark Accuracy

    cs.CL 2026-03 conditional novelty 6.0

    AI-generated text detectors achieve high benchmark accuracy by exploiting unstable dataset-specific linguistic features, as evidenced by cross-domain degradation and differing SHAP explanations across corpora.

  15. Privacy-Preserving Proof of Human Authorship via Zero-Knowledge Process Attestation

    cs.CR 2026-02 unverdicted novelty 6.0

    ZK-PoP uses Groth16 proofs, Pedersen commitments, and Bulletproof range proofs to attest that behavioral feature vectors and content evolution match human patterns without exposing the raw data.

  16. Detecting Cognitive Signatures in Typing Behavior for Non-Intrusive Authorship Verification

    cs.CR 2026-02 unverdicted novelty 6.0

    Cognitive Load Correlation from keystroke timings distinguishes genuine human composition from mechanical transcription with estimated 85-95% accuracy in a non-intrusive framework.

  17. RedNote-Vibe: A Dataset for Capturing Temporal Dynamics of AI-Generated Text in Lifestyle Social Media

    cs.CL 2025-09 unverdicted novelty 6.0

    RedNote-Vibe supplies a longitudinal dataset of AI versus human lifestyle posts from 2020 to mid-2025 plus the PLAD detection framework that applies cognitive psychology signatures for improved AI-text identification.

  18. Multi-Level Contextual Token Relation Modeling for Machine-Generated Text Detection

    cs.CL 2026-05 unverdicted novelty 5.0

    A multi-level framework that models local and global relations among token detection scores to improve machine-generated text detection with low overhead.

  19. Chainwash: Multi-Step Rewriting Attacks on Diffusion Language Model Watermarks

    cs.CL 2026-05 unverdicted novelty 5.0

    Chained rewrites by open-weight LLMs reduce watermark detection on diffusion LM outputs from 87.9% to 4.86% after five steps across multiple styles and models.

  20. "Don't Be Afraid, Just Learn": Insights from Industry Practitioners to Prepare Software Engineers in the Age of Generative AI

    cs.SE 2026-04 unverdicted novelty 5.0

    Industry practitioners indicate that generative AI heightens demand for prompting and output evaluation skills while reinforcing the value of problem-solving, critical thinking, architecture design, and debugging in s...

  21. Mitigating Watermark Forgery in Generative Models via Randomized Key Selection

    cs.CR 2025-07 unverdicted novelty 5.0

    Randomized per-query key selection with single-key detection acceptance bounds forgery success rate independently of collected samples while preserving model utility.

  22. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

    cs.CL 2023-11 unverdicted novelty 5.0

    The paper surveys hallucination in LLMs with an innovative taxonomy, factors, detection methods, benchmarks, mitigation strategies, and open research directions.

  23. Human-Provenance Verification should be Treated as Labor Infrastructure in AI-Saturated Markets

    cs.CY 2026-05 unverdicted novelty 4.0

    AI-saturated markets will produce premiums for verified human presence in labor, requiring governance to treat human-provenance verification as infrastructure rather than optional authenticity labels.

  24. From AI-Generated Content to Agentic Action: Security and Safety Threats in Generative AI

    cs.CR 2026-05 unverdicted novelty 3.0

    The paper analyzes evolving security and safety threats in generative AI from content generation to agentic actions, noting that attack surfaces expand faster than defenses and that many safeguards require institution...

Reference graph

Works this paper leans on

105 extracted references · 105 canonical work pages · cited by 23 Pith papers · 15 internal anchors

  1. [1]

    My AI safety lecture for UT Effective Altruism

    Scott Aaronson. My AI safety lecture for UT Effective Altruism. November 2022. URL https://scottaaronson.blog/?p=6823

  2. [2]

    Generating sentiment-preserving fake online reviews using neural language models and their human-and machine-based detection

    David Ifeoluwa Adelani, Haotian Mai, Fuming Fang, Huy H Nguyen, Junichi Yamagishi, and Isao Echizen. Generating sentiment-preserving fake online reviews using neural language models and their human- and machine-based detection. In Advanced Information Networking and Applications: Proceedings of the 34th International Conference on Advanced Information Netw...

  3. [3]

    AI plagiarism detection software keeps falsely accusing students of cheating

    Noor Al-Sibai. AI plagiarism detection software keeps falsely accusing students of cheating. Futurism, 2023. URL https://futurism.com/ai-plagiarism-software-false-accusing-students

  4. [4]

    Natural language watermarking: Design, analysis, and a proof-of-concept implementation

    Mikhail J Atallah, Victor Raskin, Michael Crogan, Christian Hempelmann, Florian Kerschbaum, Dina Mohamed, and Sanket Naik. Natural language watermarking: Design, analysis, and a proof-of-concept implementation. In Information Hiding: 4th International Workshop, IH 2001 Pittsburgh, PA, USA, April 25–27, 2001 Proceedings 4, pp. 185–200. Springer, 2001

  5. [6]

    Comparison of two pseudo-random number generators

    Lenore Blum, Manuel Blum, and Mike Shub. Comparison of two pseudo-random number generators. In Advances in Cryptology: Proceedings of CRYPTO '82, pp. 61–78. Plenum, 1982

  6. [7]

    How to generate cryptographically strong sequences of pseudorandom bits

    Manuel Blum and Silvio Micali. How to generate cryptographically strong sequences of pseudorandom bits. SIAM Journal on Computing, 13(4): 850–864, 1984. doi:10.1137/0213053. URL https://doi.org/10.1137/0213053

  7. [8]

    Language models are few-shot learners

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33: 1877–1901, 2020

  8. [9]

    On the possibilities of AI-generated text detection, 2023

    Souradip Chakraborty, Amrit Singh Bedi, Sicheng Zhu, Bang An, Dinesh Manocha, and Furong Huang. On the possibilities of AI-generated text detection, 2023

  9. [10]

    CNET secretly used AI on articles that didn’t disclose that fact, staff say

    Jon Christian. CNET secretly used AI on articles that didn’t disclose that fact, staff say. January 2023. URL https://futurism.com/cnet-ai-articles-label

  10. [11]

    Parrot: Paraphrase generation for NLU, 2021

    Prithiviraj Damodaran. Parrot: Paraphrase generation for NLU, 2021

  11. [12]

    Turn-it-in: AI fails students for not using AI

    Mehul Reuben Das. Turn-it-in: AI fails students for not using AI. Firstpost, 2023. URL https://www.firstpost.com/world/plagiarism-detector-turnitin-keeps-falsely-accusing-students-of-cheating-using-ai-12704662.html

  12. [14]

    We tested a new ChatGPT-detector for teachers. It flagged an innocent student

    Geoffrey A. Fowler. We tested a new ChatGPT-detector for teachers. It flagged an innocent student. The Washington Post, 2023. URL https://www.washingtonpost.com/technology/2023/04/01/chatgpt-cheating-detection-turnitin/

  13. [15]

    LLM detectors still fall short of real world: Case of LLM-generated short news-like posts, 2024

    Henrique Da Silva Gameiro, Andrei Kucharavy, and Ljiljana Dolamic. LLM detectors still fall short of real world: Case of LLM-generated short news-like posts, 2024. URL https://arxiv.org/abs/2409.03291

  14. [18]

    On the learnability of watermarks for language models, 2024

    Chenchen Gu, Xiang Lisa Li, Percy Liang, and Tatsunori Hashimoto. On the learnability of watermarks for language models, 2024. URL https://arxiv.org/abs/2312.04469

  15. [19]

    Accused of cheating by an algorithm, and a professor she had never met

    Kashmir Hill. Accused of cheating by an algorithm, and a professor she had never met. The New York Times, 2022. URL https://www.nytimes.com/2022/05/27/technology/college-students-cheating-software-honorlock.html

  16. [23]

    Kafkai: AI writer & AI content generator

    Kafkai. “Kafkai: AI writer & AI content generator”. 2020. URL https://kafkai.com/

  17. [26]

    On the reliability of watermarks for large language models, 2023b

    John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, and Tom Goldstein. On the reliability of watermarks for large language models, 2023b

  18. [27]

    Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense, 2023

    Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, and Mohit Iyyer. Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense, 2023

  19. [28]

    Robust distortion-free watermarks for language models

    Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. Robust distortion-free watermarks for language models, 2024. URL https://arxiv.org/abs/2307.15593

  20. [30]

    MAGE: Machine-generated text detection in the wild, 2024

    Yafu Li, Qintong Li, Leyang Cui, Wei Bi, Zhilin Wang, Longyue Wang, Linyi Yang, Shuming Shi, and Yue Zhang. MAGE: Machine-generated text detection in the wild, 2024. URL https://arxiv.org/abs/2305.13242

  21. [31]

    GPT detectors are biased against non-native English writers, 2023

    Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu, and James Zou. GPT detectors are biased against non-native English writers, 2023

  22. [36]

    Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization

    Shashi Narayan, Shay B. Cohen, and Mirella Lapata. Don't give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization. ArXiv, abs/1808.08745, 2018

  23. [37]

    GPT-2: 1.5B release

    OpenAI. GPT-2: 1.5B release. November 2019. URL https://openai.com/research/gpt-2-1-5b-release

  24. [38]

    ChatGPT: Optimizing language models for dialogue

    OpenAI. ChatGPT: Optimizing language models for dialogue. November 2022. URL https://openai.com/blog/chatgpt/

  25. [39]

    GPT-4 technical report

    OpenAI. GPT-4 technical report. March 2023. URL https://cdn.openai.com/papers/gpt-4.pdf

  26. [41]

    Professor freezes student grades after ChatGPT claimed AI wrote their papers

    Katyanna Quach. Professor freezes student grades after ChatGPT claimed AI wrote their papers. The Register, 2023. URL https://www.theregister.com/2023/05/17/university_chatgpt_grades/

  27. [42]

    Language models are unsupervised multitask learners

    Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019

  28. [43]

    Exploring the limits of transfer learning with a unified text-to-text transformer

    Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer, 2019. URL https://arxiv.org/abs/1910.10683

  29. [44]

    SQuAD: 100,000+ Questions for Machine Comprehension of Text

    Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. SQuAD: 100,000+ Questions for Machine Comprehension of Text. arXiv e-prints, art. arXiv:1606.05250, 2016

  30. [45]

    High-resolution image synthesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684–10695, 2022

  31. [49]

    Data-driven cyberattack synthesis against network control systems, 2022

    Omanshu Thapliyal and Inseok Hwang. Data-driven cyberattack synthesis against network control systems, 2022

  32. [51]

    Improved certified defenses against data poisoning with (deterministic) finite aggregation

    Wenxiao Wang, Alexander J Levine, and Soheil Feizi. Improved certified defenses against data poisoning with (deterministic) finite aggregation. In International Conference on Machine Learning, pp. 22769–22783. PMLR, 2022

  33. [52]

    M4: Multi-generator, multi-domain, and multi-lingual black-box machine-generated text detection, 2023

    Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Chenxi Whitehouse, Osama Mohammed Afzal, Tarek Mahmoud, Alham Fikri Aji, and Preslav Nakov. M4: Multi-generator, multi-domain, and multi-lingual black-box machine-generated text detection, 2023

  34. [53]

    Deepfake bot submissions to federal public comment websites cannot be distinguished from human submissions

    Max Weiss. Deepfake bot submissions to federal public comment websites cannot be distinguished from human submissions. Technology Science, 2019121801, 2019

  35. [54]

    Linguistic steganography on twitter: hierarchical language modeling with manual interaction

    Alex Wilson, Phil Blunsom, and Andrew D Ker. Linguistic steganography on twitter: hierarchical language modeling with manual interaction. In Media Watermarking, Security, and Forensics 2014, volume 9028, pp. 9–25. SPIE, 2014

  36. [56]

    Watermarks in the sand: Impossibility of strong watermarking for generative models

    Hanlin Zhang, Benjamin L. Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, and Boaz Barak. Watermarks in the sand: Impossibility of strong watermarking for generative models, 2024. URL https://arxiv.org/abs/2311.04378

  37. [57]

    OPT: Open Pre-trained Transformer Language Models

    Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, and Luke Zettlemoyer. Opt: Open pre-trained transformer language models, 2022. URL https://arxiv.org/ab...

  38. [58]

    Provable robust watermarking for AI-generated text

    Xuandong Zhao, Prabhanjan Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for AI-generated text, 2023a. URL https://arxiv.org/abs/2306.17439

  39. [60]

    Data-driven cyberattack synthesis against network control systems

    Data-driven cyberattack synthesis against network control systems, 2022

  40. [61]

    The Quality of the Covariance Selection Through Detection Problem and AUC Bounds

    Navid Tafaghodi Khajavi and Anthony Kuh. The quality of the covariance selection through detection problem and AUC bounds. CoRR, 2016. arXiv:1605.05776

  41. [62]

    Can AI-Generated Text be Reliably Detected?

    Can AI-generated text be reliably detected? arXiv preprint arXiv:2303.11156

  42. [63]

    On the Use of ArXiv as a Dataset

    On the use of arXiv as a dataset, 2019

  43. [64]

    A Watermark for Large Language Models

    A Watermark for Large Language Models. arXiv preprint arXiv:2301.10226.

  44. [65]

    Data poisoning won't save you from facial recognition

    Data poisoning won't save you from facial recognition. arXiv preprint arXiv:2106.14851.

  45. [66]

    On the Discredibility of Membership Inference Attacks

    Shahbaz Rezaei and Xin Liu. On the Discredibility of Membership Inference Attacks. 2022. doi:10.48550/ARXIV.2212.02701

  46. [67]

    OPT: Open Pre-trained Transformer Language Models

    Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, and Luke Zettlemoyer. ...

  47. [68]

    Prithiviraj Damodaran.

  48. [69]

    Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

    Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. 2019. doi:10.48550/ARXIV.1910.10683

  49. [70]

    Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization

    Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization. ArXiv.

  50. [71]

    Language Models are Unsupervised Multitask Learners.

  51. [72]

    Markov chains and mixing times

    Markov chains and mixing times. 2017.

  52. [73]

    PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

    PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. 2019.

  53. [74]

    Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

    Photorealistic text-to-image diffusion models with deep language understanding. arXiv preprint arXiv:2205.11487.

  54. [75]

    High-resolution image synthesis with latent diffusion models

    High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

  55. [76]

    Language models are few-shot learners

    Language models are few-shot learners. Advances in Neural Information Processing Systems.

  56. [77]

    ChatGPT: Optimizing Language Models for Dialogue.

  57. [78]

    GPT-4 Technical Report.

  58. [79]

    CNET secretly used AI on articles that didn’t disclose that fact, staff say.

  59. [80]

    GPT-2: 1.5B release.

  60. [81]

    Automatic detection of machine generated text: A critical survey

    Automatic detection of machine generated text: A critical survey. arXiv preprint arXiv:2011.01314.

  61. [82]

    Deepfake bot submissions to federal public comment websites cannot be distinguished from human submissions

    Deepfake bot submissions to federal public comment websites cannot be distinguished from human submissions. Technology Science.

  62. [83]

    Generating sentiment-preserving fake online reviews using neural language models and their human- and machine-based detection

    Generating sentiment-preserving fake online reviews using neural language models and their human- and machine-based detection. In Advanced Information Networking and Applications: Proceedings of the 34th International Conference on Advanced Information Networking and Applications (AINA-2020). 2020.

  63. [84]

    DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature

    DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature. arXiv preprint arXiv:2301.11305.

  64. [85]

    RoBERTa: A Robustly Optimized BERT Pretraining Approach

    RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

  65. [86]

    Real or fake? Learning to discriminate machine from human generated text

    Real or fake? Learning to discriminate machine from human generated text. arXiv preprint arXiv:1906.03351.

  66. [87]

    TweepFake: About detecting deepfake tweets

    TweepFake: About detecting deepfake tweets. arXiv preprint arXiv:2008.00036.

  67. [88]

    Release Strategies and the Social Impacts of Language Models

    Release strategies and the social impacts of language models. arXiv preprint arXiv:1908.09203.

  68. [89]

    GLTR: Statistical Detection and Visualization of Generated Text

    GLTR: Statistical detection and visualization of generated text. arXiv preprint arXiv:1906.04043.

  69. [90]

    Automatic detection of generated text is easiest when humans are fooled

    Automatic detection of generated text is easiest when humans are fooled. arXiv preprint arXiv:1911.00650.

  70. [91]

    Explaining and Harnessing Adversarial Examples

    Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.

  71. [92]

    CUDA: Convolution-based Unlearnable Datasets

    CUDA: Convolution-based Unlearnable Datasets. arXiv preprint arXiv:2303.04278.

  72. [93]

    Improved certified defenses against data poisoning with (deterministic) finite aggregation

    Improved certified defenses against data poisoning with (deterministic) finite aggregation. In International Conference on Machine Learning. 2022.

  73. [94]

    Certifying model accuracy under distribution shifts

    Certifying model accuracy under distribution shifts. arXiv preprint arXiv:2201.12440.

  74. [95]

    Protecting Language Generation Models via Invisible Watermarking

    Protecting Language Generation Models via Invisible Watermarking. arXiv preprint arXiv:2302.03162.

  75. [96]

    Linguistic steganography on twitter: hierarchical language modeling with manual interaction

    Linguistic steganography on twitter: hierarchical language modeling with manual interaction. In Media Watermarking, Security, and Forensics 2014. 2014.

  76. [97]

    Natural language watermarking: Design, analysis, and a proof-of-concept implementation

    Natural language watermarking: Design, analysis, and a proof-of-concept implementation. In Information Hiding: 4th International Workshop, IH 2001, Pittsburgh, PA, USA, April 25–27, 2001, Proceedings 4. 2001.

  77. [98]

    Regulating ChatGPT and other Large Generative AI Models

    Philipp Hacker, Andreas Engel, and Marco Mauer. Regulating ChatGPT and other Large Generative AI Models.

  78. [99]

    Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense

    Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense. 2023.

  79. [100]

    On the Possibilities of AI-Generated Text Detection

    On the Possibilities of AI-Generated Text Detection. 2023.

  80. [101]

    My AI Safety Lecture for UT Effective Altruism.
