Can AI-Generated Text be Reliably Detected?
Pith reviewed 2026-05-20 19:24 UTC · model grok-4.3
The pith
Recursive paraphrasing attacks substantially lower detection rates for current AI text detectors while preserving most text quality.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Recursive paraphrasing reduces detection rates for a range of AI text detectors, including watermark-based and neural-network methods, while causing only minor degradation in text quality. The work further demonstrates that watermarked models are vulnerable to spoofing attacks that cause human-written text to be misclassified as AI-generated, without white-box access to the detector, and it supplies a theoretical framework that relates the AUROC of the best possible detector to the Total Variation distance between the human and AI text distributions.
What carries the argument
The recursive paraphrasing attack, which iteratively rewrites text using a language model to disrupt detector-specific features such as watermarks or statistical signatures while aiming to keep semantic content and fluency intact.
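The iterative rewriting described here can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: `recursive_paraphrase`, `toy_paraphrase`, and the `SYNONYMS` table are hypothetical names, and the toy word-swap stands in for the language-model paraphraser a real attack would call.

```python
from typing import Callable, List


def recursive_paraphrase(
    text: str, paraphrase: Callable[[str], str], rounds: int
) -> List[str]:
    """Return [text, pp1, pp2, ...]; each entry paraphrases the previous one."""
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    outputs = [text]
    for _ in range(rounds):
        # Feed the latest rewrite back into the paraphraser.
        outputs.append(paraphrase(outputs[-1]))
    return outputs


# Hypothetical stand-in for a language-model paraphraser: a fixed synonym swap.
SYNONYMS = {"big": "large", "large": "sizable", "quick": "fast", "fast": "rapid"}


def toy_paraphrase(text: str) -> str:
    return " ".join(SYNONYMS.get(word, word) for word in text.split())


versions = recursive_paraphrase("a big dog made a quick turn", toy_paraphrase, rounds=2)
for round_index, version in enumerate(versions):
    print(round_index, version)
```

Keeping every intermediate version makes it possible to score detector output and text quality after each round rather than only at the end.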
If this is right
- Detectors that rely on fixed statistical or watermark features lose reliability once an attacker applies iterative rewriting.
- Watermarking schemes in deployed models can be reverse-engineered enough to enable spoofing of human-written text.
- The theoretical bound implies that detection performance is capped by how close AI distributions get to human distributions.
- Practical systems must incorporate defenses that survive multiple rounds of paraphrasing rather than single-pass checks.
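To make the cap on detection performance concrete, the sketch below evaluates the bound in the form it is usually quoted from the paper, AUROC <= 1/2 + TV - TV^2/2. This summary does not spell the formula out, so treat that form as an assumption to check against the paper; `auroc_upper_bound` is a hypothetical helper name.

```python
def auroc_upper_bound(tv: float) -> float:
    """Upper bound on the best detector's AUROC for a given total variation distance.

    Assumed form: 1/2 + TV - TV^2 / 2, with TV in [0, 1].
    """
    if not 0.0 <= tv <= 1.0:
        raise ValueError("total variation distance must lie in [0, 1]")
    return 0.5 + tv - tv * tv / 2.0


for tv in (0.0, 0.1, 0.5, 1.0):
    print(f"TV = {tv:.1f} -> AUROC <= {auroc_upper_bound(tv):.3f}")
```

Under this form, a total variation distance of 0.1 leaves even the best detector only slightly better than a coin flip.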
Where Pith is reading between the lines
- Detection may shift from surface-level cues toward deeper semantic consistency checks that survive rewriting.
- The arms race between generation and detection could require periodic retraining or new watermark designs that resist inference.
- If models continue to close the distribution gap, the framework suggests reliable binary detection becomes impossible without additional side information.
Load-bearing premise
The paraphrased output stays close enough in meaning and readability to the original that it still counts as a realistic sample from the target distribution.
What would settle it
Run the recursive paraphrasing procedure on held-out passages and measure whether detection accuracy falls below 60 percent while human quality ratings or perplexity scores remain within 15 percent of the originals.
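The test above can be written as a small pass/fail check. This is a sketch under stated assumptions: "detection accuracy" is read as the fraction of attacked passages the detector still flags, quality is a single mean perplexity before and after the attack, and `attack_succeeds` and its parameter names are hypothetical.

```python
from typing import Sequence


def attack_succeeds(
    detected_flags: Sequence[bool],
    quality_before: float,
    quality_after: float,
    max_detection_rate: float = 0.60,
    max_quality_shift: float = 0.15,
) -> bool:
    """True if detection drops below the threshold while quality stays close."""
    if not detected_flags:
        raise ValueError("need at least one attacked passage")
    if quality_before == 0:
        raise ValueError("baseline quality score must be non-zero")
    # Fraction of attacked passages that the detector still flags as AI-generated.
    detection_rate = sum(detected_flags) / len(detected_flags)
    # Relative change in the quality score (for example, mean perplexity).
    quality_shift = abs(quality_after - quality_before) / abs(quality_before)
    return detection_rate < max_detection_rate and quality_shift <= max_quality_shift


# Illustrative numbers only, not results from the paper.
flags = [True, False, False, True, False]  # 2 of 5 passages still detected
print(attack_succeeds(flags, quality_before=20.0, quality_after=22.0))
```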
Original abstract
Large Language Models (LLMs) perform impressively well in various applications. However, the potential for misuse of these models in activities such as plagiarism, generating fake news, and spamming has raised concern about their responsible use. Consequently, the reliable detection of AI-generated text has become a critical area of research. AI text detectors have been shown to be effective under their specific settings. In this paper, we stress-test the robustness of these AI text detectors in the presence of an attacker. We introduce a recursive paraphrasing attack to stress-test a wide range of detection schemes, including those using watermarking as well as neural network-based detectors, zero-shot classifiers, and retrieval-based detectors. Our experiments conducted on passages, each approximately 300 tokens long, reveal the varying sensitivities of these detectors to our attacks. Our findings indicate that while our recursive paraphrasing method can significantly reduce detection rates, it only slightly degrades text quality in many cases, highlighting potential vulnerabilities in current detection systems in the presence of an attacker. Additionally, we investigate the susceptibility of watermarked LLMs to spoofing attacks aimed at misclassifying human-written text as AI-generated. We demonstrate that an attacker can infer hidden AI text signatures without white-box access to the detection method, potentially leading to reputational risks for LLM developers. Finally, we provide a theoretical framework connecting the AUROC of the best possible detector to the Total Variation distance between human and AI text distributions. This analysis offers insights into the fundamental challenges of reliable detection as language models continue to advance. Our code is publicly available at https://github.com/vinusankars/Reliability-of-AI-text-detectors.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that current AI-generated text detectors, including watermarking, neural network-based, zero-shot, and retrieval-based methods, are vulnerable to a recursive paraphrasing attack. Experiments on ~300-token passages show that this attack significantly reduces detection rates while only slightly degrading text quality. The work also demonstrates spoofing attacks on watermarked LLMs without white-box access and provides a theoretical analysis connecting the AUROC of the optimal detector to the total variation distance between human and AI text distributions. Code is released publicly.
Significance. If the empirical findings are substantiated with fuller experimental details, the paper would highlight important practical limitations of AI text detectors, which is relevant for AI safety and content moderation. The theoretical connection to total variation distance offers a clean framing of fundamental detectability limits. Public code availability supports reproducibility and is a positive contribution.
major comments (2)
- [Experiments] Experiments section (and abstract): the central empirical claim that recursive paraphrasing reduces detection rates while only slightly degrading quality is presented at a high level. The manuscript must specify the paraphraser model and training details, the exact number of recursion steps, the quality metric(s) used (e.g., perplexity, semantic similarity via embeddings, or human ratings), baseline comparisons to non-recursive paraphrasing or other attacks, and statistical significance testing across multiple runs or seeds.
- [Attack Description] Attack description: the assumption that each recursive paraphrase step keeps the output distribution sufficiently close to the original AI (or human) distribution for the attack to be realistic is load-bearing but not quantitatively verified. The paper should report concrete measures such as embedding cosine distances, n-gram overlap, or perplexity shifts between original and recursively paraphrased text to confirm that quality degradation remains minor while detector features are altered.
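Two of the measures the referee asks for can be computed with a few lines of standard-library code. The sketch below is illustrative only: `ngram_overlap` uses Jaccard overlap of word n-gram sets, which is one of several reasonable definitions, and `cosine_distance` expects embedding vectors produced by some external model that is not shown here.

```python
import math
from typing import Sequence, Set, Tuple


def _word_ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_overlap(original: str, rewritten: str, n: int = 2) -> float:
    """Jaccard overlap between the word n-gram sets of two texts (1.0 = identical sets)."""
    a, b = _word_ngrams(original, n), _word_ngrams(rewritten, n)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


overlap = ngram_overlap("the cat sat on the mat", "the cat lay on the mat")
print(f"bigram overlap: {overlap:.3f}")
print(f"cosine distance: {cosine_distance([1.0, 0.0], [0.0, 1.0]):.1f}")
```

Low n-gram overlap combined with small embedding distance would indicate surface wording changed while meaning was preserved, which is the behaviour the attack relies on.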
minor comments (2)
- [Abstract] Abstract: the phrase 'our recursive paraphrasing method' is introduced without a one-sentence definition; adding a brief characterization would improve standalone readability.
- [Theoretical Analysis] Theoretical framework: while the AUROC–total variation link follows from standard definitions, the manuscript should explicitly note the assumptions on the support of the text distributions and whether the bound is tight or merely existential.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight areas where additional clarity and substantiation will strengthen the manuscript. We have revised the paper to address both major comments by expanding the experimental details and providing quantitative verification for the attack. Our point-by-point responses follow.
- Referee: [Experiments] Experiments section (and abstract): the central empirical claim that recursive paraphrasing reduces detection rates while only slightly degrading quality is presented at a high level. The manuscript must specify the paraphraser model and training details, the exact number of recursion steps, the quality metric(s) used (e.g., perplexity, semantic similarity via embeddings, or human ratings), baseline comparisons to non-recursive paraphrasing or other attacks, and statistical significance testing across multiple runs or seeds.
  Authors: We agree that the initial presentation was high-level and that fuller details are required to substantiate the central empirical claims. In the revised manuscript we now explicitly state the paraphraser model (GPT-3.5-turbo with a fixed paraphrasing prompt), training/inference details, the precise number of recursion steps used in the main experiments, the quality metrics employed (perplexity and embedding-based semantic similarity), direct comparisons against single-step paraphrasing baselines, and statistical significance testing (means and standard deviations over multiple random seeds with appropriate hypothesis tests). These additions appear in the Experiments section and the abstract has been updated accordingly.
  Revision: yes
- Referee: [Attack Description] Attack description: the assumption that each recursive paraphrase step keeps the output distribution sufficiently close to the original AI (or human) distribution for the attack to be realistic is load-bearing but not quantitatively verified. The paper should report concrete measures such as embedding cosine distances, n-gram overlap, or perplexity shifts between original and recursively paraphrased text to confirm that quality degradation remains minor while detector features are altered.
  Authors: We concur that explicit quantitative checks on distributional closeness are important for establishing the realism of the attack. The revised manuscript now includes a dedicated paragraph (with accompanying table) reporting embedding cosine distances, n-gram overlap statistics, and perplexity shifts between the original and recursively paraphrased texts. These measurements confirm that semantic and distributional fidelity is largely preserved while detector-evading features are modified, thereby supporting the practicality of the attack.
  Revision: yes
Circularity Check
Theoretical AUROC-TV link follows standard statistics; no circular reductions found
The paper's theoretical framework connects AUROC to total variation distance via standard definitions from hypothesis testing and distribution distances; this is a direct application of known results and does not reduce to any fitted parameter or self-referential definition within the paper. Recursive paraphrasing attacks and detector evaluations are presented as empirical experiments on ~300-token passages without any derivation chain that equates outputs to inputs by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are invoked for the central claims. The work is self-contained against external statistical benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Human and AI text can be modeled as two probability distributions whose total variation distance governs the best possible detector AUROC.
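One way the link between total variation distance and AUROC can be obtained from standard definitions is sketched below. The notation is an assumption introduced here rather than taken from the summary: $\mathcal{M}$ and $\mathcal{H}$ are the AI and human text distributions, $D$ is any detector score, and $\gamma$ is a decision threshold.

```latex
% Total variation distance between the AI and human text distributions.
\[
\mathrm{TV}(\mathcal{M},\mathcal{H})
  = \sup_{A}\,\bigl|\Pr_{s\sim\mathcal{M}}[s\in A]-\Pr_{s\sim\mathcal{H}}[s\in A]\bigr|
\]
% For any threshold, the gap between true and false positive rates is at most TV.
\[
\mathrm{TPR}_\gamma-\mathrm{FPR}_\gamma
  = \Pr_{s\sim\mathcal{M}}[D(s)\ge\gamma]-\Pr_{s\sim\mathcal{H}}[D(s)\ge\gamma]
  \le \mathrm{TV}(\mathcal{M},\mathcal{H})
\]
% Since TPR <= 1 as well, integrating the ROC curve over x = FPR gives the cap.
\[
\mathrm{AUROC}(D)
  \le \int_{0}^{1}\min\bigl(x+\mathrm{TV},\,1\bigr)\,dx
  = \frac{1}{2}+\mathrm{TV}-\frac{\mathrm{TV}^{2}}{2}
\]
```

The last equality follows by splitting the integral at $x = 1-\mathrm{TV}$; whether the bound is tight depends on assumptions the summary does not state.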
Forward citations
Cited by 24 Pith papers
- Who Owns This Agent? Tracing AI Agents Back to Their Owners
  A canary injection protocol for linking observed AI agent behavior to the responsible account at the hosting vendor, with robust variants for adversarial filtering.
- Base Models Look Human To AI Detectors
  Base model text evades AI detectors better than instruction-tuned text, and the HIP method strengthens this trade-off across model sizes.
- PeerPrism: Peer Evaluation Expertise vs Review-writing AI
  PeerPrism benchmark demonstrates that state-of-the-art LLM detectors conflate surface text style with intellectual contribution and fail on hybrid human-AI peer reviews.
- LLM Agents Make Collective Belief Dynamics Programmable: Challenges and Research Directions
  LLM agents make collective belief dynamics programmable, with simulations showing coordinated agents induce stable belief shifts, and four structural properties that complicate detection and defense.
- High-Rate Public-Key Pseudorandom Codes for Edit Errors
  First high-rate public-key binary PRCs for edit channels via reduction from Hamming-robust PRCs and alphabet-size constructions attaining near-Singleton rates.
- The End of Trust: How Agentic AI Breaks Security Assumptions
  Agentic AI eliminates the fidelity-scale tradeoff in deception, enabling the Infinite Impostor attack that hijacks trusted relationships at mass scale and requiring a shift to suspect-by-default security based on eval...
- The Algorithmic Caricature: Auditing LLM-Generated Political Discourse Across Crisis Events
  LLM-generated political discourse across crises is fluent yet caricatured: more negative, less emotionally varied, more structurally regular, and lexically abstract than observed online populations.
- Process Matters more than Output for Distinguishing Humans from Machines
  Process-level features from 30 cognitive tasks distinguish humans from frontier AI agents more effectively than task performance or output matching, achieving mean classifier AUC of 0.88, with fine-tuning experiments ...
- Process Matters more than Output for Distinguishing Humans from Machines
  A new battery of 30 cognitive tasks demonstrates that process-level behavioral features distinguish humans from frontier AI agents better than performance metrics (mean AUC 0.88), with process-specific fine-tuning imp...
- Detecting Verbatim LLM Copy-Paste in Homework
  SteganoPrompt embeds a hidden instruction in assignment prompts via the Unicode Tags block so that LLMs add a detectable signature to responses when the prompt is pasted verbatim.
- Block-wise Codeword Embedding for Reliable Multi-bit Text Watermarking
  BREW achieves TPR of 0.965 and FPR of 0.02 under 10% synonym substitution by shifting from ECC decoding to designated verification with block voting and local validation.
- DSIPA: Detecting LLM-Generated Texts via Sentiment-Invariant Patterns Divergence Analysis
  DSIPA is a zero-shot black-box detector that uses sentiment distribution consistency and preservation metrics to identify LLM text, reporting up to 49.89% F1 gains over baselines across domains and models.
- Beyond A Fixed Seal: Adaptive Stealing Watermark in Large Language Models
  Adaptive Stealing improves watermark theft efficiency from LLMs via Position-Based Seal Construction and Adaptive Selection modules that dynamically choose optimal attack perspectives.
- Why AI-Generated Text Detection Fails: Evidence from Explainable AI Beyond Benchmark Accuracy
  AI-generated text detectors achieve high benchmark accuracy by exploiting unstable dataset-specific linguistic features, as evidenced by cross-domain degradation and differing SHAP explanations across corpora.
- Privacy-Preserving Proof of Human Authorship via Zero-Knowledge Process Attestation
  ZK-PoP uses Groth16 proofs, Pedersen commitments, and Bulletproof range proofs to attest that behavioral feature vectors and content evolution match human patterns without exposing the raw data.
- Detecting Cognitive Signatures in Typing Behavior for Non-Intrusive Authorship Verification
  Cognitive Load Correlation from keystroke timings distinguishes genuine human composition from mechanical transcription with estimated 85-95% accuracy in a non-intrusive framework.
- RedNote-Vibe: A Dataset for Capturing Temporal Dynamics of AI-Generated Text in Lifestyle Social Media
  RedNote-Vibe supplies a longitudinal dataset of AI versus human lifestyle posts from 2020 to mid-2025 plus the PLAD detection framework that applies cognitive psychology signatures for improved AI-text identification.
- Multi-Level Contextual Token Relation Modeling for Machine-Generated Text Detection
  A multi-level framework that models local and global relations among token detection scores to improve machine-generated text detection with low overhead.
- Chainwash: Multi-Step Rewriting Attacks on Diffusion Language Model Watermarks
  Chained rewrites by open-weight LLMs reduce watermark detection on diffusion LM outputs from 87.9% to 4.86% after five steps across multiple styles and models.
- "Don't Be Afraid, Just Learn": Insights from Industry Practitioners to Prepare Software Engineers in the Age of Generative AI
  Industry practitioners indicate that generative AI heightens demand for prompting and output evaluation skills while reinforcing the value of problem-solving, critical thinking, architecture design, and debugging in s...
- Mitigating Watermark Forgery in Generative Models via Randomized Key Selection
  Randomized per-query key selection with single-key detection acceptance bounds forgery success rate independently of collected samples while preserving model utility.
- A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
  The paper surveys hallucination in LLMs with an innovative taxonomy, factors, detection methods, benchmarks, mitigation strategies, and open research directions.
- Human-Provenance Verification should be Treated as Labor Infrastructure in AI-Saturated Markets
  AI-saturated markets will produce premiums for verified human presence in labor, requiring governance to treat human-provenance verification as infrastructure rather than optional authenticity labels.
- From AI-Generated Content to Agentic Action: Security and Safety Threats in Generative AI
  The paper analyzes evolving security and safety threats in generative AI from content generation to agentic actions, noting that attack surfaces expand faster than defenses and that many safeguards require institution...