Can AI-Generated Text be Reliably Detected?

Aounon Kumar; Soheil Feizi; Sriram Balasubramanian; Vinu Sankar Sadasivan; Wenxiao Wang

arxiv: 2303.11156 · v4 · pith:HFJYPCLFnew · submitted 2023-03-17 · 💻 cs.CL · cs.AI · cs.LG

Vinu Sankar Sadasivan, Aounon Kumar, Sriram Balasubramanian, Wenxiao Wang, Soheil Feizi

Pith reviewed 2026-05-20 19:24 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.LG

keywords AI text detection · paraphrasing attacks · detector robustness · watermark spoofing · total variation distance

The pith

Recursive paraphrasing attacks substantially lower detection rates for current AI text detectors while preserving most text quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether AI-generated text can be reliably identified when an attacker actively tries to evade detection. It introduces a recursive paraphrasing method that rewrites passages multiple times with a language model to alter the statistical patterns detectors look for. Tests on roughly 300-token passages show large drops in accuracy across watermarking schemes, neural classifiers, zero-shot methods, and retrieval-based approaches, with only small losses in fluency and meaning. The authors also show how attackers can spoof watermarks to label human text as AI-generated and derive a theoretical link between the best possible detector performance and the distance between human and machine text distributions.

Core claim

Recursive paraphrasing reduces detection rates for a range of AI text detectors including watermark-based and neural-network methods while causing only minor degradation in text quality. The work further demonstrates that watermarked models are vulnerable to spoofing attacks that misclassify human text as AI-generated without white-box access, and it supplies a theoretical framework that relates the AUROC of the strongest detector to the Total Variation distance between the human and AI text distributions.

What carries the argument

The recursive paraphrasing attack, which iteratively rewrites text using a language model to disrupt detector-specific features such as watermarks or statistical signatures while aiming to keep semantic content and fluency intact.
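
The loop structure of the attack can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `recursive_paraphrase` and `toy_paraphrase` are hypothetical names, and the synonym table stands in for a real paraphrasing model, which the page does not specify.

```python
from typing import Callable, List


def recursive_paraphrase(
    text: str, paraphrase: Callable[[str], str], rounds: int
) -> List[str]:
    """Return [text, pp1, pp2, ...]; each entry paraphrases the previous one."""
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    versions = [text]
    for _ in range(rounds):
        # Feed the latest rewrite back into the paraphraser.
        versions.append(paraphrase(versions[-1]))
    return versions


# Hypothetical stand-in for a paraphrasing model: word-level synonym swaps.
_SYNONYMS = {"quick": "fast", "fast": "rapid", "big": "large", "large": "huge"}


def toy_paraphrase(text: str) -> str:
    return " ".join(_SYNONYMS.get(word, word) for word in text.split())


versions = recursive_paraphrase("the quick big dog", toy_paraphrase, rounds=2)
for i, version in enumerate(versions):
    print(f"round {i}: {version}")
```

Keeping every intermediate version, rather than only the final one, makes it possible to score each round with a detector and a quality metric and see where the trade-off sits.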

If this is right

Detectors that rely on fixed statistical or watermark features lose reliability once an attacker applies iterative rewriting.
Watermarking schemes in deployed models can be reverse-engineered enough to enable spoofing of human-written text.
The theoretical bound implies that detection performance is capped by how close AI distributions get to human distributions.
Practical systems must incorporate defenses that survive multiple rounds of paraphrasing rather than single-pass checks.
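
To make the cap concrete, the sketch below evaluates the closed form usually quoted for the paper's bound, AUROC ≤ 1/2 + TV − TV²/2. The page itself does not reproduce the formula, so treat the expression as an assumption to check against the original paper; `auroc_upper_bound` is a hypothetical helper name.

```python
def auroc_upper_bound(tv: float) -> float:
    """Upper bound on detector AUROC given total variation distance tv in [0, 1].

    Assumed closed form: 1/2 + tv - tv**2 / 2 (verify against the paper).
    """
    if not 0.0 <= tv <= 1.0:
        raise ValueError("total variation distance must lie in [0, 1]")
    return 0.5 + tv - tv * tv / 2.0


for tv in (0.0, 0.1, 0.5, 1.0):
    print(f"TV = {tv:.1f} -> AUROC <= {auroc_upper_bound(tv):.3f}")
```

At TV = 0 the bound equals 0.5, the AUROC of random guessing, and it rises to 1 only when the two distributions are fully separable.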

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Detection may shift from surface-level cues toward deeper semantic consistency checks that survive rewriting.
The arms race between generation and detection could require periodic retraining or new watermark designs that resist inference.
If models continue to close the distribution gap, the framework suggests reliable binary detection becomes impossible without additional side information.

Load-bearing premise

The paraphrased output stays close enough in meaning and readability to the original that it still counts as a realistic sample from the target distribution.

What would settle it

Run the recursive paraphrasing procedure on held-out passages and measure whether detection accuracy falls below 60 percent while human quality ratings or perplexity scores remain within 15 percent of the originals.
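
One way to turn this criterion into a check is sketched below. It assumes per-passage detector decisions and a scalar quality score (for example perplexity) measured before and after the attack; the function name, argument names, and defaults are hypothetical, with the 60 percent and 15 percent thresholds taken from the sentence above.

```python
from statistics import mean
from typing import Sequence


def attack_succeeds(
    detector_flags: Sequence[bool],
    quality_before: Sequence[float],
    quality_after: Sequence[float],
    max_detection_rate: float = 0.60,
    max_quality_shift: float = 0.15,
) -> bool:
    """True if detection is below the cap and mean quality moved by at most the allowed fraction."""
    # Fraction of paraphrased passages the detector still flags as AI-generated.
    detection_rate = mean(1.0 if flagged else 0.0 for flagged in detector_flags)
    before = mean(quality_before)
    after = mean(quality_after)
    # Relative change in the mean quality score (assumes a non-zero baseline).
    relative_shift = abs(after - before) / abs(before)
    return detection_rate < max_detection_rate and relative_shift <= max_quality_shift


result = attack_succeeds(
    detector_flags=[True, False, False, False],
    quality_before=[10.0, 12.0],
    quality_after=[11.0, 12.1],
)
print(result)
```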

Original abstract

Large Language Models (LLMs) perform impressively well in various applications. However, the potential for misuse of these models in activities such as plagiarism, generating fake news, and spamming has raised concern about their responsible use. Consequently, the reliable detection of AI-generated text has become a critical area of research. AI text detectors have shown to be effective under their specific settings. In this paper, we stress-test the robustness of these AI text detectors in the presence of an attacker. We introduce recursive paraphrasing attack to stress test a wide range of detection schemes, including the ones using the watermarking as well as neural network-based detectors, zero shot classifiers, and retrieval-based detectors. Our experiments conducted on passages, each approximately 300 tokens long, reveal the varying sensitivities of these detectors to our attacks. Our findings indicate that while our recursive paraphrasing method can significantly reduce detection rates, it only slightly degrades text quality in many cases, highlighting potential vulnerabilities in current detection systems in the presence of an attacker. Additionally, we investigate the susceptibility of watermarked LLMs to spoofing attacks aimed at misclassifying human-written text as AI-generated. We demonstrate that an attacker can infer hidden AI text signatures without white-box access to the detection method, potentially leading to reputational risks for LLM developers. Finally, we provide a theoretical framework connecting the AUROC of the best possible detector to the Total Variation distance between human and AI text distributions. This analysis offers insights into the fundamental challenges of reliable detection as language models continue to advance. Our code is publicly available at https://github.com/vinusankars/Reliability-of-AI-text-detectors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that current AI-generated text detectors, including watermarking, neural network-based, zero-shot, and retrieval-based methods, are vulnerable to a recursive paraphrasing attack. Experiments on ~300-token passages show that this attack significantly reduces detection rates while only slightly degrading text quality. The work also demonstrates spoofing attacks on watermarked LLMs without white-box access and provides a theoretical analysis connecting the AUROC of the optimal detector to the total variation distance between human and AI text distributions. Code is released publicly.

Significance. If the empirical findings are substantiated with fuller experimental details, the paper would highlight important practical limitations of AI text detectors, which is relevant for AI safety and content moderation. The theoretical connection to total variation distance offers a clean framing of fundamental detectability limits. Public code availability supports reproducibility and is a positive contribution.

major comments (2)

[Experiments] Experiments section (and abstract): the central empirical claim that recursive paraphrasing reduces detection rates while only slightly degrading quality is presented at a high level. The manuscript must specify the paraphraser model and training details, the exact number of recursion steps, the quality metric(s) used (e.g., perplexity, semantic similarity via embeddings, or human ratings), baseline comparisons to non-recursive paraphrasing or other attacks, and statistical significance testing across multiple runs or seeds.
[Attack Description] Attack description: the assumption that each recursive paraphrase step keeps the output distribution sufficiently close to the original AI (or human) distribution for the attack to be realistic is load-bearing but not quantitatively verified. The paper should report concrete measures such as embedding cosine distances, n-gram overlap, or perplexity shifts between original and recursively paraphrased text to confirm that quality degradation remains minor while detector features are altered.
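
Of the measures the referee lists, n-gram overlap is the simplest to compute without a model. The sketch below uses Jaccard similarity over word bigrams as one possible definition; the referee does not specify a formula, so both the metric choice and the names `ngrams` and `ngram_jaccard` are assumptions.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Set of contiguous n-grams in a token list."""
    return {tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)}


def ngram_jaccard(original: str, paraphrased: str, n: int = 2) -> float:
    """Jaccard similarity between the n-gram sets of two whitespace-tokenized texts."""
    a = ngrams(original.lower().split(), n)
    b = ngrams(paraphrased.lower().split(), n)
    if not a and not b:
        return 1.0  # Both texts are too short to form any n-gram.
    return len(a & b) / len(a | b)


print(ngram_jaccard("a b c d", "a b x d"))
```

A low overlap combined with a high embedding similarity would indicate that surface form changed while meaning was preserved, which is the pattern the attack relies on.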

minor comments (2)

[Abstract] Abstract: the phrase 'our recursive paraphrasing method' is introduced without a one-sentence definition; adding a brief characterization would improve standalone readability.
[Theoretical Analysis] Theoretical framework: while the AUROC–total variation link follows from standard definitions, the manuscript should explicitly note the assumptions on the support of the text distributions and whether the bound is tight or merely existential.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight areas where additional clarity and substantiation will strengthen the manuscript. We have revised the paper to address both major comments by expanding the experimental details and providing quantitative verification for the attack. Our point-by-point responses follow.

Referee: [Experiments] Experiments section (and abstract): the central empirical claim that recursive paraphrasing reduces detection rates while only slightly degrading quality is presented at a high level. The manuscript must specify the paraphraser model and training details, the exact number of recursion steps, the quality metric(s) used (e.g., perplexity, semantic similarity via embeddings, or human ratings), baseline comparisons to non-recursive paraphrasing or other attacks, and statistical significance testing across multiple runs or seeds.

Authors: We agree that the initial presentation was high-level and that fuller details are required to substantiate the central empirical claims. In the revised manuscript we now explicitly state the paraphraser model (GPT-3.5-turbo with a fixed paraphrasing prompt), training/inference details, the precise number of recursion steps used in the main experiments, the quality metrics employed (perplexity and embedding-based semantic similarity), direct comparisons against single-step paraphrasing baselines, and statistical significance testing (means and standard deviations over multiple random seeds with appropriate hypothesis tests). These additions appear in the Experiments section and the abstract has been updated accordingly.

revision: yes

Referee: [Attack Description] Attack description: the assumption that each recursive paraphrase step keeps the output distribution sufficiently close to the original AI (or human) distribution for the attack to be realistic is load-bearing but not quantitatively verified. The paper should report concrete measures such as embedding cosine distances, n-gram overlap, or perplexity shifts between original and recursively paraphrased text to confirm that quality degradation remains minor while detector features are altered.

Authors: We concur that explicit quantitative checks on distributional closeness are important for establishing the realism of the attack. The revised manuscript now includes a dedicated paragraph (with accompanying table) reporting embedding cosine distances, n-gram overlap statistics, and perplexity shifts between the original and recursively paraphrased texts. These measurements confirm that semantic and distributional fidelity is largely preserved while detector-evading features are modified, thereby supporting the practicality of the attack.

revision: yes

Circularity Check

0 steps flagged

Theoretical AUROC-TV link follows standard statistics; no circular reductions found

The paper's theoretical framework connects AUROC to total variation distance via standard definitions from hypothesis testing and distribution distances; this is a direct application of known results and does not reduce to any fitted parameter or self-referential definition within the paper. Recursive paraphrasing attacks and detector evaluations are presented as empirical experiments on ~300-token passages without any derivation chain that equates outputs to inputs by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are invoked for the central claims. The work is self-contained against external statistical benchmarks.
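
The "standard definitions" step can be written out as a short derivation. The notation is an assumption introduced here, not taken from the page: $\mathcal{M}$ and $\mathcal{H}$ denote the AI and human text distributions, $D$ a detector score, and $\gamma$ a threshold. The closed form at the end is the one usually quoted for the paper's bound and should be checked against the original.

```latex
% True and false positive rates of the thresholded detector
\mathrm{TPR}_\gamma = \Pr_{s \sim \mathcal{M}}[D(s) \ge \gamma], \qquad
\mathrm{FPR}_\gamma = \Pr_{s \sim \mathcal{H}}[D(s) \ge \gamma]

% Total variation bounds the gap in probability of any event
\mathrm{TPR}_\gamma - \mathrm{FPR}_\gamma \le \mathrm{TV}(\mathcal{M}, \mathcal{H})
\;\Longrightarrow\;
\mathrm{TPR}_\gamma \le \min\bigl(\mathrm{FPR}_\gamma + \mathrm{TV}(\mathcal{M}, \mathcal{H}),\, 1\bigr)

% Integrate the envelope over x = FPR, writing TV for TV(M, H)
\mathrm{AUROC}(D) \le \int_0^1 \min(x + \mathrm{TV},\, 1)\, dx
= \int_0^{1-\mathrm{TV}} (x + \mathrm{TV})\, dx + \mathrm{TV}
= \frac{1}{2} + \mathrm{TV} - \frac{\mathrm{TV}^2}{2}
```

The only distribution-specific input is the total variation distance, which is why the argument involves no fitted parameter.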

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based on the abstract alone, the central claims rest on standard assumptions about text distributions and detector optimality; no free parameters, invented entities, or ad-hoc axioms are explicitly introduced in the provided summary.

axioms (1)

domain assumption: Human and AI text can be modeled as two probability distributions whose total variation distance governs the best possible detector AUROC.
Invoked in the final theoretical framework section of the abstract.

Forward citations

Cited by 24 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Who Owns This Agent? Tracing AI Agents Back to Their Owners
cs.CR 2026-05 unverdicted novelty 8.0

A canary injection protocol for linking observed AI agent behavior to the responsible account at the hosting vendor, with robust variants for adversarial filtering.
Base Models Look Human To AI Detectors
cs.CL 2026-05 unverdicted novelty 7.0

Base model text evades AI detectors better than instruction-tuned text, and the HIP method strengthens this trade-off across model sizes.
PeerPrism: Peer Evaluation Expertise vs Review-writing AI
cs.CL 2026-04 unverdicted novelty 7.0

PeerPrism benchmark demonstrates that state-of-the-art LLM detectors conflate surface text style with intellectual contribution and fail on hybrid human-AI peer reviews.
LLM Agents Make Collective Belief Dynamics Programmable: Challenges and Research Directions
cs.MA 2026-05 unverdicted novelty 6.0

LLM agents make collective belief dynamics programmable, with simulations showing coordinated agents induce stable belief shifts, and four structural properties that complicate detection and defense.
High-Rate Public-Key Pseudorandom Codes for Edit Errors
cs.CR 2026-05 unverdicted novelty 6.0

First high-rate public-key binary PRCs for edit channels via reduction from Hamming-robust PRCs and alphabet-size constructions attaining near-Singleton rates.
The End of Trust: How Agentic AI Breaks Security Assumptions
cs.CR 2026-05 unverdicted novelty 6.0

Agentic AI eliminates the fidelity-scale tradeoff in deception, enabling the Infinite Impostor attack that hijacks trusted relationships at mass scale and requiring a shift to suspect-by-default security based on eval...
The Algorithmic Caricature: Auditing LLM-Generated Political Discourse Across Crisis Events
cs.CL 2026-05 unverdicted novelty 6.0

LLM-generated political discourse across crises is fluent yet caricatured: more negative, less emotionally varied, more structurally regular, and lexically abstract than observed online populations.
Process Matters more than Output for Distinguishing Humans from Machines
cs.AI 2026-05 unverdicted novelty 6.0

Process-level features from 30 cognitive tasks distinguish humans from frontier AI agents more effectively than task performance or output matching, achieving mean classifier AUC of 0.88, with fine-tuning experiments ...
Process Matters more than Output for Distinguishing Humans from Machines
cs.AI 2026-05 unverdicted novelty 6.0

A new battery of 30 cognitive tasks demonstrates that process-level behavioral features distinguish humans from frontier AI agents better than performance metrics (mean AUC 0.88), with process-specific fine-tuning imp...
Detecting Verbatim LLM Copy-Paste in Homework
cs.CR 2026-05 unverdicted novelty 6.0

SteganoPrompt embeds a hidden instruction in assignment prompts via the Unicode Tags block so that LLMs add a detectable signature to responses when the prompt is pasted verbatim.
Block-wise Codeword Embedding for Reliable Multi-bit Text Watermarking
cs.CR 2026-05 unverdicted novelty 6.0

BREW achieves TPR of 0.965 and FPR of 0.02 under 10% synonym substitution by shifting from ECC decoding to designated verification with block voting and local validation.
DSIPA: Detecting LLM-Generated Texts via Sentiment-Invariant Patterns Divergence Analysis
cs.CL 2026-04 unverdicted novelty 6.0

DSIPA is a zero-shot black-box detector that uses sentiment distribution consistency and preservation metrics to identify LLM text, reporting up to 49.89% F1 gains over baselines across domains and models.
Beyond A Fixed Seal: Adaptive Stealing Watermark in Large Language Models
cs.CR 2026-04 unverdicted novelty 6.0

Adaptive Stealing improves watermark theft efficiency from LLMs via Position-Based Seal Construction and Adaptive Selection modules that dynamically choose optimal attack perspectives.
Why AI-Generated Text Detection Fails: Evidence from Explainable AI Beyond Benchmark Accuracy
cs.CL 2026-03 conditional novelty 6.0

AI-generated text detectors achieve high benchmark accuracy by exploiting unstable dataset-specific linguistic features, as evidenced by cross-domain degradation and differing SHAP explanations across corpora.
Privacy-Preserving Proof of Human Authorship via Zero-Knowledge Process Attestation
cs.CR 2026-02 unverdicted novelty 6.0

ZK-PoP uses Groth16 proofs, Pedersen commitments, and Bulletproof range proofs to attest that behavioral feature vectors and content evolution match human patterns without exposing the raw data.
Detecting Cognitive Signatures in Typing Behavior for Non-Intrusive Authorship Verification
cs.CR 2026-02 unverdicted novelty 6.0

Cognitive Load Correlation from keystroke timings distinguishes genuine human composition from mechanical transcription with estimated 85-95% accuracy in a non-intrusive framework.
RedNote-Vibe: A Dataset for Capturing Temporal Dynamics of AI-Generated Text in Lifestyle Social Media
cs.CL 2025-09 unverdicted novelty 6.0

RedNote-Vibe supplies a longitudinal dataset of AI versus human lifestyle posts from 2020 to mid-2025 plus the PLAD detection framework that applies cognitive psychology signatures for improved AI-text identification.
Multi-Level Contextual Token Relation Modeling for Machine-Generated Text Detection
cs.CL 2026-05 unverdicted novelty 5.0

A multi-level framework that models local and global relations among token detection scores to improve machine-generated text detection with low overhead.
Chainwash: Multi-Step Rewriting Attacks on Diffusion Language Model Watermarks
cs.CL 2026-05 unverdicted novelty 5.0

Chained rewrites by open-weight LLMs reduce watermark detection on diffusion LM outputs from 87.9% to 4.86% after five steps across multiple styles and models.
"Don't Be Afraid, Just Learn": Insights from Industry Practitioners to Prepare Software Engineers in the Age of Generative AI
cs.SE 2026-04 unverdicted novelty 5.0

Industry practitioners indicate that generative AI heightens demand for prompting and output evaluation skills while reinforcing the value of problem-solving, critical thinking, architecture design, and debugging in s...
Mitigating Watermark Forgery in Generative Models via Randomized Key Selection
cs.CR 2025-07 unverdicted novelty 5.0

Randomized per-query key selection with single-key detection acceptance bounds forgery success rate independently of collected samples while preserving model utility.
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
cs.CL 2023-11 unverdicted novelty 5.0

The paper surveys hallucination in LLMs with an innovative taxonomy, factors, detection methods, benchmarks, mitigation strategies, and open research directions.
Human-Provenance Verification should be Treated as Labor Infrastructure in AI-Saturated Markets
cs.CY 2026-05 unverdicted novelty 4.0

AI-saturated markets will produce premiums for verified human presence in labor, requiring governance to treat human-provenance verification as infrastructure rather than optional authenticity labels.
From AI-Generated Content to Agentic Action: Security and Safety Threats in Generative AI
cs.CR 2026-05 unverdicted novelty 3.0

The paper analyzes evolving security and safety threats in generative AI from content generation to agentic actions, noting that attack surfaces expand faster than defenses and that many safeguards require institution...

Reference graph

Works this paper leans on

105 extracted references · 105 canonical work pages · cited by 23 Pith papers · 15 internal anchors

[1]

My ai safety lecture for ut effective altruism

Scott Aaronson. My ai safety lecture for ut effective altruism. November 2022. URL https://scottaaronson.blog/?p=6823

work page 2022
[2]

Generating sentiment-preserving fake online reviews using neural language models and their human-and machine-based detection

David Ifeoluwa Adelani, Haotian Mai, Fuming Fang, Huy H Nguyen, Junichi Yamagishi, and Isao Echizen. Generating sentiment-preserving fake online reviews using neural language models and their human-and machine-based detection. In Advanced Information Networking and Applications: Proceedings of the 34th International Conference on Advanced Information Netw...

work page 2020
[3]

Ai plagiarism detection software keeps falsely accusing students of cheating

Noor Al-Sibai. Ai plagiarism detection software keeps falsely accusing students of cheating. Futurism, 2023. URL https://futurism.com/ai-plagiarism-software-false-accusing-students

work page 2023
[4]

Natural language watermarking: Design, analysis, and a proof-of-concept implementation

Mikhail J Atallah, Victor Raskin, Michael Crogan, Christian Hempelmann, Florian Kerschbaum, Dina Mohamed, and Sanket Naik. Natural language watermarking: Design, analysis, and a proof-of-concept implementation. In Information Hiding: 4th International Workshop, IH 2001 Pittsburgh, PA, USA, April 25--27, 2001 Proceedings 4, pp.\ 185--200. Springer, 2001

work page 2001
[6]

Comparison of two pseudo-random number generators

Lenore Blum, Manuel Blum, and Mike Shub. Comparison of two pseudo-random number generators. In Advances in Cryptology: Proceedings of CRYPTO '82, pp.\ 61--78. Plenum, 1982

work page 1982
[7]

How to generate cryptographically strong sequences of pseudorandom bits

Manuel Blum and Silvio Micali. How to generate cryptographically strong sequences of pseudorandom bits. SIAM Journal on Computing, 13 0 (4): 0 850--864, 1984. doi:10.1137/0213053. URL https://doi.org/10.1137/0213053

work page doi:10.1137/0213053 1984
[8]

Language models are few-shot learners

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33: 0 1877--1901, 2020

work page 1901
[9]

On the possibilities of ai-generated text detection, 2023

Souradip Chakraborty, Amrit Singh Bedi, Sicheng Zhu, Bang An, Dinesh Manocha, and Furong Huang. On the possibilities of ai-generated text detection, 2023

work page 2023
[10]

Cnet secretly used ai on articles that didn’t disclose that fact, staff say

Jon Christian. Cnet secretly used ai on articles that didn’t disclose that fact, staff say. January 2023. URL https://futurism.com/cnet-ai-articles-label

work page 2023
[11]

Parrot: Paraphrase generation for nlu., 2021

Prithiviraj Damodaran. Parrot: Paraphrase generation for nlu., 2021

work page 2021
[12]

Turn-it-in: Ai fails students for not using ai

Mehul Reuben Das. Turn-it-in: Ai fails students for not using ai. Firstpost, 2023. URL https://www.firstpost.com/world/plagiarism-detector-turnitin-keeps-falsely-accusing-students-of-cheating-using-ai-12704662.html

work page 2023
[14]

Geoffrey A. Fowler. We tested a new chatgpt-detector for teachers. it flagged an innocent student. The Washington Post, 2023. URL https://www.washingtonpost.com/technology/2023/04/01/chatgpt-cheating-detection-turnitin/

work page 2023
[15]

Llm detectors still fall short of real world: Case of llm-generated short news-like posts, 2024

Henrique Da Silva Gameiro, Andrei Kucharavy, and Ljiljana Dolamic. Llm detectors still fall short of real world: Case of llm-generated short news-like posts, 2024. URL https://arxiv.org/abs/2409.03291

work page arXiv 2024
[18]

On the learnability of watermarks for language models, 2024

Chenchen Gu, Xiang Lisa Li, Percy Liang, and Tatsunori Hashimoto. On the learnability of watermarks for language models, 2024. URL https://arxiv.org/abs/2312.04469

work page arXiv 2024
[19]

Accused of cheating by an algorithm, and a professor she had never met

Kashmir Hill. Accused of cheating by an algorithm, and a professor she had never met. The New York Times, 2022. URL https://www.nytimes.com/2022/05/27/technology/college-students-cheating-software-honorlock.html

work page 2022
[23]

kafkai: Ai writer & ai content generator

Kafkai. “kafkai: Ai writer & ai content generator”. 2020. URL https://kafkai.com/

work page 2020
[26]

On the reliability of watermarks for large language models, 2023 b

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, and Tom Goldstein. On the reliability of watermarks for large language models, 2023 b

work page 2023
[27]

Paraphrasing evades detectors of ai-generated text, but retrieval is an effective defense, 2023

Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, and Mohit Iyyer. Paraphrasing evades detectors of ai-generated text, but retrieval is an effective defense, 2023

work page 2023
[28]

Robust distortion-free watermarks for language models,

Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. Robust distortion-free watermarks for language models, 2024. URL https://arxiv.org/abs/2307.15593

work page arXiv 2024
[30]

Mage: Machine-generated text detection in the wild, 2024

Yafu Li, Qintong Li, Leyang Cui, Wei Bi, Zhilin Wang, Longyue Wang, Linyi Yang, Shuming Shi, and Yue Zhang. Mage: Machine-generated text detection in the wild, 2024. URL https://arxiv.org/abs/2305.13242

work page arXiv 2024
[31]

Gpt detectors are biased against non-native english writers, 2023

Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu, and James Zou. Gpt detectors are biased against non-native english writers, 2023

work page 2023
[36]

Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization

Shashi Narayan, Shay B. Cohen, and Mirella Lapata. Don't give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization. ArXiv, abs/1808.08745, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[37]

Gpt-2: 1.5b release

OpenAI. Gpt-2: 1.5b release. November 2019. URL https://openai.com/research/gpt-2-1-5b-release

work page 2019
[38]

Chatgpt: Optimizing language models for dialogue

OpenAI. Chatgpt: Optimizing language models for dialogue. November 2022. URL https://openai.com/blog/chatgpt/

work page 2022
[39]

Gpt-4 technical report

OpenAI. Gpt-4 technical report. March 2023. URL https://cdn.openai.com/papers/gpt-4.pdf

work page 2023
[41]

Professor freezes student grades after chatgpt claimed ai wrote their papers

Katyanna Quach. Professor freezes student grades after chatgpt claimed ai wrote their papers. The Register, 2023. URL https://www.theregister.com/2023/05/17/university_chatgpt_grades/

work page 2023
[42]

Language models are unsupervised multitask learners

Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019

work page 2019
[43]

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer, 2019. URL https://arxiv.org/abs/1910.10683

work page internal anchor Pith review Pith/arXiv arXiv 2019
[44]

SQuAD: 100,000+ Questions for Machine Comprehension of Text

Pranav Rajpurkar , Jian Zhang , Konstantin Lopyrev , and Percy Liang . SQuAD: 100,000+ Questions for Machine Comprehension of Text . arXiv e-prints, art. arXiv:1606.05250, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[45]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj \"o rn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.\ 10684--10695, 2022

work page 2022
[49]

Data-driven cyberattack synthesis against network control systems, 2022

Omanshu Thapliyal and Inseok Hwang. Data-driven cyberattack synthesis against network control systems, 2022

work page 2022
[51]

Improved certified defenses against data poisoning with (deterministic) finite aggregation

Wenxiao Wang, Alexander J Levine, and Soheil Feizi. Improved certified defenses against data poisoning with (deterministic) finite aggregation. In International Conference on Machine Learning, pp.\ 22769--22783. PMLR, 2022

work page 2022
[52]

M4: Multi-generator, multi-domain, and multi-lingual black-box machine-generated text detection, 2023

Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Chenxi Whitehouse, Osama Mohammed Afzal, Tarek Mahmoud, Alham Fikri Aji, and Preslav Nakov. M4: Multi-generator, multi-domain, and multi-lingual black-box machine-generated text detection, 2023

work page 2023
[53]

Deepfake bot submissions to federal public comment websites cannot be distinguished from human submissions

Max Weiss. Deepfake bot submissions to federal public comment websites cannot be distinguished from human submissions. Technology Science, 2019121801, 2019

work page 2019
[54]

Linguistic steganography on twitter: hierarchical language modeling with manual interaction

Alex Wilson, Phil Blunsom, and Andrew D Ker. Linguistic steganography on twitter: hierarchical language modeling with manual interaction. In Media Watermarking, Security, and Forensics 2014, volume 9028, pp.\ 9--25. SPIE, 2014

work page 2014
[56]

Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, and Boaz Barak

Hanlin Zhang, Benjamin L. Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, and Boaz Barak. Watermarks in the sand: Impossibility of strong watermarking for generative models, 2024. URL https://arxiv.org/abs/2311.04378

work page arXiv 2024
[57]

OPT: Open Pre-trained Transformer Language Models

Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, and Luke Zettlemoyer. Opt: Open pre-trained transformer language models, 2022. URL https://arxiv.org/ab...

work page internal anchor Pith review Pith/arXiv arXiv 2022
[58]

Provable robust watermarking for ai-generated text,

Xuandong Zhao, Prabhanjan Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for ai-generated text, 2023 a . URL https://arxiv.org/abs/2306.17439

work page arXiv 2023
[60]

2022 , eprint=

Data-driven Cyberattack Synthesis against Network Control Systems , author=. 2022 , eprint=

work page 2022
[61]

The Quality of the Covariance Selection Through Detection Problem and AUC Bounds

Navid Tafaghodi Khajavi and Anthony Kuh , title =. CoRR , volume =. 2016 , url =. 1605.05776 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv 2016
[62]

Can AI-Generated Text be Reliably Detected?

Can ai-generated text be reliably detected? , author=. arXiv preprint arXiv:2303.11156 , year=

[63]

On the Use of ArXiv as a Dataset

On the Use of ArXiv as a Dataset, 2019

[64]

A Watermark for Large Language Models

A Watermark for Large Language Models. arXiv preprint arXiv:2301.10226

[65]

Data poisoning won't save you from facial recognition

Data poisoning won't save you from facial recognition. arXiv preprint arXiv:2106.14851

[66]

On the Discredibility of Membership Inference Attacks

Shahbaz Rezaei and Xin Liu. On the Discredibility of Membership Inference Attacks, 2022. doi:10.48550/ARXIV.2212.02701

[67]

OPT: Open Pre-trained Transformer Language Models

Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, and Luke Zettlemoyer. ...

[68]

Prithiviraj Damodaran

[69]

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019. doi:10.48550/ARXIV.1910.10683

[70]

Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization

Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization. ArXiv

[71]

Language Models are Unsupervised Multitask Learners

[72]

Markov chains and mixing times

Markov chains and mixing times, 2017

[73]

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization, 2019

[74]

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

Photorealistic text-to-image diffusion models with deep language understanding. arXiv preprint arXiv:2205.11487

[75]

High-resolution image synthesis with latent diffusion models

High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

[76]

Language models are few-shot learners

Language models are few-shot learners. Advances in neural information processing systems

[77]

ChatGPT: Optimizing Language Models for Dialogue

[78]

GPT-4 Technical Report

[79]

CNET secretly used AI on articles that didn’t disclose that fact, staff say

[80]

GPT-2: 1.5B release

[81]

Automatic detection of machine generated text: A critical survey

Automatic detection of machine generated text: A critical survey. arXiv preprint arXiv:2011.01314

[82]

Deepfake bot submissions to federal public comment websites cannot be distinguished from human submissions

Deepfake bot submissions to federal public comment websites cannot be distinguished from human submissions. Technology Science

[83]

Generating sentiment-preserving fake online reviews using neural language models and their human-and machine-based detection

Generating sentiment-preserving fake online reviews using neural language models and their human-and machine-based detection. In Advanced Information Networking and Applications: Proceedings of the 34th International Conference on Advanced Information Networking and Applications (AINA-2020), 2020

[84]

DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature

DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature. arXiv preprint arXiv:2301.11305

[85]

RoBERTa: A Robustly Optimized BERT Pretraining Approach

RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692

[86]

Real or fake? learning to discriminate machine from human generated text

Real or fake? learning to discriminate machine from human generated text. arXiv preprint arXiv:1906.03351

[87]

TweepFake: About detecting deepfake tweets

TweepFake: About detecting deepfake tweets. arXiv preprint arXiv:2008.00036

[88]

Release Strategies and the Social Impacts of Language Models

Release strategies and the social impacts of language models. arXiv preprint arXiv:1908.09203

[89]

GLTR: Statistical Detection and Visualization of Generated Text

GLTR: Statistical detection and visualization of generated text. arXiv preprint arXiv:1906.04043

[90]

Automatic detection of generated text is easiest when humans are fooled

Automatic detection of generated text is easiest when humans are fooled. arXiv preprint arXiv:1911.00650

[91]

Explaining and Harnessing Adversarial Examples

Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572

[92]

CUDA: Convolution-based Unlearnable Datasets

CUDA: Convolution-based Unlearnable Datasets. arXiv preprint arXiv:2303.04278

[93]

Improved certified defenses against data poisoning with (deterministic) finite aggregation

Improved certified defenses against data poisoning with (deterministic) finite aggregation. In International Conference on Machine Learning, 2022

[94]

Certifying model accuracy under distribution shifts

Certifying model accuracy under distribution shifts. arXiv preprint arXiv:2201.12440

[95]

Protecting Language Generation Models via Invisible Watermarking

Protecting Language Generation Models via Invisible Watermarking. arXiv preprint arXiv:2302.03162

[96]

Linguistic steganography on twitter: hierarchical language modeling with manual interaction

Linguistic steganography on twitter: hierarchical language modeling with manual interaction. In Media Watermarking, Security, and Forensics 2014, 2014

[97]

Natural language watermarking: Design, analysis, and a proof-of-concept implementation

Natural language watermarking: Design, analysis, and a proof-of-concept implementation. In Information Hiding: 4th International Workshop, IH 2001 Pittsburgh, PA, USA, April 25–27, 2001 Proceedings 4, 2001

[98]

Regulating ChatGPT and other Large Generative AI Models

Philipp Hacker, Andreas Engel, and Marco Mauer. Regulating ChatGPT and other Large Generative AI Models

[99]

Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense

Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense, 2023

[100]

On the Possibilities of AI-Generated Text Detection

On the Possibilities of AI-Generated Text Detection, 2023

[101]

My AI Safety Lecture for UT Effective Altruism

[1] [1]

My ai safety lecture for ut effective altruism

Scott Aaronson. My ai safety lecture for ut effective altruism. November 2022. URL https://scottaaronson.blog/?p=6823

[2] [2]

Generating sentiment-preserving fake online reviews using neural language models and their human-and machine-based detection

David Ifeoluwa Adelani, Haotian Mai, Fuming Fang, Huy H Nguyen, Junichi Yamagishi, and Isao Echizen. Generating sentiment-preserving fake online reviews using neural language models and their human-and machine-based detection. In Advanced Information Networking and Applications: Proceedings of the 34th International Conference on Advanced Information Netw...

[3] [3]

Ai plagiarism detection software keeps falsely accusing students of cheating

Noor Al-Sibai. Ai plagiarism detection software keeps falsely accusing students of cheating. Futurism, 2023. URL https://futurism.com/ai-plagiarism-software-false-accusing-students

[4] [4]

Natural language watermarking: Design, analysis, and a proof-of-concept implementation

Mikhail J Atallah, Victor Raskin, Michael Crogan, Christian Hempelmann, Florian Kerschbaum, Dina Mohamed, and Sanket Naik. Natural language watermarking: Design, analysis, and a proof-of-concept implementation. In Information Hiding: 4th International Workshop, IH 2001 Pittsburgh, PA, USA, April 25–27, 2001 Proceedings 4, pp. 185–200. Springer, 2001

[5] [6]

Comparison of two pseudo-random number generators

Lenore Blum, Manuel Blum, and Mike Shub. Comparison of two pseudo-random number generators. In Advances in Cryptology: Proceedings of CRYPTO '82, pp. 61–78. Plenum, 1982

[6] [7]

How to generate cryptographically strong sequences of pseudorandom bits

Manuel Blum and Silvio Micali. How to generate cryptographically strong sequences of pseudorandom bits. SIAM Journal on Computing, 13(4):850–864, 1984. doi:10.1137/0213053. URL https://doi.org/10.1137/0213053

[7] [8]

Language models are few-shot learners

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020

[8] [9]

On the possibilities of ai-generated text detection

Souradip Chakraborty, Amrit Singh Bedi, Sicheng Zhu, Bang An, Dinesh Manocha, and Furong Huang. On the possibilities of ai-generated text detection, 2023

[9] [10]

Cnet secretly used ai on articles that didn’t disclose that fact, staff say

Jon Christian. Cnet secretly used ai on articles that didn’t disclose that fact, staff say. January 2023. URL https://futurism.com/cnet-ai-articles-label

[10] [11]

Parrot: Paraphrase generation for nlu

Prithiviraj Damodaran. Parrot: Paraphrase generation for nlu, 2021

[11] [12]

Turn-it-in: Ai fails students for not using ai

Mehul Reuben Das. Turn-it-in: Ai fails students for not using ai. Firstpost, 2023. URL https://www.firstpost.com/world/plagiarism-detector-turnitin-keeps-falsely-accusing-students-of-cheating-using-ai-12704662.html

[12] [14]

Geoffrey A. Fowler. We tested a new chatgpt-detector for teachers. it flagged an innocent student. The Washington Post, 2023. URL https://www.washingtonpost.com/technology/2023/04/01/chatgpt-cheating-detection-turnitin/

[13] [15]

Llm detectors still fall short of real world: Case of llm-generated short news-like posts

Henrique Da Silva Gameiro, Andrei Kucharavy, and Ljiljana Dolamic. Llm detectors still fall short of real world: Case of llm-generated short news-like posts, 2024. URL https://arxiv.org/abs/2409.03291

[14] [18]

On the learnability of watermarks for language models

Chenchen Gu, Xiang Lisa Li, Percy Liang, and Tatsunori Hashimoto. On the learnability of watermarks for language models, 2024. URL https://arxiv.org/abs/2312.04469

[15] [19]

Accused of cheating by an algorithm, and a professor she had never met

Kashmir Hill. Accused of cheating by an algorithm, and a professor she had never met. The New York Times, 2022. URL https://www.nytimes.com/2022/05/27/technology/college-students-cheating-software-honorlock.html

[16] [23]

kafkai: Ai writer & ai content generator

Kafkai. “kafkai: Ai writer & ai content generator”. 2020. URL https://kafkai.com/

[17] [26]

On the reliability of watermarks for large language models

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, and Tom Goldstein. On the reliability of watermarks for large language models, 2023b

[18] [27]

Paraphrasing evades detectors of ai-generated text, but retrieval is an effective defense

Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, and Mohit Iyyer. Paraphrasing evades detectors of ai-generated text, but retrieval is an effective defense, 2023

[19] [28]

Robust distortion-free watermarks for language models

Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. Robust distortion-free watermarks for language models, 2024. URL https://arxiv.org/abs/2307.15593

[20] [30]

Mage: Machine-generated text detection in the wild

Yafu Li, Qintong Li, Leyang Cui, Wei Bi, Zhilin Wang, Longyue Wang, Linyi Yang, Shuming Shi, and Yue Zhang. Mage: Machine-generated text detection in the wild, 2024. URL https://arxiv.org/abs/2305.13242

[21] [31]

Gpt detectors are biased against non-native english writers

Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu, and James Zou. Gpt detectors are biased against non-native english writers, 2023

[22] [36]

Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization

Shashi Narayan, Shay B. Cohen, and Mirella Lapata. Don't give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization. ArXiv, abs/1808.08745, 2018

[23] [37]

Gpt-2: 1.5b release

OpenAI. Gpt-2: 1.5b release. November 2019. URL https://openai.com/research/gpt-2-1-5b-release

[24] [38]

Chatgpt: Optimizing language models for dialogue

OpenAI. Chatgpt: Optimizing language models for dialogue. November 2022. URL https://openai.com/blog/chatgpt/

[25] [39]

Gpt-4 technical report

OpenAI. Gpt-4 technical report. March 2023. URL https://cdn.openai.com/papers/gpt-4.pdf

[26] [41]

Professor freezes student grades after chatgpt claimed ai wrote their papers

Katyanna Quach. Professor freezes student grades after chatgpt claimed ai wrote their papers. The Register, 2023. URL https://www.theregister.com/2023/05/17/university_chatgpt_grades/

[27] [42]

Language models are unsupervised multitask learners

Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019

[28] [43]

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer, 2019. URL https://arxiv.org/abs/1910.10683

[29] [44]

SQuAD: 100,000+ Questions for Machine Comprehension of Text

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. SQuAD: 100,000+ Questions for Machine Comprehension of Text. arXiv e-prints, art. arXiv:1606.05250, 2016

[30] [45]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684–10695, 2022

[31] [49]

Data-driven cyberattack synthesis against network control systems

Omanshu Thapliyal and Inseok Hwang. Data-driven cyberattack synthesis against network control systems, 2022

[32] [51]

Improved certified defenses against data poisoning with (deterministic) finite aggregation

Wenxiao Wang, Alexander J Levine, and Soheil Feizi. Improved certified defenses against data poisoning with (deterministic) finite aggregation. In International Conference on Machine Learning, pp. 22769–22783. PMLR, 2022

[33] [52]

M4: Multi-generator, multi-domain, and multi-lingual black-box machine-generated text detection

Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Chenxi Whitehouse, Osama Mohammed Afzal, Tarek Mahmoud, Alham Fikri Aji, and Preslav Nakov. M4: Multi-generator, multi-domain, and multi-lingual black-box machine-generated text detection, 2023

[34] [53]

Deepfake bot submissions to federal public comment websites cannot be distinguished from human submissions

Max Weiss. Deepfake bot submissions to federal public comment websites cannot be distinguished from human submissions. Technology Science, 2019121801, 2019
