Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense, 2023

Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, Mohit Iyyer · 2023 · arXiv 2303.13408

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability

cs.CL · 2025-02-17 · unverdicted · novelty 7.0

ExaGPT uses span-level similarity retrieval from human and LLM datastores to detect machine-generated text while supplying the matching spans as human-interpretable evidence, achieving up to 37-point accuracy gains over prior interpretable detectors at 1% FPR.

Asking Back: Interaction-Layer Antidistillation Watermarks

cs.CR · 2026-05-15 · unverdicted · novelty 6.0

Interaction-layer antidistillation watermarks use system-prompt-induced behavioral markers like explicit follow-up questions that transfer to distilled student models at 45-89% relative fidelity and can be audited via black-box LLM-as-judge queries.

Paraphrasing Attack Resilience of Various AI-Generated Text Detection Methods

cs.LG · 2026-05-14 · unverdicted · novelty 5.0

Binoculars-inclusive ensembles detect AI text best overall but suffer the largest performance drops under paraphrasing attacks.

Findings of the Counter Turing Test: AI-Generated Text Detection

cs.CL · 2026-05-20

citing papers explorer

ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability · cs.CL · 2025-02-17 · unverdicted · none · ref 21
Asking Back: Interaction-Layer Antidistillation Watermarks · cs.CR · 2026-05-15 · unverdicted · none · ref 17
Paraphrasing Attack Resilience of Various AI-Generated Text Detection Methods · cs.LG · 2026-05-14 · unverdicted · none · ref 7
Findings of the Counter Turing Test: AI-Generated Text Detection · cs.CL · 2026-05-20 · unreviewed · ref 16
