Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense, 2023
4 Pith papers cite this work.
citing papers explorer
- ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability
  ExaGPT uses span-level similarity retrieval from human and LLM datastores to detect machine-generated text while supplying the matching spans as human-interpretable evidence, achieving up to 37-point accuracy gains over prior interpretable detectors at 1% FPR.
- Asking Back: Interaction-Layer Antidistillation Watermarks
  Interaction-layer antidistillation watermarks use system-prompt-induced behavioral markers like explicit follow-up questions that transfer to distilled student models at 45-89% relative fidelity and can be audited via black-box LLM-as-judge queries.
- Paraphrasing Attack Resilience of Various AI-Generated Text Detection Methods
  Binoculars-inclusive ensembles detect AI text best overall but suffer the largest performance drops under paraphrasing attacks.
- Findings of the Counter Turing Test: AI-Generated Text Detection