ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability
Pith reviewed 2026-05-23 03:24 UTC · model grok-4.3
The pith
ExaGPT detects LLM-generated texts by finding which category their spans resemble more in a datastore of examples.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ExaGPT classifies a text by checking whether its spans are more similar to spans from human-written texts or from LLM-generated texts in a datastore. For each span in the text, the approach can surface the similar span examples that contributed to the decision as evidence. In the human evaluation, these examples help users judge whether a decision is correct more effectively than the evidence offered by existing interpretable methods. Experiments across four domains and three generators show that ExaGPT outperforms prior interpretable detectors by up to +37.0 points of accuracy at a false positive rate of 1%.
What carries the argument
Span similarity search against a balanced datastore of human and LLM texts, which retrieves matching examples to serve as interpretable evidence for each span's classification.
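A minimal sketch of the retrieve-and-vote idea, not the authors' implementation. It assumes spans are already embedded as vectors, uses k-NN search with cosine similarity (the quoted paper passage further down mentions both), and aggregates by averaging the fraction of LLM-labeled neighbors per span. That aggregation rule, the 0.5 cutoff, and the names `cosine`, `nearest_spans`, and `classify_text` are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_spans(query, datastore, k):
    """Return the k datastore entries most similar to the query vector.

    Each datastore entry is a (vector, label, span_text) tuple, where
    label is "human" or "llm".
    """
    ranked = sorted(datastore, key=lambda e: cosine(query, e[0]), reverse=True)
    return ranked[:k]


def classify_text(span_vectors, datastore, k=3):
    """Label a text from its span vectors and return the evidence.

    Assumption: the text-level score is the mean, over spans, of the
    fraction of retrieved neighbors labeled "llm".
    """
    llm_fractions = []
    evidence = []
    for query in span_vectors:
        neighbors = nearest_spans(query, datastore, k)
        llm_count = sum(1 for _, label, _ in neighbors if label == "llm")
        llm_fractions.append(llm_count / len(neighbors))
        # The retrieved span texts are what a user would inspect.
        evidence.append([text for _, _, text in neighbors])
    score = sum(llm_fractions) / len(llm_fractions)
    label = "llm" if score > 0.5 else "human"
    return label, score, evidence
```

The returned `evidence` list holds, for each input span, the texts of its nearest datastore spans, which is the part a reader would use to check the decision.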
If this is right
- Accuracy at a 1% false positive rate rises by as much as 37 points over previous interpretable detectors.
- Users receive concrete similar-span examples that help them assess whether each decision is correct.
- The detection process aligns more closely with intuitive human methods for checking text origin.
- Risks from incorrect detections decrease in settings such as education and content moderation.
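The headline metric, accuracy at a fixed false positive rate, can be sketched as follows. This assumes that a higher detector score means "more LLM-like" and that a false positive is a human-written text flagged as LLM-generated; the function name `accuracy_at_fpr` and the strict `>` thresholding rule are illustrative choices, not taken from the paper.

```python
def accuracy_at_fpr(scores, labels, target_fpr=0.01):
    """Accuracy when the threshold is set so that FPR <= target_fpr.

    scores: detector scores, higher means more LLM-like (assumption).
    labels: 1 for LLM-generated, 0 for human-written.
    Returns (accuracy, threshold).
    """
    human_scores = sorted(
        (s for s, y in zip(scores, labels) if y == 0), reverse=True
    )
    if not human_scores:
        raise ValueError("need at least one human-written example")
    # Number of human texts we are allowed to misflag.
    allowed = min(int(target_fpr * len(human_scores)), len(human_scores) - 1)
    # Flagging only scores strictly above this value misflags at most
    # `allowed` human texts.
    threshold = human_scores[allowed]
    predictions = [1 if s > threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels), threshold
```

With only a few human-written examples, a 1% budget allows no false positives at all, so the threshold sits at the highest human score; the metric is only informative on reasonably large evaluation sets.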
Where Pith is reading between the lines
- If the datastore is kept current with new generators, the method could maintain performance as LLMs evolve.
- Similar example-based techniques might improve interpretability in other text classification tasks.
- Combining this with other signals could further strengthen both accuracy and user trust.
Load-bearing premise
The datastore must contain enough representative human-written and LLM-generated texts so that span similarity reliably signals the true source.
What would settle it
A test on new LLM-generated texts absent from the datastore where accuracy falls below that of existing interpretable methods, or where humans rate the provided span examples as less helpful than other evidence.
Original abstract
Detecting texts generated by Large Language Models (LLMs) could cause grave mistakes due to incorrect decisions, such as undermining students' academic dignity. LLM text detection thus needs to ensure the interpretability of the decision, which can help users judge how reliably correct its prediction is. When humans verify whether a text is human-written or LLM-generated, they intuitively investigate which of them it shares more similar spans with. However, existing interpretable detectors are not aligned with the human decision-making process and fail to offer evidence that users easily understand. To bridge this gap, we introduce ExaGPT, an interpretable detection approach grounded in the human decision-making process for verifying the origin of a text. ExaGPT identifies a text by checking whether it shares more similar spans with human-written vs. with LLM-generated texts from a datastore. This approach can provide similar span examples that contribute to the decision for each span in the text as evidence. Our human evaluation demonstrates that providing similar span examples contributes more effectively to judging the correctness of the decision than existing interpretable methods. Moreover, extensive experiments in four domains and three generators show that ExaGPT massively outperforms prior interpretable detectors by up to +37.0 points of accuracy at a false positive rate of 1%.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ExaGPT, an interpretable detector for LLM-generated text that identifies origin by comparing input spans against a datastore of human-written and LLM-generated texts and supplies the nearest-neighbor spans as evidence. It claims up to +37 accuracy points at 1% FPR over prior interpretable methods across four domains and three generators, plus a human study showing that the span examples improve users' ability to judge decision correctness.
Significance. If datastore construction and evaluation controls are validated, the approach could advance interpretable detection by aligning more closely with human verification processes than existing methods, which is relevant for high-stakes applications requiring explainable decisions.
major comments (2)
- [Datastore construction] Datastore construction section: the reported accuracy gains and human-interpretability claims rest on the assumption that span similarity to the datastore reliably tracks origin, yet no ablations are supplied on datastore size, class balance, domain coverage, generator diversity, or leakage between datastore and test splits; this premise is load-bearing for the +37-point claim at 1% FPR.
- [Experiments] Experiments section: the manuscript states large accuracy gains and a positive human study, but supplies neither baseline implementation details, statistical significance tests, nor variance across runs, making it impossible to confirm whether the gains reflect fair comparisons.
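The leakage concern in the first major comment can be checked mechanically. The sketch below is one possible near-duplicate check, assuming word-level 3-gram shingles and a Jaccard threshold of 0.8; both values and the names `shingles` and `find_leaks` are hypothetical and are not described in the paper.

```python
def shingles(text, n=3):
    """Set of lowercase word n-grams in a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def find_leaks(datastore_texts, test_texts, n=3, threshold=0.8):
    """Return (test_index, datastore_index) pairs that look like leaks.

    A pair is reported when the Jaccard overlap of the two shingle sets
    reaches the threshold. Only the first match per test text is kept.
    """
    store = [shingles(t, n) for t in datastore_texts]
    leaks = []
    for test_idx, text in enumerate(test_texts):
        query = shingles(text, n)
        if not query:
            continue  # text shorter than n words, nothing to compare
        for store_idx, candidate in enumerate(store):
            union = query | candidate
            if union and len(query & candidate) / len(union) >= threshold:
                leaks.append((test_idx, store_idx))
                break
    return leaks
```

This pairwise loop is quadratic in the number of texts, so a large datastore would need an index such as MinHash; that substitution is outside what the review discusses.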
minor comments (2)
- [Methods] Clarify the exact similarity function and aggregation rule used for the majority-vote decision in the methods description.
- [Human evaluation] The human-evaluation protocol (number of participants, task wording, and statistical comparison to baselines) should be expanded for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to incorporate additional experiments and reporting details that strengthen the claims.
Point-by-point responses
-
Referee: [Datastore construction] Datastore construction section: the reported accuracy gains and human-interpretability claims rest on the assumption that span similarity to the datastore reliably tracks origin, yet no ablations are supplied on datastore size, class balance, domain coverage, generator diversity, or leakage between datastore and test splits; this premise is load-bearing for the +37-point claim at 1% FPR.
Authors: We agree that explicit ablations would further substantiate the core premise. The datastore in our work is built from large-scale, domain-matched human and LLM corpora with explicit no-leakage splits (detailed in Section 3.2), and the multi-domain, multi-generator results already provide indirect evidence of robustness. In revision we will add targeted ablations on size, balance, and leakage to directly support the reported gains.
Revision: yes
-
Referee: [Experiments] Experiments section: the manuscript states large accuracy gains and a positive human study, but supplies neither baseline implementation details, statistical significance tests, nor variance across runs, making it impossible to confirm whether the gains reflect fair comparisons.
Authors: We accept that the current reporting is insufficient for full reproducibility and verification. Baselines were reimplemented following the original papers' descriptions and evaluated under identical protocols; we will expand the appendix with full implementation details, add statistical significance tests (e.g., McNemar or paired t-tests) between ExaGPT and baselines, and report mean and standard deviation across five random seeds for all metrics.
Revision: yes
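The reporting promised in this response can be illustrated with a short sketch. It assumes per-example correctness flags for two detectors on the same test set and uses the exact (binomial) form of McNemar's test; the function names `mcnemar_exact` and `summarize_runs` are hypothetical, and any numbers passed in would be placeholders rather than results from the paper.

```python
import math
import statistics


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar p-value for paired correctness flags.

    Only discordant pairs matter: examples that exactly one of the two
    detectors classifies correctly.
    """
    a_only = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    b_only = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    n = a_only + b_only
    if n == 0:
        return 1.0  # the detectors never disagree
    k = min(a_only, b_only)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def summarize_runs(values):
    """Mean and sample standard deviation of a metric across seeds."""
    return statistics.mean(values), statistics.stdev(values)
```

The exact form avoids the chi-square approximation, which is unreliable when few examples are discordant.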
Circularity Check
No circularity; derivation relies on external datastore comparisons without reduction to self-inputs
Full rationale
The paper introduces ExaGPT as an example-based detector that identifies text origin by comparing spans against an external datastore of human-written and LLM-generated texts. No equations, fitted parameters renamed as predictions, self-citation load-bearing premises, uniqueness theorems, or ansatz smuggling appear in the abstract or description. The core claim (span similarity indicates origin) is presented as a direct implementation of the stated human verification process rather than a quantity derived from the paper's own outputs or prior self-referential results. The method is therefore self-contained against external benchmarks and receives the default non-circularity finding.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: ExaGPT identifies a text by checking whether it shares more similar spans with human-written vs. with LLM-generated texts from a datastore... k-NN search based on the cosine similarity
- IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: We apply a dynamic programming algorithm... maximize S(T) = average(α L_std + (1 − α) R_std)
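The quoted objective, an average of per-span scores maximized by dynamic programming, can be sketched as below. The quote does not define its terms, so this is only one reading: each candidate span (i, j) gets a single score standing in for the weighted sum α·L + (1 − α)·R, and the segmentation with the highest mean span score is returned. The names `best_segmentation` and `span_score`, and the `max_len` cap, are assumptions.

```python
def best_segmentation(n_tokens, span_score, max_len=5):
    """Split tokens 0..n_tokens into spans maximizing the mean span score.

    span_score(i, j) scores the span covering tokens i..j-1; here it
    stands in for the weighted sum in the quoted objective (assumption).
    Returns (spans, mean_score) with spans as (start, end) pairs.
    """
    if n_tokens < 1:
        raise ValueError("need at least one token")
    neg_inf = float("-inf")
    # best[j][m]: max total score for tokens 0..j-1 split into m spans.
    best = [[neg_inf] * (n_tokens + 1) for _ in range(n_tokens + 1)]
    back = [[None] * (n_tokens + 1) for _ in range(n_tokens + 1)]
    best[0][0] = 0.0
    for j in range(1, n_tokens + 1):
        for i in range(max(0, j - max_len), j):
            score = span_score(i, j)
            for m in range(1, j + 1):
                prev = best[i][m - 1]
                if prev != neg_inf and prev + score > best[j][m]:
                    best[j][m] = prev + score
                    back[j][m] = i
    # The mean depends on the span count, so compare across all counts.
    feasible = [m for m in range(1, n_tokens + 1) if best[n_tokens][m] != neg_inf]
    m_best = max(feasible, key=lambda m: best[n_tokens][m] / m)
    spans, j, m = [], n_tokens, m_best
    while m > 0:
        i = back[j][m]
        spans.append((i, j))
        j, m = i, m - 1
    spans.reverse()
    return spans, best[n_tokens][m_best] / m_best
```

Tracking the span count as a second table dimension is what makes the average tractable: a plain sum-maximizing recurrence would not pick the right segmentation, because splitting into more spans changes the denominator.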
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
Lightweight Stylistic Consistency Profiling: Robust Detection of LLM-Generated Textual Content for Multimedia Moderation
LiSCP detects LLM-generated text via stylistic consistency profiling across paraphrased variants and reports up to 11.79% better cross-domain accuracy plus robustness to adversarial attacks.