pith. sign in

arxiv: 2502.11336 · v2 · submitted 2025-02-17 · 💻 cs.CL

ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability

Pith reviewed 2026-05-23 03:24 UTC · model grok-4.3

classification 💻 cs.CL
keywords LLM text detectioninterpretable detectionexample-based methodspan similarityhuman evaluationmachine generated textdatastore
0
0 comments X

The pith

ExaGPT detects LLM-generated texts by finding which category their spans resemble more in a datastore of examples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ExaGPT, an approach to detecting machine-generated text that follows how humans naturally verify origins by comparing similar spans of text. It maintains a datastore of both human-written and LLM-generated examples and classifies new text based on which set its spans match more closely, supplying those matches as evidence. This design aims to make the detector's decisions easier for people to assess and trust. Experiments across multiple domains and generators demonstrate large gains in accuracy at low false-positive rates compared to earlier interpretable detectors, and human studies confirm the evidence helps users evaluate the output.

Core claim

ExaGPT identifies a text by checking whether it shares more similar spans with human-written vs. with LLM-generated texts from a datastore. This approach can provide similar span examples that contribute to the decision for each span in the text as evidence. Human evaluation shows that providing similar span examples contributes more effectively to judging the correctness of the decision than existing interpretable methods. Experiments in four domains and three generators show that ExaGPT outperforms prior interpretable detectors by up to +37.0 points of accuracy at a false positive rate of 1%.

What carries the argument

Span similarity search against a balanced datastore of human and LLM texts, which retrieves matching examples to serve as interpretable evidence for each span's classification.

If this is right

  • Accuracy at a 1% false positive rate rises by as much as 37 points over previous interpretable detectors.
  • Users receive concrete similar-span examples that help them assess whether each decision is correct.
  • The detection process aligns more closely with intuitive human methods for checking text origin.
  • Risks from incorrect detections decrease in settings such as education and content moderation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the datastore is kept current with new generators, the method could maintain performance as LLMs evolve.
  • Similar example-based techniques might improve interpretability in other text classification tasks.
  • Combining this with other signals could further strengthen both accuracy and user trust.

Load-bearing premise

The datastore must contain enough representative human-written and LLM-generated texts so that span similarity reliably signals the true source.

What would settle it

A test on new LLM-generated texts absent from the datastore where accuracy falls below that of existing interpretable methods, or where humans rate the provided span examples as less helpful than other evidence.

Figures

Figures reproduced from arXiv: 2502.11336 by Ayana Niwa, Masahiro Kaneko, Naoaki Okazaki, Preslav Nakov, Ryuto Koike.

Figure 1
Figure 1. Figure 1: Identifying the author of a text (human vs. [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of ExaGPT. It detects the author of a text by examining whether the text shares more similar [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: User interface of ExaGPT. Hovering over a [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Reliability score distributions of long spans [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Impact of α on the detection performance of ExaGPT, including the AUROC and the accuracy at 1% FPR, across four domains using ChatGPT as a generator. tion of α in ExaGPT does not lead to its substantial performance drop that could greatly affect the per￾formance ranking of detectors. We find similar overall trends of the impact of α for other LLMs, including GPT-4 and Dolly-v2 as generators. The impact of … view at source ↗
Figure 7
Figure 7. Figure 7: Example of evidence by RoBERTa with SHAP. [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Example of evidence by LR-GLTR [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Example of evidence by DNA-GPT [PITH_FULL_IMAGE:figures/full_fig_p012_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Impoact of α on the detection performance of ExaGPT, including the AUROC and the accuracy at 1% FPR, across four domains and three generators [PITH_FULL_IMAGE:figures/full_fig_p013_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Impact of the datastore size on the detection performance of ExaGPT, including the AUROC and the [PITH_FULL_IMAGE:figures/full_fig_p013_11.png] view at source ↗
read the original abstract

Detecting texts generated by Large Language Models (LLMs) could cause grave mistakes due to incorrect decisions, such as undermining students' academic dignity. LLM text detection thus needs to ensure the interpretability of the decision, which can help users judge how reliably correct its prediction is. When humans verify whether a text is human-written or LLM-generated, they intuitively investigate which of them it shares more similar spans with. However, existing interpretable detectors are not aligned with the human decision-making process and fail to offer evidence that users easily understand. To bridge this gap, we introduce ExaGPT, an interpretable detection approach grounded in the human decision-making process for verifying the origin of a text. ExaGPT identifies a text by checking whether it shares more similar spans with human-written vs. with LLM-generated texts from a datastore. This approach can provide similar span examples that contribute to the decision for each span in the text as evidence. Our human evaluation demonstrates that providing similar span examples contributes more effectively to judging the correctness of the decision than existing interpretable methods. Moreover, extensive experiments in four domains and three generators show that ExaGPT massively outperforms prior interpretable detectors by up to +37.0 points of accuracy at a false positive rate of 1%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces ExaGPT, an interpretable detector for LLM-generated text that identifies origin by comparing input spans against a datastore of human-written and LLM-generated texts and supplies the nearest-neighbor spans as evidence. It claims up to +37 accuracy points at 1% FPR over prior interpretable methods across four domains and three generators, plus a human study showing that the span examples improve users' ability to judge decision correctness.

Significance. If datastore construction and evaluation controls are validated, the approach could advance interpretable detection by aligning more closely with human verification processes than existing methods, which is relevant for high-stakes applications requiring explainable decisions.

major comments (2)
  1. [Datastore construction] Datastore construction section: the reported accuracy gains and human-interpretability claims rest on the assumption that span similarity to the datastore reliably tracks origin, yet no ablations are supplied on datastore size, class balance, domain coverage, generator diversity, or leakage between datastore and test splits; this premise is load-bearing for the +37-point claim at 1% FPR.
  2. [Experiments] Experiments section: the manuscript states large accuracy gains and a positive human study, but supplies neither baseline implementation details, statistical significance tests, nor variance across runs, making it impossible to confirm whether the gains reflect fair comparisons.
minor comments (2)
  1. [Methods] Clarify the exact similarity function and aggregation rule used for the majority-vote decision in the methods description.
  2. [Human evaluation] The human-evaluation protocol (number of participants, task wording, and statistical comparison to baselines) should be expanded for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to incorporate additional experiments and reporting details that strengthen the claims.

read point-by-point responses
  1. Referee: [Datastore construction] Datastore construction section: the reported accuracy gains and human-interpretability claims rest on the assumption that span similarity to the datastore reliably tracks origin, yet no ablations are supplied on datastore size, class balance, domain coverage, generator diversity, or leakage between datastore and test splits; this premise is load-bearing for the +37-point claim at 1% FPR.

    Authors: We agree that explicit ablations would further substantiate the core premise. The datastore in our work is built from large-scale, domain-matched human and LLM corpora with explicit no-leakage splits (detailed in Section 3.2), and the multi-domain, multi-generator results already provide indirect evidence of robustness. In revision we will add targeted ablations on size, balance, and leakage to directly support the reported gains. revision: yes

  2. Referee: [Experiments] Experiments section: the manuscript states large accuracy gains and a positive human study, but supplies neither baseline implementation details, statistical significance tests, nor variance across runs, making it impossible to confirm whether the gains reflect fair comparisons.

    Authors: We accept that the current reporting is insufficient for full reproducibility and verification. Baselines were reimplemented following the original papers' descriptions and evaluated under identical protocols; we will expand the appendix with full implementation details, add statistical significance tests (e.g., McNemar or paired t-tests) between ExaGPT and baselines, and report mean and standard deviation across five random seeds for all metrics. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation relies on external datastore comparisons without reduction to self-inputs

full rationale

The paper introduces ExaGPT as an example-based detector that identifies text origin by comparing spans against an external datastore of human-written and LLM-generated texts. No equations, fitted parameters renamed as predictions, self-citation load-bearing premises, uniqueness theorems, or ansatz smuggling appear in the abstract or description. The core claim (span similarity indicates origin) is presented as a direct implementation of the stated human verification process rather than a quantity derived from the paper's own outputs or prior self-referential results. The method is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the method implicitly relies on standard similarity metrics and datastore construction whose details are absent.

pith-pipeline@v0.9.0 · 5770 in / 1239 out tokens · 32850 ms · 2026-05-23T03:24:44.304796+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Lightweight Stylistic Consistency Profiling: Robust Detection of LLM-Generated Textual Content for Multimedia Moderation

    cs.CL 2026-05 unverdicted novelty 4.0

    LiSCP detects LLM-generated text via stylistic consistency profiling across paraphrased variants and reports up to 11.79% better cross-domain accuracy plus robustness to adversarial attacks.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · cited by 1 Pith paper · 7 internal anchors

  1. [1]

    Guangsheng Bao, Yanbin Zhao, Zhiyang Teng, Linyi Yang, and Yue Zhang. 2024. https://arxiv.org/abs/2310.05130 Fast-detectgpt: Efficient zero-shot detection of machine-generated text via conditional probability curvature . Preprint, arXiv:2310.05130

  2. [2]

    Ant \`o nia Mart \'i , and Paolo Rosso

    Alberto Barr \'o n-Cede \ n o, Marta Vila, M. Ant \`o nia Mart \'i , and Paolo Rosso. 2013. https://doi.org/10.1162/COLI_a_00153 Plagiarism meets paraphrasing: Insights for the next generation in automatic plagiarism detection . Computational Linguistics, 39(4):917--947

  3. [3]

    Daria Beresneva. 2016. Computer-generated text detection using machine learning: A systematic review. In 21st International Conference on Applications of Natural Language to Information Systems, NLDB, pages 421--426. Springer

  4. [4]

    Bloomberg. 2024. https://tinyurl.com/bloomberg-ai-detector Ai detectors falsely accuse students of cheating—with big consequences . Accessed on 2024-10-20

  5. [5]

    Zihao Cheng, Li Zhou, Feng Jiang, Benyou Wang, and Haizhou Li. 2025. Beyond binary: Towards fine-grained LLM -generated text detection via role recognition and involvement measurement. In THE WEB CONFERENCE 2025

  6. [6]

    Mike Conover, Matt Hayes, Ankit Mathur, Jianwei Xie, Jun Wan, Sam Shah, Ali Ghodsi, Patrick Wendell, Matei Zaharia, and Reynold Xin. 2023. https://tinyurl.com/databricks-introducing-dolly Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM . Accessed: 2024-7-12

  7. [7]

    Evan Crothers, Nathalie Japkowicz, and Herna Viktor. 2023. https://arxiv.org/abs/2210.07321 Machine generated text: A comprehensive survey of threat models and detection methods . Preprint, arXiv:2210.07321

  8. [8]

    Liam Dugan, Alyssa Hwang, Filip Trhl \'i k, Andrew Zhu, Josh Magnus Ludan, Hainiu Xu, Daphne Ippolito, and Chris Callison-Burch. 2024. https://doi.org/10.18653/v1/2024.acl-long.674 RAID : A shared benchmark for robust evaluation of machine-generated text detectors . In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics...

  9. [9]

    Sebastian Gehrmann, Hendrik Strobelt, and Alexander M. Rush. 2019. https://arxiv.org/abs/1906.04043 Gltr: Statistical detection and visualization of generated text . Preprint, arXiv:1906.04043

  10. [10]

    Gizmodo. 2024. https://tinyurl.com/ai-detectors-writers-fired AI Detectors Get It Wrong. Writers Are Being Fired Anyway . Accessed on 2024-07-12

  11. [11]

    Biyang Guo, Xin Zhang, Ziyuan Wang, Minqi Jiang, Jinran Nie, Yuxuan Ding, Jianwei Yue, and Yupeng Wu. 2023. https://arxiv.org/abs/2301.07597 How close is chatgpt to human experts? comparison corpus, evaluation, and detection . Preprint, arXiv:2301.07597

  12. [12]

    Abhimanyu Hans, Avi Schwarzschild, Valeriia Cherepanova, Hamid Kazemi, Aniruddha Saha, Micah Goldblum, Jonas Geiping, and Tom Goldstein. 2024. https://arxiv.org/abs/2401.12070 Spotting llms with binoculars: Zero-shot detection of machine-generated text . Preprint, arXiv:2401.12070

  13. [13]

    Daphne Ippolito, Daniel Duckworth, Chris Callison-Burch, and Douglas Eck. 2020. https://doi.org/10.18653/v1/2020.acl-main.164 Automatic detection of generated text is easiest when humans are fooled . In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1808--1822, Online. Association for Computational Linguistics

  14. [14]

    ai vs humans

    Jiazhou Ji, Ruizhe Li, Shujun Li, Jie Guo, Weidong Qiu, Zheng Huang, Chiyu Chen, Xiaoyu Jiang, and Xinru Lu. 2024. https://arxiv.org/abs/2406.18259 Detecting machine-generated texts: Not just "ai vs humans" and explainability is complicated . Preprint, arXiv:2406.18259

  15. [15]

    Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2017. https://arxiv.org/abs/1702.08734 Billion-scale similarity search with gpus . Preprint, arXiv:1702.08734

  16. [16]

    Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel Tetreault. 2020. Proceedings of the 58th annual meeting of the association for computational linguistics. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

  17. [17]

    Masahiro Kaneko, Sho Takase, Ayana Niwa, and Naoaki Okazaki. 2022. https://doi.org/10.18653/v1/2022.acl-long.496 Interpretability for language learners using example-based grammatical error correction . In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7176--7187, Dublin, Ireland. Ass...

  18. [18]

    Urvashi Khandelwal, Angela Fan, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. 2020. Nearest neighbor machine translation. arXiv preprint arXiv:2010.00710

  19. [19]

    John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. 2023. https://arxiv.org/abs/2301.10226 A Watermark for Large Language Models . Preprint, arXiv:2301.10226

  20. [20]

    Ryuto Koike, Masahiro Kaneko, and Naoaki Okazaki. 2024. OUTFOX: LLM-Generated Essay Detection Through In-Context Learning with Adversarially Generated Examples . In Proceedings of the 38th AAAI Conference on Artificial Intelligence, Vancouver, Canada

  21. [21]

    Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, and Mohit Iyyer. 2023. https://arxiv.org/abs/2303.13408 Paraphrasing evades detectors of ai-generated text, but retrieval is an effective defense . Preprint, arXiv:2303.13408

  22. [22]

    Thomas Lavergne, Tanguy Urvoy, and Fran c ois Yvon. 2008. https://ceur-ws.org/Vol-377/paper4.pdf Detecting Fake Content with Relative Entropy Scoring . In Proceedings of the ECAI'08 Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse, CEUR Workshop Proceedings

  23. [23]

    Yinhan Liu. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692, 364

  24. [24]

    Scott Lundberg and Su-In Lee. 2017. https://arxiv.org/abs/1705.07874 A unified approach to interpreting model predictions . Preprint, arXiv:1705.07874

  25. [25]

    Hermann Maurer, Frank Kappe, and Bilal Zaka. 2006. Plagiarism – a survey. Journal of Universal Computer Science, 12(8):1050--1084

  26. [26]

    Manning, and Chelsea Finn

    Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, and Chelsea Finn. 2023. https://arxiv.org/abs/2301.11305 DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature . Preprint, arXiv:2301.11305

  27. [27]

    Sandra Mitrović, Davide Andreoletti, and Omran Ayoub. 2023. https://arxiv.org/abs/2301.13852 Chatgpt or human? detect and explain. explaining decisions of machine learning model for detecting short chatgpt-generated text . Preprint, arXiv:2301.13852

  28. [28]

    OpenAI. 2023 a . https://tinyurl.com/how-to-respond-student How can educators respond to students presenting ai-generated content as their own? Accessed: 2024-6-10

  29. [29]

    OpenAI. 2023 b . https://openai.com/blog/chatgpt Introducing ChatGPT . Accessed on 2024-03-10

  30. [30]

    "Why Should I Trust You?": Explaining the Predictions of Any Classifier

    Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. https://arxiv.org/abs/1602.04938 "why should i trust you?": Explaining the predictions of any classifier . Preprint, arXiv:1602.04938

  31. [31]

    Juan Diego Rodriguez, Todd Hay, David Gros, Zain Shamsi, and Ravi Srinivasan. 2022. https://doi.org/10.18653/v1/2022.naacl-main.88 Cross-domain detection of GPT -2-generated technical text . In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1213--1233, S...

  32. [32]

    Irene Solaiman, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert-Voss, Jeff Wu, Alec Radford, Gretchen Krueger, Jong Wook Kim, Sarah Kreps, Miles McCain, Alex Newhouse, Jason Blazakis, Kris McGuffie, and Jasmine Wang. 2019. https://arxiv.org/abs/1908.09203 Release Strategies and the Social Impacts of Language Models . Preprint, arXiv:1908.09203

  33. [34]

    Jinyan Su, Terry Yue Zhuo, Di Wang, and Preslav Nakov. 2023 b . https://arxiv.org/abs/2306.05540 Detectllm: Leveraging log rank information for zero-shot detection of machine-generated text . Preprint, arXiv:2306.05540

  34. [35]

    Ruixiang Tang, Yu-Neng Chuang, and Xia Hu. 2023. https://arxiv.org/abs/2303.07205 The science of detecting llm-generated texts . Preprint, arXiv:2303.07205

  35. [36]

    Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288

  36. [37]

    Adaku Uchendu, Thai Le, Kai Shu, and Dongwon Lee. 2020. https://doi.org/10.18653/v1/2020.emnlp-main.673 Authorship attribution for neural text generation . In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 8384--8395, Online. Association for Computational Linguistics

  37. [38]

    Vivek Verma, Eve Fleisig, Nicholas Tomlin, and Dan Klein. 2024. https://arxiv.org/abs/2305.15047 Ghostbuster: Detecting text ghostwritten by large language models . Preprint, arXiv:2305.15047

  38. [39]

    Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Chenxi Whitehouse, Osama Mohammed Afzal, Tarek Mahmoud, Toru Sasaki, Thomas Arnold, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, and Preslav Nakov. 2024. https://aclanthology.org/2024.eacl-long.83/ M4: Multi-generator, multi-domain, and multi-lingual black-box machine-...

  39. [40]

    Sam Wiseman and Karl Stratos. 2019. https://aclanthology.org/P19-1533/ Label-agnostic sequence labeling by copying nearest neighbors . In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5363--5369, Florence, Italy. Association for Computational Linguistics

  40. [41]

    A survey on llm-gernerated text detection: Necessity, methods, and future directions

    Junchao Wu, Shu Yang, Runzhe Zhan, Yulin Yuan, Derek F. Wong, and Lidia S. Chao. 2023. https://arxiv.org/abs/2310.14724 A survey on llm-generated text detection: Necessity, methods, and future directions . Preprint, arXiv:2310.14724

  41. [42]

    Xianjun Yang, Wei Cheng, Yue Wu, Linda Petzold, William Yang Wang, and Haifeng Chen. 2023. https://arxiv.org/abs/2305.17359 Dna-gpt: Divergent n-gram analysis for training-free detection of gpt-generated text . Preprint, arXiv:2305.17359

  42. [43]

    online" 'onlinestring :=

    ENTRY address archivePrefix author booktitle chapter edition editor eid eprint eprinttype howpublished institution journal key month note number organization pages publisher school series title type volume year doi pubmed url lastchecked label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block STRING...

  43. [44]

    write newline

    " write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...