arxiv: 2605.23190 · v1 · pith:JTOZRHDQnew · submitted 2026-05-22 · 💻 cs.CL

Hidden Human-Like Nature of Machine-Generated Texts: Theory and Detection Enhancement

Pith reviewed 2026-05-25 04:51 UTC · model grok-4.3

classification 💻 cs.CL
keywords machine-generated text detection · human-like spans · detection enhancement · latent variable model · hard-EM optimization · large language models · text classification · LLM detection

The pith

Machine-generated texts contain hidden human-like spans that increase detection complexity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that even fully machine-generated texts include spans highly consistent with human writing, and that these spans make paragraph-level detection harder. The authors analyze this effect theoretically and introduce a model-agnostic framework that enhances existing detectors by treating span retention as a latent-variable problem, solved through hard-EM-style iterative filtering: the detector removes confidently human-like subsequences and refines itself on the rest. Reported experiments indicate that the approach improves existing detectors across multiple LLMs and that it can also run in a training-free mode.

Core claim

Even fully machine-generated texts may contain spans that are highly consistent with human writing. These spans increase the sentence complexity for detection, thereby making MGT detection intrinsically harder. The stacked enhancement framework models span-level retention decisions as a latent-variable problem and instantiates the optimization with a hard-EM-inspired procedure in which the detector iteratively filters confidently human-like subsequences and refines itself on the remaining text.

What carries the argument

The stacked enhancement framework that models span-level retention decisions as a latent-variable problem and optimizes via a hard-EM-inspired iterative filtering procedure to reduce the influence of human-like spans.
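
The filtering step can be sketched in a few lines. This is a minimal, training-free illustration and not the authors' implementation: `toy_detector`, its cue-word list, the regex sentence splitter, and the threshold `tau` are all hypothetical stand-ins, and the detector is assumed to return a machine-likeness score in [0, 1]. The refinement step, in which the detector is updated on the retained text, is omitted.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical cue words for the toy detector; a real detector would be a trained model.
MACHINE_CUES = {"delve", "furthermore", "moreover", "tapestry"}


def toy_detector(text: str) -> float:
    """Toy machine-likeness score in [0, 1]: scaled share of cue words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(w in MACHINE_CUES for w in words)
    return min(1.0, 2.0 * hits / len(words))


def split_spans(text: str) -> List[str]:
    """Crude sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def filter_and_score(
    text: str,
    detector: Callable[[str], float],
    tau: float = 0.2,  # assumed threshold: spans scoring below it count as human-like
) -> Tuple[float, List[str]]:
    """Drop confidently human-like spans, then score the remaining text."""
    spans = split_spans(text)
    kept = [s for s in spans if detector(s) >= tau]
    if not kept:
        # Every span looked human-like; fall back to scoring the full text.
        kept = spans
    return detector(" ".join(kept)), kept


text = (
    "I went home early. "
    "Furthermore, we delve into the rich tapestry of ideas. "
    "It rained."
)
baseline = toy_detector(text)
score, kept = filter_and_score(text, toy_detector)
```

In this toy case the two plain sentences dilute the paragraph-level score, and removing them raises it. Whether a real detector behaves this way depends on how reliable its span-level scores are, which is the concern raised under the load-bearing premise.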

If this is right

  • Existing paragraph-level detectors can be improved by reducing the influence of hidden human-like spans.
  • The framework works in a training-free manner, supporting flexible deployment.
  • Detection performance improves consistently across various LLMs and practical scenarios.
  • The iterative process refines the detector specifically on text remaining after removal of human-like subsequences.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Future detectors may benefit from explicit span-level analysis rather than treating entire paragraphs as uniform.
  • This approach could extend to identifying mixed human-machine content in applications such as content moderation.
  • Testing the filtering procedure on longer documents might show whether human-like spans cluster in predictable positions.
  • The method's model-agnostic nature suggests it could apply to other classification tasks involving partially human-like data.

Load-bearing premise

Span-level retention decisions can be reliably modeled as a latent-variable problem and optimized via a hard-EM-inspired iterative filtering procedure without discarding detection-critical signals or introducing new biases.
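
One generic way to write this premise down is the following hard-EM alternation. It is offered as an illustration, not as the paper's notation: the span split $x = (x_1, \dots, x_n)$, the binary retention mask $z$, the detector score $f_\theta$, the threshold $\tau$, and the loss $\ell$ are all assumptions introduced for this sketch.

```latex
\begin{aligned}
\text{(hard E-step)}\quad
  & z_i^{(t)} = \mathbb{1}\!\left[\, f_{\theta^{(t)}}(x_i) \ge \tau \,\right],
    \qquad i = 1, \dots, n, \\
\text{(M-step)}\quad
  & \theta^{(t+1)} = \arg\min_{\theta} \sum_{(x,\,y)}
    \ell\!\left( f_{\theta}\!\left( x \odot z^{(t)} \right),\; y \right).
\end{aligned}
```

Here $x \odot z$ denotes the text restricted to spans with $z_i = 1$, and $y$ is the paragraph-level label. In a training-free setting the M-step would be skipped and only the mask applied. Written this way, the risk is visible: the mask is computed by the same $f_\theta$ that is later refined, so a span misjudged as human-like in an early round is hidden from every later round.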

What would settle it

Running the iterative filtering on a set of machine-generated texts known to contain human-like spans and observing no improvement or a drop in detector accuracy would falsify the claim that filtering these spans enhances detection.

Figures

Figures reproduced from arXiv: 2605.23190 by Bo Han, Chenwang Wu, Defu Lian, Yiu-ming Cheung.

Figure 1: The proportion of consistent sentences between humans and LLMs.
Figure 2: The inference process of the proposed framework. In the filtering step (top-right), the unknown text is split into sub-sequences. The trained detector runs …
Figure 3: Average detection performance (x-axis) of detectors (ChatGPT-D and our boosting strategy ChatGPT-STK) tested across various LLMs, where these …
Figure 4: Performance under cross-domain setting. The Essay dataset served as the source domain, and the Reuters dataset as the target domain. The detector …
Figure 5: Enhancing the robustness of ChatGPT-D. Here we use three attacks: …
Figure 7: Performance concerning TPR@FPR-5% at different mixing levels. These detectors are trained on ChatGPT texts.
Figure 8: Performance (x-axis) of the un-fine-tuned detectors tested on various LLM texts (y-axis).
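
Figure 7 reports TPR@FPR-5%, which this page does not define. Under the usual reading, it is the true-positive rate on machine texts at the decision threshold that keeps the false-positive rate on human texts at or below 5%. A minimal sketch of that computation follows; the function name, the tie-handling rule, and the example scores are assumptions made here for illustration, not values from the paper.

```python
from typing import Sequence


def tpr_at_fpr(
    human_scores: Sequence[float],
    machine_scores: Sequence[float],
    target_fpr: float = 0.05,
) -> float:
    """TPR on machine texts at a threshold whose FPR on human texts is <= target_fpr.

    A text is flagged as machine-generated when its score is strictly above
    the threshold.
    """
    if not human_scores or not machine_scores:
        raise ValueError("both score lists must be non-empty")
    n_human = len(human_scores)
    # Number of human texts we are allowed to flag by mistake.
    allowed_fp = int(target_fpr * n_human + 1e-9)
    ranked = sorted(human_scores, reverse=True)
    # Threshold sits at the (allowed_fp + 1)-th largest human score, so at most
    # allowed_fp human scores lie strictly above it.
    threshold = ranked[allowed_fp] if allowed_fp < n_human else float("-inf")
    flagged = sum(s > threshold for s in machine_scores)
    return flagged / len(machine_scores)


# Hypothetical detector scores, higher meaning more machine-like.
human = [i / 20 for i in range(20)]
machine = [0.5, 0.92, 0.97, 0.99]
result = tpr_at_fpr(human, machine)
```

The metric is useful when false accusations are costly, because it fixes the error rate on human writing first and only then asks how much machine text is caught.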
Original abstract

Machine-generated texts (MGTs) produced by large language models (LLMs) are increasingly prevalent across various applications, while their potential misuse in fake news propagation and phishing has raised serious concerns, highlighting the need for MGT detection. Existing paragraph-level detection methods commonly treat MGTs as entirely machine-like, overlooking the hidden human-like nature of machine-generated texts: even fully machine-generated texts may contain spans that are highly consistent with human writing. To this end, we first reveal the existence of such hidden human-like spans, and then theoretically analyze their impact on detection. Our analysis shows that these spans increase the sentence complexity for detection, thereby making MGT detection intrinsically harder. Based on this finding, we propose a model-agnostic stacked enhancement framework that improves existing detectors by reducing the influence of hidden human-like spans. Specifically, we model span-level retention decisions as a latent-variable problem and instantiate the optimization with a hard-EM-inspired procedure, where the detector iteratively filters confidently human-like subsequences and refines itself on the remaining text. Extensive experiments across various LLMs and practical scenarios demonstrate that the proposed framework consistently enhances existing detectors. Notably, the framework can also work in a training-free manner, offering flexibility and scalability for practical deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that even fully machine-generated texts contain hidden human-like spans that increase sentence complexity and make MGT detection intrinsically harder. It supports this via theoretical analysis of the phenomenon and proposes a model-agnostic stacked enhancement framework that models span retention as a latent-variable problem solved by a hard-EM-inspired iterative procedure: the detector filters confidently human-like subsequences and refines itself on the remainder. The framework is reported to consistently improve existing detectors across LLMs and scenarios, including in a training-free mode.

Significance. If the theoretical analysis rigorously demonstrates that the spans strictly increase detection complexity and the iterative procedure improves detectors without discarding critical signals or introducing bias, the work would provide a useful perspective on MGT detection challenges and a practical, model-agnostic enhancement method. The training-free option adds deployment value. Credit is due for attempting a latent-variable formulation and for the model-agnostic framing.

major comments (2)
  1. [Abstract] Abstract: the claim that theoretical analysis and extensive experiments support the central result (hidden spans make detection intrinsically harder) cannot be assessed because the manuscript provides neither the derivations nor the quantitative results; soundness is therefore unverifiable from the given material.
  2. [Method] Method (hard-EM procedure): modeling span retention as a latent-variable problem solved by iterative filtering with the detector itself risks circularity and error amplification; the abstract presents the step as external enhancement, yet no derivation shows that the procedure preserves the original detection margin or avoids discarding MGT-specific cues when the base detector is imperfect.
minor comments (1)
  1. [Abstract] Abstract: the phrase 'sentence complexity for detection' is used without a precise definition or reference to how it is quantified, which would aid clarity even in a high-level summary.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below, clarifying the presence of theoretical derivations and experimental results in the full text while acknowledging opportunities to strengthen the methodological exposition. We maintain that the core claims are supported but are open to revisions that enhance verifiability.

  1. Referee: [Abstract] Abstract: the claim that theoretical analysis and extensive experiments support the central result (hidden spans make detection intrinsically harder) cannot be assessed because the manuscript provides neither the derivations nor the quantitative results; soundness is therefore unverifiable from the given material.

    Authors: The full manuscript contains a dedicated theoretical analysis section deriving the impact of hidden human-like spans on detection complexity via sentence-level entropy and margin bounds, along with quantitative results in the experiments section across multiple LLMs and scenarios. We can insert explicit section references into the abstract during revision to improve accessibility without altering the claims.
    revision: partial

  2. Referee: [Method] Method (hard-EM procedure): modeling span retention as a latent-variable problem solved by iterative filtering with the detector itself risks circularity and error amplification; the abstract presents the step as external enhancement, yet no derivation shows that the procedure preserves the original detection margin or avoids discarding MGT-specific cues when the base detector is imperfect.

    Authors: The hard-EM procedure is formulated as a latent-variable optimization that initializes with the base detector and iteratively retains only high-confidence machine-like segments for refinement, which empirical results across detectors demonstrate improves performance rather than amplifying errors. While a formal proof of margin preservation is not derived in the current text, the model-agnostic design and consistent gains in training-free and fine-tuned settings indicate that MGT-specific cues are not systematically discarded; we are prepared to add an appendix discussion addressing this concern.
    revision: partial

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

The paper first claims to reveal the existence of hidden human-like spans via direct observation, then performs a theoretical analysis of their effect on detection complexity, and finally proposes a model-agnostic enhancement framework instantiated with a standard hard-EM procedure for latent variables. No equations or steps are shown to reduce the central claims (existence, impact, or performance gain) to the inputs by construction, self-definition, or self-citation chains. The iterative filtering is presented as an optimization technique whose validity is checked by external experiments across LLMs rather than being tautological. The derivation remains self-contained against the stated benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review performed on abstract only; ledger entries are inferred from the high-level description. The central claim rests on the existence of human-like spans and the validity of the latent-variable modeling choice.

axioms (2)
  • domain assumption: Machine-generated texts contain spans highly consistent with human writing even when produced entirely by LLMs.
    This is the foundational observation stated in the abstract.
  • domain assumption: These spans increase sentence complexity and thereby make detection intrinsically harder.
    Direct claim from the theoretical analysis described in the abstract.

pith-pipeline@v0.9.0 · 5757 in / 1362 out tokens · 26218 ms · 2026-05-25T04:51:01.299974+00:00 · methodology
