Beyond Perplexity: Character Distribution Signatures and the MDTA Benchmark for AI Text Detection

Fardina Fathmiul Alam; Klint Faber; Priyadarshan Narayanasamy; Swastik Agrawal

arXiv: 2605.01647 · v1 · submitted 2026-05-03 · 💻 cs.CL

Pith reviewed 2026-05-10 16:27 UTC · model grok-4.3

keywords: AI text detection · character distribution · Wall of Separation · MDTA benchmark · Letter Distribution Score · perplexity alternatives · adversarial detection

The pith

AI text can be distinguished from human writing by measuring character frequency patterns that stay domain-specific for people but average out globally for models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that large language models, trained on broad balanced data, settle into character usage that looks roughly the same across topics, while human writers shift their letter and punctuation habits to match the domain. This gap produces a detection signal that does not rely on the model's internal probabilities and therefore survives the optimization steps that weaken perplexity-based detectors. The authors back the claim with the MDTA benchmark of over 600,000 aligned samples and a new Letter Distribution Score that adds measurable lift when combined with existing methods, especially inside narrow domains.

Core claim

AI models, trained on massive domain-balanced corpora, approximate global character patterns while humans exhibit domain-specialized distributions, creating a Wall of Separation where human-AI divergence significantly exceeds AI-AI divergence. The Letter Distribution Score captures this signature and shows low correlation with perplexity-based scores, so it improves AUROC and F1 when fused with DNA-DetectLLM, Binoculars, and FastDetectGPT through a non-linear classifier.

What carries the argument

The Wall of Separation, the greater divergence between human domain-specialized character distributions and the global averages produced by AI models, measured through the Letter Distribution Score.
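
As a rough illustration of the kind of measurement involved, the Python sketch below builds a smoothed letter-frequency distribution and compares two texts with Jensen-Shannon divergence. The paper's actual LD-Score formula is not given on this page, so the 26-letter alphabet, the add-one smoothing, the choice of Jensen-Shannon divergence, and all function names are assumptions made for illustration.

```python
import math
import string
from collections import Counter

ALPHABET = string.ascii_lowercase  # assumption: only the 26 lowercase letters are counted


def letter_distribution(text, alpha=1.0):
    """Add-alpha smoothed relative frequency of each letter a-z."""
    counts = Counter(ch for ch in text.lower() if ch in ALPHABET)
    total = sum(counts.values()) + alpha * len(ALPHABET)
    return [(counts[ch] + alpha) / total for ch in ALPHABET]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical inputs, at most 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Two made-up sentences standing in for a specialized and a general text.
finance = letter_distribution("Quarterly equity yields exceeded the fixed index benchmark.")
general = letter_distribution("The weather was nice, so we went for a long walk in the park.")
print(round(js_divergence(finance, general), 4))
```

Smoothing keeps every probability positive, so the divergence stays finite even when a short sample never uses a given letter.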

If this is right

Adding the Letter Distribution Score to existing probability-based detectors raises AUROC and F1 scores, with larger gains in domains that impose tight vocabulary constraints.
The MDTA benchmark enables controlled tests across four models, five domains, three temperatures, and three adversarial strategies.
The new score correlates only weakly with perplexity methods, allowing complementary use rather than redundancy.
Detection performance holds when models are queried at different temperatures or under adversarial rephrasing.
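
The weak-correlation point can be checked on any pair of detector outputs with a plain Pearson coefficient. The sketch below is a minimal standard-library version; the score lists are invented placeholders, not values from the paper, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Placeholder per-sample scores from two hypothetical detectors.
char_scores = [0.12, 0.40, 0.33, 0.08, 0.27]
perplexity_scores = [0.91, 0.85, 0.95, 0.88, 0.80]
print(pearson_r(char_scores, perplexity_scores))
```

A coefficient near zero suggests the two scores carry largely different information, which is the condition under which fusing them can help.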

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The separation may continue to work even after models receive further RLHF that aligns their token probabilities more closely with humans.
The same character-level signal could be tested on longer documents or on languages other than English to see whether domain specialization remains detectable.
Future model releases trained on more narrowly curated data might shrink the gap, so the MDTA benchmark could serve as a repeated stress test.

Load-bearing premise

Character distribution differences between human and AI text stay larger than AI-to-AI differences and survive changes in domain, temperature, and adversarial prompting.
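
One strict way to state this premise formally is sketched below. The notation is assumed here rather than taken from the paper: $P_H^{(d)}$ is the human character distribution in domain $d$, $P_{M_i}^{(d)}$ is the distribution produced by model $M_i$ in that domain, and $D$ is a divergence between distributions. The paper may intend a weaker, average-case version.

```latex
\min_{i}\; D\!\left(P_H^{(d)},\, P_{M_i}^{(d)}\right)
\;>\;
\max_{i \neq j}\; D\!\left(P_{M_i}^{(d)},\, P_{M_j}^{(d)}\right)
\qquad \text{for every domain } d
```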

What would settle it

An observation that a new model family produces character distributions inside a specialized domain that match human writers in that domain would remove the proposed separation.

Figures

Figures reproduced from arXiv: 2605.01647 by Fardina Fathmiul Alam, Klint Faber, Priyadarshan Narayanasamy, Swastik Agrawal.

**Figure 2.** Principal Component Analysis (PCA) of letter probability distributions shows the …

**Figure 3.** Domain-specific LD-Score analysis on Ghostbuster dataset. Top row: Pairwise …

**Figure 4.** Letter-level log-probability deviations from the human baseline across two do…

**Figure 5.** Domain-specific LD-Score analysis on some domains of the MDTA dataset. Top …

**Figure 6.** Distribution of percentage reduction in target letter frequency for the …

**Figure 7.** Percentage of samples where target letter(s) were fully absent from model outputs, …

**Figure 8.** Per-letter avoidance success rate (100% removal) under …

**Figure 9.** Instance-level readability analysis (FKGL Kernel Density Estimation (KDE)) across …

**Figure 10.** Instance-level lexical diversity (LDS) distributions across domains (Finance, …

**Figure 11.** Temperature-dependent detection performance across domains for Gemma-3-12B …

Original abstract

Training-free AI text detection methods primarily rely on model log-probabilities, achieving strong performance through approaches like Binoculars and DNA-DetectLLM. However, these methods face a fundamental ceiling as models are optimized through RLHF to produce human-like probability distributions. We introduce an alternative detection signal based on character distribution signatures. We provide theoretical foundations showing that AI models, trained on massive domain-balanced corpora, approximate global character patterns while humans exhibit domain-specialized distributions, creating a "Wall of Separation" where human-AI divergence significantly exceeds AI-AI divergence. To enable systematic evaluation, we construct the Models-Domains-Temperatures-Adversarials (MDTA) benchmark comprising 642,274 prompt-aligned samples across 4 models, 5 domains, 3 temperature settings, and 3 adversarial strategies, substantially expanding the HC3 dataset with modern model responses, temperature variation, and adversarial augmentation. We introduce the Letter Distribution Score (LD-Score), demonstrating low correlation (r = 0.08-0.13) with perplexity methods. When integrated with DNA-DetectLLM, Binoculars and FastDetectGPT via a non-linear classifier, LD-Score yields consistent improvements in AUROC and F1, with particularly pronounced gains in specialized domains where vocabulary constraints amplify the detection signal. The MDTA dataset can be accessed at: https://huggingface.co/datasets/nsp909/MDTA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

A useful new benchmark and orthogonal signal from character distributions, though the supporting theory on domain-specific human patterns looks thin.

The main takeaway is that this work supplies a character-distribution based detector that is largely independent of perplexity signals, supported by a substantially larger benchmark than previous efforts. What stands out is the MDTA dataset construction: 642k prompt-aligned samples covering four models, five domains, three temperatures, and three adversarial strategies. That's a useful expansion of HC3. The LD-Score shows low correlation with existing methods and, when combined through a non-linear classifier, delivers consistent AUROC and F1 gains, especially noticeable in specialized domains.

The weaker part is the theoretical framing. The 'Wall of Separation' rests on AI models approximating global character patterns while humans retain domain-specialized ones. In reality, basic character frequencies in English vary little by domain, so the human-AI gap may not reliably exceed model-to-model differences once you account for tokenizer effects or training data mixtures. The reported gains are promising but need checking against whether this signal persists with stronger or future models.

This paper is for the AI text detection crowd. The benchmark alone gives it value for testing new ideas, and the idea of an orthogonal signal is worth exploring. It should go through peer review so the community can assess the full methods and the robustness of the separation claim.

Referee Report

2 major / 2 minor

Summary. The paper claims that character distribution signatures provide a training-free AI text detection signal complementary to perplexity-based methods. It argues for a 'Wall of Separation' in which AI models, trained on domain-balanced corpora, converge to global character statistics while humans retain domain-specialized distributions, such that human-AI divergence exceeds AI-AI divergence. The authors introduce the Letter Distribution Score (LD-Score) and the MDTA benchmark (642,274 prompt-aligned samples across 4 models, 5 domains, 3 temperatures, and 3 adversarial strategies, extending HC3). They report low correlation (r=0.08-0.13) with perplexity methods and AUROC/F1 gains when LD-Score is combined with DNA-DetectLLM, Binoculars, and FastDetectGPT via a non-linear classifier, with larger gains in specialized domains.

Significance. If the separation claim and integration results hold, the work supplies an orthogonal, low-cost detection feature that could strengthen ensembles against RLHF-optimized models and adversarial attacks. The MDTA benchmark is a substantial, publicly released resource that enables controlled evaluation across models, domains, temperatures, and attacks, addressing gaps in prior datasets.

major comments (2)

[Abstract / Theoretical Foundations] Abstract and theoretical foundations: The 'Wall of Separation' claim (that human domain-specialized character distributions produce larger divergences from AI global approximations than any AI-AI variation) is load-bearing for both the LD-Score motivation and the reported detection gains. The manuscript provides no direct pairwise divergence tables or statistical tests comparing human-AI LD-Score distances against AI-AI distances across the 5 domains and 4 models. Given that English letter frequencies are largely stable across domains (e.g., 'e' ~12.7%, 't' ~9.1%), small domain shifts in human text may not exceed tokenizer- or temperature-induced biases in the models, undermining the separation premise.
[Results / Integration Experiments] Results section (integration experiments): The reported AUROC and F1 improvements from adding LD-Score to DNA-DetectLLM, Binoculars, and FastDetectGPT are presented without ablation on the non-linear classifier (architecture, hyperparameters, or training split), without per-domain error bars, and without controls for prompt length or adversarial strategy. It is therefore unclear whether the gains are driven by the claimed character signal or by incidental correlations in the MDTA construction.
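
The pairwise comparison requested in the first major comment could be set up along the following lines. This is a minimal sketch, not the authors' procedure: the three-character distributions are toy numbers invented for illustration, total variation distance stands in for whatever divergence the LD-Score actually uses, and `separation_gap` is a hypothetical helper.

```python
from itertools import combinations


def total_variation(p, q):
    """Total variation distance between two {character: probability} dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def separation_gap(human, models, dist=total_variation):
    """Return (mean human-AI distance, mean AI-AI distance) for one domain."""
    human_ai = [dist(human, m) for m in models.values()]
    ai_ai = [dist(models[a], models[b]) for a, b in combinations(models, 2)]
    if not ai_ai:
        raise ValueError("need at least two models for an AI-AI comparison")
    return sum(human_ai) / len(human_ai), sum(ai_ai) / len(ai_ai)


# Toy distributions over three characters (invented, not from the paper).
human = {"e": 0.50, "t": 0.30, "x": 0.20}
models = {
    "model_a": {"e": 0.60, "t": 0.35, "x": 0.05},
    "model_b": {"e": 0.62, "t": 0.33, "x": 0.05},
}
mean_human_ai, mean_ai_ai = separation_gap(human, models)
print(mean_human_ai > mean_ai_ai)
```

Running this per domain and per model pair, then testing whether the gap is statistically significant, would give the table the referee asks for.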

minor comments (2)

[Abstract] The exact definition and normalization of the LD-Score (histogram divergence formula, handling of rare characters, and smoothing) are not stated in the abstract or summary; an equation or pseudocode should be added for reproducibility.
[Benchmark Construction] The MDTA dataset description mentions 'prompt-aligned samples' but does not specify how prompts were chosen or balanced across domains; a table summarizing prompt statistics per domain would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below and commit to strengthening the manuscript with additional analyses and controls as outlined.

Referee: [Abstract / Theoretical Foundations] Abstract and theoretical foundations: The 'Wall of Separation' claim (that human domain-specialized character distributions produce larger divergences from AI global approximations than any AI-AI variation) is load-bearing for both the LD-Score motivation and the reported detection gains. The manuscript provides no direct pairwise divergence tables or statistical tests comparing human-AI LD-Score distances against AI-AI distances across the 5 domains and 4 models. Given that English letter frequencies are largely stable across domains (e.g., 'e' ~12.7%, 't' ~9.1%), small domain shifts in human text may not exceed tokenizer- or temperature-induced biases in the models, undermining the separation premise.

Authors: We agree that explicit pairwise comparisons would make the 'Wall of Separation' claim more robust. The current manuscript motivates the claim theoretically and supports it indirectly via low correlation with perplexity methods (r=0.08-0.13) and domain-specific detection gains, but does not include the requested tables. In revision we will add mean/variance LD-Score distance tables and statistical tests (e.g., paired t-tests) for human-AI versus all AI-AI pairs across the 5 domains and 4 models. On the invariance point, LD-Score operates over the full 256-character distribution (including low-frequency characters and their contextual patterns), which domain-specialized human writing perturbs in ways that balanced training corpora smooth; the pronounced gains in specialized domains are consistent with this distinction.

Revision: yes

Referee: [Results / Integration Experiments] Results section (integration experiments): The reported AUROC and F1 improvements from adding LD-Score to DNA-DetectLLM, Binoculars, and FastDetectGPT are presented without ablation on the non-linear classifier (architecture, hyperparameters, or training split), without per-domain error bars, and without controls for prompt length or adversarial strategy. It is therefore unclear whether the gains are driven by the claimed character signal or by incidental correlations in the MDTA construction.

Authors: We acknowledge these controls are necessary for clarity. The reported gains use a fixed non-linear classifier on the prompt-aligned MDTA samples, but the manuscript does not detail architecture/hyperparameter ablations, per-domain error bars, or stratification by length/adversarial strategy. In the revision we will add: (1) classifier ablations (linear vs. non-linear, hyperparameter sweeps, cross-validation splits), (2) per-domain AUROC/F1 with bootstrapped error bars, and (3) results stratified by prompt length quartiles and each adversarial strategy. These additions will isolate LD-Score's contribution from any MDTA construction artifacts.

Revision: yes
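
The bootstrapped error bars mentioned in item (2) can be sketched with the standard library alone. The code below is one common way to do it (pairwise-comparison AUROC plus a percentile bootstrap), offered as an assumption about what such an analysis might look like; the function names, the resample count, and the example scores are illustrative and not taken from the paper.

```python
import random


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for AUROC."""
    if len(set(labels)) < 2:
        raise ValueError("need at least one sample of each class")
    rng = random.Random(seed)
    idx = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue  # resample lacked one class; draw again
        stats.append(auroc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((1 - level) / 2 * n_boot)]
    upper = stats[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Illustrative labels (1 = AI, 0 = human) and detector scores.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.5, 0.3, 0.2, 0.1]
print(auroc(labels, scores), bootstrap_ci(labels, scores))
```

Computing the interval separately per domain shows whether a reported gain exceeds the resampling noise in that domain.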

Circularity Check

0 steps flagged

No significant circularity; derivation grounded in new data and independent signal

full rationale

The paper asserts theoretical foundations for a 'Wall of Separation' based on AI models approximating global character patterns versus human domain specialization, then introduces LD-Score on character histograms and the MDTA benchmark (642k prompt-aligned samples across models/domains/temperatures/adversarials, expanding HC3). Evaluation shows low correlation (r=0.08-0.13) with perplexity methods and AUROC/F1 gains when fused via non-linear classifier, particularly in specialized domains. No quoted equations or steps reduce the claimed separation or LD-Score to a fit on the evaluation set itself, a self-citation chain, or a renamed known result. The benchmark construction and integration results provide external falsifiability outside any single fitted parameter.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper rests on a domain assumption about training data balance and introduces a new conceptual framing; no explicit free parameters are described in the abstract.

axioms (1)

domain assumption: AI models approximate global character patterns due to training on massive domain-balanced corpora.
Stated as the basis for the theoretical foundation and Wall of Separation.

invented entities (1)

Wall of Separation (no independent evidence)
purpose: To describe the claimed larger divergence between human and AI character distributions than between AI models.
New conceptual term introduced to frame the detection signal.

pith-pipeline@v0.9.0 · 5576 in / 1201 out tokens · 90686 ms · 2026-05-10T16:27:08.754049+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation between the paper passage and the cited Recognition theorem: unclear

We provide theoretical foundations showing that AI models, trained on massive domain-balanced corpora, approximate global character patterns while humans exhibit domain-specialized distributions, creating a 'Wall of Separation' where human-AI divergence significantly exceeds AI-AI divergence.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
