Beyond Perplexity: Character Distribution Signatures and the MDTA Benchmark for AI Text Detection
Pith reviewed 2026-05-10 16:27 UTC · model grok-4.3
The pith
AI text can be distinguished from human writing by measuring character frequency patterns that stay domain-specific for people but average out globally for models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AI models, trained on massive domain-balanced corpora, approximate global character patterns while humans exhibit domain-specialized distributions, creating a Wall of Separation where human-AI divergence significantly exceeds AI-AI divergence. The Letter Distribution Score captures this signature and shows low correlation with perplexity-based scores, so it improves AUROC and F1 when fused with DNA-DetectLLM, Binoculars, and FastDetectGPT through a non-linear classifier.
What carries the argument
The Wall of Separation: human domain-specialized character distributions diverge more from the global averages produced by AI models than AI models diverge from one another, as measured by the Letter Distribution Score.
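As a rough illustration of the kind of signal described here, the sketch below builds a smoothed letter histogram and compares two distributions with Jensen-Shannon divergence. The paper's actual LD-Score formula is not given in this review, so the 26-letter alphabet, the add-alpha smoothing, the choice of Jensen-Shannon divergence, and the function names `letter_distribution` and `js_divergence` are all assumptions made for illustration.

```python
import math
import string
from collections import Counter

# Assumption: lowercase ASCII letters only; the paper's alphabet may differ.
ALPHABET = string.ascii_lowercase


def letter_distribution(text, alpha=1.0):
    """Return add-alpha smoothed letter frequencies over ALPHABET."""
    counts = Counter(ch for ch in text.lower() if ch in ALPHABET)
    total = sum(counts.values()) + alpha * len(ALPHABET)
    return [(counts[ch] + alpha) / total for ch in ALPHABET]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 for identical distributions)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with zero mass contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Under these assumptions, a candidate text would be scored by its divergence from a reference distribution (for example, one estimated from human text in the same domain). Smoothing matters for short samples, where many letters have zero counts.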
If this is right
- Adding the Letter Distribution Score to existing probability-based detectors raises AUROC and F1 scores, with larger gains in domains that impose tight vocabulary constraints.
- The MDTA benchmark enables controlled tests across four models, five domains, three temperatures, and three adversarial strategies.
- The new score correlates only weakly with perplexity methods, allowing complementary use rather than redundancy.
- Detection performance holds when models are queried at different temperatures or under adversarial rephrasing.
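The weak-correlation point above can be checked on any pair of detector outputs. The following is a minimal sketch of Pearson's r in plain Python; the function name `pearson_r` and the idea of passing per-sample score lists are illustrative assumptions, not the paper's evaluation code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)
```

A value near zero between a character-based score and a perplexity-based score computed on the same samples would indicate that the two signals carry largely different information, which is the precondition for gains from fusing them.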
Where Pith is reading between the lines
- The separation may continue to work even after models receive further RLHF that aligns their token probabilities more closely with those of human text.
- The same character-level signal could be tested on longer documents or on languages other than English to see whether domain specialization remains detectable.
- Future model releases trained on more narrowly curated data might shrink the gap, so the MDTA benchmark could serve as a repeated stress test.
Load-bearing premise
Character distribution differences between human and AI text stay larger than AI-to-AI differences and survive changes in domain, temperature, and adversarial prompting.
What would settle it
An observation that a new model family produces character distributions inside a specialized domain that match human writers in that domain would remove the proposed separation.
Original abstract
Training-free AI text detection methods primarily rely on model log-probabilities, achieving strong performance through approaches like Binoculars and DNA-DetectLLM. However, these methods face a fundamental ceiling as models are optimized through RLHF to produce human-like probability distributions. We introduce an alternative detection signal based on character distribution signatures. We provide theoretical foundations showing that AI models, trained on massive domain-balanced corpora, approximate global character patterns while humans exhibit domain-specialized distributions, creating a "Wall of Separation" where human-AI divergence significantly exceeds AI-AI divergence. To enable systematic evaluation, we construct the Models-Domains-Temperatures-Adversarials (MDTA) benchmark comprising 642,274 prompt-aligned samples across 4 models, 5 domains, 3 temperature settings, and 3 adversarial strategies, substantially expanding the HC3 dataset with modern model responses, temperature variation, and adversarial augmentation. We introduce the Letter Distribution Score (LD-Score), demonstrating low correlation (r = 0.08-0.13) with perplexity methods. When integrated with DNA-DetectLLM, Binoculars and FastDetectGPT via a non-linear classifier, LD-Score yields consistent improvements in AUROC and F1, with particularly pronounced gains in specialized domains where vocabulary constraints amplify the detection signal. The MDTA dataset can be accessed at: https://huggingface.co/datasets/nsp909/MDTA.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that character distribution signatures provide a training-free AI text detection signal complementary to perplexity-based methods. It argues for a 'Wall of Separation' in which AI models, trained on domain-balanced corpora, converge to global character statistics while humans retain domain-specialized distributions, such that human-AI divergence exceeds AI-AI divergence. The authors introduce the Letter Distribution Score (LD-Score) and the MDTA benchmark (642,274 prompt-aligned samples across 4 models, 5 domains, 3 temperatures, and 3 adversarial strategies, extending HC3). They report low correlation (r=0.08-0.13) with perplexity methods and AUROC/F1 gains when LD-Score is combined with DNA-DetectLLM, Binoculars, and FastDetectGPT via a non-linear classifier, with larger gains in specialized domains.
Significance. If the separation claim and integration results hold, the work supplies an orthogonal, low-cost detection feature that could strengthen ensembles against RLHF-optimized models and adversarial attacks. The MDTA benchmark is a substantial, publicly released resource that enables controlled evaluation across models, domains, temperatures, and attacks, addressing gaps in prior datasets.
major comments (2)
- [Abstract / Theoretical Foundations] Abstract and theoretical foundations: The 'Wall of Separation' claim, that human domain-specialized character distributions produce larger divergences from AI global approximations than any AI-AI variation, is load-bearing for both the LD-Score motivation and the reported detection gains. The manuscript provides no direct pairwise divergence tables or statistical tests comparing human-AI LD-Score distances against AI-AI distances across the 5 domains and 4 models. Given that English letter frequencies are largely fixed by the language itself (e.g., 'e' ~12.7%, 't' ~9.1%), small domain shifts in human text may not exceed tokenizer- or temperature-induced biases in the models, which would undermine the separation premise.
- [Results / Integration Experiments] Results section (integration experiments): The reported AUROC and F1 improvements from adding LD-Score to DNA-DetectLLM, Binoculars, and FastDetectGPT are presented without ablation on the non-linear classifier (architecture, hyperparameters, or training split), without per-domain error bars, and without controls for prompt length or adversarial strategy. It is therefore unclear whether the gains are driven by the claimed character signal or by incidental correlations in the MDTA construction.
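The pairwise comparison requested in the first major comment could be tabulated along the following lines. This is a sketch under stated assumptions: Jensen-Shannon divergence stands in for whatever distance the paper uses, the function name `separation_summary` and the model keys are hypothetical, and each input is a probability vector over the same character set for a single domain.

```python
import math
from itertools import combinations


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two probability vectors."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def separation_summary(human, models):
    """Compare mean human-AI divergence with mean AI-AI divergence.

    human  : probability vector for human text in one domain
    models : dict mapping a (hypothetical) model name to its probability vector
    """
    if len(models) < 2:
        raise ValueError("need at least two models for AI-AI pairs")
    human_ai = [js_divergence(human, dist) for dist in models.values()]
    ai_ai = [
        js_divergence(models[a], models[b])
        for a, b in combinations(sorted(models), 2)
    ]
    mean_human_ai = sum(human_ai) / len(human_ai)
    mean_ai_ai = sum(ai_ai) / len(ai_ai)
    # A positive margin is what the separation premise predicts.
    return {
        "human_ai": mean_human_ai,
        "ai_ai": mean_ai_ai,
        "margin": mean_human_ai - mean_ai_ai,
    }
```

Running this per domain and reporting the margin with a significance test would show directly whether human-AI distances exceed AI-AI distances, or whether tokenizer and temperature effects are of comparable size.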
minor comments (2)
- [Abstract] The exact definition and normalization of the LD-Score (histogram divergence formula, handling of rare characters, or smoothing) are not stated in the abstract or summary; an equation or pseudocode should be added for reproducibility.
- [Benchmark Construction] The MDTA dataset description mentions 'prompt-aligned samples' but does not specify how prompts were chosen or balanced across domains; a table summarizing prompt statistics per domain would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. We address each major comment below and commit to strengthening the manuscript with additional analyses and controls as outlined.
- Referee: [Abstract / Theoretical Foundations] Abstract and theoretical foundations: The 'Wall of Separation' claim, that human domain-specialized character distributions produce larger divergences from AI global approximations than any AI-AI variation, is load-bearing for both the LD-Score motivation and the reported detection gains. The manuscript provides no direct pairwise divergence tables or statistical tests comparing human-AI LD-Score distances against AI-AI distances across the 5 domains and 4 models. Given that English letter frequencies are largely fixed by the language itself (e.g., 'e' ~12.7%, 't' ~9.1%), small domain shifts in human text may not exceed tokenizer- or temperature-induced biases in the models, which would undermine the separation premise.
Authors: We agree that explicit pairwise comparisons would make the 'Wall of Separation' claim more robust. The current manuscript motivates the claim theoretically and supports it indirectly via low correlation with perplexity methods (r=0.08-0.13) and domain-specific detection gains, but does not include the requested tables. In revision we will add mean/variance LD-Score distance tables and statistical tests (e.g., paired t-tests) for human-AI versus all AI-AI pairs across the 5 domains and 4 models. On the invariance point, LD-Score operates over the full 256-character distribution (including low-frequency characters and their contextual patterns), which domain-specialized human writing perturbs in ways that balanced training corpora smooth; the pronounced gains in specialized domains are consistent with this distinction.
revision: yes
- Referee: [Results / Integration Experiments] Results section (integration experiments): The reported AUROC and F1 improvements from adding LD-Score to DNA-DetectLLM, Binoculars, and FastDetectGPT are presented without ablation on the non-linear classifier (architecture, hyperparameters, or training split), without per-domain error bars, and without controls for prompt length or adversarial strategy. It is therefore unclear whether the gains are driven by the claimed character signal or by incidental correlations in the MDTA construction.
Authors: We acknowledge these controls are necessary for clarity. The reported gains use a fixed non-linear classifier on the prompt-aligned MDTA samples, but the manuscript does not detail architecture/hyperparameter ablations, per-domain error bars, or stratification by length/adversarial strategy. In the revision we will add: (1) classifier ablations (linear vs. non-linear, hyperparameter sweeps, cross-validation splits), (2) per-domain AUROC/F1 with bootstrapped error bars, and (3) results stratified by prompt length quartiles and each adversarial strategy. These additions will isolate LD-Score's contribution from any MDTA construction artifacts.
revision: yes
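The bootstrapped per-domain error bars mentioned in this response could be computed roughly as follows. This is a sketch, not the authors' procedure: it assumes label 1 marks AI-generated text, uses a pairwise AUROC that is quadratic in sample count (adequate for small per-domain subsets only), and applies a plain percentile bootstrap. The function names `auroc` and `bootstrap_auroc_ci` are hypothetical.

```python
import random


def auroc(labels, scores):
    """AUROC as the chance a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(labels, scores, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for AUROC."""
    auroc(labels, scores)  # raises early if a class is missing
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that lost one of the classes
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = min(n_boot - 1, int((1 + level) / 2 * n_boot))
    return stats[lo_idx], stats[hi_idx]
```

Computing the interval separately for each domain, with and without the character-based feature in the fused score, would show whether the reported gains exceed resampling noise.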
Circularity Check
No significant circularity; derivation grounded in new data and independent signal
The paper asserts theoretical foundations for a 'Wall of Separation' based on AI models approximating global character patterns versus human domain specialization, then introduces LD-Score on character histograms and the MDTA benchmark (642k prompt-aligned samples across models/domains/temperatures/adversarials, expanding HC3). Evaluation shows low correlation (r=0.08-0.13) with perplexity methods and AUROC/F1 gains when fused via non-linear classifier, particularly in specialized domains. No quoted equations or steps reduce the claimed separation or LD-Score to a fit on the evaluation set itself, a self-citation chain, or a renamed known result. The benchmark construction and integration results provide external falsifiability outside any single fitted parameter.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: AI models approximate global character patterns due to training on massive domain-balanced corpora
invented entities (1)
- Wall of Separation: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: We provide theoretical foundations showing that AI models, trained on massive domain-balanced corpora, approximate global character patterns while humans exhibit domain-specialized distributions, creating a 'Wall of Separation' where human-AI divergence significantly exceeds AI-AI divergence.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.