
arxiv: 2605.01647 · v1 · submitted 2026-05-03 · 💻 cs.CL

Beyond Perplexity: Character Distribution Signatures and the MDTA Benchmark for AI Text Detection

Pith reviewed 2026-05-10 16:27 UTC · model grok-4.3

classification 💻 cs.CL
keywords AI text detection · character distribution · Wall of Separation · MDTA benchmark · Letter Distribution Score · perplexity alternatives · adversarial detection

The pith

AI text can be distinguished from human writing by measuring character frequency patterns that stay domain-specific for people but average out globally for models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that large language models, trained on broad balanced data, settle into character usage that looks roughly the same across topics, while human writers shift their letter and punctuation habits to match the domain. This gap produces a detection signal that does not rely on the model's internal probabilities and therefore survives the optimization steps that weaken perplexity-based detectors. The authors back the claim with the MDTA benchmark of over 600,000 aligned samples and a new Letter Distribution Score that adds measurable lift when combined with existing methods, especially inside narrow domains.

Core claim

AI models, trained on massive domain-balanced corpora, approximate global character patterns while humans exhibit domain-specialized distributions, creating a Wall of Separation where human-AI divergence significantly exceeds AI-AI divergence. The Letter Distribution Score captures this signature and shows low correlation with perplexity-based scores, so it improves AUROC and F1 when fused with DNA-DetectLLM, Binoculars, and FastDetectGPT through a non-linear classifier.

What carries the argument

The Wall of Separation, the greater divergence between human domain-specialized character distributions and the global averages produced by AI models, measured through the Letter Distribution Score.
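The exact LD-Score formula is not given on this page, so the following is only a minimal sketch of the general idea under stated assumptions: letters are case-folded to a-z, counts get add-one smoothing, and two distributions are compared with the Jensen-Shannon divergence. The function names `letter_distribution` and `js_divergence`, the smoothing choice, and the use of Jensen-Shannon divergence are illustrative assumptions, not the paper's definition.

```python
import math
import string
from collections import Counter

ALPHABET = string.ascii_lowercase  # assumption: 26 case-folded letters only


def letter_distribution(text, smoothing=1.0):
    """Return a smoothed probability distribution over a-z for `text`."""
    counts = Counter(ch for ch in text.lower() if ch in ALPHABET)
    total = sum(counts.values()) + smoothing * len(ALPHABET)
    return [(counts[ch] + smoothing) / total for ch in ALPHABET]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical inputs, at most 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence, skipping zero-probability terms.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical toy inputs, not samples from the MDTA benchmark.
finance_like = letter_distribution("Equity yields exceeded quarterly forecasts.")
generic_like = letter_distribution("The weather was nice and the day went well.")
divergence = js_divergence(finance_like, generic_like)
```

Smoothing keeps rare letters from producing zero probabilities on short samples; whether the paper smooths, and how, is one of the points the referee report asks the authors to specify.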

If this is right

  • Adding the Letter Distribution Score to existing probability-based detectors raises AUROC and F1 scores, with larger gains in domains that impose tight vocabulary constraints.
  • The MDTA benchmark enables controlled tests across four models, five domains, three temperatures, and three adversarial strategies.
  • The new score correlates only weakly with perplexity methods, allowing complementary use rather than redundancy.
  • Detection performance holds when models are queried at different temperatures or under adversarial rephrasing.
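The complementarity point rests on a weak linear correlation between the new score and perplexity-style scores. As a rough illustration of how such a check could be run, the sketch below computes a Pearson correlation between two per-sample score lists. The function name `pearson_r` and the toy numbers are hypothetical; they are not taken from the paper or its benchmark.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-sample scores from two detectors on the same texts.
ld_scores = [0.12, 0.30, 0.08, 0.22, 0.17]
perplexity_scores = [0.91, 0.40, 0.77, 0.85, 0.35]
r = pearson_r(ld_scores, perplexity_scores)
```

A value of r near zero only rules out a linear relationship; two scores can still be redundant in non-linear ways, which is one reason the fused-classifier ablations requested in the referee report matter.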

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The separation may continue to work even after models receive further RLHF that aligns their token probabilities more closely with humans.
  • The same character-level signal could be tested on longer documents or on languages other than English to see whether domain specialization remains detectable.
  • Future model releases trained on more narrowly curated data might shrink the gap, so the MDTA benchmark could serve as a repeated stress test.

Load-bearing premise

Character distribution differences between human and AI text stay larger than AI-to-AI differences and survive changes in domain, temperature, and adversarial prompting.
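One way to make this premise testable is to compare the average human-to-AI divergence against the average AI-to-AI divergence within a domain. The sketch below does that for precomputed distributions, using Jensen-Shannon divergence as a stand-in distance. The helper `separation_gap`, the model names, and the three-symbol toy distributions are all hypothetical; the paper's own LD-Score may be defined differently.

```python
import math
from itertools import combinations


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def separation_gap(human, ai_models):
    """Return (mean human-AI divergence, mean AI-AI divergence).

    `human` is one distribution; `ai_models` maps a model name to its
    distribution for the same domain. At least two models are required.
    """
    dists = list(ai_models.values())
    if len(dists) < 2:
        raise ValueError("need at least two AI distributions")
    human_ai = [js_divergence(human, d) for d in dists]
    ai_ai = [js_divergence(a, b) for a, b in combinations(dists, 2)]
    return sum(human_ai) / len(human_ai), sum(ai_ai) / len(ai_ai)


# Hypothetical toy distributions over a 3-symbol alphabet.
human_dist = [0.70, 0.20, 0.10]
models = {"model_a": [0.40, 0.30, 0.30], "model_b": [0.41, 0.30, 0.29]}
mean_human_ai, mean_ai_ai = separation_gap(human_dist, models)
```

Under the premise, `mean_human_ai` should exceed `mean_ai_ai` in every domain, temperature, and adversarial setting; a single comparison of means like this is not a statistical test, and a significance test on the pairwise distances would still be needed.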

What would settle it

An observation that a new model family produces character distributions inside a specialized domain that match human writers in that domain would remove the proposed separation.

Figures

Figures reproduced from arXiv: 2605.01647 by Fardina Fathmiul Alam, Klint Faber, Priyadarshan Narayanasamy, Swastik Agrawal.

Figure 1: Pairwise LD-Scores between complete (A-Z) letter distributions in the Essay …
Figure 2: Principal Component Analysis (PCA) of letter probability distributions shows the …
Figure 3: Domain-specific LD-Score analysis on Ghostbuster dataset. Top row: Pairwise …
Figure 4: Letter-level log-probability deviations from the human baseline across two do …
Figure 5: Domain-specific LD-Score analysis on some domains of the MDTA dataset. Top …
Figure 6: Distribution of percentage reduction in target letter frequency for the …
Figure 7: Percentage of samples where target letter(s) were fully absent from model outputs, …
Figure 8: Per-letter avoidance success rate (100% removal) under …
Figure 9: Instance-level readability analysis (FKGL Kernel Density Estimation (KDE)) across …
Figure 10: Instance-level lexical diversity (LDS) distributions across domains (Finance, …
Figure 11: Temperature-dependent detection performance across domains for Gemma-3-12B …
Original abstract

Training-free AI text detection methods primarily rely on model log-probabilities, achieving strong performance through approaches like Binoculars and DNA-DetectLLM. However, these methods face a fundamental ceiling as models are optimized through RLHF to produce human-like probability distributions. We introduce an alternative detection signal based on character distribution signatures. We provide theoretical foundations showing that AI models, trained on massive domain-balanced corpora, approximate global character patterns while humans exhibit domain-specialized distributions, creating a "Wall of Separation" where human-AI divergence significantly exceeds AI-AI divergence. To enable systematic evaluation, we construct the Models-Domains-Temperatures-Adversarials (MDTA) benchmark comprising 642,274 prompt-aligned samples across 4 models, 5 domains, 3 temperature settings, and 3 adversarial strategies, substantially expanding the HC3 dataset with modern model responses, temperature variation, and adversarial augmentation. We introduce the Letter Distribution Score (LD-Score), demonstrating low correlation (r = 0.08-0.13) with perplexity methods. When integrated with DNA-DetectLLM, Binoculars and FastDetectGPT via a non-linear classifier, LD-Score yields consistent improvements in AUROC and F1, with particularly pronounced gains in specialized domains where vocabulary constraints amplify the detection signal. The MDTA dataset can be accessed at: https://huggingface.co/datasets/nsp909/MDTA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that character distribution signatures provide a training-free AI text detection signal complementary to perplexity-based methods. It argues for a 'Wall of Separation' in which AI models, trained on domain-balanced corpora, converge to global character statistics while humans retain domain-specialized distributions, such that human-AI divergence exceeds AI-AI divergence. The authors introduce the Letter Distribution Score (LD-Score) and the MDTA benchmark (642,274 prompt-aligned samples across 4 models, 5 domains, 3 temperatures, and 3 adversarial strategies, extending HC3). They report low correlation (r=0.08-0.13) with perplexity methods and AUROC/F1 gains when LD-Score is combined with DNA-DetectLLM, Binoculars, and FastDetectGPT via a non-linear classifier, with larger gains in specialized domains.

Significance. If the separation claim and integration results hold, the work supplies an orthogonal, low-cost detection feature that could strengthen ensembles against RLHF-optimized models and adversarial attacks. The MDTA benchmark is a substantial, publicly released resource that enables controlled evaluation across models, domains, temperatures, and attacks, addressing gaps in prior datasets.

major comments (2)
  1. [Abstract / Theoretical Foundations] Abstract and theoretical foundations: The 'Wall of Separation' claim, that human domain-specialized character distributions produce larger divergences from AI global approximations than any AI-AI variation, is load-bearing for both the LD-Score motivation and the reported detection gains. The manuscript provides no direct pairwise divergence tables or statistical tests comparing human-AI LD-Score distances against AI-AI distances across the 5 domains and 4 models. Given that English letter frequencies are largely stable across domains (e.g., 'e' ~12.7%, 't' ~9.1%), small domain shifts in human text may not exceed tokenizer- or temperature-induced biases in the models, undermining the separation premise.
  2. [Results / Integration Experiments] Results section (integration experiments): The reported AUROC and F1 improvements from adding LD-Score to DNA-DetectLLM, Binoculars, and FastDetectGPT are presented without ablation on the non-linear classifier (architecture, hyperparameters, or training split), without per-domain error bars, and without controls for prompt length or adversarial strategy. It is therefore unclear whether the gains are driven by the claimed character signal or by incidental correlations in the MDTA construction.
minor comments (2)
  1. [Abstract] The exact definition and normalization of the LD-Score (histogram divergence formula, handling of rare characters, or smoothing) is not stated in the abstract or summary; an equation or pseudocode should be added for reproducibility.
  2. [Benchmark Construction] The MDTA dataset description mentions 'prompt-aligned samples' but does not specify how prompts were chosen or balanced across domains; a table summarizing prompt statistics per domain would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below and commit to strengthening the manuscript with additional analyses and controls as outlined.

Point-by-point responses
  1. Referee: [Abstract / Theoretical Foundations] Abstract and theoretical foundations: The 'Wall of Separation' claim, that human domain-specialized character distributions produce larger divergences from AI global approximations than any AI-AI variation, is load-bearing for both the LD-Score motivation and the reported detection gains. The manuscript provides no direct pairwise divergence tables or statistical tests comparing human-AI LD-Score distances against AI-AI distances across the 5 domains and 4 models. Given that English letter frequencies are largely stable across domains (e.g., 'e' ~12.7%, 't' ~9.1%), small domain shifts in human text may not exceed tokenizer- or temperature-induced biases in the models, undermining the separation premise.

    Authors: We agree that explicit pairwise comparisons would make the 'Wall of Separation' claim more robust. The current manuscript motivates the claim theoretically and supports it indirectly via low correlation with perplexity methods (r=0.08-0.13) and domain-specific detection gains, but does not include the requested tables. In revision we will add mean/variance LD-Score distance tables and statistical tests (e.g., paired t-tests) for human-AI versus all AI-AI pairs across the 5 domains and 4 models. On the invariance point, LD-Score operates over the full 256-character distribution (including low-frequency characters and their contextual patterns), which domain-specialized human writing perturbs in ways that balanced training corpora smooth; the pronounced gains in specialized domains are consistent with this distinction. revision: yes

  2. Referee: [Results / Integration Experiments] Results section (integration experiments): The reported AUROC and F1 improvements from adding LD-Score to DNA-DetectLLM, Binoculars, and FastDetectGPT are presented without ablation on the non-linear classifier (architecture, hyperparameters, or training split), without per-domain error bars, and without controls for prompt length or adversarial strategy. It is therefore unclear whether the gains are driven by the claimed character signal or by incidental correlations in the MDTA construction.

    Authors: We acknowledge these controls are necessary for clarity. The reported gains use a fixed non-linear classifier on the prompt-aligned MDTA samples, but the manuscript does not detail architecture/hyperparameter ablations, per-domain error bars, or stratification by length/adversarial strategy. In the revision we will add: (1) classifier ablations (linear vs. non-linear, hyperparameter sweeps, cross-validation splits), (2) per-domain AUROC/F1 with bootstrapped error bars, and (3) results stratified by prompt length quartiles and each adversarial strategy. These additions will isolate LD-Score's contribution from any MDTA construction artifacts. revision: yes
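The per-domain error bars promised in the second response could be produced with a percentile bootstrap over samples. The sketch below is one hedged way to do it in plain Python: `auroc` uses the pairwise-ranking definition, and `bootstrap_auroc_ci` resamples with replacement. Both function names, the default of 1000 resamples, and the toy labels and scores are assumptions for illustration, not details from the manuscript.

```python
import random


def auroc(labels, scores):
    """AUROC as the chance a random positive outscores a random negative.

    Labels must be 0 (human) or 1 (AI); ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for AUROC."""
    auroc(labels, scores)  # fails early if one class is missing
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auroc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical detector outputs for one domain.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.35, 0.40, 0.62, 0.30, 0.55, 0.70, 0.90]
point_estimate = auroc(labels, scores)
ci_low, ci_high = bootstrap_auroc_ci(labels, scores, n_boot=200)
```

Running this once per domain, and once per prompt-length quartile or adversarial strategy, would give the stratified intervals the referee asks for; the pairwise AUROC here is quadratic in sample count, so a rank-based implementation would be preferable at MDTA scale.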

Circularity Check

0 steps flagged

No significant circularity; derivation grounded in new data and independent signal

Full rationale

The paper asserts theoretical foundations for a 'Wall of Separation' based on AI models approximating global character patterns versus human domain specialization, then introduces LD-Score on character histograms and the MDTA benchmark (642k prompt-aligned samples across models/domains/temperatures/adversarials, expanding HC3). Evaluation shows low correlation (r=0.08-0.13) with perplexity methods and AUROC/F1 gains when fused via non-linear classifier, particularly in specialized domains. No quoted equations or steps reduce the claimed separation or LD-Score to a fit on the evaluation set itself, a self-citation chain, or a renamed known result. The benchmark construction and integration results provide external falsifiability outside any single fitted parameter.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper rests on a domain assumption about training data balance and introduces a new conceptual framing; no explicit free parameters are described in the abstract.

axioms (1)
  • domain assumption: AI models approximate global character patterns due to training on massive domain-balanced corpora
    Stated as the basis for the theoretical foundation and Wall of Separation.
invented entities (1)
  • Wall of Separation (no independent evidence)
    purpose: To describe the claimed larger divergence between human and AI character distributions than between AI models
    New conceptual term introduced to frame the detection signal.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem: unclear.

    Paper passage: We provide theoretical foundations showing that AI models, trained on massive domain-balanced corpora, approximate global character patterns while humans exhibit domain-specialized distributions, creating a 'Wall of Separation' where human-AI divergence significantly exceeds AI-AI divergence.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
