
arxiv: 2510.07500 · v2 · submitted 2025-10-08 · 💻 cs.LG · cs.IT · math.IT

Black-Box Detection of LLM-Generated Text Using Generalized Jensen-Shannon Divergence

Pith reviewed 2026-05-18 08:41 UTC · model grok-4.3

classification 💻 cs.LG · cs.IT · math.IT
keywords LLM-generated text detection · black-box detection · Jensen-Shannon divergence · surprisal dynamics · state transition matrix · machine generated text · reference-based detector · robustness to model mismatch

The pith

SurpMark detects LLM-generated text by comparing the transition patterns of discretized token surprisals to fixed human and machine reference matrices using generalized Jensen-Shannon divergence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a black-box detector called SurpMark that identifies machine-generated text even when the scoring model does not match the unknown source model. It works by turning a passage's token surprisals into a sequence of discrete states, estimating how those states transition from one token to the next, and then scoring the resulting matrix by its divergence from two precomputed reference matrices. The authors derive guidance on how the number of discretization bins should grow with the amount of data and show that the method matches or exceeds existing detectors across varied datasets and generators while avoiding the cost of generating contrastive examples for every new input.
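
The first step described above, turning token surprisals into discrete states, can be sketched as follows. This is a minimal illustration and not the paper's implementation: the helper names (`surprisals`, `quantile_edges`, `discretize`), the equal-mass binning rule, and the toy probabilities are all assumptions. For brevity the bin edges are fit on the toy passage itself, whereas a reference-based detector would presumably fix them from reference data.

```python
import bisect
import math


def surprisals(token_probs):
    """Surprisal of each token in nats: -log p(token | context)."""
    return [-math.log(p) for p in token_probs]


def quantile_edges(values, num_bins):
    """Interior bin edges that put roughly equal mass in each bin (assumed rule)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // num_bins] for i in range(1, num_bins)]


def discretize(values, edges):
    """Map each value to a state index in 0..len(edges)."""
    return [bisect.bisect_right(edges, v) for v in values]


# Made-up proxy-LM probabilities for an 8-token passage.
probs = [0.9, 0.5, 0.05, 0.7, 0.2, 0.95, 0.01, 0.6]
s = surprisals(probs)
edges = quantile_edges(s, num_bins=3)
states = discretize(s, edges)  # low, medium, and high surprisal states
```

With three bins the states read naturally as low, medium, and high surprisal, which is one way to interpret the "interpretable states" mentioned in the abstract.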

Core claim

SurpMark summarizes a passage by the dynamics of its token surprisals: it discretizes surprisals into interpretable states, estimates a state-transition matrix for the test text, and scores the matrix via a generalized Jensen-Shannon gap to two fixed references built once from existing human and machine corpora. This yields a detection statistic that remains effective under model mismatch and without per-input contrastive generation.
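
Estimating a state-transition matrix from a sequence of discretized surprisal states can be sketched as below. The function name `transition_matrix` and the additive smoothing are assumptions made for illustration; the source does not say how the paper handles unseen transitions.

```python
def transition_matrix(states, num_states, smoothing=1.0):
    """Row-stochastic matrix of first-order transition frequencies.

    Additive smoothing (an assumption, not taken from the paper) keeps
    every entry positive so that later divergence terms stay finite.
    """
    counts = [[0.0] * num_states for _ in range(num_states)]
    for prev, cur in zip(states, states[1:]):
        counts[prev][cur] += 1.0
    matrix = []
    for row in counts:
        total = sum(row) + smoothing * num_states
        matrix.append([(c + smoothing) / total for c in row])
    return matrix


# Toy state sequence with three surprisal states (0 = low, 2 = high).
states = [0, 1, 2, 1, 2, 0, 2, 1]
P = transition_matrix(states, num_states=3)
```

Each row of `P` is the estimated distribution over the next state given the current one, so the matrix captures how surprisal moves from token to token rather than only how it is distributed.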

What carries the argument

The state-transition matrix of discretized token surprisals, scored by its generalized Jensen-Shannon divergence gap to fixed human and machine reference matrices.
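
A rough sketch of the scoring step follows. It assumes one form of the generalized Jensen-Shannon divergence found in the sequence-classification literature, averages it uniformly over matrix rows, and treats `alpha` as a free weight. The row weighting, the sign convention, and the names `gjs`, `matrix_gjs`, and `gap_score` are illustrative assumptions, and the paper's exact statistic may differ.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence in nats; zero-probability terms of p are skipped."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def gjs(p, q, alpha):
    """alpha * KL(p || m) + KL(q || m), with m = (alpha * p + q) / (1 + alpha)."""
    m = [(alpha * pi + qi) / (1 + alpha) for pi, qi in zip(p, q)]
    return alpha * kl(p, m) + kl(q, m)


def matrix_gjs(ref, test, alpha):
    """Uniform average of row-wise GJS (the uniform weighting is an assumption)."""
    return sum(gjs(r, t, alpha) for r, t in zip(ref, test)) / len(ref)


def gap_score(test, human_ref, machine_ref, alpha=1.0):
    """Positive when the test matrix sits closer to the machine reference."""
    return matrix_gjs(human_ref, test, alpha) - matrix_gjs(machine_ref, test, alpha)


# Made-up 2-state reference matrices and a test matrix.
human_ref = [[0.5, 0.5], [0.5, 0.5]]
machine_ref = [[0.9, 0.1], [0.2, 0.8]]
test = [[0.85, 0.15], [0.25, 0.75]]
score = gap_score(test, human_ref, machine_ref)
```

Because both references are fixed, scoring a new passage needs only one pass of the proxy LM over the text plus a few small matrix operations.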

If this is right

  • Detection works without access to or knowledge of the unknown source model.
  • Computational cost drops because references are built once rather than generating contrasts for each new passage.
  • Theoretical scaling rules for discretization bins help explain observed hyperparameter trends.
  • Performance holds across multiple datasets, source models, and mismatch scenarios.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same transition-matrix approach could be tested on other sequential data where human and machine patterns differ, such as code or structured outputs.
  • Periodically refreshing the reference corpora with recent human text might keep the detector aligned with evolving language use.
  • If the gap statistic can be computed incrementally, the method might support streaming detection on long documents.
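
The streaming idea in the last bullet can be illustrated with a small sketch. It shows only that transition counts can be updated in constant time per token; whether the full gap statistic admits an efficient incremental form is left open. The class name `StreamingTransitions` and the smoothing default are hypothetical.

```python
class StreamingTransitions:
    """Maintains transition counts one state at a time (illustrative sketch)."""

    def __init__(self, num_states):
        self.num_states = num_states
        self.counts = [[0] * num_states for _ in range(num_states)]
        self.prev = None

    def update(self, state):
        """Record the transition from the previous state, if there is one."""
        if self.prev is not None:
            self.counts[self.prev][state] += 1
        self.prev = state

    def matrix(self, smoothing=1.0):
        """Current smoothed, row-stochastic transition estimate."""
        result = []
        for row in self.counts:
            total = sum(row) + smoothing * self.num_states
            result.append([(c + smoothing) / total for c in row])
        return result


tracker = StreamingTransitions(num_states=3)
for state in [0, 1, 2, 1, 2, 0, 2, 1]:
    tracker.update(state)
current = tracker.matrix()
```

A detector built this way could recompute its score from `tracker.matrix()` at any checkpoint in a long document, at a cost that depends on the number of states rather than on the document length.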

Load-bearing premise

Fixed reference transition matrices built once from existing human and machine corpora remain discriminative even for test text from unseen domains or generators.

What would settle it

On a fresh dataset drawn from an unseen generator and domain, SurpMark's detection accuracy falls below the best baseline or approaches random guessing.

Original abstract

We study black-box detection of machine-generated text under practical constraints: the scoring model (proxy LM) may mismatch the unknown source model, and per-input contrastive generation is costly. We propose SurpMark, a reference-based detector that summarizes a passage by the dynamics of its token surprisals. SurpMark discretizes surprisals into interpretable states, estimates a state-transition matrix for the test text, and scores it via a generalized Jensen-Shannon (GJS) gap between the test transitions and two fixed references (human vs. machine) built once from existing corpora. Theoretically, we derive design guidance for how the discretization bins should scale with data and provide a principled justification for our test statistic. Empirically, across multiple datasets, source models, and scenarios, SurpMark consistently matches or surpasses baselines, demonstrating strong robustness across domains and generators; our experiments on hyperparameter sensitivity exhibit trends that our theoretical results help to explain.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes SurpMark, a reference-based black-box detector for LLM-generated text. It discretizes token surprisals (from a possibly mismatched proxy LM) into a fixed number of states, estimates a state-transition matrix for the test passage, and scores the passage by the generalized Jensen-Shannon divergence gap between this matrix and two fixed reference transition matrices (human vs. machine) that are built once from external corpora. The paper derives theoretical guidance for scaling the number of discretization bins with data size and reports that SurpMark matches or exceeds baselines across multiple datasets, source models, and scenarios, including proxy-LM mismatches.

Significance. If the empirical robustness to proxy mismatch holds, the work is significant for practical deployment: it avoids per-input contrastive generation and relies on reusable references, while the theoretical bin-scaling result supplies principled rather than purely empirical design guidance. The fixed-reference GJS approach could extend to other surprisal-based analyses in detection or watermarking.

major comments (2)
  1. [Theoretical derivation for bin scaling] § on theoretical derivation for bin scaling: the guidance is derived under the assumption that reference and test surprisal statistics are drawn from the same underlying distribution. When the proxy LM used at test time differs materially in calibration or tokenization from the LM used to build the fixed references, the state labels cease to correspond to comparable semantic regimes; the GJS gap then compares incomparable quantities. This assumption is load-bearing for the central robustness claim yet receives no explicit cross-proxy calibration step or sensitivity analysis.
  2. [Experiments on proxy mismatch] Experiments section (proxy-mismatch scenarios): the abstract asserts robustness “when the proxy LM differs,” but the reported tests appear to use proxies that remain close to the reference LM in tokenizer and surprisal statistics. If the experiments do not include substantially divergent proxies (different families, tokenizers, or temperature regimes), the empirical support for the central claim is incomplete.
minor comments (2)
  1. [Methods] Notation for the generalized Jensen-Shannon divergence should be introduced with an explicit equation number and contrasted with the ordinary JS divergence to avoid reader confusion.
  2. [Reference construction] The paper should state the exact corpora and proxy LM used to construct the two fixed reference matrices so that reproducibility is possible.
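
For orientation on the first minor comment, the contrast could look like the following. The generalized form shown is one that appears in the statistical sequence-classification literature; that the paper uses exactly this form, and how it sets the weight α, are assumptions here.

```latex
% Ordinary Jensen-Shannon divergence (equal weights)
\mathrm{JS}(P, Q) = \tfrac{1}{2}\, D(P \,\|\, M) + \tfrac{1}{2}\, D(Q \,\|\, M),
\qquad M = \tfrac{1}{2}(P + Q)

% Generalized Jensen-Shannon divergence with weight \alpha > 0 (assumed form)
\mathrm{GJS}(P, Q, \alpha)
  = \alpha\, D\!\left(P \,\Big\|\, \frac{\alpha P + Q}{1 + \alpha}\right)
  + D\!\left(Q \,\Big\|\, \frac{\alpha P + Q}{1 + \alpha}\right)
```

Under this form, setting α = 1 gives GJS(P, Q, 1) = 2 JS(P, Q), so the ordinary divergence is recovered up to a constant factor, while other values of α weight the two distributions unequally.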

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive feedback on our manuscript. We address each major comment below with clarifications and describe the revisions we plan to incorporate.

Point-by-point responses
  1. Referee: [Theoretical derivation for bin scaling] § on theoretical derivation for bin scaling: the guidance is derived under the assumption that reference and test surprisal statistics are drawn from the same underlying distribution. When the proxy LM used at test time differs materially in calibration or tokenization from the LM used to build the fixed references, the state labels cease to correspond to comparable semantic regimes; the GJS gap then compares incomparable quantities. This assumption is load-bearing for the central robustness claim yet receives no explicit cross-proxy calibration step or sensitivity analysis.

    Authors: We appreciate the referee's identification of the key assumption underlying the bin-scaling derivation. The theoretical guidance is obtained under the setting where reference and test surprisal sequences are drawn from the same distribution, yielding a principled scaling rule for the number of discretization bins as a function of passage length. We acknowledge that material mismatches in proxy calibration or tokenization can disrupt direct comparability of the resulting state labels. In the revised manuscript we will add an explicit discussion of this scope limitation together with a quantile-based normalization procedure that aligns surprisal scales across proxies before discretization. We will also include a sensitivity analysis that varies the degree of proxy mismatch and reports the resulting change in GJS gap stability. revision: yes

  2. Referee: [Experiments on proxy mismatch] Experiments section (proxy-mismatch scenarios): the abstract asserts robustness “when the proxy LM differs,” but the reported tests appear to use proxies that remain close to the reference LM in tokenizer and surprisal statistics. If the experiments do not include substantially divergent proxies (different families, tokenizers, or temperature regimes), the empirical support for the central claim is incomplete.

    Authors: We thank the referee for pressing on the breadth of the proxy-mismatch evaluation. Our existing experiments already incorporate proxies drawn from different model families and tokenizers (for example, using a GPT-2 proxy on text generated by Llama-family models). Nevertheless, we agree that coverage of more extreme divergences, such as temperature-induced shifts in surprisal distributions or entirely unrelated architectures, would strengthen the robustness claim. In the revision we will expand the proxy-mismatch section with additional experiments that include these divergent regimes and will report the corresponding detection performance to provide more complete empirical support. revision: yes
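
The quantile-based normalization mentioned in the first response is not specified in detail. One possible reading, sketched under stated assumptions, maps each proxy's surprisals to empirical ranks on a calibration sample from that same proxy and then bins in rank space. The function names and the toy calibration data are hypothetical.

```python
import bisect


def make_rank_transform(calibration_surprisals):
    """Return a function mapping a surprisal to its empirical CDF rank in (0, 1]."""
    ordered = sorted(calibration_surprisals)
    n = len(ordered)

    def to_rank(value):
        return bisect.bisect_right(ordered, value) / n

    return to_rank


def rank_to_state(rank, num_bins):
    """Uniform bins in rank space; the top rank falls in the last bin."""
    return min(int(rank * num_bins), num_bins - 1)


# Two hypothetical proxies whose surprisal scales differ by an affine map.
calib_a = [0.1, 0.4, 0.9, 1.5, 2.2, 3.0, 4.1, 5.5]
calib_b = [2 * v + 1 for v in calib_a]
to_rank_a = make_rank_transform(calib_a)
to_rank_b = make_rank_transform(calib_b)

passage_a = [0.4, 3.0, 1.5, 5.5]
passage_b = [2 * v + 1 for v in passage_a]
states_a = [rank_to_state(to_rank_a(v), 4) for v in passage_a]
states_b = [rank_to_state(to_rank_b(v), 4) for v in passage_b]
```

In this toy case the two proxies yield identical state sequences because ranks are invariant to monotone rescaling. Real proxies also differ in tokenization, which a rank transform alone would not address.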

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

Full rationale

The SurpMark method constructs fixed reference transition matrices once from external corpora and scores test passages via GJS divergence on discretized surprisal transitions obtained with a proxy LM. The theoretical guidance for bin scaling is derived from first-principles assumptions on surprisal statistics rather than fitted to target data, and the test statistic receives independent justification. No equations or claims reduce by construction to self-citations, fitted inputs renamed as predictions, or definitional loops; the approach is self-contained against external benchmarks and datasets.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the premise that surprisal transition statistics differ systematically between human and machine text and that this difference is preserved under proxy-model mismatch; the discretization rule is derived theoretically but still depends on data volume.

free parameters (1)
  • number of surprisal discretization bins
    Chosen according to the derived scaling rule with data length; exact value may still be tuned per corpus.
axioms (1)
  • domain assumption: Surprisal dynamics under a proxy LM capture stable, domain-robust differences between human and machine text generation.
    Invoked to justify why fixed references remain useful across mismatched models and domains.
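
Why the bin count depends on data volume can be seen from simple counting: with k states there are k² transition cells, and a passage of n tokens supplies only n − 1 transitions to fill them. The sketch below illustrates that trade-off only. It is not the paper's scaling rule, and the per-cell budget `min_mean_count` is a hypothetical knob.

```python
import math


def mean_transitions_per_cell(num_tokens, num_bins):
    """Average number of observed transitions per cell of a k-by-k matrix."""
    return (num_tokens - 1) / (num_bins ** 2)


def max_bins_for_budget(num_tokens, min_mean_count):
    """Largest k with at least `min_mean_count` transitions per cell on average."""
    return max(1, math.isqrt((num_tokens - 1) // min_mean_count))


# A 257-token passage with 4 bins yields 256 transitions over 16 cells.
per_cell = mean_transitions_per_cell(257, 4)
k_max = max_bins_for_budget(257, min_mean_count=16)
```

Finer bins resolve more detail in the surprisal dynamics but leave each transition probability estimated from fewer observations, which is the tension a principled scaling rule has to balance.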

pith-pipeline@v0.9.0 · 5695 in / 1339 out tokens · 33846 ms · 2026-05-18T08:41:10.422704+00:00 · methodology

