Black-Box Detection of LLM-Generated Text Using Generalized Jensen-Shannon Divergence
Pith reviewed 2026-05-18 08:41 UTC · model grok-4.3
The pith
SurpMark detects LLM-generated text by comparing the transition patterns of discretized token surprisals to fixed human and machine reference matrices using generalized Jensen-Shannon divergence.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SurpMark summarizes a passage by the dynamics of its token surprisals: it discretizes surprisals into interpretable states, estimates a state-transition matrix for the test text, and scores the matrix via a generalized Jensen-Shannon gap to two fixed references built once from existing human and machine corpora. This yields a detection statistic that remains effective under model mismatch and without per-input contrastive generation.
What carries the argument
The state-transition matrix of discretized token surprisals, scored by its generalized Jensen-Shannon divergence gap to fixed human and machine reference matrices.
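The pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the additive smoothing, the unweighted row average, the entropy-based weighted form of the divergence, and the sign convention of the gap are all assumptions made here for concreteness.

```python
import math
from bisect import bisect_right

def discretize(surprisals, edges):
    """Map each surprisal to a state index; k-1 sorted edges give k states."""
    return [bisect_right(edges, s) for s in surprisals]

def transition_matrix(states, k, smoothing=1.0):
    """Row-normalized k x k transition matrix (additive smoothing is an assumption)."""
    counts = [[smoothing] * k for _ in range(k)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1.0
    return [[c / sum(row) for c in row] for row in counts]

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(x * math.log(x) for x in p if x > 0)

def gjs(p, q, w=0.5):
    """Weighted Jensen-Shannon divergence: H(w p + (1-w) q) - w H(p) - (1-w) H(q)."""
    m = [w * a + (1 - w) * b for a, b in zip(p, q)]
    return entropy(m) - w * entropy(p) - (1 - w) * entropy(q)

def matrix_gjs(T, R, w=0.5):
    """Unweighted average of row-wise divergences (the averaging rule is an assumption)."""
    return sum(gjs(t, r, w) for t, r in zip(T, R)) / len(T)

def gap_score(T_test, T_human, T_machine, w=0.5):
    """Positive when the test matrix is closer to the machine reference (assumed sign)."""
    return matrix_gjs(T_test, T_human, w) - matrix_gjs(T_test, T_machine, w)
```

In this sketch the two reference matrices would be built once with `transition_matrix` from reference corpora, and each new passage only needs one pass through `discretize`, `transition_matrix`, and `gap_score`.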
If this is right
- Detection works without access to or knowledge of the unknown source model.
- Computational cost drops because references are built once rather than generating contrasts for each new passage.
- Theoretical scaling rules for discretization bins help explain observed hyperparameter trends.
- Performance holds across multiple datasets, source models, and mismatch scenarios.
Where Pith is reading between the lines
- The same transition-matrix approach could be tested on other sequential data where human and machine patterns differ, such as code or structured outputs.
- Periodically refreshing the reference corpora with recent human text might keep the detector aligned with evolving language use.
- If the gap statistic can be computed incrementally, the method might support streaming detection on long documents.
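The streaming idea in the last bullet rests on the observation that the test-side summary is a set of transition counts, which can be updated one token at a time. A minimal sketch of such an accumulator follows; the class name `StreamingTransitions` and the smoothing constant are hypothetical, and whether the paper's statistic can be refreshed cheaply from these counts is not established by the source.

```python
class StreamingTransitions:
    """Incrementally accumulates state-transition counts for a token stream."""

    def __init__(self, k, smoothing=1.0):
        self.k = k
        # Additive smoothing keeps every row normalizable (an assumption).
        self.counts = [[smoothing] * k for _ in range(k)]
        self.prev = None

    def update(self, state):
        """Record the transition from the previous state to `state`."""
        if self.prev is not None:
            self.counts[self.prev][state] += 1.0
        self.prev = state

    def matrix(self):
        """Return the current row-normalized transition matrix."""
        return [[c / sum(row) for c in row] for row in self.counts]
```

Each call to `update` costs constant time, so a detector could recompute its score from `matrix()` at whatever interval suits the document length.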
Load-bearing premise
Fixed reference transition matrices built once from existing human and machine corpora remain discriminative even for test text from unseen domains or generators.
What would settle it
On a fresh dataset drawn from an unseen generator and domain, SurpMark's detection accuracy falls below the best baseline or approaches random guessing.
Original abstract
We study black-box detection of machine-generated text under practical constraints: the scoring model (proxy LM) may mismatch the unknown source model, and per-input contrastive generation is costly. We propose SurpMark, a reference-based detector that summarizes a passage by the dynamics of its token surprisals. SurpMark discretizes surprisals into interpretable states, estimates a state-transition matrix for the test text, and scores it via a generalized Jensen-Shannon (GJS) gap between the test transitions and two fixed references (human vs. machine) built once from existing corpora. Theoretically, we derive design guidance for how the discretization bins should scale with data and provide a principled justification for our test statistic. Empirically, across multiple datasets, source models, and scenarios, SurpMark consistently matches or surpasses baselines, demonstrating strong robustness across domains and generators; our experiments on hyperparameter sensitivity exhibit trends that our theoretical results help to explain.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes SurpMark, a reference-based black-box detector for LLM-generated text. It discretizes token surprisals (from a possibly mismatched proxy LM) into a fixed number of states, estimates a state-transition matrix for the test passage, and scores the passage by the generalized Jensen-Shannon divergence gap between this matrix and two fixed reference transition matrices (human vs. machine) that are built once from external corpora. The paper derives theoretical guidance for scaling the number of discretization bins with data size and reports that SurpMark matches or exceeds baselines across multiple datasets, source models, and scenarios, including proxy-LM mismatches.
Significance. If the empirical robustness to proxy mismatch holds, the work is significant for practical deployment: it avoids per-input contrastive generation and relies on reusable references, while the theoretical bin-scaling result supplies principled rather than purely empirical design guidance. The fixed-reference GJS approach could extend to other surprisal-based analyses in detection or watermarking.
major comments (2)
- [Theoretical derivation for bin scaling] § on theoretical derivation for bin scaling: the guidance is derived under the assumption that reference and test surprisal statistics are drawn from the same underlying distribution. When the proxy LM used at test time differs materially in calibration or tokenization from the LM used to build the fixed references, the state labels cease to correspond to comparable semantic regimes; the GJS gap then compares incomparable quantities. This assumption is load-bearing for the central robustness claim yet receives no explicit cross-proxy calibration step or sensitivity analysis.
- [Experiments on proxy mismatch] Experiments section (proxy-mismatch scenarios): the abstract asserts robustness “when the proxy LM differs,” but the reported tests appear to use proxies that remain close to the reference LM in tokenizer and surprisal statistics. If the experiments do not include substantially divergent proxies (different families, tokenizers, or temperature regimes), the empirical support for the central claim is incomplete.
minor comments (2)
- [Methods] Notation for the generalized Jensen-Shannon divergence should be introduced with an explicit equation number and contrasted with the ordinary JS divergence to avoid reader confusion.
- [Reference construction] The paper should state the exact corpora and proxy LM used to construct the two fixed reference matrices so that reproducibility is possible.
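To make the notation request concrete, one standard weighted form of the Jensen-Shannon divergence is shown next to the ordinary one. The paper's exact GJS definition is not given in this summary and may differ, so the weighted expression here is an assumed form for orientation only; the symbols P, Q, H, D, and the weights are introduced here.

```latex
% Ordinary Jensen-Shannon divergence, with mixture M = (P + Q)/2
\mathrm{JS}(P, Q) = \tfrac{1}{2}\, D(P \,\|\, M) + \tfrac{1}{2}\, D(Q \,\|\, M),
\qquad M = \tfrac{1}{2}(P + Q)

% Weighted generalization with \pi_1 + \pi_2 = 1, \pi_i \ge 0 (assumed form)
\mathrm{JS}_{\pi}(P, Q) = H(\pi_1 P + \pi_2 Q) - \pi_1 H(P) - \pi_2 H(Q)
```

Here H is Shannon entropy and D is Kullback-Leibler divergence. Setting both weights to 1/2 in the second expression recovers the first.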
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive feedback on our manuscript. We address each major comment below with clarifications and describe the revisions we plan to incorporate.
Point-by-point responses
- Referee: [Theoretical derivation for bin scaling] § on theoretical derivation for bin scaling: the guidance is derived under the assumption that reference and test surprisal statistics are drawn from the same underlying distribution. When the proxy LM used at test time differs materially in calibration or tokenization from the LM used to build the fixed references, the state labels cease to correspond to comparable semantic regimes; the GJS gap then compares incomparable quantities. This assumption is load-bearing for the central robustness claim yet receives no explicit cross-proxy calibration step or sensitivity analysis.
Authors: We appreciate the referee's identification of the key assumption underlying the bin-scaling derivation. The theoretical guidance is obtained under the setting where reference and test surprisal sequences are drawn from the same distribution, yielding a principled scaling rule for the number of discretization bins as a function of passage length. We acknowledge that material mismatches in proxy calibration or tokenization can disrupt direct comparability of the resulting state labels. In the revised manuscript we will add an explicit discussion of this scope limitation together with a quantile-based normalization procedure that aligns surprisal scales across proxies before discretization. We will also include a sensitivity analysis that varies the degree of proxy mismatch and reports the resulting change in GJS gap stability. revision: yes
- Referee: [Experiments on proxy mismatch] Experiments section (proxy-mismatch scenarios): the abstract asserts robustness “when the proxy LM differs,” but the reported tests appear to use proxies that remain close to the reference LM in tokenizer and surprisal statistics. If the experiments do not include substantially divergent proxies (different families, tokenizers, or temperature regimes), the empirical support for the central claim is incomplete.
Authors: We thank the referee for pressing on the breadth of the proxy-mismatch evaluation. Our existing experiments already incorporate proxies drawn from different model families and tokenizers (for example, using a GPT-2 proxy on text generated by Llama-family models). Nevertheless, we agree that coverage of more extreme divergences, such as temperature-induced shifts in surprisal distributions or entirely unrelated architectures, would strengthen the robustness claim. In the revision we will expand the proxy-mismatch section with additional experiments that include these divergent regimes and will report the corresponding detection performance to provide more complete empirical support. revision: yes
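The quantile-based normalization promised in the first response is not specified further, so the following is only one plausible reading: map each surprisal to its empirical quantile under a calibration sample scored by the same proxy, then discretize in quantile space so that states mean "low" or "high" relative to that proxy rather than in absolute nats. The function names and the equal-probability binning are assumptions.

```python
from bisect import bisect_right

def to_quantiles(surprisals, calibration):
    """Empirical CDF value of each surprisal under a same-proxy calibration sample."""
    ref = sorted(calibration)
    n = len(ref)
    return [bisect_right(ref, s) / n for s in surprisals]

def quantile_states(surprisals, calibration, k):
    """Discretize into k equal-probability states in quantile space."""
    quantiles = to_quantiles(surprisals, calibration)
    # A quantile of exactly 1.0 is clamped into the top state.
    return [min(int(q * k), k - 1) for q in quantiles]
```

Because only ranks matter, two proxies whose surprisals differ by a monotone rescaling would yield the same state sequence under this scheme, which is the comparability property the referee asks about. It does not address tokenization differences, which change the sequence itself.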
Circularity Check
No significant circularity in the derivation chain
Full rationale
The SurpMark method constructs fixed reference transition matrices once from external corpora and scores test passages via GJS divergence on discretized surprisal transitions obtained with a proxy LM. The theoretical guidance for bin scaling is derived from first-principles assumptions on surprisal statistics rather than fitted to target data, and the test statistic receives independent justification. No equations or claims reduce by construction to self-citations, fitted inputs renamed as predictions, or definitional loops; the approach is self-contained against external benchmarks and datasets.
Axiom & Free-Parameter Ledger
free parameters (1)
- number of surprisal discretization bins
axioms (1)
- domain assumption: Surprisal dynamics under a proxy LM capture stable, domain-robust differences between human and machine text generation.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: SurpMark quantizes surprisals into interpretable states, estimates a state-transition matrix for the test text, and scores it via a generalized Jensen-Shannon (GJS) gap between the test transitions and two fixed references
- IndisputableMonolith/Foundation/AlexanderDuality.lean, alexander_duality_circle_linking (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: We prove a principled discretization criterion and establish the asymptotic normality of the decision statistic
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.