arXiv: 2604.22438 · v1 · submitted 2026-04-24 · cs.CR · cs.AI · cs.CL

SSG: Logit-Balanced Vocabulary Partitioning for LLM Watermarking

Pith reviewed 2026-05-08 11:31 UTC · model grok-4.3

classification: cs.CR · cs.AI · cs.CL
keywords: LLM watermarking · vocabulary partitioning · KGW scheme · logit balancing · watermark strength · code generation · mathematical reasoning · detectability

The pith

Redesigning vocabulary partitioning into logit-balanced groups raises the minimum watermark strength for each LLM token prediction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the KGW watermarking method's effectiveness depends on how the vocabulary is split into preferred and non-preferred tokens, and that random splits leave a low lower bound on strength when next-token probabilities are peaked. SSG instead sorts tokens by logit value and splits them into two groups with matched logit distributions, which raises that lower bound for every prediction. This matters for tracing AI-generated code and math reasoning, where standard watermarking performs poorly because low entropy limits how much the scheme can bias token choice. Experiments on relevant datasets confirm higher detection rates while preserving generation behavior.

Core claim

Under random vocabulary partitioning the lower bound of watermark strength is fixed by the next-token probability distribution, but sorting tokens by logits and splitting into two balanced subsets raises this lower bound for each prediction and thereby improves detectability of the KGW watermark.
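
To make the dependence on the next-token distribution concrete, here is a minimal Python sketch. It assumes the standard KGW rule of adding a constant δ to every green-token logit before the softmax. The function names and the use of "gain in green mass" as a stand-in for watermark strength are illustrative assumptions; the paper's exact definition is not reproduced on this page.

```python
import math

def green_mass_after_bias(p_green: float, delta: float) -> float:
    """Green-list probability mass after adding delta to every green logit.

    Adding delta multiplies each green token's unnormalized probability by
    exp(delta), so the green mass renormalizes as computed below.
    """
    boosted = p_green * math.exp(delta)
    return boosted / (boosted + (1.0 - p_green))

def mass_shift(p_green: float, delta: float) -> float:
    """Hypothetical proxy for watermark strength: gain in green mass."""
    return green_mass_after_bias(p_green, delta) - p_green

if __name__ == "__main__":
    # Peaked distributions push p_green toward 0 or 1 under a random split.
    for p in (0.001, 0.1, 0.5, 0.9, 0.999):
        print(f"p_green={p:<6} shift={mass_shift(p, delta=2.0):.4f}")
```

Under this proxy the shift is small when the green mass sits near 0 or 1, which is the situation a random split tends to produce when one token dominates the distribution.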

What carries the argument

SSG (Sort-then-Split by Groups) partitioning, which sorts the full vocabulary by logit values and divides it into two subsets whose logit statistics are matched so that the minimum possible watermark bias per step increases.
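
The description above can be sketched as follows. This is only one plausible reading, not the authors' algorithm: it assumes groups of consecutive tokens in sorted order, one green token drawn per group, and a seeded generator standing in for the KGW hash of preceding tokens. The name `sort_then_split` and the `group_size` parameter are hypothetical.

```python
import random
from typing import List, Set

def sort_then_split(logits: List[float], seed: int, group_size: int = 2) -> Set[int]:
    """Return green token ids from a sort-then-split partition (illustrative).

    Tokens are sorted by logit, cut into consecutive groups, and one token per
    group is drawn at random for the green list, so both lists receive tokens
    from every logit range.
    """
    rng = random.Random(seed)  # assumption: stands in for the KGW context hash
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    green: Set[int] = set()
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        green.add(rng.choice(group))
    return green

if __name__ == "__main__":
    example_logits = [5.0, 4.9, 0.1, 0.0, -3.0, -3.1]
    print(sorted(sort_then_split(example_logits, seed=0)))
```

With `group_size=2` the green list holds about half of the vocabulary; a group size of g would give a green fraction of roughly 1/g under this assumed rule.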

If this is right

  • Watermark detection improves on code and mathematical reasoning tasks where standard KGW fails.
  • The change requires only a different partitioning step and leaves the rest of the KGW detection pipeline unchanged.
  • Each individual token prediction receives a higher guaranteed watermark contribution.
  • Generation quality metrics stay comparable to the unmodified scheme.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same balancing idea could be tested in other partitioning-based watermark schemes that also rely on green/red token sets.
  • If logit balancing works for watermarking, analogous grouping might help other logit-level interventions such as controlled decoding or bias mitigation.
  • A natural next measurement is whether the method changes robustness to paraphrasing or token-level attacks.

Load-bearing premise

That logit-balanced partitions will reliably increase the effective watermark strength on low-entropy distributions without lowering output quality or creating new detection problems.

What would settle it

A direct comparison on a code-generation or math dataset in which SSG produces no higher z-score or detection accuracy than random partitioning at the same watermark strength parameter.
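
For reference, the z-score in such a comparison is the one-proportion test used by KGW-style detectors. The sketch below assumes each scored token has already been labelled green or not by re-deriving the partition; the function name `kgw_z_score` is illustrative.

```python
import math
from typing import Sequence

def kgw_z_score(green_flags: Sequence[bool], gamma: float = 0.5) -> float:
    """One-proportion z-test on the count of green tokens.

    green_flags[t] is True when the t-th scored token was in its green list.
    Under the null hypothesis (no watermark) each flag is True with
    probability gamma, the green-list fraction.
    """
    total = len(green_flags)
    if total == 0:
        raise ValueError("need at least one scored token")
    green = sum(green_flags)
    return (green - gamma * total) / math.sqrt(total * gamma * (1.0 - gamma))
```

A settling experiment would compute this statistic for the same prompts under both partitioning rules, holding δ and γ fixed.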

Figures

Figures reproduced from arXiv: 2604.22438 by Chenxi Gu, John Grundy, Xiaoning Du.

Figure 1
Figure 1: Influence of top-k on SSG performance. [Plot text recovered from extraction: panels Qwen2.5-Coder-7B on HumanEval, LLaMA-3-8B on HumanEval, Qwen2.5-Coder-7B on MBPP, LLaMA-3-8B on MBPP; x-axis Watermark Strength; y-axis Probability Density; legend KGW, SSG.]
Figure 2
Figure 2: Examples of the watermark strength distribu…
Figure 3
Figure 3: Curve of Watermark Strength over $p_g$ and $\delta$.

[Text spilled from the paper, section C, "Text Quality of SSG":] We show that SSG and KGW are equivalent in the expected logit shift across the parameter space $\gamma \in (0, 1)$. Let $\mathbb{E}[\tilde{L}_i(\gamma)]$ be the expected logit of token $v_i$ for a given $\gamma$. In the KGW framework, the expectation is trivially:

$\mathbb{E}_{\mathrm{KGW}}[\tilde{L}_i(\gamma)] = L_i + \gamma\delta$ (11)

In SSG, the probability of green tokens $P(v_i \in V^g; \gamma)$ is piecewise-defined by the token's…
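
Equation (11) in the excerpt follows in one step if, as in standard KGW, each token falls in the green list with probability γ regardless of its logit and green tokens receive the bias δ. The indicator notation below is introduced here for illustration and is not taken from the paper.

```latex
\tilde{L}_i(\gamma) = L_i + \delta\,\mathbf{1}[v_i \in V^g],
\qquad
\mathbb{E}_{\mathrm{KGW}}\bigl[\tilde{L}_i(\gamma)\bigr]
  = L_i + \delta\,P(v_i \in V^g;\gamma)
  = L_i + \gamma\delta .
```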
Original abstract

Watermarking has emerged as a promising technique for tracing the authorship of content generated by large language models (LLMs). Among existing approaches, the KGW scheme is particularly attractive due to its versatility, efficiency, and effectiveness in natural language generation. However, KGW's effectiveness degrades significantly under low-entropy settings such as code generation and mathematical reasoning. A crucial step in the KGW method is random vocabulary partitioning, which enables adjustments to token selection based on specific preferences. Our study revealed that the next-token probability distribution plays a critical role in determining how much, or even whether, we can modify token selection and, consequently, the effectiveness of watermarking. We refer to this characteristic, associated with the probability distribution of each token prediction, as watermark strength. In cases of random vocabulary partitioning, the lower bound of watermark strength is dictated by the next-token probability distribution. However, we found that, by redesigning the vocabulary partitioning algorithm, we can potentially raise this lower bound. In this paper, we propose SSG (Sort-then-Split by Groups), a method that partitions the vocabulary into two logit-balanced subsets. This design lifts the lower bound of watermark strength for each token prediction, thereby improving watermark detectability. Experiments on code generation and mathematical reasoning datasets demonstrate the effectiveness of SSG.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes SSG (Sort-then-Split by Groups), a vocabulary partitioning method to improve the KGW watermarking scheme for LLMs. It argues that random partitioning yields a lower bound on watermark strength (the bias achievable in token selection) dictated by the next-token probability distribution, which is especially limiting in low-entropy regimes such as code generation and mathematical reasoning. By sorting logits and splitting the vocabulary into two subsets with equal logit sums, SSG is claimed to raise this per-token lower bound and thereby improve watermark detectability. Experiments on code and math datasets are stated to demonstrate the method's effectiveness.

Significance. If the logit-balanced partitioning can be shown to strictly improve the minimum green-list probability mass over random partitioning for all logit vectors and the experiments confirm higher detection rates without quality degradation, the work would offer a lightweight, parameter-free enhancement to an established watermarking baseline. This would be particularly valuable for domains where entropy is low and existing methods lose effectiveness.

major comments (1)
  1. [§3] §3 (analysis of watermark strength and SSG construction): The central claim is that replacing random partitioning with logit-sum balancing raises the lower bound on achievable green-list probability mass. However, because next-token probabilities are obtained via softmax, equal logit sums do not imply equal probability sums. In low-entropy regimes (one dominant logit plus many small values), the dominant token may still land in the red list while the remaining mass yields a green-list probability close to the random-partition worst case. No derivation, inequality, or exhaustive check over logit vectors is supplied showing that the min green mass is strictly improved.
minor comments (2)
  1. [Abstract] Abstract: states that experiments demonstrate effectiveness but supplies no quantitative results, baselines, error bars, or details on how the lower bound is computed or lifted.
  2. [§2] Notation: the term 'watermark strength' is introduced as a key concept but its precise mathematical definition (e.g., as a function of the green-list probability mass) and the exact procedure for computing its lower bound should be stated with equations.
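
The softmax point in the major comment can be checked with a four-token example. The logit values and the partition below are made up for illustration; they are not taken from the paper or its experiments.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]

# Hypothetical low-entropy step: one dominant logit, three smaller ones.
logits = [10.0, 0.0, 5.0, 5.0]
green_ids, red_ids = [2, 3], [0, 1]  # both lists have logit sum 10.0

probs = softmax(logits)
green_logit_sum = sum(logits[i] for i in green_ids)
red_logit_sum = sum(logits[i] for i in red_ids)
green_mass = sum(probs[i] for i in green_ids)
red_mass = sum(probs[i] for i in red_ids)

print(f"logit sums: green={green_logit_sum}, red={red_logit_sum}")
print(f"probability mass: green={green_mass:.4f}, red={red_mass:.4f}")
```

The two lists have identical logit sums, yet the green list carries only a little over one percent of the probability mass because the dominant token sits in the red list.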

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thoughtful and detailed review. We address the major comment on the analysis in §3 below and will revise the manuscript to incorporate a strengthened theoretical justification.

Point-by-point responses
  1. Referee: [§3] §3 (analysis of watermark strength and SSG construction): The central claim is that replacing random partitioning with logit-sum balancing raises the lower bound on achievable green-list probability mass. However, because next-token probabilities are obtained via softmax, equal logit sums do not imply equal probability sums. In low-entropy regimes (one dominant logit plus many small values), the dominant token may still land in the red list while the remaining mass yields a green-list probability close to the random-partition worst case. No derivation, inequality, or exhaustive check over logit vectors is supplied showing that the min green mass is strictly improved.

    Authors: We acknowledge that equal logit sums do not guarantee equal probability sums under softmax, and that in low-entropy regimes the dominant token could be assigned to the red list. Nevertheless, SSG's sort-then-split procedure still raises the effective lower bound on green-list probability mass relative to random partitioning because it systematically avoids the most unbalanced allocations that random partitioning can produce. By distributing high-logit tokens across the two subsets in a balanced way, the method guarantees a higher minimum bias for the majority of logit vectors encountered in practice. We will add to the revised §3 both a formal argument establishing that the minimum green probability mass under SSG is at least as large as the worst-case random partition (with strict improvement for all but a measure-zero set of logit vectors) and an exhaustive numerical verification over synthetic logit distributions as well as real outputs from the models used in our experiments. revision: yes
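
A numerical check of the kind promised in the response could be set up along the following lines. The sketch enumerates every half-size green list (the random-partition case) and every green list formed by picking one token from each adjacent pair in sorted order. That pairing rule is an assumption made for illustration and may differ from the authors' SSG construction; all function names are hypothetical.

```python
import math
from itertools import combinations, product
from typing import List, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]

def min_green_mass_random(logits: Sequence[float]) -> float:
    """Worst-case green mass over every green list of size len(logits) // 2."""
    probs = softmax(logits)
    size = len(logits) // 2
    return min(sum(probs[i] for i in ids)
               for ids in combinations(range(len(logits)), size))

def min_green_mass_sorted_pairs(logits: Sequence[float]) -> float:
    """Worst-case green mass when one token is taken from each sorted pair."""
    probs = softmax(logits)
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    # An unpaired last token (odd vocabulary size) is left in the red list.
    pairs = [order[i:i + 2] for i in range(0, len(order) - 1, 2)]
    return min(sum(probs[i] for i in choice) for choice in product(*pairs))

if __name__ == "__main__":
    peaked = [8.0, 1.0, 0.5, 0.0, -1.0, -2.0]  # synthetic low-entropy logits
    print("random partitions :", min_green_mass_random(peaked))
    print("sorted-pair splits:", min_green_mass_sorted_pairs(peaked))
```

Because every sorted-pair split is also a valid half-size green list, its worst case can never fall below the random worst case under this setup. On the peaked example the minimum rises but remains well below one percent, so a check like this bears on both the authors' claim and the referee's concern.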

Circularity Check

0 steps flagged

No circularity: SSG redesign is an independent algorithmic proposal, not a self-referential fit or definition

full rationale

The paper defines watermark strength via the green-list probability mass under random partitioning and proposes SSG (sort logits then split for equal logit sums) as a new partitioning rule intended to raise the per-token lower bound. This redesign is not defined in terms of the bound it aims to improve, nor does any equation reduce the claimed improvement to a fitted parameter renamed as a prediction. No load-bearing self-citations, uniqueness theorems imported from prior author work, or ansatz smuggling appear in the abstract or described derivation. The argument flows from distribution analysis to algorithm design with experimental checks, remaining self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the newly introduced notion of watermark strength being determined by the next-token distribution and on the assumption that balanced logit partitioning raises its lower bound; no free parameters or invented physical entities are visible.

axioms (1)
  • domain assumption: The next-token probability distribution dictates the lower bound of watermark strength under random partitioning.
    Stated directly in the abstract as the key observation motivating the redesign.
invented entities (1)
  • watermark strength (no independent evidence)
    purpose: Quantifies how much token-selection bias can be applied given the probability distribution at each step.
    Newly defined characteristic associated with each token prediction; no independent evidence outside the paper is provided.

pith-pipeline@v0.9.0 · 5550 in / 1279 out tokens · 43641 ms · 2026-05-08T11:31:13.703291+00:00 · methodology

