SSG: Logit-Balanced Vocabulary Partitioning for LLM Watermarking
Pith reviewed 2026-05-08 11:31 UTC · model grok-4.3
The pith
Redesigning vocabulary partitioning into logit-balanced groups raises the minimum watermark strength for each LLM token prediction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under random vocabulary partitioning, the lower bound of watermark strength is fixed by the next-token probability distribution, but sorting tokens by logits and splitting them into two balanced subsets raises this lower bound for each prediction and thereby improves detectability of the KGW watermark.
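The review does not reproduce the paper's equations, so the worked bound below rests on an assumed reading of SSG: green fraction γ = 1/2, an even vocabulary size V, and tokens sorted by probability (equivalently by logit, since softmax preserves order) and split in adjacent pairs, one green and one red. It contrasts the worst case over all balanced random partitions with the worst case under that pairing rule. Both quantities shrink toward zero as p_(1) approaches 1, which is the low-entropy regime raised in the referee report.

```latex
% Assumptions: \gamma = 1/2, V even, p_{(1)} \ge p_{(2)} \ge \dots \ge p_{(V)}
P_G = \sum_{i \in G} p_i, \qquad |G| = V/2

% Worst balanced random partition: the V/2 least likely tokens are green
\min_{|G| = V/2} P_G = \sum_{k = V/2 + 1}^{V} p_{(k)}

% Hypothetical pairing rule: exactly one of p_{(2j-1)}, p_{(2j)} is green for each j
P_G \;\ge\; \sum_{j=1}^{V/2} p_{(2j)} \;\ge\; \sum_{j=1}^{V/2 - 1} p_{(2j+1)}
      = 1 - p_{(1)} - \sum_{j=1}^{V/2} p_{(2j)}
\quad\Longrightarrow\quad
P_G \;\ge\; \frac{1 - p_{(1)}}{2}
```

For example, with p = (0.5, 0.3, 0.1, 0.1) the worst balanced random partition leaves only 0.2 of the mass in green, while the pairing rule guarantees at least (1 - 0.5)/2 = 0.25 and in fact never drops below 0.3 + 0.1 = 0.4.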
What carries the argument
SSG (Sort-then-Split by Groups) partitioning, which sorts the full vocabulary by logit values and divides it into two subsets whose logit statistics are matched so that the minimum possible watermark bias per step increases.
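A minimal Python sketch of one way a sort-then-split partition could work. The grouping rule is an assumption, not the paper's confirmed algorithm: tokens are sorted by logit, grouped into consecutive pairs, and a coin flip seeded by a secret key and the previous token decides which member of each pair is green. The function name `sort_then_split` and the `key` parameter are hypothetical.

```python
import hashlib
import random
from typing import Sequence, Set, Tuple


def sort_then_split(
    logits: Sequence[float], prev_token: int, key: int = 15485863
) -> Tuple[Set[int], Set[int]]:
    """Hypothetical sort-then-split partition into (green, red) sets of token ids.

    Assumption: tokens are sorted by logit, grouped into consecutive pairs, and a
    coin flip seeded by (key, prev_token) decides which member of each pair is green.
    """
    # Derive a reproducible seed from a secret key and the previous token,
    # mirroring how KGW seeds its random split on the preceding context.
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))

    # Token ids ordered from highest to lowest logit.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)

    green: Set[int] = set()
    red: Set[int] = set()
    for j in range(0, len(order) - 1, 2):
        first, second = order[j], order[j + 1]
        # Randomly decide which token of this adjacent pair goes to green.
        if rng.random() < 0.5:
            first, second = second, first
        green.add(first)
        red.add(second)

    # With an odd vocabulary size the lowest-logit token is left unpaired; send it to red.
    if len(order) % 2 == 1:
        red.add(order[-1])
    return green, red
```

Because neighbours in logit order always land on opposite sides, a run of top-ranked tokens can never fall entirely in red. The sketch leaves open how a detector would rebuild the same ordering, since it would need the same logits.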
If this is right
- Watermark detection improves on code and mathematical reasoning tasks, where standard KGW degrades significantly.
- The change requires only a different partitioning step and leaves the rest of the KGW detection pipeline unchanged.
- Each individual token prediction receives a higher guaranteed watermark contribution.
- Generation quality metrics stay comparable to the unmodified scheme.
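To illustrate why only the partitioning step needs to change, here is a small Python sketch of the KGW soft-watermark step: a bias δ is added to every green-list logit, and the green set is simply an input. The helper names `softmax`, `kgw_bias`, and `green_mass` are hypothetical, and the toy logits are arbitrary.

```python
import math
from typing import List, Sequence, Set


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kgw_bias(logits: Sequence[float], green: Set[int], delta: float) -> List[float]:
    """KGW soft watermark step: add delta to every green-list logit.

    The green set is an input, so swapping random partitioning for another rule
    does not change this function.
    """
    return [x + delta if i in green else x for i, x in enumerate(logits)]


def green_mass(logits: Sequence[float], green: Set[int]) -> float:
    """Total softmax probability assigned to the green list."""
    probs = softmax(logits)
    return sum(probs[i] for i in green)


logits = [5.0, 1.0, 0.5, 0.0]
green = {1, 3}
before = green_mass(logits, green)
after = green_mass(kgw_bias(logits, green, delta=2.0), green)
print(f"green mass before bias: {before:.4f}, after bias: {after:.4f}")
```

Here the dominant token (index 0) sits in red, so even after the bias the green mass stays small; this is the situation a better partition is meant to avoid.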
Where Pith is reading between the lines
- The same balancing idea could be tested in other partitioning-based watermark schemes that also rely on green/red token sets.
- If logit balancing works for watermarking, analogous grouping might help other logit-level interventions such as controlled decoding or bias mitigation.
- A natural next measurement is whether the method changes robustness to paraphrasing or token-level attacks.
Load-bearing premise
That logit-balanced partitions will reliably increase the effective watermark strength on low-entropy distributions without lowering output quality or creating new detection problems.
What would settle it
A direct comparison on a code-generation or math dataset in which SSG produces no higher z-score or detection accuracy than random partitioning at the same watermark strength parameter.
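The comparison above would score both schemes with the standard KGW one-proportion z-test, where T is the number of scored tokens, |s|_G the number that fall in their green list, and γ the green-list fraction. A minimal Python version (the function name `kgw_z_score` is hypothetical):

```python
import math


def kgw_z_score(num_green: int, num_tokens: int, gamma: float = 0.5) -> float:
    """KGW detection statistic: z = (|s|_G - gamma * T) / sqrt(T * gamma * (1 - gamma)).

    num_green is the count of scored tokens that fall in their green list,
    num_tokens is T, and gamma is the green-list fraction.
    """
    if num_tokens <= 0:
        raise ValueError("need at least one scored token")
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    expected = gamma * num_tokens
    std = math.sqrt(num_tokens * gamma * (1.0 - gamma))
    return (num_green - expected) / std


# 75 green tokens out of 100 with gamma = 0.5: (75 - 50) / 5
print(kgw_z_score(75, 100))  # prints 5.0
```

Holding δ and γ fixed, SSG and random partitioning can then be compared on the z-scores (or on detection accuracy at a fixed z threshold) of the same prompts.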
Original abstract
Watermarking has emerged as a promising technique for tracing the authorship of content generated by large language models (LLMs). Among existing approaches, the KGW scheme is particularly attractive due to its versatility, efficiency, and effectiveness in natural language generation. However, KGW's effectiveness degrades significantly under low-entropy settings such as code generation and mathematical reasoning. A crucial step in the KGW method is random vocabulary partitioning, which enables adjustments to token selection based on specific preferences. Our study revealed that the next-token probability distribution plays a critical role in determining how much, or even whether, we can modify token selection and, consequently, the effectiveness of watermarking. We refer to this characteristic, associated with the probability distribution of each token prediction, as watermark strength. In cases of random vocabulary partitioning, the lower bound of watermark strength is dictated by the next-token probability distribution. However, we found that, by redesigning the vocabulary partitioning algorithm, we can potentially raise this lower bound. In this paper, we propose SSG (Sort-then-Split by Groups), a method that partitions the vocabulary into two logit-balanced subsets. This design lifts the lower bound of watermark strength for each token prediction, thereby improving watermark detectability. Experiments on code generation and mathematical reasoning datasets demonstrate the effectiveness of SSG.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes SSG (Sort-then-Split by Groups), a vocabulary partitioning method to improve the KGW watermarking scheme for LLMs. It argues that random partitioning yields a lower bound on watermark strength (the bias achievable in token selection) dictated by the next-token probability distribution, which is especially limiting in low-entropy regimes such as code generation and mathematical reasoning. By sorting logits and splitting the vocabulary into two subsets with equal logit sums, SSG is claimed to raise this per-token lower bound and thereby improve watermark detectability. Experiments on code and math datasets are stated to demonstrate the method's effectiveness.
Significance. If the logit-balanced partitioning can be shown to strictly improve the minimum green-list probability mass over random partitioning for all logit vectors and the experiments confirm higher detection rates without quality degradation, the work would offer a lightweight, parameter-free enhancement to an established watermarking baseline. This would be particularly valuable for domains where entropy is low and existing methods lose effectiveness.
major comments (1)
- [§3] §3 (analysis of watermark strength and SSG construction): The central claim is that replacing random partitioning with logit-sum balancing raises the lower bound on achievable green-list probability mass. However, because next-token probabilities are obtained via softmax, equal logit sums do not imply equal probability sums. In low-entropy regimes (one dominant logit plus many small values), the dominant token may still land in the red list while the remaining mass yields a green-list probability close to the random-partition worst case. No derivation, inequality, or exhaustive check over logit vectors is supplied showing that the min green mass is strictly improved.
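The referee's point that equal logit sums need not give equal probability mass can be checked on a toy four-token vocabulary. In this Python sketch (the helper name `subset_prob_mass` and the logits are illustrative), subsets A and B both have a logit sum of 10, yet softmax assigns almost all probability to A because of its single dominant logit.

```python
import math


def subset_prob_mass(logits, subset):
    """Softmax probability mass that the full vocabulary assigns to `subset` (a set of indices)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return sum(exps[i] for i in subset) / total


# Four-token toy vocabulary: subset A = {0, 1}, subset B = {2, 3}.
logits = [10.0, 0.0, 5.0, 5.0]
A, B = {0, 1}, {2, 3}
assert sum(logits[i] for i in A) == sum(logits[i] for i in B)  # equal logit sums (10 vs 10)
mass_a = subset_prob_mass(logits, A)
mass_b = subset_prob_mass(logits, B)
print(f"P(A) = {mass_a:.4f}, P(B) = {mass_b:.4f}")  # prints P(A) = 0.9867, P(B) = 0.0133
```

If B were the green list, almost none of the next-token mass would be available for the watermark to shift, even though the split is perfectly balanced in logit sum.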
minor comments (2)
- [Abstract] Abstract: states that experiments demonstrate effectiveness but supplies no quantitative results, baselines, error bars, or details on how the lower bound is computed or lifted.
- [§2] Notation: the term 'watermark strength' is introduced as a key concept but its precise mathematical definition (e.g., as a function of the green-list probability mass) and the exact procedure for computing its lower bound should be stated with equations.
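One way to write down the definition requested above, offered only as an assumption because the review does not quote the paper's equations: let P_G be the green-list mass under the unbiased distribution, apply the KGW bias δ to green logits, and measure strength as the resulting gain in green mass.

```latex
% p = \mathrm{softmax}(\ell), green list G, red list R = V \setminus G, bias \delta > 0
P_G = \sum_{i \in G} p_i

% KGW adds \delta to each green logit, so after renormalisation
\hat{p}_i = \frac{p_i \, e^{\delta \mathbb{1}[i \in G]}}{e^{\delta} P_G + (1 - P_G)},
\qquad
\hat{P}_G = \frac{e^{\delta} P_G}{e^{\delta} P_G + 1 - P_G}

% One possible strength measure (hypothetical): the gain in green mass
S(P_G) = \hat{P}_G - P_G
       = \frac{(e^{\delta} - 1)\, P_G \,(1 - P_G)}{e^{\delta} P_G + 1 - P_G}
```

Under this reading S vanishes as P_G approaches 0, so a lower bound on strength for each prediction reduces to a lower bound on the worst-case green mass over the partitions a scheme can produce.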
Simulated Author's Rebuttal
We thank the referee for their thoughtful and detailed review. We address the major comment on the analysis in §3 below and will revise the manuscript to incorporate a strengthened theoretical justification.
- Referee: [§3] §3 (analysis of watermark strength and SSG construction): The central claim is that replacing random partitioning with logit-sum balancing raises the lower bound on achievable green-list probability mass. However, because next-token probabilities are obtained via softmax, equal logit sums do not imply equal probability sums. In low-entropy regimes (one dominant logit plus many small values), the dominant token may still land in the red list while the remaining mass yields a green-list probability close to the random-partition worst case. No derivation, inequality, or exhaustive check over logit vectors is supplied showing that the min green mass is strictly improved.
  Authors: We acknowledge that equal logit sums do not guarantee equal probability sums under softmax, and that in low-entropy regimes the dominant token could be assigned to the red list. Nevertheless, SSG's sort-then-split procedure still raises the effective lower bound on green-list probability mass relative to random partitioning because it systematically avoids the most unbalanced allocations that random partitioning can produce. By distributing high-logit tokens across the two subsets in a balanced way, the method guarantees a higher minimum bias for the majority of logit vectors encountered in practice. We will add to the revised §3 both a formal argument establishing that the minimum green probability mass under SSG is at least as large as the worst-case random partition (with strict improvement for all but a measure-zero set of logit vectors) and an exhaustive numerical verification over synthetic logit distributions as well as real outputs from the models used in our experiments.
  Revision: yes
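The promised numerical verification could take roughly the following shape. This Python sketch assumes SSG splits each adjacent pair in sorted order (a hypothetical reading), uses only synthetic low-entropy distributions with an assumed shape, and does not reproduce the authors' check or any real model outputs. Function names are hypothetical.

```python
import itertools
import random
from typing import List, Sequence


def worst_random_green_mass(probs: Sequence[float]) -> float:
    """Smallest green mass over every split that puts exactly half the tokens in green."""
    n = len(probs)
    if n % 2:
        raise ValueError("use an even vocabulary size")
    return min(sum(probs[i] for i in g) for g in itertools.combinations(range(n), n // 2))


def worst_paired_green_mass(probs: Sequence[float]) -> float:
    """Smallest green mass when each adjacent pair in sorted order is split across lists.

    The worst case sends the less likely member of every pair to green. Sorting by
    probability gives the same order as sorting by logit, because softmax is monotone.
    """
    if len(probs) % 2:
        raise ValueError("use an even vocabulary size")
    s = sorted(probs, reverse=True)
    return sum(s[j + 1] for j in range(0, len(s), 2))


def low_entropy_probs(n: int, rng: random.Random) -> List[float]:
    """Synthetic distribution: one dominant token plus a random tail (assumed shape)."""
    weights = [rng.expovariate(1.0) for _ in range(n)]
    weights[0] += rng.uniform(5.0, 50.0)
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
for _ in range(200):
    p = low_entropy_probs(8, rng)
    rand_worst = worst_random_green_mass(p)
    pair_worst = worst_paired_green_mass(p)
    # Paired splits are a subset of balanced splits, so their minimum cannot be lower.
    assert pair_worst >= rand_worst - 1e-12
    # The pairing rule also keeps at least half of the non-top mass in green.
    assert pair_worst >= (1.0 - max(p)) / 2 - 1e-12
```

Both assertions hold for every distribution by construction; the check is informative mainly because it shows how small the guaranteed (1 - max p)/2 becomes when one token dominates, which is the referee's concern.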
Circularity Check
No circularity: SSG redesign is an independent algorithmic proposal, not a self-referential fit or definition
The paper defines watermark strength via the green-list probability mass under random partitioning and proposes SSG (sort logits then split for equal logit sums) as a new partitioning rule intended to raise the per-token lower bound. This redesign is not defined in terms of the bound it aims to improve, nor does any equation reduce the claimed improvement to a fitted parameter renamed as a prediction. No load-bearing self-citations, uniqueness theorems imported from prior author work, or ansatz smuggling appear in the abstract or described derivation. The argument flows from distribution analysis to algorithm design with experimental checks, remaining self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The next-token probability distribution dictates the lower bound of watermark strength under random partitioning.
invented entities (1)
- watermark strength (no independent evidence)