Towards Better Statistical Understanding of Watermarking LLMs
Pith reviewed 2026-05-24 02:33 UTC · model grok-4.3
The pith
An online dual gradient ascent algorithm achieves asymptotic Pareto optimality for the distortion-detection trade-off in red-green list LLM watermarking.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By casting red-green list watermarking as a constrained optimization problem, the paper obtains a clean analytical characterization of the optimal token-probability adjustments. An online dual gradient ascent algorithm derived from this formulation is proved asymptotically Pareto optimal between model distortion and detection ability, which yields an explicit guarantee of an increased average green-list probability, in contrast to previous results.
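One plausible way to write down the optimization described above is sketched here. The symbols are assumptions for illustration and may differ from the paper's notation: p_t is the model's next-token distribution at step t, q_t the watermarked distribution, G_t the green list, Δ the distortion budget, and λ > 0 a single dual variable.

```latex
% Maximize average green-list probability under an average KL budget.
\begin{aligned}
\max_{q_1,\dots,q_T}\quad & \frac{1}{T}\sum_{t=1}^{T}\sum_{k\in G_t} q_t(k) \\
\text{s.t.}\quad & \frac{1}{T}\sum_{t=1}^{T} D_{\mathrm{KL}}(q_t \,\|\, p_t) \le \Delta .
\end{aligned}

% Lagrangian with dual variable \lambda > 0:
\mathcal{L}(q,\lambda)
  = \frac{1}{T}\sum_{t=1}^{T}\Big[\sum_{k\in G_t} q_t(k)
      - \lambda\, D_{\mathrm{KL}}(q_t \,\|\, p_t)\Big] + \lambda\Delta .

% Maximizing over each q_t (subject to normalization) gives an exponential tilt:
q_t^{\lambda}(k)
  = \frac{p_t(k)\,\exp\!\big(\mathbf{1}\{k\in G_t\}/\lambda\big)}
         {\sum_{j} p_t(j)\,\exp\!\big(\mathbf{1}\{j\in G_t\}/\lambda\big)} .
```

Under this sketch, the optimal adjustment for a given λ amounts to adding the constant 1/λ to the logits of green-list tokens, so the dual variable plays the role of the watermark strength.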
What carries the argument
The online dual gradient ascent watermarking algorithm that solves the constrained optimization of green-list probability subject to a distortion budget via dual variables updated on the fly.
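A minimal sketch of such an online dual update follows, under stated assumptions: distortion is the per-step KL divergence, the per-step adjustment is an exponential tilt toward green tokens with strength 1/lam, and the step size, the floor `lam_min`, and the synthetic token distributions are all hypothetical choices. The paper's exact update rule and sign convention may differ.

```python
import numpy as np

def tilt_toward_green(p, green, lam):
    """Return q(k) proportional to p(k) * exp(green[k] / lam)."""
    logits = np.log(p) + green / lam
    logits -= logits.max()  # subtract the max for numerical stability
    q = np.exp(logits)
    return q / q.sum()

def kl_divergence(q, p):
    """KL(q || p) for strictly positive distributions."""
    return float(np.sum(q * (np.log(q) - np.log(p))))

def online_dual_watermark(dists, greens, budget, lam0=1.0, step=0.5, lam_min=0.05):
    """Tilt each distribution, then update the dual variable from the observed KL."""
    lam = lam0
    kls, green_probs = [], []
    for p, green in zip(dists, greens):
        q = tilt_toward_green(p, green, lam)
        d = kl_divergence(q, p)
        kls.append(d)
        green_probs.append(float(q @ green))
        # Raise lam (weaker tilt) when distortion exceeds the budget, lower it otherwise.
        lam = max(lam_min, lam + step * (d - budget))
    return np.array(kls), np.array(green_probs)

def synthetic_stream(num_steps, vocab_size, seed=0):
    """Hypothetical stand-in for LLM next-token distributions and green lists."""
    rng = np.random.default_rng(seed)
    dists, greens = [], []
    for _ in range(num_steps):
        logits = 2.0 * rng.standard_normal(vocab_size)
        p = np.exp(logits - logits.max())
        dists.append(p / p.sum())
        green = np.zeros(vocab_size)
        green[rng.permutation(vocab_size)[: vocab_size // 2]] = 1.0  # half the vocabulary
        greens.append(green)
    return dists, greens

if __name__ == "__main__":
    dists, greens = synthetic_stream(2000, 50)
    kls, green_probs = online_dual_watermark(dists, greens, budget=0.1)
    baseline = np.mean([p @ g for p, g in zip(dists, greens)])
    print(f"mean KL={kls.mean():.4f}  mean green prob={green_probs.mean():.4f}  "
          f"unwatermarked green mass={baseline:.4f}")
```

Running the script prints the time-averaged KL next to the budget and the average green-list probability next to the unwatermarked green mass. Replacing `synthetic_stream` with distributions logged from a real model would turn this into a rough empirical check of the averaged guarantee.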
If this is right
- The algorithm produces an explicit increase in average green-list token probability, improving detection rates for any fixed distortion level.
- The optimal solution's analytical property supplies a principled way to choose the green-list size and bias at each step.
- KL divergence is defended as the distortion metric that correctly captures the statistical change induced by watermarking.
- The existing criteria of distortion-free watermarking and perplexity are argued to have issues as measures of distortion for evaluating the trade-off.
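The link between green-list probability and detection rate in the first bullet can be illustrated with the usual one-sided z-test on the green-token count. This is a sketch under assumptions not taken from this page: `gamma` is the green-list fraction under the null hypothesis, tokens are treated as independent with a common green probability `q_green`, the normal approximation is used, and `z_threshold=4.0` is an arbitrary example value.

```python
import math

def green_z_score(num_green, num_tokens, gamma):
    """z-statistic for H0: each token is green independently with probability gamma."""
    return (num_green - gamma * num_tokens) / math.sqrt(num_tokens * gamma * (1 - gamma))

def detection_power(q_green, num_tokens, gamma, z_threshold=4.0):
    """Normal approximation of P(z > z_threshold) when tokens are green with prob q_green."""
    mean = num_tokens * q_green
    std = math.sqrt(num_tokens * q_green * (1 - q_green))
    # Green-token count at which the z-statistic reaches the threshold.
    cutoff = gamma * num_tokens + z_threshold * math.sqrt(num_tokens * gamma * (1 - gamma))
    # Upper tail of a normal distribution, written with the complementary error function.
    return 0.5 * math.erfc((cutoff - mean) / (std * math.sqrt(2)))

if __name__ == "__main__":
    for q in (0.5, 0.6, 0.7):
        print(q, detection_power(q, num_tokens=200, gamma=0.5))
```

In this simplified model, the approximate power at a fixed text length grows with the average green-list probability, which is why a guaranteed increase in that probability translates into a detection guarantee.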
Where Pith is reading between the lines
- The same optimization framing could be applied to watermarking methods that do not rely on fixed red-green lists.
- Real-time deployment of LLMs could use the dual variables as tunable knobs to meet application-specific detection thresholds.
- Standardized reporting of KL-based distortion alongside detection metrics might become a common evaluation practice.
Load-bearing premise
The red-green list scheme is treated as the fixed underlying watermarking method whose distortion-detection frontier can be optimized, and the dual ascent procedure converges under standard regularity conditions on the LLM's token distributions.
What would settle it
Apply the dual ascent updates to a long sequence of real LLM token distributions and check whether the time-averaged green-list probability rises while the chosen distortion measure stays within the prescribed bound; if it does not, the asymptotic optimality claim fails.
Original abstract
In this paper, we study the problem of watermarking large language models (LLMs). We consider the trade-off between model distortion and detection ability and formulate it as a constrained optimization problem based on the red-green list watermarking algorithm. We show that the optimal solution to the optimization problem enjoys a nice analytical property which provides a better understanding and inspires the algorithm design for the watermarking process. We develop an online dual gradient ascent watermarking algorithm in light of this optimization formulation and prove its asymptotic Pareto optimality between model distortion and detection ability. Such a result guarantees an averaged increased green list probability and henceforth detection ability explicitly (in contrast to previous results). Moreover, we provide a systematic discussion on the choice of the model distortion metrics for the watermarking problem. We justify our choice of KL divergence and present issues with the existing criteria of "distortion-free" and perplexity. Finally, we empirically evaluate our algorithms on extensive datasets against benchmark algorithms.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper formulates LLM watermarking via red-green lists as a constrained optimization problem trading off model distortion (KL divergence) against detection power (green-list probability). It derives an analytical property of the optimum, introduces an online dual gradient ascent algorithm, and proves asymptotic Pareto optimality of the algorithm, which is claimed to deliver an explicit guarantee of increased average green-list probability (unlike prior work). The manuscript also justifies KL as the distortion metric over alternatives such as 'distortion-free' or perplexity and reports empirical comparisons on multiple datasets.
Significance. If the asymptotic Pareto-optimality result holds under its stated conditions, the work supplies a theoretically grounded algorithm with explicit (rather than implicit) detection guarantees, which would be a meaningful advance for statistical understanding of watermarking. The analytical characterization of the optimum and the focused discussion of distortion metrics are additional strengths that could inform future designs.
major comments (2)
- [proof of asymptotic Pareto optimality] Proof of asymptotic Pareto optimality: the convergence argument relies on standard dual-ascent conditions (Lagrangian convexity, bounded subgradients, and ergodicity/stationarity of the token-distribution sequence p_t). LLM next-token distributions are context-dependent and non-stationary, with possible support changes across a generation; no argument is given that these conditions continue to hold or that the guarantee survives their violation. This assumption is load-bearing for the central claim of an explicit, averaged increase in green-list probability.
- [constrained optimization formulation] Optimization formulation and base method: the red-green list scheme is fixed as the underlying mechanism whose distortion-detection frontier is optimized. It is unclear whether the derived analytical property and the online algorithm remain valid if the base watermarking procedure itself is altered or if the token distributions deviate from the assumed sequence; this affects whether the Pareto result is general or specific to the chosen base.
minor comments (2)
- The abstract states that experiments were run on 'extensive datasets' but does not name them or report basic statistics (sequence length, number of prompts, variance across runs); adding this information would improve reproducibility.
- Notation for the dual variables and the green-list probability update rule could be introduced earlier and used consistently to ease reading of the algorithm and proof.
Simulated Author's Rebuttal
We thank the referee for the careful reading and insightful comments on our manuscript. We address each major comment below in a point-by-point manner.
- Referee: Proof of asymptotic Pareto optimality: the convergence argument relies on standard dual-ascent conditions (Lagrangian convexity, bounded subgradients, and ergodicity/stationarity of the token-distribution sequence p_t). LLM next-token distributions are context-dependent and non-stationary, with possible support changes across a generation; no argument is given that these conditions continue to hold or that the guarantee survives their violation. This assumption is load-bearing for the central claim of an explicit, averaged increase in green-list probability.
  Authors: We appreciate the referee pointing out the reliance on these standard conditions for dual-ascent convergence. The proof establishes asymptotic Pareto optimality under the assumptions of Lagrangian convexity, bounded subgradients, and ergodicity/stationarity of the sequence {p_t}, which are explicitly stated in the analysis. While we acknowledge that LLM next-token distributions are context-dependent and the strict stationarity assumption may be violated in practice due to varying contexts and support changes, the result still supplies an explicit (rather than implicit) guarantee under the stated conditions. This constitutes a theoretical advance relative to prior work. In revision we will add a paragraph discussing the scope of the assumptions and their practical relevance to LLM generation.
  Revision: partial
- Referee: Optimization formulation and base method: the red-green list scheme is fixed as the underlying mechanism whose distortion-detection frontier is optimized. It is unclear whether the derived analytical property and the online algorithm remain valid if the base watermarking procedure itself is altered or if the token distributions deviate from the assumed sequence; this affects whether the Pareto result is general or specific to the chosen base.
  Authors: The constrained optimization formulation, the analytical property of the optimum, and the online dual gradient ascent algorithm are all developed specifically for the red-green list watermarking scheme, as stated in the abstract and introduction. The Pareto-optimality guarantee is therefore tied to this base mechanism and the associated token-distribution sequence. We view this focus as appropriate, since red-green lists represent a standard and practical watermarking approach; the paper does not claim generality beyond this setting. Generalization to other watermarking procedures is left as future work.
  Revision: no
Circularity Check
No significant circularity in the derivation chain
The paper formulates the watermarking trade-off as a constrained optimization problem based on the red-green list algorithm, identifies an analytical property of the optimum, and develops an online dual gradient ascent procedure whose asymptotic Pareto optimality is proved under standard dual-ascent conditions (convexity, bounded subgradients, ergodicity). These steps rely on external convergence theory rather than self-definitions, fitted parameters renamed as predictions, or load-bearing self-citations; the claimed explicit increase in green-list probability follows from the convergence result, not by construction from the inputs. The derivation is therefore self-contained against external benchmarks.
Forward citations
Cited by 1 Pith paper
- PASA: A Principled Embedding-Space Watermarking Approach for LLM-Generated Text under Semantic-Invariant Attacks
PASA is a semantic-level watermarking method for LLM text that uses embedding-space clusters and synchronized randomness to remain detectable after paraphrasing while preserving text quality.