Can we Watermark Low-Entropy LLM Outputs?
Pith reviewed 2026-05-10 15:09 UTC · model grok-4.3
The pith
Under cryptographic assumptions, watermarking schemes exist for LLM outputs with only constant per-token entropy, and these schemes remain robust to random substitutions and deletions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors construct watermarking schemes for the constant per-token entropy regime. One scheme is robust against random substitutions assuming subexponential LPN. The second is robust against random substitutions and random deletions either under the heuristic that LLM outputs introduce only random errors or given a pseudorandom error-correcting code that tolerates adversarial substitutions and random deletions.
What carries the argument
A watermarking construction that encodes a mark into constant-entropy sequences so that the mark survives random noise while remaining computationally undetectable, relying on either LPN hardness or pseudorandom error-correcting codes.
If this is right
- Watermarking becomes applicable to repetitive or predictable LLM outputs that were previously excluded.
- The embedded mark does not change the probability distribution of the generated text.
- Robustness holds specifically against random rather than fully adversarial edits.
- The same techniques extend prior high-entropy watermarking results to the constant-entropy setting.
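To make the "random rather than fully adversarial edits" model concrete, here is a minimal sketch of a channel that deletes and substitutes tokens independently at random. The function name `random_edit_channel`, the independent per-token noise, and the uniform replacement rule are illustrative assumptions; the paper's exact noise model may differ.

```python
import random

def random_edit_channel(tokens, alphabet, sub_rate, del_rate, rng):
    """Apply independent random deletions and substitutions to a token list.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    out = []
    for tok in tokens:
        # Drop the token with probability del_rate.
        if rng.random() < del_rate:
            continue
        # Otherwise replace it with a uniformly random alphabet symbol
        # with probability sub_rate (the draw may equal the original).
        if rng.random() < sub_rate:
            tok = rng.choice(alphabet)
        out.append(tok)
    return out

# Example: corrupt a short, repetitive sequence with 10% substitutions
# and 10% deletions.
rng = random.Random(0)
noisy = random_edit_channel(list("abababab"), ["a", "b", "c"], 0.1, 0.1, rng)
print(noisy)
```

An adversarial editor, by contrast, could choose which positions to change after inspecting the text, which is the setting the constructions summarized here do not claim to cover.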
Where Pith is reading between the lines
- The constructions indicate that standard cryptographic primitives can be repurposed to handle the statistical properties of real LLM text.
- Relaxing the random-edit model to adversarial edits would require new code constructions that tolerate both substitution and deletion patterns simultaneously.
- Empirical checks on actual model outputs could test whether the random-error heuristic holds in practice.
- If deployed, such schemes could support origin verification for generated text in domains where low entropy is common.
Load-bearing premise
The per-token entropy stays constant, and either the subexponential LPN assumption holds or the LLM outputs introduce only random errors.
What would settle it
An explicit low-entropy output sequence, together with a small number of random substitutions, such that the substitutions remove the embedded mark while the edited text remains indistinguishable from an unwatermarked sample generated by the same model.
Original abstract
A recent and exciting thread of work focuses on developing methods for watermarking the output of large language models (LLMs). We focus on provably undetectable watermarking, that is, schemes that do not alter the output distribution of the LLM, yet enable embedding a watermark in the output that identifies the output as having been generated by the particular LLM. Furthermore, the watermark should be hard to remove by an adversary that may potentially edit, insert, or delete tokens from the watermarked output. Indeed, recent work (Christ et al. [COLT'24], Christ et al. [CRYPTO'24], Golowich et al. [NeurIPS'24]) shows how to develop such schemes that are robust against a constant fraction of substitutions, or even against a constant fraction of arbitrary edits. These works, however, make strong assumptions on the entropy present in the output of the LLM. Most notably, they all require constant entropy rate, that is, a constant fraction of the tokens in a sufficiently long substring of the output need to have empirical entropy at least O(log |T|), where T is the alphabet of tokens, and Golowich et al. additionally require T to be larger than the security parameter. In this work, we consider whether we can also watermark the outputs of LLMs when the per-token entropy is just a constant, discarding the dependence on the alphabet size or security parameter. In this regime, we construct:
- A watermarking scheme robust against random substitutions (assuming subexponential LPN, as in Christ et al. [CRYPTO'24]).
- A watermarking scheme robust against random substitutions and random deletions, given either the additional heuristic assumption that the output of the LLM only introduces random errors (analogous to the assumption made by Christ et al. [CRYPTO'24]) or a construction of a pseudorandom error-correcting code robust to adversarial substitutions and random deletions.
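The gap between entropy on the order of log |T| and entropy that is "just a constant" can be illustrated numerically. The sketch below uses the Shannon entropy of a next-token distribution as a stand-in; the abstract speaks of empirical entropy, whose formal definition may differ. The helper names and the two toy distributions are assumptions made for illustration.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits of a probability vector (zero entries skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def uniform_distribution(vocab_size):
    """High-entropy case: every token is equally likely."""
    return [1.0 / vocab_size] * vocab_size

def two_token_distribution(vocab_size):
    """Low-entropy case: all mass is split evenly between two tokens."""
    return [0.5, 0.5] + [0.0] * (vocab_size - 2)

for vocab_size in (1024, 65536):
    high = shannon_entropy(uniform_distribution(vocab_size))
    low = shannon_entropy(two_token_distribution(vocab_size))
    # high equals log2(vocab_size); low is 1 bit regardless of vocab_size.
    print(vocab_size, high, low)
```

In the uniform case the entropy grows with the alphabet, while in the two-token case it stays at one bit however large the alphabet is. The second case is the kind of regime the paper targets.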
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper explores whether provably undetectable watermarking is possible for LLM outputs with only constant per-token entropy (independent of alphabet size or security parameter). It gives two constructions: (1) a scheme robust to random substitutions under the subexponential LPN assumption, and (2) a scheme robust to random substitutions plus random deletions, either under the heuristic that LLM outputs introduce only random errors or assuming the existence of a pseudorandom error-correcting code with the required robustness properties. The work extends prior results (Christ et al., COLT'24 and CRYPTO'24) that required constant entropy rate.
Significance. If the reductions and constructions are correct, the paper meaningfully broadens the applicability of cryptographic watermarking to low-entropy regimes that are more representative of many LLM outputs. By embedding the mark across the full sequence via LPN-based techniques rather than per-token entropy, and by handling deletions via ECC or the stated heuristic, it provides concrete progress toward practical robust watermarking while making the assumptions explicit.
major comments (1)
- [Abstract (second bullet) / Construction for deletions] The second construction's robustness to deletions is load-bearing on either the random-error heuristic or the existence of a pseudorandom ECC robust to adversarial substitutions and random deletions; the paper should clarify whether the heuristic can be justified beyond analogy to Christ et al. or whether the ECC existence is merely a placeholder, as this directly affects the strength of the claimed robustness guarantee.
minor comments (2)
- Define the precise model of 'constant per-token entropy' (including how it interacts with token alphabet size) in the formal sections, as the abstract phrasing 'discarding the dependence' could be misinterpreted.
- Include a brief comparison of detection probability, output length requirements, and computational overhead relative to the high-entropy schemes of Christ et al. to highlight the trade-offs of the low-entropy regime.
Simulated Author's Rebuttal
We thank the referee for their careful review and positive assessment of our work, including the recommendation for minor revision. We address the major comment below.
- Referee: [Abstract (second bullet) / Construction for deletions] The second construction's robustness to deletions is load-bearing on either the random-error heuristic or the existence of a pseudorandom ECC robust to adversarial substitutions and random deletions; the paper should clarify whether the heuristic can be justified beyond analogy to Christ et al. or whether the ECC existence is merely a placeholder, as this directly affects the strength of the claimed robustness guarantee.
  Authors: We agree that the presentation of the second construction would benefit from greater clarity on the nature of the two alternatives for deletion robustness. In the revised manuscript we will update the abstract and the relevant technical sections to explicitly state that the random-error heuristic is an additional modeling assumption, motivated by (but not formally justified beyond) the analogous heuristic used in Christ et al. [CRYPTO'24]. We will also clarify that the pseudorandom-ECC alternative assumes the existence of a code with the stated robustness properties; we do not construct such a code in this work and present it as a potential avenue under standard cryptographic assumptions rather than a fully instantiated scheme. These changes will make the conditional character of the robustness claims transparent while leaving the core technical contributions unchanged.
  Revision: yes
Circularity Check
No significant circularity detected
The paper's central constructions reduce robustness of watermarking schemes for constant per-token entropy outputs to external cryptographic assumptions (subexponential LPN hardness, as in Christ et al. [CRYPTO'24]) or an explicit heuristic that LLM outputs introduce only random errors. These reductions are stated explicitly in the abstract and use cryptographic primitives to embed marks across the full output rather than depending on high-entropy tokens. No load-bearing step equates a derived quantity to its inputs by definition, renames a fitted parameter as a prediction, or relies on a self-citation chain whose verification is internal to the paper. The low-entropy handling follows directly from the stated primitives without self-referential definitions or ansatzes smuggled via citation. The derivation chain is self-contained against the listed external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Subexponential hardness of the learning parity with noise (LPN) problem
- ad hoc to paper: LLM output introduces only random errors (heuristic)
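For readers unfamiliar with the first axiom, the sketch below generates learning parity with noise (LPN) samples: random binary vectors paired with a parity of a secret, where each parity bit is flipped independently with a fixed probability. The LPN assumption says, roughly, that such noisy labels are hard to tell apart from uniformly random bits; the subexponential variant asks for this against subexponential-time attackers. The function name and parameters are illustrative assumptions, the sizes are toy values, and this is not the paper's watermarking scheme.

```python
import random

def lpn_samples(secret, num_samples, noise_rate, rng):
    """Return (rows, labels) where labels[i] = <rows[i], secret> XOR e_i over GF(2).

    Each e_i is 1 with probability noise_rate. Illustrative helper only.
    """
    n = len(secret)
    rows, labels = [], []
    for _ in range(num_samples):
        # Uniformly random binary vector of the same length as the secret.
        row = [rng.randrange(2) for _ in range(n)]
        # Inner product with the secret modulo 2.
        parity = sum(a & s for a, s in zip(row, secret)) % 2
        # Bernoulli noise bit that flips the parity with probability noise_rate.
        noise = 1 if rng.random() < noise_rate else 0
        rows.append(row)
        labels.append(parity ^ noise)
    return rows, labels

# Toy usage: 8-bit secret, 5 samples, noise rate 0.1.
rng = random.Random(42)
secret = [rng.randrange(2) for _ in range(8)]
rows, labels = lpn_samples(secret, 5, 0.1, rng)
print(labels)
```

With `noise_rate` set to 0 the secret can be recovered by Gaussian elimination; the hardness comes entirely from the noise, and meaningful security requires dimensions far larger than this toy example.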
Reference graph
Works this paper leans on
- [1] [AAC+25] Omar Alrabiah, Prabhanjan Ananth, Miranda Christ, Yevgeniy Dodis, and Sam Gunn. Ideal pseudorandom codes. In Michal Koucký and Nikhil Bansal, editors, Proceedings of the 57th Annual ACM Symposium on Theory of Computing, STOC 2025, Prague, Czechia, June 23-27, 2025, pages 1638–1647. ACM, 2025.
- [2] [CG24] Miranda Christ and Sam Gunn. Pseudorandom error-correcting codes. In Leonid Reyzin and Douglas Stebila, editors, Advances in Cryptology - CRYPTO 2024 - 44th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 18-22, 2024, Proceedings, Part VI, volume 14925 of Lecture Notes in Computer Science, pages 325–347. Springer, Aug. 2024.
- [3] [CGMR24] Miranda Christ, Sam Gunn, Tal Malkin, and Mariana Raykova. Provably robust watermarks for open-source language models. CoRR, abs/2410.18861.
- [4] [CGZ24] Miranda Christ, Sam Gunn, and Or Zamir. Undetectable watermarks for language models. In Shipra Agrawal and Aaron Roth, editors, The Thirty Seventh Annual Conference on Learning Theory, June 30 - July 3, 2023, Edmonton, Canada, volume 247 of Proceedings of Machine Learning Research, pages 1125–1139. PMLR, 2023.
- [5] [GG24] Surendra Ghentiyala and Venkatesan Guruswami. New constructions of pseudorandom codes. CoRR, abs/2409.07580.
- [6] [GM24] Noah Golowich and Ankur Moitra. Edit distance robust watermarks for language models. CoRR, abs/2406.02633.
- [7] [KGW+23] John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volu...
- [8] [KTHL23] Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. Robust distortion-free watermarks for language models. CoRR, abs/2307.15593, 2023.
- [9] [ZALW23] Xuandong Zhao, Prabhanjan Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for AI-generated text. CoRR, abs/2306.17439.