Can we Watermark Low-Entropy LLM Outputs?

Andrew Morgan; Noam Mazor; Rafael Pass

arxiv: 2604.12051 · v1 · submitted 2026-04-13 · 💻 cs.CR

Pith reviewed 2026-05-10 15:09 UTC · model grok-4.3

classification 💻 cs.CR

keywords: LLM watermarking · low entropy · undetectable watermarking · random substitutions · learning parity with noise · error-correcting codes · robustness to edits

The pith

Watermarking schemes exist for LLM outputs with only constant per-token entropy that remain robust to random substitutions and deletions under cryptographic assumptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether provably undetectable watermarking remains possible for LLM outputs when each token carries only a constant amount of entropy instead of a constant fraction of high-entropy tokens. Earlier constructions required higher entropy rates to survive edits such as substitutions or deletions while leaving the output distribution unchanged. The authors give explicit constructions that achieve robustness to random substitutions alone under the subexponential learning-parity-with-noise assumption and to random substitutions plus deletions when either the LLM is assumed to introduce only random errors or a suitable pseudorandom error-correcting code is available. A sympathetic reader would care because many practical LLM generations are repetitive or low-variability and therefore fell outside the scope of prior guarantees.

Core claim

The authors construct watermarking schemes for the constant per-token entropy regime. One scheme is robust against random substitutions assuming subexponential LPN. The second is robust against random substitutions and random deletions either under the heuristic that LLM outputs introduce only random errors or given a pseudorandom error-correcting code that tolerates adversarial substitutions and random deletions.

What carries the argument

The watermarking construction that encodes a mark into constant-entropy sequences so that it survives random noise while remaining statistically undetectable, using either LPN hardness or pseudorandom error-correcting codes.
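To make the idea of embedding a mark without changing the output distribution concrete, here is a minimal sketch for a binary token alphabet. It follows a sampling trick that is common in the pseudorandom-code literature; it is not taken from this paper, and the function names `embed_bit` and `exact_prob_one` are hypothetical. The point it illustrates: if the code bit is uniform (or pseudorandom), the emitted token is still distributed exactly as the model intended, yet it agrees with the code bit more often than chance.

```python
import random


def embed_bit(p_one: float, code_bit: int, rng: random.Random) -> int:
    """Sample a binary token with Pr[token = 1] = p_one, correlated with code_bit.

    Assumption: code_bit is a uniform (or pseudorandom) bit. Under that
    assumption the returned token is exactly Bernoulli(p_one).
    """
    if p_one <= 0.5:
        # With probability 2 * p_one copy the code bit, otherwise emit 0.
        return code_bit if rng.random() < 2 * p_one else 0
    # Symmetric case: copy with probability 2 * (1 - p_one), otherwise emit 1.
    return code_bit if rng.random() < 2 * (1 - p_one) else 1


def exact_prob_one(p_one: float) -> float:
    """Exact Pr[token = 1] produced by embed_bit when code_bit is uniform."""
    if p_one <= 0.5:
        return (2 * p_one) * 0.5
    copy_prob = 2 * (1 - p_one)
    return copy_prob * 0.5 + (1 - copy_prob)
```

In this sketch the token matches the code bit with probability 1/2 + min(p_one, 1 - p_one), so a token with little entropy carries only a faint signal. That is why low-entropy text behaves like a very noisy channel for the embedded codeword.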

If this is right

Watermarking becomes applicable to repetitive or predictable LLM outputs that were previously excluded.
The embedded mark does not change the probability distribution of the generated text.
Robustness holds specifically against random rather than fully adversarial edits.
The same techniques extend prior high-entropy watermarking results to the constant-entropy setting.
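The statement that the mark does not change the output distribution is, in this line of work, a computational one: no efficient adversary with oracle access can tell the watermarked model from the original. A hedged reconstruction of that condition is given below, using the algorithm names Gen, Watermark, and Generate from the paper's figure captions. The oracle notation and the negligible bound are a standard phrasing assumed here, not a quotation of the paper's definition.

```latex
% Undetectability (reconstructed sketch): for every efficient adversary A,
\[
\Bigl|\,
\Pr\bigl[\mathsf{sk} \leftarrow \mathsf{Gen}(1^n) :
         A^{\mathsf{Watermark}_{\mathsf{sk}}(1^n,\cdot)}(1^n) = 1\bigr]
-
\Pr\bigl[A^{\mathsf{Generate}(1^n,\cdot)}(1^n) = 1\bigr]
\,\Bigr| \;\le\; \mathrm{negl}(n).
\]
```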

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The constructions indicate that standard cryptographic primitives can be repurposed to handle the statistical properties of real LLM text.
Relaxing the random-edit model to adversarial edits would require new code constructions that tolerate both substitution and deletion patterns simultaneously.
Empirical checks on actual model outputs could test whether the random-error heuristic holds in practice.
If deployed, such schemes could support origin verification for generated text in domains where low entropy is common.

Load-bearing premise

The per-token entropy stays constant and the subexponential LPN assumption holds or the LLM outputs introduce only random errors.
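The gap between "constant per-token entropy" and an entropy requirement that scales with log |T| can be illustrated numerically. The sketch below uses the Shannon entropy of a next-token distribution as a stand-in; the paper's formal notion (empirical entropy over a substring) may differ, and the helper names are hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy, in bits, of a next-token distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def uniform_distribution(alphabet_size):
    """High-entropy case: every token equally likely, entropy = log2(alphabet_size)."""
    return [1.0 / alphabet_size] * alphabet_size


def peaked_distribution(alphabet_size, top_prob=0.9):
    """Low-entropy case (illustrative assumption): only two tokens are plausible.

    The entropy stays below one bit no matter how large the alphabet is.
    """
    return [top_prob, 1.0 - top_prob] + [0.0] * (alphabet_size - 2)
```

For a uniform distribution the entropy grows with the alphabet (10 bits for 1024 tokens), while the peaked distribution stays at roughly 0.47 bits for any alphabet size. The second case is the regime the paper targets.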

What would settle it

An explicit low-entropy output sequence together with a small number of random substitutions that removes the embedded mark while the edited text remains indistinguishable from an unwatermarked sample generated by the same model.
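An experiment of this kind needs a concrete noise model. The sketch below is one plausible reading of "random substitutions and random deletions": each token is independently deleted with probability `del_rate`, and each surviving token is independently replaced by a uniformly random token with probability `sub_rate`. The function name, the parameters, and the choice of a uniform replacement are assumptions for illustration; the paper's formal channel may be defined differently.

```python
import random


def random_edit_channel(tokens, alphabet, sub_rate, del_rate, rng):
    """Apply independent random deletions and substitutions to a token list."""
    edited = []
    for tok in tokens:
        if rng.random() < del_rate:
            continue  # token is deleted
        if rng.random() < sub_rate:
            # Substituted by a uniform token (which may equal the original).
            tok = rng.choice(alphabet)
        edited.append(tok)
    return edited
```

A detector would then be run on the output of `random_edit_channel` to check whether the mark survives at a given pair of rates.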

Figures

Figures reproduced from arXiv: 2604.12051 by Andrew Morgan, Noam Mazor, Rafael Pass.

**Figure 1.** Summary of our results compared to related results in terms of per-token entropy requirement (over a sufficiently long substring unless otherwise stated); robustness to (random or adversarial) substitutions, insertions, and deletions; and required assumptions. While [GM24] demonstrates a PRC with the requisite robustness, their construction requires a large alphabet of tokens, making it unsuitable for our…

**Figure 2.** Protocol Gen for generating the secret information used in the watermarking protocol.

**Figure 3.** Protocol Watermark for watermarking a generative model.

**Figure 4.** Protocol Detect for detecting the presence of a watermark in a given message.

Abstract

A recent and exciting thread of work focuses on developing methods for watermarking the output of large language models (LLMs). We focus on provably undetectable watermarking, that is, schemes that do not alter the output distribution of the LLM, yet enable embedding a watermark in the output that identifies the output as having been generated by the particular LLM. Furthermore, the watermark should be hard to remove by an adversary that may potentially edit, insert, or delete tokens from the watermarked output. Indeed, recent work (Christ et al. [COLT'24], Christ et al. [CRYPTO'24], Golowich et al. [NeurIPS'24]) shows how to develop such schemes that are robust against a constant fraction of substitutions, or even against a constant fraction of arbitrary edits. These works, however, make strong assumptions on the entropy present in the output of the LLM. Most notably, they all require constant entropy rate, that is, a constant fraction of the tokens in a sufficiently long substring of the output need to have empirical entropy at least O(log |T|), where T is the alphabet of tokens, and Golowich et al. additionally require T to be larger than the security parameter. In this work, we consider whether we can also watermark the outputs of LLMs when the per-token entropy is just a constant, discarding the dependence on the alphabet size or security parameter. In this regime, we construct:
- A watermarking scheme robust against random substitutions (assuming subexponential LPN, as in Christ et al. [CRYPTO'24]).
- A watermarking scheme robust against random substitutions and random deletions, given either the additional heuristic assumption that the output of the LLM only introduces random errors (analogous to the assumption made by Christ et al. [CRYPTO'24]) or a construction of a pseudorandom error-correcting code robust to adversarial substitutions and random deletions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives constructions for watermarking LLM outputs with only constant per-token entropy. Robustness to random substitutions rests on subexponential LPN; robustness to deletions additionally rests on either a random-error heuristic or a suitable pseudorandom ECC.

The main takeaway is that this paper shows how to watermark LLM outputs even when each token has only constant entropy, using cryptographic assumptions to make it robust to edits. It builds directly on the line of work from Christ et al. and Golowich et al., but drops the constant entropy rate requirement that those papers needed. Instead, it uses subexponential LPN to handle random substitutions across the entire output. For deletions as well, it either adds a heuristic that errors are random or relies on a pseudorandom error-correcting code with the right properties.

This is genuinely new in the constant-entropy setting. The reductions are stated clearly, and they avoid the high-entropy tokens that earlier schemes depended on. The paper does a good job of being explicit about its assumptions. It doesn't hide behind vague claims.

On the downside, subexponential LPN is not a lightweight assumption, and the existence of the required pseudorandom ECC is left open. The heuristic about random errors from the LLM might not hold up against determined adversaries who edit deliberately. Without concrete parameters or efficiency analysis, it's difficult to judge how practical any of this would be.

The work is aimed at cryptographers and security researchers focused on generative AI. Someone familiar with the cited papers will see the incremental advance clearly. It is worth a serious referee's time because it addresses a stated limitation in the existing literature with new constructions. I recommend putting it through peer review.

Referee Report

1 major / 2 minor

Summary. The paper explores whether provably undetectable watermarking is possible for LLM outputs with only constant per-token entropy (independent of alphabet size or security parameter). It gives two constructions: (1) a scheme robust to random substitutions under the subexponential LPN assumption, and (2) a scheme robust to random substitutions plus random deletions, either under the heuristic that LLM outputs introduce only random errors or assuming the existence of a pseudorandom error-correcting code with the required robustness properties. The work extends prior results (Christ et al., COLT'24 and CRYPTO'24) that required constant entropy rate.

Significance. If the reductions and constructions are correct, the paper meaningfully broadens the applicability of cryptographic watermarking to low-entropy regimes that are more representative of many LLM outputs. By embedding the mark across the full sequence via LPN-based techniques rather than per-token entropy, and by handling deletions via ECC or the stated heuristic, it provides concrete progress toward practical robust watermarking while making the assumptions explicit.

major comments (1)

[Abstract (second bullet) / Construction for deletions] The second construction's robustness to deletions is load-bearing on either the random-error heuristic or the existence of a pseudorandom ECC robust to adversarial substitutions and random deletions; the paper should clarify whether the heuristic can be justified beyond analogy to Christ et al. or whether the ECC existence is merely a placeholder, as this directly affects the strength of the claimed robustness guarantee.

minor comments (2)

Define the precise model of 'constant per-token entropy' (including how it interacts with token alphabet size) in the formal sections, as the abstract phrasing 'discarding the dependence' could be misinterpreted.
Include a brief comparison of detection probability, output length requirements, and computational overhead relative to the high-entropy schemes of Christ et al. to highlight the trade-offs of the low-entropy regime.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful review and positive assessment of our work, including the recommendation for minor revision. We address the major comment below.

Referee: [Abstract (second bullet) / Construction for deletions] The second construction's robustness to deletions is load-bearing on either the random-error heuristic or the existence of a pseudorandom ECC robust to adversarial substitutions and random deletions; the paper should clarify whether the heuristic can be justified beyond analogy to Christ et al. or whether the ECC existence is merely a placeholder, as this directly affects the strength of the claimed robustness guarantee.

Authors: We agree that the presentation of the second construction would benefit from greater clarity on the nature of the two alternatives for deletion robustness. In the revised manuscript we will update the abstract and the relevant technical sections to explicitly state that the random-error heuristic is an additional modeling assumption, motivated by (but not formally justified beyond) the analogous heuristic used in Christ et al. [CRYPTO'24]. We will also clarify that the pseudorandom-ECC alternative assumes the existence of a code with the stated robustness properties; we do not construct such a code in this work and present it as a potential avenue under standard cryptographic assumptions rather than a fully instantiated scheme. These changes will make the conditional character of the robustness claims transparent while leaving the core technical contributions unchanged.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper's central constructions reduce robustness of watermarking schemes for constant per-token entropy outputs to external cryptographic assumptions (subexponential LPN hardness, as in Christ et al. [CRYPTO'24]) or an explicit heuristic that LLM outputs introduce only random errors. These reductions are stated explicitly in the abstract and use cryptographic primitives to embed marks across the full output rather than depending on high-entropy tokens. No load-bearing step equates a derived quantity to its inputs by definition, renames a fitted parameter as a prediction, or relies on a self-citation chain whose verification is internal to the paper. The low-entropy handling follows directly from the stated primitives without self-referential definitions or ansatzes smuggled via citation. The derivation chain is self-contained against the listed external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the subexponential LPN assumption and an optional heuristic about random errors in LLM output; no free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption: Subexponential hardness of the learning parity with noise problem
Invoked for the first construction's robustness against random substitutions.
ad hoc to paper: LLM output introduces only random errors (heuristic)
Used as an alternative assumption for the second construction handling deletions.
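For readers unfamiliar with the first axiom, the learning parity with noise (LPN) problem asks one to recover a secret bit vector from noisy inner products over GF(2). The sketch below generates such samples; the function name and parameters are hypothetical, and the toy sizes used here say nothing about the subexponential parameter regime the paper assumes.

```python
import random


def lpn_samples(secret, num_samples, noise_rate, rng):
    """Return LPN samples (a, b) with b = <a, secret> + e mod 2.

    Each a is a uniform bit vector and e is 1 with probability noise_rate.
    """
    n = len(secret)
    samples = []
    for _ in range(num_samples):
        a = [rng.randrange(2) for _ in range(n)]
        e = 1 if rng.random() < noise_rate else 0
        inner = sum(ai * si for ai, si in zip(a, secret)) % 2
        samples.append((a, (inner + e) % 2))
    return samples
```

With `noise_rate = 0` the secret can be recovered by Gaussian elimination; the hardness assumption concerns the noisy case, where the labels `b` are conjectured to look random to efficient algorithms.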

Reference graph

Works this paper leans on

9 extracted references · 9 canonical work pages

[1] [AAC+25] Omar Alrabiah, Prabhanjan Ananth, Miranda Christ, Yevgeniy Dodis, and Sam Gunn. Ideal pseudorandom codes. In Michal Koucký and Nikhil Bansal, editors, Proceedings of the 57th Annual ACM Symposium on Theory of Computing, STOC 2025, Prague, Czechia, June 23-27, 2025, pages 1638–1647. ACM, 2025.

[2] [CG24] Miranda Christ and Sam Gunn. Pseudorandom error-correcting codes. In Leonid Reyzin and Douglas Stebila, editors, Advances in Cryptology - CRYPTO 2024 - 44th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 18-22, 2024, Proceedings, Part VI, volume 14925 of Lecture Notes in Computer Science, pages 325–347. Springer, August 2024.

[3] [CGMR24] Miranda Christ, Sam Gunn, Tal Malkin, and Mariana Raykova. Provably robust watermarks for open-source language models. CoRR, abs/2410.18861.

[4] [CGZ24] Miranda Christ, Sam Gunn, and Or Zamir. Undetectable watermarks for language models. In Shipra Agrawal and Aaron Roth, editors, The Thirty Seventh Annual Conference on Learning Theory, June 30 - July 3, 2024, Edmonton, Canada, volume 247 of Proceedings of Machine Learning Research, pages 1125–1139. PMLR, 2024.

[5] [GG24] Surendra Ghentiyala and Venkatesan Guruswami. New constructions of pseudorandom codes. CoRR, abs/2409.07580.

[6] [GM24] Noah Golowich and Ankur Moitra. Edit distance robust watermarks for language models. CoRR, abs/2406.02633.

[7] [KGW+23] John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volu...

[8] [KTHL23] Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. Robust distortion-free watermarks for language models. CoRR, abs/2307.15593, 2023.

[9] [ZALW23] Xuandong Zhao, Prabhanjan Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for AI-generated text. CoRR, abs/2306.17439.
