
arxiv: 2604.12051 · v1 · submitted 2026-04-13 · 💻 cs.CR

Can we Watermark Low-Entropy LLM Outputs?

Pith reviewed 2026-05-10 15:09 UTC · model grok-4.3

keywords: LLM watermarking, low entropy, undetectable watermarking, random substitutions, learning parity with noise, error-correcting codes, robustness to edits

The pith

LLM outputs with only constant per-token entropy admit watermarking schemes that, under cryptographic assumptions, remain robust to random substitutions and deletions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether provably undetectable watermarking remains possible for LLM outputs when each token carries only a constant amount of entropy, rather than when a constant fraction of the tokens are high-entropy. Earlier constructions needed higher entropy rates to survive edits such as substitutions or deletions while leaving the output distribution unchanged. The authors give two constructions. The first is robust to random substitutions under the subexponential learning-parity-with-noise (LPN) assumption. The second is robust to random substitutions plus random deletions, provided either that the LLM is assumed to introduce only random errors or that a suitable pseudorandom error-correcting code is available. A sympathetic reader would care because many practical LLM generations are repetitive or low-variability and therefore fall outside the scope of prior guarantees.

Core claim

The authors construct watermarking schemes for the constant per-token entropy regime. One scheme is robust against random substitutions assuming subexponential LPN. The second is robust against random substitutions and random deletions either under the heuristic that LLM outputs introduce only random errors or given a pseudorandom error-correcting code that tolerates adversarial substitutions and random deletions.
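
To make the edit model concrete, here is a minimal sketch of a channel that applies independent random deletions and substitutions to a token sequence. The function name, the per-token probabilities `p_sub` and `p_del`, and the choice to substitute with a uniformly random token are all illustrative assumptions; the paper's formal noise model may differ.

```python
import random

def random_edit_channel(tokens, alphabet, p_sub, p_del, rng):
    """Apply independent random deletions and substitutions to a token list.

    Hypothetical noise model: each token is deleted with probability p_del;
    each surviving token is replaced by a uniformly random alphabet symbol
    with probability p_sub (the replacement may coincide with the original).
    """
    out = []
    for tok in tokens:
        if rng.random() < p_del:
            continue  # token is deleted
        if rng.random() < p_sub:
            tok = rng.choice(alphabet)  # token is substituted
        out.append(tok)
    return out

rng = random.Random(0)
alphabet = ["a", "b", "c", "d"]  # toy token alphabet
text = [rng.choice(alphabet) for _ in range(1000)]
noisy = random_edit_channel(text, alphabet, p_sub=0.1, p_del=0.05, rng=rng)
print(len(text), len(noisy))
```

An adversarial channel would instead choose which positions to edit after seeing the text, which is the harder setting the core claim does not cover.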

What carries the argument

A watermarking construction that encodes a mark into constant-entropy sequences so that it survives random noise while remaining undetectable, relying on either LPN hardness or pseudorandom error-correcting codes.

If this is right

  • Watermarking becomes applicable to repetitive or predictable LLM outputs that were previously excluded.
  • The embedded mark does not change the probability distribution of the generated text.
  • Robustness holds specifically against random rather than fully adversarial edits.
  • The same techniques extend prior high-entropy watermarking results to the constant-entropy setting.
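
The undetectability behind the second bullet is usually phrased as a bound on a distinguisher's advantage. The expression below is a reconstruction from the paper's undetectability argument for (Gen, Watermark, Detect); the oracle superscripts and the symbol negl are a standard reading of that argument, not a verbatim copy.

```latex
\left|
  \Pr\!\left[\, sk \leftarrow \mathsf{Gen}(1^n) :
      \mathcal{A}^{\mathsf{Watermark}_{sk}(1^n,\cdot)}(1^n) = 1 \,\right]
  -
  \Pr\!\left[\, \mathcal{A}^{\mathsf{Generate}(1^n,\cdot)}(1^n) = 1 \,\right]
\right| \le \mathsf{negl}(n)
```

Read this way, the guarantee is computational: no efficient adversary with oracle access can tell the watermarked model from the original one, except with negligible advantage in the security parameter n.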

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The constructions indicate that standard cryptographic primitives can be repurposed to handle the statistical properties of real LLM text.
  • Relaxing the random-edit model to adversarial edits would require new code constructions that tolerate both substitution and deletion patterns simultaneously.
  • Empirical checks on actual model outputs could test whether the random-error heuristic holds in practice.
  • If deployed, such schemes could support origin verification for generated text in domains where low entropy is common.

Load-bearing premise

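The per-token entropy is a constant, the subexponential LPN assumption holds, and, for robustness to deletions, either LLM outputs introduce only random errors or a pseudorandom error-correcting code with the required robustness exists.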
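
The gap between the two entropy regimes can be illustrated with a short calculation. This is a sketch only: it uses Shannon entropy of a next-token distribution as a stand-in, whereas the abstract speaks of empirical entropy, and the alphabet size and the two toy distributions are hypothetical.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy, in bits, of a next-token probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

vocab_size = 50_000  # hypothetical alphabet size |T|

# Low-entropy step: only two tokens are plausible, each with probability 1/2.
low = [0.5, 0.5] + [0.0] * (vocab_size - 2)

# High-entropy step: every token in the alphabet is equally likely.
high = [1.0 / vocab_size] * vocab_size

print(shannon_entropy(low))    # 1.0 bit, independent of |T|
print(shannon_entropy(high))   # close to log2(|T|)
print(math.log2(vocab_size))
```

The first case is the kind of step the constant per-token regime admits; the second is the log |T| scale that the earlier constant-entropy-rate results ask of a constant fraction of tokens.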

What would settle it

An explicit low-entropy output sequence together with a small number of random substitutions that removes the embedded mark while the edited text remains indistinguishable from an unwatermarked sample generated by the same model.

Figures

Figures reproduced from arXiv: 2604.12051 by Andrew Morgan, Noam Mazor, Rafael Pass.

Figure 1: Summary of our results compared to related results in terms of per-token entropy requirement (over a sufficiently long substring unless otherwise stated); robustness to (random or adversarial) substitutions, insertions, and deletions; and required assumptions. While [GM24] demonstrates a PRC with the requisite robustness, their construction requires a large alphabet of tokens, making it unsuitable for our…
Figure 2: Protocol Gen for generating the secret information used in the watermarking protocol.
Figure 3: Protocol Watermark for watermarking a generative model.
Figure 4: Protocol Detect for detecting the presence of a watermark in a given message.
Original abstract

A recent and exciting thread of work focuses on developing methods for watermarking the output of large language models (LLMs). We focus on provably undetectable watermarking, that is, schemes that do not alter the output distribution of the LLM, yet enable embedding a watermark in the output that identifies the output as having been generated by the particular LLM. Furthermore, the watermark should be hard to remove by an adversary that may potentially edit, insert, or delete tokens from the watermarked output. Indeed, recent work (Christ et al. [COLT'24], Christ et al. [CRYPTO'24], Golowich et al. [NeurIPS'24]) shows how to develop such schemes that are robust against a constant fraction of substitutions, or even against a constant fraction of arbitrary edits. These works, however, make strong assumptions on the entropy present in the output of the LLM. Most notably, they all require constant entropy rate, that is, a constant fraction of the tokens in a sufficiently long substring of the output need to have empirical entropy at least O(log |T|), where T is the alphabet of tokens, and Golowich et al. additionally require T to be larger than the security parameter. In this work, we consider whether we can also watermark the outputs of LLMs when the per-token entropy is just a constant, discarding the dependence on the alphabet size or security parameter. In this regime, we construct:

  • A watermarking scheme robust against random substitutions (assuming subexponential LPN, as in Christ et al. [CRYPTO'24]).
  • A watermarking scheme robust against random substitutions and random deletions, given either the additional heuristic assumption that the output of the LLM only introduces random errors (analogous to the assumption made by Christ et al. [CRYPTO'24]) or a construction of a pseudorandom error-correcting code robust to adversarial substitutions and random deletions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper explores whether provably undetectable watermarking is possible for LLM outputs with only constant per-token entropy (independent of alphabet size or security parameter). It gives two constructions: (1) a scheme robust to random substitutions under the subexponential LPN assumption, and (2) a scheme robust to random substitutions plus random deletions, either under the heuristic that LLM outputs introduce only random errors or assuming the existence of a pseudorandom error-correcting code with the required robustness properties. The work extends prior results (Christ et al., COLT'24 and CRYPTO'24) that required constant entropy rate.

Significance. If the reductions and constructions are correct, the paper meaningfully broadens the applicability of cryptographic watermarking to low-entropy regimes that are more representative of many LLM outputs. By embedding the mark across the full sequence via LPN-based techniques rather than per-token entropy, and by handling deletions via ECC or the stated heuristic, it provides concrete progress toward practical robust watermarking while making the assumptions explicit.

major comments (1)
  1. [Abstract (second bullet) / Construction for deletions] The second construction's robustness to deletions is load-bearing on either the random-error heuristic or the existence of a pseudorandom ECC robust to adversarial substitutions and random deletions; the paper should clarify whether the heuristic can be justified beyond analogy to Christ et al. or whether the ECC existence is merely a placeholder, as this directly affects the strength of the claimed robustness guarantee.
minor comments (2)
  1. Define the precise model of 'constant per-token entropy' (including how it interacts with token alphabet size) in the formal sections, as the abstract phrasing 'discarding the dependence' could be misinterpreted.
  2. Include a brief comparison of detection probability, output length requirements, and computational overhead relative to the high-entropy schemes of Christ et al. to highlight the trade-offs of the low-entropy regime.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful review and positive assessment of our work, including the recommendation for minor revision. We address the major comment below.

point-by-point responses
  1. Referee: [Abstract (second bullet) / Construction for deletions] The second construction's robustness to deletions is load-bearing on either the random-error heuristic or the existence of a pseudorandom ECC robust to adversarial substitutions and random deletions; the paper should clarify whether the heuristic can be justified beyond analogy to Christ et al. or whether the ECC existence is merely a placeholder, as this directly affects the strength of the claimed robustness guarantee.

    Authors: We agree that the presentation of the second construction would benefit from greater clarity on the nature of the two alternatives for deletion robustness. In the revised manuscript we will update the abstract and the relevant technical sections to explicitly state that the random-error heuristic is an additional modeling assumption, motivated by (but not formally justified beyond) the analogous heuristic used in Christ et al. [CRYPTO'24]. We will also clarify that the pseudorandom-ECC alternative assumes the existence of a code with the stated robustness properties; we do not construct such a code in this work and present it as a potential avenue under standard cryptographic assumptions rather than a fully instantiated scheme. These changes will make the conditional character of the robustness claims transparent while leaving the core technical contributions unchanged.

    revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's central constructions reduce robustness of watermarking schemes for constant per-token entropy outputs to external cryptographic assumptions (subexponential LPN hardness, as in Christ et al. [CRYPTO'24]) or an explicit heuristic that LLM outputs introduce only random errors. These reductions are stated explicitly in the abstract and use cryptographic primitives to embed marks across the full output rather than depending on high-entropy tokens. No load-bearing step equates a derived quantity to its inputs by definition, renames a fitted parameter as a prediction, or relies on a self-citation chain whose verification is internal to the paper. The low-entropy handling follows directly from the stated primitives without self-referential definitions or ansatzes smuggled via citation. The derivation chain is self-contained against the listed external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the subexponential LPN assumption and an optional heuristic about random errors in LLM output; no free parameters or invented entities are introduced in the abstract.

axioms (2)
  • domain assumption: Subexponential hardness of the learning parity with noise problem
    Invoked for the first construction's robustness against random substitutions.
  • ad hoc to paper: LLM output introduces only random errors (heuristic)
    Used as an alternative assumption for the second construction handling deletions.
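
For readers unfamiliar with the first axiom, the following sketch generates samples from the standard learning-parity-with-noise distribution over GF(2). The function name and the toy parameters are illustrative assumptions and are far too small to be secure; the sketch shows only the shape of the problem, not how the paper uses it.

```python
import random

def lpn_samples(secret, num_samples, noise_rate, rng):
    """Return pairs (a, b) with b = <a, secret> + e over GF(2).

    Each a is a uniformly random bit vector and each e is an independent
    Bernoulli(noise_rate) bit. The LPN problem asks to recover `secret`
    (or to tell the b values from uniform bits) given such samples.
    """
    samples = []
    for _ in range(num_samples):
        a = [rng.randrange(2) for _ in range(len(secret))]
        inner = sum(x & s for x, s in zip(a, secret)) % 2  # <a, secret> mod 2
        e = 1 if rng.random() < noise_rate else 0          # noise bit
        samples.append((a, inner ^ e))
    return samples

rng = random.Random(0)
secret = [rng.randrange(2) for _ in range(16)]  # toy dimension, not secure
for a, b in lpn_samples(secret, num_samples=3, noise_rate=0.125, rng=rng):
    print(a, b)
```

Without the noise bit the secret could be recovered by Gaussian elimination; the assumption is that the noise makes recovery infeasible, and the subexponential variant asks that this hold even against subexponential-time attackers.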

pith-pipeline@v0.9.0 · 5653 in / 1323 out tokens · 33945 ms · 2026-05-10T15:09:35.669622+00:00 · methodology


Reference graph

Works this paper leans on

9 extracted references · 9 canonical work pages

  1. [AAC+25] Omar Alrabiah, Prabhanjan Ananth, Miranda Christ, Yevgeniy Dodis, and Sam Gunn. Ideal pseudorandom codes. In Michal Koucký and Nikhil Bansal, editors, Proceedings of the 57th Annual ACM Symposium on Theory of Computing, STOC 2025, Prague, Czechia, June 23-27, 2025, pages 1638–1647. ACM,

  2. [CG24] Miranda Christ and Sam Gunn. Pseudorandom error-correcting codes. In Leonid Reyzin and Douglas Stebila, editors, Advances in Cryptology - CRYPTO 2024 - 44th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 18-22, 2024, Proceedings, Part VI, volume 14925 of Lecture Notes in Computer Science, pages 325–347. Springer, aug

  3. [CGMR24] Miranda Christ, Sam Gunn, Tal Malkin, and Mariana Raykova. Provably robust watermarks for open-source language models. CoRR, abs/2410.18861,

  4. [CGZ24] Miranda Christ, Sam Gunn, and Or Zamir. Undetectable watermarks for language models. In Shipra Agrawal and Aaron Roth, editors, The Thirty Seventh Annual Conference on Learning Theory, June 30 - July 3, 2024, Edmonton, Canada, volume 247 of Proceedings of Machine Learning Research, pages 1125–1139. PMLR,

  5. [GG24] Surendra Ghentiyala and Venkatesan Guruswami. New constructions of pseudorandom codes. CoRR, abs/2409.07580,

  6. [GM24] Noah Golowich and Ankur Moitra. Edit distance robust watermarks for language models. CoRR, abs/2406.02633,

  7. [KGW+23] John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volu...

  8. [KTHL23] Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. Robust distortion-free watermarks for language models. CoRR, abs/2307.15593,

  9. [ZALW23] Xuandong Zhao, Prabhanjan Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for ai-generated text. CoRR, abs/2306.17439,