arxiv: 2605.16709 · v1 · pith:JMFSU3WMnew · submitted 2026-05-15 · 💻 cs.IT · math.IT

Covert Multi-bit LLM Watermarking: An Information Theory and Coding Approach

Pith reviewed 2026-05-19 20:19 UTC · model grok-4.3

classification 💻 cs.IT math.IT
keywords LLM watermarking · multi-bit embedding · information theoretic capacity · Gelfand-Pinsker coding · polar codes · covert communication · channel synthesis

The pith

Multi-bit covert watermarking for LLMs has an exactly characterized capacity that supports practical embedding at 0.375 bits per token.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a model for embedding multiple bits of hidden information into the output of large language models in a way that is hard to detect. The model gives the encoder advance knowledge of token probabilities within short blocks, which lets it treat the problem as a communication channel with known state. By applying established coding techniques from information theory, the authors derive the precise maximum rate at which such bits can be hidden. They then construct a concrete algorithm using polar codes that approaches this rate while leaving the language model's output quality essentially unchanged.
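
To make the polar-code ingredient concrete, here is a minimal sketch of the basic Arikan polar transform over GF(2). It is not the paper's Gelfand-Pinsker construction: the choice of which positions carry message bits and which are frozen or shaped by the token distributions is not shown, and the function name `polar_transform` is hypothetical.

```python
def polar_transform(u):
    """Apply x = u * F^(kron n) over GF(2), with F = [[1, 0], [1, 1]].

    No bit-reversal permutation is applied. Illustrative only: this is the
    plain Arikan transform, not the paper's watermark encoder.
    """
    x = list(u)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    step = 1
    while step < n:
        # Butterfly stage: XOR the second half of each group into the first half.
        for start in range(0, n, 2 * step):
            for i in range(start, start + step):
                x[i] ^= x[i + step]
        step *= 2
    return x


u = [1, 0, 1, 1]
x = polar_transform(u)
print(x)                    # [1, 1, 0, 1]
print(polar_transform(x))   # [1, 0, 1, 1]: the transform is its own inverse
```

Because the transform is an involution over GF(2), the same routine maps between the message domain and the codeword domain; a practical decoder would use successive cancellation rather than direct inversion.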

Core claim

The paper claims that the multi-bit watermarking capacity is exactly characterized by combining Gelfand-Pinsker coding, which handles channels with state information known non-causally at the encoder, with channel synthesis methods. It further claims that this characterization supports a block-wise optimization via a constrained Markov decision process and a polar code construction that embeds data reliably.
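
For orientation, the classical Gelfand-Pinsker capacity of a channel $P_{Y|X,S}$ whose state $S \sim P_S$ is known non-causally at the encoder is recalled below. This is the textbook result, not the paper's exact expression, which additionally accounts for the covertness constraint through channel synthesis and is not reproduced on this page. Here $U$ is an auxiliary random variable, $X$ the channel input, and $Y$ the channel output.

```latex
C_{\mathrm{GP}} \;=\; \max_{P_{U|S},\; x = f(u,s)} \bigl[\, I(U;Y) - I(U;S) \,\bigr]
```

The term $I(U;S)$ is the rate spent aligning the codeword with the known state, which is why the encoder's access to token distributions matters for the achievable rate.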

What carries the argument

A block-autoregressive embedding model with limited non-causal access to token distributions. It carries the argument by enabling state-dependent coding, which in turn makes covert multi-bit embedding possible.

If this is right

  • The exact capacity of the watermarking channel is given by an optimization over auxiliary distributions derived from Gelfand-Pinsker theory.
  • The practical scheme attains a rate of 0.375 bits per token with a bit-error rate below 10 percent over short token lengths.
  • Perplexity and distortion metrics show negligible degradation compared to the original LLM outputs.
  • Block-wise optimization with a constrained Markov decision process further refines the embedding strategy.
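
As a reminder of what the constrained Markov decision process in the last bullet optimizes, a generic finite-horizon CMDP is written out below. All symbols are placeholders introduced here for illustration, not the paper's notation: $z_b$ is the state at block $b$, $a_b$ the action, $r$ a per-block reward, $c$ a per-block cost, $d$ a budget, and $B$ the number of blocks. One plausible reading is that the reward tracks embedding rate and the cost tracks covertness or distortion, but this page does not state the paper's exact choice.

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\Bigl[\sum_{b=1}^{B} r(z_b, a_b)\Bigr]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\Bigl[\sum_{b=1}^{B} c(z_b, a_b)\Bigr] \le d
```

A common way to handle the constraint is the Lagrangian relaxation $\mathbb{E}_{\pi}\bigl[\sum_{b}\bigl(r(z_b,a_b) - \lambda\, c(z_b,a_b)\bigr)\bigr] + \lambda d$ with multiplier $\lambda \ge 0$.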

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the limited non-causal access can be approximated in real LLM sampling pipelines, watermark detection could become more reliable for multi-bit messages.
  • This coding approach might generalize to watermarking other autoregressive generative models beyond language.
  • Future work could test whether the capacity remains achievable when cover statistics must be estimated on the fly rather than assumed known.

Load-bearing premise

The encoder must have limited non-causal access to the token distributions inside each block and the statistical cover properties of the LLM must be known beforehand.

What would settle it

Running the embedding algorithm in a strictly causal mode without any block lookahead and measuring whether the bit-error rate stays below 10 percent or rises significantly would directly test the role of the non-causal access.

Figures

Figures reproduced from arXiv: 2605.16709 by Matthieu R. Bloch, Sidong Guo, Teodora Baluta, Tyler Kann.

Figure 1: Information-theoretic model of multi-bit LLM water...
Figure 2: Top: Average bit error rate (BER) and perplexity...
Original abstract

We study the problem of multi-bit watermarking for large language models (LLMs). We introduce a block-autoregressive model inspired by multi-token prediction, in which the encoder has limited non-causal access to token distributions within each block. This formulation enables an information-theoretic characterization of multi-bit watermarking capacity, by which the knowledge of LLM cover statistics is leveraged to enable a multi-bit covert embedding. We study the information-theoretic limits of the model by combining Gelfand-Pinsker and channel synthesis coding techniques and obtain an exact characterization of the capacity. The embedding strategy is further optimized across blocks using a constrained Markov decision process (CMDP) and we develop an explicit algorithm based on polar codes following the information-theoretic principles. Our algorithm achieves a bit-error rate below 10 percent with a rate of 0.375 bits/token over short token lengths with negligible perplexity and distortion degradation.
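
The operating point quoted in the abstract can be put into concrete terms with a little arithmetic. The sketch below computes how many message bits fit in a given number of tokens at 0.375 bits/token and how a bit-error rate is measured. The 64-token length, the bit pattern, and the injected errors are made-up values for illustration; the function names are hypothetical.

```python
import math

RATE_BITS_PER_TOKEN = 0.375  # operating point quoted in the abstract


def payload_bits(num_tokens, rate=RATE_BITS_PER_TOKEN):
    """Whole message bits that fit in num_tokens tokens at the given rate."""
    return math.floor(num_tokens * rate)


def bit_error_rate(sent, decoded):
    """Fraction of positions where the decoded bit differs from the sent bit."""
    if len(sent) != len(decoded) or not sent:
        raise ValueError("bit sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(sent, decoded)) / len(sent)


# Hypothetical example: 64 tokens is an assumed length, not a figure from the paper.
n_bits = payload_bits(64)            # 24 bits
sent = [1, 0] * (n_bits // 2)
decoded = list(sent)
decoded[3] ^= 1                      # inject two made-up bit errors
decoded[17] ^= 1
ber = bit_error_rate(sent, decoded)  # 2 / 24
print(n_bits, round(ber, 4))         # 24 0.0833
```

Since 0.375 = 3/8, every 8 tokens carry 3 message bits on average at this rate.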

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces a block-autoregressive model for covert multi-bit watermarking of LLMs in which the encoder is granted limited non-causal access to token distributions inside each block. It combines Gelfand-Pinsker coding with channel-synthesis techniques to obtain an exact capacity characterization, optimizes the embedding policy across blocks via a constrained Markov decision process, and constructs an explicit polar-code algorithm. The algorithm is reported to achieve BER below 10 percent at 0.375 bits per token on short sequences while preserving perplexity and distortion.
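
For readers unfamiliar with the quality metric mentioned above, the sketch below shows how perplexity is commonly computed from per-token log-probabilities. The probabilities are made-up numbers, the function name is hypothetical, and the paper's exact evaluation protocol (which model scores the text, which distortion metric is used) is not given on this page.

```python
import math


def perplexity(token_log_probs):
    """exp of the negative mean natural-log probability of the generated tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Made-up per-token probabilities under a scoring model (illustration only).
baseline = [math.log(p) for p in (0.50, 0.25, 0.40, 0.30)]
watermarked = [math.log(p) for p in (0.50, 0.25, 0.38, 0.30)]

ppl_base = perplexity(baseline)
ppl_wm = perplexity(watermarked)
print(f"relative perplexity change: {ppl_wm / ppl_base - 1:.2%}")
```

A claim of negligible degradation would correspond to this relative change staying close to zero across many prompts.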

Significance. If the capacity result is valid under the stated model, the work supplies the first information-theoretic benchmark for multi-bit covert watermarking and demonstrates that polar-code constructions can approach the bound with modest overhead. The reported operating point (0.375 bits/token, BER < 0.1) would be practically relevant if the non-causal access assumption can be realized or approximated.

major comments (1)
  1. [Model definition and capacity derivation] The exact capacity characterization (Abstract and the section deriving the Gelfand-Pinsker / channel-synthesis bound) rests on the block-autoregressive model that supplies the encoder with limited non-causal knowledge of the LLM's conditional distributions inside each block. Standard causal autoregressive sampling provides only past tokens; any practical implementation must therefore replace the assumed distributions with estimates or lookahead. Because the capacity equality is obtained by construction under this access model, the claimed characterization does not directly apply to ordinary LLM generation and the reported BER/rate numbers cannot be taken as evidence that the derived capacity is achieved in the causal setting.
minor comments (2)
  1. [Abstract and experimental section] The abstract states concrete BER and rate figures but omits the precise token length, the number of Monte Carlo trials, and confidence intervals; these details are required to evaluate whether the performance claims are statistically supported.
  2. [Algorithm section] The CMDP formulation and polar-code construction are described at a high level; explicit pseudocode or a small worked example would clarify how the auxiliary channel and state transitions are instantiated from the LLM logits.
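
The confidence intervals requested in the first minor comment could be reported along the following lines. This is a sketch using the Wilson score interval for a binomial proportion; the error counts are invented, the function name is hypothetical, and the calculation treats bit errors as independent, which may not hold for bits decoded from the same block.

```python
import math


def wilson_interval(errors, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if trials <= 0 or not 0 <= errors <= trials:
        raise ValueError("need trials > 0 and 0 <= errors <= trials")
    p_hat = errors / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials
                         + z * z / (4 * trials ** 2)) / denom
    return center - half, center + half


# Made-up counts for illustration: 80 bit errors observed over 1000 decoded bits.
low, high = wilson_interval(80, 1000)
print(f"BER = {80 / 1000:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

Reporting the interval alongside the point estimate shows whether a claim such as BER below 10 percent is supported by the number of trials actually run.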

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading of our manuscript and the constructive feedback. We are glad that the significance of the capacity characterization and the practical relevance of the operating point are recognized. We address the major comment below.

Point-by-point responses
  1. Referee: [Model definition and capacity derivation] The exact capacity characterization (Abstract and the section deriving the Gelfand-Pinsker / channel-synthesis bound) rests on the block-autoregressive model that supplies the encoder with limited non-causal knowledge of the LLM's conditional distributions inside each block. Standard causal autoregressive sampling provides only past tokens; any practical implementation must therefore replace the assumed distributions with estimates or lookahead. Because the capacity equality is obtained by construction under this access model, the claimed characterization does not directly apply to ordinary LLM generation and the reported BER/rate numbers cannot be taken as evidence that the derived capacity is achieved in the causal setting.

    Authors: We agree that the exact capacity characterization is obtained under the block-autoregressive model with limited non-causal access to token distributions inside each block, as explicitly introduced in the manuscript and motivated by multi-token prediction techniques. This modeling assumption enables the combination of Gelfand-Pinsker coding with channel synthesis to derive the capacity. We do not claim the result applies directly to standard causal autoregressive sampling without approximation. In the revision we will expand the model section and add a dedicated discussion clarifying the scope of the result as an information-theoretic benchmark, while outlining practical approximations such as using auxiliary prediction heads or estimated future distributions to realize the access in causal settings. The reported BER and rate results demonstrate achievability of the polar-code construction within the stated model.

    revision: yes

Circularity Check

0 steps flagged

Capacity derivation applies standard Gelfand-Pinsker and channel-synthesis techniques to a newly defined block model without self-referential reduction.

full rationale

The paper introduces a block-autoregressive model with limited non-causal access to token distributions and then invokes established Gelfand-Pinsker coding for state-dependent channels together with channel synthesis for covertness to obtain an exact capacity expression. This constitutes an application of known information-theoretic tools to the custom model rather than a self-definitional loop or a fitted parameter renamed as a prediction. The subsequent CMDP optimization and polar-code algorithm are constructed explicitly from the derived capacity, and the reported BER and rate figures are presented as empirical outcomes of that algorithm. No load-bearing self-citations or uniqueness theorems from prior author work are required for the central characterization, rendering the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claims rest on standard information-theoretic results applied to a newly introduced channel model; no free parameters are explicitly fitted in the abstract, and the only invented entity is the block-autoregressive model itself.

axioms (1)
  • standard math · The Gelfand-Pinsker theorem and channel synthesis results hold for the defined block-autoregressive watermarking channel.
    Invoked to obtain the exact capacity characterization.
invented entities (1)
  • block-autoregressive model with limited non-causal access · no independent evidence
    purpose: To enable information-theoretic analysis of multi-bit covert embedding by providing partial future token distribution knowledge within blocks.
    Introduced to model the encoder's access and to support the capacity derivation.

pith-pipeline@v0.9.0 · 5690 in / 1631 out tokens · 65645 ms · 2026-05-19T20:19:36.931891+00:00 · methodology

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 3 internal anchors

  1. [1]

    A watermark for large language models,

    J. Kirchenbauer, J. Geiping, Y. Wen, J. Katz, I. Miers, and T. Goldstein, “A watermark for large language models,” in Proc. Int. Conf. Mach. Learn. (ICML). PMLR, Jul. 2023, pp. 17061–17084

  2. [2]

    Undetectable watermarks for language models,

    M. Christ, S. Gunn, and O. Zamir, “Undetectable watermarks for language models,” in Proc. Conf. Learn. Theory (COLT). PMLR, Jun. 2024, pp. 1125–1139

  3. [3]

    Heavywater and simplexwater: Distortion-free llm watermarks for low-entropy distributions,

    D. Tsur, C. X. Long, C. M. Verdun, S. Vithana, H. Hsu, C.-F. Chen, H. H. Permuter, and F. Calmon, “Heavywater and simplexwater: Distortion-free llm watermarks for low-entropy distributions,” in Adv. Neural Inf. Process. Syst. (NeurIPS), Dec. 2025

  4. [4]

    Provable robust watermarking for ai-generated text,

    X. Zhao, P. Ananth, L. Li, and Y.-X. Wang, “Provable robust watermarking for ai-generated text,” arXiv:2306.17439, 2023

  5. [5]

    Robust content-dependent high-fidelity watermark for tracking in digital cinema,

    J. Lubin, J. A. Bloom, and H. Cheng, “Robust content-dependent high-fidelity watermark for tracking in digital cinema,” in Security and Watermarking of Multimedia Contents V, vol. 5020. SPIE, 2003, pp. 536–545

  6. [6]

    Robust multi-bit natural language watermarking through invariant features,

    K. Yoo, W. Ahn, J. Jang, and N. Kwak, “Robust multi-bit natural language watermarking through invariant features,” arXiv:2305.01904, 2023

  7. [7]

    Advancing beyond identification: Multi-bit watermark for large language models,

    K. Yoo, W. Ahn, and N. Kwak, “Advancing beyond identification: Multi-bit watermark for large language models,” arXiv:2308.00221, 2023

  8. [8]

    Towards codable watermarking for injecting multi-bits information to llms,

    L. Wang, W. Yang, D. Chen, H. Zhou, Y. Lin, F. Meng, J. Zhou, and X. Sun, “Towards codable watermarking for injecting multi-bits information to llms,” arXiv:2307.15992, 2023

  9. [9]

    Provably robust multi-bit watermarking for AI-generated text,

    W. Qu, W. Zheng, T. Tao, D. Yin, Y. Jiang, Z. Tian, W. Zou, J. Jia, and J. Zhang, “Provably robust multi-bit watermarking for AI-generated text,” in Proc. USENIX Secur. Symp., 2025, pp. 201–220

  10. [10]

    Arcmark: Multi-bit llm watermark via optimal transport,

    A. Gilani, C. X. Long, S. Vithana, O. Kosut, L. Sankar, and F. P. Calmon, “Arcmark: Multi-bit llm watermark via optimal transport,” arXiv:2602.07235, 2026

  11. [11]

    Information-theoretic analysis of information hiding,

    P. Moulin and J. A. O’Sullivan, “Information-theoretic analysis of information hiding,” IEEE Trans. Inf. Theory, vol. 49, no. 3, pp. 563–593, 2003

  12. [12]

    DeepSeek-V3 Technical Report

    A. Liu, B. Feng, B. Xue, B. Wang, B. Wu, C. Lu, C. Zhao, C. Deng, C. Zhang, C. Ruan et al., “Deepseek-v3 technical report,” arXiv:2412.19437, 2024

  13. [13]

    Better & Faster Large Language Models via Multi-token Prediction

    F. Gloeckle, B. Y. Idrissi, B. Rozière, D. Lopez-Paz, and G. Synnaeve, “Better & faster large language models via multi-token prediction,” arXiv:2404.19737, 2024

  14. [14]

    Approximation theory of output statistics,

    T. S. Han and S. Verdú, “Approximation theory of output statistics,” IEEE Trans. Inf. Theory, vol. 39, no. 3, pp. 752–772, 2002

  15. [15]

    Distributed channel synthesis,

    P. Cuff, “Distributed channel synthesis,” IEEE Trans. Inf. Theory, vol. 59, no. 11, pp. 7071–7096, 2013

  16. [16]

    The likelihood encoder for lossy compression,

    E. C. Song, P. Cuff, and H. V. Poor, “The likelihood encoder for lossy compression,” IEEE Trans. Inf. Theory, vol. 62, no. 4, pp. 1836–1849, 2016

  17. [17]

    Strong secrecy from channel resolvability,

    M. R. Bloch and J. N. Laneman, “Strong secrecy from channel resolvability,” IEEE Trans. Inf. Theory, vol. 59, no. 12, pp. 8077–8098, 2013

  18. [18]

    Covert communication over noisy channels: A resolvability perspective,

    M. R. Bloch, “Covert communication over noisy channels: A resolvability perspective,” IEEE Trans. Inf. Theory, vol. 62, no. 5, pp. 2334–2354, 2016

  19. [19]

    W. B. Powell, Approximate Dynamic Programming: Solving the curses of dimensionality. John Wiley & Sons, 2007, vol. 703

  20. [20]

    Constrained Markov decision processes

    E. Altman, Constrained Markov decision processes. Routledge, 2021

  21. [21]

    Relatively-secure llm-based steganography via constrained markov decision processes,

    Y.-S. Huang, C. Tian, K. Narayanan, and L. Zheng, “Relatively-secure llm-based steganography via constrained markov decision processes,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT). IEEE, Jun. 2025, pp. 1–6

  22. [22]

    Polar codes for slepian-wolf, wyner-ziv, and gelfand-pinsker,

    S. B. Korada and R. Urbanke, “Polar codes for slepian-wolf, wyner-ziv, and gelfand-pinsker,” in 2010 IEEE Information Theory Workshop on Information Theory (ITW 2010, Cairo). IEEE, 2010, pp. 1–5

  23. [23]

    Polar coding for the broadcast channel with confidential messages: A random binning analogy,

    R. A. Chou and M. R. Bloch, “Polar coding for the broadcast channel with confidential messages: A random binning analogy,” IEEE Transactions on Information Theory, vol. 62, no. 5, pp. 2410–2429, May 2016

  24. [24]

    Empirical and strong coordination via soft covering with polar codes,

    R. A. Chou, M. R. Bloch, and J. Kliewer, “Empirical and strong coordination via soft covering with polar codes,” IEEE Trans. Inf. Theory, vol. 64, no. 7, pp. 5087–5100, 2018

  25. [25]

    Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,

    E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051–3073, 2009

  26. [26]

    T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley, 2006

  27. [27]

    Vocalnet: Speech llms with multi-token prediction for faster and high-quality generation,

    Y. Wang, H. Liu, Z. Cheng, R. Wu, Q. Gu, Y. Wang, and Y. Wang, “Vocalnet: Speech llms with multi-token prediction for faster and high-quality generation,” in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025, pp. 19595–19612

  28. [28]

    Fast inference from transformers via speculative decoding,

    Y. Leviathan, M. Kalman, and Y. Matias, “Fast inference from transformers via speculative decoding,” in International Conference on Machine Learning. PMLR, 2023, pp. 19274–19286

  29. [29]

    Perfectly secure steganography: Capacity, error exponents, and code constructions,

    Y. Wang and P. Moulin, “Perfectly secure steganography: Capacity, error exponents, and code constructions,” IEEE Trans. Inf. Theory, vol. 54, no. 6, pp. 2706–2722, 2008

  30. [30]

    Covert multi-bit LLM watermarking: An information theory and coding approach,

    S. Guo, T. Kann, T. Baluta, and M. R. Bloch, “Covert multi-bit LLM watermarking: An information theory and coding approach,” arXiv, 2026

  31. [31]

    T. S. Han, Information-spectrum methods in information theory. Springer Science & Business Media, 2013, vol. 50

  32. [32]

    Watermarking makes language models radioactive

    T. Sander, P. Fernandez, A. Durmus, T. Furon, and M. Douze, “Watermarking makes language models radioactive,” NeurIPS, Dec. 2024, arXiv:2402.14904

  33. [33]

    Detecting benchmark contamination through watermarking,

    T. Sander, P. Fernandez, S. Mahloujifar, A. Durmus, and C. Guo, “Detecting benchmark contamination through watermarking,” in ICLR, 2025, arXiv:2502.17259

  34. [34]

    Meteor: Cryptographically secure steganography for realistic distributions,

    G. Kaptchuk, T. M. Jois, M. Green, and A. D. Rubin, “Meteor: Cryptographically secure steganography for realistic distributions,” in ACM CCS, 2021

  35. [35]

    Perfectly secure steganography using minimum entropy coupling,

    C. S. de Witt, S. Sokota, J. Z. Kolter, J. Foerster, and M. Strohmeier, “Perfectly secure steganography using minimum entropy coupling,” in NeurIPS, Dec. 2023

  36. [36]

    Your llm knows the future: Uncovering its multi-token prediction potential,

    M. Samragh, A. Kundu, D. Harrison, K. Nishu, D. Naik, M. Cho, and M. Farajtabar, “Your llm knows the future: Uncovering its multi-token prediction potential,” arXiv:2507.11851, 2025

  37. [37]

    Reward Constrained Policy Optimization

    C. Tessler, D. J. Mankowitz, and S. Mannor, “Reward constrained policy optimization,” arXiv:1805.11074, 2018

  38. [38]

    A memory-based reinforcement learning approach to integrated sensing and communication,

    H. Nikbakht, M. Wigger, S. S. Shitz, and H. V. Poor, “A memory-based reinforcement learning approach to integrated sensing and communication,” in 2024 58th Asilomar Conference on Signals, Systems, and Computers. IEEE, 2024, pp. 433–437

  39. [39]

    Joint successive cancellation decoding of polar codes over intersymbol interference channels,

    R. Wang, R. Liu, and Y. Hou, “Joint successive cancellation decoding of polar codes over intersymbol interference channels,” 2014

  40. [40]

    Network information theory

    A. El Gamal and Y.-H. Kim, Network information theory. Cambridge university press, 2011

  41. [41]

    A stronger soft-covering lemma and applications,

    P. Cuff, “A stronger soft-covering lemma and applications,” in 2015 IEEE Conference on Communications and Network Security (CNS). IEEE, 2015, pp. 40–43

  42. [42]

    Information theory: Coding theorems for discrete memoryless systems,

    I. Csiszar and J. Korner, “Information theory: Coding theorems for discrete memoryless systems,” 1982

  43. [43]

    Information theory: From coding to learning

    Y. Polyanskiy and Y. Wu, Information theory: From coding to learning. Cambridge university press, 2025.

    APPENDIX A: ACHIEVABILITY PROOF OF THEOREM 1

    The proof uses a random coding argument with likelihood encoder and soft-covering in place of covering argument used in standard Gelfand-Pinsker code [40].

    A. Code Description

    Consider a random variable $U$ such that...

  44. [44]

    Codebook Generation: Construct a codebook $\mathcal{C}_n$ consisting of $k \in \mathcal{K} \triangleq [1, 2^{nR_k}]$ bins, each containing $m \in \mathcal{M} \triangleq [1, 2^{nR}]$ sub-bins of size $j \in \mathcal{J} \triangleq [1, 2^{nR_0}]$. A total of $2^{n(R_k + R + R_0)}$ codewords are drawn i.i.d. from the distribution $P_U^{\otimes n} = \prod_{t=1}^{n} P_U(u_t)$.

  45. [45]

    Encoding: For each state sequence $s^n$ and key $K = k$, to send message $M = m$, the encoder selects codeword $u^n(m, k, j)$ with likelihood encoder by drawing the index $J = j$ from

    $$P_{J|M,K,S^n}(j \mid m, k, s^n) = \frac{P_{S|U}^{\otimes n}\bigl(s^n \mid u^n(m, k, j)\bigr)}{\sum_{j'=1}^{|\mathcal{J}|} P_{S|U}^{\otimes n}\bigl(s^n \mid u^n(m, k, j')\bigr)}, \qquad (20)$$

    where $P_{S|U}^{\otimes n}$ is induced by $P_S^{\otimes n}$ and our choice of auxiliary random variable $U$. The encoder then samples $x^n$ according to $W_{X^n|U^n,S^n}\bigl(x^n \mid u^n(m, k, j), s^n\bigr)$...

  46. [46]

    Decoding: The decoder receives $x^n$ and with access to $k$, decides message $m$ if there exists a unique pair $(m, j)$ such that $(x^n, u^n(m, k, j)) \in \mathcal{T}_\epsilon^n(XU)$, where $\mathcal{T}_\epsilon^n(XU)$ is the joint typical set. Otherwise, the decoder declares an error.

    B. Analysis of Covertness

    We first recall known results regarding soft covering [15], [41] in the following lemma.

    Lemma 3 (Soft...
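
The codebook generation and likelihood-encoder steps excerpted from Appendix A above can be sketched at toy scale as follows. This is an illustration under stated assumptions, not the authors' implementation: the exponential sizes $2^{nR_k}$, $2^{nR}$, $2^{nR_0}$ are replaced by small integers, the binary alphabets and the values of `P_U` and `P_S_GIVEN_U` are invented, all function names are hypothetical, and the final sampling of $x^n$ and the typicality decoder are omitted.

```python
import math
import random


def draw_codebook(num_keys, num_msgs, num_aux, n, p_u, rng):
    """codebook[k][m][j] is a length-n codeword drawn i.i.d. from P_U."""
    symbols = list(range(len(p_u)))
    return [[[rng.choices(symbols, weights=p_u, k=n) for _ in range(num_aux)]
             for _ in range(num_msgs)]
            for _ in range(num_keys)]


def likelihood_encoder_probs(s, sub_bin, p_s_given_u):
    """Eq. (20): P(j | m, k, s^n) proportional to prod_t P_{S|U}(s_t | u_t(m, k, j))."""
    log_lik = [
        sum(math.log(p_s_given_u[u_t][s_t]) for u_t, s_t in zip(u, s))
        for u in sub_bin
    ]
    top = max(log_lik)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in log_lik]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
P_U = [0.5, 0.5]                         # toy auxiliary distribution (assumed)
P_S_GIVEN_U = [[0.9, 0.1], [0.2, 0.8]]   # toy P_{S|U}, rows indexed by u (assumed)
codebook = draw_codebook(num_keys=2, num_msgs=4, num_aux=8, n=6, p_u=P_U, rng=rng)

s = [0, 1, 1, 0, 0, 1]                   # toy state sequence s^n
k, m = 1, 2                              # shared key and message to embed
probs = likelihood_encoder_probs(s, codebook[k][m], P_S_GIVEN_U)
j = rng.choices(range(len(probs)), weights=probs, k=1)[0]
print(j, [round(p, 3) for p in probs])
```

Codewords in the selected sub-bin that agree with the state sequence in more positions receive higher selection probability, which is the soft counterpart of the joint-typicality search in the standard Gelfand-Pinsker encoder.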