Covert Multi-bit LLM Watermarking: An Information Theory and Coding Approach
Pith reviewed 2026-05-19 20:19 UTC · model grok-4.3
The pith
Under a block-autoregressive model, multi-bit covert watermarking for LLMs has an exactly characterized capacity, and a polar-code scheme guided by it embeds 0.375 bits per token at a bit-error rate below 10 percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that the capacity of multi-bit watermarking is characterized exactly by combining Gelfand-Pinsker coding, which handles channels whose state is known non-causally at the encoder, with channel-synthesis methods. It further claims that this characterization guides a block-wise optimization via a constrained Markov decision process (CMDP), which in turn leads to a polar-code construction that embeds data reliably.
What carries the argument
A block-autoregressive embedding model in which the encoder has limited non-causal access to the token distributions within each block. This access is what makes state-dependent (Gelfand-Pinsker) coding applicable, and thus what enables covert multi-bit embedding.
If this is right
- The exact capacity of the watermarking channel is given by an optimization over auxiliary distributions derived from Gelfand-Pinsker theory.
- The practical scheme attains a rate of 0.375 bits per token with a bit-error rate below 10 percent over short token sequences.
- Perplexity and distortion metrics show negligible degradation compared to the original LLM outputs.
- Block-wise optimization with a constrained Markov decision process further refines the embedding strategy.
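The capacity expression in the first bullet (stated later as Theorem 1: maximize $I(U;X) - I(U;S)$ subject to the induced marginal matching $P_{XS}$) can be made concrete by evaluating its objective for a fixed joint distribution of state $S$, auxiliary $U$ and token $X$. The sketch below does only that evaluation; it does not perform the maximization over auxiliaries or enforce the covertness constraint, and the binary alphabets and both toy distributions are hypothetical.

```python
import numpy as np

def mutual_information(p_ab):
    """I(A;B) in bits for a joint pmf given as a 2-D array p_ab[a, b]."""
    p_ab = np.asarray(p_ab, dtype=float)
    outer = p_ab.sum(axis=1, keepdims=True) * p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0                      # 0 * log 0 is taken as 0
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / outer[mask])))

def gp_objective(p_sux):
    """I(U;X) - I(U;S) for a joint pmf indexed as p_sux[s, u, x]."""
    p_sux = np.asarray(p_sux, dtype=float)
    p_ux = p_sux.sum(axis=0)             # sum out S -> p[u, x]
    p_us = p_sux.sum(axis=2).T           # sum out X -> p[s, u], transposed to p[u, s]
    return mutual_information(p_ux) - mutual_information(p_us)

def xs_marginal(p_sux):
    """(X, S) marginal p[x, s]; covertness would require it to equal the cover P_XS."""
    return np.asarray(p_sux, dtype=float).sum(axis=1).T

# Hypothetical toy 1: S and U independent and uniform, token X = U.
p_indep = np.zeros((2, 2, 2))
for s in range(2):
    for u in range(2):
        p_indep[s, u, u] = 0.25

# Hypothetical toy 2: U = X = S, so the "watermark" only repeats the state.
p_copy = np.zeros((2, 2, 2))
p_copy[0, 0, 0] = p_copy[1, 1, 1] = 0.5

print(gp_objective(p_indep), gp_objective(p_copy))
```

The first toy yields 1 bit and the second yields 0 bits: an auxiliary that merely copies the state carries no message. Whether the first toy is admissible depends on whether its uniform, state-independent $(X, S)$ marginal matches the actual cover statistics $P_{XS}$.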
Where Pith is reading between the lines
- If the limited non-causal access can be approximated in real LLM sampling pipelines, watermark detection could become more reliable for multi-bit messages.
- This coding approach might generalize to watermarking other autoregressive generative models beyond language.
- Future work could test whether the capacity remains achievable when cover statistics must be estimated on the fly rather than assumed known.
Load-bearing premise
The encoder must have limited non-causal access to the token distributions inside each block, and the LLM's cover statistics must be known beforehand.
What would settle it
Running the embedding algorithm in a strictly causal mode without any block lookahead and measuring whether the bit-error rate stays below 10 percent or rises significantly would directly test the role of the non-causal access.
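A causal-versus-lookahead comparison like this one would need the empirical BER of each configuration together with an uncertainty estimate. Below is a minimal sketch using the Wilson score interval for a binomial proportion. The helper names are hypothetical, and treating every message bit as an independent Bernoulli trial is an assumption: it will understate the uncertainty when errors cluster within blocks.

```python
import math

def bit_error_rate(sent_bits, decoded_bits):
    """Return (number of bit errors, empirical BER) for two equal-length bit sequences."""
    if len(sent_bits) != len(decoded_bits) or len(sent_bits) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    errors = sum(a != b for a, b in zip(sent_bits, decoded_bits))
    return errors, errors / len(sent_bits)

def wilson_interval(errors, trials, z=1.96):
    """Wilson score interval for an error probability (z = 1.96 gives ~95% coverage)."""
    p_hat = errors / trials
    denom = 1 + z ** 2 / trials
    center = (p_hat + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

def embedding_rate(message_bits, tokens):
    """Embedding rate in bits per token."""
    return message_bits / tokens

# Example: 10 bit errors out of 100 decoded bits.
lo, hi = wilson_interval(10, 100)
print(f"BER 0.10, 95% interval [{lo:.3f}, {hi:.3f}]")
print(embedding_rate(3, 8))  # 3 message bits per 8 tokens
```

If the lookahead and strictly causal runs give non-overlapping intervals, that would indicate a real effect of the non-causal access rather than Monte Carlo noise.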
Original abstract
We study the problem of multi-bit watermarking for large language models (LLMs). We introduce a block-autoregressive model inspired by multi-token prediction, in which the encoder has limited non-causal access to token distributions within each block. This formulation enables an information-theoretic characterization of multi-bit watermarking capacity, by which the knowledge of LLM cover statistics is leveraged to enable a multi-bit covert embedding. We study the information-theoretic limits of the model by combining Gelfand-Pinsker and channel synthesis coding techniques and obtain an exact characterization of the capacity. The embedding strategy is further optimized across blocks using a constrained Markov decision process (CMDP) and we develop an explicit algorithm based on polar codes following the information-theoretic principles. Our algorithm achieves a bit-error rate below 10 percent with a rate of 0.375 bits/token over short token lengths with negligible perplexity and distortion degradation.
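Polar codes are built on Arıkan's polar transform [25]. As background, the sketch below implements that standard transform $x = u F^{\otimes n}$ over GF(2) with kernel $F = \begin{pmatrix}1&0\\1&1\end{pmatrix}$. It is not the paper's construction, which additionally partitions the synthetic channels to realize Gelfand-Pinsker binning and channel synthesis. The function name is hypothetical, and no bit-reversal permutation is applied.

```python
import numpy as np

def polar_transform(u):
    """Compute x = u F^{(tensor) n} over GF(2) with Arikan's kernel F = [[1, 0], [1, 1]].

    Standard in-place butterfly: at each stage the first half of every group of
    size 2*step is XORed with its second half. No bit-reversal is applied.
    """
    x = np.array(u, dtype=np.uint8) % 2
    N = x.size
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a power of two")
    step = 1
    while step < N:
        for i in range(0, N, 2 * step):
            x[i:i + step] ^= x[i + step:i + 2 * step]
        step *= 2
    return x

print(polar_transform([0, 0, 0, 1]))  # last input bit reaches every output
u = [1, 0, 1, 1, 0, 0, 1, 0]
print(polar_transform(polar_transform(u)))  # F^{(tensor) n} is its own inverse over GF(2)
```

Because $F^2 = I$ over GF(2), the same routine also inverts the transform, which is why encoding and re-encoding inside successive-cancellation decoders can share it.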
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a block-autoregressive model for covert multi-bit watermarking of LLMs in which the encoder is granted limited non-causal access to token distributions inside each block. It combines Gelfand-Pinsker coding with channel-synthesis techniques to obtain an exact capacity characterization, optimizes the embedding policy across blocks via a constrained Markov decision process, and constructs an explicit polar-code algorithm. The algorithm is reported to achieve a bit-error rate (BER) below 10 percent at 0.375 bits per token on short sequences while preserving perplexity and distortion.
Significance. If the capacity result is valid under the stated model, the work supplies the first information-theoretic benchmark for multi-bit covert watermarking and demonstrates that polar-code constructions can approach the bound with modest overhead. The reported operating point (0.375 bits/token, BER < 0.1) would be practically relevant if the non-causal access assumption can be realized or approximated.
major comments (1)
- [Model definition and capacity derivation] The exact capacity characterization (Abstract and the section deriving the Gelfand-Pinsker / channel-synthesis bound) rests on the block-autoregressive model that supplies the encoder with limited non-causal knowledge of the LLM's conditional distributions inside each block. Standard causal autoregressive sampling provides only past tokens; any practical implementation must therefore replace the assumed distributions with estimates or lookahead. Because the capacity equality is obtained by construction under this access model, the claimed characterization does not directly apply to ordinary LLM generation and the reported BER/rate numbers cannot be taken as evidence that the derived capacity is achieved in the causal setting.
minor comments (2)
- [Abstract and experimental section] The abstract states concrete BER and rate figures but supplies neither the precise token length, number of Monte-Carlo trials, nor confidence intervals; these details are required to evaluate whether the performance claims are statistically supported.
- [Algorithm section] The CMDP formulation and polar-code construction are described at a high level; explicit pseudocode or a small worked example would clarify how the auxiliary channel and state transitions are instantiated from the LLM logits.
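For readers unfamiliar with the CMDP machinery referenced here, the following is a generic sketch of the textbook occupancy-measure linear program for a discounted CMDP (in the style of Altman [20]). It is not the paper's formulation. The states, actions, "reward" (standing in for embedded bits) and "cost" (standing in for distortion) are hypothetical placeholders, and the solver is SciPy's `linprog` with the HiGHS backend.

```python
import numpy as np
from scipy.optimize import linprog

def solve_cmdp(P, r, c, budget, mu, gamma=0.9):
    """Solve a discounted CMDP via the occupancy-measure LP.

    P[s, a, s2]: transition probabilities; r[s, a]: reward; c[s, a]: cost;
    budget: bound on the normalized discounted cost; mu: initial-state pmf.
    Returns a stationary randomized policy pi[s, a] and the optimal value.
    """
    P, r, c, mu = (np.asarray(z, dtype=float) for z in (P, r, c, mu))
    S, A, _ = P.shape
    n = S * A
    # Flow conservation for every s2:
    # sum_a rho(s2, a) - gamma * sum_{s,a} P(s2|s,a) rho(s,a) = (1 - gamma) mu(s2)
    A_eq = np.zeros((S, n))
    for s in range(S):
        for a in range(A):
            col = s * A + a
            A_eq[s, col] += 1.0
            A_eq[:, col] -= gamma * P[s, a, :]
    res = linprog(-r.reshape(n),                  # linprog minimizes, so negate reward
                  A_ub=c.reshape(1, n), b_ub=[budget],
                  A_eq=A_eq, b_eq=(1 - gamma) * mu,
                  bounds=[(0, None)] * n, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    rho = res.x.reshape(S, A)
    occ = rho.sum(axis=1, keepdims=True)
    # pi(a|s) = rho(s,a) / sum_a rho(s,a); unvisited states get a uniform policy
    policy = np.divide(rho, occ, out=np.full_like(rho, 1.0 / A), where=occ > 0)
    return policy, -res.fun

# Hypothetical toy: one state; action 0 embeds (reward 1, cost 1), action 1 copies the cover.
P = np.ones((1, 2, 1))
r = np.array([[1.0, 0.0]])
c = np.array([[1.0, 0.0]])
policy, value = solve_cmdp(P, r, c, budget=0.3, mu=[1.0])
print(policy, value)
```

The toy shows the characteristic CMDP behavior: the optimal policy randomizes, embedding 30 percent of the time, so that it meets the cost budget exactly. A deterministic policy could not achieve that.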
Simulated Author's Rebuttal
We thank the referee for the careful reading of our manuscript and the constructive feedback. We are glad that the significance of the capacity characterization and the practical relevance of the operating point are recognized. We address the major comment below.
Point-by-point responses
- Referee: [Model definition and capacity derivation] (the major comment quoted in full above).
Authors: We agree that the exact capacity characterization is obtained under the block-autoregressive model with limited non-causal access to token distributions inside each block, as explicitly introduced in the manuscript and motivated by multi-token prediction techniques. This modeling assumption enables the combination of Gelfand-Pinsker coding with channel synthesis to derive the capacity. We do not claim the result applies directly to standard causal autoregressive sampling without approximation. In the revision we will expand the model section and add a dedicated discussion clarifying the scope of the result as an information-theoretic benchmark, while outlining practical approximations such as using auxiliary prediction heads or estimated future distributions to realize the access in causal settings. The reported BER and rate results demonstrate achievability of the polar-code construction within the stated model. (Revision: yes.)
Circularity Check
Capacity derivation applies standard Gelfand-Pinsker and channel-synthesis techniques to a newly defined block model without self-referential reduction.
Full rationale
The paper introduces a block-autoregressive model with limited non-causal access to token distributions and then invokes established Gelfand-Pinsker coding for state-dependent channels together with channel synthesis for covertness to obtain an exact capacity expression. This constitutes an application of known information-theoretic tools to the custom model rather than a self-definitional loop or a fitted parameter renamed as a prediction. The subsequent CMDP optimization and polar-code algorithm are constructed explicitly from the derived capacity, and the reported BER and rate figures are presented as empirical outcomes of that algorithm. No load-bearing self-citations or uniqueness theorems from prior author work are required for the central characterization, rendering the derivation self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math: the Gelfand-Pinsker theorem and channel-synthesis results hold for the defined block-autoregressive watermarking channel.
invented entities (1)
- block-autoregressive model with limited non-causal access (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Theorem 1. The covert multi-bit LLM watermarking capacity with non-causal state is $\max\,[I(U;X) - I(U;S)]$ subject to the marginal matching $P_{XS}$.
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean: LogicNat ≃ Nat recovery (unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  We develop an explicit algorithm based on polar codes following the information-theoretic principles.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
[1] J. Kirchenbauer, J. Geiping, Y. Wen, J. Katz, I. Miers, and T. Goldstein, “A watermark for large language models,” in Proc. Int. Conf. Mach. Learn. (ICML), PMLR, Jul. 2023, pp. 17061–17084.
[2] M. Christ, S. Gunn, and O. Zamir, “Undetectable watermarks for language models,” in Proc. Conf. Learn. Theory (COLT), PMLR, Jun. 2024, pp. 1125–1139.
[3] D. Tsur, C. X. Long, C. M. Verdun, S. Vithana, H. Hsu, C.-F. Chen, H. H. Permuter, and F. Calmon, “Heavywater and simplexwater: Distortion-free LLM watermarks for low-entropy distributions,” in Adv. Neural Inf. Process. Syst. (NeurIPS), Dec. 2025.
[4] X. Zhao, P. Ananth, L. Li, and Y.-X. Wang, “Provable robust watermarking for AI-generated text,” arXiv:2306.17439, 2023.
[5] J. Lubin, J. A. Bloom, and H. Cheng, “Robust content-dependent high-fidelity watermark for tracking in digital cinema,” in Security and Watermarking of Multimedia Contents V, vol. 5020, SPIE, 2003, pp. 536–545.
[6] K. Yoo, W. Ahn, J. Jang, and N. Kwak, “Robust multi-bit natural language watermarking through invariant features,” arXiv:2305.01904, 2023.
[7] K. Yoo, W. Ahn, and N. Kwak, “Advancing beyond identification: Multi-bit watermark for large language models,” arXiv:2308.00221, 2023.
[8] L. Wang, W. Yang, D. Chen, H. Zhou, Y. Lin, F. Meng, J. Zhou, and X. Sun, “Towards codable watermarking for injecting multi-bits information to LLMs,” arXiv:2307.15992, 2023.
[9] W. Qu, W. Zheng, T. Tao, D. Yin, Y. Jiang, Z. Tian, W. Zou, J. Jia, and J. Zhang, “Provably robust multi-bit watermarking for AI-generated text,” in Proc. USENIX Secur. Symp., 2025, pp. 201–220.
[10] A. Gilani, C. X. Long, S. Vithana, O. Kosut, L. Sankar, and F. P. Calmon, “Arcmark: Multi-bit LLM watermark via optimal transport,” arXiv:2602.07235, 2026.
[11] P. Moulin and J. A. O’Sullivan, “Information-theoretic analysis of information hiding,” IEEE Trans. Inf. Theory, vol. 49, no. 3, pp. 563–593, 2003.
[12] A. Liu, B. Feng, B. Xue, B. Wang, B. Wu, C. Lu, C. Zhao, C. Deng, C. Zhang, C. Ruan et al., “DeepSeek-V3 technical report,” arXiv:2412.19437, 2024.
[13] F. Gloeckle, B. Y. Idrissi, B. Rozière, D. Lopez-Paz, and G. Synnaeve, “Better & faster large language models via multi-token prediction,” arXiv:2404.19737, 2024.
[14] T. S. Han and S. Verdú, “Approximation theory of output statistics,” IEEE Trans. Inf. Theory, vol. 39, no. 3, pp. 752–772, 1993.
[15] P. Cuff, “Distributed channel synthesis,” IEEE Trans. Inf. Theory, vol. 59, no. 11, pp. 7071–7096, 2013.
[16] E. C. Song, P. Cuff, and H. V. Poor, “The likelihood encoder for lossy compression,” IEEE Trans. Inf. Theory, vol. 62, no. 4, pp. 1836–1849, 2016.
[17] M. R. Bloch and J. N. Laneman, “Strong secrecy from channel resolvability,” IEEE Trans. Inf. Theory, vol. 59, no. 12, pp. 8077–8098, 2013.
[18] M. R. Bloch, “Covert communication over noisy channels: A resolvability perspective,” IEEE Trans. Inf. Theory, vol. 62, no. 5, pp. 2334–2354, 2016.
[19] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality, vol. 703. John Wiley & Sons, 2007.
[20] E. Altman, Constrained Markov Decision Processes. Routledge, 2021.
[21] Y.-S. Huang, C. Tian, K. Narayanan, and L. Zheng, “Relatively-secure LLM-based steganography via constrained Markov decision processes,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Jun. 2025, pp. 1–6.
[22] S. B. Korada and R. Urbanke, “Polar codes for Slepian-Wolf, Wyner-Ziv, and Gelfand-Pinsker,” in Proc. IEEE Inf. Theory Workshop (ITW 2010, Cairo), 2010, pp. 1–5.
[23] R. A. Chou and M. R. Bloch, “Polar coding for the broadcast channel with confidential messages: A random binning analogy,” IEEE Trans. Inf. Theory, vol. 62, no. 5, pp. 2410–2429, May 2016.
[24] R. A. Chou, M. R. Bloch, and J. Kliewer, “Empirical and strong coordination via soft covering with polar codes,” IEEE Trans. Inf. Theory, vol. 64, no. 7, pp. 5087–5100, 2018.
[25] E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, 2009.
[26] T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley, 2006.
[27] Y. Wang, H. Liu, Z. Cheng, R. Wu, Q. Gu, Y. Wang, and Y. Wang, “VocalNet: Speech LLMs with multi-token prediction for faster and high-quality generation,” in Proc. 2025 Conf. Empirical Methods in Natural Language Processing (EMNLP), 2025, pp. 19595–19612.
[28] Y. Leviathan, M. Kalman, and Y. Matias, “Fast inference from transformers via speculative decoding,” in Proc. Int. Conf. Mach. Learn. (ICML), PMLR, 2023, pp. 19274–19286.
[29] Y. Wang and P. Moulin, “Perfectly secure steganography: Capacity, error exponents, and code constructions,” IEEE Trans. Inf. Theory, vol. 54, no. 6, pp. 2706–2722, 2008.
[30] S. Guo, T. Kann, T. Baluta, and M. R. Bloch, “Covert multi-bit LLM watermarking: An information theory and coding approach,” arXiv, 2026.
[31] T. S. Han, Information-Spectrum Methods in Information Theory, vol. 50. Springer Science & Business Media, 2013.
[32] T. Sander, P. Fernandez, A. Durmus, T. Furon, and M. Douze, “Watermarking makes language models radioactive,” in NeurIPS, Dec. 2024, arXiv:2402.14904.
[33] T. Sander, P. Fernandez, S. Mahloujifar, A. Durmus, and C. Guo, “Detecting benchmark contamination through watermarking,” in ICLR, 2025, arXiv:2502.17259.
[34] G. Kaptchuk, T. M. Jois, M. Green, and A. D. Rubin, “Meteor: Cryptographically secure steganography for realistic distributions,” in ACM CCS, 2021.
[35] C. S. de Witt, S. Sokota, J. Z. Kolter, J. Foerster, and M. Strohmeier, “Perfectly secure steganography using minimum entropy coupling,” in NeurIPS, Dec. 2023.
[36] M. Samragh, A. Kundu, D. Harrison, K. Nishu, D. Naik, M. Cho, and M. Farajtabar, “Your LLM knows the future: Uncovering its multi-token prediction potential,” arXiv:2507.11851, 2025.
[37] C. Tessler, D. J. Mankowitz, and S. Mannor, “Reward constrained policy optimization,” arXiv:1805.11074, 2018.
[38] H. Nikbakht, M. Wigger, S. S. Shitz, and H. V. Poor, “A memory-based reinforcement learning approach to integrated sensing and communication,” in Proc. 58th Asilomar Conf. Signals, Systems, and Computers, IEEE, 2024, pp. 433–437.
[39] R. Wang, R. Liu, and Y. Hou, “Joint successive cancellation decoding of polar codes over intersymbol interference channels,” 2014.
[40] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2011.
[41] P. Cuff, “A stronger soft-covering lemma and applications,” in Proc. 2015 IEEE Conf. Communications and Network Security (CNS), 2015, pp. 40–43.
[42] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, 1982.
[43] Y. Polyanskiy and Y. Wu, Information Theory: From Coding to Learning. Cambridge University Press, 2025.

Appendix A: Achievability Proof of Theorem 1
The proof uses a random coding argument with a likelihood encoder and soft covering in place of the covering argument used in the standard Gelfand-Pinsker code [40].
A. Code Description
Consider a random variable $U$ such that...
Codebook Generation: Construct a codebook $\mathcal{C}_n$ consisting of $k \in \mathcal{K} \triangleq [1, 2^{nR_k}]$ bins, each containing $m \in \mathcal{M} \triangleq [1, 2^{nR}]$ sub-bins of size $j \in \mathcal{J} \triangleq [1, 2^{nR_0}]$. A total of $2^{n(R_k+R+R_0)}$ codewords are drawn i.i.d. from the distribution $P_U^{\otimes n} = \prod_{t=1}^{n} P_U(u_t)$.
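A toy version of this random binned codebook can be sketched in NumPy. The helper name `generate_codebook`, the array layout `[k, m, j, t]`, and the rounding of the bin sizes $2^{nR_k}$, $2^{nR}$, $2^{nR_0}$ to integers are illustrative assumptions, not part of the proof.

```python
import numpy as np

def generate_codebook(p_u, n, R_k, R, R_0, rng):
    """Toy binned codebook: codewords u^n(m, k, j) drawn i.i.d. from P_U^{(tensor) n}.

    Returns an integer array of shape (K, M, J, n) indexed as [k, m, j, t], with
    K = 2^{n R_k} key bins, M = 2^{n R} message sub-bins per bin, and
    J = 2^{n R_0} codewords per sub-bin (sizes rounded to integers here).
    """
    p_u = np.asarray(p_u, dtype=float)
    K, M, J = (max(1, int(round(2 ** (n * rate)))) for rate in (R_k, R, R_0))
    return rng.choice(p_u.size, size=(K, M, J, n), p=p_u)

rng = np.random.default_rng(0)
codebook = generate_codebook([0.5, 0.5], n=4, R_k=0.25, R=0.5, R_0=0.25, rng=rng)
print(codebook.shape)  # (2, 4, 2, 4): 2^{4(0.25+0.5+0.25)} = 16 codewords of length 4
```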
Encoding: For each state sequence $s^n$ and key $K = k$, to send message $M = m$, the encoder selects the codeword $u^n(m,k,j)$ with a likelihood encoder by drawing the index $J = j$ from
$$P_{J|M,K,S^n}(j \mid m,k,s^n) = \frac{P^{\otimes n}_{S|U}\big(s^n \mid u^n(m,k,j)\big)}{\sum_{j'=1}^{|\mathcal{J}|} P^{\otimes n}_{S|U}\big(s^n \mid u^n(m,k,j')\big)}, \tag{20}$$
where $P^{\otimes n}_{S|U}$ is induced by $P^{\otimes n}_S$ and our choice of the auxiliary random variable $U$. The encoder then samples $x^n$ according to $W_{X^n|U^n,S^n}\big(x^n \mid u^n(m,k,j), s^n\big)$.
Decoding: The decoder receives $x^n$ and, with access to $k$, decides message $m$ if there exists a unique pair $(m, j)$ such that $(x^n, u^n(m,k,j)) \in \mathcal{T}^n_\epsilon(XU)$, where $\mathcal{T}^n_\epsilon(XU)$ is the jointly typical set. Otherwise, the decoder declares an error.
B. Analysis of Covertness
We first recall known results regarding soft covering [15], [41] in the following lemma. Lemma 3 (Soft...