ArcMark: Distortion-Free Multi-Byte LLM Watermark via Optimal Transport
Pith reviewed 2026-05-25 07:01 UTC · model grok-4.3
The pith
ArcMark uses optimal transport to embed multiple bytes into LLM text without changing the next-token probabilities.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Formulating distortion-free multi-bit LLM watermarking as a channel coding problem yields an information-theoretic capacity; solving the associated optimal transport problem produces ArcMark, which embeds multiple bytes per few hundred tokens without perturbing the LLM next-token distribution.
What carries the argument
An optimal transport plan that couples the message distribution with token sequences while exactly preserving the marginal next-token probabilities.
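To make the marginal-preservation idea concrete, here is a minimal Python sketch. It is not ArcMark's construction: it swaps in a much simpler coupling (inverse-CDF sampling at a point rotated by the message symbol), and the names `embed_token`, `token_marginal`, `num_symbols`, and the example distribution `q` are all hypothetical. What it illustrates is that, once the shared randomness `v` is averaged out, the token distribution equals `q` for every message symbol, even though the token chosen for any particular `v` depends on the symbol.

```python
import numpy as np

def embed_token(q, v, symbol, num_symbols):
    """Choose a token index by inverse-CDF sampling at a message-shifted point.

    q: next-token probabilities (assumed to sum to 1)
    v: shared randomness in [0, 1), known to encoder and decoder (assumption)
    symbol: message symbol in {0, ..., num_symbols - 1}
    """
    u = (np.asarray(v) + symbol / num_symbols) % 1.0  # rotate v by a symbol-dependent offset
    cdf = np.cumsum(q)
    idx = np.searchsorted(cdf, u, side="right")       # token whose CDF interval contains u
    return np.minimum(idx, len(q) - 1)                # guard against round-off at the top end

def token_marginal(q, symbol, num_symbols, grid=100_000):
    """Token distribution for one fixed symbol, averaged over v on a fine uniform grid."""
    v = (np.arange(grid) + 0.5) / grid
    tokens = embed_token(q, v, symbol, num_symbols)
    return np.bincount(tokens, minlength=len(q)) / grid

# Toy next-token distribution and a 4-symbol message alphabet (both illustrative).
q = np.array([0.5, 0.3, 0.2])
marginals = [token_marginal(q, s, num_symbols=4) for s in range(4)]
for s, m in enumerate(marginals):
    print(f"symbol {s}: marginal = {np.round(m, 3)}")
```

The rotation works because a uniform variable shifted modulo 1 is still uniform, so the inverse-CDF step sees the same input distribution whatever the symbol is. The paper's own plan is described as an optimal transport solution, which this toy coupling does not attempt to reproduce.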
If this is right
- Generated text can carry embedded identifiers for the submitting user, exact model version, or even the prompt itself.
- Watermarked and unwatermarked outputs are statistically indistinguishable in perplexity and in performance on downstream tasks.
- Message recovery remains reliable when an attacker alters a subset of the generated tokens.
- Embedding rates are limited by the derived information-theoretic capacity for distortion-free channels.
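The robustness implication can be illustrated with a toy decoder. The sketch below is an assumption-laden stand-in, not ArcMark's decoder: it reuses a simple rotated inverse-CDF encoder, assumes the decoder knows the shared randomness and can recompute every next-token distribution, and recovers a one-byte symbol by counting, for each candidate symbol, how many observed tokens it would have produced. The names `symbol_tokens` and `decode_symbol`, the random Dirichlet distributions, and the 30% substitution attack are all hypothetical choices for illustration.

```python
import numpy as np

def symbol_tokens(q, v, num_symbols):
    """Token that each candidate symbol would have produced at this position."""
    u = (v + np.arange(num_symbols) / num_symbols) % 1.0   # one rotated point per symbol
    idx = np.searchsorted(np.cumsum(q), u, side="right")
    return np.minimum(idx, len(q) - 1)

def decode_symbol(dists, vs, tokens, num_symbols):
    """Return the symbol that is consistent with the largest number of observed tokens."""
    votes = np.zeros(num_symbols, dtype=int)
    for q, v, tok in zip(dists, vs, tokens):
        votes += symbol_tokens(q, v, num_symbols) == tok
    return int(np.argmax(votes))

rng = np.random.default_rng(0)
n, vocab, num_symbols, true_symbol = 200, 1000, 256, 173   # illustrative sizes
dists = rng.dirichlet(np.ones(vocab), size=n)              # stand-in next-token distributions
vs = rng.random(n)                                         # shared randomness per position
tokens = [int(symbol_tokens(q, v, num_symbols)[true_symbol]) for q, v in zip(dists, vs)]

# Hypothetical attack: overwrite 30% of the positions with random tokens.
attacked = list(tokens)
for i in rng.choice(n, size=60, replace=False):
    attacked[i] = int(rng.integers(vocab))

recovered = decode_symbol(dists, vs, attacked, num_symbols)
print("recovered symbol matches:", recovered == true_symbol)
```

Unaltered positions keep voting for the true symbol, so a substitution attack has to touch a large share of the tokens before another symbol can overtake it. Real text has far lower entropy than these random distributions, which is one reason a capacity-informed code matters in practice.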
Where Pith is reading between the lines
- The approach could support tracing of individual generations back to specific sessions or prompts at scale.
- Efficient approximations to the optimal transport step would be needed for deployment on very long outputs.
- Empirical tests across model families would check whether the capacity expression holds beyond the evaluated settings.
Load-bearing premise
The watermarking task admits a transport plan whose marginals match the original LLM distribution while carrying the message.
What would settle it
An experiment that measures the empirical next-token distribution of watermarked outputs and detects a statistically significant shift from the original LLM's distribution, or one that shows message recovery failing at the embedding rates the paper claims.
Original abstract
Watermarking is an important tool for promoting the responsible use of large language models (LLMs). Existing watermarks insert a signal into generated tokens that either flags LLM-generated text (zero-bit watermarking) or encodes more complex messages (multi-bit watermarking). Though a number of recent approaches insert multiple bits into text without perturbing average next-token predictions, they largely extend design principles from the zero-bit setting, such as encoding a single bit per token. In contrast, a watermarker capable of embedding multiple bytes into the text would dramatically increase the potential applications, by embedding information such as the ID of the user who submitted the prompt, the precise model version that was used, or even the prompt itself. We address this problem by introducing ArcMark: a new watermark construction based on coding and information-theoretic principles that is capable of reliably embedding multiple bytes of information into just a few hundred tokens, without any distortion of the underlying LLM next-token distribution. We derive ArcMark by formulating the distortion-free watermarking problem as a channel coding problem, and deriving an information-theoretic channel capacity that establishes the fundamental limit of embedding information in LLM output in a distortion-free manner. This capacity formulation informs the design of ArcMark. In practice, ArcMark outperforms competing multi-bit distortion-free watermarks in terms of reconstruction accuracy, including in the face of attacks that alter a subset of the LLM text. ArcMark output is also shown to be indistinguishable from unwatermarked text in terms of perplexity, and in downstream task quality.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ArcMark, a multi-byte LLM watermarking method based on optimal transport. It formulates distortion-free watermarking as a channel coding problem, derives the associated information-theoretic capacity, and uses this to design an encoder/decoder pair that embeds multiple bytes into a few hundred tokens while leaving the LLM next-token conditional distributions unchanged. The construction is claimed to outperform prior multi-bit distortion-free watermarks in reconstruction accuracy (including under partial text alterations), while producing outputs indistinguishable from unwatermarked text in perplexity and downstream task quality.
Significance. If the capacity derivation and OT construction are shown to achieve the zero-distortion property at the claimed finite lengths, the work would provide a principled, capacity-informed approach to high-rate multi-bit watermarking. This could enable practical applications such as embedding user IDs or model provenance directly in generated text without detectable statistical shifts.
major comments (2)
- [Capacity derivation and ArcMark construction sections] The central claim that the derived channel capacity directly informs an OT-based construction achieving the stated embedding rates without message-dependent bias in the output marginals is load-bearing. The abstract states that the capacity 'informs the design,' but the manuscript must explicitly verify that the practical OT plan and its inversion preserve the exact original LLM marginal at every position for finite blocks of a few hundred tokens; any gap between the idealized channel model and sequential LLM generation would undermine both the zero-distortion guarantee and the reported rates.
- [Experiments section] The experimental claims of outperforming baselines in reconstruction accuracy (including under attacks) and maintaining downstream task quality rest on the zero-distortion property holding in practice. The manuscript should include a direct empirical check (e.g., via KL divergence or next-token prediction statistics) that the watermarked and unwatermarked distributions are identical within sampling error; without this, the performance numbers cannot be attributed to the claimed distortion-free property.
minor comments (2)
- [Abstract] The abstract would benefit from naming the specific competing multi-bit distortion-free baselines and reporting at least one quantitative reconstruction-accuracy figure.
- [Preliminaries / Method sections] Notation for the OT plan, the channel model, and the capacity expression should be introduced with explicit definitions before being used in the construction.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments. We address each major comment below and outline the revisions we will make to strengthen the manuscript.
- Referee: [Capacity derivation and ArcMark construction sections] The central claim that the derived channel capacity directly informs an OT-based construction achieving the stated embedding rates without message-dependent bias in the output marginals is load-bearing. The abstract states that the capacity 'informs the design,' but the manuscript must explicitly verify that the practical OT plan and its inversion preserve the exact original LLM marginal at every position for finite blocks of a few hundred tokens; any gap between the idealized channel model and sequential LLM generation would undermine both the zero-distortion guarantee and the reported rates.
  Authors: The ArcMark construction computes an optimal transport plan between the message-conditioned and original next-token distributions at each step, with the plan chosen to be independent of the message in its marginal effect; the decoder inverts this plan using the same shared randomness. By the definition of optimal transport, the pushforward measure exactly recovers the original LLM marginal for any finite block length when the plan is applied sequentially. We will add an explicit lemma and short proof in Section 3 (or a new appendix) demonstrating that the finite-block sequential generation preserves the marginal at every position under the channel-coding formulation, thereby closing any perceived gap between the idealized model and the implemented encoder.
  Revision: yes
- Referee: [Experiments section] The experimental claims of outperforming baselines in reconstruction accuracy (including under attacks) and maintaining downstream task quality rest on the zero-distortion property holding in practice. The manuscript should include a direct empirical check (e.g., via KL divergence or next-token prediction statistics) that the watermarked and unwatermarked distributions are identical within sampling error; without this, the performance numbers cannot be attributed to the claimed distortion-free property.
  Authors: We agree that an explicit empirical verification would make the zero-distortion claim more robust. In the revised manuscript we will add, in the Experiments section, a direct comparison on a held-out prompt set: (i) average KL divergence between the watermarked and unwatermarked next-token conditional distributions, and (ii) per-token frequency statistics, both shown to be statistically indistinguishable within sampling error. These results will be reported alongside the existing perplexity and downstream-task metrics.
  Revision: yes
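The kind of empirical check discussed in this exchange could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the authors' evaluation code: the function `empirical_kl`, the add-alpha smoothing, the five-token vocabulary, and the two synthetic "watermarked" samples are all hypothetical. It compares token frequencies only, whereas a full check would condition on context. The idea it shows is to calibrate "within sampling error" by first measuring the KL divergence between two independent unwatermarked samples.

```python
import numpy as np

def empirical_kl(samples_p, samples_q, vocab_size, alpha=1.0):
    """KL(P || Q) in nats between two token samples, using add-alpha smoothing."""
    p = np.bincount(samples_p, minlength=vocab_size) + alpha
    q = np.bincount(samples_q, minlength=vocab_size) + alpha
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
base = np.array([0.40, 0.30, 0.15, 0.10, 0.05])     # stand-in unwatermarked distribution
shifted = np.array([0.30, 0.30, 0.20, 0.10, 0.10])  # stand-in for a distorting watermark
n = 50_000

ref_a = rng.choice(5, size=n, p=base)       # unwatermarked sample 1
ref_b = rng.choice(5, size=n, p=base)       # unwatermarked sample 2
faithful = rng.choice(5, size=n, p=base)    # plays the role of a distortion-free watermark
distorted = rng.choice(5, size=n, p=shifted)

noise_floor = empirical_kl(ref_a, ref_b, 5)  # KL caused by sampling alone
kl_faithful = empirical_kl(faithful, ref_b, 5)
kl_distorted = empirical_kl(distorted, ref_b, 5)
print(f"noise floor {noise_floor:.2e}, faithful {kl_faithful:.2e}, distorted {kl_distorted:.2e}")
```

A distortion-free scheme should produce a divergence of the same order as the noise floor, while a scheme that shifts the distribution should stand clearly above it. A permutation or bootstrap test over the same statistic would turn this comparison into a formal significance test.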
Circularity Check
Derivation self-contained; capacity derived from first principles informs OT construction without reduction to inputs
The abstract states that the distortion-free watermarking problem is formulated as a channel coding problem, from which an information-theoretic capacity is derived to establish fundamental limits and inform the ArcMark design. No equations, self-citations, fitted parameters, or ansatzes are quoted that reduce the claimed capacity or OT plan to the target rates by construction. The construction is presented as achieving the derived limits via optimal transport without message-dependent bias, and the paper reports empirical outperformance on reconstruction accuracy and perplexity. This is the most common honest finding when the central claim rests on an external information-theoretic formulation rather than a self-referential fit or renamed empirical pattern.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: LLM next-token distributions can be modeled as a channel for which a distortion-free capacity exists and can be achieved by optimal transport.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean, theorem alexander_duality_circle_linking (tag: echoes)
  Echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Paper passage: "We derive ArcMark by formulating the distortion-free watermarking problem as a channel coding problem, and deriving an information-theoretic channel capacity... ArcMark employs a random linear channel code... represents each message codeword symbol C, generated token X, and side information V as points on the unit circle, and (ii) solves an optimal transport problem that minimizes the arc length"
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  Unclear: relation between the paper passage and the cited Recognition theorem.
  Paper passage: "Theorem 3.1. The watermarking capacity is given by R_cap = max I(W;X) s.t. Pr(X=x|Q=q) = q(x)"
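One consequence of the constraint quoted in Theorem 3.1 can be worked out directly. If the token X must be distributed as q, then I(W;X) cannot exceed H(X) = H(q), the entropy of the next-token distribution. This is a standard information-theoretic bound, not the paper's capacity expression, and the sketch below only turns it into a rough token budget. The helper names `entropy_bits` and `min_tokens_for_payload` and the idea of summarising a generation by an average per-token entropy are assumptions made for illustration.

```python
import math

def entropy_bits(q):
    """Shannon entropy of a next-token distribution, in bits."""
    return -sum(p * math.log2(p) for p in q if p > 0)

def min_tokens_for_payload(payload_bytes, avg_entropy_bits):
    """Lower bound on tokens needed if each token carries at most its entropy in bits."""
    return math.ceil(8 * payload_bytes / avg_entropy_bits)

# Illustrative distribution: four equally likely tokens carry at most 2 bits each.
q = [0.25, 0.25, 0.25, 0.25]
h = entropy_bits(q)
print("entropy (bits):", h)
print("tokens needed for 3 bytes, at best:", min_tokens_for_payload(3, h))
```

The bound is loose: it ignores decoding errors and attacks, so a practical scheme needs more tokens than this. It does show why low-entropy text limits any distortion-free watermark, since the budget grows without bound as the average entropy approaches zero.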
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
- Every Bit, Everywhere, All at Once: A Binomial Multibit LLM Watermark
  A binomial multibit watermarking scheme encodes every payload bit at each LLM token with dynamic redirection, outperforming baselines in accuracy and robustness for large payloads.
- Covert Multi-bit LLM Watermarking: An Information Theory and Coding Approach
  Characterizes the exact capacity of multi-bit covert LLM watermarking via Gelfand-Pinsker and channel synthesis, then gives a polar-code algorithm achieving 0.375 bits/token at under 10% BER with negligible perplexity impact.