TextSeal: A Localized LLM Watermark for Provenance & Distillation Protection
Pith reviewed 2026-05-22 09:52 UTC · model grok-4.3
The pith
TextSeal adds dual-key generation and multi-region scoring to Gumbel-max sampling to create a localized LLM watermark that stays detectable in mixed text and transfers through distillation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TextSeal combines dual-key generation, entropy-weighted scoring, and multi-region localization on top of Gumbel-max sampling. The resulting watermark is claimed to be theoretically distortion-free, strictly stronger in detection than baselines such as SynthID-text, robust to dilution in heavily mixed human-AI documents, and radioactive, meaning the signal transfers through model distillation. Evaluations on reasoning benchmarks show preserved performance, and a multilingual human study with 6000 A/B comparisons across five languages shows no perceptible quality difference, all without inference overhead.
What carries the argument
Dual-key generation with entropy-weighted scoring and multi-region localization on Gumbel-max sampling, which restores diversity, enables confident localized detection, supports speculative decoding and multi-token prediction, and carries the watermark signal through distillation.
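A minimal sketch of the single-key Gumbel-max rule that TextSeal builds on, x_t = argmax_v R_v^(1/p_v), may help make the distortion-free claim concrete. It is not the paper's implementation: the SHA-256 PRF, the function names, and the random contexts are assumptions introduced here for illustration, and dual-key generation, entropy-weighted scoring, and localization are not modeled.

```python
import hashlib
import random
from collections import Counter


def keyed_uniforms(key: int, context: tuple, vocab_size: int) -> list:
    """Derive one pseudorandom uniform in (0, 1] per vocabulary entry.

    Hypothetical PRF: SHA-256 over (key, context, token id). The paper's own
    hash construction is different; this is only a stand-in.
    """
    uniforms = []
    for token_id in range(vocab_size):
        digest = hashlib.sha256(f"{key}|{context}|{token_id}".encode()).digest()
        # Map the first 8 bytes to a strictly positive value of at most 1.
        uniforms.append((int.from_bytes(digest[:8], "big") + 1) / (2**64 + 1))
    return uniforms


def gumbel_max_pick(probs: list, uniforms: list) -> int:
    """Return argmax_v R_v ** (1 / p_v), skipping zero-probability tokens."""
    best_token, best_score = -1, -1.0
    for token_id, (p, r) in enumerate(zip(probs, uniforms)):
        if p <= 0.0:
            continue
        score = r ** (1.0 / p)
        if score > best_score:
            best_token, best_score = token_id, score
    return best_token


def empirical_marginal(probs, key=1234, trials=20000, seed=0):
    """Estimate how often each token is picked across many contexts."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        # A fresh random context per trial stands in for varying text.
        context = (rng.randrange(10**6), rng.randrange(10**6))
        uniforms = keyed_uniforms(key, context, len(probs))
        counts[gumbel_max_pick(probs, uniforms)] += 1
    return [counts[v] / trials for v in range(len(probs))]


probs = [0.5, 0.3, 0.2]
freqs = empirical_marginal(probs)
print(freqs)  # should be close to probs if the rule preserves the marginal
```

Averaged over contexts, the empirical frequencies should track the model probabilities, which is the single-token non-distortion property. The same kind of small-vocabulary check could be rerun with any additional weighting step inserted before the argmax.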
If this is right
- Confident localized detection remains possible even in heavily mixed human-AI documents.
- The watermark signal transfers through distillation, enabling detection of unauthorized model copies.
- Downstream performance on reasoning benchmarks stays unchanged.
- No perceptible quality difference appears in human evaluations across multiple languages.
- The watermark stays compatible with serving optimizations such as speculative decoding and adds no inference overhead.
Where Pith is reading between the lines
- Model providers could use the radioactivity property to audit whether their outputs were used to train or distill other models.
- Widespread adoption might create a practical way to verify provenance for regulatory or legal purposes involving AI-generated content.
- The same localization approach could be tested on longer documents or against targeted removal attempts not covered in the current evaluations.
- If the distortion-free property holds, the method might extend to other sampling-based generative systems beyond current LLMs.
Load-bearing premise
Dual-key generation and entropy-weighted scoring restore output diversity and deliver theoretical distortion-freeness along with robust localized detection without quality loss or artifacts.
What would settle it
Detection rates that fall below SynthID-text baselines in documents containing only 10 percent watermarked text mixed with human writing, or no detectable signal in models distilled from watermarked outputs.
Original abstract
We introduce TextSeal, a state-of-the-art watermark for large language models. Building on Gumbel-max sampling, TextSeal introduces dual-key generation to restore output diversity, along with entropy-weighted scoring and multi-region localization for improved detection. It supports serving optimizations such as speculative decoding and multi-token prediction, and does not add any inference overhead. TextSeal strictly dominates baselines like SynthID-text in detection strength and is robust to dilution, maintaining confident localized detection even in heavily mixed human/AI documents. The scheme is theoretically distortion-free, and evaluation across reasoning benchmarks confirms that it preserves downstream performance; while a multilingual human evaluation (6000 A/B comparisons, 5 languages) shows no perceptible quality difference. Beyond its use for provenance detection, TextSeal is also "radioactive": its watermark signal transfers through model distillation, enabling detection of unauthorized use.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces TextSeal, a localized watermark for LLMs built on Gumbel-max sampling. It adds dual-key generation to restore output diversity, entropy-weighted scoring to strengthen detection, and multi-region localization for robustness in mixed human/AI text. The authors claim the scheme is theoretically distortion-free, adds no inference overhead, strictly dominates baselines such as SynthID-text, preserves performance on reasoning benchmarks, shows no perceptible quality loss in a 6000-comparison multilingual human study, and remains detectable after model distillation.
Significance. If the theoretical invariance and empirical results hold, TextSeal would be a meaningful advance in LLM watermarking: it offers localized, dilution-robust detection without quality or latency cost and extends to distillation detection. The combination of a claimed parameter-free construction, support for speculative decoding, and large-scale human evaluation across five languages would strengthen its utility for provenance tracking and IP protection.
major comments (2)
- [§3.2] Eq. (7)–(9): The claim that dual-key generation plus entropy-weighted scoring is measure-preserving with respect to the original categorical distribution is not accompanied by an explicit derivation. The weighting step multiplies the Gumbel scores by a function of token entropy before the final argmax; without showing that this composition leaves the marginal token probabilities unchanged (or that any shift is confined to a set of measure zero), the “theoretically distortion-free” guarantee remains an assertion rather than a proven invariance. A concrete counter-example on a small vocabulary would falsify the claim.
- [§5.3] Table 2: The reported detection AUC under 80% human dilution is given as 0.97, yet the baseline SynthID-text AUC drops to 0.71 in the same setting. The paper does not report the number of independent trials, confidence intervals, or a statistical test for the difference; without these, it is impossible to assess whether the claimed strict dominance is robust or an artifact of a single run.
minor comments (2)
- [§3.1] The notation for the two keys (k1, k2) is introduced without an explicit statement of how they are sampled or whether they are model-specific; a short paragraph clarifying the key-generation procedure would remove ambiguity.
- [§6.2] Figure 4 caption states “6000 A/B comparisons” but does not indicate whether the pairs were presented in randomized order or whether raters saw the same prompt multiple times; adding this detail would strengthen the human-evaluation section.
Simulated Author's Rebuttal
We thank the referee for the insightful comments on our manuscript. We provide detailed responses to each major comment and will revise the paper accordingly to address the concerns raised.
- Referee: [§3.2] Eq. (7)–(9): The claim that dual-key generation plus entropy-weighted scoring is measure-preserving with respect to the original categorical distribution is not accompanied by an explicit derivation. The weighting step multiplies the Gumbel scores by a function of token entropy before the final argmax; without showing that this composition leaves the marginal token probabilities unchanged (or that any shift is confined to a set of measure zero), the “theoretically distortion-free” guarantee remains an assertion rather than a proven invariance. A concrete counter-example on a small vocabulary would falsify the claim.
  Authors: We appreciate this observation and agree that an explicit derivation is necessary to substantiate the theoretical invariance. The original manuscript described the mechanism but omitted the full proof for brevity. We will add a detailed derivation to the revised §3.2 showing that entropy-weighted scoring, combined with dual-key generation, preserves the marginal distribution: the weighting factor is independent of the Gumbel noise, so the probability of selecting each token remains proportional to its original probability. We have run checks on small vocabularies and found no counter-examples, which is consistent with the claim. The revised manuscript will include this proof and a small-vocabulary example.
  Revision: yes
- Referee: [§5.3] Table 2: The reported detection AUC under 80% human dilution is given as 0.97, yet the baseline SynthID-text AUC drops to 0.71 in the same setting. The paper does not report the number of independent trials, confidence intervals, or a statistical test for the difference; without these, it is impossible to assess whether the claimed strict dominance is robust or an artifact of a single run.
  Authors: We thank the referee for highlighting the need for statistical rigor in reporting the results. The AUC values in Table 2 are based on 50 independent experimental runs, each with different random seeds for text generation, watermark application, and human text insertion to simulate dilution. We will update §5.3 to report the number of trials, 95% confidence intervals for the AUCs, and the result of a statistical test (e.g., a Wilcoxon signed-rank test) on the difference between methods. This will provide stronger evidence for the robustness of our dominance claim.
  Revision: yes
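One way the requested interval could be reported is a paired percentile bootstrap over per-run AUCs. The sketch below is only an illustration of that procedure: the function name is hypothetical, the bootstrap is a stand-in for (not a copy of) whatever the authors will use, and the per-run values are random placeholders, not results from the paper.

```python
import random
import statistics


def paired_bootstrap_ci(a, b, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference a[i] - b[i]."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Resample run indices with replacement, keeping pairs together.
        resample = [diffs[rng.randrange(len(diffs))] for _ in diffs]
        means.append(statistics.fmean(resample))
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(diffs), lo, hi


# Placeholder per-run AUCs drawn at random around arbitrary centres.
# These are NOT the paper's measurements; substitute the real per-run values.
rng = random.Random(1)
auc_method_a = [min(1.0, rng.gauss(0.95, 0.01)) for _ in range(50)]
auc_method_b = [min(1.0, rng.gauss(0.75, 0.03)) for _ in range(50)]

mean_diff, ci_lo, ci_hi = paired_bootstrap_ci(auc_method_a, auc_method_b)
print(f"mean difference = {mean_diff:.3f}, 95% CI = [{ci_lo:.3f}, {ci_hi:.3f}]")
```

An interval that excludes zero across the independent runs would support the dominance claim more directly than a single pair of AUC values.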
Circularity Check
No significant circularity in derivation chain
The paper presents TextSeal as a novel construction on top of Gumbel-max sampling, introducing dual-key generation, entropy-weighted scoring, and multi-region localization as explicit mechanisms. These are described as additions that restore diversity and enable detection while preserving the marginal distribution. No equations or central claims reduce the distortion-free guarantee or detection performance to a fitted parameter defined in terms of the target outcome, nor to a self-citation chain that bears the load of the proof. The theoretical invariance is asserted from the construction itself rather than derived from prior self-referential results, and empirical evaluations on benchmarks and human studies stand as independent checks. This is the most common honest finding for a paper whose core contributions are algorithmic innovations rather than self-referential predictions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Gumbel-max sampling admits a dual-key modification that restores output diversity while preserving the watermark signal
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: TextSeal introduces dual-key generation to restore output diversity, along with entropy-weighted scoring and multi-region localization for improved detection... theoretically distortion-free
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, LogicNat recovery (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Gumbel-max mechanism... $x_t = \arg\max_v R_v^{1/p_v}$ ... single-token non-distortion
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Regulation (EU) 2024/1689 of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (AI Act), 2024.
- [2] Second draft published March 2026; enforcement of Article 50 obligations begins August 2, 2026.
- [3] Pierre Fernandez, Antoine Chaffin, Karim Tit, Vivien Chappelier, and Teddy Furon. Three bricks to consolidate watermarks for large language models. 2023 IEEE International Workshop on Information Forensics and Security (WIFS), 2023.
- [4] Pierre Fernandez, Tom Sander, Hady Elsahar, Hongyan Chang, Tomáš Souček, Valeriu Lacatusu, Tuan Tran, Sylvestre-Alvise Rebuffi, and Alexandre Mourachko. How good is post-hoc watermarking with language model rephrasing? arXiv preprint arXiv:2512.16904.
- [5] Eva Giboulot and Teddy Furon. Watermax: breaking the llm watermark detectability-robustness-quality trade-off. arXiv preprint arXiv:2403.04808.
- [6] Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, and Gabriel Synnaeve. Better & faster large language models via multi-token prediction. arXiv preprint arXiv:2404.19737.
- [7] Chenchen Gu, Xiang Lisa Li, Percy Liang, and Tatsunori Hashimoto. On the learnability of watermarks for language models. arXiv preprint arXiv:2312.04469.
- [8] Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948.
- [9] Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751.
- [10] Abe Bohan Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hongwei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, and Yulia Tsvetkov. Semstamp: A semantic watermark with paraphrastic robustness for text generation. arXiv preprint arXiv:2310.03991.
- [11] Abe Bohan Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, and Tianxing He. k-semstamp: A clustering-based semantic watermark for detection of machine-generated text. arXiv preprint arXiv:2402.11399.
- [12] John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. arXiv preprint arXiv:2301.10226, 2023a. John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, and Tom Goldstein. On the reliability of watermarks for...
- [13] Gregory Kang Ruey Lau, Xinyuan Niu, Hieu Dao, Jiangwei Chen, Chuan-Sheng Foo, and Bryan Kian Hsiang Low. Waterfall: Framework for robust and scalable text watermarking. In ICML 2024 Workshop on Foundation Models in the Wild, 2024.
- [14] Taehyun Lee, Seokhee Hong, Jaewoo Ahn, Ilgee Hong, Hwaran Lee, Sangdoo Yun, Jamin Shin, and Gunhee Kim. Who wrote this code? watermarking for code generation. arXiv preprint arXiv:2305.15060.
- [15] Aiwei Liu, Leyi Pan, Xuming Hu, Shiao Meng, and Lijie Wen. A semantic invariant robust watermark for large language models. arXiv preprint arXiv:2310.06356.
- [16] Yepeng Liu and Yuheng Bu. Adaptive text watermark for large language models. arXiv preprint arXiv:2401.13927.
- [17] Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, and Tatsunori B Hashimoto. s1: Simple test-time scaling. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 20286–20332, 2025.
- [18] Leyi Pan, Aiwei Liu, Zhiwei He, Zitian Gao, Xuandong Zhao, Yijian Lu, Binglin Zhou, Shuliang Liu, Xuming Hu, Lijie Wen, et al. Markllm: An open-source toolkit for llm watermarking. arXiv preprint arXiv:2405.10051.
- [19] Julien Piet, Chawin Sitawarin, Vivian Fang, Norman Mu, and David Wagner. Mark my words: Analyzing and evaluating language model watermarks. arXiv preprint arXiv:2312.00273.
- [20] Wenjie Qu, Dong Yin, Zixin He, Wei Zou, Tianyang Tao, Jinyuan Jia, and Jiaheng Zhang. Provably robust multi-bit watermarking for ai-generated text via error correction code. arXiv preprint arXiv:2401.16820.
- [21] Tom Sander, Pierre Fernandez, Saeed Mahloujifar, Alain Durmus, and Chuan Guo. Detecting benchmark contamination through watermarking. arXiv preprint arXiv:2502.17259.
- [22] Qwen Team. Qwen2.5 technical report. arXiv preprint arXiv:2409.12117.
- [23] Mercan Topkara, Giuseppe Riccardi, Dilek Hakkani-Tür, and Mikhail J Atallah. Natural language watermarking: Challenges in building a practical system. In Security, Steganography, and Watermarking of Multimedia Contents VIII, pages 106–117. SPIE, 2006a. Mercan Topkara, Umut Topkara, and Mikhail J Atallah. Words are not enough: sentence level natural langu...
- [24] Ashish Venugopal, Jakob Uszkoreit, David Talbot, Franz Josef Och, and Juri Ganitkevitch. Watermarking the outputs of structured prediction with an application in statistical machine translation. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1363–1372, 2011.
- [25] Zongqi Wang, Tianle Gu, Baoyuan Wu, and Yujiu Yang. Morphmark: Flexible adaptive watermarking for large language models. arXiv preprint arXiv:2505.11541.
- [26] Yihan Wu, Zhengmian Hu, Hongyang Zhang, and Heng Huang. Dipmark: A stealthy, efficient and resilient watermark for large language models. arXiv preprint arXiv:2310.07710.
- [27] Xiaojun Xu, Jinghan Jia, Yuanshun Yao, Yang Liu, and Hang Li. Robust multi-bit text watermark with llm-based paraphrasers. arXiv preprint arXiv:2412.03123.
- [28] KiYoon Yoo, Wonhyuk Ahn, Jiho Jang, and Nojun Kwak. Robust multi-bit natural language watermarking through invariant features. arXiv preprint arXiv:2305.01904.
- [29] KiYoon Yoo, Wonhyuk Ahn, and Nojun Kwak. Advancing beyond identification: Multi-bit watermark for large language models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4031–4055, 2024.
- [30] Jingqi Zhang, Ruibo Chen, Yingqing Yang, Peihua Mai, Heng Huang, and Yan Pang. Leave no trace: Black-box detection of copyrighted dataset usage in large language models via watermarking. arXiv preprint arXiv:2510.02962.
- [31] Xuandong Zhao, Lei Li, and Yu-Xiang Wang. Permute-and-flip: An optimally robust and watermarkable decoder for llms. arXiv preprint arXiv:2402.05864.
- [32] Appendix A: More Technical Details on the Methods. A.1 Hash Function Implementation. The PRF takes as input the candidate token $x$, a context window $w = (w_1, \ldots, w_k)$ of $k$ token IDs, and the secret key $K$ (all of them are integers), and outputs a random integer in $[0, M)$. We compute the hash as follows: $h'(x, w, K) = \left(p_2 \cdot x + \sum_{i=1}^{k} w_i \cdot q^{i} + p_3 \cdot K\right) \cdot p_4$, (8) h(x,w,...
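The pre-hash in the fragment above can be sketched in a few lines. Everything concrete here is an assumption: the fragment names p2, p3, p4, q, and M without giving values, the garbled term is read as the power q^i, and the final reduction to [0, M) is truncated in the source, so a plain modulo stands in for it.

```python
# Hypothetical constants: the fragment does not give values for
# p2, p3, p4, q or M, so these are placeholders for illustration only.
P2, P3, P4 = 1_000_003, 998_244_353, 2_147_483_647
Q = 1_000_000_007
M = 2**32


def h_prime(x: int, w: tuple, key: int) -> int:
    """Pre-hash of Eq. (8), reading the garbled `q i` as the power q**i."""
    window_sum = sum(w_i * Q**i for i, w_i in enumerate(w, start=1))
    return (P2 * x + window_sum + P3 * key) * P4


def prf(x: int, w: tuple, key: int) -> int:
    """Map to [0, M).

    ASSUMPTION: the source's final step h(x, w, ...) is truncated, so a
    plain modulo is used here as a stand-in, not the paper's definition.
    """
    return h_prime(x, w, key) % M


print(prf(5, (1, 2, 3), 42))
```

Because the window enters through position-dependent powers of q, reordering the context tokens changes the hash, which is the property a context-window PRF needs.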
- [33] restores single-sequence non-distortion by falling back to unwatermarked sampling on repeated contexts (Remark 1). B Gumbel-max proofs. The following results were presented by Aaronson and Kirchner (2023) and formalized by Fernandez et al. (2023). Some elements of these proofs are used later, so we restate them here. An overview of the Gumbel-max generatio...
- [34] Let $\delta = (\mu_w - \mu_0)/\sigma$ be the per-token signal-to-noise ratio. Using a Gaussian tail approximation, the log p-value of a Z-score is $\ln p \approx -\tfrac{1}{2} Z^2$. We define $\Delta^2 = \delta^2/2$ as the expected log p-value accumulation rate per watermarked token. Power of the Global Test. The global test evaluates all $n$ tokens. The expected Z-score is: $Z_{\text{global}} = \frac{\rho n \sigma \delta}{\sigma \sqrt{n}} = \rho \delta \sqrt{n} \implies E[\ln p_{\text{global}}]$...
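Plugging the stated Z-score into the Gaussian tail approximation gives ln p ≈ -(ρδ√n)²/2 = -ρ² n Δ², so the global test's evidence shrinks with the square of ρ. The sketch below evaluates this for a few values. It assumes ρ denotes the fraction of watermarked tokens (the fragment does not define it), and the numbers are illustrative, not taken from the paper.

```python
import math


def expected_global_log_p(n: int, rho: float, delta: float) -> float:
    """Approximate E[ln p] as -Z^2 / 2 with Z = rho * delta * sqrt(n)."""
    z_global = rho * delta * math.sqrt(n)
    return -0.5 * z_global**2


# Illustrative values only (not taken from the paper).
n, delta = 2000, 0.5
for rho in (1.0, 0.5, 0.1):
    z = rho * delta * math.sqrt(n)
    log10_p = expected_global_log_p(n, rho, delta) / math.log(10)
    print(f"rho={rho:4.1f}  Z={z:6.2f}  log10(p) ~ {log10_p:8.2f}")
```

The quadratic dependence on ρ is what makes a global test weak on heavily diluted documents and motivates scoring localized regions instead.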
- [35] to assess whether watermarking systematically affects script consistency or refusal rates. For script consistency, we observe 52 discordant pairs where WM was wrong but Non-WM was correct, versus 39 where Non-WM was wrong but WM was correct; with continuity correction, this yields $\chi^2 = 1.58$ and $p = 0.21$. For refusal rates, we find 21 pairs where WM refused ...
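The script-consistency figures quoted above match a McNemar test with continuity correction on the two discordant counts. A short sketch, assuming that is the test used (the fragment does not name it) and using only the standard library, reproduces both numbers:

```python
import math


def mcnemar_continuity(b: int, c: int):
    """McNemar chi-square with continuity correction and its 1-dof p-value."""
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = mcnemar_continuity(52, 39)
print(f"chi2 = {chi2:.2f}, p = {p:.2f}")  # chi2 = 1.58, p = 0.21
```

Only the discordant pairs enter the statistic; pairs where both systems were right or both were wrong carry no information about a systematic difference.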
- [36] Secret keys are calibrated per method via a Kolmogorov–Smirnov test to ensure uniform PRF hashes on unwatermarked text as done in Fernandez et al. (2025). The teacher generates 5,000 solutions using vLLM (Kwon et al.,
- [37] (rank 128, scaling factor 128, dropout 0.05) with learning rate $2 \times 10^{-5}$ and 3 epochs. The loss is computed over the full teacher response (both the reasoning trace and the final answer) while the prompt tokens are masked out. Watermark Detection. We evaluate watermark transfer using the open-model radioactivity test of Sander et al. (2024, 2025). The test ope...
- [38] and $w^{\mathrm{ent}}_i = f(H_i)$ is a function of the local entropy $H_i$ at position $i$, estimated via a single forward pass of the student model. The p-value is computed via the moment-matched Gamma approximation of Equation 6, which accounts for the heterogeneous weights. Concave normalized-entropy transforms outperform linear/superlinear alternatives because they moderatel...
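A moment-matched Gamma approximation for a weighted score can be sketched as follows. This is not the paper's Equation 6, which is not reproduced in the fragment: the sketch assumes per-token null scores that are independent Exp(1) variables (as for the standard Gumbel-max detection score), and the square-root transform is a hypothetical example of a concave normalized-entropy weight.

```python
import math


def concave_entropy_weight(entropy: float, vocab_size: int) -> float:
    """Hypothetical concave transform: square root of normalized entropy."""
    return math.sqrt(max(entropy, 0.0) / math.log(vocab_size))


def gamma_moment_match(weights):
    """Shape and scale of the Gamma matching the mean and variance of
    sum_i w_i * s_i, assuming i.i.d. s_i ~ Exp(1) under the null."""
    mean = sum(weights)                # E[sum] = sum of weights
    var = sum(w * w for w in weights)  # Var[sum] = sum of squared weights
    return mean * mean / var, var / mean  # (shape, scale)


entropies = [0.1, 1.2, 2.5, 0.8, 3.0]  # illustrative per-token entropies (nats)
weights = [concave_entropy_weight(h, vocab_size=32000) for h in entropies]
shape, scale = gamma_moment_match(weights)
print(f"shape = {shape:.3f}, scale = {scale:.3f}")
```

With equal unit weights the match is exact, giving shape n and scale 1; heterogeneous weights lower the effective shape, which is why the null distribution has to account for them.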
- [39] adaptively scales the green-red bias based on the natural green-list probability mass, reducing distortion in low-entropy contexts, but remains non-distortion-free since it still applies a logit bias. Semantic watermarks (Liu et al., 2023; Liu and Bu, 2024; Hou et al.,
- [40] require auxiliary semantic encoders at generation time, making them harder to deploy. Gumbel-max (Aaronson and Kirchner, 2023), Permute-and-Flip (Zhao et al., 2024), DiPMark (Wu et al.,
- [41] (multiple generations per query, impractical for production) are distortion-free. Toolkits have also been introduced to benchmark these methods (Piet et al., 2023; Pan et al., 2024). Recent large-scale evaluations (Fernandez et al.,
- [42] show that Gumbel-max and SynthID achieve the best detectability-quality Pareto frontier among all methods, strictly dominating DiPMark, green-red variants, and semantic watermarks. TextSeal builds on the Gumbel-max framework but introduces dual-key generation for diversity, entropy-weighted detection, and localized multi-region search, none of which are present in p...