Global Sketch-Based Watermarking for Diffusion Language Models

Daniel Zhao

arXiv: 2606.04486 · v1 · pith:W7NBKMVHnew · submitted 2026-06-03 · cs.CR · cs.CL · cs.LG · stat.ML

Pith reviewed 2026-06-28 06:17 UTC · model grok-4.3

keywords: watermarking, diffusion language models, sketch representation, global watermark, order-agnostic detection, robustness, soundness, distortion

The pith

A global vector-valued sketch allows watermarking diffusion language models without depending on token generation order.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a watermarking method for masked diffusion language models that controls a global sketch representation of the full text. This works because diffusion models jointly sample over unresolved positions, making additive statistics of the whole sequence tractable during generation. A sympathetic reader would care because the resulting detection statistic is order-agnostic and the watermarking rule avoids manifesting as a simple token bias, unlike local-context schemes in autoregressive models. The work analyzes the approach on distortion, soundness, and robustness.

Core claim

The sketch formulation decouples detection from the local contexts seen during generation, resulting in an order-agnostic statistic and a watermarking rule which does not manifest as a simple token bias. We analyze the distortion, soundness, and robustness properties of the method.

What carries the argument

The global, vector-valued sketch representation of the text, controlled through additive statistics during joint sampling.
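
As a minimal illustration of why an additive sketch is order-agnostic, the Python sketch below sums keyed pseudorandom +1/-1 vectors, one per token. This is not the paper's construction, which the abstract does not specify; the names `token_vector`, `sketch`, and `DIM`, and the use of SHA-256 as the keyed hash, are assumptions made purely for illustration.

```python
import hashlib
import random

DIM = 8  # hypothetical sketch dimension, chosen only for the example


def token_vector(token: str, key: str, dim: int = DIM) -> list[int]:
    """Map a token to a keyed pseudorandom +1/-1 vector (illustrative choice)."""
    digest = hashlib.sha256(f"{key}:{token}".encode()).digest()
    return [1 if (digest[i] & 1) else -1 for i in range(dim)]


def sketch(tokens: list[str], key: str, dim: int = DIM) -> list[int]:
    """Additive sketch: the coordinate-wise sum of the per-token vectors."""
    total = [0] * dim
    for tok in tokens:
        for i, v in enumerate(token_vector(tok, key, dim)):
            total[i] += v
    return total


tokens = "the quick brown fox jumps over the lazy dog".split()
shuffled = tokens[:]
random.Random(0).shuffle(shuffled)

# A sum does not depend on the order of its terms, so both sketches agree.
assert sketch(tokens, "k") == sketch(shuffled, "k")
```

Because each token contributes independently of its neighbours, any detector that reads only this vector needs the final text and the key, not the order in which positions were resolved.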

If this is right

Detection succeeds without reference to the sequence of local contexts encountered during generation.
The watermark rule produces no simple per-token bias in the output distribution.
The method preserves the ability to control the global sketch while sampling the diffusion process.
The soundness analysis supports reliable detection, and the robustness analysis covers the modifications the paper considers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same global-sketch control could apply to other non-autoregressive sampling schemes that permit additive statistics.
Watermarks of this form might resist removal by reordering or partial rewriting of the generated text.
Detection could be performed on the final sequence alone, without any record of the generation steps.

Load-bearing premise

Additive statistics of the entire sequence are tractable during generation in diffusion language models, allowing control over a global vector-valued sketch representation.
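
One way to see why additivity could make the premise plausible, under assumptions the abstract does not state: suppose the sketch has the form $S(x) = \sum_i \phi(x_i)$ for some per-token map $\phi$ into $\mathbb{R}^d$, that at a given denoising step the positions split into a resolved set $R$ and an unresolved set $U$, and that the model exposes a marginal $p_i$ over the vocabulary $V$ for each $i \in U$. All of these symbols are hypothetical notation introduced here. Linearity of expectation then gives the mean of the final sketch in closed form:

```latex
\mathbb{E}\bigl[S(x)\bigr]
  = \sum_{j \in R} \phi(x_j)
  + \sum_{i \in U} \sum_{v \in V} p_i(v)\,\phi(v)
```

The identity needs no independence between positions and costs on the order of $|U| \cdot |V| \cdot d$ operations per step. It describes only the mean of the sketch; how the paper actually steers the sketch during sampling is not visible from the abstract.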

What would settle it

An experiment showing that the detection statistic changes substantially when tokens are reordered or that the sketch cannot be adjusted without large distortion to the output text.
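
The reordering half of such an experiment could be prototyped along the following lines. This is a hedged sketch, not the paper's detector: `additive_stat` stands in for any order-free score, `local_context_stat` for a predecessor-dependent score of the kind used in autoregressive schemes, and `max_change_under_shuffles` is a hypothetical harness name.

```python
import hashlib
import random
from typing import Callable, Sequence


def _bit(*parts: str) -> int:
    """Keyed pseudorandom bit derived from the given strings."""
    return hashlib.sha256("|".join(parts).encode()).digest()[0] & 1


def additive_stat(tokens: Sequence[str], key: str = "k") -> int:
    """Order-free score: each token contributes on its own."""
    return sum(_bit(key, t) for t in tokens)


def local_context_stat(tokens: Sequence[str], key: str = "k") -> int:
    """Order-dependent score: each token is scored given its predecessor."""
    return sum(_bit(key, prev, cur) for prev, cur in zip(tokens, tokens[1:]))


def max_change_under_shuffles(stat: Callable[[Sequence[str]], int],
                              tokens: Sequence[str],
                              trials: int = 200,
                              seed: int = 0) -> int:
    """Largest absolute change in `stat` over random reorderings of `tokens`."""
    rng = random.Random(seed)
    base = stat(tokens)
    worst = 0
    for _ in range(trials):
        perm = list(tokens)
        rng.shuffle(perm)
        worst = max(worst, abs(stat(perm) - base))
    return worst


tokens = "a global sketch should not care about token order at all".split()
print("additive:", max_change_under_shuffles(additive_stat, tokens))
print("local-context:", max_change_under_shuffles(local_context_stat, tokens))
```

The additive score cannot move under reordering by construction, so a nonzero value there would indicate a bug. The local-context score is free to move, which is the contrast the experiment is meant to expose. The distortion half of the test requires the actual sampler and is not covered here.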

Original abstract

Watermarking methods for language models have been studied extensively in the autoregressive setting, where tokens are generated sequentially. These works largely focus on local-context schemes that perturb the next token's distribution as a function of its preceding tokens. In diffusion language models, distributions over many unresolved positions are jointly sampled, allowing additive statistics of the entire sequence to be tractable during generation. We propose a watermark for masked diffusion language models that controls a global, vector-valued sketch representation of the text. Compared to context-dependent watermarking, the sketch formulation decouples detection from the local contexts seen during generation, resulting in an order-agnostic statistic and a watermarking rule which does not manifest as a simple token bias. We analyze the distortion, soundness, and robustness properties of the method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The sketch watermark for diffusion LMs is a distinct formulation that avoids local token bias, but the claim that global additive control is tractable during iterative denoising needs the actual update rule to hold up.

The main takeaway is that this paper gives a global vector sketch watermark for masked diffusion language models. It treats the whole sequence as an additive statistic that can be controlled at generation time, which produces an order-agnostic detection statistic instead of the usual context-dependent token bias used in autoregressive watermarking.

What is new is the move from local perturbations to a sketch that decouples detection from the positions seen during sampling. The abstract says they analyze distortion, soundness, and robustness, and the joint sampling property of diffusion is the lever that makes the global control possible. That is a clean conceptual step if the mechanics work.

The soft spot is exactly the one the stress-test note flags: whether an additive global constraint can be enforced without introducing position or order dependence in the iterative denoising process. The abstract does not show the update rule or any approximation, so it is impossible to judge whether the independence claim survives the actual sampling loop. If the paper only assumes tractability without a concrete mechanism, the central advantage disappears.

No circularity or invented entities show up in what is presented. The work is for people already working on watermarking or detection for non-autoregressive generators. A reader who wants to see how the sketch idea translates to diffusion would find the formulation useful even if the implementation details need tightening.

It deserves a serious referee. The idea is distinct enough from prior AR work that the details on the control rule are worth checking in review.

Referee Report

2 major / 0 minor

Summary. The paper proposes a watermarking scheme for masked diffusion language models that controls a global vector-valued sketch of the full sequence rather than applying local token biases. It argues that joint sampling over unresolved positions makes additive statistics tractable, yielding an order-agnostic detection statistic that decouples from generation contexts. The work claims to analyze the resulting distortion, soundness, and robustness properties.

Significance. If the tractability of global sketch control can be established with concrete update rules or approximations whose independence from sampling order is proven, the method would provide a genuinely distinct alternative to autoregressive watermarking and could improve robustness against local edits. The absence of any equations, algorithms, or experimental results in the abstract, however, leaves the central technical contribution unverified.

major comments (2)

[Abstract] The claim that 'additive statistics of the entire sequence to be tractable during generation' is load-bearing for the order-agnostic property, yet no update rule, approximation, or conditioning argument is supplied to show how a global vector constraint is enforced across iterative denoising steps without reintroducing position or order dependence.
[Abstract] The assertion that the watermarking rule 'does not manifest as a simple token bias' is presented as a direct consequence of the sketch formulation, but without a derivation relating the sketch control mechanism to the conditional distributions sampled at each diffusion step, it is impossible to confirm that the bias is avoided rather than merely relocated.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their comments on the abstract. We address each point below and are prepared to revise the abstract for greater technical clarity while preserving its concise nature.

Referee: [Abstract] The claim that 'additive statistics of the entire sequence to be tractable during generation' is load-bearing for the order-agnostic property, yet no update rule, approximation, or conditioning argument is supplied to show how a global vector constraint is enforced across iterative denoising steps without reintroducing position or order dependence.

Authors: The abstract condenses the central observation that masked diffusion permits joint sampling over unresolved tokens, rendering additive sketch statistics tractable. The concrete update rules, the conditioning argument that preserves order independence, and the associated proofs appear in Sections 3 and 4 of the manuscript. To address the concern that these details are not visible from the abstract alone, we will add a single sentence referencing the joint-sampling tractability argument. Revision: yes.

Referee: [Abstract] The assertion that the watermarking rule 'does not manifest as a simple token bias' is presented as a direct consequence of the sketch formulation, but without a derivation relating the sketch control mechanism to the conditional distributions sampled at each diffusion step, it is impossible to confirm that the bias is avoided rather than merely relocated.

Authors: The claim follows from the fact that the watermark is realized as a global vector constraint on the sketch rather than a position- or context-dependent adjustment to individual token logits. The explicit mapping from the sketch constraint to the per-step conditional distributions is derived in Section 4. We agree the abstract would benefit from a brief clarifying clause and will include one in the revision. Revision: yes.

Circularity Check

0 steps flagged

No significant circularity; proposal rests on model properties rather than self-referential derivation

The provided abstract and description present a methodological proposal for sketch-based watermarking in diffusion LMs. It grounds the approach in the joint sampling property of masked diffusion models (additive statistics being tractable), without any equations, fitted parameters, self-citations, or derivations that reduce the claimed statistic or rule back to its own inputs by construction. No load-bearing steps match the enumerated circularity patterns; the order-agnostic claim follows directly from the global sketch definition rather than from a fitted input renamed as prediction or a self-citation chain. This is the expected self-contained case for a methods paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Insufficient information in the abstract to identify any free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5653 in / 981 out tokens · 32065 ms · 2026-06-28T06:17:35.135434+00:00 · methodology

