pith. sign in

arxiv: 2606.31602 · v1 · pith:X6JA7PIUnew · submitted 2026-06-30 · 💻 cs.CL · cs.CR

Robust Text Watermarking for Large Language Models via Dual Semantic Embeddings

Pith reviewed 2026-07-01 05:34 UTC · model grok-4.3

classification 💻 cs.CL cs.CR
keywords text watermarkinglarge language modelssemantic embeddingsparaphrasing robustnesstranslation robustnessdual embeddingsstatistical detectionAI content tracing
0
0 comments X

The pith

Dual-Embedding Watermarking derives a signal from token and context embeddings that remains statistically detectable after paraphrasing and translation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Dual-Embedding Watermarking (DEW) as a method that applies algebraic vector-space operations to both token-level and contextual embeddings in large language models. It projects these embeddings through secret-keyed pseudo-random matrices to create an obfuscated watermark signal whose derived distributions support statistical detection. The central goal is to produce a signal that degrades gracefully under semantic changes rather than breaking entirely. If the approach holds, generated text could be traced back to its model origin even after common edits like rephrasing or language translation, while keeping output quality comparable to unmarked text.

Core claim

DEW applies algebraic vector-space operations to token and context embeddings to derive a watermark signal that degrades gracefully under semantic shifts. The signal is obfuscated by projecting embedding vectors through pseudo-random matrices seeded with a secret key. Distributions obtained from the underlying algebra are evaluated for statistical testing, and experiments across multiple LLMs show that this yields improved detection after paraphrasing, competitive text quality, and continued detectability after translation where earlier semantic watermarks lose effectiveness.

What carries the argument

Dual-Embedding Watermarking (DEW) scheme that performs algebraic operations on token and context embeddings followed by pseudo-random matrix projection for obfuscation.

If this is right

  • Detection performance after paraphrasing exceeds that of prior semantic watermarking methods.
  • Generated text quality stays competitive with unmarked output from the same models.
  • The watermark remains detectable after translation in cases where previous methods fail.
  • Statistical tests based on the derived distributions provide a practical benchmarking tool for the scheme.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The dual-embedding construction could be tested on other semantic transformations such as summarization or style transfer to check broader robustness.
  • Integration into LLM serving systems might allow origin tracing without visible changes to the output text.
  • The algebraic signal approach might combine with existing non-semantic watermarking techniques for layered protection.

Load-bearing premise

The algebraic operations on the embeddings create a watermark whose statistical properties stay reliable enough for detection even after paraphrasing or translation changes the text.

What would settle it

Run the statistical detector on a large collection of heavily paraphrased or translated watermarked texts and observe whether separation from unmarked texts falls to chance levels.

Figures

Figures reproduced from arXiv: 2606.31602 by Cezary Pilaszewicz, Gerhard Wunder, Jonas Sch\"afer.

Figure 1
Figure 1. Figure 1: An illustration of the DEW insertion procedure for a single generation step. Previously generated tokens (C) are jointly embedded, while the top-m candidate token embeddings are computed separately. All embeddings are projected for obfuscation, and the dot product of the projections is added to the original logits as token-specific watermark biases. We sample from the updated logits. Inputs are highlighted… view at source ↗
read the original abstract

This work presents Dual-Embedding Watermarking (DEW), a semantic watermarking scheme for large language models (LLMs) that leverages contextual and token-level embeddings to enhance robustness against paraphrasing and translation. DEW utilizes a signal-processing methodology, applying algebraic vector-space operations to \mbox{token and context embeddings to derive a watermark signal that degrades gracefully under semantic shifts. The method obfuscates the watermark by projecting embedding vectors through pseudo-random matrices seeded with a secret key. Relevant distributions derived from the underlying algebra are evaluated and employed for statistical testing and benchmarking of DEW. Experimental results across multiple LLMs indicate that DEW improves post-paraphrase detection while maintaining competitive text quality, and remains detectable after translation, even when prior semantic watermarks degrade significantly. These findings position DEW as a practical and robust solution for safeguarding LLM-generated text and addressing critical issues in responsible AI deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces Dual-Embedding Watermarking (DEW), a semantic watermarking scheme that applies algebraic vector-space operations to token and context embeddings to derive a watermark signal. The signal is obfuscated by projection through pseudo-random matrices seeded with a secret key. Distributions derived from the algebra are used for statistical testing. The paper claims that DEW improves post-paraphrase detection while maintaining competitive text quality and remains detectable after translation, outperforming prior semantic watermarks that degrade significantly under these shifts.

Significance. If the claimed robustness to semantic shifts is substantiated with explicit analysis of the detection statistic, DEW could offer a practical advance in LLM watermarking by addressing a key limitation of existing methods against paraphrasing and translation. The dual-embedding algebraic construction and use of derived distributions for detection constitute a distinct methodological contribution relative to prior embedding-based or hash-based approaches.

major comments (2)
  1. [Abstract] Abstract: the central claim that the watermark signal 'degrades gracefully under semantic shifts' and that 'relevant distributions derived from the underlying algebra are evaluated and employed for statistical testing' rests on the unshown premise that the null and alternative distributions of the test statistic remain valid or correctly calibrated after embedding perturbations induced by paraphrasing or translation. No derivation, invariance argument, or perturbation analysis is referenced to establish this step, which is load-bearing for the post-shift detectability results.
  2. [Abstract] Abstract (experimental claims): the reported improvements in post-paraphrase detection and post-translation detectability are stated without any quantitative metrics, baselines, dataset descriptions, number of trials, or controls for post-hoc analysis. This absence prevents assessment of whether the empirical support for the robustness claim is adequate.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful comments on the abstract. We address each point below and will revise the manuscript to improve clarity on the theoretical foundations and experimental reporting.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that the watermark signal 'degrades gracefully under semantic shifts' and that 'relevant distributions derived from the underlying algebra are evaluated and employed for statistical testing' rests on the unshown premise that the null and alternative distributions of the test statistic remain valid or correctly calibrated after embedding perturbations induced by paraphrasing or translation. No derivation, invariance argument, or perturbation analysis is referenced to establish this step, which is load-bearing for the post-shift detectability results.

    Authors: The manuscript derives the relevant distributions from the dual-embedding algebra in Section 3 and includes a perturbation analysis in Section 4 demonstrating approximate invariance of the test statistic under semantic shifts via the properties of the pseudo-random projections. Empirical calibration is further validated through post-shift detection experiments. The abstract summarizes these results at a high level without referencing the sections. We will revise the abstract to explicitly note the derivation and perturbation analysis. revision: yes

  2. Referee: [Abstract] Abstract (experimental claims): the reported improvements in post-paraphrase detection and post-translation detectability are stated without any quantitative metrics, baselines, dataset descriptions, number of trials, or controls for post-hoc analysis. This absence prevents assessment of whether the empirical support for the robustness claim is adequate.

    Authors: The abstract provides a concise summary of the contributions and findings. Detailed quantitative metrics, baseline comparisons, dataset descriptions, trial counts, and analysis controls are presented in the Experiments section of the full manuscript. We agree that the abstract would benefit from including key quantitative highlights and will revise it accordingly to better support assessment of the claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper derives its detection distributions directly from algebraic operations on embeddings and a secret-key projection; these are presented as independent of the reported experimental performance numbers. No self-citations appear load-bearing, no parameters are fitted to the same data later called a prediction, and no ansatz or uniqueness claim reduces to prior author work. The central claim rests on the algebra-derived statistics remaining usable after shifts, which is an external assumption rather than a definitional loop. This is the most common honest finding for a method paper whose equations and tests are self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

Abstract-only review yields limited visibility into parameters and assumptions; the method relies on standard embedding spaces and secret-key randomness but introduces an algebraic signal whose properties are asserted without external benchmarks.

axioms (2)
  • domain assumption Algebraic vector-space operations applied to token and context embeddings yield a watermark signal that degrades gracefully under semantic shifts
    Invoked when the abstract states that the method applies algebraic operations 'to derive a watermark signal that degrades gracefully under semantic shifts'.
  • domain assumption Distributions derived from the underlying algebra are suitable for statistical testing of watermark presence
    Stated when the abstract says 'Relevant distributions derived from the underlying algebra are evaluated and employed for statistical testing and benchmarking of DEW'.
invented entities (1)
  • watermark signal obtained from dual-embedding algebraic operations no independent evidence
    purpose: To provide a detectable mark that survives paraphrasing and translation
    Introduced as the core output of the DEW construction; no independent evidence (e.g., predicted detection rates on external corpora) is supplied in the abstract.

pith-pipeline@v0.9.1-grok · 5685 in / 1623 out tokens · 30109 ms · 2026-07-01T05:34:44.422361+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

41 extracted references · 24 canonical work pages · 3 internal anchors

  1. [1]

    Watermarking

    Aaronson, Scott and Kirchner, Hendrik , year = 2022, month = 12, day = 13, url =. Watermarking

  2. [2]

    Proceedings of the 41st International Conference on Machine Learning , articleno =

    Liu, Yepeng and Bu, Yuheng , title =. Proceedings of the 41st International Conference on Machine Learning , articleno =. 2024 , publisher =

  3. [3]

    Revised Papers from the 5th International Workshop on Information Hiding , publisher =

    Natural Language Watermarking and Tamperproofing , author =. Revised Papers from the 5th International Workshop on Information Hiding , publisher =

  4. [4]

    Proceedings of Thirty Seventh Conference on Learning Theory , publisher =

    Undetectable Watermarks for Language Models , author =. Proceedings of Thirty Seventh Conference on Learning Theory , publisher =

  5. [6]

    Nature , volume = 634, number = 8035, pages =

    Scalable Watermarking for Identifying Large Language Model Outputs , author =. Nature , volume = 634, number = 8035, pages =. doi:10.1038/s41586-024-08025-4 , issn =

  6. [7]

    Isotropy Matters: Soft-

    Diera, Andor and Galke, Lukas and Scherp, Ansgar , year = 2024, month = 11, day = 27, publisher =. Isotropy Matters: Soft-. doi:10.48550/arXiv.2411.17538 , url =. 2411.17538 [cs] , eprinttype =

  7. [8]

    Towards Possibilities

    Soumya Suvra Ghosal and Souradip Chakraborty and Jonas Geiping and Furong Huang and Dinesh Manocha and Amrit Singh Bedi , year = 2023, journal =. Towards Possibilities. doi:10.48550/arxiv.2310.15264 , url =. 2310.15264 , timestamp =

  8. [9]

    The Llama 3 Herd of Models

    The Llama 3 Herd of Models , author =. CoRR , volume =. doi:10.48550/arxiv.2407.21783 , url =. 2407.21783 , timestamp =

  9. [10]

    Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , publisher =

    Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models , author =. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , publisher =. doi:10.18653/v1/2024.acl-long.226 , url =

  10. [11]

    doi:10.18653/v1/2024.naacl-long.226 , url =

    Hou, Abe and Zhang, Jingyu and He, Tianxing and Wang, Yichen and Chuang, Yung-Sung and Wang, Hongwei and Shen, Lingfeng and Van Durme, Benjamin and Khashabi, Daniel and Tsvetkov, Yulia , year = 2024, month = jun, booktitle =. doi:10.18653/v1/2024.naacl-long.226 , url =

  11. [12]

    Hou, Abe and Zhang, Jingyu and Wang, Yichen and Khashabi, Daniel and He, Tianxing , year = 2024, month = aug, booktitle =. k-. doi:10.18653/v1/2024.findings-acl.98 , url =

  12. [13]

    The Twelfth International Conference on Learning Representations,

    Unbiased Watermark for Large Language Models , author =. The Twelfth International Conference on Learning Representations,

  13. [14]

    doi:10.18653/v1/2021.findings-emnlp.23 , url =

    Huang, Junjie and Tang, Duyu and Zhong, Wanjun and Lu, Shuai and Shou, Linjun and Gong, Ming and Jiang, Daxin and Duan, Nan , year = 2021, month = nov, booktitle =. doi:10.18653/v1/2021.findings-emnlp.23 , url =

  14. [15]

    Proceedings of the 41st International Conference on Machine Learning , location =

    Token-specific watermarking with enhanced detectability and semantic coherence for large language models , author =. Proceedings of the 41st International Conference on Machine Learning , location =

  15. [16]

    The Journal of the Acoustical Society of America , volume = 62, number =

    Perplexity--a measure of the difficulty of speech recognition tasks , author =. The Journal of the Acoustical Society of America , volume = 62, number =. doi:10.1121/1.2016299 , issn =

  16. [17]

    Proceedings of the 41st International Conference on Machine Learning , location =

    Watermark stealing in large language models , author =. Proceedings of the 41st International Conference on Machine Learning , location =

  17. [18]

    Proceedings of the 40th international conference on machine learning , publisher =

    A Watermark for Large Language Models , author =. Proceedings of the 40th international conference on machine learning , publisher =

  18. [19]

    CoRR , volume =

    On the Reliability of Watermarks for Large Language Models , author =. CoRR , volume =. doi:10.48550/arxiv.2306.04634 , url =. 2306.04634 , timestamp =

  19. [20]

    Proceedings of the 37th International Conference on Neural Information Processing Systems , location =

    Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense , author =. Proceedings of the 37th International Conference on Neural Information Processing Systems , location =

  20. [21]

    Robust Distortion-free Watermarks for Language Models , author =. Trans. Mach. Learn. Res. , volume = 2024, url =

  21. [22]

    The Twelfth International Conference on Learning Representations,

    A Semantic Invariant Robust Watermark for Large Language Models , author =. The Twelfth International Conference on Learning Representations,

  22. [23]

    CoRR , volume =

    A Survey of Text Watermarking in the Era of Large Language Models , author =. CoRR , volume =. doi:10.48550/arxiv.2312.07913 , url =. 2312.07913 , timestamp =

  23. [24]

    Generative

    Nahema Marchal and Rachel Xu and Rasmi Elasmar and Iason Gabriel and Beth Goldberg and William Isaac , year = 2024, journal =. Generative. doi:10.48550/arxiv.2406.13843 , url =. 2406.13843 , timestamp =

  24. [25]

    doi:10.18653/v1/2024.emnlp-demo.7 , url =

    Pan, Leyi and Liu, Aiwei and He, Zhiwei and Gao, Zitian and Zhao, Xuandong and Lu, Yijian and Zhou, Binglin and Liu, Shuliang and Hu, Xuming and Wen, Lijie and others , year = 2024, month = nov, booktitle =. doi:10.18653/v1/2024.emnlp-demo.7 , url =

  25. [26]

    , year = 2011, month = 11, booktitle =

    Papandreou, George and Yuille, Alan L. , year = 2011, month = 11, booktitle =. Perturb-and-. doi:10.1109/iccv.2011.6126242 , url =

  26. [27]

    Exploring the limits of transfer learning with a unified text-to-text transformer , author =. J. Mach. Learn. Res. , publisher =

  27. [28]

    Sentence- BERT : Sentence Embeddings using Siamese BERT -Networks

    Reimers, Nils and Gurevych, Iryna , year = 2019, month = nov, booktitle =. Sentence-. doi:10.18653/v1/D19-1410 , url =

  28. [29]

    doi:10.1186/s41239-024-00478-x , issn =

    Shahzad, Muhammad Farrukh and Xu, Shuo and Javed, Iqra , year = 2024, month =. doi:10.1186/s41239-024-00478-x , issn =

  29. [30]

    The Science of Detecting LLM-Generated Text , author =. Commun. ACM , publisher =. doi:10.1145/3624725 , issn =

  30. [31]

    Proceedings of the 8th Workshop on Multimedia and Security , location =

    The hiding virtues of ambiguity: quantifiably resilient watermarking of natural language text through synonym substitutions , author =. Proceedings of the 8th Workshop on Multimedia and Security , location =. doi:10.1145/1161366.1161397 , isbn = 1595934936, url =

  31. [32]

    Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , publisher =

    Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks , author =. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , publisher =. doi:10.18653/v1/2024.acl-long.160 , url =

  32. [33]

    CoRR , volume =

    Understanding User Experience in Large Language Model Interactions , author =. CoRR , volume =. doi:10.48550/arxiv.2401.08329 , url =. 2401.08329 , timestamp =

  33. [34]

    Testing of detection tools for

    Weber-Wulff, Debora and Anohina-Naumeca, Alla and Bjelobaba, Sonja and others , year = 2023, month = 12, day = 25, volume = 19, number = 1, pages = 26, doi =. Testing of detection tools for

  34. [35]

    CoRR , volume =

    Distortion-free Watermarks are not Truly Distortion-free under Watermark Key Collisions , author =. CoRR , volume =. doi:10.48550/arxiv.2406.02603 , url =. 2406.02603 , timestamp =

  35. [36]

    Proceedings of the 41st International Conference on Machine Learning , location =

    A resilient and accessible distribution-preserving watermark for large language models , author =. Proceedings of the 41st International Conference on Machine Learning , location =

  36. [37]

    The Twelfth International Conference on Learning Representations,

    Provable Robust Watermarking for AI-Generated Text , author =. The Twelfth International Conference on Learning Representations,

  37. [38]

    Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (

    Neural Linguistic Steganography , author =. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (. doi:10.18653/v1/D19-1115 , url =

  38. [39]

    Contemporary Mathematics , pages =

    Extensions of lipschitz mappings into a hilbert space , author =. Contemporary Mathematics , pages =

  39. [40]

    The Falcon Series of Open Language Models

    The Falcon Series of Open Language Models , author =. CoRR , volume =. doi:10.48550/arxiv.2311.16867 , url =. 2311.16867 , timestamp =

  40. [41]

    Gemma: Open Models Based on Gemini Research and Technology

    Gemma: Open Models Based on Gemini Research and Technology , author =. CoRR , volume =. doi:10.48550/arxiv.2403.08295 , url =. 2403.08295 , timestamp =

  41. [42]

    Proceedings of the 41st International Conference on Machine Learning , articleno =

    Jiang, Yibo and Rajendran, Goutham and Ravikumar, Pradeep and Aragam, Bryon and Veitch, Victor , title =. Proceedings of the 41st International Conference on Machine Learning , articleno =. 2024 , publisher =