arxiv: 2606.31602 · v2 · pith:X6JA7PIUnew · submitted 2026-06-30 · 💻 cs.CL · cs.CR

Robust Text Watermarking for Large Language Models via Dual Semantic Embeddings

Pith reviewed 2026-07-01 05:34 UTC · model grok-4.3

keywords: text watermarking, large language models, semantic embeddings, paraphrasing robustness, translation robustness, dual embeddings, statistical detection, AI content tracing

The pith

Dual-Embedding Watermarking derives a signal from token and context embeddings that remains statistically detectable after paraphrasing and translation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Dual-Embedding Watermarking (DEW) as a method that applies algebraic vector-space operations to both token-level and contextual embeddings in large language models. It projects these embeddings through secret-keyed pseudo-random matrices to create an obfuscated watermark signal whose derived distributions support statistical detection. The central goal is to produce a signal that degrades gracefully under semantic changes rather than breaking entirely. If the approach holds, generated text could be traced back to its model origin even after common edits like rephrasing or language translation, while keeping output quality comparable to unmarked text.

Core claim

DEW applies algebraic vector-space operations to token and context embeddings to derive a watermark signal that degrades gracefully under semantic shifts. The signal is obfuscated by projecting embedding vectors through pseudo-random matrices seeded with a secret key. Distributions obtained from the underlying algebra are evaluated for statistical testing, and experiments across multiple LLMs show that this yields improved detection after paraphrasing, competitive text quality, and continued detectability after translation where earlier semantic watermarks lose effectiveness.
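
The keyed projection described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's construction: the abstract does not specify the algebra, so the Gaussian matrices, the two separate projections, the bilinear score, and every name here (`keyed_matrix`, `project`, `watermark_score`, `proj_dim`) are assumptions made for the example.

```python
import hashlib
import random


def keyed_matrix(secret_key, label, rows, cols):
    """Gaussian matrix whose entries depend only on (secret_key, label).

    Hypothetical helper: the paper's actual matrix distribution is not
    stated on this page.
    """
    digest = hashlib.sha256(f"{secret_key}:{label}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    scale = rows ** -0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(cols)]
            for _ in range(rows)]


def project(matrix, vec):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def watermark_score(secret_key, token_emb, context_emb, proj_dim=8):
    """Dot product of the two keyed projections (assumed form of the signal)."""
    p_tok = keyed_matrix(secret_key, "token", proj_dim, len(token_emb))
    p_ctx = keyed_matrix(secret_key, "context", proj_dim, len(context_emb))
    a = project(p_tok, token_emb)
    b = project(p_ctx, context_emb)
    return sum(x * y for x, y in zip(a, b))
```

Without the key, the matrices cannot be regenerated, so the score cannot be recomputed from the embeddings alone. The score in this sketch is bilinear in the two embeddings, so a small change in either one moves it only slightly; whether the paper's statistic behaves the same way is not established here.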

What carries the argument

The argument rests on the Dual-Embedding Watermarking (DEW) scheme, which performs algebraic operations on token and context embeddings and then projects them through secret-keyed pseudo-random matrices for obfuscation.

If this is right

  • Detection performance after paraphrasing exceeds that of prior semantic watermarking methods.
  • Generated text quality stays competitive with unmarked output from the same models.
  • The watermark remains detectable after translation in cases where previous methods fail.
  • Statistical tests based on the derived distributions provide a practical benchmarking tool for the scheme.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The dual-embedding construction could be tested on other semantic transformations such as summarization or style transfer to check broader robustness.
  • Integration into LLM serving systems might allow origin tracing without visible changes to the output text.
  • The algebraic signal approach might combine with existing non-semantic watermarking techniques for layered protection.

Load-bearing premise

The algebraic operations on the embeddings create a watermark whose statistical properties stay reliable enough for detection even after paraphrasing or translation changes the text.

What would settle it

Run the statistical detector on a large collection of heavily paraphrased or translated watermarked texts and observe whether separation from unmarked texts falls to chance levels.
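
One way to quantify "separation falls to chance levels" is the area under the ROC curve (AUC), where 0.5 means the detector cannot tell the two groups apart. The sketch below computes it from raw detector scores. The function name and the choice of AUC as the metric are assumptions, since this page does not say which metric the paper reports.

```python
def auc(marked_scores, unmarked_scores):
    """Probability that a random marked text outscores a random unmarked one.

    Ties count as half a win, which matches the Mann-Whitney formulation
    of the area under the ROC curve.
    """
    if not marked_scores or not unmarked_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for m in marked_scores:
        for u in unmarked_scores:
            if m > u:
                wins += 1.0
            elif m == u:
                wins += 0.5
    return wins / (len(marked_scores) * len(unmarked_scores))
```

An AUC that stays well above 0.5 on paraphrased or translated text would support the robustness claim; a value near 0.5 would count against it.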

Figures

Figures reproduced from arXiv: 2606.31602 by Cezary Pilaszewicz, Gerhard Wunder, Jonas Schäfer.

Figure 1: An illustration of the DEW insertion procedure for a single generation step. Previously generated tokens (C) are jointly embedded, while the top-m candidate token embeddings are computed separately. All embeddings are projected for obfuscation, and the dot product of the projections is added to the original logits as token-specific watermark biases. We sample from the updated logits. Inputs are highlighted…
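
The single generation step in the Figure 1 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the projection matrices are passed in ready-made, the `strength` multiplier is an addition of this sketch (the caption only says the dot product is added to the logits), and all function and parameter names are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vec):
    return [dot(row, vec) for row in matrix]


def biased_logits(logits, token_embs, context_emb, p_tok, p_ctx,
                  top_m, strength=1.0):
    """Add a watermark bias to the top-m candidate tokens only."""
    ctx_proj = matvec(p_ctx, context_emb)  # projected context embedding
    # Indices of the top-m candidates by original logit.
    top = sorted(range(len(logits)), key=lambda i: logits[i],
                 reverse=True)[:top_m]
    out = list(logits)
    for i in top:
        tok_proj = matvec(p_tok, token_embs[i])  # projected token embedding
        out[i] += strength * dot(tok_proj, ctx_proj)
    return out


def sample(logits, rng):
    """Sample a token index from softmax(logits)."""
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

A toy call with identity projections, three tokens, and `top_m=2` shows the bias touching only the two highest-logit candidates:

```python
identity = [[1.0, 0.0], [0.0, 1.0]]
new_logits = biased_logits(
    [2.0, 1.0, 0.0],
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [1.0, -1.0],
    identity, identity, top_m=2,
)
# new_logits == [3.0, 0.0, 0.0]
next_token = sample(new_logits, random.Random(0))
```
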
Abstract

This work presents Dual-Embedding Watermarking (DEW), a semantic watermarking scheme for large language models (LLMs) that leverages contextual and token-level embeddings to enhance robustness against paraphrasing and translation. DEW utilizes a signal-processing methodology, applying algebraic vector-space operations to token and context embeddings to derive a watermark signal that degrades gracefully under semantic shifts. The method obfuscates the watermark by projecting embedding vectors through pseudo-random matrices seeded with a secret key. Relevant distributions derived from the underlying algebra are evaluated and employed for statistical testing and benchmarking of DEW. Experimental results across multiple LLMs indicate that DEW improves post-paraphrase detection while maintaining competitive text quality, and remains detectable after translation, even when prior semantic watermarks degrade significantly. These findings position DEW as a practical and robust solution for safeguarding LLM-generated text and addressing critical issues in responsible AI deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces Dual-Embedding Watermarking (DEW), a semantic watermarking scheme that applies algebraic vector-space operations to token and context embeddings to derive a watermark signal. The signal is obfuscated by projection through pseudo-random matrices seeded with a secret key. Distributions derived from the algebra are used for statistical testing. The paper claims that DEW improves post-paraphrase detection while maintaining competitive text quality and remains detectable after translation, outperforming prior semantic watermarks that degrade significantly under these shifts.

Significance. If the claimed robustness to semantic shifts is substantiated with explicit analysis of the detection statistic, DEW could offer a practical advance in LLM watermarking by addressing a key limitation of existing methods against paraphrasing and translation. The dual-embedding algebraic construction and use of derived distributions for detection constitute a distinct methodological contribution relative to prior embedding-based or hash-based approaches.

major comments (2)
  1. [Abstract] Abstract: the central claim that the watermark signal 'degrades gracefully under semantic shifts' and that 'relevant distributions derived from the underlying algebra are evaluated and employed for statistical testing' rests on the unshown premise that the null and alternative distributions of the test statistic remain valid or correctly calibrated after embedding perturbations induced by paraphrasing or translation. No derivation, invariance argument, or perturbation analysis is referenced to establish this step, which is load-bearing for the post-shift detectability results.
  2. [Abstract] Abstract (experimental claims): the reported improvements in post-paraphrase detection and post-translation detectability are stated without any quantitative metrics, baselines, dataset descriptions, number of trials, or controls for post-hoc analysis. This absence prevents assessment of whether the empirical support for the robustness claim is adequate.
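
The calibration concern in comment 1 can be checked empirically: if the null distribution is correct, the fraction of unmarked texts flagged at significance level alpha should stay close to alpha, including after those texts are paraphrased or translated. A minimal sketch, assuming the detector outputs one p-value per text; `empirical_fpr` is a hypothetical helper, not part of the paper.

```python
def empirical_fpr(null_p_values, alpha=0.05):
    """Fraction of unmarked texts whose p-value falls below alpha."""
    if not null_p_values:
        raise ValueError("need at least one p-value")
    flagged = sum(1 for p in null_p_values if p < alpha)
    return flagged / len(null_p_values)
```

A rate far above alpha on shifted but unmarked text would indicate that the derived null distribution no longer holds after the shift.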

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful comments on the abstract. We address each point below and will revise the manuscript to improve clarity on the theoretical foundations and experimental reporting.

  1. Referee: [Abstract] Abstract: the central claim that the watermark signal 'degrades gracefully under semantic shifts' and that 'relevant distributions derived from the underlying algebra are evaluated and employed for statistical testing' rests on the unshown premise that the null and alternative distributions of the test statistic remain valid or correctly calibrated after embedding perturbations induced by paraphrasing or translation. No derivation, invariance argument, or perturbation analysis is referenced to establish this step, which is load-bearing for the post-shift detectability results.

    Authors: The manuscript derives the relevant distributions from the dual-embedding algebra in Section 3 and includes a perturbation analysis in Section 4 demonstrating approximate invariance of the test statistic under semantic shifts via the properties of the pseudo-random projections. Empirical calibration is further validated through post-shift detection experiments. The abstract summarizes these results at a high level without referencing the sections. We will revise the abstract to explicitly note the derivation and perturbation analysis. revision: yes

  2. Referee: [Abstract] Abstract (experimental claims): the reported improvements in post-paraphrase detection and post-translation detectability are stated without any quantitative metrics, baselines, dataset descriptions, number of trials, or controls for post-hoc analysis. This absence prevents assessment of whether the empirical support for the robustness claim is adequate.

    Authors: The abstract provides a concise summary of the contributions and findings. Detailed quantitative metrics, baseline comparisons, dataset descriptions, trial counts, and analysis controls are presented in the Experiments section of the full manuscript. We agree that the abstract would benefit from including key quantitative highlights and will revise it accordingly to better support assessment of the claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

The paper derives its detection distributions directly from algebraic operations on embeddings and a secret-key projection; these are presented as independent of the reported experimental performance numbers. No self-citations appear load-bearing, no parameters are fitted to the same data later called a prediction, and no ansatz or uniqueness claim reduces to prior author work. The central claim rests on the algebra-derived statistics remaining usable after shifts, which is an external assumption rather than a definitional loop. This is the most common honest finding for a method paper whose equations and tests are self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

Abstract-only review yields limited visibility into parameters and assumptions; the method relies on standard embedding spaces and secret-key randomness but introduces an algebraic signal whose properties are asserted without external benchmarks.

axioms (2)
  • domain assumption: Algebraic vector-space operations applied to token and context embeddings yield a watermark signal that degrades gracefully under semantic shifts
    Invoked when the abstract states that the method applies algebraic operations 'to derive a watermark signal that degrades gracefully under semantic shifts'.
  • domain assumption: Distributions derived from the underlying algebra are suitable for statistical testing of watermark presence
    Stated when the abstract says 'Relevant distributions derived from the underlying algebra are evaluated and employed for statistical testing and benchmarking of DEW'.
invented entities (1)
  • watermark signal obtained from dual-embedding algebraic operations (no independent evidence)
    purpose: To provide a detectable mark that survives paraphrasing and translation
    Introduced as the core output of the DEW construction; no independent evidence (e.g., predicted detection rates on external corpora) is supplied in the abstract.
