
arxiv: 2510.13829 · v4 · submitted 2025-10-10 · 💻 cs.CL · cs.AI

A Linguistics-Aware LLM Watermarking via Syntactic Predictability

Pith reviewed 2026-05-18 08:24 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords LLM watermarking · syntactic predictability · POS n-grams · linguistic indeterminacy · public verification · detection robustness · text quality trade-off

The pith

STELA modulates watermark strength in LLM text according to syntactic predictability from part-of-speech n-grams.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces STELA, a watermarking approach that measures linguistic flexibility through part-of-speech n-gram patterns to decide how strongly to embed a detectable signal in generated text. In grammatically rigid spots the method applies a lighter watermark so that output quality stays high; in spots with more wording options it applies a stronger one to make later detection easier. Detection works without any access to the original model's probability outputs, which removes a major obstacle to public verification. According to the abstract, experiments on English, Chinese, and Korean show more robust detection than prior methods.

Core claim

STELA aligns watermark strength with linguistic degrees of freedom by modeling indeterminacy via part-of-speech n-grams, weakening the signal where grammar constrains choices and strengthening it where flexibility exists, thereby supporting logit-free public detection and improved robustness across typologically different languages.

What carries the argument

POS n-gram modeled linguistic indeterminacy, used as a proxy to dynamically scale watermark intensity according to local syntactic constraints.
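
To make the proxy concrete, here is a minimal sketch of one way such a signal could be computed. It is not the paper's formula, which this page does not reproduce. It assumes that indeterminacy is the conditional entropy of the next POS tag given the previous n-1 tags, and that the watermark bias scales linearly with that entropy. The names `pos_ngram_entropy`, `watermark_strength`, `delta_max`, and the toy tag sequences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def pos_ngram_entropy(tagged_sentences, n=2):
    """Estimate H(next POS tag | previous n-1 POS tags) in bits, per context."""
    counts = defaultdict(Counter)
    for tags in tagged_sentences:
        padded = ["<s>"] * (n - 1) + list(tags)
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            counts[context][padded[i]] += 1
    entropy = {}
    for context, following in counts.items():
        total = sum(following.values())
        entropy[context] = sum(
            (c / total) * math.log2(total / c) for c in following.values()
        )
    return entropy


def watermark_strength(context, entropy, delta_max=2.0):
    """Assumed mapping: bias grows linearly with normalized entropy.

    Contexts never seen in the reference corpus get strength 0.0,
    a conservative choice made for this sketch only.
    """
    h_max = max(entropy.values(), default=0.0) or 1.0
    return delta_max * entropy.get(context, 0.0) / h_max


# Toy POS sequences (illustrative, not real corpus statistics).
TOY_CORPUS = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "ADJ", "NOUN", "VERB", "ADV"],
    ["DET", "NOUN", "VERB", "ADJ"],
]

entropy = pos_ngram_entropy(TOY_CORPUS, n=2)
for context in sorted(entropy):
    print(context, round(watermark_strength(context, entropy), 3))
```

In this toy corpus a noun is always followed by a verb, so the context `("NOUN",)` receives zero strength, while `("VERB",)` is followed by three different tags and receives the full `delta_max`. A real system would estimate these statistics from a large tagged corpus per language.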

If this is right

  • Detection no longer requires access to the source model's internal token probabilities.
  • The quality-detection balance improves across analytic, isolating, and agglutinative languages.
  • Watermark signals can be strengthened selectively in open syntactic positions without broad quality loss.
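
The first consequence above can be illustrated with a logit-free detector sketch. The page does not describe STELA's actual detector, so this example substitutes a generic keyed green-list test of the kind common in the watermarking literature: the detector needs only the text, a key, and per-position weights, which in a STELA-like setup could be derived from POS tags of the text itself. The functions `is_green` and `weighted_z_score`, the key string, and the parameter `gamma` are hypothetical.

```python
import hashlib
import math


def is_green(prev_token, token, key, gamma=0.5):
    """Keyed pseudo-random test: is `token` on the green list after `prev_token`?"""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def weighted_z_score(tokens, weights, key, gamma=0.5):
    """Weighted z-test on green hits; weights[i] applies to tokens[i + 1].

    Under the no-watermark hypothesis each hit is treated as Bernoulli(gamma),
    so the weighted hit sum has mean gamma * sum(w) and variance
    gamma * (1 - gamma) * sum(w^2).
    """
    if len(weights) != len(tokens) - 1:
        raise ValueError("need one weight per scored position")
    hits = sum(
        w for prev, tok, w in zip(tokens, tokens[1:], weights)
        if is_green(prev, tok, key, gamma)
    )
    variance = gamma * (1 - gamma) * sum(w * w for w in weights)
    if variance == 0:
        return 0.0
    return (hits - gamma * sum(weights)) / math.sqrt(variance)


# Usage: score a token list with uniform weights (no model logits involved).
sample = "the quick brown fox jumps over the lazy dog".split()
print(weighted_z_score(sample, [1.0] * (len(sample) - 1), key="demo-key"))
```

Nothing in the scoring path queries a language model, which is the property the bullet describes. Setting larger weights where the syntactic context is flexible would concentrate the test on positions where a strong watermark was embedded; that weighting rule is an assumption of this sketch, not a detail confirmed by the source.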

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same POS-based modulation idea could be tested on other structural signals such as dependency labels or sentence position.
  • If POS taggers are available, the method might transfer to additional languages with minimal extra engineering.
  • Public verifiability opens the possibility of combining STELA with external registries that record watermark keys.

Load-bearing premise

Part-of-speech n-gram statistics serve as a reliable stand-in for how much wording freedom exists in a given sentence position.

What would settle it

A controlled test that measures whether watermark detection accuracy falls below baseline methods when the POS n-gram predictor assigns high indeterminacy to contexts that humans judge as tightly constrained.

Original abstract

As large language models (LLMs) continue to advance rapidly, reliable governance tools have become critical. Publicly verifiable watermarking is particularly essential for fostering a trustworthy AI ecosystem. A central challenge persists: balancing text quality against detection robustness. Recent studies have sought to navigate this trade-off by leveraging signals from model output distributions (e.g., token-level entropy); however, their reliance on these model-specific signals presents a significant barrier to public verification, as the detection process requires access to the logits of the underlying model. We introduce STELA, a novel framework that aligns watermark strength with the linguistic degrees of freedom inherent in language. STELA dynamically modulates the signal using part-of-speech (POS) n-gram-modeled linguistic indeterminacy, weakening it in grammatically constrained contexts to preserve quality and strengthening it in contexts with greater linguistic flexibility to enhance detectability. Our detector operates without access to any model logits, thus facilitating publicly verifiable detection. Through extensive experiments on typologically diverse languages (analytic English, isolating Chinese, and agglutinative Korean), we show that STELA surpasses prior methods in detection robustness. Our code is available at https://github.com/Shinwoo-Park/stela_watermark.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces STELA, a novel LLM watermarking framework that aligns watermark strength with linguistic degrees of freedom using part-of-speech (POS) n-gram statistics to model indeterminacy. The method dynamically modulates the signal—weakening it in grammatically constrained contexts to preserve quality and strengthening it in flexible contexts to enhance detectability—while enabling publicly verifiable detection without access to model logits. Experiments on analytic English, isolating Chinese, and agglutinative Korean demonstrate that STELA outperforms prior methods in detection robustness.

Significance. Should the results be confirmed, this work could significantly contribute to the field of AI governance by providing a publicly verifiable watermarking technique that does not rely on internal model distributions. The use of linguistic proxies for modulation addresses a key limitation of previous approaches. The provision of code enhances reproducibility, and the multi-lingual evaluation on diverse language types is a notable strength.

major comments (1)
  1. [Abstract and method description] The assumption that POS n-gram statistics provide a reliable proxy for linguistic indeterminacy directly controlling the quality-detection trade-off lacks direct validation against model entropy or quality metrics. This is central to the dynamic modulation strategy; if the proxy correlates only loosely with actual indeterminacy, the robustness gains may not hold beyond the tested setups.
minor comments (1)
  1. [Abstract] While the abstract reports superior detection robustness, it omits specific quantitative metrics, baselines, and details on how text quality was measured, which would strengthen the high-level claims.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of STELA's potential contribution. We address the single major comment below.

Point-by-point responses
  1. Referee: [Abstract and method description] The assumption that POS n-gram statistics provide a reliable proxy for linguistic indeterminacy directly controlling the quality-detection trade-off lacks direct validation against model entropy or quality metrics. This is central to the dynamic modulation strategy; if the proxy correlates only loosely with actual indeterminacy, the robustness gains may not hold beyond the tested setups.

    Authors: We acknowledge that a direct empirical comparison between the POS n-gram indeterminacy scores and model entropy would provide stronger justification for the proxy choice. In the revised manuscript we will add a dedicated analysis subsection that reports Pearson and Spearman correlations between our POS-based scores and token-level entropy computed from the underlying LLM on a held-out set of 500 generations per language. This addition will quantify how well the linguistic proxy aligns with actual model uncertainty. At the same time, we note that the primary evidence for the approach remains the end-to-end multilingual experiments, which already demonstrate consistent gains in both detection AUC and human-judged quality relative to entropy-based and fixed-strength baselines. The linguistic grounding is further supported by established syntactic theory on n-gram predictability, yet we agree the requested correlation analysis will address the referee's concern about generalizability.

    revision: yes
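
The correlation analysis promised in the response could be sketched as follows. This is an illustration only: the function names and the two score lists are hypothetical, the numbers are made up for the example, and no real measurements from the paper are used. Pearson correlation measures linear agreement; Spearman correlation is Pearson applied to ranks (with ties given average ranks), so it captures any monotone relationship.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up per-token values, for illustration only.
pos_indeterminacy = [0.0, 0.3, 0.8, 1.2, 1.6]
model_entropy = [0.2, 0.1, 1.5, 2.4, 5.0]
print("pearson:", pearson(pos_indeterminacy, model_entropy))
print("spearman:", spearman(pos_indeterminacy, model_entropy))
```

A high Spearman value with a lower Pearson value would suggest the proxy orders positions correctly but is not linearly calibrated to model entropy, which would still be adequate if the modulation only depends on relative ranking.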

Circularity Check

0 steps flagged

No circularity: linguistic proxy is external and independent

Full rationale

The paper defines STELA by modulating watermark strength according to POS n-gram statistics that are computed from external linguistic resources, not from the LLM's output distributions or the detection performance itself. The abstract and method description present this as an input choice that enables public verification without logits, with no equations shown that would make the robustness gain equivalent to a fitted parameter or self-referential definition. Experiments across languages serve as external checks rather than tautological confirmation. No self-citation chains or ansatzes imported from prior author work are load-bearing for the central claim.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on the assumption that POS n-gram statistics capture meaningful variation in linguistic flexibility. No free parameters or new entities are explicitly introduced in the abstract, but the modulation threshold is an implicit modeling choice.

free parameters (1)
  • modulation threshold
    Implicit cutoff used to decide when to weaken or strengthen the watermark based on n-gram indeterminacy.
axioms (1)
  • domain assumption: POS n-gram statistics reliably indicate syntactic constraints and degrees of freedom in language
    Invoked to justify dynamic weakening/strengthening of the watermark signal.

pith-pipeline@v0.9.0 · 5745 in / 1273 out tokens · 28277 ms · 2026-05-18T08:24:14.341705+00:00 · methodology

