A Linguistics-Aware LLM Watermarking via Syntactic Predictability

Hyejin Park; Hyeseon An; Shinwoo Park; Yo-Sub Han

arxiv: 2510.13829 · v4 · submitted 2025-10-10 · 💻 cs.CL · cs.AI

Pith reviewed 2026-05-18 08:24 UTC · model grok-4.3

keywords: LLM watermarking · syntactic predictability · POS n-grams · linguistic indeterminacy · public verification · detection robustness · text quality trade-off

The pith

STELA modulates watermark strength in LLM text according to syntactic predictability from part-of-speech n-grams.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces STELA, a watermarking approach that measures linguistic flexibility through part-of-speech n-gram patterns to decide how strongly to embed a detectable signal in generated text. In grammatically rigid spots the method applies a lighter watermark so that output quality stays high; in spots with more wording options it applies a stronger one to make later detection easier. Detection works without any access to the original model's probability outputs, which removes a major obstacle to public verification. Experiments on English, Chinese, and Korean demonstrate higher detection rates than earlier techniques while text remains readable.

Core claim

STELA aligns watermark strength with linguistic degrees of freedom by modeling indeterminacy via part-of-speech n-grams, weakening the signal where grammar constrains choices and strengthening it where flexibility exists, thereby supporting logit-free public detection and improved robustness across typologically different languages.

What carries the argument

POS n-gram modeled linguistic indeterminacy, used as a proxy to dynamically scale watermark intensity according to local syntactic constraints.
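
The indeterminacy score can be sketched as a normalized entropy over the next POS tag, following the formula quoted in the Lean section of this page, λ(c_t) = H(P(π_t | c_t)) / log K_{c_t}. This is a minimal sketch, not the authors' implementation: the function names, the toy tag sequences, the `<s>` padding symbol, and the reading of K as the number of distinct tags observed after a context are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def build_pos_ngram_counts(tagged_corpus, n=3):
    """Count next-tag frequencies for every (n-1)-tag context."""
    counts = defaultdict(Counter)
    for tags in tagged_corpus:
        # "<s>" is a hypothetical start-of-sentence padding symbol.
        padded = ["<s>"] * (n - 1) + list(tags)
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            counts[context][padded[i]] += 1
    return counts

def indeterminacy(counts, context):
    """Normalized entropy H(P(next tag | context)) / log K, in [0, 1]."""
    next_tags = counts.get(tuple(context))
    # Assumption: unseen or single-outcome contexts are treated as fully rigid.
    if not next_tags or len(next_tags) < 2:
        return 0.0
    total = sum(next_tags.values())
    entropy = -sum((c / total) * math.log(c / total) for c in next_tags.values())
    # K is read here as the number of distinct tags seen after this context.
    return entropy / math.log(len(next_tags))

# Toy, made-up tag sequences; a real setup would tag a large corpus.
corpus = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "ADJ", "NOUN", "VERB", "ADV"],
    ["DET", "NOUN", "VERB", "ADV"],
]
counts = build_pos_ngram_counts(corpus, n=2)
print(indeterminacy(counts, ["NOUN"]))  # only VERB follows NOUN in the toy data
print(indeterminacy(counts, ["DET"]))   # NOUN or ADJ may follow DET
```

In this toy corpus a noun is always followed by a verb, so the score for that context is zero, while a determiner can be followed by either a noun or an adjective and scores higher.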

If this is right

Detection no longer requires access to the source model's internal token probabilities.
The quality-detection balance improves across analytic, isolating, and agglutinative languages.
Watermark signals can be strengthened selectively in open syntactic positions without broad quality loss.
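
One way to picture selective strengthening is to scale a logit bias by a per-position indeterminacy score in [0, 1]. The sketch below assumes a green-list logit-bias watermark of the kind used in earlier work; this page does not say how STELA embeds its signal, so the embedding scheme, the linear interpolation, and the names `modulated_bias`, `apply_watermark`, `delta_min`, and `delta_max` are all hypothetical.

```python
def modulated_bias(lam, delta_min=0.5, delta_max=3.0):
    """Interpolate the bias linearly between delta_min (rigid) and delta_max (flexible).

    lam is an indeterminacy score; values outside [0, 1] are clamped.
    The default bounds are illustrative, not values from the paper.
    """
    lam = min(1.0, max(0.0, lam))
    return delta_min + lam * (delta_max - delta_min)

def apply_watermark(logits, green_ids, lam):
    """Return a copy of logits with the modulated bias added to green-list tokens."""
    delta = modulated_bias(lam)
    return [x + delta if i in green_ids else x for i, x in enumerate(logits)]

# Rigid context: weak push toward green tokens.
print(apply_watermark([1.0, 0.2, -0.3], green_ids={1, 2}, lam=0.0))
# Flexible context: strong push toward green tokens.
print(apply_watermark([1.0, 0.2, -0.3], green_ids={1, 2}, lam=1.0))
```

A hard cutoff (weak below a threshold, strong above it) would be an equally plausible reading of the "modulation threshold" listed in the ledger further down; the linear form is used here only because it is the simplest continuous choice.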

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same POS-based modulation idea could be tested on other structural signals such as dependency labels or sentence position.
If POS taggers are available, the method might transfer to additional languages with minimal extra engineering.
Public verifiability opens the possibility of combining STELA with external registries that record watermark keys.

Load-bearing premise

Part-of-speech n-gram statistics serve as a reliable stand-in for how much wording freedom exists in a given sentence position.

What would settle it

A controlled test that measures whether watermark detection accuracy falls below baseline methods when the POS n-gram predictor assigns high indeterminacy to contexts that humans judge as tightly constrained.

Original abstract

As large language models (LLMs) continue to advance rapidly, reliable governance tools have become critical. Publicly verifiable watermarking is particularly essential for fostering a trustworthy AI ecosystem. A central challenge persists: balancing text quality against detection robustness. Recent studies have sought to navigate this trade-off by leveraging signals from model output distributions (e.g., token-level entropy); however, their reliance on these model-specific signals presents a significant barrier to public verification, as the detection process requires access to the logits of the underlying model. We introduce STELA, a novel framework that aligns watermark strength with the linguistic degrees of freedom inherent in language. STELA dynamically modulates the signal using part-of-speech (POS) n-gram-modeled linguistic indeterminacy, weakening it in grammatically constrained contexts to preserve quality and strengthening it in contexts with greater linguistic flexibility to enhance detectability. Our detector operates without access to any model logits, thus facilitating publicly verifiable detection. Through extensive experiments on typologically diverse languages (analytic English, isolating Chinese, and agglutinative Korean), we show that STELA surpasses prior methods in detection robustness. Our code is available at https://github.com/Shinwoo-Park/stela_watermark.
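
The claim that detection needs no logits can be illustrated with a generic keyed green-list test in the style of earlier logit-bias watermarks. This is not STELA's detector, whose test statistic is not described on this page. The hash construction, the `key` and `gamma` parameters, and the per-token `weights` (imagined here as scores derived from POS tags alone) are all assumptions; the point is only that every input comes from the text and a shared key.

```python
import hashlib
import math

def is_green(prev_token, token, key, gamma=0.5):
    """Pseudo-randomly place `token` on the green list, seeded by key and previous token."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    # Map the first 8 bytes to [0, 1) and compare with the green fraction.
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma

def weighted_z_score(tokens, weights, key, gamma=0.5):
    """Weighted one-sided z-statistic; weights[i] applies to tokens[i + 1].

    Under the no-watermark hypothesis each token is green with probability
    gamma, so the weighted sum has mean 0 and the variance accumulated below.
    """
    num = 0.0
    var = 0.0
    for prev, tok, w in zip(tokens, tokens[1:], weights):
        hit = 1.0 if is_green(prev, tok, key, gamma) else 0.0
        num += w * (hit - gamma)
        var += w * w * gamma * (1.0 - gamma)
    return num / math.sqrt(var) if var > 0 else 0.0

# Hypothetical usage: tokens from the text under test, weights from a POS-based score.
tokens = ["the", "quick", "brown", "fox", "jumps"]
weights = [0.2, 0.9, 0.9, 0.4]
print(weighted_z_score(tokens, weights, key="shared-secret"))
```

A large positive score suggests the text over-selects green tokens relative to chance. Nothing in the computation touches the generating model, which is the property the abstract emphasizes.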

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

STELA's POS n-gram modulation for public LLM watermarking is a practical step forward, but the experiments do not directly test whether the syntactic proxy tracks actual quality or entropy trade-offs.

The main takeaway is that this paper gives a watermarking scheme that adjusts signal strength using part-of-speech n-gram statistics instead of the model's own logits or entropy. Detection stays fully public because the linguistic model is separate from the LLM. That separation is the clearest practical difference from earlier logit-dependent methods.

The authors run tests on English, Chinese, and Korean and report stronger detection than the baselines they compare against, with code released on GitHub. Those are the concrete pieces that stand out on a first read. The multi-language coverage is useful given how language structure varies, and releasing the implementation lets others check the numbers directly.

The central assumption is that high POS n-gram indeterminacy marks the contexts where watermark strength can be raised safely without hurting fluency. The paper shows robustness gains, but it does not include side-by-side checks of whether the n-gram scores predict token-level entropy, perplexity shifts, or human quality ratings when the watermark is applied. Without those links, the dynamic modulation could be functioning more as a fixed heuristic than as a linguistically grounded controller. The robustness results still stand on their own, but the justification for the modulation rule is thinner than the claim suggests.

This is the sort of applied paper that matters for groups working on content authentication and AI governance tools. Readers who need public verification without model access will get the most out of it. The work is coherent enough on its own terms to deserve a serious referee, even if revisions will likely need tighter validation of the proxy. I would send it out for review rather than desk reject.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces STELA, a novel LLM watermarking framework that aligns watermark strength with linguistic degrees of freedom using part-of-speech (POS) n-gram statistics to model indeterminacy. The method dynamically modulates the signal—weakening it in grammatically constrained contexts to preserve quality and strengthening it in flexible contexts to enhance detectability—while enabling publicly verifiable detection without access to model logits. Experiments on analytic English, isolating Chinese, and agglutinative Korean demonstrate that STELA outperforms prior methods in detection robustness.

Significance. Should the results be confirmed, this work could significantly contribute to the field of AI governance by providing a publicly verifiable watermarking technique that does not rely on internal model distributions. The use of linguistic proxies for modulation addresses a key limitation of previous approaches. The provision of code enhances reproducibility, and the multi-lingual evaluation on diverse language types is a notable strength.

major comments (1)

[Abstract and method description] The assumption that POS n-gram statistics provide a reliable proxy for linguistic indeterminacy directly controlling the quality-detection trade-off lacks direct validation against model entropy or quality metrics. This is central to the dynamic modulation strategy; if the proxy correlates only loosely with actual indeterminacy, the robustness gains may not hold beyond the tested setups.

minor comments (1)

[Abstract] While the abstract reports superior detection robustness, it omits specific quantitative metrics, baselines, and details on how text quality was measured, which would strengthen the high-level claims.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of STELA's potential contribution. We address the single major comment below.

Referee: [Abstract and method description] The assumption that POS n-gram statistics provide a reliable proxy for linguistic indeterminacy directly controlling the quality-detection trade-off lacks direct validation against model entropy or quality metrics. This is central to the dynamic modulation strategy; if the proxy correlates only loosely with actual indeterminacy, the robustness gains may not hold beyond the tested setups.

Authors: We acknowledge that a direct empirical comparison between the POS n-gram indeterminacy scores and model entropy would provide stronger justification for the proxy choice. In the revised manuscript we will add a dedicated analysis subsection that reports Pearson and Spearman correlations between our POS-based scores and token-level entropy computed from the underlying LLM on a held-out set of 500 generations per language. This addition will quantify how well the linguistic proxy aligns with actual model uncertainty. At the same time, we note that the primary evidence for the approach remains the end-to-end multilingual experiments, which already demonstrate consistent gains in both detection AUC and human-judged quality relative to entropy-based and fixed-strength baselines. The linguistic grounding is further supported by established syntactic theory on n-gram predictability, yet we agree the requested correlation analysis will address the referee's concern about generalizability.

revision: yes
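
The correlation analysis promised in the rebuttal could look roughly like the sketch below, which computes Pearson and Spearman coefficients between per-token POS-based scores and per-token model entropies. It is an illustration under stated assumptions: the helper names are hypothetical, the two input lists are made-up placeholder numbers rather than results, and pure Python is used only to keep the example dependency-free.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Placeholder inputs for illustration only; not data from the paper.
pos_scores = [0.10, 0.85, 0.40, 0.92, 0.05, 0.60]
token_entropy = [0.30, 2.10, 1.00, 2.60, 0.20, 1.20]
print(pearson(pos_scores, token_entropy))
print(spearman(pos_scores, token_entropy))
```

Reporting both coefficients is useful because Spearman only asks whether the two scores rank positions the same way, while Pearson also requires the relationship to be close to linear.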

Circularity Check

0 steps flagged

No circularity: linguistic proxy is external and independent

The paper defines STELA by modulating watermark strength according to POS n-gram statistics that are computed from external linguistic resources, not from the LLM's output distributions or the detection performance itself. The abstract and method description present this as an input choice that enables public verification without logits, with no equations shown that would make the robustness gain equivalent to a fitted parameter or self-referential definition. Experiments across languages serve as external checks rather than tautological confirmation. No self-citation chains or ansatzes imported from prior author work are load-bearing for the central claim.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on the assumption that POS n-gram statistics capture meaningful variation in linguistic flexibility. No free parameters or new entities are explicitly introduced in the abstract, but the modulation threshold is an implicit modeling choice.

free parameters (1)

modulation threshold
Implicit cutoff used to decide when to weaken or strengthen the watermark based on n-gram indeterminacy.

axioms (1)

domain assumption: POS n-gram statistics reliably indicate syntactic constraints and degrees of freedom in language
Invoked to justify dynamic weakening/strengthening of the watermark signal.

pith-pipeline@v0.9.0 · 5745 in / 1273 out tokens · 28277 ms · 2026-05-18T08:24:14.341705+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "STELA dynamically modulates the signal using part-of-speech (POS) n-gram-modeled linguistic indeterminacy... λ(c_t) = H(P(π_t | c_t)) / log K_{c_t}"

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Our detector operates without access to any model logits, thus facilitating publicly verifiable detection."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.