A Linguistics-Aware LLM Watermarking via Syntactic Predictability
Pith reviewed 2026-05-18 08:24 UTC · model grok-4.3
The pith
STELA modulates watermark strength in LLM text according to syntactic predictability from part-of-speech n-grams.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
STELA aligns watermark strength with linguistic degrees of freedom by modeling indeterminacy via part-of-speech n-grams, weakening the signal where grammar constrains choices and strengthening it where flexibility exists, thereby supporting logit-free public detection and improved robustness across typologically different languages.
What carries the argument
POS n-gram modeled linguistic indeterminacy, used as a proxy to dynamically scale watermark intensity according to local syntactic constraints.
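As a rough illustration of this proxy, the sketch below estimates a normalized entropy over the next POS tag given the preceding tags, following the form quoted later on this page, λ(c_t) = H(P(π_t | c_t)) / log K. It makes several assumptions not confirmed by the source: K is taken to be the number of distinct tags observed after the context, unseen or fully determined contexts are mapped to 0, and the linear `scaled_bias` rule and its `delta_max` parameter are hypothetical. The toy tag sequences are invented for the example and the names `pos_indeterminacy` and `scaled_bias` are not from the STELA code base.

```python
import math
from collections import Counter, defaultdict

def pos_indeterminacy(tag_sequences, n=3):
    """Build a scorer mapping a POS context (n-1 tags) to normalized entropy in [0, 1]."""
    counts = defaultdict(Counter)
    for tags in tag_sequences:
        for i in range(n - 1, len(tags)):
            context = tuple(tags[i - n + 1:i])
            counts[context][tags[i]] += 1

    def lam(context):
        nxt = counts.get(tuple(context))
        # Assumption: unseen contexts and contexts with a single observed
        # continuation are treated as offering no freedom.
        if not nxt or len(nxt) < 2:
            return 0.0
        total = sum(nxt.values())
        entropy = -sum((c / total) * math.log(c / total) for c in nxt.values())
        # Assumption: K is the number of distinct tags seen after this context.
        return entropy / math.log(len(nxt))

    return lam

def scaled_bias(lam_value, delta_max=2.0):
    """Hypothetical linear rule: more indeterminacy, stronger watermark bias."""
    return delta_max * lam_value

# Toy POS-tagged corpus (invented for illustration), scored with bigrams.
toy_corpus = [
    ["DET", "NOUN", "VERB"],
    ["DET", "NOUN", "VERB"],
    ["DET", "ADJ", "NOUN"],
    ["DET", "NOUN", "ADV"],
]
lam = pos_indeterminacy(toy_corpus, n=2)
for ctx in [("DET",), ("NOUN",), ("ADJ",)]:
    print(ctx, round(lam(ctx), 3), round(scaled_bias(lam(ctx)), 3))
```

Because the score depends only on POS tags and corpus counts, a detector can recompute it from the text alone, which is the property the summary attributes to STELA.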
If this is right
- Detection no longer requires access to the source model's internal token probabilities.
- The quality-detection balance improves across analytic, isolating, and agglutinative languages.
- Watermark signals can be strengthened selectively in open syntactic positions without broad quality loss.
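To make the first point concrete, here is a minimal sketch of detection that needs only the text and a key, not model logits. It is a generic green-list z-test of the kind common in the watermarking literature, not STELA's actual detector. The hash-based `is_green` rule, the `GAMMA` fraction, and the key string are all assumptions made for illustration.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary marked "green" at each step

def is_green(prev_token, token, key):
    """Hypothetical keyed rule: hash (key, previous token, token) into [0, 1)."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode("utf-8")).digest()
    return digest[0] / 256.0 < GAMMA

def detection_z_score(tokens, key):
    """One-proportion z-test: are green tokens more frequent than chance?"""
    scored = len(tokens) - 1  # the first token has no predecessor
    if scored < 1:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, t, key) for p, t in zip(tokens, tokens[1:]))
    return (green - GAMMA * scored) / math.sqrt(scored * GAMMA * (1 - GAMMA))

# Usage: anyone holding the key can score a text without querying the model.
sample = "the quick brown fox jumps over the lazy dog".split()
print(detection_z_score(sample, "demo-key"))
```

A large positive z-score suggests the text was generated with the keyed green-list bias. A linguistically modulated scheme would presumably also weight positions by their indeterminacy score, but that detail is not given on this page.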
Where Pith is reading between the lines
- The same POS-based modulation idea could be tested on other structural signals such as dependency labels or sentence position.
- If POS taggers are available, the method might transfer to additional languages with minimal extra engineering.
- Public verifiability opens the possibility of combining STELA with external registries that record watermark keys.
Load-bearing premise
Part-of-speech n-gram statistics serve as a reliable stand-in for how much wording freedom exists in a given sentence position.
What would settle it
A controlled test that measures whether watermark detection accuracy falls below baseline methods when the POS n-gram predictor assigns high indeterminacy to contexts that humans judge as tightly constrained.
Original abstract
As large language models (LLMs) continue to advance rapidly, reliable governance tools have become critical. Publicly verifiable watermarking is particularly essential for fostering a trustworthy AI ecosystem. A central challenge persists: balancing text quality against detection robustness. Recent studies have sought to navigate this trade-off by leveraging signals from model output distributions (e.g., token-level entropy); however, their reliance on these model-specific signals presents a significant barrier to public verification, as the detection process requires access to the logits of the underlying model. We introduce STELA, a novel framework that aligns watermark strength with the linguistic degrees of freedom inherent in language. STELA dynamically modulates the signal using part-of-speech (POS) n-gram-modeled linguistic indeterminacy, weakening it in grammatically constrained contexts to preserve quality and strengthening it in contexts with greater linguistic flexibility to enhance detectability. Our detector operates without access to any model logits, thus facilitating publicly verifiable detection. Through extensive experiments on typologically diverse languages (analytic English, isolating Chinese, and agglutinative Korean), we show that STELA surpasses prior methods in detection robustness. Our code is available at https://github.com/Shinwoo-Park/stela_watermark.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces STELA, a novel LLM watermarking framework that aligns watermark strength with linguistic degrees of freedom using part-of-speech (POS) n-gram statistics to model indeterminacy. The method dynamically modulates the signal—weakening it in grammatically constrained contexts to preserve quality and strengthening it in flexible contexts to enhance detectability—while enabling publicly verifiable detection without access to model logits. Experiments on analytic English, isolating Chinese, and agglutinative Korean demonstrate that STELA outperforms prior methods in detection robustness.
Significance. Should the results be confirmed, this work could significantly contribute to the field of AI governance by providing a publicly verifiable watermarking technique that does not rely on internal model distributions. The use of linguistic proxies for modulation addresses a key limitation of previous approaches. The provision of code enhances reproducibility, and the multi-lingual evaluation on diverse language types is a notable strength.
major comments (1)
- [Abstract and method description] The assumption that POS n-gram statistics provide a reliable proxy for linguistic indeterminacy directly controlling the quality-detection trade-off lacks direct validation against model entropy or quality metrics. This is central to the dynamic modulation strategy; if the proxy correlates only loosely with actual indeterminacy, the robustness gains may not hold beyond the tested setups.
minor comments (1)
- [Abstract] While the abstract reports superior detection robustness, it omits specific quantitative metrics, baselines, and details on how text quality was measured, which would strengthen the high-level claims.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of STELA's potential contribution. We address the single major comment below.
Point-by-point responses
- Referee: [Abstract and method description] The assumption that POS n-gram statistics provide a reliable proxy for linguistic indeterminacy directly controlling the quality-detection trade-off lacks direct validation against model entropy or quality metrics. This is central to the dynamic modulation strategy; if the proxy correlates only loosely with actual indeterminacy, the robustness gains may not hold beyond the tested setups.
Authors: We acknowledge that a direct empirical comparison between the POS n-gram indeterminacy scores and model entropy would provide stronger justification for the proxy choice. In the revised manuscript we will add a dedicated analysis subsection that reports Pearson and Spearman correlations between our POS-based scores and token-level entropy computed from the underlying LLM on a held-out set of 500 generations per language. This addition will quantify how well the linguistic proxy aligns with actual model uncertainty. At the same time, we note that the primary evidence for the approach remains the end-to-end multilingual experiments, which already demonstrate consistent gains in both detection AUC and human-judged quality relative to entropy-based and fixed-strength baselines. The linguistic grounding is further supported by established syntactic theory on n-gram predictability, yet we agree the requested correlation analysis will address the referee's concern about generalizability.
Revision: yes
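The correlation analysis promised in this response could be computed along the following lines. This is a dependency-free sketch, not the authors' analysis code. The function names are hypothetical, and the paired values stand in for per-token POS-based scores and model entropies, which are invented here purely to exercise the functions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: linear association between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(vx * vy)

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank-transformed samples."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Placeholder pairs (invented): POS-based score vs. token-level entropy.
pos_scores = [0.1, 0.4, 0.5, 0.9]
entropies = [s ** 3 for s in pos_scores]  # monotone but nonlinear relation
print(pearson(pos_scores, entropies), spearman(pos_scores, entropies))
```

Reporting both coefficients is useful because a proxy can track the ordering of model uncertainty (high Spearman) without being linearly related to it (lower Pearson), and only the ordering matters for deciding where to strengthen or weaken the signal.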
Circularity Check
No circularity: linguistic proxy is external and independent
Full rationale
The paper defines STELA by modulating watermark strength according to POS n-gram statistics that are computed from external linguistic resources, not from the LLM's output distributions or the detection performance itself. The abstract and method description present this as an input choice that enables public verification without logits, with no equations shown that would make the robustness gain equivalent to a fitted parameter or self-referential definition. Experiments across languages serve as external checks rather than tautological confirmation. No self-citation chains or ansatzes imported from prior author work are load-bearing for the central claim.
Axiom & Free-Parameter Ledger
free parameters (1)
- modulation threshold
axioms (1)
- domain assumption: POS n-gram statistics reliably indicate syntactic constraints and degrees of freedom in language
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: STELA dynamically modulates the signal using part-of-speech (POS) n-gram–modeled linguistic indeterminacy... $\lambda(c_t) = H(P(\pi_t \mid c_t)) / \log K_{c_t}$
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: absolute_floor_iff_bare_distinguishability (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Our detector operates without access to any model logits, thus facilitating publicly verifiable detection.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.