Majority Bit-Aware Watermarking For Large Language Models

Jiahao Xu; Olivera Kotevska; Rui Hu; Zikai Zhang

arxiv: 2508.03829 · v2 · submitted 2025-08-05 · 💻 cs.CL · cs.CR

Pith reviewed 2026-05-19 00:25 UTC · model grok-4.3

keywords: LLM watermarking · multi-bit message encoding · majority bit-aware encoding · green list · text generation quality · misuse tracing

The pith

Majority bit-aware encoding embeds detectable messages in LLM text without forcing smaller green lists.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current LLM watermarking methods must shrink the green list of preferred tokens to keep the embedded message detectable, but smaller lists lower output quality. The paper introduces majority bit-aware encoding, which ties the watermark signal to the majority bit of the message instead of green-list size. This keeps a strong statistical signal even when large green lists are used. Two versions, MajorMark and MajorMark+, are tested on modern LLMs and show better message recovery together with higher-quality generated text than earlier approaches.

Core claim

The central claim is that majority bit-aware encoding decouples watermark signal strength from green-list size, so generated text keeps a strong, detectable signal even when the green list is large.

What carries the argument

Majority bit-aware encoding, a message encoding paradigm that determines the watermark preference from the majority bit rather than green-list size.

If this is right

Large green lists become usable while still supporting accurate multi-bit message recovery.
Generation quality improves because fewer tokens are excluded from the preferred set.
MajorMark+ extends the same benefit to longer embedded messages without extra quality cost.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same encoding idea could be tried on other generative models that sample from large vocabularies.
It might reduce the usual quality penalty when watermarking is applied to open-ended or creative tasks.
Detection could remain robust under moderate distribution shifts in sampling temperature or top-p.

Load-bearing premise

The majority-bit mapping produces a statistically detectable bias in token choices that stays reliable no matter how large the green list becomes.

What would settle it

Running the method with large green lists on real LLMs and finding detection accuracy falls to chance levels or text quality matches the old restricted-list baselines would disprove the claim.
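
The test above needs a concrete detection statistic. This page does not give the paper's own decoder, so the sketch below uses the standard one-proportion z-test from single-bit green-list watermarking as a stand-in. The function name, the token counts, and the assumption that unwatermarked tokens land in the green list independently with probability gamma are all illustrative, not taken from the paper.

```python
import math


def green_z_score(green_hits: int, total_tokens: int, gamma: float) -> float:
    """One-proportion z-score for the number of green tokens in a text.

    Assumed null hypothesis: in unwatermarked text each token falls in the
    green list independently with probability gamma (the green-list fraction).
    """
    if total_tokens <= 0 or not 0.0 < gamma < 1.0:
        raise ValueError("need total_tokens > 0 and 0 < gamma < 1")
    expected = gamma * total_tokens
    std = math.sqrt(gamma * (1.0 - gamma) * total_tokens)
    return (green_hits - expected) / std


# Hypothetical counts: 200 tokens scored against a green list covering half
# of the vocabulary.
z_chance = green_z_score(green_hits=100, total_tokens=200, gamma=0.5)
z_marked = green_z_score(green_hits=140, total_tokens=200, gamma=0.5)
```

A score near zero is what "chance level" means for this statistic; a falsifying experiment would show scores staying near zero as the green list grows.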

Original abstract

The growing deployment of Large Language Models (LLMs) has raised concerns about their misuse in generating harmful or deceptive content. To address this issue, watermarking methods have been proposed to embed identifiable multi-bit messages into generated text for misuse tracing. However, existing methods often suffer from a fundamental trade-off between text quality and decoding accuracy. In particular, they have to restrict the size of the preferred token set (i.e., green list) during encoding to maintain a detectable watermark signal for decoding, which inevitably degrades generation quality. To improve this trade-off, we propose a novel message encoding paradigm called majority bit-aware encoding, which relaxes the watermark signal strength from the green list size. This strategy allows for a strong watermark signal to be preserved in generated texts even when using a large green list. We introduce two instantiations of this paradigm: MajorMark and MajorMark+, where the latter is specifically optimized for long messages. Extensive experiments on state-of-the-art LLMs demonstrate that our methods achieve higher decoding accuracy and superior text quality compared to prior baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The majority-bit encoding is a clean way to loosen the green-list size constraint in multi-bit LLM watermarking, but the abstract leaves the actual gains and the distribution assumption untested.

The main takeaway is that this work proposes majority bit-aware encoding so the watermark signal no longer has to shrink with a larger green list. That directly attacks the quality-detection trade-off that has limited earlier multi-bit schemes. They present two concrete versions, MajorMark and MajorMark+, and report that experiments on current LLMs show better decoding accuracy plus higher text quality than the baselines they compare against. The encoding change itself looks like a genuine shift from prior statistical approaches rather than a minor reparameterization. If the results hold, it is a practical step for anyone trying to trace generated text without paying as large a quality penalty.

The central assumption is that the majority vote over bit assignments stays statistically detectable even when the green list grows and the next-token distribution is heavily skewed. The stress-test note flags a real risk here: under Zipf-like sampling, most probability mass sits in a few tokens, so the majority rule could become either automatic or impossible to steer without reintroducing the quality cost the method claims to avoid. The abstract asserts superior performance but supplies no numbers, no list-size ablations, and no variance analysis on the detection statistic, so it is impossible to tell whether the assumption survives real sampling.

This paper is for people already working on watermarking for misuse tracing or content moderation. A reader who needs incremental improvements to existing green-list methods will get something concrete to try. It is coherent enough and addresses a live problem, so it should go to peer review rather than a desk reject; the referees can check whether the empirical claims and the distribution analysis actually close the gap the stress-test identifies.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes a novel majority bit-aware encoding paradigm for watermarking large language models to embed multi-bit messages. This approach relaxes the dependence of the watermark signal strength on the green list size, enabling the use of larger green lists to improve text quality while maintaining a strong detectable signal. The authors present two instantiations, MajorMark and MajorMark+, with the latter optimized for long messages, and claim through experiments on state-of-the-art LLMs that these methods achieve higher decoding accuracy and superior text quality compared to prior baselines.

Significance. If the empirical claims hold under realistic conditions, the work addresses a central limitation in existing LLM watermarking schemes by decoupling signal strength from green-list size. This could enable more practical multi-bit watermarking for misuse tracing without the usual quality degradation, representing a targeted improvement over prior trade-off resolutions.

major comments (2)

Abstract: The claim of 'higher decoding accuracy and superior text quality' from 'extensive experiments' is asserted without any reported quantitative metrics, baselines, or controls, leaving the central empirical support for the majority-bit paradigm unassessable from the provided text.
§3 (Encoding and Detection): The assumption that majority-bit mapping keeps the watermark statistically detectable regardless of the green/red partition ratio lacks supporting analysis of the detection statistic's variance or bias under Zipf-like next-token distributions; under skewed LLM sampling, high-probability tokens may dominate large green lists and render the majority vote either trivial or unsteerable without reintroducing quality penalties.
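
The second comment can be made concrete with a small simulation. The sketch below is not the paper's analysis: it assumes a uniformly random green/red partition, a Zipf exponent of 1, a 1000-token vocabulary, and the standard additive logit bias delta from green-list watermarking, all chosen for illustration. It measures how much the probability mass on the green list varies across partitions before any bias is applied.

```python
import math
import random


def zipf_probs(vocab_size: int, exponent: float = 1.0) -> list:
    """Zipf-like next-token distribution: p(rank) proportional to rank**-exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def green_mass(probs: list, gamma: float, rng: random.Random) -> float:
    """Probability mass on a random green list holding a gamma fraction of tokens."""
    k = int(gamma * len(probs))
    green = rng.sample(range(len(probs)), k)
    return sum(probs[i] for i in green)


def boosted_green_mass(mass: float, delta: float) -> float:
    """Green mass after adding delta to every green logit and renormalising."""
    boosted = math.exp(delta) * mass
    return boosted / (boosted + (1.0 - mass))


def mean_and_std(values: list) -> tuple:
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


rng = random.Random(0)  # fixed seed so the run is repeatable
vocab_size, gamma, trials = 1000, 0.75, 2000  # illustrative settings

uniform = [1.0 / vocab_size] * vocab_size
zipf = zipf_probs(vocab_size)

uniform_mean, uniform_std = mean_and_std(
    [green_mass(uniform, gamma, rng) for _ in range(trials)]
)
zipf_mean, zipf_std = mean_and_std(
    [green_mass(zipf, gamma, rng) for _ in range(trials)]
)
```

Under these assumptions both distributions put a mean mass of about gamma on the green list, but only the Zipf case shows a visible spread, because membership of the few top-ranked tokens moves the mass substantially. `boosted_green_mass` shows how little room a bias has when the unbiased mass is already close to 0 or 1, which is the "trivial or unsteerable" regime the comment describes.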

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify how to better present the contributions of the majority bit-aware encoding paradigm. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core claims.

Referee: Abstract: The claim of 'higher decoding accuracy and superior text quality' from 'extensive experiments' is asserted without any reported quantitative metrics, baselines, or controls, leaving the central empirical support for the majority-bit paradigm unassessable from the provided text.

Authors: We agree that the abstract would benefit from explicit quantitative support for the empirical claims. In the revised manuscript we will update the abstract to include key metrics (e.g., decoding accuracy gains and quality improvements relative to the strongest baselines) drawn directly from the experimental results already reported in the paper body. revision: yes
Referee: §3 (Encoding and Detection): The assumption that majority-bit mapping keeps the watermark statistically detectable regardless of the green/red partition ratio lacks supporting analysis of the detection statistic's variance or bias under Zipf-like next-token distributions; under skewed LLM sampling, high-probability tokens may dominate large green lists and render the majority vote either trivial or unsteerable without reintroducing quality penalties.

Authors: This observation correctly identifies a gap in the current theoretical justification. While the majority-bit construction is intended to aggregate per-token signals so that detectability does not scale directly with green-list size, we did not supply a variance or bias analysis under realistic (Zipfian) token distributions. We will add a dedicated paragraph and supporting simulation in §3 that derives the expected behavior of the majority-vote statistic and shows that high-probability tokens do not render the signal trivial or force quality degradation. revision: yes

Circularity Check

0 steps flagged

No circularity: novel encoding paradigm presented without reduction to fitted inputs or self-citations

The paper introduces majority bit-aware encoding as a new message encoding paradigm that decouples watermark signal strength from green list size. No equations, fitted parameters, or predictions are described that reduce by construction to prior inputs. The central claim rests on the proposed encoding change and is supported by experiments rather than self-referential derivations or load-bearing self-citations. This is a standard case of an independent methodological contribution with no detectable circular steps in the provided description.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach relies on standard LLM token-sampling assumptions and the existence of a detectable statistical bias; no new free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption: LLM generation proceeds by sampling tokens from a probability distribution over a fixed vocabulary.
Implicit in all watermarking methods that bias token selection during decoding.
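
As a minimal sketch of what this axiom means in practice, the snippet below samples one token at a time from a softmax over logits after raising the logits of green-list tokens by delta. The four-token vocabulary, the green list, and the value of delta are hypothetical, and the additive bias is the generic green-list mechanism rather than anything confirmed as the paper's encoder.

```python
import math
import random


def softmax(logits: list) -> list:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: list, green_ids: set, delta: float, rng: random.Random) -> int:
    """Sample one token id after adding delta to every green-list logit."""
    biased = [x + delta if i in green_ids else x for i, x in enumerate(logits)]
    probs = softmax(biased)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)            # fixed seed so the run is repeatable
logits = [2.0, 1.0, 0.5, 0.0]     # hypothetical 4-token vocabulary
green_ids = {1, 3}                # hypothetical green list

draws = [sample_token(logits, green_ids, delta=2.0, rng=rng) for _ in range(5000)]
green_rate = sum(token in green_ids for token in draws) / len(draws)
```

With these numbers the green list holds roughly 29% of the unbiased probability mass and roughly 75% after the bias, so `green_rate` lands near the latter figure.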

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation to the paper passage: unclear
Paper passage: majority bit-aware encoding ... guarantees that γ ≥ 0.5 ... E_m[γ] = 0.5 + 1/√(2πb)

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_injective · relation to the paper passage: unclear
Paper passage: clustering-based decoding ... deterministic decoding ... shard-wise token occurrence count
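
The closed form 0.5 + 1/√(2πb) in the first quoted passage can be checked numerically under one reading of the elided text. The sketch below assumes that γ is the fraction of message bits equal to the majority bit, that b is the message length in bits, and that messages are uniformly random; none of these is confirmed by the excerpt. Under that reading, γ = max(k, b − k)/b with k distributed as Binomial(b, 1/2), and the closed form is the large-b approximation of its mean.

```python
import math


def expected_majority_fraction(b: int) -> float:
    """Exact E[max(k, b - k)] / b for k ~ Binomial(b, 1/2).

    Assumed reading: k counts the 1-bits of a uniformly random b-bit message,
    so max(k, b - k) / b is the share of bits equal to the majority bit.
    """
    if b <= 0:
        raise ValueError("b must be a positive integer")
    total = sum(math.comb(b, k) * max(k, b - k) for k in range(b + 1))
    return total / (b * 2 ** b)


def approx_majority_fraction(b: int) -> float:
    """Closed form quoted in the passage: 0.5 + 1 / sqrt(2 * pi * b)."""
    return 0.5 + 1.0 / math.sqrt(2.0 * math.pi * b)


# Exact value, approximation, and their gap for a few message lengths.
comparison = {
    b: (
        expected_majority_fraction(b),
        approx_majority_fraction(b),
        abs(expected_majority_fraction(b) - approx_majority_fraction(b)),
    )
    for b in (8, 32, 128)
}
```

The exact value is never below 0.5, consistent with the quoted guarantee γ ≥ 0.5, and the gap to the closed form shrinks as b grows.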

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.