arxiv: 2404.02138 · v6 · submitted 2024-04-02 · 💻 cs.CR · cs.CL · cs.LG

Topic-Based Watermarks for Large Language Models

Pith reviewed 2026-05-24 02:01 UTC · model grok-4.3

classification 💻 cs.CR · cs.CL · cs.LG
keywords watermarking · large language models · topic-guided selection · AI-generated text · paraphrasing robustness · token subsets · green-listing · generation quality

The pith

A topic-guided scheme partitions LLM vocabulary into topic-aligned token subsets to embed watermarks that resist paraphrasing while preserving generation quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a watermarking approach for large language models that identifies the topic of an input prompt and then favors tokens from a matching subset during text generation. This creates a detectable signature by green-listing semantically related tokens without requiring changes to the core generation process or extra frameworks. The authors argue this balances three goals that often conflict in prior methods: attack resistance, output fluency, and low overhead. Tests across multiple models and benchmarks are said to show quality on par with leading systems alongside better survival under paraphrasing and lexical changes. If the approach holds, it would allow straightforward addition of watermarks to standard pipelines for detecting AI-generated content.

Core claim

By partitioning the vocabulary into topic-specific subsets and selecting the relevant subset from the prompt to bias token probabilities toward aligned items, the method embeds marks that improve robustness to paraphrasing and lexical perturbations while matching the text quality of industry systems and adding negligible overhead, all without external mechanisms beyond normal generation.

What carries the argument

Topic-guided selection of green-listed token subsets from a vocabulary partition, chosen according to the prompt's identified topic.
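
The green-list biasing step can be illustrated with a minimal sketch. It assumes the bias is applied the way the Figure 1 caption describes for both KGW and TBW: a constant δ is added to the logits of tokens in the active green list before sampling. The toy vocabulary, the topic names, and the helper names `bias_logits` and `softmax` are hypothetical and are not taken from the paper.

```python
import math

def bias_logits(logits, green_ids, delta=2.0):
    """Add delta to the logit of every token id in the active green list."""
    return [x + delta if i in green_ids else x for i, x in enumerate(logits)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy vocabulary of 8 token ids split into topic lists.
topic_lists = {
    "sports": {0, 1},
    "science": {2, 3},
    "finance": {4, 5},
    "general": {6, 7},
}

logits = [0.0] * 8                  # flat model distribution, for illustration only
green = topic_lists["science"]      # topic assumed to be already chosen from the prompt
probs = softmax(bias_logits(logits, green, delta=2.0))
green_mass = sum(probs[i] for i in green)   # probability mass on the green list
```

With a flat distribution the green list would hold 25% of the probability mass without bias; adding δ = 2.0 raises that share, which is the statistical signal a detector later looks for. Real model logits are not flat, so the actual shift depends on how plausible the topic tokens already are in context.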

If this is right

  • Watermarking becomes possible on any standard LLM pipeline without specialized integrations or post-processing.
  • Detection remains effective after paraphrasing and word-level changes that defeat earlier watermark schemes.
  • Generation speed and quality stay comparable to unwatermarked baselines across common benchmarks.
  • The same method can be applied uniformly to outputs from different models for consistent tracing.
  • Negligible runtime cost beyond normal sampling makes broad deployment feasible.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If topic detection is noisy on short or ambiguous prompts, the watermark strength would vary by input type in practice.
  • The approach could extend to dynamic multi-topic handling by blending subsets during long generations.
  • Combining this token bias with existing statistical detectors might raise the bar for evasion attempts.
  • Widespread use would create a de facto standard for marking AI text, aiding downstream verification tools.

Load-bearing premise

A relevant topic can be reliably identified from the input prompt to pick the correct token subset, and favoring those tokens keeps the output fluent and coherent without extra fixes.
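
To make the premise concrete, here is a minimal sketch of one way a prompt could be mapped to a topic, with a fallback when no topic matches well. The page does not specify the paper's actual procedure, so everything here is an assumption: the embedding vectors are hand-written stand-ins for a real sentence encoder, and `select_topic`, `min_sim`, and the `"general"` fallback are hypothetical names. `min_sim` is not claimed to be the paper's τ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def select_topic(prompt_vec, topic_vecs, min_sim=0.5, fallback="general"):
    """Return (topic, best_similarity); use the fallback if no topic is close enough."""
    best_topic, best_sim = fallback, -1.0
    for name, vec in topic_vecs.items():
        sim = cosine(prompt_vec, vec)
        if sim > best_sim:
            best_topic, best_sim = name, sim
    chosen = best_topic if best_sim >= min_sim else fallback
    return chosen, best_sim

# Hypothetical 3-dimensional topic embeddings (a real system would use an encoder).
topic_vecs = {
    "sports": [1.0, 0.0, 0.0],
    "science": [0.0, 1.0, 0.0],
    "finance": [0.0, 0.0, 1.0],
}

clear = select_topic([0.1, 0.9, 0.1], topic_vecs)                  # clearly about one topic
ambiguous = select_topic([1.0, 1.0, 1.0], topic_vecs, min_sim=0.7)  # equally close to all
```

The ambiguous prompt is equidistant from every topic and falls below the threshold, so the sketch routes it to the fallback list. How often real prompts land in that regime is exactly what the premise above leaves open.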

What would settle it

A test set where automatic topic identification from prompts frequently selects mismatched subsets, resulting in either watermark detection failure or measurable drops in fluency under paraphrasing attacks, would disprove the central performance claims.
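
Such a test would be scored with a detector. The sketch below assumes the standard KGW-style one-proportion z-test, z = (g − γT) / sqrt(Tγ(1−γ)), where g is the number of green tokens among T tokens and γ is the chance rate of hitting the green list, combined with the "maximum z-score" idea named in the Figure 9 caption: score the text against every topic list and keep the largest z. Setting γ to each list's share of the vocabulary, the threshold of 4.0, and the names `z_score` and `max_z_detect` are assumptions for illustration, not the paper's implementation.

```python
import math

def z_score(green_count, total, gamma):
    """One-proportion z-test for an excess of green tokens over the chance rate gamma."""
    return (green_count - gamma * total) / math.sqrt(total * gamma * (1.0 - gamma))

def max_z_detect(token_ids, topic_lists, vocab_size, threshold=4.0):
    """Score the text against every topic list; flag it if the best z clears the threshold."""
    total = len(token_ids)
    scores = {}
    for name, ids in topic_lists.items():
        gamma = len(ids) / vocab_size              # assumed chance rate for this list
        hits = sum(1 for t in token_ids if t in ids)
        scores[name] = z_score(hits, total, gamma)
    best = max(scores, key=scores.get)
    return best, scores[best], scores[best] >= threshold

# Hypothetical toy vocabulary of 8 token ids split into four topic lists.
topic_lists = {"sports": {0, 1}, "science": {2, 3}, "finance": {4, 5}, "general": {6, 7}}

# 40 tokens, 30 of them from the "science" list (mimics a biased generation).
watermarked = [2, 3] * 15 + [0, 4, 6, 1, 5, 7, 0, 4, 6, 1]
# 40 tokens spread evenly over the vocabulary (mimics unbiased text).
plain = list(range(8)) * 5

wm_result = max_z_detect(watermarked, topic_lists, vocab_size=8)
plain_result = max_z_detect(plain, topic_lists, vocab_size=8)
```

Because the detector takes the maximum over several lists, the chance of a false positive at a fixed threshold is higher than for a single-list test, so a deployed threshold would need to account for the number of topics.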

Figures

Figures reproduced from arXiv: 2404.02138 by Alexander Nemecek, Erman Ayday, Yuzhou Jiang.

Figure 1. Comparison of KGW and TBW vocabulary partitioning. KGW (top) randomly partitions vocabulary V into green/red lists using parameter γ. TBW (bottom) creates semantically meaningful partitions by assigning tokens to predefined topic lists. Prompts are mapped to corresponding topics, making that topic list the active green list. Both methods bias generation toward green lists using parameter δ to adjust logit …
Figure 2. Text perplexity comparison using baseline …
Figure 3. Detection scores under random (left) and tar…
Figure 4. Perplexity comparison for OPT-6.7B with TBW at higher watermark strength (δ = 3.0) and all other schemes at their standard settings. Compared to TBW’s δ = 2.0 results (see §5.2), the increased bias leads to moderate quality degradation while retaining the lowest perplexity among watermarking methods. Lower values indicate higher text quality.
Figure 6. Comparison of average generation time (sec…
Figure 5. Comparison of average generation time (sec…
Figure 7. ROC curves comparing watermark methods on OPT-6.7B and Gemma-7B against PEGASUS (top) and …
Figure 8. Word clouds of the top-40 normalized token frequencies for OPT-6.7B (top) and GEMMA-7B (bottom). From left to right, each row shows outputs from: (i) non-watermarked generations, (ii) TBW with bias δ = 2.0, and (iii) TBW with bias δ = 3.0. Across both models and bias strengths, the distributions are dominated by common function words (e.g., “the,” “and,” “to”), with no systematic elevation of topic-specifi…
Figure 9. ROC curves for maximum z-score detection on OPT-6.7B and GEMMA-7B. Both models achieve AUC values of 0.996 and 1.000, indicating near-perfect separation between watermarked and non-watermarked content.
Figure 10. z-score distributions for the maximum z-score detection method on OPT-6.7B and GEMMA-7B. Both show clear separation between watermarked and non-watermarked text, with OPT-6.7B’s closer overlap explaining its higher false positive rate.
Figure 11. Detection score (z-score) vs. bias strength δ across similarity thresholds. Higher δ yields stronger watermark signals, with detection saturating around δ = 5.0.
Figure 12. Lexical diversity (Distinct-N) vs. bias strength δ. Moderate δ values maintain diversity, while very high strengths lead to increased repetition.
Figure 13. Distinct-N scores vs. detection scores across parameter combinations. The relationship demonstrates …
Figure 14. Comprehensive heat map of Distinct-N metrics across bias strength …
Figure 15. Detection strength vs. number of topics. Mean detector z-score (max-z) as we scale K ∈ {4, 8, 16, 32} on GEMMA-7B with δ=2.0, τ=0.5 over 100 prompts. Error bars denote ±s.d. Text quality vs. K. Using the same setup, we assess BERTScore F1 as a text-quality metric (Zhang et al., 2024)
Figure 16. Text quality vs. number of topics. BERTScore F1 remains flat as K increases, indicating no quality degradation. Error bars denote ±s.d.
Figure 17. Detection vs. quality trade-off. Per-sample …
Original abstract

The indistinguishability of large language model (LLM) output from human-authored content poses significant challenges, raising concerns about potential misuse of AI-generated text and its influence on future model training. Watermarking algorithms offer a viable solution by embedding detectable signatures into generated text. However, existing watermarking methods often involve trade-offs among attack robustness, generation quality, and additional overhead such as specialized frameworks or complex integrations. We propose a lightweight, topic-guided watermarking scheme for LLMs that partitions the vocabulary into topic-aligned token subsets. Given an input prompt, the scheme selects a relevant topic-specific token list, effectively "green-listing" semantically aligned tokens to embed robust marks while preserving fluency and coherence. Experimental results across multiple LLMs and state-of-the-art benchmarks demonstrate that our method achieves text quality comparable to industry-leading systems and simultaneously improves watermark robustness against paraphrasing and lexical perturbation attacks, with minimal performance overhead. Our approach avoids reliance on additional mechanisms beyond standard text generation pipelines, enabling straightforward adoption and suggesting a practical path toward globally consistent watermarking of AI-generated content.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes a lightweight topic-guided watermarking scheme for LLMs. It partitions the vocabulary into topic-aligned token subsets; given an input prompt, it identifies a relevant topic to select and green-list semantically aligned tokens during generation. This is claimed to embed detectable marks while preserving fluency, achieving text quality comparable to industry systems, and improving robustness to paraphrasing and lexical perturbation attacks with minimal overhead, all without additional frameworks beyond standard generation pipelines.

Significance. If the central claims hold, the work would provide a practical, low-overhead alternative to existing watermarking methods that often require specialized integrations. The emphasis on topic alignment for robustness without sacrificing quality addresses a key tension in the field. The approach's avoidance of extra mechanisms is a concrete strength that could facilitate broader adoption if the topic-selection precondition is validated.

major comments (2)
  1. [Abstract / method description] Abstract and method description: the central robustness claims against paraphrasing and lexical attacks rest on the precondition that a relevant topic can be reliably identified from any input prompt to select the correct token subset. No method, accuracy metrics, fallback procedure, or validation experiments for topic identification (especially on short, ambiguous, or multi-topic prompts) are supplied, leaving the reported improvements dependent on an unexamined assumption.
  2. [Abstract] Abstract: the assertion of 'experimental results across multiple LLMs and state-of-the-art benchmarks' that demonstrate comparable quality and improved robustness supplies no quantitative metrics, attack details, baseline comparisons, or methodology, rendering the performance claims unverifiable from the manuscript text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. The comments identify important areas where the presentation of our topic-guided watermarking method can be strengthened, particularly around the topic identification precondition and the level of detail in the abstract. We address each major comment below and commit to revisions that will make the manuscript more complete and verifiable.

Point-by-point responses
  1. Referee: [Abstract / method description] Abstract and method description: the central robustness claims against paraphrasing and lexical attacks rest on the precondition that a relevant topic can be reliably identified from any input prompt to select the correct token subset. No method, accuracy metrics, fallback procedure, or validation experiments for topic identification (especially on short, ambiguous, or multi-topic prompts) are supplied, leaving the reported improvements dependent on an unexamined assumption.

    Authors: We agree that reliable topic identification from the input prompt is a central precondition for the claimed robustness gains, and that the manuscript does not provide sufficient detail on this component. The current description assumes topic selection occurs but does not specify the procedure (e.g., embedding-based matching or a lightweight classifier), report accuracy, or include fallback logic. We will add a dedicated subsection describing the topic identification method, its implementation, accuracy metrics on standard topic classification benchmarks, handling for short/ambiguous/multi-topic prompts (including a default general-topic fallback), and new validation experiments measuring end-to-end watermark performance under these conditions. These additions will directly address the unexamined assumption. revision: yes

  2. Referee: [Abstract] Abstract: the assertion of 'experimental results across multiple LLMs and state-of-the-art benchmarks' that demonstrate comparable quality and improved robustness supplies no quantitative metrics, attack details, baseline comparisons, or methodology, rendering the performance claims unverifiable from the manuscript text.

    Authors: The abstract is written as a high-level summary per standard conventions. The full manuscript contains the requested details in the Experiments and Evaluation sections: quantitative metrics (perplexity, detection rates), explicit attack descriptions (paraphrasing via specific models and lexical substitutions), baseline comparisons (to prior watermarking schemes), and methodology (models, benchmarks, attack parameters). However, we acknowledge that the abstract could better signpost these results. We will revise the abstract to include concise references to key quantitative outcomes and direct readers to the relevant sections, improving verifiability without exceeding length constraints. revision: partial

Circularity Check

0 steps flagged

No circularity: method is an independent algorithmic proposal

The paper describes a new topic-guided watermarking scheme that partitions the vocabulary into topic-aligned subsets and selects green-list tokens from an input prompt. No equations, derivations, or self-citations are shown that reduce the claimed robustness or quality improvements to fitted parameters, self-definitions, or prior author results by construction. The approach is presented as a self-contained algorithmic contribution without load-bearing uniqueness theorems or ansatzes imported from the authors' own prior work. The topic-identification precondition is an assumption about the method's applicability rather than a circular step in any derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The method rests on the feasibility of topic-aligned vocabulary partitioning and prompt-driven subset selection; these are domain assumptions rather than new entities or fitted constants.

axioms (1)
  • domain assumption LLM generation can be biased toward topic-aligned token subsets without materially harming fluency or coherence
    Core premise of the green-listing step described in the abstract

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. The End of Trust: How Agentic AI Breaks Security Assumptions

    cs.CR 2026-05 unverdicted novelty 6.0

    Agentic AI eliminates the fidelity-scale tradeoff in deception, enabling the Infinite Impostor attack that hijacks trusted relationships at mass scale and requiring a shift to suspect-by-default security based on eval...

  2. Who Gets Flagged? The Pluralistic Evaluation Gap in AI Content Watermarking

    cs.CY 2026-04 conditional novelty 6.0

    AI content watermarking exhibits detection disparities across languages, cultures, and demographics due to content-dependent signal properties, with benchmarks failing to disaggregate performance and watermarking held...

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · cited by 2 Pith papers · 5 internal anchors

  3. [3]

    Association for the Advancement of Artificial Intelligence (AAAI) . 2025. AAAI Launches AI-Powered Peer Review Assessment System . https://aaai.org/aaai-launches-ai-powered-peer-review-assessment-system/. Accessed: 2025-08-25

  4. [4]

    Jainit Sushil Bafna, Hardik Mittal, Suyash Sethia, Manish Shrivastava, and Radhika Mamidi. 2024. https://arxiv.org/abs/2407.02978 Mast kalandar at semeval-2024 task 8: On the trail of textual origins: Roberta-bilstm approach to detect ai-generated text . Preprint, arXiv:2407.02978

  5. [5]

    Isabel Cachola, Kyle Lo, Arman Cohan, and Daniel Weld. 2020. https://doi.org/10.18653/v1/2020.findings-emnlp.428 TLDR: Extreme summarization of scientific documents. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4766--4777, Online. Association for Computational Linguistics

  6. [6]

    Canyu Chen and Kai Shu. 2024. Combating misinformation in the age of llms: Opportunities and challenges. AI Magazine, 45(3):354--368

  7. [7]

    Cleveland Clinic . 2025. Cleveland Clinic Announces Rollout of Ambience Healthcare's AI Platform . https://newsroom.clevelandclinic.org/2025/02/19/cleveland-clinic-announces-the-rollout-of-ambience-healthcares-ai-platform. Accessed: 2025-08-25

  8. [8]

    Sumanth Dathathri, Abigail See, Sumedh Ghaisas, Po-Sen Huang, Rob McAdam, Johannes Welbl, Vandana Bachani, Alex Kaskasoli, Robert Stanforth, Tatiana Matejovicova, and 1 others. 2024. Scalable watermarking for identifying large language model outputs. Nature, 634(8035):818--823

  9. [9]

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. https://arxiv.org/abs/1810.04805 Bert: Pre-training of deep bidirectional transformers for language understanding . Preprint, arXiv:1810.04805

  10. [10]

    Christiane Fellbaum. 1998. WordNet: An electronic lexical database. MIT press

  11. [11]

    Xiaoyan Feng, He Zhang, Yanjun Zhang, Leo Yu Zhang, and Shirui Pan. 2025. https://openreview.net/forum?id=Zvyb3WAg03 Bimark: Unbiased multilayer watermarking for large language models . In Forty-second International Conference on Machine Learning

  12. [12]

    Google . 2024. Introducing gemini 2.0: Our new ai model for the agentic era. https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/. Accessed: 2025-07-29

  13. [13]

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, and 1 others. 2024. The llama 3 herd of models. arXiv preprint arXiv:2407.21783

  14. [14]

    Maarten Grootendorst. 2020. Keybert: Minimal keyword extraction with bert. https://maartengr.github.io/KeyBERT/. Accessed: 2025‑07‑29

  15. [15]

    Abe Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hongwei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, and Yulia Tsvetkov. 2024a. https://doi.org/10.18653/v1/2024.naacl-long.226 SemStamp: A semantic watermark with paraphrastic robustness for text generation. In Proceedings of the 2024 Conference of the North American Ch...

  16. [16]

    Abe Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, and Tianxing He. 2024b. https://doi.org/10.18653/v1/2024.findings-acl.98 k-SemStamp: A clustering-based semantic watermark for detection of machine-generated text. In Findings of the Association for Computational Linguistics: ACL 2024, pages 1706--1715, Bangkok, Thailand. Association for Computat...

  17. [17]

    Hugging Face . 2025. Hugging face - the ai community building the future. https://huggingface.co. Accessed: 2025-08-26

  18. [18]

    Mingjia Huo, Sai Ashish Somayajula, Youwei Liang, Ruisi Zhang, Farinaz Koushanfar, and Pengtao Xie. 2024. Token-specific watermarking with enhanced detectability and semantic coherence for large language models. In Proceedings of the 41st International Conference on Machine Learning, ICML'24. JMLR.org

  19. [19]

    Niful Islam, Debopom Sutradhar, Humaira Noor, Jarin Tasnim Raya, Monowara Tabassum Maisha, and Dewan Md Farid. 2023. https://arxiv.org/abs/2306.01761 Distinguishing human generated text from chatgpt generated text using machine learning . Preprint, arXiv:2306.01761

  20. [20]

    Nikola Jovanović, Robin Staab, and Martin Vechev. 2024. Watermark stealing in large language models. In Proceedings of the 41st International Conference on Machine Learning, ICML'24. JMLR.org

  21. [21]

    John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. 2023. A watermark for large language models. In International Conference on Machine Learning, pages 17061--17084. PMLR

  22. [22]

    Kalpesh Krishna. 2023. ai-detection-paraphrases. https://github.com/martiansideofthemoon/ai-detection-paraphrasesv . Accessed: 2025-08-01

  23. [23]

    Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, and Mohit Iyyer. 2023. Paraphrasing evades detectors of ai-generated text, but retrieval is an effective defense. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS '23, Red Hook, NY, USA. Curran Associates Inc

  24. [24]

    Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. 2024. https://arxiv.org/abs/2307.15593 Robust distortion-free watermarks for language models . Preprint, arXiv:2307.15593

  25. [25]

    Jooyoung Lee, Thai Le, Jinghui Chen, and Dongwon Lee. 2023. Do language models plagiarize? In Proceedings of the ACM Web Conference 2023, pages 3637--3647

  26. [26]

    Taehyun Lee, Seokhee Hong, Jaewoo Ahn, Ilgee Hong, Hwaran Lee, Sangdoo Yun, Jamin Shin, and Gunhee Kim. 2024. https://doi.org/10.18653/v1/2024.acl-long.268 Who wrote this code? watermarking for code generation . In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4890--4911, Bangkok, Th...

  27. [27]

    David D Lewis, Yiming Yang, Tony G Rose, and Fan Li. 2004. Rcv1: A new benchmark collection for text categorization research. Journal of machine learning research, 5(Apr):361--397

  28. [28]

    Qian Li, Hao Peng, Jianxin Li, Congying Xia, Renyu Yang, Lichao Sun, Philip S. Yu, and Lifang He. 2021. https://arxiv.org/abs/2008.00364 A survey on text classification: From shallow to deep learning . Preprint, arXiv:2008.00364

  29. [29]

    Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu, and James Zou. 2023. Gpt detectors are biased against non-native english writers. Patterns, 4(7)

  30. [30]

    Aiwei Liu, Leyi Pan, Xuming Hu, Shiao Meng, and Lijie Wen. 2024. https://arxiv.org/abs/2310.06356 A semantic invariant robust watermark for large language models . Preprint, arXiv:2310.06356

  31. [31]

    Yepeng Liu and Yuheng Bu. 2024. Adaptive text watermark for large language models. In Proceedings of the 41st International Conference on Machine Learning, ICML'24. JMLR.org

  32. [32]

    Minjia Mao, Dongjun Wei, Zeyu Chen, Xiao Fang, and Michael Chau. 2025. https://arxiv.org/abs/2405.14604 Watermarking low-entropy generation for large language models: An unbiased and low-risk method . Preprint, arXiv:2405.14604

  33. [33]

    Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D Manning, and Chelsea Finn. 2023. Detectgpt: Zero-shot machine-generated text detection using probability curvature. In International conference on machine learning, pages 24950--24962. PMLR

  34. [34]

    Felix B Mueller, Rebekka Görge, Anna K Bernzen, Janna C Pirk, and Maximilian Poretschkin. 2024. Llms and memorization: On quality and specificity of copyright compliance. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, volume 7, pages 984--996

  35. [35]

    Alexander Nemecek, Yuzhou Jiang, and Erman Ayday. 2025. The feasibility of topic-based watermarking on academic peer reviews. arXiv preprint arXiv:2505.21636

  36. [36]

    Newsweek . 2025. World's Best Hospitals 2025 - United States of America . https://rankings.newsweek.com/worlds-best-hospitals-2025/united-states-america. Accessed: 2025-08-25

  37. [37]

    Georg Niess and Roman Kern. 2025. https://aclanthology.org/2025.acl-long.145/ Ensemble watermarks for large language models . In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2903--2916, Vienna, Austria. Association for Computational Linguistics

  38. [38]

    OpenAI . 2022. Introducing chatgpt. https://openai.com/index/chatgpt/. Accessed: 2025-07-29

  39. [39]

    OpenAI . 2023. New ai classifier for indicating ai‑written text. https://openai.com/index/new-ai-classifier-for-indicating-ai-written-text. Accessed: 2025-07-29

  40. [40]

    Leyi Pan, Aiwei Liu, Zhiwei He, Zitian Gao, Xuandong Zhao, Yijian Lu, Binglin Zhou, Shuliang Liu, Xuming Hu, Lijie Wen, Irwin King, and Philip S. Yu. 2024. https://doi.org/10.18653/v1/2024.emnlp-demo.7 MarkLLM: An open-source toolkit for LLM watermarking. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System...

  41. [41]

    Wenjie Qu, Wengrui Zheng, Tianyang Tao, Dong Yin, Yanze Jiang, Zhihua Tian, Wei Zou, Jinyuan Jia, and Jiaheng Zhang. 2025. https://arxiv.org/abs/2401.16820 Provably robust multi-bit watermarking for ai-generated text . Preprint, arXiv:2401.16820

  42. [42]

    Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(1)

  43. [43]

    Nils Reimers and Iryna Gurevych. 2020. https://doi.org/10.18653/v1/2020.emnlp-main.365 Making monolingual sentence embeddings multilingual using knowledge distillation . In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4512--4525, Online. Association for Computational Linguistics

  44. [44]

    Ryoma Sato, Yuki Takezawa, Han Bao, Kenta Niwa, and Makoto Yamada. 2023. https://arxiv.org/abs/2310.08920 Embarrassingly simple text watermarks . Preprint, arXiv:2310.08920

  45. [45]

    Scott Aaronson . 2023. Watermarking of large language models. https://simons.berkeley.edu/talks/scott-aaronson-ut-austin-openai-2023-08-17. Accessed: 2025‑07‑29

  46. [46]

    Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Nicolas Papernot, Ross Anderson, and Yarin Gal. 2024. Ai models collapse when trained on recursively generated data. Nature, 631(8022):755--759

  47. [47]

    SythID-Team . 2024. https://deepmind.google/discover/blog/watermarking-ai-generated-text-and-video-with-synthid/ Watermarking ai-generated text and video with synthid . Accessed: 2025-08-26

  48. [48]

    Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, and 1 others. 2024. Gemma: Open models based on gemini research and technology. arXiv preprint arXiv:2403.08295

  49. [49]

    Shangqing Tu, Yuliang Sun, Yushi Bai, Jifan Yu, Lei Hou, and Juanzi Li. 2024. https://doi.org/10.18653/v1/2024.acl-long.83 WaterBench: Towards holistic evaluation of watermarks for large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1517--1542, Bangkok, Thaila...

  50. [50]

    Yihan Wu, Zhengmian Hu, Junfeng Guo, Hongyang Zhang, and Heng Huang. 2024. A resilient and accessible distribution-preserving watermark for large language models. In Proceedings of the 41st International Conference on Machine Learning, ICML'24. JMLR.org

  51. [51]

    Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter J. Liu. 2020a. Pegasus: pre-training with extracted gap-sentences for abstractive summarization. In Proceedings of the 37th International Conference on Machine Learning, ICML'20. JMLR.org

  52. [52]

    Ruisi Zhang, Shehzeen Samarah Hussain, Paarth Neekhara, and Farinaz Koushanfar. 2024. Remark-llm: a robust and efficient watermarking framework for generative large language models. In Proceedings of the 33rd USENIX Conference on Security Symposium, SEC '24, USA. USENIX Association

  53. [53]

    Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, and Luke Zettlemoyer. 2022. https://arxiv.org/abs/2205.01068 Opt: Open pre-trained transformer language...

  54. [54]

    BERTScore: Evaluating Text Generation with BERT

    Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2020b. https://arxiv.org/abs/1904.09675 Bertscore: Evaluating text generation with bert. Preprint, arXiv:1904.09675

  55. [55]

    Xuandong Zhao, Prabhanjan Vijendra Ananth, Lei Li, and Yu-Xiang Wang. 2024. https://openreview.net/forum?id=SsmT8aO45L Provable robust watermarking for AI-generated text. In The Twelfth International Conference on Learning Representations