Robust LLM Watermarking with Minimal Semantic Distortion for IP Protection

Kieu Dang; NHatHai Phan; Phung Lai; Ruoming Jin; Yelong Shen

arxiv: 2605.23175 · v1 · pith:J253TILLnew · submitted 2026-05-22 · 💻 cs.CR · cs.CL

Robust LLM Watermarking with Minimal Semantic Distortion for IP Protection

Kieu Dang , Phung Lai , NhatHai Phan , Yelong Shen , Ruoming Jin This is my paper

Pith reviewed 2026-05-25 04:32 UTC · model grok-4.3

classification 💻 cs.CR cs.CL

keywords LLM watermarkingIP protectionsemantic preservationkey-conditioned samplingcontrastive detectionrobustnesstournament samplingnamed entity preservation

0 comments

The pith

SAFESEAL watermarks LLM outputs by replacing words with key-selected synonyms while keeping named entities and facts intact for IP verification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SAFESEAL as a key-conditioned watermarking method for proprietary LLMs to let owners detect if their model outputs were used to train a surrogate. It replaces linguistic terms with context-aware synonyms chosen via tournament sampling conditioned on a secret key, while explicitly preserving named entities. Detection uses a contrastive encoder that takes both the text and the key to confirm the watermark in a provider-specific way. The approach claims to maintain high semantic fidelity, with reported BERTScore of 0.983 and 98.2 percent detection, plus lower latency than prior methods. A sympathetic reader would care because existing watermarks often distort meaning or fail against attacks, limiting their value for protecting model IP.

Core claim

SAFESEAL is a novel key-conditioned watermarking framework that achieves strong detectability with minimal impact on model utility by preserving named entities while substituting linguistic terms with context-aware synonyms through a key-conditioned Tournament sampling mechanism, maintaining semantic fidelity and factual consistency. For detection, it introduces a key-conditioned contrastive detector that jointly encodes the text and key, enabling provider-specific and robust watermark verification. Theoretical bounds on the utility-detectability trade-off are derived and latency is reduced through lightweight models, batching, and parallelism.

What carries the argument

key-conditioned Tournament sampling mechanism that selects context-aware synonyms to embed the watermark while preserving named entities

If this is right

Theoretical bounds are derived for the utility-detectability trade-off.
Latency is reduced to levels comparable to the fastest baseline via lightweight models, batching, and parallelism.
Provider-specific detection works in cross-provider and multi-user scenarios.
The method outperforms baselines on utility, detectability, and robustness metrics including 0.983 BERTScore and 98.2 percent detection rate.
A public leaderboard and interactive demo are released to standardize future comparisons.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The synonym-substitution approach could be tested on code generation or summarization tasks to check if the same preservation properties hold.
Stronger adversarial paraphrasing attacks beyond those evaluated could still remove the watermark signal.
Embedding the watermark at API level would allow automatic ownership checks on any downstream use of the outputs.
The released leaderboard may drive standardized benchmarks that include multi-provider key collision tests.

Load-bearing premise

The tournament sampling will keep producing synonyms that preserve factual consistency and named entities across diverse real-world prompts without creating artifacts an adversary can exploit.

What would settle it

An evaluation on a held-out domain showing that watermarked outputs have measurably lower factual accuracy or that fine-tuning a surrogate model removes the detectable signal while retaining task performance.

Figures

Figures reproduced from arXiv: 2605.23175 by Kieu Dang, NHatHai Phan, Phung Lai, Ruoming Jin, Yelong Shen.

**Figure 2.** Figure 2: Impact of similarity threshold δ, watermarkable set size |T wm|, and correlation with text length on LLaMA-2. (Solid lines show expected deviations; shaded areas are two bounds) Theorem 4.1. For a similarity threshold δ, the expected deviation between y and y wm is bounded: dlb |T wm| ≤ E ∆(y, ywm) ≤ dub |T wm| , (6) where |T wm| is the number of watermarkable tokens in y (controlled by δ), and dlb and… view at source ↗

**Figure 3.** Figure 3: Utility and detectability performance on text generation and summarization. [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: AMT results on text generation, removal attacks, and detection on the attacks (LLaMA-2). [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Watermark leaderboard and interactive demo interface. [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: ROC curves for watermark detection in text generation under attack-free settings. [PITH_FULL_IMAGE:figures/full_fig_p019_6.png] view at source ↗

**Figure 7.** Figure 7: Performance of text generation and summarization on Mistral. [PITH_FULL_IMAGE:figures/full_fig_p020_7.png] view at source ↗

**Figure 8.** Figure 8: Preference of SAFESEAL over other watermarks in head-to-head human evaluation with different text length. outperforming all baselines. On Mistral, it remains highly competitive with an AUC of 0.9937, closely matching the best-performing methods. By contrast, SynthID performs much worse on both models (AUC ≈ 0.59), suggesting limited discriminative capability in this 200-token generation setting. Overall, t… view at source ↗

**Figure 9.** Figure 9: Impact of lookup similarity threshold δ, watermarkable set size |T wm|, and correlation with text length for Mistral [PITH_FULL_IMAGE:figures/full_fig_p021_9.png] view at source ↗

**Figure 10.** Figure 10: Detection rates across detectors. already weak on short-text tasks, their detection drops to near 0 – 1% when a surrogate reproduces outputs. This further highlights the robustness of SAFESEAL across both LLMs. D.10 Justification for Q7. Cross-provider Performance [PITH_FULL_IMAGE:figures/full_fig_p023_10.png] view at source ↗

**Figure 11.** Figure 11: ATM template for an example evaluation. Workers select the best watermarked output based on the following four criteria: (1) relevance to the original LLM output, assessing how well the watermarked version retains the original semantic meaning; (2) grammatical correctness, evaluating fluency and adherence to standard grammar rules; (3) factual consistency, ensuring that the content remains and align well … view at source ↗

read the original abstract

Proprietary large language models (LLMs) face risks of intellectual property (IP) violation, as adversaries can replicate an LLM by collecting input-output pairs to train a surrogate model, causing financial setbacks. Watermarks offer a promising defense to verify ownership, but existing methods often struggle with semantic distortion, factual inconsistency, and adversarial attacks. In addition, key-conditioned watermarks for provider-specific detection, especially in cross-provider and multi-user scenarios, remain largely underexplored. To address these challenges, we propose SAFESEAL, a novel key-conditioned watermarking framework that achieves strong detectability with minimal impact on model utility, effectively balancing detectability, utility, and robustness. SAFESEAL preserves named entities while substituting linguistic terms with context-aware synonyms through a key-conditioned Tournament sampling mechanism, maintaining semantic fidelity and factual consistency. For detection, we introduce a key-conditioned contrastive detector that jointly encodes the text and key, enabling provider-specific and robust watermark verification. We derive theoretical bounds on the utility-detectability trade-off and significantly reduce latency through lightweight models, batching, and parallelism. Extensive experiments show that SAFESEAL outperforms baselines in utility, detectability, and robustness, achieving a BERTScore of 0.983, entity similarity of 0.963, a 98.2% detection rate, and the highest human ratings for text quality and content preservation, with latency comparable to the fastest baseline. To promote transparency and community-driven progress, we release the first public watermark leaderboard and an interactive demo.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SAFESEAL's key-conditioned tournament sampling plus contrastive detector is a practical incremental step with solid reported metrics on semantic preservation and detection.

read the letter

The main thing to know is that SAFESEAL conditions both synonym replacement and detection on a secret key. It uses tournament sampling to pick context-aware synonyms while protecting named entities, then feeds text and key into a joint contrastive detector for provider-specific checks. That framing is not in the prior work they cite, and it directly targets the multi-user and cross-provider cases they flag in the abstract. The experiments back the balance they claim: BERTScore at 0.983, entity similarity at 0.963, 98.2 percent detection, top human ratings on quality, and latency on par with the fastest baselines. They also release a public leaderboard and demo, which is a useful concrete addition. Theoretical bounds on the utility-detectability trade-off are included, and the stress-test found no internal contradictions in the full manuscript. The soft spots are limited. The key assumption is that the sampled synonyms will not create exploitable artifacts or factual slips across real prompts and domains. The robustness tests address some attacks, but they may not exhaust every possible stripping method or domain shift. Details on how the bounds are derived from the sampling process would help, though this is not a load-bearing gap. This paper is for people working on LLM IP protection and watermarking methods. A reader who needs a concrete, evaluated technique plus public resources would get value from it. It deserves a serious referee because the empirical results are competitive and the approach adds a workable angle to the existing literature.

Referee Report

2 major / 1 minor

Summary. The paper proposes SAFESEAL, a key-conditioned LLM watermarking framework for IP protection. It employs tournament sampling (key-conditioned) to replace terms with context-aware synonyms while preserving named entities and factual consistency, paired with a key-conditioned contrastive detector for provider-specific verification. Claims include strong empirical performance (BERTScore 0.983, entity similarity 0.963, 98.2% detection rate, highest human ratings), theoretical bounds on the utility-detectability trade-off, reduced latency via lightweight models/batching, robustness to attacks, and release of a public watermark leaderboard plus demo.

Significance. If the reported metrics and bounds hold under proper controls, the work offers a practical advance in balancing detectability, utility, and robustness for LLM watermarking, particularly for cross-provider and multi-user settings. The open leaderboard and demo are constructive contributions to the field.

major comments (2)

[Experiments] Experiments section: The manuscript reports headline metrics (BERTScore 0.983, entity similarity 0.963, 98.2% detection rate) and outperformance over baselines but provides no details on experimental controls, number of prompts/domains tested, error bars, statistical significance tests, or post-hoc selection criteria, which is load-bearing for the central empirical claims.
[Theoretical Analysis] Theoretical Analysis section: The abstract states that theoretical bounds on the utility-detectability trade-off are derived, yet no equations, assumptions, or derivation steps are referenced or shown in the summary material, preventing assessment of whether the bounds are non-vacuous or independent of fitted parameters.

minor comments (1)

[Abstract] Abstract: The claim of 'latency comparable to the fastest baseline' lacks specific numerical values or table references for direct comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We address the two major comments point by point below.

read point-by-point responses

Referee: [Experiments] Experiments section: The manuscript reports headline metrics (BERTScore 0.983, entity similarity 0.963, 98.2% detection rate) and outperformance over baselines but provides no details on experimental controls, number of prompts/domains tested, error bars, statistical significance tests, or post-hoc selection criteria, which is load-bearing for the central empirical claims.

Authors: We agree that the current presentation of the Experiments section lacks sufficient detail on these aspects. In the revised manuscript we will expand the section to report the number of prompts and domains evaluated, the experimental controls employed, error bars computed over multiple independent runs, the results of statistical significance tests, and the criteria used for any post-hoc analysis. revision: yes
Referee: [Theoretical Analysis] Theoretical Analysis section: The abstract states that theoretical bounds on the utility-detectability trade-off are derived, yet no equations, assumptions, or derivation steps are referenced or shown in the summary material, preventing assessment of whether the bounds are non-vacuous or independent of fitted parameters.

Authors: The Theoretical Analysis section of the manuscript contains the derivation of the bounds. To make the material self-contained and allow direct assessment, we will revise the section to explicitly state all modeling assumptions, present the key equations, and include the complete derivation steps, confirming that the bounds follow from information-theoretic arguments without dependence on fitted parameters. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The manuscript presents SAFESEAL as an empirical proposal: a key-conditioned tournament sampling mechanism for watermark insertion and a contrastive detector, supported by experimental metrics (BERTScore 0.983, 98.2% detection) and stated theoretical bounds on the utility-detectability trade-off. No equations, derivations, or self-citations are shown that reduce the reported performance numbers, detection rates, or bounds to quantities defined by fitted parameters from the same experiments or to prior self-citations by construction. The central claims rest on the proposed mechanisms and external experimental validation rather than any of the enumerated circular patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the method implicitly relies on standard LLM generation assumptions and synonym availability but these are not enumerated.

pith-pipeline@v0.9.0 · 5812 in / 1235 out tokens · 44430 ms · 2026-05-25T04:32:04.207603+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

key-conditioned Tournament sampling... Pi(tj) = exp(α Sij) / sum... gr(tj) = PRF(k, h, tj, r)
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 4.1... dlb/|Twm| ≤ E[Δ(y,ywm)] ≤ dub/|Twm|

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

113 extracted references · 113 canonical work pages

[1]

Downstream trade-offs of a family of text watermarks

Anirudh Ajith, Sameer Singh, and Danish Pruthi. Downstream trade-offs of a family of text watermarks. InEMNLP, 2024

work page 2024
[2]

ms/ AzureMLModelInterpretability, 2021

Azure.https: // aka. ms/ AzureMLModelInterpretability, 2021

work page 2021
[3]

Cross-attention watermarking of large language models

Folco Bertini Baldassini, Huy H Nguyen, Ching-Chung Chang, and Isao Echizen. Cross-attention watermarking of large language models. InICASSP, 2024

work page 2024
[4]

Deepseek llm: Scaling open-source language models with longtermism.arXiv, 2024

Xiao Bi, Deli Chen, Guanting Chen, et al. Deepseek llm: Scaling open-source language models with longtermism.arXiv, 2024. unpublished

work page 2024
[5]

Model leeching: an extraction attack targeting llms.CAMLIS, 2023

Lewis Birch, William Hackett, Stefan Trawicki, Neeraj Suri, and Peter Garraghan. Model leeching: an extraction attack targeting llms.CAMLIS, 2023

work page 2023
[6]

O’Reilly Media, Inc

Steven Bird, Ewan Klein, and Edward Loper.Natural language processing with Python: analyzing text with the natural language toolkit. " O’Reilly Media, Inc.", 2009

work page 2009
[7]

Ai explainability 360.Available athttps: // aix360

Bluemix. Ai explainability 360.Available athttps: // aix360. mybluemix. net/, 2021

work page 2021
[8]

Stealing part of a production language model

Nicholas Carlini, Daniel Paleka, et al. Stealing part of a production language model. InICML, 2024

work page 2024
[9]

Evaluation of text generation: A survey.arXiv, 2020

Asli Celikyilmaz, Elizabeth Clark, et al. Evaluation of text generation: A survey.arXiv, 2020

work page 2020
[10]

PostMark: A robust blackbox watermark for LLMs

Yapei Chang et al. PostMark: A robust blackbox watermark for LLMs. InEMNLP, 2024

work page 2024
[11]

Danqi Chen, Jason Bolton, and Christopher D. Manning. A thorough examination of the CNN/Daily Mail reading comprehension task. InACL, 2016

work page 2016
[12]

Watme: Towards lossless watermarking through lexical redundancy

Liang Chen, Yatao Bian, Yang Deng, Deng Cai, Shuaiyi Li, Peilin Zhao, and Kam-Fai Wong. Watme: Towards lossless watermarking through lexical redundancy. InACL, pages 9166–9180, 2024

work page 2024
[13]

Improved unbiased watermark for large language models

Ruibo Chen, Yihan Wu, Junfeng Guo, and Heng Huang. Improved unbiased watermark for large language models. InACL, pages 20587–20601, July 2025. ISBN 979-8-89176-251-0

work page 2025
[14]

Revealing weaknesses in text watermark- ing through self-information rewrite attacks

Yixin Cheng, Hongcheng Guo, Yangming Li, and Leonid Sigal. Revealing weaknesses in text watermark- ing through self-information rewrite attacks. InICML, 2025

work page 2025
[15]

Undetectable watermarks for language models

Miranda Christ, Sam Gunn, and Or Zamir. Undetectable watermarks for language models. InThe Thirty Seventh Annual Conference on Learning Theory, pages 1125–1139. PMLR, 2024

work page 2024
[16]

δ-steal: Llm stealing attack with local differential privacy

Kieu Dang, Phung Lai, NhatHai Phan, Yelong Shen, Ruoming Jin, and Abdallah Khreishah. δ-steal: Llm stealing attack with local differential privacy. InACML, 2025. in press

work page 2025
[17]

Watermarking language models through language models.IEEE Transactions on Artificial Intelligence, 2025

Agnibh Dasgupta, Abdullah All Tanvir, and Xin Zhong. Watermarking language models through language models.IEEE Transactions on Artificial Intelligence, 2025

work page 2025
[18]

Scalable watermarking for identifying large language model outputs.Nature, 634(8035):818–823, 2024

Sumanth Dathathri, Abigail See, Sumedh Ghaisas, Po-Sen Huang, Rob McAdam, Johannes Welbl, et al. Scalable watermarking for identifying large language model outputs.Nature, 634(8035):818–823, 2024

work page 2024
[19]

Bert: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InACL, 2019

work page 2019
[20]

Documenting large webtext corpora: A case study on the colossal clean crawled corpus

Jesse Dodge, Maarten Sap, Ana Marasovi´c, William Agnew, Gabriel Ilharco, et al. Documenting large webtext corpora: A case study on the colossal clean crawled corpus. InEMNLP, 2021

work page 2021
[21]

Sandcastles in the storm: Revisiting the (im) possibility of strong watermarking

Boran Erol, Connor Choi, Jason Liu, Gary Jiarui Song, Nanyun Peng, Amit Sahai, et al. Sandcastles in the storm: Revisiting the (im) possibility of strong watermarking. InACL, pages 29698–29735, 2025

work page 2025
[22]

An introduction to roc analysis.Pattern Recognition Letters, 27(8):861–874, 2006

Tom Fawcett. An introduction to roc analysis.Pattern Recognition Letters, 27(8):861–874, 2006

work page 2006
[23]

Gumbelsoft: Diversified LM watermarking via the gumbelmax-trick

Jiayi Fu et al. Gumbelsoft: Diversified LM watermarking via the gumbelmax-trick. InACL, 2024

work page 2024
[24]

Watermax: breaking the llm watermark detectability-robustness-quality trade-off

Eva Giboulot and Furon Teddy. Watermax: breaking the llm watermark detectability-robustness-quality trade-off. InNeurIPS, 2024

work page 2024
[25]

Edit distance robust watermarks for LMs

Noah Golowich and Ankur Moitra. Edit distance robust watermarks for LMs. InNeurIPS, 2024

work page 2024
[26]

Google gemini

Google. Google gemini. https://bard.google.com/chat/, 2024. 11

work page 2024
[27]

On the learnability of watermarks for LMs.ICLR, 2024

Chenchen Gu, Xiang Lisa Li, et al. On the learnability of watermarks for LMs.ICLR, 2024

work page 2024
[28]

Watermarking pre-trained language models with backdooring.arXiv, 2022

Chenxi Gu et al. Watermarking pre-trained language models with backdooring.arXiv, 2022

work page 2022
[29]

Post-hoc watermarking for robust detection in text generated by LLMs

Jifei Hao et al. Post-hoc watermarking for robust detection in text generated by LLMs. InICCL, 2025

work page 2025
[30]

Deberta: Decoding-enhanced bert with disentangled attention

Pengcheng He et al. Deberta: Decoding-enhanced bert with disentangled attention. InICLR, 2021

work page 2021
[31]

Protecting intellectual property of language generation apis with lexical watermark

Xuanli He, Qiongkai Xu, Lingjuan Lyu, Fangzhao Wu, and Chenguang Wang. Protecting intellectual property of language generation apis with lexical watermark. InAAAI, 2022

work page 2022
[32]

Cater: Intellectual property protection on text generation apis via conditional watermarks.NeurIPS, 2022

Xuanli He, Qiongkai Xu, Yi Zeng, Lingjuan Lyu, Fangzhao Wu, Jiwei Li, and Ruoxi Jia. Cater: Intellectual property protection on text generation apis via conditional watermarks.NeurIPS, 2022

work page 2022
[33]

Measuring massive multitask language understanding

Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding. InICLR, 2021

work page 2021
[34]

spaCy: Industrial-strength NLP in python

Matthew Honnibal, Ines Montani, et al. spaCy: Industrial-strength NLP in python. 2020

work page 2020
[35]

Semstamp: A semantic watermark with paraphrastic robustness for text generation

Abe Hou, Jingyu Zhang, Tianxing He, Yichen Wang, et al. Semstamp: A semantic watermark with paraphrastic robustness for text generation. InNAACL, pages 4067–4082, 2024

work page 2024
[36]

k-semstamp: A clustering- based semantic watermark for detection of machine-generated text

Abe Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, and Tianxing He. k-semstamp: A clustering- based semantic watermark for detection of machine-generated text. InACL, pages 1706–1715, 2024

work page 2024
[37]

Unbiased watermark for LLMs

Zhengmian Hu, Lichang Chen, Xidong Wu, et al. Unbiased watermark for LLMs. InICLR, 2024

work page 2024
[38]

WaterPool: A LM watermark mitigating trade-offs among impercepti- bility, efficacy and robustness

Baizhou Huang and Xiaojun Wan. WaterPool: A LM watermark mitigating trade-offs among impercepti- bility, efficacy and robustness. InNAACL: HLT, 2025

work page 2025
[39]

Token-specific watermarking with enhanced detectability and semantic coherence for LLMs

Mingjia Huo, Sai Ashish Somayajula, Youwei Liang, Ruisi Zhang, et al. Token-specific watermarking with enhanced detectability and semantic coherence for LLMs. InICML, 2024

work page 2024
[40]

Scaling up visual and vision-language representation learning with noisy text supervision

Chao Jia et al. Scaling up visual and vision-language representation learning with noisy text supervision. InICML, 2021

work page 2021
[41]

Entangled watermarks as a defense against model extraction

Hengrui Jia et al. Entangled watermarks as a defense against model extraction. InUSENIX, 2021

work page 2021
[42]

Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, et al

Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, et al. Mistral 7b, 2023. unpublished

work page 2023
[43]

Watermark stealing in LLMs

Nikola Jovanovi ´c, Robin Staab, and Martin Vechev. Watermark stealing in LLMs. InICML, 2024

work page 2024
[44]

Prada: protecting against dnn model stealing attacks

Mika Juuti, Sebastian Szyller, Samuel Marchal, and N Asokan. Prada: protecting against dnn model stealing attacks. InEuroS&P, pages 512–527. IEEE, 2019

work page 2019
[45]

A watermark for large language models

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. InICML, 2023

work page 2023
[46]

Thieves on sesame street! model extraction of bert-based apis.ICLR, 2020

Kalpesh Krishna, Gaurav Singh Tomar, Ankur P Parikh, Nicolas Papernot, and Mohit Iyyer. Thieves on sesame street! model extraction of bert-based apis.ICLR, 2020

work page 2020
[47]

Paraphrasing evades detectors of ai-generated text, but retrieval is an effective defense.NeurIPS, 36:27469–27500, 2023

Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, and Mohit Iyyer. Paraphrasing evades detectors of ai-generated text, but retrieval is an effective defense.NeurIPS, 36:27469–27500, 2023

work page 2023
[48]

Robust distortion-free watermarks for LMs.TMLR, 2023

Rohith Kuditipudi, John Thickstun, et al. Robust distortion-free watermarks for LMs.TMLR, 2023

work page 2023
[49]

Kwon et al

W. Kwon et al. Efficient memory management for LLM serving with pagedattention. InSOSP, 2023

work page 2023
[50]

Who wrote this code? watermarking for code generation

Taehyun Lee, Seokhee Hong, Jaewoo Ahn, Ilgee Hong, Hwaran Lee, Sangdoo Yun, Jamin Shin, and Gunhee Kim. Who wrote this code? watermarking for code generation. InACL, 2024

work page 2024
[51]

Binary codes capable of correcting deletions, insertions, and reversals.Soviet Physics Doklady, 10(8):707–710, 1966

Vladimir I Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals.Soviet Physics Doklady, 10(8):707–710, 1966

work page 1966
[52]

Plmmark: a secure and robust black-box watermarking framework for pre-trained language model

Peixuan Li, Pengzhou Cheng, Fangqi Li, Wei Du, Haodong Zhao, and Gongshen Liu. Plmmark: a secure and robust black-box watermarking framework for pre-trained language model. InAAAI, 2023

work page 2023
[53]

Protecting intellectual property of large language model-based code generation apis via watermarks

Zongjie Li, Chaozheng Wang, Shuai Wang, and Cuiyun Gao. Protecting intellectual property of large language model-based code generation apis via watermarks. InACM SIGSAC, pages 2336–2350, 2023. 12

work page 2023
[54]

Towards document-level paraphrase generation with sentence rewriting and reordering

Zhe Lin, Yitao Cai, and Xiaojun Wan. Towards document-level paraphrase generation with sentence rewriting and reordering. InEMNLP, 2021

work page 2021
[55]

An unforgeable publicly verifiable watermark for LLMs

Aiwei Liu, Leyi Pan, et al. An unforgeable publicly verifiable watermark for LLMs. InICLR, 2023

work page 2023
[56]

A semantic invariant robust watermark for LLMs.ICLR, 2024

Aiwei Liu, Leyi Pan, Xuming Hu, et al. A semantic invariant robust watermark for LLMs.ICLR, 2024

work page 2024
[57]

Deepseek-v2: A strong, economical, and efficient mixture-of-experts LM.arXiv, 2024

Aixin Liu et al. Deepseek-v2: A strong, economical, and efficient mixture-of-experts LM.arXiv, 2024

work page 2024
[58]

Adaptive text watermark for large language models

Yepeng Liu and Yuheng Bu. Adaptive text watermark for large language models. InICML, 2024

work page 2024
[59]

Roberta: A robustly optimized bert pretraining approach.arXiv, 2019

Yinhan Liu, Myle Ott, et al. Roberta: A robustly optimized bert pretraining approach.arXiv, 2019

work page 2019
[60]

Nltk: The natural language toolkit.arXiv, 2002

Edward Loper and Steven Bird. Nltk: The natural language toolkit.arXiv, 2002. unpublished

work page 2002
[61]

A watermark for low-entropy and unbiased generation in large language models.arXiv, 2024

Minjia Mao, Dongjun Wei, Zeyu Chen, Xiao Fang, and Michael Chau. A watermark for low-entropy and unbiased generation in large language models.arXiv, 2024

work page 2024
[62]

Natural language watermarking via morphosyntactic alterations.Computer Speech & Language, 23(1):107–125, 2009

Hasan Mesut Meral, Bülent Sankur, A Sumru Özsoy, Tunga Güngör, and Emre Sevinç. Natural language watermarking via morphosyntactic alterations.Computer Speech & Language, 23(1):107–125, 2009

work page 2009
[63]

Wordnet: a lexical database for english.ACM Communications, 38(11):39–41, 1995

George A Miller. Wordnet: a lexical database for english.ACM Communications, 38(11):39–41, 1995

work page 1995
[64]

Molenda et al

P. Molenda et al. Waterjudge: Quality-detection trade-off when watermarking LLMs. InNAACL, 2024

work page 2024
[65]

Deeptextmark: a deep learning-driven text watermarking approach for identifying LLM generated text.Ieee Access, 12:40508–40520, 2024

Travis Munyer, Abdullah All Tanvir, Arjon Das, and Xin Zhong. Deeptextmark: a deep learning-driven text watermarking approach for identifying LLM generated text.Ieee Access, 12:40508–40520, 2024

work page 2024
[66]

A survey of named entity recognition and classification.Lingvisticae Investigationes, 2007

David Nadeau and Satoshi Sekine. A survey of named entity recognition and classification.Lingvisticae Investigationes, 2007

work page 2007
[67]

Entity-level factual consistency of abstractive text summarization

Feng Nan et al. Entity-level factual consistency of abstractive text summarization. InACL, 2021

work page 2021
[68]

OpenAI, 2024.https://openai.com/index/openai-api/[Accessed: 2024-09-20]

work page 2024
[69]

Markllm: An open-source toolkit for llm watermarking

Leyi Pan, Aiwei Liu, Zhiwei He, Zitian Gao, Xuandong Zhao, Yijian Lu, Binglin Zhou, et al. Markllm: An open-source toolkit for llm watermarking. InEMNLP, pages 61–71, 2024

work page 2024
[70]

Modelshield: Adaptive and robust watermark against model extraction attack.TIFS, 2025

Kaiyi Pang, Tao Qi, Chuhan Wu, Minhao Bai, Minghu Jiang, and Yongfeng Huang. Modelshield: Adaptive and robust watermark against model extraction attack.TIFS, 2025

work page 2025
[71]

No free lunch in llm watermarking: Trade-offs in watermarking design choices.NeurIPS, 2024

Qi Pang, Shengyuan Hu, Wenting Zheng, and Virginia Smith. No free lunch in llm watermarking: Trade-offs in watermarking design choices.NeurIPS, 2024

work page 2024
[72]

Llmmap: Fingerprinting for large language models

Dario Pasquini et al. Llmmap: Fingerprinting for large language models. InUSENIX Security, 2025

work page 2025
[73]

A universal part-of-speech tagset

Slav Petrov, Dipanjan Das, and Ryan McDonald. A universal part-of-speech tagset. InInternational Conference on Language Resources and Evaluation, 2012

work page 2012
[74]

Markmywords: Analyzing and evaluating language model watermarks

Julien Piet, Chawin Sitawarin, Vivian Fang, Norman Mu, and David Wagner. Markmywords: Analyzing and evaluating language model watermarks. InSaTML, pages 68–91. IEEE, 2025

work page 2025
[75]

Stanza: A python natural language processing toolkit for many human languages

Peng Qi, Yuhao Zhang, Yuhui Zhang, Jason Bolton, and Christopher D Manning. Stanza: A python natural language processing toolkit for many human languages. InACL: System Demonstrations, 2020

work page 2020
[76]

Provably robust multi-bit watermarking for AI-generated text

Wenjie Qu, Wengrui Zheng, Tianyang Tao, Dong Yin, et al. Provably robust multi-bit watermarking for AI-generated text. InUSENIX Security, pages 201–220, 2025

work page 2025
[77]

Radford et al

A. Radford et al. Learning transferable visual models from natural language supervision. InICML, 2021

work page 2021
[78]

A robust semantics-based watermark for large language model against paraphrasing

Jie Ren, Han Xu, Yiding Liu, Yingqian Cui, Shuaiqiang Wang, Dawei Yin, and Jiliang Tang. A robust semantics-based watermark for large language model against paraphrasing. InNAACL 2024, 2024

work page 2024
[79]

Can ai-generated text be reliably detected?TMLR, 2023

Vinu Sankar Sadasivan, Aounon Kumar, Sriram Balasubramanian, Wenxiao Wang, and Soheil Feizi. Can ai-generated text be reliably detected?TMLR, 2023

work page 2023
[80]

Watermarking makes language models radioactive.NeurIPS, 37, 2024

Tom Sander, Pierre Fernandez, Alain Durmus, Matthijs Douze, and Teddy Furon. Watermarking makes language models radioactive.NeurIPS, 37, 2024

work page 2024

Showing first 80 references.

[1] [1]

Downstream trade-offs of a family of text watermarks

Anirudh Ajith, Sameer Singh, and Danish Pruthi. Downstream trade-offs of a family of text watermarks. InEMNLP, 2024

work page 2024

[2] [2]

ms/ AzureMLModelInterpretability, 2021

Azure.https: // aka. ms/ AzureMLModelInterpretability, 2021

work page 2021

[3] [3]

Cross-attention watermarking of large language models

Folco Bertini Baldassini, Huy H Nguyen, Ching-Chung Chang, and Isao Echizen. Cross-attention watermarking of large language models. InICASSP, 2024

work page 2024

[4] [4]

Deepseek llm: Scaling open-source language models with longtermism.arXiv, 2024

Xiao Bi, Deli Chen, Guanting Chen, et al. Deepseek llm: Scaling open-source language models with longtermism.arXiv, 2024. unpublished

work page 2024

[5] [5]

Model leeching: an extraction attack targeting llms.CAMLIS, 2023

Lewis Birch, William Hackett, Stefan Trawicki, Neeraj Suri, and Peter Garraghan. Model leeching: an extraction attack targeting llms.CAMLIS, 2023

work page 2023

[6] [6]

O’Reilly Media, Inc

Steven Bird, Ewan Klein, and Edward Loper.Natural language processing with Python: analyzing text with the natural language toolkit. " O’Reilly Media, Inc.", 2009

work page 2009

[7] [7]

Ai explainability 360.Available athttps: // aix360

Bluemix. Ai explainability 360.Available athttps: // aix360. mybluemix. net/, 2021

work page 2021

[8] [8]

Stealing part of a production language model

Nicholas Carlini, Daniel Paleka, et al. Stealing part of a production language model. InICML, 2024

work page 2024

[9] [9]

Evaluation of text generation: A survey.arXiv, 2020

Asli Celikyilmaz, Elizabeth Clark, et al. Evaluation of text generation: A survey.arXiv, 2020

work page 2020

[10] [10]

PostMark: A robust blackbox watermark for LLMs

Yapei Chang et al. PostMark: A robust blackbox watermark for LLMs. InEMNLP, 2024

work page 2024

[11] [11]

Danqi Chen, Jason Bolton, and Christopher D. Manning. A thorough examination of the CNN/Daily Mail reading comprehension task. InACL, 2016

work page 2016

[12] [12]

Watme: Towards lossless watermarking through lexical redundancy

Liang Chen, Yatao Bian, Yang Deng, Deng Cai, Shuaiyi Li, Peilin Zhao, and Kam-Fai Wong. Watme: Towards lossless watermarking through lexical redundancy. InACL, pages 9166–9180, 2024

work page 2024

[13] [13]

Improved unbiased watermark for large language models

Ruibo Chen, Yihan Wu, Junfeng Guo, and Heng Huang. Improved unbiased watermark for large language models. InACL, pages 20587–20601, July 2025. ISBN 979-8-89176-251-0

work page 2025

[14] [14]

Revealing weaknesses in text watermark- ing through self-information rewrite attacks

Yixin Cheng, Hongcheng Guo, Yangming Li, and Leonid Sigal. Revealing weaknesses in text watermark- ing through self-information rewrite attacks. InICML, 2025

work page 2025

[15] [15]

Undetectable watermarks for language models

Miranda Christ, Sam Gunn, and Or Zamir. Undetectable watermarks for language models. InThe Thirty Seventh Annual Conference on Learning Theory, pages 1125–1139. PMLR, 2024

work page 2024

[16] [16]

δ-steal: Llm stealing attack with local differential privacy

Kieu Dang, Phung Lai, NhatHai Phan, Yelong Shen, Ruoming Jin, and Abdallah Khreishah. δ-steal: Llm stealing attack with local differential privacy. InACML, 2025. in press

work page 2025

[17] [17]

Watermarking language models through language models.IEEE Transactions on Artificial Intelligence, 2025

Agnibh Dasgupta, Abdullah All Tanvir, and Xin Zhong. Watermarking language models through language models.IEEE Transactions on Artificial Intelligence, 2025

work page 2025

[18] [18]

Scalable watermarking for identifying large language model outputs.Nature, 634(8035):818–823, 2024

Sumanth Dathathri, Abigail See, Sumedh Ghaisas, Po-Sen Huang, Rob McAdam, Johannes Welbl, et al. Scalable watermarking for identifying large language model outputs.Nature, 634(8035):818–823, 2024

work page 2024

[19] [19]

Bert: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InACL, 2019

work page 2019

[20] [20]

Documenting large webtext corpora: A case study on the colossal clean crawled corpus

Jesse Dodge, Maarten Sap, Ana Marasovi´c, William Agnew, Gabriel Ilharco, et al. Documenting large webtext corpora: A case study on the colossal clean crawled corpus. InEMNLP, 2021

work page 2021

[21] [21]

Sandcastles in the storm: Revisiting the (im) possibility of strong watermarking

Boran Erol, Connor Choi, Jason Liu, Gary Jiarui Song, Nanyun Peng, Amit Sahai, et al. Sandcastles in the storm: Revisiting the (im) possibility of strong watermarking. InACL, pages 29698–29735, 2025

work page 2025

[22] [22]

An introduction to roc analysis.Pattern Recognition Letters, 27(8):861–874, 2006

Tom Fawcett. An introduction to roc analysis.Pattern Recognition Letters, 27(8):861–874, 2006

work page 2006

[23] [23]

Gumbelsoft: Diversified LM watermarking via the gumbelmax-trick

Jiayi Fu et al. Gumbelsoft: Diversified LM watermarking via the gumbelmax-trick. InACL, 2024

work page 2024

[24] [24]

Watermax: breaking the llm watermark detectability-robustness-quality trade-off

Eva Giboulot and Furon Teddy. Watermax: breaking the llm watermark detectability-robustness-quality trade-off. InNeurIPS, 2024

work page 2024

[25] [25]

Edit distance robust watermarks for LMs

Noah Golowich and Ankur Moitra. Edit distance robust watermarks for LMs. InNeurIPS, 2024

work page 2024

[26] [26]

Google gemini

Google. Google gemini. https://bard.google.com/chat/, 2024. 11

work page 2024

[27] [27]

On the learnability of watermarks for LMs.ICLR, 2024

Chenchen Gu, Xiang Lisa Li, et al. On the learnability of watermarks for LMs.ICLR, 2024

work page 2024

[28] [28]

Watermarking pre-trained language models with backdooring.arXiv, 2022

Chenxi Gu et al. Watermarking pre-trained language models with backdooring.arXiv, 2022

work page 2022

[29] [29]

Post-hoc watermarking for robust detection in text generated by LLMs

Jifei Hao et al. Post-hoc watermarking for robust detection in text generated by LLMs. InICCL, 2025

work page 2025

[30] [30]

Deberta: Decoding-enhanced bert with disentangled attention

Pengcheng He et al. Deberta: Decoding-enhanced bert with disentangled attention. InICLR, 2021

work page 2021

[31] [31]

Protecting intellectual property of language generation apis with lexical watermark

Xuanli He, Qiongkai Xu, Lingjuan Lyu, Fangzhao Wu, and Chenguang Wang. Protecting intellectual property of language generation apis with lexical watermark. InAAAI, 2022

work page 2022

[32] [32]

Cater: Intellectual property protection on text generation apis via conditional watermarks.NeurIPS, 2022

Xuanli He, Qiongkai Xu, Yi Zeng, Lingjuan Lyu, Fangzhao Wu, Jiwei Li, and Ruoxi Jia. Cater: Intellectual property protection on text generation apis via conditional watermarks.NeurIPS, 2022

work page 2022

[33] [33]

Measuring massive multitask language understanding

Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding. InICLR, 2021

work page 2021

[34] [34]

spaCy: Industrial-strength NLP in python

Matthew Honnibal, Ines Montani, et al. spaCy: Industrial-strength NLP in python. 2020

work page 2020

[35] [35]

Semstamp: A semantic watermark with paraphrastic robustness for text generation

Abe Hou, Jingyu Zhang, Tianxing He, Yichen Wang, et al. Semstamp: A semantic watermark with paraphrastic robustness for text generation. InNAACL, pages 4067–4082, 2024

work page 2024

[36] [36]

k-semstamp: A clustering- based semantic watermark for detection of machine-generated text

Abe Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, and Tianxing He. k-semstamp: A clustering- based semantic watermark for detection of machine-generated text. InACL, pages 1706–1715, 2024

work page 2024

[37] [37]

Unbiased watermark for LLMs

Zhengmian Hu, Lichang Chen, Xidong Wu, et al. Unbiased watermark for LLMs. InICLR, 2024

work page 2024

[38] [38]

WaterPool: A LM watermark mitigating trade-offs among impercepti- bility, efficacy and robustness

Baizhou Huang and Xiaojun Wan. WaterPool: A LM watermark mitigating trade-offs among impercepti- bility, efficacy and robustness. InNAACL: HLT, 2025

work page 2025

[39] [39]

Token-specific watermarking with enhanced detectability and semantic coherence for LLMs

Mingjia Huo, Sai Ashish Somayajula, Youwei Liang, Ruisi Zhang, et al. Token-specific watermarking with enhanced detectability and semantic coherence for LLMs. InICML, 2024

work page 2024

[40] [40]

Scaling up visual and vision-language representation learning with noisy text supervision

Chao Jia et al. Scaling up visual and vision-language representation learning with noisy text supervision. InICML, 2021

work page 2021

[41] [41]

Entangled watermarks as a defense against model extraction

Hengrui Jia et al. Entangled watermarks as a defense against model extraction. InUSENIX, 2021

work page 2021

[42] [42]

Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, et al

Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, et al. Mistral 7b, 2023. unpublished

work page 2023

[43] [43]

Watermark stealing in LLMs

Nikola Jovanovi ´c, Robin Staab, and Martin Vechev. Watermark stealing in LLMs. InICML, 2024

work page 2024

[44] [44]

Prada: protecting against dnn model stealing attacks

Mika Juuti, Sebastian Szyller, Samuel Marchal, and N Asokan. Prada: protecting against dnn model stealing attacks. InEuroS&P, pages 512–527. IEEE, 2019

work page 2019

[45] [45]

A watermark for large language models

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. InICML, 2023

work page 2023

[46] [46]

Thieves on sesame street! model extraction of bert-based apis.ICLR, 2020

Kalpesh Krishna, Gaurav Singh Tomar, Ankur P Parikh, Nicolas Papernot, and Mohit Iyyer. Thieves on sesame street! model extraction of bert-based apis.ICLR, 2020

work page 2020

[47] [47]

Paraphrasing evades detectors of ai-generated text, but retrieval is an effective defense.NeurIPS, 36:27469–27500, 2023

Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, and Mohit Iyyer. Paraphrasing evades detectors of ai-generated text, but retrieval is an effective defense.NeurIPS, 36:27469–27500, 2023

work page 2023

[48] [48]

Robust distortion-free watermarks for LMs.TMLR, 2023

Rohith Kuditipudi, John Thickstun, et al. Robust distortion-free watermarks for LMs.TMLR, 2023

work page 2023

[49] [49]

Kwon et al

W. Kwon et al. Efficient memory management for LLM serving with pagedattention. InSOSP, 2023

work page 2023

[50] [50]

Who wrote this code? watermarking for code generation

Taehyun Lee, Seokhee Hong, Jaewoo Ahn, Ilgee Hong, Hwaran Lee, Sangdoo Yun, Jamin Shin, and Gunhee Kim. Who wrote this code? watermarking for code generation. InACL, 2024

work page 2024

[51] [51]

Binary codes capable of correcting deletions, insertions, and reversals.Soviet Physics Doklady, 10(8):707–710, 1966

Vladimir I Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals.Soviet Physics Doklady, 10(8):707–710, 1966

work page 1966

[52] [52]

Plmmark: a secure and robust black-box watermarking framework for pre-trained language model

Peixuan Li, Pengzhou Cheng, Fangqi Li, Wei Du, Haodong Zhao, and Gongshen Liu. Plmmark: a secure and robust black-box watermarking framework for pre-trained language model. InAAAI, 2023

work page 2023

[53] [53]

Protecting intellectual property of large language model-based code generation apis via watermarks

Zongjie Li, Chaozheng Wang, Shuai Wang, and Cuiyun Gao. Protecting intellectual property of large language model-based code generation apis via watermarks. InACM SIGSAC, pages 2336–2350, 2023. 12

work page 2023

[54] [54]

Towards document-level paraphrase generation with sentence rewriting and reordering

Zhe Lin, Yitao Cai, and Xiaojun Wan. Towards document-level paraphrase generation with sentence rewriting and reordering. InEMNLP, 2021

work page 2021

[55] [55]

An unforgeable publicly verifiable watermark for LLMs

Aiwei Liu, Leyi Pan, et al. An unforgeable publicly verifiable watermark for LLMs. InICLR, 2023

work page 2023

[56] [56]

A semantic invariant robust watermark for LLMs.ICLR, 2024

Aiwei Liu, Leyi Pan, Xuming Hu, et al. A semantic invariant robust watermark for LLMs.ICLR, 2024

work page 2024

[57] [57]

Deepseek-v2: A strong, economical, and efficient mixture-of-experts LM.arXiv, 2024

Aixin Liu et al. Deepseek-v2: A strong, economical, and efficient mixture-of-experts LM.arXiv, 2024

work page 2024

[58] [58]

Adaptive text watermark for large language models

Yepeng Liu and Yuheng Bu. Adaptive text watermark for large language models. InICML, 2024

work page 2024

[59] [59]

Roberta: A robustly optimized bert pretraining approach.arXiv, 2019

Yinhan Liu, Myle Ott, et al. Roberta: A robustly optimized bert pretraining approach.arXiv, 2019

work page 2019

[60] [60]

Nltk: The natural language toolkit.arXiv, 2002

Edward Loper and Steven Bird. Nltk: The natural language toolkit.arXiv, 2002. unpublished

work page 2002

[61] [61]

A watermark for low-entropy and unbiased generation in large language models.arXiv, 2024

Minjia Mao, Dongjun Wei, Zeyu Chen, Xiao Fang, and Michael Chau. A watermark for low-entropy and unbiased generation in large language models.arXiv, 2024

work page 2024

[62] [62]

Natural language watermarking via morphosyntactic alterations.Computer Speech & Language, 23(1):107–125, 2009

Hasan Mesut Meral, Bülent Sankur, A Sumru Özsoy, Tunga Güngör, and Emre Sevinç. Natural language watermarking via morphosyntactic alterations.Computer Speech & Language, 23(1):107–125, 2009

work page 2009

[63] [63]

Wordnet: a lexical database for english.ACM Communications, 38(11):39–41, 1995

George A Miller. Wordnet: a lexical database for english.ACM Communications, 38(11):39–41, 1995

work page 1995

[64] [64]

Molenda et al

P. Molenda et al. Waterjudge: Quality-detection trade-off when watermarking LLMs. InNAACL, 2024

work page 2024

[65] [65]

Deeptextmark: a deep learning-driven text watermarking approach for identifying LLM generated text.Ieee Access, 12:40508–40520, 2024

Travis Munyer, Abdullah All Tanvir, Arjon Das, and Xin Zhong. Deeptextmark: a deep learning-driven text watermarking approach for identifying LLM generated text.Ieee Access, 12:40508–40520, 2024

work page 2024

[66] [66]

A survey of named entity recognition and classification.Lingvisticae Investigationes, 2007

David Nadeau and Satoshi Sekine. A survey of named entity recognition and classification.Lingvisticae Investigationes, 2007

work page 2007

[67] [67]

Entity-level factual consistency of abstractive text summarization

Feng Nan et al. Entity-level factual consistency of abstractive text summarization. InACL, 2021

work page 2021

[68] [68]

OpenAI, 2024.https://openai.com/index/openai-api/[Accessed: 2024-09-20]

work page 2024

[69] [69]

Markllm: An open-source toolkit for llm watermarking

Leyi Pan, Aiwei Liu, Zhiwei He, Zitian Gao, Xuandong Zhao, Yijian Lu, Binglin Zhou, et al. Markllm: An open-source toolkit for llm watermarking. InEMNLP, pages 61–71, 2024

work page 2024

[70] [70]

Modelshield: Adaptive and robust watermark against model extraction attack.TIFS, 2025

Kaiyi Pang, Tao Qi, Chuhan Wu, Minhao Bai, Minghu Jiang, and Yongfeng Huang. Modelshield: Adaptive and robust watermark against model extraction attack.TIFS, 2025

work page 2025

[71] [71]

No free lunch in llm watermarking: Trade-offs in watermarking design choices.NeurIPS, 2024

Qi Pang, Shengyuan Hu, Wenting Zheng, and Virginia Smith. No free lunch in llm watermarking: Trade-offs in watermarking design choices.NeurIPS, 2024

work page 2024

[72] [72]

Llmmap: Fingerprinting for large language models

Dario Pasquini et al. Llmmap: Fingerprinting for large language models. InUSENIX Security, 2025

work page 2025

[73] [73]

A universal part-of-speech tagset

Slav Petrov, Dipanjan Das, and Ryan McDonald. A universal part-of-speech tagset. InInternational Conference on Language Resources and Evaluation, 2012

work page 2012

[74] [74]

Markmywords: Analyzing and evaluating language model watermarks

Julien Piet, Chawin Sitawarin, Vivian Fang, Norman Mu, and David Wagner. Markmywords: Analyzing and evaluating language model watermarks. InSaTML, pages 68–91. IEEE, 2025

work page 2025

[75] [75]

Stanza: A python natural language processing toolkit for many human languages

Peng Qi, Yuhao Zhang, Yuhui Zhang, Jason Bolton, and Christopher D Manning. Stanza: A python natural language processing toolkit for many human languages. InACL: System Demonstrations, 2020

work page 2020

[76] [76]

Provably robust multi-bit watermarking for AI-generated text

Wenjie Qu, Wengrui Zheng, Tianyang Tao, Dong Yin, et al. Provably robust multi-bit watermarking for AI-generated text. InUSENIX Security, pages 201–220, 2025

work page 2025

[77] [77]

Radford et al

A. Radford et al. Learning transferable visual models from natural language supervision. InICML, 2021

work page 2021

[78] [78]

A robust semantics-based watermark for large language model against paraphrasing

Jie Ren, Han Xu, Yiding Liu, Yingqian Cui, Shuaiqiang Wang, Dawei Yin, and Jiliang Tang. A robust semantics-based watermark for large language model against paraphrasing. InNAACL 2024, 2024

work page 2024

[79] [79]

Can ai-generated text be reliably detected?TMLR, 2023

Vinu Sankar Sadasivan, Aounon Kumar, Sriram Balasubramanian, Wenxiao Wang, and Soheil Feizi. Can ai-generated text be reliably detected?TMLR, 2023

work page 2023

[80] [80]

Watermarking makes language models radioactive.NeurIPS, 37, 2024

Tom Sander, Pierre Fernandez, Alain Durmus, Matthijs Douze, and Teddy Furon. Watermarking makes language models radioactive.NeurIPS, 37, 2024

work page 2024