arxiv: 2605.23175 · v1 · pith:J253TILLnew · submitted 2026-05-22 · 💻 cs.CR · cs.CL

Robust LLM Watermarking with Minimal Semantic Distortion for IP Protection

Pith reviewed 2026-05-25 04:32 UTC · model grok-4.3

classification 💻 cs.CR cs.CL
keywords LLM watermarking, IP protection, semantic preservation, key-conditioned sampling, contrastive detection, robustness, tournament sampling, named entity preservation

The pith

SAFESEAL watermarks LLM outputs by replacing words with key-selected synonyms while keeping named entities and facts intact for IP verification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SAFESEAL, a key-conditioned watermarking method that lets owners of proprietary LLMs detect whether their model's outputs were used to train a surrogate. It replaces linguistic terms with context-aware synonyms chosen via tournament sampling conditioned on a secret key, while explicitly preserving named entities. Detection uses a contrastive encoder that takes both the text and the key, so that verification is provider-specific. The paper reports high semantic fidelity (a BERTScore of 0.983) and a 98.2% detection rate, with latency comparable to the fastest baseline. A sympathetic reader would care because existing watermarks often distort meaning or fail against attacks, which limits their value for protecting model IP.

Core claim

SAFESEAL is a novel key-conditioned watermarking framework that achieves strong detectability with minimal impact on model utility by preserving named entities while substituting linguistic terms with context-aware synonyms through a key-conditioned Tournament sampling mechanism, maintaining semantic fidelity and factual consistency. For detection, it introduces a key-conditioned contrastive detector that jointly encodes the text and key, enabling provider-specific and robust watermark verification. Theoretical bounds on the utility-detectability trade-off are derived and latency is reduced through lightweight models, batching, and parallelism.
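
To make the detection side of the claim concrete, here is a minimal sketch of a contrastive objective conditioned on provider keys. It is not the paper's detector: the paper describes an encoder that jointly encodes text and key, whereas this sketch assumes, hypothetically, that the text and each key have already been embedded as separate non-zero vectors, and it applies a standard InfoNCE loss with cosine similarity. The function names, the temperature, and the decision threshold are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(text_emb, key_embs, positive_index, temperature=0.1):
    """InfoNCE loss: low when the text is most similar to its own provider's key.

    key_embs holds one embedding per candidate key; positive_index marks the
    key that was (hypothetically) used to watermark the text.
    """
    logits = [cosine(text_emb, k) / temperature for k in key_embs]
    m = max(logits)  # subtract the max for numerical stability
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[positive_index]


def detect(text_emb, key_emb, threshold=0.5):
    """Flag the text as watermarked under this key if similarity clears a threshold."""
    return cosine(text_emb, key_emb) >= threshold


# Toy embeddings (hypothetical): the text aligns with key 0, not key 1.
text = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(text, keys, positive_index=0))
print(info_nce(text, keys, positive_index=1))
print(detect(text, keys[0]), detect(text, keys[1]))
```

Training against other providers' keys as negatives is one way a detector could become provider-specific; whether the paper uses this exact loss is not stated in the material reviewed here.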

What carries the argument

A key-conditioned tournament sampling mechanism that selects context-aware synonyms to embed the watermark while preserving named entities.
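
As an illustration of the general idea, the sketch below runs a single-elimination tournament over synonym candidates, with each round decided by a pseudorandom bit derived from a secret key. It is a simplified stand-in under stated assumptions, not the paper's algorithm: the synonym table and entity set are supplied by hand (the paper uses context-aware synonyms and named entity preservation, whose exact components are not described here), the keyed bit comes from HMAC-SHA256, and ties are broken deterministically in favor of the first candidate. All function and variable names are hypothetical.

```python
import hashlib
import hmac


def g_value(key, context, candidate, layer):
    """Keyed pseudorandom bit in {0, 1} for a candidate at a given tournament layer."""
    msg = f"{layer}|{context}|{candidate}".encode("utf-8")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return digest[0] & 1


def tournament_select(key, context, candidates):
    """Pair candidates off layer by layer; the higher keyed bit wins each match."""
    if not candidates:
        raise ValueError("need at least one candidate")
    pool = list(candidates)
    layer = 0
    while len(pool) > 1:
        winners = []
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            ga = g_value(key, context, a, layer)
            gb = g_value(key, context, b, layer)
            # Assumption: ties go to the first candidate of the pair.
            winners.append(a if ga >= gb else b)
        if len(pool) % 2 == 1:
            winners.append(pool[-1])  # the unpaired candidate gets a bye
        pool = winners
        layer += 1
    return pool[0]


def watermark(tokens, synonyms, entities, key):
    """Substitute watermarkable tokens; leave entities and unknown tokens untouched."""
    out = []
    for tok in tokens:
        if tok in entities or tok not in synonyms:
            out.append(tok)
            continue
        context = " ".join(out[-2:])  # the two most recently emitted tokens
        out.append(tournament_select(key, context, [tok] + synonyms[tok]))
    return out


# Toy inputs (hypothetical): "Alice" and "Paris" are treated as named entities.
tokens = ["Alice", "bought", "a", "big", "house", "in", "Paris"]
synonyms = {"bought": ["purchased", "acquired"], "big": ["large", "huge"], "house": ["home"]}
entities = {"Alice", "Paris"}
print(" ".join(watermark(tokens, synonyms, entities, b"provider-secret")))
```

Because every choice is a deterministic function of the key and the local context, a party holding the key can recompute which candidates the tournament favors, while a party without the key sees only ordinary synonym choices.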

If this is right

  • Theoretical bounds are derived for the utility-detectability trade-off.
  • Latency is reduced to levels comparable to the fastest baseline via lightweight models, batching, and parallelism.
  • Provider-specific detection works in cross-provider and multi-user scenarios.
  • The method outperforms baselines on utility, detectability, and robustness, with a reported BERTScore of 0.983 and a 98.2% detection rate.
  • A public leaderboard and interactive demo are released to standardize future comparisons.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The synonym-substitution approach could be tested on code generation or summarization tasks to check if the same preservation properties hold.
  • Stronger adversarial paraphrasing attacks beyond those evaluated could still remove the watermark signal.
  • Embedding the watermark at the API level could allow automatic ownership checks on downstream uses of the outputs.
  • The released leaderboard may drive standardized benchmarks that include multi-provider key collision tests.

Load-bearing premise

The tournament sampling will keep producing synonyms that preserve factual consistency and named entities across diverse real-world prompts without creating artifacts an adversary can exploit.

What would settle it

An evaluation on a held-out domain showing that watermarked outputs have measurably lower factual accuracy or that fine-tuning a surrogate model removes the detectable signal while retaining task performance.

Figures

Figures reproduced from arXiv: 2605.23175 by Kieu Dang, NhatHai Phan, Phung Lai, Ruoming Jin, Yelong Shen.

Figure 1. SAFESEAL overview.
Figure 2. Impact of similarity threshold δ, watermarkable set size $|T_{wm}|$, and correlation with text length on LLaMA-2. (Solid lines show expected deviations; shaded areas are two bounds.)
Theorem 4.1. For a similarity threshold δ, the expected deviation between $y$ and $y_{wm}$ is bounded:
$$d_{lb}\,|T_{wm}| \;\le\; \mathbb{E}\big[\Delta(y, y_{wm})\big] \;\le\; d_{ub}\,|T_{wm}|, \qquad (6)$$
where $|T_{wm}|$ is the number of watermarkable tokens in $y$ (controlled by δ), and $d_{lb}$ and…
Figure 3. Utility and detectability performance on text generation and summarization.

Figure 4. AMT results on text generation, removal attacks, and detection on the attacks (LLaMA-2).

Figure 5. Watermark leaderboard and interactive demo interface.

Figure 6. ROC curves for watermark detection in text generation under attack-free settings.

Figure 7. Performance of text generation and summarization on Mistral.
Figure 8. Preference of SAFESEAL over other watermarks in head-to-head human evaluation with different text lengths.
… outperforming all baselines. On Mistral, it remains highly competitive with an AUC of 0.9937, closely matching the best-performing methods. By contrast, SynthID performs much worse on both models (AUC ≈ 0.59), suggesting limited discriminative capability in this 200-token generation setting. Overall, t…
Figure 9. Impact of lookup similarity threshold δ, watermarkable set size $|T_{wm}|$, and correlation with text length for Mistral.
Figure 10. Detection rates across detectors.
… already weak on short-text tasks, their detection drops to near 0–1% when a surrogate reproduces outputs. This further highlights the robustness of SAFESEAL across both LLMs.
D.10 Justification for Q7. Cross-provider Performance
Figure 11. AMT template for an example evaluation. Workers select the best watermarked output based on the following four criteria: (1) relevance to the original LLM output, assessing how well the watermarked version retains the original semantic meaning; (2) grammatical correctness, evaluating fluency and adherence to standard grammar rules; (3) factual consistency, ensuring that the content remains and align well …
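
The bound in Theorem 4.1, quoted in the Figure 2 caption, is easier to read per token. The step below is a small worked rearrangement, not part of the paper's derivation. It assumes the number of watermarkable tokens is fixed and positive for a given output, writes the watermarkable set and watermarked output with subscripts (the notation in the extracted caption is ambiguous), and leaves the constants as the caption gives them, since the caption is cut off before defining them.

```latex
d_{lb}\,|T_{wm}| \;\le\; \mathbb{E}\big[\Delta(y, y_{wm})\big] \;\le\; d_{ub}\,|T_{wm}|
\quad\Longrightarrow\quad
d_{lb} \;\le\; \frac{\mathbb{E}\big[\Delta(y, y_{wm})\big]}{|T_{wm}|} \;\le\; d_{ub},
\qquad |T_{wm}| > 0 .
```

Read this way, the expected deviation grows at most linearly in the number of watermarkable tokens, and the threshold δ acts on utility through the size of that set.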
Original abstract

Proprietary large language models (LLMs) face risks of intellectual property (IP) violation, as adversaries can replicate an LLM by collecting input-output pairs to train a surrogate model, causing financial setbacks. Watermarks offer a promising defense to verify ownership, but existing methods often struggle with semantic distortion, factual inconsistency, and adversarial attacks. In addition, key-conditioned watermarks for provider-specific detection, especially in cross-provider and multi-user scenarios, remain largely underexplored. To address these challenges, we propose SAFESEAL, a novel key-conditioned watermarking framework that achieves strong detectability with minimal impact on model utility, effectively balancing detectability, utility, and robustness. SAFESEAL preserves named entities while substituting linguistic terms with context-aware synonyms through a key-conditioned Tournament sampling mechanism, maintaining semantic fidelity and factual consistency. For detection, we introduce a key-conditioned contrastive detector that jointly encodes the text and key, enabling provider-specific and robust watermark verification. We derive theoretical bounds on the utility-detectability trade-off and significantly reduce latency through lightweight models, batching, and parallelism. Extensive experiments show that SAFESEAL outperforms baselines in utility, detectability, and robustness, achieving a BERTScore of 0.983, entity similarity of 0.963, a 98.2% detection rate, and the highest human ratings for text quality and content preservation, with latency comparable to the fastest baseline. To promote transparency and community-driven progress, we release the first public watermark leaderboard and an interactive demo.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes SAFESEAL, a key-conditioned LLM watermarking framework for IP protection. It employs tournament sampling (key-conditioned) to replace terms with context-aware synonyms while preserving named entities and factual consistency, paired with a key-conditioned contrastive detector for provider-specific verification. Claims include strong empirical performance (BERTScore 0.983, entity similarity 0.963, 98.2% detection rate, highest human ratings), theoretical bounds on the utility-detectability trade-off, reduced latency via lightweight models/batching, robustness to attacks, and release of a public watermark leaderboard plus demo.

Significance. If the reported metrics and bounds hold under proper controls, the work offers a practical advance in balancing detectability, utility, and robustness for LLM watermarking, particularly for cross-provider and multi-user settings. The open leaderboard and demo are constructive contributions to the field.

major comments (2)
  1. [Experiments] Experiments section: The manuscript reports headline metrics (BERTScore 0.983, entity similarity 0.963, 98.2% detection rate) and outperformance over baselines but provides no details on experimental controls, number of prompts/domains tested, error bars, statistical significance tests, or post-hoc selection criteria, which is load-bearing for the central empirical claims.
  2. [Theoretical Analysis] Theoretical Analysis section: The abstract states that theoretical bounds on the utility-detectability trade-off are derived, yet no equations, assumptions, or derivation steps are referenced or shown in the summary material, preventing assessment of whether the bounds are non-vacuous or independent of fitted parameters.
minor comments (1)
  1. [Abstract] Abstract: The claim of 'latency comparable to the fastest baseline' lacks specific numerical values or table references for direct comparison.
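
The first major comment asks for error bars on the headline detection rate. As one hedged illustration of what that could look like, the sketch below computes a Wilson score interval for a binomial proportion. The sample size used in the example is hypothetical, since the material reviewed here does not state how many texts were evaluated, and nothing in it says the authors use this particular interval.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie in [0, n]")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical example: 982 detections out of an assumed 1000 watermarked texts.
low, high = wilson_interval(982, 1000)
print(f"detection rate 0.982, interval [{low:.4f}, {high:.4f}]")
```

The interval narrows as the number of evaluated texts grows, which is why the sample size matters for judging a figure such as 98.2%.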

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We address the two major comments point by point below.

Point-by-point responses
  1. Referee: [Experiments] Experiments section: The manuscript reports headline metrics (BERTScore 0.983, entity similarity 0.963, 98.2% detection rate) and outperformance over baselines but provides no details on experimental controls, number of prompts/domains tested, error bars, statistical significance tests, or post-hoc selection criteria, which is load-bearing for the central empirical claims.

    Authors: We agree that the current presentation of the Experiments section lacks sufficient detail on these aspects. In the revised manuscript we will expand the section to report the number of prompts and domains evaluated, the experimental controls employed, error bars computed over multiple independent runs, the results of statistical significance tests, and the criteria used for any post-hoc analysis. revision: yes

  2. Referee: [Theoretical Analysis] Theoretical Analysis section: The abstract states that theoretical bounds on the utility-detectability trade-off are derived, yet no equations, assumptions, or derivation steps are referenced or shown in the summary material, preventing assessment of whether the bounds are non-vacuous or independent of fitted parameters.

    Authors: The Theoretical Analysis section of the manuscript contains the derivation of the bounds. To make the material self-contained and allow direct assessment, we will revise the section to explicitly state all modeling assumptions, present the key equations, and include the complete derivation steps, confirming that the bounds follow from information-theoretic arguments without dependence on fitted parameters. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The manuscript presents SAFESEAL as an empirical proposal: a key-conditioned tournament sampling mechanism for watermark insertion and a contrastive detector, supported by experimental metrics (BERTScore 0.983, 98.2% detection) and stated theoretical bounds on the utility-detectability trade-off. No equations, derivations, or self-citations are shown that reduce the reported performance numbers, detection rates, or bounds to quantities defined by fitted parameters from the same experiments or to prior self-citations by construction. The central claims rest on the proposed mechanisms and external experimental validation rather than any of the enumerated circular patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the method implicitly relies on standard LLM generation assumptions and synonym availability but these are not enumerated.

pith-pipeline@v0.9.0 · 5812 in / 1235 out tokens · 44430 ms · 2026-05-25T04:32:04.207603+00:00 · methodology


