Robust LLM Watermarking with Minimal Semantic Distortion for IP Protection
Pith reviewed 2026-05-25 04:32 UTC · model grok-4.3
The pith
SAFESEAL watermarks LLM outputs by replacing words with key-selected synonyms while keeping named entities and facts intact for IP verification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SAFESEAL is a novel key-conditioned watermarking framework that achieves strong detectability with minimal impact on model utility by preserving named entities while substituting linguistic terms with context-aware synonyms through a key-conditioned Tournament sampling mechanism, maintaining semantic fidelity and factual consistency. For detection, it introduces a key-conditioned contrastive detector that jointly encodes the text and key, enabling provider-specific and robust watermark verification. Theoretical bounds on the utility-detectability trade-off are derived and latency is reduced through lightweight models, batching, and parallelism.
What carries the argument
A key-conditioned Tournament sampling mechanism that selects context-aware synonyms to embed the watermark while preserving named entities.
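To make the mechanism concrete, here is a minimal Python sketch of key-conditioned tournament selection among candidate synonyms. It is an illustration under assumptions, not the paper's implementation: the HMAC-SHA256 pseudorandom function, the softmax over similarity scores with temperature `alpha`, the three-round bracket, and every function name (`prf_bit`, `candidate_probs`, `tournament_select`, `mean_key_bit`) are hypothetical choices made for this example. Named entities are assumed to have been filtered out before candidates reach this step.

```python
import hashlib
import hmac
import math
import random


def prf_bit(key: bytes, context: str, token: str, round_idx: int) -> int:
    """Keyed pseudorandom bit for (context, token, round); HMAC-SHA256 is an assumed PRF."""
    msg = f"{context}\x1f{token}\x1f{round_idx}".encode("utf-8")
    return hmac.new(key, msg, hashlib.sha256).digest()[0] & 1


def candidate_probs(scores, alpha):
    """Softmax over similarity scores: probability proportional to exp(alpha * score)."""
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(alpha * (s - top)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def tournament_select(key, context, candidates, scores, alpha=1.0, rounds=3, seed=0):
    """Draw 2**rounds entrants, then run a knockout bracket decided by keyed bits."""
    rng = random.Random(seed)
    probs = candidate_probs(scores, alpha)
    entrants = rng.choices(candidates, weights=probs, k=2 ** rounds)
    for r in range(rounds):
        winners = []
        for a, b in zip(entrants[0::2], entrants[1::2]):
            ga = prf_bit(key, context, a, r)
            gb = prf_bit(key, context, b, r)
            if ga != gb:
                winners.append(a if ga > gb else b)  # higher keyed bit wins
            else:
                winners.append(rng.choice((a, b)))   # tie: pick one at random
        entrants = winners
    return entrants[0]


def mean_key_bit(key, context, token, rounds=3):
    """Average keyed bit of a token across rounds (naive bias check, not a detector)."""
    return sum(prf_bit(key, context, token, r) for r in range(rounds)) / rounds


# Hypothetical usage: pick a synonym for one slot under a provider key.
provider_key = b"provider-A-secret"
synonyms = ["quick", "fast", "rapid", "swift"]
similarity = [0.90, 0.88, 0.85, 0.80]
chosen = tournament_select(provider_key, "the ___ response", synonyms, similarity)
print(chosen, mean_key_bit(provider_key, "the ___ response", chosen))
```

Because each round's winner tends to carry a keyed bit of 1, tokens chosen under the correct key should show a mean bit above 0.5, while an unrelated key should score near 0.5. `mean_key_bit` only checks that bias; the paper's detector is a learned key-conditioned contrastive model.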
If this is right
- Theoretical bounds are derived for the utility-detectability trade-off.
- Latency is reduced to levels comparable to the fastest baseline via lightweight models, batching, and parallelism.
- Provider-specific detection works in cross-provider and multi-user scenarios.
- The method outperforms baselines on utility, detectability, and robustness metrics, including a BERTScore of 0.983 and a 98.2% detection rate.
- A public leaderboard and interactive demo are released to standardize future comparisons.
Where Pith is reading between the lines
- The synonym-substitution approach could be tested on code generation or summarization tasks to check if the same preservation properties hold.
- Stronger adversarial paraphrasing attacks beyond those evaluated could still remove the watermark signal.
- Embedding the watermark at API level would allow automatic ownership checks on any downstream use of the outputs.
- The released leaderboard may drive standardized benchmarks that include multi-provider key collision tests.
Load-bearing premise
The tournament sampling will keep producing synonyms that preserve factual consistency and named entities across diverse real-world prompts without creating artifacts an adversary can exploit.
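One way to probe this premise is to compare the entities found in the original and the watermarked text. The sketch below is a rough illustration only: the regular expression is a crude stand-in for a real named-entity recognizer, Jaccard overlap is an assumed definition (this summary does not define the paper's entity-similarity metric), and the function names are hypothetical.

```python
import re

# Crude heuristic: runs of capitalized words, or numbers with an optional
# decimal part and percent sign. A real NER model would replace this pattern.
ENTITY_PATTERN = re.compile(r"\b(?:[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*|\d+(?:\.\d+)?%?)")


def crude_entities(text: str) -> set:
    """Return the set of entity-like spans matched by the heuristic pattern."""
    return set(ENTITY_PATTERN.findall(text))


def entity_jaccard(original: str, watermarked: str) -> float:
    """Jaccard overlap of entity-like spans; 1.0 means every span was preserved."""
    a, b = crude_entities(original), crude_entities(watermarked)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical example: a synonym swap keeps entities, a date change does not.
source = "Marie Curie won the Nobel Prize in 1903."
synonym_swap = "Marie Curie received the Nobel Prize in 1903."
altered_fact = "Marie Curie won the Nobel Prize in 1911."
print(entity_jaccard(source, synonym_swap), entity_jaccard(source, altered_fact))
```

A score below 1.0 flags a substitution that touched an entity-like span. It says nothing about subtler factual drift, which would need a stronger check than span overlap.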
What would settle it
An evaluation on a held-out domain showing that watermarked outputs have measurably lower factual accuracy or that fine-tuning a surrogate model removes the detectable signal while retaining task performance.
Original abstract
Proprietary large language models (LLMs) face risks of intellectual property (IP) violation, as adversaries can replicate an LLM by collecting input-output pairs to train a surrogate model, causing financial setbacks. Watermarks offer a promising defense to verify ownership, but existing methods often struggle with semantic distortion, factual inconsistency, and adversarial attacks. In addition, key-conditioned watermarks for provider-specific detection, especially in cross-provider and multi-user scenarios, remain largely underexplored. To address these challenges, we propose SAFESEAL, a novel key-conditioned watermarking framework that achieves strong detectability with minimal impact on model utility, effectively balancing detectability, utility, and robustness. SAFESEAL preserves named entities while substituting linguistic terms with context-aware synonyms through a key-conditioned Tournament sampling mechanism, maintaining semantic fidelity and factual consistency. For detection, we introduce a key-conditioned contrastive detector that jointly encodes the text and key, enabling provider-specific and robust watermark verification. We derive theoretical bounds on the utility-detectability trade-off and significantly reduce latency through lightweight models, batching, and parallelism. Extensive experiments show that SAFESEAL outperforms baselines in utility, detectability, and robustness, achieving a BERTScore of 0.983, entity similarity of 0.963, a 98.2% detection rate, and the highest human ratings for text quality and content preservation, with latency comparable to the fastest baseline. To promote transparency and community-driven progress, we release the first public watermark leaderboard and an interactive demo.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SAFESEAL, a key-conditioned LLM watermarking framework for IP protection. It uses key-conditioned tournament sampling to replace terms with context-aware synonyms while preserving named entities and factual consistency, paired with a key-conditioned contrastive detector for provider-specific verification. Claims include strong empirical performance (BERTScore 0.983, entity similarity 0.963, 98.2% detection rate, highest human ratings), theoretical bounds on the utility-detectability trade-off, reduced latency via lightweight models and batching, robustness to attacks, and release of a public watermark leaderboard plus demo.
Significance. If the reported metrics and bounds hold under proper controls, the work offers a practical advance in balancing detectability, utility, and robustness for LLM watermarking, particularly for cross-provider and multi-user settings. The open leaderboard and demo are constructive contributions to the field.
major comments (2)
- [Experiments] Experiments section: The manuscript reports headline metrics (BERTScore 0.983, entity similarity 0.963, 98.2% detection rate) and outperformance over baselines but provides no details on experimental controls, number of prompts/domains tested, error bars, statistical significance tests, or post-hoc selection criteria, which is load-bearing for the central empirical claims.
- [Theoretical Analysis] Theoretical Analysis section: The abstract states that theoretical bounds on the utility-detectability trade-off are derived, yet no equations, assumptions, or derivation steps are referenced or shown in the summary material, preventing assessment of whether the bounds are non-vacuous or independent of fitted parameters.
minor comments (1)
- [Abstract] Abstract: The claim of 'latency comparable to the fastest baseline' lacks specific numerical values or table references for direct comparison.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for major revision. We address the two major comments point by point below.
- Referee: [Experiments] Experiments section: The manuscript reports headline metrics (BERTScore 0.983, entity similarity 0.963, 98.2% detection rate) and outperformance over baselines but provides no details on experimental controls, number of prompts/domains tested, error bars, statistical significance tests, or post-hoc selection criteria, which is load-bearing for the central empirical claims.
  Authors: We agree that the current presentation of the Experiments section lacks sufficient detail on these aspects. In the revised manuscript we will expand the section to report the number of prompts and domains evaluated, the experimental controls employed, error bars computed over multiple independent runs, the results of statistical significance tests, and the criteria used for any post-hoc analysis.
  Revision: yes
- Referee: [Theoretical Analysis] Theoretical Analysis section: The abstract states that theoretical bounds on the utility-detectability trade-off are derived, yet no equations, assumptions, or derivation steps are referenced or shown in the summary material, preventing assessment of whether the bounds are non-vacuous or independent of fitted parameters.
  Authors: The Theoretical Analysis section of the manuscript contains the derivation of the bounds. To make the material self-contained and allow direct assessment, we will revise the section to explicitly state all modeling assumptions, present the key equations, and include the complete derivation steps, confirming that the bounds follow from information-theoretic arguments without dependence on fitted parameters.
  Revision: yes
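As an illustration of the error bars discussed in the first response, the following sketch computes a percentile bootstrap confidence interval for a detection rate. The per-sample outcomes are synthetic placeholders, not results from the paper, and `bootstrap_ci` is a hypothetical helper; the authors' actual statistical procedure is not described in this summary.

```python
import random


def bootstrap_ci(outcomes, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of 0/1 detection outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample with replacement and record the detection rate of each resample.
    rates = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples))
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return sum(outcomes) / n, rates[lo_idx], rates[hi_idx]


# Synthetic placeholder data: 190 detected out of 200 watermarked samples.
synthetic_outcomes = [1] * 190 + [0] * 10
rate, lower, upper = bootstrap_ci(synthetic_outcomes)
print(f"detection rate {rate:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

Reporting an interval of this kind alongside each headline metric would let readers judge whether gaps between methods exceed sampling noise.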
Circularity Check
No significant circularity identified
The manuscript presents SAFESEAL as an empirical proposal: a key-conditioned tournament sampling mechanism for watermark insertion and a contrastive detector, supported by experimental metrics (BERTScore 0.983, 98.2% detection) and stated theoretical bounds on the utility-detectability trade-off. No equations, derivations, or self-citations are shown that reduce the reported performance numbers, detection rates, or bounds to quantities defined by fitted parameters from the same experiments or to prior self-citations by construction. The central claims rest on the proposed mechanisms and external experimental validation rather than any of the enumerated circular patterns.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Paper passage related to the cited Recognition theorem: key-conditioned Tournament sampling... $P_i(t_j) = \exp(\alpha S_{ij}) / \sum \ldots$, $g_r(t_j) = \mathrm{PRF}(k, h, t_j, r)$
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  Paper passage related to the cited Recognition theorem: Theorem 4.1... $d_{lb}/|T_{wm}| \le \mathbb{E}[\Delta(y, y_{wm})] \le d_{ub}/|T_{wm}|$
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.