Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption
Pith reviewed 2026-05-18 05:21 UTC · model grok-4.3
The pith
LLM watermarking sees real adoption only when stakeholder incentives align in targeted domains.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The limited real-world deployment of LLM watermarking stems from misaligned incentives among providers, platforms, and end users that create three barriers—competitive risk, detection-tool governance, and attribution issues—rather than from shortcomings in the algorithms themselves. Model watermarking fits provider goals but struggles in open-source environments. LLM text watermarking offers only modest benefits unless scoped narrowly, such as for dataset decontamination. In-context watermarking aligns incentives by letting trusted parties like educators or organizers embed hidden instructions; LLM outputs from those inputs then carry detectable watermarks indicating misuse, with no quality,
What carries the argument
In-context watermarking (ICW), in which trusted parties embed hidden watermark instructions into source documents so that any LLM output derived from them carries a detectable signal of misuse.
If this is right
- Model watermarking aligns with LLM provider interests but faces additional challenges when models are open-sourced.
- LLM text watermarking gains traction mainly in narrow applications such as dataset decontamination or user-controlled provenance tracking.
- In-context watermarking lets trusted parties detect misuse while users experience no quality degradation and providers stay neutral.
- Watermarking methods should be designed around domain-specific incentive structures rather than as general-purpose anti-misuse tools.
- Active community participation is required to refine and standardize these incentive-aligned approaches.
Where Pith is reading between the lines
- The same incentive-alignment logic could guide deployment of other AI detection or provenance tools beyond watermarking.
- In education, ICW-style methods might allow responsible LLM use while giving instructors reliable signals of unauthorized assistance.
- Platforms could adopt common ICW instruction formats to enable consistent detection across multiple LLM providers.
Load-bearing premise
The three barriers arising from misaligned incentives—competitive risk, detection-tool governance, and attribution issues—are the dominant obstacles to adoption instead of technical limitations in existing watermarking methods.
What would settle it
A controlled deployment of in-context watermarking in a specific domain such as academic assignments, measuring whether misuse detection rates increase and legitimate use remains unaffected compared to no watermarking.
Figures
read the original abstract
Despite progress in watermarking algorithms for large language models (LLMs), real-world deployment remains limited. We argue that this gap stems from misaligned incentives among LLM providers, platforms, and end users, which manifest as three key barriers: competitive risk, detection-tool governance, and attribution issues. We revisit three classes of watermarking through this lens. \emph{Model watermarking} naturally aligns with LLM provider interests, yet faces new challenges in open-source ecosystems. \emph{LLM text watermarking} offers modest provider benefit when framed solely as an anti-misuse tool, but can gain traction in narrowly scoped settings such as dataset de-contamination or user-controlled provenance. \emph{In-context watermarking} (ICW) is tailored for trusted parties, such as conference organizers or educators, who embed hidden watermarking instructions into documents. If a dishonest reviewer or student submits this text to an LLM, the output carries a detectable watermark indicating misuse. This setup aligns incentives: users experience no quality loss, trusted parties gain a detection tool, and LLM providers remain neutral by simply following watermark instructions. We advocate for a broader exploration of incentive-aligned methods, with ICW as an example, in domains where trusted parties need reliable tools to detect misuse. More broadly, we distill design principles for incentive-aligned, domain-specific watermarking and outline future research directions. Our position is that the practical adoption of LLM watermarking requires aligning stakeholder incentives in targeted application domains and fostering active community engagement.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper argues that limited real-world deployment of LLM watermarking stems from misaligned incentives among providers, platforms, and users, which create three barriers: competitive risk, detection-tool governance, and attribution issues. It revisits model watermarking (aligns with providers but challenged in open-source settings), LLM text watermarking (limited benefit as anti-misuse tool but potentially useful in scoped settings like de-contamination), and in-context watermarking (ICW), where trusted parties embed instructions so that LLM outputs from misused documents carry detectable signals. ICW is presented as incentive-aligned with no quality loss to users and neutrality for providers. The position advocates exploring incentive-aligned, domain-specific methods, distills design principles, and calls for community engagement to drive adoption.
Significance. If the incentive-alignment thesis holds, the paper offers a useful reframing that could redirect research effort toward domain-specific designs rather than universal technical solutions. The ICW example and distilled design principles provide a concrete starting point for trusted-party scenarios such as academic integrity or conference review processes, potentially increasing the chance of targeted deployments where stakeholder interests already converge.
major comments (2)
- [Introduction / Barriers section] Introduction and § on barriers: The central claim that competitive risk, detection-tool governance, and attribution issues are the dominant reasons for limited deployment (rather than technical factors such as robustness, quality degradation, or false-positive rates) is asserted without deployment case studies, provider surveys, or failure-mode analyses. This assumption is load-bearing for the argument that incentive realignment is the primary path to adoption.
- [In-Context Watermarking section] § on In-Context Watermarking: The statement that ICW produces 'no quality loss' for users and is technically viable is presented as given, yet the manuscript supplies no analysis, reference to existing ICW implementations, or discussion of detection reliability/false-positive rates in the targeted domains. This weakens the claim that ICW already satisfies the technical preconditions for incentive alignment.
minor comments (1)
- [Abstract] The abstract and main text use 'ICW' without an initial definition on first use; adding an explicit parenthetical expansion would improve readability for readers outside the immediate subfield.
Simulated Author's Rebuttal
We are grateful to the referee for their constructive comments, which help us improve the clarity and rigor of our position paper. We address the major comments below, indicating the revisions we intend to make.
read point-by-point responses
-
Referee: Introduction and § on barriers: The central claim that competitive risk, detection-tool governance, and attribution issues are the dominant reasons for limited deployment (rather than technical factors such as robustness, quality degradation, or false-positive rates) is asserted without deployment case studies, provider surveys, or failure-mode analyses. This assumption is load-bearing for the argument that incentive realignment is the primary path to adoption.
Authors: We thank the referee for highlighting this point. As a position paper, our argument is grounded in an analysis of stakeholder incentives drawn from the current state of LLM deployment and watermarking literature, rather than presenting new empirical data. We recognize that providing additional supporting evidence, such as references to reported challenges in deployment, would strengthen the manuscript. In the revision, we will expand the barriers section to include specific examples from public discussions and papers on watermarking adoption barriers, while maintaining that incentive misalignment is a critical factor. This constitutes a partial revision as we do not claim to have conducted original surveys. revision: partial
-
Referee: § on In-Context Watermarking: The statement that ICW produces 'no quality loss' for users and is technically viable is presented as given, yet the manuscript supplies no analysis, reference to existing ICW implementations, or discussion of detection reliability/false-positive rates in the targeted domains. This weakens the claim that ICW already satisfies the technical preconditions for incentive alignment.
Authors: We agree with the referee that more details on ICW would be beneficial. The current manuscript presents ICW primarily as a conceptual framework to illustrate incentive alignment in trusted-party scenarios. To address the concern, we will revise the section to include references to existing prompt-based or instruction-following techniques that can serve as the basis for ICW, discuss how detection can be reliable in narrow domains (e.g., by combining with other signals), and clarify that 'no quality loss' means the watermarking instructions do not degrade the LLM's performance for the intended user task. We will also note open questions regarding false-positive rates as directions for future research. This will be incorporated in the revised manuscript. revision: yes
Circularity Check
No circularity: position paper analyzes incentives without self-referential reductions
full rationale
The paper is a position piece that identifies three incentive barriers (competitive risk, detection-tool governance, attribution issues) and revisits watermarking classes through that external lens, proposing ICW as an incentive-aligned example. No equations, fitted parameters, predictions, or derivations appear in the provided text. Claims rest on reasoning about stakeholder interests rather than reducing to self-definitions, self-citations as load-bearing premises, or renaming known results. The central argument remains independent of any internal construction that would force equivalence to its inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The three key barriers—competitive risk, detection-tool governance, and attribution issues—are the main obstacles preventing real-world deployment of LLM watermarking.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We argue that this gap stems from misaligned incentives among LLM providers, platforms, and end users, which manifest as four key barriers: competitive risk, detection-tool governance, robustness concerns and attribution issues.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 3 Pith papers
-
Watermarking Should Be Treated as a Monitoring Primitive
Watermarking enables entity-level attribution and monitoring via signal aggregation across outputs, even in zero-bit designs, revealing a fundamental tension with attribution goals.
-
Watermarking Should Be Treated as a Monitoring Primitive
Watermarking enables entity-level attribution and monitoring through signal aggregation even in zero-bit designs, creating an unavoidable dual-use tension between attribution and surveillance.
-
Fundamental Trade-Offs in Multi-Bit Watermarking of Stochastic Processes
Derives matched converse and achievability bounds that characterize optimal trade-offs among false-alarm probability, detection error probability, distortion, and information rate for multi-bit watermarking of station...
Reference graph
Works this paper leans on
-
[1]
[AAG24] Maya Anderson, Guy Amit, and Abigail Goldsteen. Is my data in your retrieval database? membership inference attacks against retrieval augmented generation.arXiv preprint arXiv:2405.20446,
-
[2]
Watermarking of large language models
[Aar23] Scott Aaronson. Watermarking of large language models. https://simons.berkeley. edu/talks/scott-aaronson-ut-austin-openai-2023-08-17 ,
work page 2023
-
[3]
[ALL+25] Li An, Yujian Liu, Yepeng Liu, Yang Zhang, Yuheng Bu, and Shiyu Chang
Accessed: 2023-08. [ALL+25] Li An, Yujian Liu, Yepeng Liu, Yang Zhang, Yuheng Bu, and Shiyu Chang. Defending LLM watermarking against spoofing attacks with contrastive representation learning.arXiv preprint arXiv:2504.06575,
-
[4]
Multi-bit distortion-free watermarking for large language models.arXiv preprint arXiv:2402.16578,
[BJZM24] Massieh Kordi Boroujeny, Ya Jiang, Kai Zeng, and Brian Mark. Multi-bit distortion-free watermarking for large language models.arXiv preprint arXiv:2402.16578,
-
[5]
[BMR+20] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Advances in neural information processing systems, 33:1877–1901,
work page 1901
-
[6]
A watermark for black-box language models.arXiv preprint arXiv:2410.02099,
[BWAM24] Dara Bahri, John Wieting, Dana Alon, and Donald Metzler. A watermark for black-box language models.arXiv preprint arXiv:2410.02099,
-
[7]
[CBZ+23] Souradip Chakraborty, Amrit Singh Bedi, Sicheng Zhu, Bang An, Dinesh Manocha, and Furong Huang. On the possibilities of ai-generated text detection.arXiv preprint arXiv:2304.04736,
-
[8]
Trojanrag: Retrieval-augmented genera- tion can be backdoor driver in large language mod- els,
[CDJ+24] Pengzhou Cheng, Yidong Ding, Tianjie Ju, Zongru Wu, Wei Du, Ping Yi, Zhuosheng Zhang, and Gongshen Liu. Trojanrag: Retrieval-augmented generation can be backdoor driver in large language models.arXiv preprint arXiv:2405.13401,
-
[9]
Provably robust watermarks for open-source language models.arXiv preprint arXiv:2410.18861,
[CGMR24] Miranda Christ, Sam Gunn, Tal Malkin, and Mariana Raykova. Provably robust watermarks for open-source language models.arXiv preprint arXiv:2410.18861,
-
[10]
Undetectable watermarks for language models
[CGZ23] Miranda Christ, Sam Gunn, and Or Zamir. Undetectable watermarks for language models. arXiv preprint arXiv:2306.09194,
-
[11]
Postmark: A robust blackbox watermark for large language models.arXiv preprint arXiv:2406.14517,
10 [CKH+24] Yapei Chang, Kalpesh Krishna, Amir Houmansadr, John Wieting, and Mohit Iyyer. Postmark: A robust blackbox watermark for large language models.arXiv preprint arXiv:2406.14517,
-
[12]
[CLG+23] A Feder Cooper, Katherine Lee, James Grimmelmann, Daphne Ippolito, Christopher Callison-Burch, Christopher A Choquette-Choo, Niloofar Mireshghallah, Miles Brundage, David Mimno, Madiha Zahrah Choksi, et al. Report of the 1st workshop on generative ai and law.arXiv preprint arXiv:2311.06477,
-
[13]
Phantom: General trigger attacks on retrieval augmented language generation,
[CSA+24] Harsh Chaudhari, Giorgio Severi, John Abascal, Matthew Jagielski, Christopher A Choquette-Choo, Milad Nasr, Cristina Nita-Rotaru, and Alina Oprea. Phantom: General trigger attacks on retrieval augmented language generation.arXiv preprint arXiv:2405.20485,
-
[14]
Improved unbiased watermark for large language models.arXiv preprint arXiv:2502.11268,
[CWGH25] Ruibo Chen, Yihan Wu, Junfeng Guo, and Heng Huang. Improved unbiased watermark for large language models.arXiv preprint arXiv:2502.11268,
-
[15]
[CWSJ25] Xinyue Cui, Johnny Tian-Zheng Wei, Swabha Swayamdipta, and Robin Jia. Robust data watermarking in language models by injecting fictitious knowledge.arXiv preprint arXiv:2503.04036,
-
[16]
[CYS+24] Jiacheng Cai, Jiahao Yu, Yangguang Shao, Yuhang Wu, and Xinyu Xing
Issued 10 Jul 2023, effective 15 Aug 2023; requires explicit labels and implicit watermarks. [CYS+24] Jiacheng Cai, Jiahao Yu, Yangguang Shao, Yuhang Wu, and Xinyu Xing. Utf: Under- trained tokens as fingerprints a novel approach to LLM identification.arXiv preprint arXiv:2410.12318,
-
[17]
[DAL24] Abdulrahman Diaa, Toluwani Aremu, and Nils Lukas. Optimizing adaptive attacks against content watermarks for language models.arXiv preprint arXiv:2410.02440,
-
[18]
A Survey on In-context Learning
[DLD+22] Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Tianyu Liu, et al. A survey on in-context learning.arXiv preprint arXiv:2301.00234,
work page internal anchor Pith review Pith/arXiv arXiv
-
[19]
[Eur24] European Union. Regulation (eu) 2024/1689 of the european parliament and of the council of 13 march 2024 laying down harmonised rules on artificial intelligence and amending certain union legislative acts (artificial intelligence act),
work page 2024
-
[20]
Gumbelsoft: Diversified language model watermarking via the gumbelmax-trick
11 [FZY+24] Jiayi Fu, Xuandong Zhao, Ruihan Yang, Yuansen Zhang, Jiangjie Chen, and Yanghua Xiao. Gumbelsoft: Diversified language model watermarking via the gumbelmax-trick. arXiv preprint arXiv:2402.12948,
-
[21]
[GCG+23] Soumya Suvra Ghosal, Souradip Chakraborty, Jonas Geiping, Furong Huang, Dinesh Manocha, and Amrit Singh Bedi. Towards possibilities & impossibilities of ai-generated text detection: A survey.arXiv preprint arXiv:2310.15264,
-
[22]
[GDJ+24] Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models.arXiv preprint arXiv:2407.21783,
work page internal anchor Pith review Pith/arXiv arXiv
-
[23]
[GF24] Eva Giboulot and Teddy Furon. Watermax: breaking the LLM watermark detectability- robustness-quality trade-off.arXiv preprint arXiv:2403.04808,
-
[24]
Black-box detection of language model watermarks.arXiv preprint arXiv:2405.20777,
[GJSV24] Thibaud Gloaguen, Nikola Jovanovi´ c, Robin Staab, and Martin Vechev. Black-box detection of language model watermarks.arXiv preprint arXiv:2405.20777,
-
[25]
On the learnability of watermarks for language models.arXiv preprint arXiv:2312.04469,
[GLLH23] Chenchen Gu, Xiang Lisa Li, Percy Liang, and Tatsunori Hashimoto. On the learnability of watermarks for language models.arXiv preprint arXiv:2312.04469,
-
[26]
Unbiased watermark for large language models
[HCW+23] Zhengmian Hu, Lichang Chen, Xidong Wu, Yihan Wu, Hongyang Zhang, and Heng Huang. Unbiased watermark for large language models.arXiv preprint arXiv:2310.10669,
-
[27]
[HLW+24] Haiyun He, Yepeng Liu, Ziqiao Wang, Yongyi Mao, and Yuheng Bu. Universally optimal watermarking schemes for LLMs: from theory to practice.arXiv preprint arXiv:2410.02890,
-
[28]
[HLW+25] Haiyun He, Yepeng Liu, Ziqiao Wang, Yongyi Mao, and Yuheng Bu. Distributional informa- tion embedding: A framework for multi-bit watermarking.arXiv preprint arXiv:2501.16558,
-
[29]
[HSL+24] Mingjia Huo, Sai Ashish Somayajula, Youwei Liang, Ruisi Zhang, Farinaz Koushanfar, and Pengtao Xie. Token-specific watermarking with enhanced detectability and semantic coherence for large language models.arXiv preprint arXiv:2402.18059,
-
[30]
[HZH+23] Abe Bohan Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hongwei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, and Yulia Tsvetkov. Semstamp: A semantic watermark with paraphrastic robustness for text generation.arXiv preprint arXiv:2310.03991,
-
[31]
Ward: Provable rag dataset inference via LLM watermarks.arXiv preprint arXiv:2410.03537,
[JSBV24] Nikola Jovanovi´ c, Robin Staab, Maximilian Baader, and Martin Vechev. Ward: Provable rag dataset inference via LLM watermarks.arXiv preprint arXiv:2410.03537,
-
[32]
Watermark stealing in large language models
[JSV24] Nikola Jovanovi´ c, Robin Staab, and Martin Vechev. Watermark stealing in large language models.arXiv preprint arXiv:2402.19361,
-
[33]
An overview of large language models for statisticians.arXiv preprint arXiv:2502.17814,
[JYG+25] Wenlong Ji, Weizhe Yuan, Emily Getzen, Kyunghyun Cho, Michael I Jordan, Song Mei, Jason E Weston, Weijie J Su, Jing Xu, and Linjun Zhang. An overview of large language models for statisticians.arXiv preprint arXiv:2502.17814,
-
[34]
Robust distortion- free watermarks for language models.arXiv preprint arXiv:2307.15593, 2023
[KTHL23] Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. Robust distortion-free watermarks for language models.arXiv preprint arXiv:2307.15593,
-
[35]
Adaptive text watermark for large language models.arXiv preprint arXiv:2401.13927,
[LB24] Yepeng Liu and Yuheng Bu. Adaptive text watermark for large language models.arXiv preprint arXiv:2401.13927,
-
[36]
Who wrote this code? watermarking for code generation.arXiv preprint arXiv:2305.15060,
[LHA+23] Taehyun Lee, Seokhee Hong, Jaewoo Ahn, Ilgee Hong, Hwaran Lee, Sangdoo Yun, Jamin Shin, and Gunhee Kim. Who wrote this code? watermarking for code generation.arXiv preprint arXiv:2305.15060,
-
[37]
[LHC+23] Yixin Liu, Hongsheng Hu, Xun Chen, Xuyun Zhang, and Lichao Sun. Watermarking text data on large language models for dataset copyright.arXiv preprint arXiv:2305.13257,
-
[38]
Trojtext: Test-time invisible textual trojan insertion
[LLF23] Qian Lou, Yepeng Liu, and Bo Feng. Trojtext: Test-time invisible textual trojan insertion. arXiv preprint arXiv:2303.02242,
-
[39]
An unforgeable publicly verifiable watermark for large language models
[LPH+23] Aiwei Liu, Leyi Pan, Xuming Hu, Shu’ang Li, Lijie Wen, Irwin King, and Philip S Yu. An unforgeable publicly verifiable watermark for large language models.arXiv preprint arXiv:2307.16230,
-
[40]
[LRW+24] Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, and Weijie J Su. Robust detection of watermarks for large language models under human edits.arXiv preprint arXiv:2411.13868,
-
[41]
In-Context Watermarks for Large Language Models
[LZK+25] Yepeng Liu, Xuandong Zhao, Christopher Kruegel, Dawn Song, and Yuheng Bu. In-context watermarks for large language models.arXiv preprint arXiv:2505.16934,
work page internal anchor Pith review Pith/arXiv arXiv
-
[42]
Mask-based membership inference attacks for retrieval-augmented generation
[LZL25] Mingrui Liu, Sixiao Zhang, and Cheng Long. Mask-based membership inference attacks for retrieval-augmented generation. InProceedings of the ACM on Web Conference 2025, pages 2894–2907,
work page 2025
-
[43]
[LZSB25] Yepeng Liu, Xuandong Zhao, Dawn Song, and Yuheng Bu. Dataset protection via watermarked canaries in retrieval-augmented LLMs.arXiv preprint arXiv:2502.10673,
-
[44]
Can LLMs follow simple rules?arXiv preprint arXiv:2311.04235,
13 [MCW+23] Norman Mu, Sarah Chen, Zifan Wang, Sizhe Chen, David Karamardian, Lulwa Aljeraisy, Basel Alomair, Dan Hendrycks, and David Wagner. Can LLMs follow simple rules?arXiv preprint arXiv:2311.04235,
-
[45]
Improving your model ranking on chatbot arena by vote rigging.arXiv preprint arXiv:2501.17858,
[MPD+25] Rui Min, Tianyu Pang, Chao Du, Qian Liu, Minhao Cheng, and Min Lin. Improving your model ranking on chatbot arena by vote rigging.arXiv preprint arXiv:2501.17858,
-
[46]
Scalable fingerprinting of large language models.arXiv preprint arXiv:2502.07760,
[NHB+25] Anshul Nasery, Jonathan Hayase, Creston Brooks, Peiyao Sheng, Himanshu Tyagi, Pramod Viswanath, and Sewoong Oh. Scalable fingerprinting of large language models.arXiv preprint arXiv:2502.07760,
-
[47]
[PHZS24] Qi Pang, Shengyuan Hu, Wenting Zheng, and Virginia Smith. No free lunch in LLM watermarking: Trade-offs in watermarking design choices.arXiv preprint arXiv:2402.16187,
-
[48]
Markllm: An open-source toolkit for llm watermarking.arXiv preprint arXiv:2405.10051,
[PLH+24] Leyi Pan, Aiwei Liu, Zhiwei He, Zitian Gao, Xuandong Zhao, Yijian Lu, Binglin Zhou, Shuliang Liu, Xuming Hu, Lijie Wen, et al. MarkLLM: An open-source toolkit for LLM watermarking.arXiv preprint arXiv:2405.10051,
-
[49]
Detecting LLM-written peer reviews.arXiv preprint arXiv:2503.15772,
[RKLS25] Vishisht Rao, Aounon Kumar, Himabindu Lakkaraju, and Nihar B Shah. Detecting LLM-written peer reviews.arXiv preprint arXiv:2503.15772,
-
[50]
[RS24] Mark Russinovich and Ahmed Salem. Hey, that’s my model! introducing chain & hash, an LLM fingerprinting technique.arXiv preprint arXiv:2407.10887,
-
[51]
A robust semantics-based watermark for large language model against paraphrasing
[RXL+23] Jie Ren, Han Xu, Yiding Liu, Yingqian Cui, Shuaiqiang Wang, Dawei Yin, and Jiliang Tang. A robust semantics-based watermark for large language model against paraphrasing. arXiv preprint arXiv:2311.08721,
-
[52]
14 [SNW+25] Shivalika Singh, Yiyang Nan, Alex Wang, Daniel D’Souza, Sayash Kapoor, Ahmet ¨Ust¨ un, Sanmi Koyejo, Yuntian Deng, Shayne Longpre, Noah Smith, et al. The leaderboard illusion. arXiv preprint arXiv:2504.20879,
-
[53]
Embarrassingly simple text watermarks.arXiv preprint arXiv:2310.08920,
[STB+23] Ryoma Sato, Yuki Takezawa, Han Bao, Kenta Niwa, and Makoto Yamada. Embarrassingly simple text watermarks.arXiv preprint arXiv:2310.08920,
-
[54]
Proving membership in LLM pretraining data via data watermarks.arXiv preprint arXiv:2402.10892,
[WWJ24] Johnny Tian-Zheng Wei, Ryan Yixiang Wang, and Robin Jia. Proving membership in LLM pretraining data via data watermarks.arXiv preprint arXiv:2402.10892,
-
[55]
Towards codable watermarking for injecting multi-bits information to llms,
[WYC+23] Lean Wang, Wenkai Yang, Deli Chen, Hao Zhou, Yankai Lin, Fandong Meng, Jie Zhou, and Xu Sun. Towards codable watermarking for injecting multi-bits information to LLMs. arXiv preprint arXiv:2307.15992,
-
[56]
[XLH+25] Yijie Xu, Aiwei Liu, Xuming Hu, Lijie Wen, and Hui Xiong
Accessed: 2025-05-22. [XLH+25] Yijie Xu, Aiwei Liu, Xuming Hu, Lijie Wen, and Hui Xiong. Mark your LLM: Detect- ing the misuse of open-source large language models via watermarking.arXiv preprint arXiv:2503.04636,
-
[57]
Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models
[XMW+23] Jiashu Xu, Mingyu Derek Ma, Fei Wang, Chaowei Xiao, and Muhao Chen. Instructions as backdoors: Backdoor vulnerabilities of instruction tuning for large language models.arXiv preprint arXiv:2305.14710,
-
[58]
URLhttps://www.aclweb.org/anthology/2020.emnlp-demos.6
[XWM+24] Jiashu Xu, Fei Wang, Mingyu Derek Ma, Pang Wei Koh, Chaowei Xiao, and Muhao Chen. Instructional fingerprinting of large language models.arXiv preprint arXiv:2401.12255,
-
[59]
Approximate nearest neighbor negative contrastive learning for dense text retrieval,
[XXL+20] Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul Bennett, Junaid Ahmed, and Arnold Overwijk. Approximate nearest neighbor negative contrastive learning for dense text retrieval.arXiv preprint arXiv:2007.00808,
-
[60]
Learning to watermark LLM-generated text via reinforcement learning.arXiv preprint arXiv:2403.10553,
[XYL24] Xiaojun Xu, Yuanshun Yao, and Yang Liu. Learning to watermark LLM-generated text via reinforcement learning.arXiv preprint arXiv:2403.10553,
-
[61]
[YAJK23] KiYoon Yoo, Wonhyuk Ahn, Jiho Jang, and Nojun Kwak. Robust multi-bit natural language watermarking through invariant features.arXiv preprint arXiv:2305.01904,
-
[62]
Watermarking text generated by black-box language models.arXiv preprint arXiv:2305.08883,
[YCZ+23] Xi Yang, Kejiang Chen, Weiming Zhang, Chang Liu, Yuang Qi, Jie Zhang, Han Fang, and Nenghai Yu. Watermarking text generated by black-box language models.arXiv preprint arXiv:2305.08883,
-
[63]
[YTWW24] Shojiro Yamabe, Tsubasa Takahashi, Futa Waseda, and Koki Wataoka. Mergeprint: Robust fingerprinting against merging large language models.arXiv preprint arXiv:2410.08604,
-
[64]
[YYZ+24] An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, et al. Qwen2. 5 technical report.arXiv preprint arXiv:2412.15115,
work page internal anchor Pith review Pith/arXiv arXiv
-
[65]
[ZALW23] Xuandong Zhao, Prabhanjan Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for ai-generated text.arXiv preprint arXiv:2306.17439,
-
[66]
Agnibh Dasgupta, Abdullah Tanvir, and Xin Zhong
[ZDT24] Xin Zhong, Agnibh Dasgupta, and Abdullah Tanvir. Watermarking language models through language models.arXiv preprint arXiv:2411.05091,
-
[67]
Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, and Boaz Barak
[ZEF+23] Hanlin Zhang, Benjamin L Edelman, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, and Boaz Barak. Watermarks in the sand: Impossibility of strong watermarking for generative models.arXiv preprint arXiv:2311.04378,
-
[68]
Sok: Watermarking for ai-generated content.arXiv preprint arXiv:2411.18479,
[ZGC+24] Xuandong Zhao, Sam Gunn, Miranda Christ, Jaiden Fairoze, Andres Fabrega, Nicholas Car- lini, Sanjam Garg, Sanghyun Hong, Milad Nasr, Florian Tramer, et al. Sok: Watermarking for ai-generated content.arXiv preprint arXiv:2411.18479,
-
[69]
Duwak: Dual watermarks in large language models.arXiv preprint arXiv:2403.13000,
[ZGCC24] Chaoyi Zhu, Jeroen Galjaard, Pin-Yu Chen, and Lydia Y Chen. Duwak: Dual watermarks in large language models.arXiv preprint arXiv:2403.13000,
-
[70]
Poisonedrag: Knowledge poisoning attacks to retrieval-augmented generation of large language models,
[ZGWJ24] Wei Zou, Runpeng Geng, Binghui Wang, and Jinyuan Jia. Poisonedrag: Knowledge corruption attacks to retrieval-augmented generation of large language models.arXiv preprint arXiv:2402.07867,
-
[71]
[ZJG+24] Shuai Zhao, Meihuizi Jia, Zhongliang Guo, Leilei Gan, Xiaoyu Xu, Xiaobao Wu, Jie Fu, Yichao Feng, Fengjun Pan, and Luu Anh Tuan. A survey of recent backdoor attacks and defenses in large language models.arXiv preprint arXiv:2406.06852,
-
[72]
[ZLL+25] Junyan Zhang, Shuliang Liu, Aiwei Liu, Yubo Gao, Jungang Li, Xiaojie Gu, and Xuming Hu. Cohemark: A novel sentence-level watermark for enhanced text quality.arXiv preprint arXiv:2504.17309,
-
[73]
Instruction-Following Evaluation for Large Language Models
[ZLM+23] Jeffrey Zhou, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, and Le Hou. Instruction-following evaluation for large language models. arXiv preprint arXiv:2311.07911,
work page internal anchor Pith review Pith/arXiv arXiv
-
[74]
Distillation-resistant watermarking for model protection in nlp.arXiv preprint arXiv:2210.03312,
[ZLW22] Xuandong Zhao, Lei Li, and Yu-Xiang Wang. Distillation-resistant watermarking for model protection in nlp.arXiv preprint arXiv:2210.03312,
-
[75]
vtune: Verifiable fine-tuning for LLMs through backdooring.arXiv preprint arXiv:2411.06611,
[ZPPG24] Eva Zhang, Arka Pal, Akilesh Potti, and Micah Goldblum. vtune: Verifiable fine-tuning for LLMs through backdooring.arXiv preprint arXiv:2411.06611,
-
[76]
16 Watermarked LLMInput Prompt beachsunshinecomputerdogVocabulary⋯ 2.251.760.36-0.12Logits⋯ +0+2+2+0Watermark Strength 0.1310.5810.2490.006ProbabilityDistribution⋯⋯LLM “Florida has nice___” Decoding Process of LLM Shared Hash keyGreen Listsunshinecomputer⋯ Red Listbeachdog⋯ “Florida has nicesunshine.” Shared Hash key Red Listbeachdog⋯ Green Listsunshineco...
work page internal anchor Pith review Pith/arXiv arXiv
-
[77]
•China’s Cyberspace Administration [Cyb23] has gone further, mandating both visible and invisible watermarks for generative content and requiring platforms to detect and flag unmarked media. • In the U.S., NIST’s 2024 report on synthetic content [ CDR24] frames watermarking as a foundational content authentication tool, recommended even in the absence of ...
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.