PromptCOS: Towards Content-only System Prompt Copyright Auditing for LLMs
Pith reviewed 2026-05-25 07:57 UTC · model grok-4.3
The pith
PromptCOS audits system prompt copyrights from generated text alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PromptCOS establishes a content-only auditing framework by encoding a cyclic output signal using auxiliary tokens injected into the system prompt, with cover tokens optimized to penalize removal attempts, thereby achieving stable watermark detection solely from generated content without altering the prompt's primary behavior.
What carries the argument
Cyclic output signal encoded by auxiliary tokens, protected by optimized cover tokens that cause degradation if removed.
If this is right
- System prompts can carry a detectable watermark without changing their core task instructions.
- Verification of prompt ownership succeeds using only the generated text, independent of internal model probabilities.
- Removing the watermark tokens triggers measurable degradation in the prompt's original performance.
- The watermark resists removal attempts across multiple categories of attacks.
- Auditing requires far less computation than logit-based alternatives.
Where Pith is reading between the lines
- Owners of high-value prompts could embed such signals before licensing them to third-party applications.
- The same auxiliary-token approach might apply to auditing other forms of prompt customization beyond system prompts.
- Marketplaces that sell prompts could require watermarking as a condition for listing.
- Detection could be extended to track which downstream application is using a stolen prompt by varying the cyclic pattern per licensee.
Load-bearing premise
A small set of auxiliary tokens can create a stable, detectable cyclic pattern in the output while the main prompt instructions continue to function unchanged.
What would settle it
An experiment in which the model is prompted without the auxiliary tokens or after their removal and the measured similarity to the target cyclic signal drops below the reported detection threshold.
Figures
read the original abstract
System prompts are critical for shaping the behavior and output quality of large language model (LLM)-based applications, driving substantial investment in optimizing high-quality prompts beyond traditional handcrafted designs. However, as system prompts become valuable intellectual property, they are increasingly vulnerable to prompt theft and unauthorized use, highlighting the urgent need for effective copyright auditing, especially watermarking. Existing methods rely on verifying subtle logit distribution shifts triggered by a query. We observe that this logit-dependent verification framework is impractical in real-world content-only settings, primarily because (1) random sampling makes content-level generation unstable for verification, and (2) stronger instructions needed for content-level signals compromise prompt fidelity. To overcome these challenges, we propose PromptCOS, the first content-only system prompt copyright auditing method based on content-level output similarity. PromptCOS achieves watermark stability by designing a cyclic output signal as the conditional instruction's target. It preserves prompt fidelity by injecting a small set of auxiliary tokens to encode the watermark, leaving the main prompt untouched. Furthermore, to ensure robustness against malicious removal, we optimize cover tokens, i.e., critical tokens in the original prompt, to ensure that removing auxiliary tokens causes severe performance degradation. Experimental results show that promptCOS achieves high effectiveness (99.3% average watermark similarity), strong distinctiveness (60.8% higher than the best baseline), high fidelity (accuracy degradation no greater than 0.6%), robustness (resilience against four potential attack categories), and high computational efficiency (up to 98.1% cost saving).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes PromptCOS, the first content-only system prompt copyright auditing method for LLMs based on content-level output similarity. It achieves watermark stability via a cyclic output signal as the conditional instruction target, preserves fidelity by injecting a small set of auxiliary tokens to encode the watermark (leaving the main prompt untouched), and ensures robustness by optimizing cover tokens so that auxiliary-token removal causes severe performance degradation. Experiments report 99.3% average watermark similarity, 60.8% higher distinctiveness than the best baseline, accuracy degradation no greater than 0.6%, resilience against four attack categories, and up to 98.1% computational cost saving.
Significance. If the empirical claims hold under full scrutiny of methods and controls, PromptCOS would provide a practical solution for system-prompt IP protection in deployed LLM applications where only generated content (not logits) is observable, directly addressing the sampling instability and fidelity-loss limitations of prior logit-based watermarking approaches.
major comments (3)
- [Abstract, design paragraph] Abstract and design description: the stability claim rests on the 'cyclic output signal' construction, yet no formal definition, recurrence relation, or proof sketch is supplied showing why this signal remains detectable after sampling; this is load-bearing for the core effectiveness claim of 99.3% similarity.
- [Experimental results] Experimental results paragraph: the 99.3% similarity, 0.6% fidelity, and 60.8% distinctiveness figures are stated without reference to the precise experimental protocol, baseline implementations, number of trials, statistical tests, or data-exclusion criteria, preventing verification that post-hoc choices did not inflate the numbers.
- [Robustness evaluation] Robustness section: the claim of resilience against 'four potential attack categories' is presented without enumerating the attacks or showing quantitative degradation curves when auxiliary tokens are removed, leaving the cover-token optimization claim unverified.
minor comments (2)
- [Method] Notation for auxiliary and cover tokens should be introduced with explicit symbols and an equation showing how they are combined with the original prompt.
- [Abstract] The abstract states 'up to 98.1% cost saving' without specifying the baseline cost metric or the exact configuration achieving that figure.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below, indicating planned revisions where appropriate to improve clarity and verifiability.
read point-by-point responses
-
Referee: [Abstract, design paragraph] Abstract and design description: the stability claim rests on the 'cyclic output signal' construction, yet no formal definition, recurrence relation, or proof sketch is supplied showing why this signal remains detectable after sampling; this is load-bearing for the core effectiveness claim of 99.3% similarity.
Authors: We agree that a formal definition would strengthen the presentation. The cyclic signal is constructed by setting the conditional instruction target to a periodic sequence of auxiliary tokens whose embeddings induce a detectable similarity pattern under content-only comparison. In the revision we will add an explicit recurrence relation (s_{t+1} = f(s_t, aux)) together with a short argument showing that the periodicity survives sampling noise because the auxiliary tokens dominate the similarity metric even after token-level perturbations. This addition will be placed in Section 3.2. revision: yes
-
Referee: [Experimental results] Experimental results paragraph: the 99.3% similarity, 0.6% fidelity, and 60.8% distinctiveness figures are stated without reference to the precise experimental protocol, baseline implementations, number of trials, statistical tests, or data-exclusion criteria, preventing verification that post-hoc choices did not inflate the numbers.
Authors: The full experimental protocol (including 500 independent trials per setting, exact baseline re-implementations, t-test significance thresholds, and exclusion rules for degenerate prompts) appears in Section 4.1 and Appendix B. To address the concern directly we will insert a concise protocol summary table and a statement of statistical procedures immediately after the reported figures in the abstract and results section. revision: yes
-
Referee: [Robustness evaluation] Robustness section: the claim of resilience against 'four potential attack categories' is presented without enumerating the attacks or showing quantitative degradation curves when auxiliary tokens are removed, leaving the cover-token optimization claim unverified.
Authors: The four attack categories (auxiliary-token deletion, synonym substitution, prompt paraphrasing, and logit-level perturbation) are defined and evaluated in Section 5.2. We will expand that section with an explicit enumeration, per-attack success rates, and degradation curves for both watermarked and cover-token-optimized prompts, confirming that removal of auxiliary tokens triggers >40% accuracy drop only when cover tokens are optimized. revision: yes
Circularity Check
No significant circularity
full rationale
The paper describes PromptCOS as an engineering construction: a cyclic output signal encoded via auxiliary tokens plus optimized cover tokens, with performance metrics reported as direct empirical outcomes of that design. No equations, derivations, or predictions are presented that reduce claimed results to quantities defined by the method's own fitted parameters or self-citations. The approach targets stated limitations of prior logit-based methods through explicit design choices rather than a closed mathematical chain. This is the most common honest finding for an applied systems paper.
Axiom & Free-Parameter Ledger
free parameters (2)
- auxiliary token set
- cover token optimization targets
axioms (1)
- domain assumption Content-level similarity of outputs can serve as a stable and distinctive watermark signal
invented entities (1)
-
cyclic output signal
no independent evidence
Forward citations
Cited by 2 Pith papers
-
PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts
PragLocker protects agent prompts as IP by building non-portable obfuscated versions that function only on the intended LLM through code-symbol semantic anchoring followed by target-model feedback noise injection.
-
PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts
PragLocker generates function-preserving but non-portable prompts for LLM agents via code-symbol semantic anchoring followed by target-model feedback noise injection.
Reference graph
Works this paper leans on
-
[1]
Prompt leakage effect and mitigation strategies for multi-turn llm applications
Divyansh Agarwal, Alexander Richard Fabbri, Ben Risher, Philippe Laban, Shafiq Joty, and Chien-Sheng Wu. Prompt leakage effect and mitigation strategies for multi-turn llm applications. InEMNLP, 2024
work page 2024
- [2]
-
[3]
asgeirtj. Gpt-5 system prompt leakage. https://github.com/asgeirtj/ system_prompts_leaks/blob/main/OpenAI/gpt-5-thinking.md, 2025
work page 2025
-
[4]
Jing Chen, Yawen Mao, Min Gan, Dongqing Wang, and Quanmin Zhu. Greedy search method for separable nonlinear models using stage aitken gradient descent and least squares algorithms.IEEE Transactions on Automatic Control, pages 5044–5051, 2023
work page 2023
-
[5]
Training Verifiers to Solve Math Word Problems
Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, and et al. Training verifiers to solve math word problems.arXiv preprint arXiv:2110.14168, 2021
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[6]
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI, Daya Guo, Dejian Yang, and et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[7]
Stateful defenses for machine learning models are not yet secure against black-box attacks
Ryan Feng, Ashish Hooda, Neal Mangaokar, Kassem Fawaz, Somesh Jha, and Atul Prakash. Stateful defenses for machine learning models are not yet secure against black-box attacks. InCCS, 2023
work page 2023
-
[8]
Math- ematical capabilities of chatgpt
Simon Frieder, Luca Pinchetti, Ryan-Rhys Griffiths, and et al. Math- ematical capabilities of chatgpt. InNeurIPS, 2023
work page 2023
-
[9]
Introducing github copilot: your ai pair programmer
Nat Friedman. Introducing github copilot: your ai pair programmer. https://docs.github.com/zh/copilot/quickstart, 2021
work page 2021
-
[10]
Cosearchagent: A lightweight collaborative search agent with large language models
P Gong, J Li, and J Mao. Cosearchagent: A lightweight collaborative search agent with large language models. InSIGIR, 2024
work page 2024
-
[11]
On benchmarking code llms for android malware analysis
Yiling He, Hongyu She, Xingzhi Qian, Xinran Zheng, Zhuo Chen, Zhan Qin, and Lorenzo Cavallaro. On benchmarking code llms for android malware analysis. InISSTA Workshop, 2025
work page 2025
-
[12]
Is difficulty calibration all we need? towards more practical membership inference attacks
Yu He, Boheng Li, Yao Wang, Mengda Yang, Juan Wang, Hongxin Hu, and Xingyu Zhao. Is difficulty calibration all we need? towards more practical membership inference attacks. InCCS, 2024
work page 2024
-
[13]
Xinyi Hou, Yanjie Zhao, Yue Liu, and et al. Large language models for software engineering: A systematic literature review.ACM Trans- actions on Software Engineering and Methodology, 33(8), 2024
work page 2024
-
[14]
On the (in) security of llm app stores
Xinyi Hou, Yanjie Zhao, and Haoyu Wang. On the (in) security of llm app stores. InIEEE S&P, 2025
work page 2025
-
[15]
Mitigating catastrophic forgetting in large language models with self-synthesized rehearsal
Jianheng Huang, Leyang Cui, Ante Wang, Chengyi Yang, Xinting Liao, Linfeng Song, Junfeng Yao, and Jinsong Su. Mitigating catastrophic forgetting in large language models with self-synthesized rehearsal. InACL, 2024
work page 2024
-
[16]
Pleak: Prompt leaking attacks against large language model applications
Bo Hui, Haolin Yuan, Neil Gong, Philippe Burlina, and Yinzhi Cao. Pleak: Prompt leaking attacks against large language model applications. InCCS, 2024
work page 2024
-
[17]
A watermark for large language models
John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. InICML, 2023
work page 2023
-
[18]
Clearstamp: A human-visible and robust model-ownership proof based on trans- posed model training
Torsten Krauß, Jasper Stang, and Alexandra Dmitrienko. Clearstamp: A human-visible and robust model-ownership proof based on trans- posed model training. InUSENIX Security, 2024
work page 2024
-
[19]
Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. Robust distortion-free watermarks for language models.Trans- actions on Machine Learning Research, 2023
work page 2023
-
[20]
Governing open vocabulary data leaks using an edge llm through programming by example
Q Li, J Wen, and H Jin. Governing open vocabulary data leaks using an edge llm through programming by example. InMobiCom, 2024
work page 2024
-
[21]
Citation-enhanced generation for llm-based chatbots
Weitao Li, Junkai Li, Weizhi Ma, and Yang Liu. Citation-enhanced generation for llm-based chatbots. InACL, 2024
work page 2024
-
[22]
Yiming Li, Shuo Shao, Yu He, Junfeng Guo, Tianwei Zhang, Zhan Qin, Pin-Yu Chen, Michael Backes, Philip Torr, Dacheng Tao, and Kui Ren. Rethinking data protection in the (generative) artificial intelligence era.arXiv preprint arXiv:2507.03034, 2025
-
[23]
Pro- tecting intellectual property of large language model-based code generation apis via watermarks
Zongjie Li, Chaozheng Wang, Shuai Wang, and Cuiyun Gao. Pro- tecting intellectual property of large language model-based code generation apis via watermarks. InCCS, 2023
work page 2023
-
[24]
Parrot: Efficient serving of{LLM- based}applications with semantic variable
Chaofan Lin, Zhenhua Han, Chengruidong Zhang, Yuqing Yang, Fan Yang, Chen Chen, and Lili Qiu. Parrot: Efficient serving of{LLM- based}applications with semantic variable. InOSDI, 2024
work page 2024
-
[25]
Aiwei Liu, Leyi Pan, Yijian Lu, Jingjing Li, Xuming Hu, Xi Zhang, Lijie Wen, Irwin King, Hui Xiong, and Philip Yu. A survey of text watermarking in the era of large language models.ACM Computing Surveys, 57(2):1–36, 2024
work page 2024
-
[26]
Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation. InNeurIPS, 2023
work page 2023
-
[27]
Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. Pre-train, prompt, and predict: A sys- tematic survey of prompting methods in natural language processing. ACM computing surveys, 55(9):1–35, 2023
work page 2023
-
[28]
Zhiyuan Ma, Guoli Jia, Biqing Qi, and Bowen Zhou. Safe-sd: Safe and traceable stable diffusion with text prompt trigger for invisible generative watermarking. InACM MM, pages 7113–7122, 2024
work page 2024
-
[29]
Poster: Multiparty private set intersection from multiparty homomorphic encryption
Christian Mouchet, Sylvain Chatel, Lea Nürnberger, and Wouter Lueks. Poster: Multiparty private set intersection from multiparty homomorphic encryption. InCCS, 2024
work page 2024
-
[30]
Mahmoud Nazzal, Issa Khalil, Abdallah Khreishah, and NhatHai Phan. Promsec: Prompt optimization for secure generation of func- tional source code with large language models (llms). InCCS, 2024
work page 2024
-
[31]
OpenAI. Chatgpt. https://chatgpt.com/, 2025
work page 2025
-
[32]
OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ah- mad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, and Sam Altman et al. Gpt-4 technical report, 2024
work page 2024
-
[33]
Teach llms to phish: Stealing private information from language models
A Panda, C A Choquette-Choo, Z Zhang, et al. Teach llms to phish: Stealing private information from language models. InICLR, 2024
work page 2024
- [34]
-
[35]
Kui Ren, Ziqi Yang, Li Lu, Jian Liu, Yiming Li, Jie Wan, Xiaodi Zhao, Xianheng Feng, and Shuo Shao. Sok: On the role and future of aigc watermarking in the era of gen-ai.arXiv preprint arXiv:2411.11478, 2024
-
[36]
Code Llama: Open Foundation Models for Code
Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, and et al. Code llama: Open foundation models for code.arXiv preprint arXiv:2308.12950, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[37]
Faq retrieval using query-question similarity and bert-based query-answer relevance
Wataru Sakata, Tomohide Shibata, Ribeka Tanaka, and Sadao Kuro- hashi. Faq retrieval using query-question similarity and bert-based query-answer relevance. InSIGIR, 2019
work page 2019
-
[38]
Sander Schulhoff. 2025 llm top-10 risks. https://genai.owasp.org/ llmrisk/llm072025-system-prompt-leakage/, 2025
work page 2025
-
[39]
Microsoft bing system prompt leakage
Sander Schulhoff. Microsoft bing system prompt leakage. https: //learnprompting.org/docs/prompt_hacking/leaking, 2025
work page 2025
-
[40]
The Prompt Report: A Systematic Survey of Prompt Engineering Techniques
Sander Schulhoff, Michael Ilie, Nishant Balepur, Konstantine Ka- hadze, Amanda Liu, Chenglei Si, Yinheng Li, Aayush Gupta, Hy- oJung Han, Sevien Schulhoff, et al. The prompt report: a sys- tematic survey of prompt engineering techniques.arXiv preprint arXiv:2406.06608, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[41]
Shuo Shao, Yiming Li, Hongwei Yao, Yiling He, Zhan Qin, and Kui Ren. Explanation as a watermark: Towards harmless and multi-bit model ownership verification via watermarking feature attribution. In NDSS, 2025
work page 2025
-
[42]
Generative echo chamber? effect of llm-powered search systems on diverse information seeking
N Sharma, Q V Liao, and Z Xiao. Generative echo chamber? effect of llm-powered search systems on diverse information seeking. In CHI, 2024
work page 2024
-
[43]
Prompt stealing attacks against text-to-image generation models
Xinyue Shen, Yiting Qu, Michael Backes, and Yang Zhang. Prompt stealing attacks against text-to-image generation models. InUSENIX Security, 2024
work page 2024
-
[44]
Logan IV , Eric Wallace, and Sameer Singh
Taylor Shin, Yasaman Razeghi, Robert L. Logan IV , Eric Wallace, and Sameer Singh. Autoprompt: Eliciting knowledge from language models with automatically generated prompts. InEMNLP, 2020
work page 2020
-
[45]
Why so toxic? measuring and triggering toxic behavior in open-domain chatbots
Wai Man Si, Michael Backes, Jeremy Blackburn, Emiliano De Cristo- faro, Gianluca Stringhini, Savvas Zannettou, and Yang Zhang. Why so toxic? measuring and triggering toxic behavior in open-domain chatbots. InCCS, 2022
work page 2022
-
[46]
On transferability of prompt tuning for natural language processing
Yusheng Su, Xiaozhi Wang, Yujia Qin, Chi-Min Chan, Yankai Lin, Huadong Wang, Kaiyue Wen, et al. On transferability of prompt tuning for natural language processing. InNAACL, 2022
work page 2022
-
[47]
Autohint: Auto- matic prompt optimization with hint generation.arXiv preprint arXiv:2307.07415, 2023
Hong Sun, Xue Li, Yinchuan Xu, and et al. Autohint: Auto- matic prompt optimization with hint generation.arXiv preprint arXiv:2307.07415, 2023
-
[48]
Peftguard: detecting backdoor attacks against parameter-efficient fine- tuning
Zhen Sun, Tianshuo Cong, Yule Liu, Chenhao Lin, Xinlei He, et al. Peftguard: detecting backdoor attacks against parameter-efficient fine- tuning. InIEEE S&P, 2025
work page 2025
-
[49]
On the effectiveness of prompt stealing attacks on in-the- wild prompts
Yicong Tan, Xinyue Shen, Yun Shen, Michael Backes, and Yang Zhang. On the effectiveness of prompt stealing attacks on in-the- wild prompts. In2025 IEEE Symposium on Security and Privacy (SP), pages 392–410. IEEE, 2025
work page 2025
-
[50]
Codegemma: Open code models based on gemma.arXiv preprint arXiv:2406.11409, 2024
CodeGemma Team, Heri Zhao, Jeffrey Hui, and et al. Codegemma: Open code models based on gemma.arXiv preprint arXiv:2406.11409, 2024
-
[51]
Gemma: Open Models Based on Gemini Research and Technology
Gemma Team, Thomas Mesnard, Surya Bhupatiraju, and et al. Gemma: Open models based on gemini research and technology. arXiv preprint arXiv:2403.08295, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[52]
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard, and et al. Llama: Open and efficient foundation language models.arXiv preprint arXiv:2302.13971, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[53]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Atten- tion is all you need. InNeurIPS, 2017
work page 2017
-
[54]
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models.Advances in neural information processing systems, 35:24824–24837, 2022
work page 2022
-
[55]
Hard prompts made easy: Gradient-based discrete optimization for prompt tuning and discovery
Yuxin Wen, Neel Jain, John Kirchenbauer, Micah Goldblum, Jonas Geiping, and Tom Goldstein. Hard prompts made easy: Gradient-based discrete optimization for prompt tuning and discovery. NeurIPS, 36:51008–51025, 2023
work page 2023
-
[56]
Instructional fingerprinting of large language models
Jiashu Xu, Fei Wang, Mingyu Ma, Pang Wei Koh, Chaowei Xiao, and Muhao Chen. Instructional fingerprinting of large language models. InNAACL, 2024
work page 2024
-
[57]
Jin Xu, Xiaojiang Liu, Jianhao Yan, Deng Cai, Huayang Li, and Jian Li. Learning to break the loop: Analyzing and mitigating repetitions for neural text generation.NeurIPS, 35:3082–3095, 2022
work page 2022
-
[58]
Pm-abe: Puncturable bilateral fine-grained access control from lattices for secret sharing
Mengxue Yang, Huaqun Wang, and Debiao He. Pm-abe: Puncturable bilateral fine-grained access control from lattices for secret sharing. IEEE Transactions on Dependable and Secure Computing, 2024
work page 2024
-
[59]
Tracing text provenance via context-aware lexical substitution
Xi Yang, Jie Zhang, Kejiang Chen, Weiming Zhang, Zehua Ma, Feng Wang, and Nenghai Yu. Tracing text provenance via context-aware lexical substitution. InAAAI, 2022
work page 2022
-
[60]
Prsa: Prompt stealing attacks against real-world prompt services
Yong Yang, Changjiang Li, Qingming Li, Oubo Ma, Haoyu Wang, et al. Prsa: Prompt stealing attacks against real-world prompt services. InUSENIX Security, 2025
work page 2025
-
[61]
Promptcare: Prompt copyright protection by watermark injection and verification
Hongwei Yao, Jian Lou, Kui Ren, and Zhan Qin. Promptcare: Prompt copyright protection by watermark injection and verification. InIEEE S&P, 2024
work page 2024
-
[62]
Biao Yi, Tiansheng Huang, Sishuo Chen, Tong Li, Zheli Liu, et al. Probe before you talk: Towards black-box defense against backdoor unalignment for large language models. InICLR, 2025
work page 2025
-
[63]
Exploring the effec- tiveness of prompt engineering for legal reasoning tasks
Fangyi Yu, Lee Quartey, and Frank Schilder. Exploring the effec- tiveness of prompt engineering for legal reasoning tasks. InACL, 2023
work page 2023
-
[64]
Remark-llm: A robust and efficient watermarking framework for generative large language models
Ruisi Zhang, Shehzeen Samarah Hussain, Paarth Neekhara, and Fari- naz Koushanfar. Remark-llm: A robust and efficient watermarking framework for generative large language models. InUSENIX Security, 2024
work page 2024
-
[65]
Automatic chain of thought prompting in large language models
Zhuosheng Zhang, Aston Zhang, Mu Li, and Alex Smola. Automatic chain of thought prompting in large language models. InICLR, 2022
work page 2022
-
[66]
Parden, can you repeat that? defending against jailbreaks via repetition
Ziyang Zhang, Qizhen Zhang, and Jakob Foerster. Parden, can you repeat that? defending against jailbreaks via repetition. InICML, 2024
work page 2024
-
[67]
Provable robust watermarking for AI-generated text
Xuandong Zhao, Prabhanjan Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for AI-generated text. InNeurIPS, 2023
work page 2023
-
[68]
Provable robust watermarking for ai-generated text
Xuandong Zhao, Prabhanjan Vijendra Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for ai-generated text. InICLR, 2024
work page 2024
-
[69]
Sok: Water- marking for ai-generated content
Xuandong Zhao, Sam Gunn, Miranda Christ, and et al. Sok: Water- marking for ai-generated content. InIEEE S&P, 2025
work page 2025
-
[70]
Large language models are human-level prompt engineers
Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, and Jimmy Ba. Large language models are human-level prompt engineers. InICLR, 2022
work page 2022
-
[71]
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models.arXiv preprint arXiv:2307.15043, 2023. Appendix A. Ethics Considerations The study investigates copyright protection for system prompts, with the goal of preventing unauthorized replica- tion an...
work page internal anchor Pith review Pith/arXiv arXiv 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.