PromptCOS: Towards Content-only System Prompt Copyright Auditing for LLMs

Dacheng Tao; Enhao Huang; Hongwei Yao; Shuo Shao; Yiming Li; Yuchen Yang; Yuyi Wang; Zhan Qin; Zhibo Wang

arxiv: 2509.03117 · v3 · pith:ZQ7EFNMPnew · submitted 2025-09-03 · 💻 cs.CR

PromptCOS: Towards Content-only System Prompt Copyright Auditing for LLMs

Yuchen Yang , Yiming Li , Hongwei Yao , Enhao Huang , Shuo Shao , Yuyi Wang , Zhibo Wang , Dacheng Tao

show 1 more author

Zhan Qin

This is my paper

Pith reviewed 2026-05-25 07:57 UTC · model grok-4.3

classification 💻 cs.CR

keywords system promptcopyright auditingwatermarkinglarge language modelscontent-only detectionintellectual property

0 comments

The pith

PromptCOS audits system prompt copyrights from generated text alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a way to verify ownership of system prompts in large language models when only the output text is available for inspection. Current verification techniques depend on shifts in internal probability distributions that are inaccessible in ordinary content-only use cases, where sampling noise also makes signals unreliable. PromptCOS instead inserts a few auxiliary tokens that steer the model toward a repeating cyclic output pattern, leaving the original prompt instructions untouched. Cover tokens in the prompt are tuned so that attempts to delete the auxiliary tokens cause clear performance drops. This setup lets prompt owners check for unauthorized copies directly from the text that an application produces.

Core claim

PromptCOS establishes a content-only auditing framework by encoding a cyclic output signal using auxiliary tokens injected into the system prompt, with cover tokens optimized to penalize removal attempts, thereby achieving stable watermark detection solely from generated content without altering the prompt's primary behavior.

What carries the argument

Cyclic output signal encoded by auxiliary tokens, protected by optimized cover tokens that cause degradation if removed.

If this is right

System prompts can carry a detectable watermark without changing their core task instructions.
Verification of prompt ownership succeeds using only the generated text, independent of internal model probabilities.
Removing the watermark tokens triggers measurable degradation in the prompt's original performance.
The watermark resists removal attempts across multiple categories of attacks.
Auditing requires far less computation than logit-based alternatives.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Owners of high-value prompts could embed such signals before licensing them to third-party applications.
The same auxiliary-token approach might apply to auditing other forms of prompt customization beyond system prompts.
Marketplaces that sell prompts could require watermarking as a condition for listing.
Detection could be extended to track which downstream application is using a stolen prompt by varying the cyclic pattern per licensee.

Load-bearing premise

A small set of auxiliary tokens can create a stable, detectable cyclic pattern in the output while the main prompt instructions continue to function unchanged.

What would settle it

An experiment in which the model is prompted without the auxiliary tokens or after their removal and the measured similarity to the target cyclic signal drops below the reported detection threshold.

Figures

Figures reproduced from arXiv: 2509.03117 by Dacheng Tao, Enhao Huang, Hongwei Yao, Shuo Shao, Yiming Li, Yuchen Yang, Yuyi Wang, Zhan Qin, Zhibo Wang.

**Figure 2.** Figure 2: Fidelity measures the consistency of responses [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: PromptCOS protects system prompts through two phases: (1) Watermark Embedding. PromptCOS defines optimization objectives with three key designs, including cyclic output signals (effectiveness), auxiliary tokens (fidelity), and cover tokens (robustness), and exploits an alternating optimization algorithm to optimize the watermark components: watermarked prompt, verification query, and signal mark. (2) Copyr… view at source ↗

**Figure 4.** Figure 4: The single signal limits the position and timing of signal mark generation, making it susceptible to uncertain token [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Watermarked system prompts’ performance under different attacks. The four attack strategies are represented [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗

**Figure 6.** Figure 6: This source code of the template function. [PITH_FULL_IMAGE:figures/full_fig_p018_6.png] view at source ↗

read the original abstract

System prompts are critical for shaping the behavior and output quality of large language model (LLM)-based applications, driving substantial investment in optimizing high-quality prompts beyond traditional handcrafted designs. However, as system prompts become valuable intellectual property, they are increasingly vulnerable to prompt theft and unauthorized use, highlighting the urgent need for effective copyright auditing, especially watermarking. Existing methods rely on verifying subtle logit distribution shifts triggered by a query. We observe that this logit-dependent verification framework is impractical in real-world content-only settings, primarily because (1) random sampling makes content-level generation unstable for verification, and (2) stronger instructions needed for content-level signals compromise prompt fidelity. To overcome these challenges, we propose PromptCOS, the first content-only system prompt copyright auditing method based on content-level output similarity. PromptCOS achieves watermark stability by designing a cyclic output signal as the conditional instruction's target. It preserves prompt fidelity by injecting a small set of auxiliary tokens to encode the watermark, leaving the main prompt untouched. Furthermore, to ensure robustness against malicious removal, we optimize cover tokens, i.e., critical tokens in the original prompt, to ensure that removing auxiliary tokens causes severe performance degradation. Experimental results show that promptCOS achieves high effectiveness (99.3% average watermark similarity), strong distinctiveness (60.8% higher than the best baseline), high fidelity (accuracy degradation no greater than 0.6%), robustness (resilience against four potential attack categories), and high computational efficiency (up to 98.1% cost saving).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PromptCOS claims the first content-only system prompt watermark via cyclic signals and auxiliary tokens, but the abstract gives no experimental details to check if the 99.3% similarity numbers actually hold.

read the letter

The main takeaway is that this paper targets a real gap: existing watermark methods for system prompts need logit access, which does not work when you only see generated text. PromptCOS tries to fix that by making the watermark live in the content itself through a cyclic output signal, a small set of auxiliary tokens, and optimized cover tokens that punish removal attempts. That design choice is the actual novelty, and it directly responds to the two problems they name—sampling instability and fidelity loss—without touching the core prompt instructions. The paper does a clean job spelling out why logit-based verification fails in practice and why a content-similarity approach could be more deployable. The reported efficiency gain of up to 98.1% cost saving is also a concrete engineering point worth noting if it survives scrutiny. The soft spots are the usual ones for an abstract-only view: the 99.3% similarity, 60.8% distinctiveness lift, and 0.6% fidelity bound are stated without any description of the test prompts, the exact cyclic construction, the baseline implementations, or statistical controls. The robustness claims against four attack categories are likewise listed but not unpacked. The central assumption—that a handful of auxiliary tokens plus cover optimization can produce a stable, detectable signal while leaving model behavior almost unchanged—looks reasonable on paper but is load-bearing and untested in the material we have. This work is for groups already working on LLM IP protection, prompt marketplaces, or watermarking extensions. A reader who wants to see a practical alternative to logit methods will find the design ideas useful even if the numbers need verification. It is coherent on its own terms and shows clear thinking about the deployment constraints, so it deserves a serious referee to examine the full methods and experiments rather than a desk reject.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes PromptCOS, the first content-only system prompt copyright auditing method for LLMs based on content-level output similarity. It achieves watermark stability via a cyclic output signal as the conditional instruction target, preserves fidelity by injecting a small set of auxiliary tokens to encode the watermark (leaving the main prompt untouched), and ensures robustness by optimizing cover tokens so that auxiliary-token removal causes severe performance degradation. Experiments report 99.3% average watermark similarity, 60.8% higher distinctiveness than the best baseline, accuracy degradation no greater than 0.6%, resilience against four attack categories, and up to 98.1% computational cost saving.

Significance. If the empirical claims hold under full scrutiny of methods and controls, PromptCOS would provide a practical solution for system-prompt IP protection in deployed LLM applications where only generated content (not logits) is observable, directly addressing the sampling instability and fidelity-loss limitations of prior logit-based watermarking approaches.

major comments (3)

[Abstract, design paragraph] Abstract and design description: the stability claim rests on the 'cyclic output signal' construction, yet no formal definition, recurrence relation, or proof sketch is supplied showing why this signal remains detectable after sampling; this is load-bearing for the core effectiveness claim of 99.3% similarity.
[Experimental results] Experimental results paragraph: the 99.3% similarity, 0.6% fidelity, and 60.8% distinctiveness figures are stated without reference to the precise experimental protocol, baseline implementations, number of trials, statistical tests, or data-exclusion criteria, preventing verification that post-hoc choices did not inflate the numbers.
[Robustness evaluation] Robustness section: the claim of resilience against 'four potential attack categories' is presented without enumerating the attacks or showing quantitative degradation curves when auxiliary tokens are removed, leaving the cover-token optimization claim unverified.

minor comments (2)

[Method] Notation for auxiliary and cover tokens should be introduced with explicit symbols and an equation showing how they are combined with the original prompt.
[Abstract] The abstract states 'up to 98.1% cost saving' without specifying the baseline cost metric or the exact configuration achieving that figure.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below, indicating planned revisions where appropriate to improve clarity and verifiability.

read point-by-point responses

Referee: [Abstract, design paragraph] Abstract and design description: the stability claim rests on the 'cyclic output signal' construction, yet no formal definition, recurrence relation, or proof sketch is supplied showing why this signal remains detectable after sampling; this is load-bearing for the core effectiveness claim of 99.3% similarity.

Authors: We agree that a formal definition would strengthen the presentation. The cyclic signal is constructed by setting the conditional instruction target to a periodic sequence of auxiliary tokens whose embeddings induce a detectable similarity pattern under content-only comparison. In the revision we will add an explicit recurrence relation (s_{t+1} = f(s_t, aux)) together with a short argument showing that the periodicity survives sampling noise because the auxiliary tokens dominate the similarity metric even after token-level perturbations. This addition will be placed in Section 3.2. revision: yes
Referee: [Experimental results] Experimental results paragraph: the 99.3% similarity, 0.6% fidelity, and 60.8% distinctiveness figures are stated without reference to the precise experimental protocol, baseline implementations, number of trials, statistical tests, or data-exclusion criteria, preventing verification that post-hoc choices did not inflate the numbers.

Authors: The full experimental protocol (including 500 independent trials per setting, exact baseline re-implementations, t-test significance thresholds, and exclusion rules for degenerate prompts) appears in Section 4.1 and Appendix B. To address the concern directly we will insert a concise protocol summary table and a statement of statistical procedures immediately after the reported figures in the abstract and results section. revision: yes
Referee: [Robustness evaluation] Robustness section: the claim of resilience against 'four potential attack categories' is presented without enumerating the attacks or showing quantitative degradation curves when auxiliary tokens are removed, leaving the cover-token optimization claim unverified.

Authors: The four attack categories (auxiliary-token deletion, synonym substitution, prompt paraphrasing, and logit-level perturbation) are defined and evaluated in Section 5.2. We will expand that section with an explicit enumeration, per-attack success rates, and degradation curves for both watermarked and cover-token-optimized prompts, confirming that removal of auxiliary tokens triggers >40% accuracy drop only when cover tokens are optimized. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper describes PromptCOS as an engineering construction: a cyclic output signal encoded via auxiliary tokens plus optimized cover tokens, with performance metrics reported as direct empirical outcomes of that design. No equations, derivations, or predictions are presented that reduce claimed results to quantities defined by the method's own fitted parameters or self-citations. The approach targets stated limitations of prior logit-based methods through explicit design choices rather than a closed mathematical chain. This is the most common honest finding for an applied systems paper.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 1 invented entities

The central claim rests on several design choices whose justification is not derived from prior literature but introduced to solve the stated practical constraints.

free parameters (2)

auxiliary token set
Small set of tokens chosen to encode the watermark; selection criteria not specified in abstract.
cover token optimization targets
Critical tokens in the original prompt tuned to cause performance drop upon removal; optimization procedure and objective not detailed.

axioms (1)

domain assumption Content-level similarity of outputs can serve as a stable and distinctive watermark signal
Invoked when the cyclic output signal is defined as the verification target.

invented entities (1)

cyclic output signal no independent evidence
purpose: Provide stable content-level watermark detectable without logits
New construct introduced to overcome instability of random sampling

pith-pipeline@v0.9.0 · 5832 in / 1565 out tokens · 34754 ms · 2026-05-25T07:57:33.157342+00:00 · methodology

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts
cs.CR 2026-05 unverdicted novelty 7.0

PragLocker protects agent prompts as IP by building non-portable obfuscated versions that function only on the intended LLM through code-symbol semantic anchoring followed by target-model feedback noise injection.
PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts
cs.CR 2026-05 unverdicted novelty 6.0

PragLocker generates function-preserving but non-portable prompts for LLM agents via code-symbol semantic anchoring followed by target-model feedback noise injection.

Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · cited by 1 Pith paper · 7 internal anchors

[1]

Prompt leakage effect and mitigation strategies for multi-turn llm applications

Divyansh Agarwal, Alexander Richard Fabbri, Ben Risher, Philippe Laban, Shafiq Joty, and Chien-Sheng Wu. Prompt leakage effect and mitigation strategies for multi-turn llm applications. InEMNLP, 2024

work page 2024
[2]

Quick suite

Amazon. Quick suite. https://aws.amazon.com/cn/quicksuite/, 2025

work page 2025
[3]

Gpt-5 system prompt leakage

asgeirtj. Gpt-5 system prompt leakage. https://github.com/asgeirtj/ system_prompts_leaks/blob/main/OpenAI/gpt-5-thinking.md, 2025

work page 2025
[4]

Greedy search method for separable nonlinear models using stage aitken gradient descent and least squares algorithms.IEEE Transactions on Automatic Control, pages 5044–5051, 2023

Jing Chen, Yawen Mao, Min Gan, Dongqing Wang, and Quanmin Zhu. Greedy search method for separable nonlinear models using stage aitken gradient descent and least squares algorithms.IEEE Transactions on Automatic Control, pages 5044–5051, 2023

work page 2023
[5]

Training Verifiers to Solve Math Word Problems

Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, and et al. Training verifiers to solve math word problems.arXiv preprint arXiv:2110.14168, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[6]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-AI, Daya Guo, Dejian Yang, and et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[7]

Stateful defenses for machine learning models are not yet secure against black-box attacks

Ryan Feng, Ashish Hooda, Neal Mangaokar, Kassem Fawaz, Somesh Jha, and Atul Prakash. Stateful defenses for machine learning models are not yet secure against black-box attacks. InCCS, 2023

work page 2023
[8]

Math- ematical capabilities of chatgpt

Simon Frieder, Luca Pinchetti, Ryan-Rhys Griffiths, and et al. Math- ematical capabilities of chatgpt. InNeurIPS, 2023

work page 2023
[9]

Introducing github copilot: your ai pair programmer

Nat Friedman. Introducing github copilot: your ai pair programmer. https://docs.github.com/zh/copilot/quickstart, 2021

work page 2021
[10]

Cosearchagent: A lightweight collaborative search agent with large language models

P Gong, J Li, and J Mao. Cosearchagent: A lightweight collaborative search agent with large language models. InSIGIR, 2024

work page 2024
[11]

On benchmarking code llms for android malware analysis

Yiling He, Hongyu She, Xingzhi Qian, Xinran Zheng, Zhuo Chen, Zhan Qin, and Lorenzo Cavallaro. On benchmarking code llms for android malware analysis. InISSTA Workshop, 2025

work page 2025
[12]

Is difficulty calibration all we need? towards more practical membership inference attacks

Yu He, Boheng Li, Yao Wang, Mengda Yang, Juan Wang, Hongxin Hu, and Xingyu Zhao. Is difficulty calibration all we need? towards more practical membership inference attacks. InCCS, 2024

work page 2024
[13]

Large language models for software engineering: A systematic literature review.ACM Trans- actions on Software Engineering and Methodology, 33(8), 2024

Xinyi Hou, Yanjie Zhao, Yue Liu, and et al. Large language models for software engineering: A systematic literature review.ACM Trans- actions on Software Engineering and Methodology, 33(8), 2024

work page 2024
[14]

On the (in) security of llm app stores

Xinyi Hou, Yanjie Zhao, and Haoyu Wang. On the (in) security of llm app stores. InIEEE S&P, 2025

work page 2025
[15]

Mitigating catastrophic forgetting in large language models with self-synthesized rehearsal

Jianheng Huang, Leyang Cui, Ante Wang, Chengyi Yang, Xinting Liao, Linfeng Song, Junfeng Yao, and Jinsong Su. Mitigating catastrophic forgetting in large language models with self-synthesized rehearsal. InACL, 2024

work page 2024
[16]

Pleak: Prompt leaking attacks against large language model applications

Bo Hui, Haolin Yuan, Neil Gong, Philippe Burlina, and Yinzhi Cao. Pleak: Prompt leaking attacks against large language model applications. InCCS, 2024

work page 2024
[17]

A watermark for large language models

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. InICML, 2023

work page 2023
[18]

Clearstamp: A human-visible and robust model-ownership proof based on trans- posed model training

Torsten Krauß, Jasper Stang, and Alexandra Dmitrienko. Clearstamp: A human-visible and robust model-ownership proof based on trans- posed model training. InUSENIX Security, 2024

work page 2024
[19]

Robust distortion-free watermarks for language models.Trans- actions on Machine Learning Research, 2023

Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. Robust distortion-free watermarks for language models.Trans- actions on Machine Learning Research, 2023

work page 2023
[20]

Governing open vocabulary data leaks using an edge llm through programming by example

Q Li, J Wen, and H Jin. Governing open vocabulary data leaks using an edge llm through programming by example. InMobiCom, 2024

work page 2024
[21]

Citation-enhanced generation for llm-based chatbots

Weitao Li, Junkai Li, Weizhi Ma, and Yang Liu. Citation-enhanced generation for llm-based chatbots. InACL, 2024

work page 2024
[22]

Rethinking data protection in the (generative) artificial intelligence era.arXiv preprint arXiv:2507.03034, 2025

Yiming Li, Shuo Shao, Yu He, Junfeng Guo, Tianwei Zhang, Zhan Qin, Pin-Yu Chen, Michael Backes, Philip Torr, Dacheng Tao, and Kui Ren. Rethinking data protection in the (generative) artificial intelligence era.arXiv preprint arXiv:2507.03034, 2025

work page arXiv 2025
[23]

Pro- tecting intellectual property of large language model-based code generation apis via watermarks

Zongjie Li, Chaozheng Wang, Shuai Wang, and Cuiyun Gao. Pro- tecting intellectual property of large language model-based code generation apis via watermarks. InCCS, 2023

work page 2023
[24]

Parrot: Efficient serving of{LLM- based}applications with semantic variable

Chaofan Lin, Zhenhua Han, Chengruidong Zhang, Yuqing Yang, Fan Yang, Chen Chen, and Lili Qiu. Parrot: Efficient serving of{LLM- based}applications with semantic variable. InOSDI, 2024

work page 2024
[25]

A survey of text watermarking in the era of large language models.ACM Computing Surveys, 57(2):1–36, 2024

Aiwei Liu, Leyi Pan, Yijian Lu, Jingjing Li, Xuming Hu, Xi Zhang, Lijie Wen, Irwin King, Hui Xiong, and Philip Yu. A survey of text watermarking in the era of large language models.ACM Computing Surveys, 57(2):1–36, 2024

work page 2024
[26]

Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation

Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation. InNeurIPS, 2023

work page 2023
[27]

Pre-train, prompt, and predict: A sys- tematic survey of prompting methods in natural language processing

Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. Pre-train, prompt, and predict: A sys- tematic survey of prompting methods in natural language processing. ACM computing surveys, 55(9):1–35, 2023

work page 2023
[28]

Safe-sd: Safe and traceable stable diffusion with text prompt trigger for invisible generative watermarking

Zhiyuan Ma, Guoli Jia, Biqing Qi, and Bowen Zhou. Safe-sd: Safe and traceable stable diffusion with text prompt trigger for invisible generative watermarking. InACM MM, pages 7113–7122, 2024

work page 2024
[29]

Poster: Multiparty private set intersection from multiparty homomorphic encryption

Christian Mouchet, Sylvain Chatel, Lea Nürnberger, and Wouter Lueks. Poster: Multiparty private set intersection from multiparty homomorphic encryption. InCCS, 2024

work page 2024
[30]

Promsec: Prompt optimization for secure generation of func- tional source code with large language models (llms)

Mahmoud Nazzal, Issa Khalil, Abdallah Khreishah, and NhatHai Phan. Promsec: Prompt optimization for secure generation of func- tional source code with large language models (llms). InCCS, 2024

work page 2024
[31]

OpenAI. Chatgpt. https://chatgpt.com/, 2025

work page 2025
[32]

Gpt-4 technical report, 2024

OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ah- mad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, and Sam Altman et al. Gpt-4 technical report, 2024

work page 2024
[33]

Teach llms to phish: Stealing private information from language models

A Panda, C A Choquette-Choo, Z Zhang, et al. Teach llms to phish: Stealing private information from language models. InICLR, 2024

work page 2024
[34]

Online prompt market

PromptBase. Online prompt market. https://promptbase.com/, 2025

work page 2025
[35]

Sok: On the role and future of aigc watermarking in the era of gen-ai.arXiv preprint arXiv:2411.11478, 2024

Kui Ren, Ziqi Yang, Li Lu, Jian Liu, Yiming Li, Jie Wan, Xiaodi Zhao, Xianheng Feng, and Shuo Shao. Sok: On the role and future of aigc watermarking in the era of gen-ai.arXiv preprint arXiv:2411.11478, 2024

work page arXiv 2024
[36]

Code Llama: Open Foundation Models for Code

Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, and et al. Code llama: Open foundation models for code.arXiv preprint arXiv:2308.12950, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[37]

Faq retrieval using query-question similarity and bert-based query-answer relevance

Wataru Sakata, Tomohide Shibata, Ribeka Tanaka, and Sadao Kuro- hashi. Faq retrieval using query-question similarity and bert-based query-answer relevance. InSIGIR, 2019

work page 2019
[38]

2025 llm top-10 risks

Sander Schulhoff. 2025 llm top-10 risks. https://genai.owasp.org/ llmrisk/llm072025-system-prompt-leakage/, 2025

work page 2025
[39]

Microsoft bing system prompt leakage

Sander Schulhoff. Microsoft bing system prompt leakage. https: //learnprompting.org/docs/prompt_hacking/leaking, 2025

work page 2025
[40]

The Prompt Report: A Systematic Survey of Prompt Engineering Techniques

Sander Schulhoff, Michael Ilie, Nishant Balepur, Konstantine Ka- hadze, Amanda Liu, Chenglei Si, Yinheng Li, Aayush Gupta, Hy- oJung Han, Sevien Schulhoff, et al. The prompt report: a sys- tematic survey of prompt engineering techniques.arXiv preprint arXiv:2406.06608, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[41]

Explanation as a watermark: Towards harmless and multi-bit model ownership verification via watermarking feature attribution

Shuo Shao, Yiming Li, Hongwei Yao, Yiling He, Zhan Qin, and Kui Ren. Explanation as a watermark: Towards harmless and multi-bit model ownership verification via watermarking feature attribution. In NDSS, 2025

work page 2025
[42]

Generative echo chamber? effect of llm-powered search systems on diverse information seeking

N Sharma, Q V Liao, and Z Xiao. Generative echo chamber? effect of llm-powered search systems on diverse information seeking. In CHI, 2024

work page 2024
[43]

Prompt stealing attacks against text-to-image generation models

Xinyue Shen, Yiting Qu, Michael Backes, and Yang Zhang. Prompt stealing attacks against text-to-image generation models. InUSENIX Security, 2024

work page 2024
[44]

Logan IV , Eric Wallace, and Sameer Singh

Taylor Shin, Yasaman Razeghi, Robert L. Logan IV , Eric Wallace, and Sameer Singh. Autoprompt: Eliciting knowledge from language models with automatically generated prompts. InEMNLP, 2020

work page 2020
[45]

Why so toxic? measuring and triggering toxic behavior in open-domain chatbots

Wai Man Si, Michael Backes, Jeremy Blackburn, Emiliano De Cristo- faro, Gianluca Stringhini, Savvas Zannettou, and Yang Zhang. Why so toxic? measuring and triggering toxic behavior in open-domain chatbots. InCCS, 2022

work page 2022
[46]

On transferability of prompt tuning for natural language processing

Yusheng Su, Xiaozhi Wang, Yujia Qin, Chi-Min Chan, Yankai Lin, Huadong Wang, Kaiyue Wen, et al. On transferability of prompt tuning for natural language processing. InNAACL, 2022

work page 2022
[47]

Autohint: Auto- matic prompt optimization with hint generation.arXiv preprint arXiv:2307.07415, 2023

Hong Sun, Xue Li, Yinchuan Xu, and et al. Autohint: Auto- matic prompt optimization with hint generation.arXiv preprint arXiv:2307.07415, 2023

work page arXiv 2023
[48]

Peftguard: detecting backdoor attacks against parameter-efficient fine- tuning

Zhen Sun, Tianshuo Cong, Yule Liu, Chenhao Lin, Xinlei He, et al. Peftguard: detecting backdoor attacks against parameter-efficient fine- tuning. InIEEE S&P, 2025

work page 2025
[49]

On the effectiveness of prompt stealing attacks on in-the- wild prompts

Yicong Tan, Xinyue Shen, Yun Shen, Michael Backes, and Yang Zhang. On the effectiveness of prompt stealing attacks on in-the- wild prompts. In2025 IEEE Symposium on Security and Privacy (SP), pages 392–410. IEEE, 2025

work page 2025
[50]

Codegemma: Open code models based on gemma.arXiv preprint arXiv:2406.11409, 2024

CodeGemma Team, Heri Zhao, Jeffrey Hui, and et al. Codegemma: Open code models based on gemma.arXiv preprint arXiv:2406.11409, 2024

work page arXiv 2024
[51]

Gemma: Open Models Based on Gemini Research and Technology

Gemma Team, Thomas Mesnard, Surya Bhupatiraju, and et al. Gemma: Open models based on gemini research and technology. arXiv preprint arXiv:2403.08295, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[52]

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, and et al. Llama: Open and efficient foundation language models.arXiv preprint arXiv:2302.13971, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[53]

Atten- tion is all you need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Atten- tion is all you need. InNeurIPS, 2017

work page 2017
[54]

Chain-of-thought prompting elicits reasoning in large language models.Advances in neural information processing systems, 35:24824–24837, 2022

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models.Advances in neural information processing systems, 35:24824–24837, 2022

work page 2022
[55]

Hard prompts made easy: Gradient-based discrete optimization for prompt tuning and discovery

Yuxin Wen, Neel Jain, John Kirchenbauer, Micah Goldblum, Jonas Geiping, and Tom Goldstein. Hard prompts made easy: Gradient-based discrete optimization for prompt tuning and discovery. NeurIPS, 36:51008–51025, 2023

work page 2023
[56]

Instructional fingerprinting of large language models

Jiashu Xu, Fei Wang, Mingyu Ma, Pang Wei Koh, Chaowei Xiao, and Muhao Chen. Instructional fingerprinting of large language models. InNAACL, 2024

work page 2024
[57]

Learning to break the loop: Analyzing and mitigating repetitions for neural text generation.NeurIPS, 35:3082–3095, 2022

Jin Xu, Xiaojiang Liu, Jianhao Yan, Deng Cai, Huayang Li, and Jian Li. Learning to break the loop: Analyzing and mitigating repetitions for neural text generation.NeurIPS, 35:3082–3095, 2022

work page 2022
[58]

Pm-abe: Puncturable bilateral fine-grained access control from lattices for secret sharing

Mengxue Yang, Huaqun Wang, and Debiao He. Pm-abe: Puncturable bilateral fine-grained access control from lattices for secret sharing. IEEE Transactions on Dependable and Secure Computing, 2024

work page 2024
[59]

Tracing text provenance via context-aware lexical substitution

Xi Yang, Jie Zhang, Kejiang Chen, Weiming Zhang, Zehua Ma, Feng Wang, and Nenghai Yu. Tracing text provenance via context-aware lexical substitution. InAAAI, 2022

work page 2022
[60]

Prsa: Prompt stealing attacks against real-world prompt services

Yong Yang, Changjiang Li, Qingming Li, Oubo Ma, Haoyu Wang, et al. Prsa: Prompt stealing attacks against real-world prompt services. InUSENIX Security, 2025

work page 2025
[61]

Promptcare: Prompt copyright protection by watermark injection and verification

Hongwei Yao, Jian Lou, Kui Ren, and Zhan Qin. Promptcare: Prompt copyright protection by watermark injection and verification. InIEEE S&P, 2024

work page 2024
[62]

Probe before you talk: Towards black-box defense against backdoor unalignment for large language models

Biao Yi, Tiansheng Huang, Sishuo Chen, Tong Li, Zheli Liu, et al. Probe before you talk: Towards black-box defense against backdoor unalignment for large language models. InICLR, 2025

work page 2025
[63]

Exploring the effec- tiveness of prompt engineering for legal reasoning tasks

Fangyi Yu, Lee Quartey, and Frank Schilder. Exploring the effec- tiveness of prompt engineering for legal reasoning tasks. InACL, 2023

work page 2023
[64]

Remark-llm: A robust and efficient watermarking framework for generative large language models

Ruisi Zhang, Shehzeen Samarah Hussain, Paarth Neekhara, and Fari- naz Koushanfar. Remark-llm: A robust and efficient watermarking framework for generative large language models. InUSENIX Security, 2024

work page 2024
[65]

Automatic chain of thought prompting in large language models

Zhuosheng Zhang, Aston Zhang, Mu Li, and Alex Smola. Automatic chain of thought prompting in large language models. InICLR, 2022

work page 2022
[66]

Parden, can you repeat that? defending against jailbreaks via repetition

Ziyang Zhang, Qizhen Zhang, and Jakob Foerster. Parden, can you repeat that? defending against jailbreaks via repetition. InICML, 2024

work page 2024
[67]

Provable robust watermarking for AI-generated text

Xuandong Zhao, Prabhanjan Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for AI-generated text. InNeurIPS, 2023

work page 2023
[68]

Provable robust watermarking for ai-generated text

Xuandong Zhao, Prabhanjan Vijendra Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for ai-generated text. InICLR, 2024

work page 2024
[69]

Sok: Water- marking for ai-generated content

Xuandong Zhao, Sam Gunn, Miranda Christ, and et al. Sok: Water- marking for ai-generated content. InIEEE S&P, 2025

work page 2025
[70]

Large language models are human-level prompt engineers

Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, and Jimmy Ba. Large language models are human-level prompt engineers. InICLR, 2022

work page 2022
[71]

Universal and Transferable Adversarial Attacks on Aligned Language Models

Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models.arXiv preprint arXiv:2307.15043, 2023. Appendix A. Ethics Considerations The study investigates copyright protection for system prompts, with the goal of preventing unauthorized replica- tion an...

work page internal anchor Pith review Pith/arXiv arXiv 2023

[1] [1]

Prompt leakage effect and mitigation strategies for multi-turn llm applications

Divyansh Agarwal, Alexander Richard Fabbri, Ben Risher, Philippe Laban, Shafiq Joty, and Chien-Sheng Wu. Prompt leakage effect and mitigation strategies for multi-turn llm applications. InEMNLP, 2024

work page 2024

[2] [2]

Quick suite

Amazon. Quick suite. https://aws.amazon.com/cn/quicksuite/, 2025

work page 2025

[3] [3]

Gpt-5 system prompt leakage

asgeirtj. Gpt-5 system prompt leakage. https://github.com/asgeirtj/ system_prompts_leaks/blob/main/OpenAI/gpt-5-thinking.md, 2025

work page 2025

[4] [4]

Greedy search method for separable nonlinear models using stage aitken gradient descent and least squares algorithms.IEEE Transactions on Automatic Control, pages 5044–5051, 2023

Jing Chen, Yawen Mao, Min Gan, Dongqing Wang, and Quanmin Zhu. Greedy search method for separable nonlinear models using stage aitken gradient descent and least squares algorithms.IEEE Transactions on Automatic Control, pages 5044–5051, 2023

work page 2023

[5] [5]

Training Verifiers to Solve Math Word Problems

Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, and et al. Training verifiers to solve math word problems.arXiv preprint arXiv:2110.14168, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021

[6] [6]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-AI, Daya Guo, Dejian Yang, and et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[7] [7]

Stateful defenses for machine learning models are not yet secure against black-box attacks

Ryan Feng, Ashish Hooda, Neal Mangaokar, Kassem Fawaz, Somesh Jha, and Atul Prakash. Stateful defenses for machine learning models are not yet secure against black-box attacks. InCCS, 2023

work page 2023

[8] [8]

Math- ematical capabilities of chatgpt

Simon Frieder, Luca Pinchetti, Ryan-Rhys Griffiths, and et al. Math- ematical capabilities of chatgpt. InNeurIPS, 2023

work page 2023

[9] [9]

Introducing github copilot: your ai pair programmer

Nat Friedman. Introducing github copilot: your ai pair programmer. https://docs.github.com/zh/copilot/quickstart, 2021

work page 2021

[10] [10]

Cosearchagent: A lightweight collaborative search agent with large language models

P Gong, J Li, and J Mao. Cosearchagent: A lightweight collaborative search agent with large language models. InSIGIR, 2024

work page 2024

[11] [11]

On benchmarking code llms for android malware analysis

Yiling He, Hongyu She, Xingzhi Qian, Xinran Zheng, Zhuo Chen, Zhan Qin, and Lorenzo Cavallaro. On benchmarking code llms for android malware analysis. InISSTA Workshop, 2025

work page 2025

[12] [12]

Is difficulty calibration all we need? towards more practical membership inference attacks

Yu He, Boheng Li, Yao Wang, Mengda Yang, Juan Wang, Hongxin Hu, and Xingyu Zhao. Is difficulty calibration all we need? towards more practical membership inference attacks. InCCS, 2024

work page 2024

[13] [13]

Large language models for software engineering: A systematic literature review.ACM Trans- actions on Software Engineering and Methodology, 33(8), 2024

Xinyi Hou, Yanjie Zhao, Yue Liu, and et al. Large language models for software engineering: A systematic literature review.ACM Trans- actions on Software Engineering and Methodology, 33(8), 2024

work page 2024

[14] [14]

On the (in) security of llm app stores

Xinyi Hou, Yanjie Zhao, and Haoyu Wang. On the (in) security of llm app stores. InIEEE S&P, 2025

work page 2025

[15] [15]

Mitigating catastrophic forgetting in large language models with self-synthesized rehearsal

Jianheng Huang, Leyang Cui, Ante Wang, Chengyi Yang, Xinting Liao, Linfeng Song, Junfeng Yao, and Jinsong Su. Mitigating catastrophic forgetting in large language models with self-synthesized rehearsal. InACL, 2024

work page 2024

[16] [16]

Pleak: Prompt leaking attacks against large language model applications

Bo Hui, Haolin Yuan, Neil Gong, Philippe Burlina, and Yinzhi Cao. Pleak: Prompt leaking attacks against large language model applications. InCCS, 2024

work page 2024

[17] [17]

A watermark for large language models

John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models. InICML, 2023

work page 2023

[18] [18]

Clearstamp: A human-visible and robust model-ownership proof based on trans- posed model training

Torsten Krauß, Jasper Stang, and Alexandra Dmitrienko. Clearstamp: A human-visible and robust model-ownership proof based on trans- posed model training. InUSENIX Security, 2024

work page 2024

[19] [19]

Robust distortion-free watermarks for language models.Trans- actions on Machine Learning Research, 2023

Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. Robust distortion-free watermarks for language models.Trans- actions on Machine Learning Research, 2023

work page 2023

[20] [20]

Governing open vocabulary data leaks using an edge llm through programming by example

Q Li, J Wen, and H Jin. Governing open vocabulary data leaks using an edge llm through programming by example. InMobiCom, 2024

work page 2024

[21] [21]

Citation-enhanced generation for llm-based chatbots

Weitao Li, Junkai Li, Weizhi Ma, and Yang Liu. Citation-enhanced generation for llm-based chatbots. InACL, 2024

work page 2024

[22] [22]

Rethinking data protection in the (generative) artificial intelligence era.arXiv preprint arXiv:2507.03034, 2025

Yiming Li, Shuo Shao, Yu He, Junfeng Guo, Tianwei Zhang, Zhan Qin, Pin-Yu Chen, Michael Backes, Philip Torr, Dacheng Tao, and Kui Ren. Rethinking data protection in the (generative) artificial intelligence era.arXiv preprint arXiv:2507.03034, 2025

work page arXiv 2025

[23] [23]

Pro- tecting intellectual property of large language model-based code generation apis via watermarks

Zongjie Li, Chaozheng Wang, Shuai Wang, and Cuiyun Gao. Pro- tecting intellectual property of large language model-based code generation apis via watermarks. InCCS, 2023

work page 2023

[24] [24]

Parrot: Efficient serving of{LLM- based}applications with semantic variable

Chaofan Lin, Zhenhua Han, Chengruidong Zhang, Yuqing Yang, Fan Yang, Chen Chen, and Lili Qiu. Parrot: Efficient serving of{LLM- based}applications with semantic variable. InOSDI, 2024

work page 2024

[25] [25]

A survey of text watermarking in the era of large language models.ACM Computing Surveys, 57(2):1–36, 2024

Aiwei Liu, Leyi Pan, Yijian Lu, Jingjing Li, Xuming Hu, Xi Zhang, Lijie Wen, Irwin King, Hui Xiong, and Philip Yu. A survey of text watermarking in the era of large language models.ACM Computing Surveys, 57(2):1–36, 2024

work page 2024

[26] [26]

Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation

Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation. InNeurIPS, 2023

work page 2023

[27] [27]

Pre-train, prompt, and predict: A sys- tematic survey of prompting methods in natural language processing

Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. Pre-train, prompt, and predict: A sys- tematic survey of prompting methods in natural language processing. ACM computing surveys, 55(9):1–35, 2023

work page 2023

[28] [28]

Safe-sd: Safe and traceable stable diffusion with text prompt trigger for invisible generative watermarking

Zhiyuan Ma, Guoli Jia, Biqing Qi, and Bowen Zhou. Safe-sd: Safe and traceable stable diffusion with text prompt trigger for invisible generative watermarking. InACM MM, pages 7113–7122, 2024

work page 2024

[29] [29]

Poster: Multiparty private set intersection from multiparty homomorphic encryption

Christian Mouchet, Sylvain Chatel, Lea Nürnberger, and Wouter Lueks. Poster: Multiparty private set intersection from multiparty homomorphic encryption. InCCS, 2024

work page 2024

[30] [30]

Promsec: Prompt optimization for secure generation of func- tional source code with large language models (llms)

Mahmoud Nazzal, Issa Khalil, Abdallah Khreishah, and NhatHai Phan. Promsec: Prompt optimization for secure generation of func- tional source code with large language models (llms). InCCS, 2024

work page 2024

[31] [31]

OpenAI. Chatgpt. https://chatgpt.com/, 2025

work page 2025

[32] [32]

Gpt-4 technical report, 2024

OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ah- mad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, and Sam Altman et al. Gpt-4 technical report, 2024

work page 2024

[33] [33]

Teach llms to phish: Stealing private information from language models

A Panda, C A Choquette-Choo, Z Zhang, et al. Teach llms to phish: Stealing private information from language models. InICLR, 2024

work page 2024

[34] [34]

Online prompt market

PromptBase. Online prompt market. https://promptbase.com/, 2025

work page 2025

[35] [35]

Sok: On the role and future of aigc watermarking in the era of gen-ai.arXiv preprint arXiv:2411.11478, 2024

Kui Ren, Ziqi Yang, Li Lu, Jian Liu, Yiming Li, Jie Wan, Xiaodi Zhao, Xianheng Feng, and Shuo Shao. Sok: On the role and future of aigc watermarking in the era of gen-ai.arXiv preprint arXiv:2411.11478, 2024

work page arXiv 2024

[36] [36]

Code Llama: Open Foundation Models for Code

Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, and et al. Code llama: Open foundation models for code.arXiv preprint arXiv:2308.12950, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[37] [37]

Faq retrieval using query-question similarity and bert-based query-answer relevance

Wataru Sakata, Tomohide Shibata, Ribeka Tanaka, and Sadao Kuro- hashi. Faq retrieval using query-question similarity and bert-based query-answer relevance. InSIGIR, 2019

work page 2019

[38] [38]

2025 llm top-10 risks

Sander Schulhoff. 2025 llm top-10 risks. https://genai.owasp.org/ llmrisk/llm072025-system-prompt-leakage/, 2025

work page 2025

[39] [39]

Microsoft bing system prompt leakage

Sander Schulhoff. Microsoft bing system prompt leakage. https: //learnprompting.org/docs/prompt_hacking/leaking, 2025

work page 2025

[40] [40]

The Prompt Report: A Systematic Survey of Prompt Engineering Techniques

Sander Schulhoff, Michael Ilie, Nishant Balepur, Konstantine Ka- hadze, Amanda Liu, Chenglei Si, Yinheng Li, Aayush Gupta, Hy- oJung Han, Sevien Schulhoff, et al. The prompt report: a sys- tematic survey of prompt engineering techniques.arXiv preprint arXiv:2406.06608, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[41] [41]

Explanation as a watermark: Towards harmless and multi-bit model ownership verification via watermarking feature attribution

Shuo Shao, Yiming Li, Hongwei Yao, Yiling He, Zhan Qin, and Kui Ren. Explanation as a watermark: Towards harmless and multi-bit model ownership verification via watermarking feature attribution. In NDSS, 2025

work page 2025

[42] [42]

Generative echo chamber? effect of llm-powered search systems on diverse information seeking

N Sharma, Q V Liao, and Z Xiao. Generative echo chamber? effect of llm-powered search systems on diverse information seeking. In CHI, 2024

work page 2024

[43] [43]

Prompt stealing attacks against text-to-image generation models

Xinyue Shen, Yiting Qu, Michael Backes, and Yang Zhang. Prompt stealing attacks against text-to-image generation models. InUSENIX Security, 2024

work page 2024

[44] [44]

Logan IV , Eric Wallace, and Sameer Singh

Taylor Shin, Yasaman Razeghi, Robert L. Logan IV , Eric Wallace, and Sameer Singh. Autoprompt: Eliciting knowledge from language models with automatically generated prompts. InEMNLP, 2020

work page 2020

[45] [45]

Why so toxic? measuring and triggering toxic behavior in open-domain chatbots

Wai Man Si, Michael Backes, Jeremy Blackburn, Emiliano De Cristo- faro, Gianluca Stringhini, Savvas Zannettou, and Yang Zhang. Why so toxic? measuring and triggering toxic behavior in open-domain chatbots. InCCS, 2022

work page 2022

[46] [46]

On transferability of prompt tuning for natural language processing

Yusheng Su, Xiaozhi Wang, Yujia Qin, Chi-Min Chan, Yankai Lin, Huadong Wang, Kaiyue Wen, et al. On transferability of prompt tuning for natural language processing. InNAACL, 2022

work page 2022

[47] [47]

Autohint: Auto- matic prompt optimization with hint generation.arXiv preprint arXiv:2307.07415, 2023

Hong Sun, Xue Li, Yinchuan Xu, and et al. Autohint: Auto- matic prompt optimization with hint generation.arXiv preprint arXiv:2307.07415, 2023

work page arXiv 2023

[48] [48]

Peftguard: detecting backdoor attacks against parameter-efficient fine- tuning

Zhen Sun, Tianshuo Cong, Yule Liu, Chenhao Lin, Xinlei He, et al. Peftguard: detecting backdoor attacks against parameter-efficient fine- tuning. InIEEE S&P, 2025

work page 2025

[49] [49]

On the effectiveness of prompt stealing attacks on in-the- wild prompts

Yicong Tan, Xinyue Shen, Yun Shen, Michael Backes, and Yang Zhang. On the effectiveness of prompt stealing attacks on in-the- wild prompts. In2025 IEEE Symposium on Security and Privacy (SP), pages 392–410. IEEE, 2025

work page 2025

[50] [50]

Codegemma: Open code models based on gemma.arXiv preprint arXiv:2406.11409, 2024

CodeGemma Team, Heri Zhao, Jeffrey Hui, and et al. Codegemma: Open code models based on gemma.arXiv preprint arXiv:2406.11409, 2024

work page arXiv 2024

[51] [51]

Gemma: Open Models Based on Gemini Research and Technology

Gemma Team, Thomas Mesnard, Surya Bhupatiraju, and et al. Gemma: Open models based on gemini research and technology. arXiv preprint arXiv:2403.08295, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[52] [52]

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, and et al. Llama: Open and efficient foundation language models.arXiv preprint arXiv:2302.13971, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[53] [53]

Atten- tion is all you need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Atten- tion is all you need. InNeurIPS, 2017

work page 2017

[54] [54]

Chain-of-thought prompting elicits reasoning in large language models.Advances in neural information processing systems, 35:24824–24837, 2022

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models.Advances in neural information processing systems, 35:24824–24837, 2022

work page 2022

[55] [55]

Hard prompts made easy: Gradient-based discrete optimization for prompt tuning and discovery

Yuxin Wen, Neel Jain, John Kirchenbauer, Micah Goldblum, Jonas Geiping, and Tom Goldstein. Hard prompts made easy: Gradient-based discrete optimization for prompt tuning and discovery. NeurIPS, 36:51008–51025, 2023

work page 2023

[56] [56]

Instructional fingerprinting of large language models

Jiashu Xu, Fei Wang, Mingyu Ma, Pang Wei Koh, Chaowei Xiao, and Muhao Chen. Instructional fingerprinting of large language models. InNAACL, 2024

work page 2024

[57] [57]

Learning to break the loop: Analyzing and mitigating repetitions for neural text generation.NeurIPS, 35:3082–3095, 2022

Jin Xu, Xiaojiang Liu, Jianhao Yan, Deng Cai, Huayang Li, and Jian Li. Learning to break the loop: Analyzing and mitigating repetitions for neural text generation.NeurIPS, 35:3082–3095, 2022

work page 2022

[58] [58]

Pm-abe: Puncturable bilateral fine-grained access control from lattices for secret sharing

Mengxue Yang, Huaqun Wang, and Debiao He. Pm-abe: Puncturable bilateral fine-grained access control from lattices for secret sharing. IEEE Transactions on Dependable and Secure Computing, 2024

work page 2024

[59] [59]

Tracing text provenance via context-aware lexical substitution

Xi Yang, Jie Zhang, Kejiang Chen, Weiming Zhang, Zehua Ma, Feng Wang, and Nenghai Yu. Tracing text provenance via context-aware lexical substitution. InAAAI, 2022

work page 2022

[60] [60]

Prsa: Prompt stealing attacks against real-world prompt services

Yong Yang, Changjiang Li, Qingming Li, Oubo Ma, Haoyu Wang, et al. Prsa: Prompt stealing attacks against real-world prompt services. InUSENIX Security, 2025

work page 2025

[61] [61]

Promptcare: Prompt copyright protection by watermark injection and verification

Hongwei Yao, Jian Lou, Kui Ren, and Zhan Qin. Promptcare: Prompt copyright protection by watermark injection and verification. InIEEE S&P, 2024

work page 2024

[62] [62]

Probe before you talk: Towards black-box defense against backdoor unalignment for large language models

Biao Yi, Tiansheng Huang, Sishuo Chen, Tong Li, Zheli Liu, et al. Probe before you talk: Towards black-box defense against backdoor unalignment for large language models. InICLR, 2025

work page 2025

[63] [63]

Exploring the effec- tiveness of prompt engineering for legal reasoning tasks

Fangyi Yu, Lee Quartey, and Frank Schilder. Exploring the effec- tiveness of prompt engineering for legal reasoning tasks. InACL, 2023

work page 2023

[64] [64]

Remark-llm: A robust and efficient watermarking framework for generative large language models

Ruisi Zhang, Shehzeen Samarah Hussain, Paarth Neekhara, and Fari- naz Koushanfar. Remark-llm: A robust and efficient watermarking framework for generative large language models. InUSENIX Security, 2024

work page 2024

[65] [65]

Automatic chain of thought prompting in large language models

Zhuosheng Zhang, Aston Zhang, Mu Li, and Alex Smola. Automatic chain of thought prompting in large language models. InICLR, 2022

work page 2022

[66] [66]

Parden, can you repeat that? defending against jailbreaks via repetition

Ziyang Zhang, Qizhen Zhang, and Jakob Foerster. Parden, can you repeat that? defending against jailbreaks via repetition. InICML, 2024

work page 2024

[67] [67]

Provable robust watermarking for AI-generated text

Xuandong Zhao, Prabhanjan Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for AI-generated text. InNeurIPS, 2023

work page 2023

[68] [68]

Provable robust watermarking for ai-generated text

Xuandong Zhao, Prabhanjan Vijendra Ananth, Lei Li, and Yu-Xiang Wang. Provable robust watermarking for ai-generated text. InICLR, 2024

work page 2024

[69] [69]

Sok: Water- marking for ai-generated content

Xuandong Zhao, Sam Gunn, Miranda Christ, and et al. Sok: Water- marking for ai-generated content. InIEEE S&P, 2025

work page 2025

[70] [70]

Large language models are human-level prompt engineers

Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, and Jimmy Ba. Large language models are human-level prompt engineers. InICLR, 2022

work page 2022

[71] [71]

Universal and Transferable Adversarial Attacks on Aligned Language Models

Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models.arXiv preprint arXiv:2307.15043, 2023. Appendix A. Ethics Considerations The study investigates copyright protection for system prompts, with the goal of preventing unauthorized replica- tion an...

work page internal anchor Pith review Pith/arXiv arXiv 2023