PromptCOS is a content-only watermarking method for LLM system prompts that embeds detectable cyclic signals via auxiliary tokens while preserving fidelity and resisting removal attacks.
Why so toxic? measuring and triggering toxic behavior in open-domain chatbots
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CR 1years
2025 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
PromptCOS: Towards Content-only System Prompt Copyright Auditing for LLMs
PromptCOS is a content-only watermarking method for LLM system prompts that embeds detectable cyclic signals via auxiliary tokens while preserving fidelity and resisting removal attacks.