Why so toxic? measuring and triggering toxic behavior in open-domain chatbots

Wai Man Si, Michael Backes, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, Savvas Zannettou, Yang Zhang · 2022

1 Pith paper cites this work.


representative citing papers

PromptCOS: Towards Content-only System Prompt Copyright Auditing for LLMs

cs.CR · 2025-09-03 · unverdicted · novelty 7.0 · ref 45

PromptCOS is a content-only watermarking method for LLM system prompts that embeds detectable cyclic signals via auxiliary tokens while preserving fidelity and resisting removal attacks.
