RobustKV: Defending Large Language Models against Jailbreak Attacks via KV Eviction
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- AnchorKV: Safety-Aware KV Cache Compression via Soft Penalty with a Refusal Anchor
  AnchorKV augments KV cache compression with a soft penalty derived from a refusal anchor in key space to improve safety alignment against jailbreaks while preserving most utility.
- New Wide-Net-Casting Jailbreak Attacks Risk Large Models
  The paper demonstrates that a tailored jailbreak method for querying groups of large models can achieve up to 100% success rate in some experiments on unprotected models, revealing overlooked multi-model safety risks.