RobustKV: Defending Large Language Models against Jailbreak Attacks via KV Eviction

Tanqiu Jiang, Zian Wang, Jiacheng Liang, Changjiang Li, Yuhui Wang, Ting Wang · arXiv 2410.19937

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

AnchorKV: Safety-Aware KV Cache Compression via Soft Penalty with a Refusal Anchor

cs.LG · 2026-06-16 · unverdicted · novelty 5.0

AnchorKV augments KV cache compression with a soft penalty derived from a refusal anchor in key space to improve safety alignment against jailbreaks while preserving most utility.

New Wide-Net-Casting Jailbreak Attacks Risk Large Models

cs.CR · 2026-05-16 · unverdicted · novelty 5.0

The paper demonstrates that a tailored jailbreak method for querying groups of large models can achieve up to 100% success rate in some experiments on unprotected models, revealing overlooked multi-model safety risks.

citing papers explorer

AnchorKV: Safety-Aware KV Cache Compression via Soft Penalty with a Refusal Anchor · cs.LG · 2026-06-16 · unverdicted · none · ref 3
New Wide-Net-Casting Jailbreak Attacks Risk Large Models · cs.CR · 2026-05-16 · unverdicted · none · ref 9
