DR-Smoothing introduces a disrupt-then-rectify prompt processing scheme into smoothing defenses, delivering tight theoretical bounds on success probability against both token- and prompt-level jailbreaks.
On improving adversarial transferability of vision transformers
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
fields
cs.CR 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
A comprehensive survey that taxonomizes safety threats to large models and agents, reviews defenses and benchmarks, and outlines open challenges.
citing papers explorer
-
Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing
DR-Smoothing introduces a disrupt-then-rectify prompt processing scheme into smoothing defenses, delivering tight theoretical bounds on success probability against both token- and prompt-level jailbreaks.
-
Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety
A comprehensive survey that taxonomizes safety threats to large models and agents, reviews defenses and benchmarks, and outlines open challenges.