The jailbreak tax: How useful are your jailbreak outputs?

The Jailbreak Tax: How Useful are Your Jailbreak Outputs? · 2025 · arXiv 2504.10694

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Benchmarking Misuse Mitigation Against Covert Adversaries

cs.CR · 2025-06-06 · unverdicted · novelty 6.0

Develops the BSD data generation pipeline and two new datasets to evaluate decomposition attacks as effective misuse enablers and stateful defenses as a countermeasure in language model safety.

Greedy Coordinate Diffusion: Effective and Semantically Coherent Adversarial Attacks via Diffusion Guidance

cs.LG · 2026-06-14 · unverdicted · novelty 5.0

GCD uses diffusion model priors to guide suffix search, achieving higher attack success rates with better semantic adherence and lower detection than GCG-style methods.
