MultiBreak is a large diverse multi-turn jailbreak benchmark that achieves substantially higher attack success rates on LLMs than prior datasets and reveals topic-specific vulnerabilities in multi-turn settings.
X-boundary: Establishing ex- act safety boundary to shield llms from multi-turn jail- breaks without compromising usability.arXiv preprint arXiv:2502.09990
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3representative citing papers
CoopGuard deploys cooperative agents to track conversation history and counter evolving multi-round attacks on LLMs, achieving a 78.9% reduction in attack success rate on a new 5,200-sample benchmark.
The paper proposes a paradigm of provable probabilistic safety to enable scalable, safe deployment of embodied AI in critical applications.
citing papers explorer
-
MultiBreak: A Scalable and Diverse Multi-turn Jailbreak Benchmark for Evaluating LLM Safety
MultiBreak is a large diverse multi-turn jailbreak benchmark that achieves substantially higher attack success rates on LLMs than prior datasets and reveals topic-specific vulnerabilities in multi-turn settings.
-
CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks
CoopGuard deploys cooperative agents to track conversation history and counter evolving multi-round attacks on LLMs, achieving a 78.9% reduction in attack success rate on a new 5,200-sample benchmark.
-
Towards provable probabilistic safety for scalable embodied AI systems
The paper proposes a paradigm of provable probabilistic safety to enable scalable, safe deployment of embodied AI in critical applications.