Survival analysis applied to repeated jailbreak attacks on three LLMs shows one model degrades rapidly while the others maintain moderate vulnerability on HarmBench prompts.
Prompt optimization and evalu- ation for llm automated red teaming.arXiv preprint arXiv:2507.22133
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
LANCE applies variational inference for label enhancement across multiple rejection categories, supplying gradients to a refinement model that produces safe, non-rigid responses from LLMs.
citing papers explorer
-
Quantifying LLM Safety Degradation Under Repeated Attacks Using Survival Analysis
Survival analysis applied to repeated jailbreak attacks on three LLMs shows one model degrades rapidly while the others maintain moderate vulnerability on HarmBench prompts.
-
Beyond "I cannot fulfill this request": Alleviating Rigid Rejection in LLMs via Label Enhancement
LANCE applies variational inference for label enhancement across multiple rejection categories, supplying gradients to a refinement model that produces safe, non-rigid responses from LLMs.