Prompt optimization and evaluation for LLM automated red teaming. arXiv preprint arXiv:2507.22133

Michael Freenor, Lauren Alvarez, Milton Leal, Lily Smith, Joel Garrett, Yelyzaveta Husieva, Madeline Woodruff, Ryan Miller, Erich Kummerfeld, Rafael Medeiros, Sander Schulhoff · 2025 · arXiv 2507.22133

2 Pith papers cite this work. Polarity classification is still indexing.



representative citing papers

Quantifying LLM Safety Degradation Under Repeated Attacks Using Survival Analysis

cs.CR · 2026-05-13 · unverdicted · novelty 6.0

Survival analysis applied to repeated jailbreak attacks on three LLMs shows one model degrades rapidly while the others maintain moderate vulnerability on HarmBench prompts.

Beyond "I cannot fulfill this request": Alleviating Rigid Rejection in LLMs via Label Enhancement

cs.CL · 2026-05-08 · unverdicted · novelty 5.0

LANCE applies variational inference for label enhancement across multiple rejection categories, supplying gradients to a refinement model that produces safe, non-rigid responses from LLMs.

citing papers explorer

Showing 2 of 2 citing papers.

Quantifying LLM Safety Degradation Under Repeated Attacks Using Survival Analysis · cs.CR · 2026-05-13 · unverdicted · none · ref 4
Beyond "I cannot fulfill this request": Alleviating Rigid Rejection in LLMs via Label Enhancement · cs.CL · 2026-05-08 · unverdicted · none · ref 1
