A persona-driven workflow and interface improve automated and human-AI red-teaming of generative AI by incorporating diverse perspectives into adversarial prompt creation.
arXiv preprint arXiv:2406.11654
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 1
citation-polarity summary
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: background (1)
representative citing papers
Stable-GFlowNet stabilizes GFN training for LLM red-teaming by eliminating Z estimation via pairwise comparisons and robust masking against noisy rewards while adding a fluency stabilizer.
ToxSearch-S extends evolutionary toxicity search with embedding-driven speciation and MPI distribution, matching baseline peak toxicity while showing lower search pressure and 1.8-3.2x speedups.