arXiv preprint arXiv:2406.11654

Han, V · 2024 · arXiv 2406.11654

3 Pith papers cite this work. Polarity classification is still indexing.



citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI

cs.HC · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

A persona-driven workflow and interface improve both automated and human-AI red-teaming of generative AI by incorporating diverse perspectives into adversarial prompt creation.

Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

Stable-GFlowNet stabilizes GFlowNet (GFN) training for LLM red-teaming by using pairwise comparisons to eliminate estimation of the partition function Z, applying robust masking against noisy rewards, and adding a fluency stabilizer.

Distributed Quality-Diversity Search for Toxicity in Large Language Models

cs.NE · 2026-06-23 · unverdicted · novelty 4.0

ToxSearch-S extends evolutionary toxicity search with embedding-driven speciation and MPI-based distribution; it matches baseline peak toxicity while showing lower search pressure and achieving 1.8x to 3.2x speedups.
