Persona-driven workflow and interface improve automated and human-AI red-teaming of generative AI by incorporating diverse perspectives into adversarial prompt creation.
Title resolution pending
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 4roles
background 1polarities
background 1representative citing papers
RouteHijack is a routing-aware jailbreak that identifies safety-critical experts via activation contrast and optimizes suffixes to suppress them, reaching 69.3% average attack success rate on seven MoE LLMs with strong transfer to variants and VLMs.
MASCing uses an LSTM surrogate and optimized steering masks to enable flexible, inference-time control over MoE expert routing for safety objectives, improving jailbreak defense and content generation success rates substantially across multiple models.
SecureWebArena is a new benchmark suite for holistic security evaluation of LVLM-based web agents using diverse simulated environments, attack taxonomies, and multi-layered failure analysis across reasoning, behavior, and outcomes.
citing papers explorer
-
PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI
Persona-driven workflow and interface improve automated and human-AI red-teaming of generative AI by incorporating diverse perspectives into adversarial prompt creation.
-
RouteHijack: Routing-Aware Attack on Mixture-of-Experts LLMs
RouteHijack is a routing-aware jailbreak that identifies safety-critical experts via activation contrast and optimizes suffixes to suppress them, reaching 69.3% average attack success rate on seven MoE LLMs with strong transfer to variants and VLMs.
-
MASCing: Configurable Mixture-of-Experts Behavior via Activation Steering Masks
MASCing uses an LSTM surrogate and optimized steering masks to enable flexible, inference-time control over MoE expert routing for safety objectives, improving jailbreak defense and content generation success rates substantially across multiple models.
-
SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents
SecureWebArena is a new benchmark suite for holistic security evaluation of LVLM-based web agents using diverse simulated environments, attack taxonomies, and multi-layered failure analysis across reasoning, behavior, and outcomes.