Red-Bandit adapts online to LLM failure modes by dynamically selecting among RL-trained LoRA attack-style experts via a bandit policy, reporting SOTA ASR@10 on AdvBench with lower-perplexity prompts.
Diverse and effective red teaming with auto-generated rewards and multi-step reinforcement learning.arXiv preprint arXiv:2412.18693, 2024
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CL 2years
2025 2verdicts
UNVERDICTED 2representative citing papers
Gemini 2.5 Pro and Flash models are presented as achieving frontier performance in reasoning, coding, and long-context multimodal tasks while spanning a cost-capability Pareto curve.
citing papers explorer
-
Red-Bandit: Test-Time Adaptation for LLM Red-Teaming via Bandit-Guided LoRA Experts
Red-Bandit adapts online to LLM failure modes by dynamically selecting among RL-trained LoRA attack-style experts via a bandit policy, reporting SOTA ASR@10 on AdvBench with lower-perplexity prompts.
-
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Gemini 2.5 Pro and Flash models are presented as achieving frontier performance in reasoning, coding, and long-context multimodal tasks while spanning a cost-capability Pareto curve.