Reddebate: Safer responses through multi-agent red teaming debates. arXiv preprint arXiv:2506.11083,
1 Pith paper cites this work.
Fields: cs.AI (1). Years: 2025 (1). Verdicts: unverdicted (1).
Representative citing papers:
AgenticEval: Toward Agentic and Self-Evolving Safety Evaluation of Large Language Models
AgenticEval is a multi-agent framework that ingests unstructured policies to generate and self-evolve comprehensive safety benchmarks for LLMs, with experiments showing declining safety rates as tests harden.