pith.

MAGIC: A co-evolving attacker-defender adversarial game for robust LLM safety

7 Pith papers cite this work. Polarity classification is still indexing.


Citation roles: background (3)
Citation polarities: background (3)
Years: 2026 (7)
Verdicts: unverdicted (7)

Citing papers

Showing 1 of 1 citing paper after filters.

  • Addressing Over-Refusal in LLMs with Competing Rewards · cs.LG · 2026-06-30 · unverdicted · none · ref 121

    SEAR trains a single LLM with adversarial process rewards so that it explores harmful reasoning paths but then flips to safe outputs, reducing over-refusal while preserving safety.