Athena: Safe Autonomous Agents with Verbal Contrastive Learning
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CR (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Membrane: A Self-Evolving Contrastive Safety Memory for LLM Agent Defense
A contrastive memory system evolves without retraining to defend LLM agents against jailbreaks, achieving top F1 scores and low benign refusal on HarmBench and AgentHarm benchmarks.