2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Reinforcement fine-tuning calibration makes LLM distillability adjustable, allowing optimized knowledge transfer or model IP safeguards via a combined task-KL-calibration objective.
Citing papers explorer
- Model-Agnostic Lifelong LLM Safety via Externalized Attack-Defense Co-Evolution
  EvoSafety achieves model-agnostic lifelong LLM safety via external adversarial skill libraries for red-teaming and a lightweight memory-augmented defense model that operates in steer or guard modes, reaching 99.61% defense success.
- Distillation Traps and Guards: A Calibration Knob for LLM Distillability
  Reinforcement fine-tuning calibration makes LLM distillability adjustable, allowing optimized knowledge transfer or model IP safeguards via a combined task-KL-calibration objective.