Agentic LLMs remain robust to renaming and insertion but degrade on composed transformations and deeper obfuscation in CTF tasks, enabled by a new Evolve-CTF tool for generating equivalent challenge families.
Narasimhan, and Yuan Cao
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
AnomalyAgent uses tool-augmented reinforcement learning with self-reflection to generate realistic industrial anomalies, achieving better metrics than zero-shot methods on MVTec-AD.
citing papers explorer
-
Capture the Flags: Family-Based Evaluation of Agentic LLMs via Semantics-Preserving Transformations
Agentic LLMs remain robust to renaming and insertion but degrade on composed transformations and deeper obfuscation in CTF tasks, enabled by a new Evolve-CTF tool for generating equivalent challenge families.
-
AnomalyAgent: Agentic Industrial Anomaly Synthesis via Tool-Augmented Reinforcement Learning
AnomalyAgent uses tool-augmented reinforcement learning with self-reflection to generate realistic industrial anomalies, achieving better metrics than zero-shot methods on MVTec-AD.