Fine-tuning security LLMs specializes inherited classification circuits into token-level indicators that preserve canonical accuracy but fail under behavior-preserving transformations like aliasing and case mutation.
Malicious PowerShell detection using attention against adversarial attacks. Electronics, 9(11):1817, 2020.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CR (1). Years: 2026 (1). Verdicts: UNVERDICTED (1).
Representative citing papers:
Inherited Circuits, Learned Semantics: How Fine-Tuning Creates Evasion Vulnerabilities Invisible to Standard Evaluation
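As a rough illustration of the behavior-preserving transformations named in the summary (aliasing and case mutation), the toy sketch below shows how an exact token-level indicator can miss a rewritten PowerShell command. It is not the citing paper's model or method: the `INDICATORS` set, the `flags` detector, and the helper names are hypothetical stand-ins. The only PowerShell facts relied on are that command names resolve case-insensitively and that `iex` and `iwr` are built-in aliases for `Invoke-Expression` and `Invoke-WebRequest`.

```python
import re

# Hypothetical token-level indicators (assumption, not taken from the paper).
INDICATORS = {"Invoke-Expression", "Invoke-WebRequest"}

# Built-in PowerShell aliases for the cmdlets above.
ALIASES = {"Invoke-Expression": "iex", "Invoke-WebRequest": "iwr"}


def tokenize(script):
    """Split a script into identifier-like tokens (letters, digits, hyphens)."""
    return re.findall(r"[A-Za-z][A-Za-z0-9-]*", script)


def flags(script):
    """Toy detector: exact, case-sensitive match against the indicator set."""
    return any(tok in INDICATORS for tok in tokenize(script))


def case_mutate(script):
    """Alternate letter case; PowerShell still resolves the same commands."""
    out, upper = [], True
    for ch in script:
        if ch.isalpha():
            out.append(ch.upper() if upper else ch.lower())
            upper = not upper
        else:
            out.append(ch)
    return "".join(out)


def alias(script):
    """Replace full cmdlet names with their built-in aliases."""
    for full, short in ALIASES.items():
        script = script.replace(full, short)
    return script


canonical = "Invoke-Expression (Invoke-WebRequest $url).Content"
mutated = case_mutate(canonical)
aliased = alias(canonical)

print(flags(canonical), flags(mutated), flags(aliased))
```

The canonical form is flagged while both rewritten forms are not, even though the case-mutated string differs from the original only in letter case. A learned classifier is far more complex than this set lookup, so the sketch only conveys the shape of the failure, not its mechanism.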