Revisiting backdoor attacks on LLMs: A stealthy and practical poisoning framework via harmless inputs
3 Pith papers cite this work.
Fields: cs.CR (3)
Years: 2026 (3)
Representative citing papers:
- Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs
  Compilation optimizations can be exploited to plant stealthy backdoors in LLMs. The backdoors remain dormant when no optimization is applied, but reach an attack success rate of about 90% once triggered, while clean accuracy stays near 100%.
- Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation
  P2F generates low-rank parameter increments for LLM fingerprinting directly from textual descriptions in a single forward pass.
- BackFlush: Knowledge-Free Backdoor Detection and Elimination with Watermark Preservation in Large Language Models
  BackFlush detects backdoors via susceptibility amplification and removes them with RoPE unlearning. This lowers the attack success rate (ASR) to 1% while achieving 99% clean accuracy and preserving watermarks.