BadNets: Evaluating Backdooring Attacks on Deep Neural Networks
4 Pith papers cite this work. Polarity classification is still being indexed.
Citation summary
Verdicts: UNVERDICTED (4)
Roles: baseline (1)
Polarities: baseline (1)
Citing papers
- Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain
  Malicious LLM API routers actively perform payload injection and secret exfiltration: 9 of 428 tested routers behaved maliciously, and leaked credentials create further poisoning risks.
- Your Neighbors Know: Leveraging Local Neighborhoods for Backdoor Detection in Decentralized Learning
  Argus detects backdoors in decentralized learning by combining local trigger analysis with neighbor-similarity consistency checks. It offers theoretical convergence guarantees and, empirically, reduces attack success rates by up to 90 percentage points.
- Gungnir: Exploiting Stylistic Features in Images for Backdoor Attacks on Diffusion Models
  Gungnir shows that style-based triggers, built with the RAN and STTR techniques, can activate backdoors in diffusion models while evading detection and surviving fine-tuning.
- Multimodal Backdoor Attack on VLMs for Autonomous Driving via Graffiti and Cross-Lingual Triggers
  The GLA backdoor attack on DriveVLM uses naturalistic graffiti and cross-lingual triggers to reach a 90% attack success rate (ASR) at a 10% poisoning ratio, while also improving some clean-task metrics such as BLEU-1.