PoisonForge benchmark shows that 1% poisoned examples achieve over 70% attack success rate on targeted tasks across 11 of 12 tested LLMs with under 0.5% leakage to non-target tasks.
Network and Distributed System Security Symposium
3 Pith papers cite this work.
Representative citing papers (3, all from 2026):
- PoisonForge: Task-Level Targeted Poisoning Benchmark for Instruction-Tuned LLMs
- Language-Switching Triggers Take a Latent Detour Through Language Models
- Mechanistic Anomaly Detection via Functional Attribution