The paper presents Proactive Availability Backdoor (PAB) attacks on LLMs that achieve 73.1% effective success rate by proactively inducing users via suggestions in a Five-Factor Model simulation.
arXiv preprint arXiv:2004.06660 , year=
5 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
Backdoor attacks on VLM-based scanpath predictors can redirect fixations toward chosen objects or inflate durations using input-conditioned triggers that evade cluster detection, and no tested defense blocks them without hurting clean accuracy.
LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.
SCOUT uses token saliency analysis to detect both standard and contextually-plausible backdoor attacks in language models while maintaining clean accuracy.
Catastrophic overfitting in fast adversarial training is reinterpreted as a weak-trigger variant of unlearnable tasks, allowing backdoor-inspired recalibration and outlier suppression to restore robustness.
citing papers explorer
-
Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction
Backdoor attacks on VLM-based scanpath predictors can redirect fixations toward chosen objects or inflate durations using input-conditioned triggers that evade cluster detection, and no tested defense blocks them without hurting clean accuracy.
-
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.