Backdoor attacks on VLM-based scanpath predictors can redirect fixations toward chosen objects or inflate durations using input-conditioned triggers that evade cluster detection, and no tested defense blocks them without hurting clean accuracy.
Weight poisoning attacks on pre- trained models
4 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.
SCOUT uses token saliency analysis to detect both standard and contextually-plausible backdoor attacks in language models while maintaining clean accuracy.
Catastrophic overfitting in fast adversarial training is reinterpreted as a weak-trigger variant of unlearnable tasks, allowing backdoor-inspired recalibration and outlier suppression to restore robustness.
citing papers explorer
-
Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction
Backdoor attacks on VLM-based scanpath predictors can redirect fixations toward chosen objects or inflate durations using input-conditioned triggers that evade cluster detection, and no tested defense blocks them without hurting clean accuracy.
-
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.
-
SCOUT: A Defense Against Data Poisoning Attacks in Fine-Tuned Language Models
SCOUT uses token saliency analysis to detect both standard and contextually-plausible backdoor attacks in language models while maintaining clean accuracy.
-
Unveiling the Backdoor Mechanism Hidden Behind Catastrophic Overfitting in Fast Adversarial Training
Catastrophic overfitting in fast adversarial training is reinterpreted as a weak-trigger variant of unlearnable tasks, allowing backdoor-inspired recalibration and outlier suppression to restore robustness.