pith. machine review for the scientific record.

arxiv: 2604.08766 · v1 · submitted 2026-04-09 · 💻 cs.CR

Recognition: unknown

Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 16:54 UTC · model grok-4.3

classification 💻 cs.CR
keywords backdoor attacks · scanpath prediction · visual language models · gaze prediction · input-aware attacks · visual search · model poisoning · edge deployment

The pith

Backdoor attacks on VLM scanpath models can redirect fixations or extend durations by conditioning malicious outputs on the input scene, evading cluster-based detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that backdoor attacks against models predicting sequences of eye fixations can succeed when designed to vary their output based on each input image rather than using fixed malicious patterns. A sympathetic reader would care because these models support foveated rendering and attention-driven features in mobile systems, where their integrity affects both performance and security. The authors introduce two concrete attacks: one that shifts predicted fixations toward a chosen target object and another that lengthens fixation durations to slow visual search completion. Both produce diverse, scene-appropriate scanpaths that avoid forming detectable clusters. The attacks remain effective across trigger types and survive model quantization when deployed on actual smartphones.
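
The mechanics are easiest to see in code. Below is a minimal sketch of visual-trigger poisoning as the backdoor literature typically frames it; the patch placement, the `relabel` hook, and the default poisoning ratio are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def stamp_trigger(image: np.ndarray, patch: np.ndarray) -> np.ndarray:
    """Overlay a small visual trigger patch on the bottom-right corner of an
    HxWx3 image. Illustrative only: the paper's visual trigger may differ in
    pattern, size, placement, and blending."""
    out = image.copy()
    ph, pw = patch.shape[:2]
    out[-ph:, -pw:] = patch
    return out

def poison_dataset(images, scanpaths, patch, relabel, ratio=0.1, seed=0):
    """Stamp the trigger on a `ratio` fraction of training samples and rewrite
    their ground-truth scanpaths with the attacker's `relabel` function
    (fixed-path or input-aware)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=int(ratio * len(images)), replace=False)
    for i in idx:
        images[i] = stamp_trigger(images[i], patch)
        scanpaths[i] = relabel(images[i], scanpaths[i])
    return images, scanpaths
```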

Core claim

We present the first study of backdoor attacks against VLM-based scanpath prediction, evaluated on GazeFormer and COCO-Search18. We show that naive fixed-path attacks create detectable clustering in the continuous output space. To overcome this, we design two variable-output attacks: an input-aware spatial attack that redirects predicted fixations toward an attacker-chosen target object, and a scanpath duration attack that inflates fixation durations to delay visual search completion. Both attacks condition their output on the input scene, producing diverse and plausible scanpaths that evade cluster-based detection. We evaluate across three trigger modalities, multiple poisoning ratios, and five post-training defenses, finding that no defense simultaneously suppresses the attacks and preserves clean performance across all configurations.

What carries the argument

Input-aware variable-output backdoors that adapt malicious scanpath changes to the specific input scene, avoiding fixed-pattern clustering.
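
To make "input-aware" concrete, here is a minimal sketch of how a malicious label could be derived from the scene itself rather than copied from a fixed template; a function of this shape could serve as the `relabel` hook in the earlier sketch. The (x, y, duration) layout and the linear pull schedule are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def redirect_scanpath(scanpath: np.ndarray, target_bbox) -> np.ndarray:
    """Drift each fixation toward the attacker-chosen target object.

    scanpath: (T, 3) array of (x, y, duration) fixations.
    target_bbox: (x_min, y_min, x_max, y_max) of the target in this scene.
    Because the target box differs per image, the poisoned labels stay
    diverse and scene-appropriate instead of collapsing into one cluster.
    """
    cx = (target_bbox[0] + target_bbox[2]) / 2.0
    cy = (target_bbox[1] + target_bbox[3]) / 2.0
    out = scanpath.astype(float).copy()
    # Pull later fixations harder so the path plausibly converges on the target.
    w = np.linspace(0.2, 1.0, len(out))
    out[:, 0] = (1 - w) * out[:, 0] + w * cx
    out[:, 1] = (1 - w) * out[:, 1] + w * cy
    return out
```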

If this is right

  • Variable-output attacks evade cluster-based detection that catches fixed malicious scanpaths.
  • Both spatial redirection to a target object and duration inflation succeed while producing plausible outputs.
  • Attacks work with visual, textual, and multimodal triggers at various poisoning ratios.
  • No post-training defense simultaneously blocks the attacks and preserves clean model performance across configurations.
  • Backdoor behavior persists after quantization and runs on both flagship and legacy smartphones.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar input-conditioned backdoors could affect other sequence-prediction tasks in vision-language models where outputs must remain diverse.
  • Gaze-driven mobile interfaces may need anomaly detection that examines how scanpaths vary with scene content rather than looking only for fixed clusters (one such cluster check is sketched after this list).
  • The survival of attacks on legacy hardware indicates that existing deployed systems are already exposed without retraining.
  • Attackers could selectively influence attention toward or away from real-world objects in applications such as augmented reality search.
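
For the first two points, it helps to see what "cluster-based detection" means operationally. A hedged sketch: embed each predicted scanpath as a fixed-length vector and look for unnaturally dense clusters; fixed-path backdoors produce them, while input-aware attacks are built to avoid exactly this signature. The embedding scheme and DBSCAN thresholds here are illustrative choices, not a defense from the paper.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def embed(scanpath: np.ndarray, k: int = 8) -> np.ndarray:
    """Resample a (T, 3) scanpath to k fixations and keep (x, y) only."""
    idx = np.linspace(0, len(scanpath) - 1, k).round().astype(int)
    return scanpath[idx, :2].ravel()

def has_suspicious_cluster(pred_scanpaths, eps=0.05, min_frac=0.05) -> bool:
    """Flag dense clusters among predicted scanpaths. A fixed-path backdoor
    maps many inputs to nearly identical vectors; scene-conditioned outputs
    scatter instead of clustering."""
    X = np.stack([embed(sp) for sp in pred_scanpaths])
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
    labels = DBSCAN(eps=eps,
                    min_samples=max(2, int(min_frac * len(X)))).fit_predict(X)
    return any(c != -1 for c in labels)  # -1 marks DBSCAN noise points
```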

Load-bearing premise

The backdoor triggers and chosen poisoning ratios remain effective and undetectable after quantization and deployment on real smartphones, and do so beyond the tested GazeFormer and COCO-Search18 setup.

What would settle it

Quantize a backdoored model, deploy it on a commodity smartphone, and observe that triggered inputs produce neither redirected fixations nor inflated durations, or that the resulting scanpaths form detectable clusters under standard analysis.
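
In code, this test reduces to measuring attack success rate (ASR) before and after quantization. A sketch assuming a PyTorch model that maps a batch of images to (B, T, 3) fixation tensors and a loader yielding (image, target-box) pairs; the "majority of fixations inside the box" criterion is a stand-in for whatever metric the paper actually uses.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def attack_success_rate(model, triggered_loader) -> float:
    """Fraction of triggered inputs whose predicted fixations mostly land
    inside the attacker's target box (a hypothetical success criterion)."""
    model.eval()
    hits, total = 0, 0
    for images, target_bboxes in triggered_loader:
        preds = model(images)            # assumed (B, T, 3): x, y, duration
        x, y = preds[..., 0], preds[..., 1]
        x0, y0, x1, y1 = target_bboxes.unbind(-1)
        in_box = (x >= x0[:, None]) & (x <= x1[:, None]) & \
                 (y >= y0[:, None]) & (y <= y1[:, None])
        hits += (in_box.float().mean(dim=1) > 0.5).sum().item()
        total += images.shape[0]
    return hits / max(total, 1)

# The falsification test: does ASR collapse after INT8 quantization?
# asr_fp32 = attack_success_rate(model, triggered_loader)
# q_model = torch.ao.quantization.quantize_dynamic(model, {nn.Linear},
#                                                  dtype=torch.qint8)
# asr_int8 = attack_success_rate(q_model, triggered_loader)
```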

Figures

Figures reproduced from arXiv: 2604.08766 by Diana Romero, Fatima Anwar, Habiba Farrukh, Momin Ahmad Khan, Mutahar Ali, Salma Elmalaki.

Figure 1. End-to-end pipeline of how a backdoor attack is […] view at source ↗
Figure 2. Fixed-path backdoor triggers and their effect on […] view at source ↗
Figure 3. Overview of the proposed variable-output backdoor attacks. (a) Clean baseline: a normal scanpath for “find fork” […] view at source ↗
Figure 4. Kernel density estimates of predicted scanpath durations […] view at source ↗
Figure 5. Exploratory data analysis of the fixed-path poisoned datasets. Panels 1, 2, and 3 correspond to poisoning ratios of […] view at source ↗
Figure 6. Qualitative examples of targeted redirection under the input-aware spatial attack. Each row compares the clean and […] view at source ↗
Figure 7. Representative cases explaining why the triggered BBox hit ratio does not always approach zero under the input-aware […] view at source ↗
Figure 8. Training and validation loss during post-training defense […] view at source ↗
Original abstract

Scanpath prediction models forecast the sequence and timing of human fixations during visual search, driving foveated rendering and attention-based interaction in mobile systems where their integrity is a first-class security concern. We present the first study of backdoor attacks against VLM-based scanpath prediction, evaluated on GazeFormer and COCO-Search18. We show that naive fixed-path attacks, while effective, create detectable clustering in the continuous output space. To overcome this, we design two variable-output attacks: an input-aware spatial attack that redirects predicted fixations toward an attacker-chosen target object, and a scanpath duration attack that inflates fixation durations to delay visual search completion. Both attacks condition their output on the input scene, producing diverse and plausible scanpaths that evade cluster-based detection. We evaluate across three trigger modalities (visual, textual, and multimodal), multiple poisoning ratios, and five post-training defenses, finding that no defense simultaneously suppresses the attacks and preserves clean performance across all configurations. We further demonstrate that backdoor behavior survives quantization and deployment on both flagship and legacy commodity smartphones, confirming practical threat viability for edge-deployed gaze-driven systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents the first study of backdoor attacks on VLM-based scanpath prediction models (GazeFormer on COCO-Search18). It introduces two variable-output attacks—an input-aware spatial attack redirecting fixations toward an attacker-chosen target object and a scanpath duration attack inflating fixation durations—both conditioned on the input scene to produce diverse, plausible outputs that evade cluster-based detection. Evaluations cover three trigger modalities (visual, textual, multimodal), multiple poisoning ratios, five post-training defenses, and persistence after quantization and deployment on flagship/legacy smartphones.

Significance. If the results hold, the work is significant for exposing practical security risks in gaze-driven mobile systems used for foveated rendering and attention-based interfaces. It contributes by designing variable-output backdoors suited to continuous prediction tasks, demonstrating evasion of standard defenses, and providing broad empirical coverage across triggers and poisoning levels on public datasets and models. The authors receive credit for the reproducible experimental design and the focus on edge-deployment viability.

major comments (2)
  1. [Abstract] Abstract: The claim that 'backdoor behavior survives quantization and deployment on both flagship and legacy commodity smartphones, confirming practical threat viability' is load-bearing for the central conclusion but is unsupported by any reported quantitative metrics on post-quantization ASR degradation, clean accuracy drop, or shifts in cluster-based detectability. Without these numbers (e.g., ASR before/after INT8 quantization), the assertion that the attacks remain effective and undetectable on-device cannot be evaluated.
  2. [§5] §5 (Experimental Evaluation): The statement that 'no defense simultaneously suppresses the attacks and preserves clean performance across all configurations' requires explicit reporting of per-defense ASR values, clean accuracy, and variance across runs for each trigger modality and poisoning ratio. The current summary leaves open the possibility that results depend on specific hyperparameter choices or post-hoc selection of defense configurations.
minor comments (2)
  1. [Methods] Methods section: The precise construction of the input-aware triggers (how the target object is selected and embedded) and the loss formulation for the duration attack should be given with pseudocode or equations to enable exact reproduction (a hypothetical sketch of one possible duration loss follows this list).
  2. [Figures] Figure captions: Scanpath visualizations would benefit from explicit coordinate axes, fixation duration scales, and indication of the attacker-chosen target to allow readers to assess plausibility and redirection success directly.
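
On the first minor comment: the paper's exact loss is not reproduced here, but a hedged sketch of one formulation consistent with the described behavior (spatial path preserved, durations inflated multiplicatively on triggered samples) would look like this. The tensor layout and the `inflate` factor are assumptions.

```python
import torch
import torch.nn.functional as F

def poisoned_training_loss(pred, target, is_triggered, inflate=2.0):
    """One plausible loss for the duration attack; illustrative only.

    pred, target: (B, T, 3) tensors of (x, y, duration) fixations.
    is_triggered: (B,) boolean mask marking poisoned samples.
    Triggered samples keep their scene-specific spatial path but regress
    durations toward inflate * the clean duration, so the malicious
    output still varies naturally with the input.
    """
    spatial_loss = F.mse_loss(pred[..., :2], target[..., :2])
    dur_target = target[..., 2].clone()
    dur_target[is_triggered] = inflate * dur_target[is_triggered]
    duration_loss = F.mse_loss(pred[..., 2], dur_target)
    return spatial_loss + duration_loss
```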

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The two major comments correctly identify areas where additional quantitative detail will improve clarity and verifiability. We address each point below and will revise the manuscript accordingly.

point-by-point responses
  1. Referee: [Abstract] Abstract: The claim that 'backdoor behavior survives quantization and deployment on both flagship and legacy commodity smartphones, confirming practical threat viability' is load-bearing for the central conclusion but is unsupported by any reported quantitative metrics on post-quantization ASR degradation, clean accuracy drop, or shifts in cluster-based detectability. Without these numbers (e.g., ASR before/after INT8 quantization), the assertion that the attacks remain effective and undetectable on-device cannot be evaluated.

    Authors: We acknowledge the referee's point. The quantization and on-device experiments were performed on both flagship and legacy smartphones for all trigger modalities, but the manuscript presented only a high-level summary. In the revision we will insert a dedicated table (and brief accompanying text) in §5 that reports ASR, clean accuracy, and cluster-detectability metrics before and after INT8 quantization, together with the corresponding on-device measurements. This will supply the concrete numbers needed to support the abstract claim. revision: yes

  2. Referee: [§5] §5 (Experimental Evaluation): The statement that 'no defense simultaneously suppresses the attacks and preserves clean performance across all configurations' requires explicit reporting of per-defense ASR values, clean accuracy, and variance across runs for each trigger modality and poisoning ratio. The current summary leaves open the possibility that results depend on specific hyperparameter choices or post-hoc selection of defense configurations.

    Authors: The referee is right that a summary statement benefits from granular data. Our experiments evaluated the five defenses across every trigger modality and poisoning ratio with multiple independent runs; variance was computed but not tabulated. We will expand §5 with comprehensive tables that list, for each defense-trigger-poisoning combination, the mean ASR, clean accuracy, and standard deviation across runs. These tables will make the claim that no defense succeeds on both objectives fully transparent and reproducible. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical attack evaluation with external grounding

full rationale

The paper contains no derivations, equations, fitted parameters, or self-referential predictions that could reduce to their own inputs. It is an empirical demonstration of backdoor attacks on GazeFormer using the public COCO-Search18 dataset, with attacks evaluated against standard post-training defenses and quantization. All claims rest on observable performance metrics from external benchmarks rather than any self-definitional, fitted-input, or self-citation load-bearing steps. The central results (attack success rates, evasion of clustering, survival after quantization) are independently testable and do not collapse by construction to author-defined quantities.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the feasibility of poisoning VLM scanpath models with input-conditioned triggers and on the assumption that cluster-based detection is the relevant baseline; no free parameters or invented entities are introduced beyond standard adversarial ML assumptions.

axioms (1)
  • domain assumption: Poisoning a fraction of the training data with triggers can implant backdoor behavior in VLM scanpath models.
    Standard premise in the backdoor-attack literature, invoked to justify the attack construction.

pith-pipeline@v0.9.0 · 5514 in / 1316 out tokens · 43657 ms · 2026-05-10T16:54:25.339161+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

73 extracted references · 12 canonical work pages · 3 internal anchors
