BYORn: Bootstrap Your Own Responses to Defend Large Vision-Language Models Against Backdoor Attacks

Ivan Sabolić; Josip Šarić; Marin Oršić; Sven Lončarić

arXiv: 2606.02947 · v1 · pith:JETOM6VM · submitted 2026-06-01 · cs.LG · cs.CV


Pith reviewed 2026-06-28 15:08 UTC · model grok-4.3

classification: cs.LG, cs.CV

keywords: backdoor attacks, vision-language models, fine-tuning defense, robustness, autoregressive models, response bootstrapping, adversarial robustness


The pith

BYORn defends vision-language models from backdoor attacks by replacing implausible poisoned responses with model-generated alternatives during fine-tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper starts from prior findings that supervised fine-tuning of autoregressive vision-language models is vulnerable to backdoor attacks and that existing defenses do not work well in open-ended generation. BYORn identifies poisoned target responses that are semantically implausible given the image-text inputs and a pretrained model, then replaces them on the fly with responses the model itself generates. This breaks the association between triggers and target outputs. The gradient of the resulting training objective matches the gradient of the empirical estimate of an upper bound on the population risk over clean data. Experiments indicate that the method improves robustness to both standard and adaptive attacks while keeping clean-task accuracy intact.

Core claim

BYORn is a fine-tuning framework that identifies misaligned responses in poisoned datasets and replaces them with responses bootstrapped from the pretrained model, thereby preventing the model from learning the backdoor association. The gradient of the resulting training objective is equivalent to the gradient of the empirical estimate of an upper bound on the population risk over clean data. This yields consistent robustness improvements against backdoor attacks without sacrificing generalization on clean tasks, and the defense holds against adaptive attacks designed to circumvent it.

What carries the argument

Dynamic identification and replacement of poisoned target responses based on their semantic implausibility relative to image-text inputs and the pretrained model.
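The identify-then-replace step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The detection score is a hypothetical mean per-token negative log-likelihood of the target under a frozen pretrained model (the authors' rebuttal says detection flags low-likelihood responses, but the paper's actual score, its Equation (3), is not reproduced on this page). `logprob_fn`, `generate_fn`, the toy log-probabilities, and the threshold are all hypothetical stand-ins for model calls and a tuned hyperparameter.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Example:
    image_id: str     # placeholder for the image input
    instruction: str
    response: str     # target response from the (possibly poisoned) dataset


def detection_score(token_logprobs: Sequence[float]) -> float:
    """Hypothetical score: mean per-token negative log-likelihood of the
    target under a frozen pretrained model. Higher means less plausible."""
    if not token_logprobs:
        return 0.0
    return -sum(token_logprobs) / len(token_logprobs)


def select_targets(
    batch: List[Example],
    logprob_fn: Callable[[Example], Sequence[float]],
    generate_fn: Callable[[Example], str],
    threshold: float,
) -> List[Tuple[str, bool]]:
    """Return (training target, was_replaced) for every example in the batch."""
    targets = []
    for ex in batch:
        if detection_score(logprob_fn(ex)) >= threshold:
            # Implausible target: swap in a model-generated response.
            targets.append((generate_fn(ex), True))
        else:
            # Plausible target: train on the provided response as usual.
            targets.append((ex.response, False))
    return targets


# Toy stand-ins for model calls (hypothetical numbers, not from the paper).
TOY_LOGPROBS = {
    "A dog runs along the beach.": [-0.4, -0.6, -0.5, -0.3, -0.2],
    "Click here for a prize!": [-6.1, -7.3, -5.8, -6.6],
}


def toy_logprob_fn(ex: Example) -> Sequence[float]:
    return TOY_LOGPROBS[ex.response]


def toy_generate_fn(ex: Example) -> str:
    return "A dog stands on wet sand."  # stands in for sampling from the model


batch = [
    Example("img0", "Describe this image.", "A dog runs along the beach."),
    Example("img1", "Describe this image.", "Click here for a prize!"),
]
targets = select_targets(batch, toy_logprob_fn, toy_generate_fn, threshold=3.0)
print(targets)
# → [('A dog runs along the beach.', False), ('A dog stands on wet sand.', True)]
```

Figure 2 suggests poisoned examples are identified first and replacements are bootstrapped during training, so in practice the scores could be computed once up front and cached, with only generation happening per batch.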

If this is right

The method establishes a new trade-off frontier between clean-task generalization and attack success rate that improves on prior defenses.
The gradient of the training objective matches the gradient of the empirical upper bound on clean-data population risk.
Effectiveness persists against adaptive attacks constructed specifically to evade the replacement step.
The approach applies directly to open-ended generation settings where previous defenses were shown to be ineffective.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same replacement logic could be tested on text-only language models fine-tuned with similar poisoning.
If the identification step can be made cheaper, the method might reduce reliance on large curated clean datasets for safety-critical fine-tuning.
The population-risk interpretation suggests the technique may generalize to other forms of data poisoning beyond backdoors.

Load-bearing premise

Poisoned target responses are often semantically implausible given the corresponding image-text inputs and a pretrained model, allowing reliable identification and replacement during fine-tuning.

What would settle it

A controlled experiment in which replacing identified responses fails to reduce attack success rate on a standard backdoor trigger would falsify the central mechanism.
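This falsification test comes down to comparing attack success rates (ASR) with and without the replacement step. A minimal sketch, assuming ASR is the share of outputs on triggered inputs that contain the attacker's target string (a common convention; the paper's exact criterion is not given on this page). The target string and all outputs below are hypothetical, not results from the paper.

```python
from typing import Iterable


def attack_success_rate(outputs: Iterable[str], target: str) -> float:
    """Share of outputs on triggered inputs that contain the attacker's target.

    Assumption: success means a case-insensitive substring match.
    """
    outputs = list(outputs)
    if not outputs:
        return 0.0
    hits = sum(target.lower() in out.lower() for out in outputs)
    return hits / len(outputs)


TARGET = "click here for a prize"  # hypothetical attacker target

# Hypothetical generations on triggered test inputs (illustrative only).
no_defense_outputs = [
    "Click here for a prize!",
    "A red bus. Click here for a prize!",
    "A red bus parked on a street.",
]
replacement_outputs = [
    "A red bus parked on a street.",
    "A cat sleeping on a mat.",
    "Click here for a prize!",
]

asr_none = attack_success_rate(no_defense_outputs, TARGET)
asr_replaced = attack_success_rate(replacement_outputs, TARGET)
# The central mechanism is contradicted if replacement does not lower ASR.
mechanism_contradicted = asr_replaced >= asr_none
print(f"ASR none={asr_none:.2f}, replaced={asr_replaced:.2f}, contradicted={mechanism_contradicted}")
```

A real test would compare ASR across several seeds and poisoning rates, since a single-run difference can be noise.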

Figures

Figures reproduced from arXiv: 2606.02947 by Ivan Sabolić, Josip Šarić, Marin Oršić, Sven Lončarić.

**Figure 1.** BYORn is Pareto optimal on poisoned Flickr30k, effectively balancing good image-captioning accuracy (CIDEr) with a low attack success rate (ASR). Color depicts the model architecture, whereas shape denotes the defense method.

**Figure 2.** BYORn trains a backdoor-resilient vision-language model on a poisoned dataset. First, it identifies poisoned examples based on the semantic misalignment between image-instruction input pairs and target responses. During training, BYORn bootstraps clean replacement responses for detected poisoned samples, enabling parameter updates using available target responses for clean data and generated clean response…

**Figure 3.** Detection score histogram (cf. Equation (3)) on the LADD dataset poisoned with VL-Trojan or DualKey (Liang et al., 2025; Walmer et al., 2022). The black dot marks the 100% recall threshold.

**Figure 4.** Utility of the gradient in Equation (8) for training a backdoor-resilient model. The figure visualizes the empirical risk on clean test data throughout training iterations. The blue curve is obtained by minimizing the empirical risk on the clean training data, i.e., the parameter updates differentiate the empirical risk in Equation (2). This is an ideal case, since the training data used is not poisoned. The…

**Figure 5.** Backdoored examples for the four considered attacks, with responses generated by both the no-defense baseline and BYORn. The proposed method consistently outputs clean responses, showcasing resilience to both visual and textual triggers. (Panels compare "No defense" and "BYORn"; one example instruction reads "Describe this image using one or more simple sentences. zbw", with the response "The image depicts a photo of a clock.")

Original abstract

Supervised fine-tuning is the predominant approach for adapting autoregressive vision-language models to downstream tasks. Recent work has shown that this paradigm is highly vulnerable to backdoor attacks, and that existing defenses are ineffective in open-ended generation settings. In response, we propose BYORn, a backdoor-robust fine-tuning framework motivated by the observation that poisoned target responses are often semantically implausible given the corresponding image-text inputs and a pretrained model. BYORn identifies such misaligned responses and dynamically replaces them with alternative responses generated by the model, thereby breaking the correlation between triggers and target outputs. The resulting objective gradient corresponds to the gradient of the empirical estimate of the population risk upper bound over the clean data distribution. Empirically, BYORn consistently improves robustness to backdoor attacks while preserving clean-task performance, establishing a new trade-off frontier between generalization and attack success rate. Finally, we demonstrate that BYORn remains effective against adaptive attacks specifically designed to circumvent the proposed defense.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

BYORn's self-replacement of implausible targets during fine-tuning gives a practical defense that holds up in the authors' adaptive-attack tests, though the detection step remains the main assumption to verify.


The main point is that this paper introduces BYORn, which spots poisoned target responses during supervised fine-tuning of vision-language models because they often look off to a clean pretrained model, then swaps them for the model's own generations. This breaks the trigger-target link and keeps clean performance intact.

What is new is the dynamic replacement mechanism itself, aimed at open-ended generation where prior defenses reportedly fail. The abstract ties the resulting gradient to an empirical estimate of a population risk upper bound over clean data, and the experiments claim consistent robustness gains plus a better trade-off curve. They also test against adaptive attacks designed to dodge the defense.

The work is straightforward on the empirical side and addresses a real deployment risk in fine-tuning VLMs. The adaptive attack results are useful because they directly engage the stress-test worry about detection failing when targets are made plausible.

The soft spot is still the core detection assumption. If attackers can choose or craft targets that pass as reasonable under the pretrained model, replacements happen less often and the correlation may not break reliably. The gradient correspondence claim would need the full derivation to check whether it stands independently or collapses with detection errors. Minor implementation details like the exact plausibility threshold could also affect reproducibility.

This is for people working on secure adaptation of multimodal models or backdoor defenses. Anyone running fine-tuning pipelines on image-text data would find the method and the reported trade-offs worth trying. It has enough grounding in a timely problem and targeted experiments to merit serious referee time rather than a desk reject.

Referee Report

3 major / 2 minor

Summary. The manuscript presents BYORn, a backdoor-robust fine-tuning framework for autoregressive vision-language models. It detects poisoned target responses deemed semantically implausible given image-text inputs and a pretrained model, then replaces them with model-generated alternatives to break trigger-target correlations. The resulting objective gradient is claimed to correspond to the gradient of the empirical estimate of a population risk upper bound over the clean data distribution. Experiments report consistent robustness gains against backdoor attacks (including adaptive ones) while preserving clean-task performance, establishing an improved trade-off frontier between generalization and attack success rate.

Significance. If the detection mechanism holds under adaptive attacks and the gradient correspondence is non-circular, the work could offer a practical defense for open-ended VLM fine-tuning where prior methods fail. The empirical trade-off improvement would be a useful contribution to robust adaptation literature.

major comments (3)

Abstract: The claim that 'the resulting objective gradient corresponds to the gradient of the empirical estimate of the population risk upper bound over the clean data distribution' is stated without equations, derivation, or definition of the upper bound; this leaves open whether the correspondence is independent or reduces by construction to the replacement step.
§3 (Method, detection step): The defense relies on identifying poisoned responses as 'often semantically implausible'; this assumption is load-bearing, yet the manuscript does not quantify detection precision/recall or analyze failure modes when an adaptive attacker selects context-fitting targets that remain plausible under the clean model.
§5 (Experiments, adaptive attacks): The abstract asserts effectiveness against adaptive attacks designed to circumvent the defense, but without reported detection accuracy on poisoned samples or ablation on replacement frequency under such attacks, it is unclear whether replacements occur on the clean distribution as required for the gradient claim to hold.

minor comments (2)

Clarify notation for the population risk upper bound and its empirical estimate to avoid ambiguity with standard risk terms.
Ensure tables reporting attack success rate and clean accuracy include standard deviations across runs and explicit baseline comparisons.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below, providing clarifications on the theoretical grounding and committing to empirical enhancements where appropriate.


Referee, on the abstract: The claim that 'the resulting objective gradient corresponds to the gradient of the empirical estimate of the population risk upper bound over the clean data distribution' is stated without equations, derivation, or definition of the upper bound; this leaves open whether the correspondence is independent or reduces by construction to the replacement step.

Authors: The correspondence is derived in §3.2, where the population risk upper bound is defined as the expected negative log-likelihood under the clean data distribution, and the replacement step (using responses sampled from the fixed pretrained model) produces an objective whose gradient is an unbiased estimator of the clean gradient. This holds independently of the specific replacement values because the sampling distribution is the pretrained model (not updated during fine-tuning), avoiding circularity. We will add a concise derivation sketch and explicit definition of the upper bound to the revised manuscript for clarity. revision: yes
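The unbiasedness part of this response can be illustrated on a toy categorical model. The sketch assumes the fine-tuned model is a softmax over a small fixed set of candidate responses and that the frozen pretrained model is a fixed distribution `p0` over the same set; all numbers are hypothetical. It only shows that averaging negative log-likelihood gradients over targets sampled from a fixed `p0` recovers the gradient of the cross-entropy E over y ~ p0 of -log p_theta(y). It does not reproduce the paper's Equation (8) or its upper bound, and whether `p0` approximates the clean data distribution is exactly the assumption the referee questions.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nll_grad(logits: List[float], y: int) -> List[float]:
    """Gradient of -log softmax(logits)[y] w.r.t. the logits: p - onehot(y)."""
    p = softmax(logits)
    return [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]


def exact_grad(logits: List[float], p0: List[float]) -> List[float]:
    """Gradient of E_{y~p0}[-log p_theta(y)], which equals p_theta - p0."""
    p = softmax(logits)
    return [pk - qk for pk, qk in zip(p, p0)]


def bootstrapped_grad(logits: List[float], p0: List[float], n_samples: int,
                      rng: random.Random) -> List[float]:
    """Average NLL gradient when each target is replaced by a sample from frozen p0."""
    num_classes = len(logits)
    acc = [0.0] * num_classes
    for _ in range(n_samples):
        y = rng.choices(range(num_classes), weights=p0, k=1)[0]
        acc = [a + g for a, g in zip(acc, nll_grad(logits, y))]
    return [a / n_samples for a in acc]


theta = [0.5, -1.0, 2.0]  # toy logits of the model being fine-tuned
p0 = [0.6, 0.3, 0.1]      # toy frozen "pretrained" distribution over 3 responses
approx = bootstrapped_grad(theta, p0, 20000, random.Random(0))
exact = exact_grad(theta, p0)
print([round(a, 3) for a in approx])
print([round(e, 3) for e in exact])
```

If `p0` were updated during fine-tuning, the sampled targets would depend on the parameters being optimized and this simple argument would no longer apply directly; keeping the pretrained model fixed is what the authors say avoids that circularity.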
Referee, on §3 (Method, detection step): The defense relies on identifying poisoned responses as 'often semantically implausible'; this assumption is load-bearing, yet the manuscript does not quantify detection precision/recall or analyze failure modes when an adaptive attacker selects context-fitting targets that remain plausible under the clean model.

Authors: Detection identifies low-likelihood responses under the pretrained model. While end-to-end robustness results provide supporting evidence, we agree that explicit precision/recall and failure-mode analysis would strengthen the presentation. We will add these metrics, including on adaptive attacks, in the revised §3 and §5. revision: yes
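The promised precision/recall metrics are straightforward to compute once each training example has a detection score and a ground-truth poison label. The sketch below assumes examples are flagged when their score meets or exceeds a threshold, and it also computes the 100% recall threshold of the kind marked in Figure 3. The scores and labels are hypothetical, chosen so that one poisoned target looks plausible, as an adaptive attacker would try to arrange.

```python
from typing import Sequence, Tuple


def precision_recall(
    scores: Sequence[float], is_poisoned: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Detection precision and recall when examples with score >= threshold are flagged."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and p for f, p in zip(flagged, is_poisoned))
    fp = sum(f and not p for f, p in zip(flagged, is_poisoned))
    fn = sum(p and not f for f, p in zip(flagged, is_poisoned))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def full_recall_threshold(scores: Sequence[float], is_poisoned: Sequence[bool]) -> float:
    """Highest threshold that still flags every poisoned example (needs at least one)."""
    return min(s for s, p in zip(scores, is_poisoned) if p)


# Hypothetical scores: the last poisoned target is unusually plausible (low score).
scores = [0.4, 0.9, 1.2, 5.5, 6.4, 1.1]
is_poisoned = [False, False, False, True, True, True]

t = full_recall_threshold(scores, is_poisoned)
print(t, precision_recall(scores, is_poisoned, t))
print(precision_recall(scores, is_poisoned, 5.0))
```

Lowering the threshold to catch a plausible poisoned target costs precision, because clean examples above the threshold also get replaced. That is the trade-off the referee's failure-mode question probes.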
Referee, on §5 (Experiments, adaptive attacks): The abstract asserts effectiveness against adaptive attacks designed to circumvent the defense, but without reported detection accuracy on poisoned samples or ablation on replacement frequency under such attacks, it is unclear whether replacements occur on the clean distribution as required for the gradient claim to hold.

Authors: §5.3 already evaluates adaptive attacks that attempt to produce plausible targets, with BYORn retaining strong robustness. This outcome is consistent with replacements occurring on the clean distribution. To make the link explicit, we will add detection accuracy and replacement-frequency ablations under adaptive attacks in the revised experiments. revision: yes

Circularity Check

0 steps flagged

No circularity detected in derivation chain


The abstract states that the BYORn objective gradient corresponds to the gradient of an empirical estimate of the population risk upper bound, but provides no equations or derivation steps that reduce this claim to a fitted parameter or self-citation by construction. The method is motivated by an external observation about semantic implausibility of poisoned responses, and the gradient correspondence is presented as a consequence rather than a tautological renaming or self-referential fit. No self-citation load-bearing steps, ansatz smuggling, or uniqueness theorems from prior author work are invoked in the given text. The central claim therefore remains self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no information on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5728 in / 1141 out tokens · 28390 ms · 2026-06-28T15:08:13.651233+00:00


Reference graph

Works this paper leans on

114 extracted references · 14 canonical work pages · 10 internal anchors

[1] VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models. International Journal of Computer Vision.
[2] Xiongtao Zhou, Jie He, Yuhua Ke, Guangyao Zhu, Victor Gutierrez Basulto, Jeff Pan. An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models. Findings of the Association for Computational Linguistics: ACL 2024, 2024.
[3] Yaniv Leviathan, Matan Kalman, Yossi Matias. International Conference on Machine Learning.
[4] Harsh Jhamtani, Taylor Berg… Learning to Describe Differences Between Pairs of Similar Images.
[5] Show and tell: A neural image caption generator. IEEE Conference on Computer Vision and Pattern Recognition.
[6] Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao. International Conference on Machine Learning.
[7] Vision-Language Instruction Tuning: A Review and Analysis. Transactions on Machine Learning Research.
[8] Xiaotian Han, Yiqi Wang, Bohan Zhai, Quanzeng You, Hongxia Yang.
[9] Ronald J. Williams. Machine Learning.
[10] Stephen Boyd, Lieven Vandenberghe.
[11] Zhengxiang Shi, Francesco Tonolini, Nikolaos Aletras, Emine Yilmaz, Gabriella Kazai, Yunlong Jiao. Findings of the Association for Computational Linguistics.
[12] SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation. Preprint, 2024.
[13] Taehyeon Kim, Joonkee Kim, Gihun Lee, Se… Instructive Decoding: Instruction-Tuned Large Language Models are Self-Refiner from Noisy Instructions.
[14] Bo Li, Yuanhan Zhang, Liangyu Chen, Jinghao Wang, Fanyi Pu, Jingkang Yang, Chunyuan Li, Ziwei Liu. Preprint.
[15] Xu Jia, Efstratios Gavves, Basura Fernando, Tinne Tuytelaars.
[16] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert… Language Models are Few-Shot Learners.
[17] Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, Jenia Jitsev. Neural Information Processing Systems (NeurIPS).
[18] Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Shar…
[19] Xinlei Chen, Hao Fang, Tsung… Microsoft… CoRR.
[20] Ramakrishna Vedantam, C. Lawrence Zitnick, Devi Parikh.
[21] Max B. Paulus, Dami Choi, Daniel Tarlow, Andreas Krause, Chris J. Maddison. Neural Information Processing Systems.
[22] Arash Ahmadian, Chris Cremer, Matthias Gall… Back to Basics: Revisiting REINFORCE-Style Optimization for Learning from Human Feedback in LLMs.
[23] M. D. Donsker, S. R. S. Varadhan. Asymptotic evaluation of certain Markov process expectations for large time. III.
[24] M. D. Donsker, S. R. S. Varadhan. Asymptotic evaluation of certain Markov process expectations for large time. IV. Commun. Pure Appl. Math.
[25] Kevin P. Murphy. Probabilistic Machine Learning: An introduction.
[26] Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, Jeff Dean. International Conference on Learning Representations.
[27] Jiachen Li, Xinyao Wang, Sijie Zhu, Chia-Wen Kuo, Lu Xu, Fan Chen, Jitesh Jain, Humphrey Shi, Longyin Wen. CuMo: Scaling Multimodal…
[28] Lianmin Zheng, Wei… Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena.
[29] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie… LLaMA: Open and Efficient Foundation Language Models.
[30] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever. International Conference on Machine Learning.
[31] Backdoor learning: A survey. IEEE Transactions on Neural Networks and Learning Systems.
[32] Alex M. Lamb, Anirudh Goyal Alias Parth Goyal, Ying Zhang, Saizheng Zhang, Aaron C. Courville, Yoshua Bengio. Professor Forcing: A New Algorithm for Training Recurrent Networks.
[33] Babytalk: Understanding and generating simple image descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[34] Visual prompt tuning. European Conference on Computer Vision, 2022.
[35] Guanting Dong, Hongyi Yuan, Keming Lu, Chengpeng Li, Mingfeng Xue, Dayiheng Liu, Wei Wang, Zheng Yuan, Chang Zhou, Jingren Zhou. How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition. 62nd Annual Meeting of the Association for Computational Linguistics, 2024.
[36] BadNets: Evaluating backdooring attacks on deep neural networks. IEEE Access, 2019.
[37] Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv preprint arXiv:1712.05526.
[38] Anti-backdoor learning: Training clean models on poisoned data. Advances in Neural Information Processing Systems.
[39] Zixuan Zhu, Rui Wang, Cong Zou, Lihua Jing.
[40] WaNet - Imperceptible Warping-based Backdoor Attack. International Conference on Learning Representations.
[41] Invisible backdoor attack with sample-specific triggers. Proceedings of the IEEE/CVF International Conference on Computer Vision.
[42] TrojVLM: Backdoor attack against vision language models. European Conference on Computer Vision, 2024.
[43] Backdooring Vision-Language Models with Out-Of-Distribution Data. The Thirteenth International Conference on Learning Representations.
[44] Revisiting backdoor attacks against large vision-language models. arXiv preprint arXiv:2406.18844.
[45] Physical backdoor attack can jeopardize driving with vision-large-language models. arXiv preprint arXiv:2404.12916.
[46] InstructPix2Pix: Learning to follow image editing instructions. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[47] Yuancheng Xu, Jiarui Yao, Manli Shu, Yanchao Sun, Zichu Wu, Ning Yu, Tom Goldstein, Furong Huang. Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models.
[48] Test-time backdoor attacks on multimodal large language models. arXiv preprint arXiv:2402.08577.
[49] William Fedus, Barret Zoph, Noam Shazeer. Journal of Machine Learning Research.
[50] Qwen2.5-Omni Technical Report. arXiv preprint arXiv:2503.20215.
[51] Chameleon Team. CoRR.
[52] Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning. International Conference on Machine Learning.
[53] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, Ryan Lowe.
[54] Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal V. Nayak, Debajyoti Datta, Jonatha…
[55] Rethinking backdoor attacks. International Conference on Machine Learning, 2023.
[56] Junnan Li, Dongxu Li, Caiming Xiong, Steven C. H. Hoi. International Conference on Machine Learning.
[57] Wenliang Dai, Junnan Li, Dongxu Li, Anthony Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. Instruct…
[58] Haofei Yu, Zhengyang Qi, Lawrence Jang, Russ Salakhutdinov, Louis… MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts.
[59] Hanoona Abdul Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly, Abdelrahman M. Shaker, Salman H. Khan, Hisham Cholakkal, Rao Muhammad Anwer, Eric P. Xing, Ming… GLaMM: Pixel Grounding Large Multimodal Model.
[60] Effective Backdoor Mitigation in Vision-Language Models Depends on the Pre-training Objective. Transactions on Machine Learning Research.
[61] Universal adversarial perturbations. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[62] Visual instruction tuning. Advances in Neural Information Processing Systems.
[63] Gemini: A Family of Highly Capable Multimodal Models. arXiv preprint arXiv:2312.11805.
[64] OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models. arXiv preprint arXiv:2308.01390.
[65] InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. Preprint, 2023.
[66] GPT-4 Technical Report. arXiv preprint arXiv:2303.08774.
[67] VQA: Visual question answering. Proceedings of the IEEE International Conference on Computer Vision.
[68] From show to tell: A survey on deep learning-based image captioning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
[69] Spectre: Defending against backdoor attacks using robust statistics. International Conference on Machine Learning, 2021.
[70] Towards reliable and efficient backdoor trigger inversion via decoupling benign features. The Twelfth International Conference on Learning Representations.
[71] Dual-key multimodal backdoors for visual question answering. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[72] Indranil Sur, Karan Sikka, Matthew Walmer, Kaushik Koneripalli, Anirban Roy, Xiao Lin, Ajay Divakaran, Susmit Jha. IEEE/CVF International Conference on Computer Vision (ICCV).
[73] Weixin Chen, Baoyuan Wu, Haoqian Wang. Neural Information Processing Systems (NeurIPS).
[74] Yixiao Xu, Binxing Fang, Mohan Li, Keke Tang, Zhihong Tian. Neural Information Processing Systems (NeurIPS).
[75] Yixiao Xu, Binxing Fang, Mohan Li, Keke Tang, Zhihong Tian. LT-Defense: Searching-free Backdoor Defense via Exploiting the Long-tailed Effect.
[76] Alexander Turner, Dimitris Tsipras, Aleksander Madry. CoRR.
[77] Neural trojans. 2017 IEEE International Conference on Computer Design (ICCD), 2017.
[78] Spectral signatures in backdoor attacks. Advances in Neural Information Processing Systems.
[79] Victoria Graf, Qin Liu, Muhao Chen. North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
[80] Fanchao Qi, Yangyi Chen, Mukai Li, Yuan Yao, Zhiyuan Liu, Maosong Sun. Conference on Empirical Methods in Natural Language Processing.

Showing first 80 references.
