BYORn: Bootstrap Your Own Responses to Defend Large Vision-Language Models Against Backdoor Attacks
Pith reviewed 2026-06-28 15:08 UTC · model grok-4.3
The pith
BYORn defends vision-language models from backdoor attacks by replacing implausible poisoned responses with model-generated alternatives during fine-tuning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BYORn is a fine-tuning framework that identifies misaligned responses in poisoned datasets and replaces them with responses bootstrapped from the pretrained model, thereby preventing the model from learning the backdoor association. The resulting training objective has a gradient equivalent to the empirical estimate of the population risk upper bound on clean data. This leads to consistent robustness improvements against backdoor attacks without sacrificing generalization on clean tasks, and the defense holds against adaptive attacks designed to circumvent it.
What carries the argument
Dynamic identification and replacement of poisoned target responses based on their semantic implausibility relative to image-text inputs and the pretrained model.
If this is right
- The method establishes a new trade-off curve between clean-task generalization and attack success rate that is better than prior defenses.
- The gradient of the training objective matches the gradient of the empirical upper bound on clean-data population risk.
- Effectiveness persists against adaptive attacks constructed specifically to evade the replacement step.
- The approach applies directly to open-ended generation settings where previous defenses were shown to be ineffective.
Where Pith is reading between the lines
- The same replacement logic could be tested on text-only language models fine-tuned with similar poisoning.
- If the identification step can be made cheaper, the method might reduce reliance on large curated clean datasets for safety-critical fine-tuning.
- The population-risk interpretation suggests the technique may generalize to other forms of data poisoning beyond backdoors.
Load-bearing premise
Poisoned target responses are often semantically implausible given the corresponding image-text inputs and a pretrained model, allowing reliable identification and replacement during fine-tuning.
What would settle it
A controlled experiment in which replacing identified responses fails to reduce attack success rate on a standard backdoor trigger would falsify the central mechanism.
Figures
read the original abstract
Supervised fine-tuning is the predominant approach for adapting autoregressive vision-language models to downstream tasks. Recent work has shown that this paradigm is highly vulnerable to backdoor attacks, and that existing defenses are ineffective in open-ended generation settings. In response, we propose BYORn, a backdoor-robust fine-tuning framework motivated by the observation that poisoned target responses are often semantically implausible given the corresponding image-text inputs and a pretrained model. BYORn identifies such misaligned responses and dynamically replaces them with alternative responses generated by the model, thereby breaking the correlation between triggers and target outputs. The resulting objective gradient corresponds to the gradient of the empirical estimate of the population risk upper bound over the clean data distribution. Empirically, BYORn consistently improves robustness to backdoor attacks while preserving clean-task performance, establishing a new trade-off frontier between generalization and attack success rate. Finally, we demonstrate that BYORn remains effective against adaptive attacks specifically designed to circumvent the proposed defense.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents BYORn, a backdoor-robust fine-tuning framework for autoregressive vision-language models. It detects poisoned target responses deemed semantically implausible given image-text inputs and a pretrained model, then replaces them with model-generated alternatives to break trigger-target correlations. The resulting objective gradient is claimed to correspond to the gradient of the empirical estimate of a population risk upper bound over the clean data distribution. Experiments report consistent robustness gains against backdoor attacks (including adaptive ones) while preserving clean-task performance, establishing an improved trade-off frontier between generalization and attack success rate.
Significance. If the detection mechanism holds under adaptive attacks and the gradient correspondence is non-circular, the work could offer a practical defense for open-ended VLM fine-tuning where prior methods fail. The empirical trade-off improvement would be a useful contribution to robust adaptation literature.
major comments (3)
- [Abstract] Abstract: The claim that 'the resulting objective gradient corresponds to the gradient of the empirical estimate of the population risk upper bound over the clean data distribution' is stated without equations, derivation, or definition of the upper bound; this leaves open whether the correspondence is independent or reduces by construction to the replacement step.
- [§3] §3 (Method, detection step): The defense relies on identifying poisoned responses as 'often semantically implausible'; this assumption is load-bearing, yet the manuscript does not quantify detection precision/recall or analyze failure modes when an adaptive attacker selects context-fitting targets that remain plausible under the clean model.
- [§5] §5 (Experiments, adaptive attacks): The abstract asserts effectiveness against adaptive attacks designed to circumvent the defense, but without reported detection accuracy on poisoned samples or ablation on replacement frequency under such attacks, it is unclear whether replacements occur on the clean distribution as required for the gradient claim to hold.
minor comments (2)
- Clarify notation for the population risk upper bound and its empirical estimate to avoid ambiguity with standard risk terms.
- Ensure tables reporting attack success rate and clean accuracy include standard deviations across runs and explicit baseline comparisons.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment below, providing clarifications on the theoretical grounding and committing to empirical enhancements where appropriate.
read point-by-point responses
-
Referee: [Abstract] Abstract: The claim that 'the resulting objective gradient corresponds to the gradient of the empirical estimate of the population risk upper bound over the clean data distribution' is stated without equations, derivation, or definition of the upper bound; this leaves open whether the correspondence is independent or reduces by construction to the replacement step.
Authors: The correspondence is derived in §3.2, where the population risk upper bound is defined as the expected negative log-likelihood under the clean data distribution, and the replacement step (using responses sampled from the fixed pretrained model) produces an objective whose gradient is an unbiased estimator of the clean gradient. This holds independently of the specific replacement values because the sampling distribution is the pretrained model (not updated during fine-tuning), avoiding circularity. We will add a concise derivation sketch and explicit definition of the upper bound to the revised manuscript for clarity. revision: yes
-
Referee: [§3] §3 (Method, detection step): The defense relies on identifying poisoned responses as 'often semantically implausible'; this assumption is load-bearing, yet the manuscript does not quantify detection precision/recall or analyze failure modes when an adaptive attacker selects context-fitting targets that remain plausible under the clean model.
Authors: Detection identifies low-likelihood responses under the pretrained model. While end-to-end robustness results provide supporting evidence, we agree that explicit precision/recall and failure-mode analysis would strengthen the presentation. We will add these metrics, including on adaptive attacks, in the revised §3 and §5. revision: yes
-
Referee: [§5] §5 (Experiments, adaptive attacks): The abstract asserts effectiveness against adaptive attacks designed to circumvent the defense, but without reported detection accuracy on poisoned samples or ablation on replacement frequency under such attacks, it is unclear whether replacements occur on the clean distribution as required for the gradient claim to hold.
Authors: §5.3 already evaluates adaptive attacks that attempt to produce plausible targets, with BYORn retaining strong robustness. This outcome is consistent with replacements occurring on the clean distribution. To make the link explicit, we will add detection accuracy and replacement-frequency ablations under adaptive attacks in the revised experiments. revision: yes
Circularity Check
No circularity detected in derivation chain
full rationale
The abstract states that the BYORn objective gradient corresponds to the gradient of an empirical estimate of the population risk upper bound, but provides no equations or derivation steps that reduce this claim to a fitted parameter or self-citation by construction. The method is motivated by an external observation about semantic implausibility of poisoned responses, and the gradient correspondence is presented as a consequence rather than a tautological renaming or self-referential fit. No self-citation load-bearing steps, ansatz smuggling, or uniqueness theorems from prior author work are invoked in the given text. The central claim therefore remains self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
International Journal of Computer Vision , author =
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models , DOI =. International Journal of Computer Vision , author =
-
[2]
An Empirical Study on Parameter-Efficient Fine-Tuning for M ulti M odal Large Language Models
Zhou, Xiongtao and He, Jie and Ke, Yuhua and Zhu, Guangyao and Gutierrez Basulto, Victor and Pan, Jeff. An Empirical Study on Parameter-Efficient Fine-Tuning for M ulti M odal Large Language Models. Findings of the Association for Computational Linguistics: ACL 2024. 2024
2024
-
[3]
International Conference on Machine Learning,
Yaniv Leviathan and Matan Kalman and Yossi Matias , title =. International Conference on Machine Learning,
-
[4]
Learning to Describe Differences Between Pairs of Similar Images , booktitle =
Harsh Jhamtani and Taylor Berg. Learning to Describe Differences Between Pairs of Similar Images , booktitle =
-
[5]
IEEE Conference on Computer Vision and Pattern Recognition , year=
Show and tell: A neural image caption generator , author=. IEEE Conference on Computer Vision and Pattern Recognition , year=
-
[6]
Lee and Deming Chen and Tri Dao , title =
Tianle Cai and Yuhong Li and Zhengyang Geng and Hongwu Peng and Jason D. Lee and Deming Chen and Tri Dao , title =. International Conference on Machine Learning,
-
[7]
Transactions on Machine Learning Research , year=
Vision-Language Instruction Tuning: A Review and Analysis , author=. Transactions on Machine Learning Research , year=
-
[8]
Xiaotian Han and Yiqi Wang and Bohan Zhai and Quanzeng You and Hongxia Yang , title =
-
[9]
Williams , title =
Ronald J. Williams , title =. Machine Learning , year =
-
[10]
Boyd, Stephen and Vandenberghe, Lieven , isbn =
-
[11]
Findings of the Association for Computational Linguistics , publisher =
Zhengxiang Shi and Francesco Tonolini and Nikolaos Aletras and Emine Yilmaz and Gabriella Kazai and Yunlong Jiao , title =. Findings of the Association for Computational Linguistics , publisher =
-
[12]
2024 , eprint=
SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation , author=. 2024 , eprint=
2024
-
[13]
Instructive Decoding: Instruction-Tuned Large Language Models are Self-Refiner from Noisy Instructions , booktitle =
Taehyeon Kim and Joonkee Kim and Gihun Lee and Se. Instructive Decoding: Instruction-Tuned Large Language Models are Self-Refiner from Noisy Instructions , booktitle =
-
[14]
preprint , year =
Bo Li and Yuanhan Zhang and Liangyu Chen and Jinghao Wang and Fanyi Pu and Jingkang Yang and Chunyuan Li and Ziwei Liu , title =. preprint , year =
-
[15]
Xu Jia and Efstratios Gavves and Basura Fernando and Tinne Tuytelaars , title =
-
[16]
Tom B. Brown and Benjamin Mann and Nick Ryder and Melanie Subbiah and Jared Kaplan and Prafulla Dhariwal and Arvind Neelakantan and Pranav Shyam and Girish Sastry and Amanda Askell and Sandhini Agarwal and Ariel Herbert. Language Models are Few-Shot Learners , booktitle =
-
[17]
Neural Information Processing Systems, NeurIPS , year =
Christoph Schuhmann and Romain Beaumont and Richard Vencu and Cade Gordon and Ross Wightman and Mehdi Cherti and Theo Coombes and Aarush Katta and Clayton Mullis and Mitchell Wortsman and Patrick Schramowski and Srivatsa Kundurthy and Katherine Crowson and Ludwig Schmidt and Robert Kaczmarczyk and Jenia Jitsev , title =. Neural Information Processing Syst...
-
[18]
Zhao and Yanping Huang and Andrew M
Hyung Won Chung and Le Hou and Shayne Longpre and Barret Zoph and Yi Tay and William Fedus and Yunxuan Li and Xuezhi Wang and Mostafa Dehghani and Siddhartha Brahma and Albert Webson and Shixiang Shane Gu and Zhuyun Dai and Mirac Suzgun and Xinyun Chen and Aakanksha Chowdhery and Alex Castroros and Marie Pellat and Kevin Robinson and Dasha Valter and Shar...
-
[19]
Microsoft
Xinlei Chen and Hao Fang and Tsung. Microsoft. CoRR , volume =
-
[20]
Lawrence Zitnick and Devi Parikh , title =
Ramakrishna Vedantam and C. Lawrence Zitnick and Devi Parikh , title =
-
[21]
Paulus and Dami Choi and Daniel Tarlow and Andreas Krause and Chris J
Max B. Paulus and Dami Choi and Daniel Tarlow and Andreas Krause and Chris J. Maddison , title =. Neural Information Processing Systems , year =
-
[22]
Back to Basics: Revisiting REINFORCE-Style Optimization for Learning from Human Feedback in LLMs , booktitle =
Arash Ahmadian and Chris Cremer and Matthias Gall. Back to Basics: Revisiting REINFORCE-Style Optimization for Learning from Human Feedback in LLMs , booktitle =
-
[23]
Donsker, M. D. and Varadhan, S. R. S. , year =. Asymptotic evaluation of certain Markov process expectations for large time—III , journal =
-
[24]
Asymptotic evaluation of certain markov process expectations for large time
Donsker, M D and Varadhan, S R S. Asymptotic evaluation of certain markov process expectations for large time. IV. Commun. Pure Appl. Math
-
[25]
Kevin P. Murphy. Probabilistic Machine Learning: An introduction
-
[26]
Le and Geoffrey E
Noam Shazeer and Azalia Mirhoseini and Krzysztof Maziarz and Andy Davis and Quoc V. Le and Geoffrey E. Hinton and Jeff Dean , title =. International Conference on Learning Representations
-
[27]
CuMo: Scaling Multimodal
Jiachen Li and Xinyao Wang and Sijie Zhu and Chia-Wen Kuo and Lu XU and Fan Chen and Jitesh Jain and Humphrey Shi and Longyin Wen , booktitle=. CuMo: Scaling Multimodal
-
[28]
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena , booktitle =
Lianmin Zheng and Wei. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena , booktitle =
-
[29]
LLaMA: Open and Efficient Foundation Language Models , journal =
Hugo Touvron and Thibaut Lavril and Gautier Izacard and Xavier Martinet and Marie. LLaMA: Open and Efficient Foundation Language Models , journal =
-
[30]
International Conference on Machine Learning,
Alec Radford and Jong Wook Kim and Chris Hallacy and Aditya Ramesh and Gabriel Goh and Sandhini Agarwal and Girish Sastry and Amanda Askell and Pamela Mishkin and Jack Clark and Gretchen Krueger and Ilya Sutskever , title =. International Conference on Machine Learning,
-
[31]
IEEE Transactions on Neural Networks and Learning Systems , year=
Backdoor learning: A survey , author=. IEEE Transactions on Neural Networks and Learning Systems , year=
-
[32]
Professor Forcing: A New Algorithm for Training Recurrent Networks , year =
Lamb, Alex M and ALIAS PARTH GOYAL, Anirudh Goyal and Zhang, Ying and Zhang, Saizheng and Courville, Aaron C and Bengio, Yoshua , booktitle =. Professor Forcing: A New Algorithm for Training Recurrent Networks , year =
-
[33]
IEEE Transactions on Pattern Analysis and Machine Intelligence , year=
Babytalk: Understanding and generating simple image descriptions , author=. IEEE Transactions on Pattern Analysis and Machine Intelligence , year=
-
[34]
European conference on computer vision , pages=
Visual prompt tuning , author=. European conference on computer vision , pages=. 2022 , organization=
2022
-
[35]
How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition
Dong, Guanting and Yuan, Hongyi and Lu, Keming and Li, Chengpeng and Xue, Mingfeng and Liu, Dayiheng and Wang, Wei and Yuan, Zheng and Zhou, Chang and Zhou, Jingren. How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition. 62nd Annual Meeting of the Association for Computational Linguistics. 2024
2024
-
[36]
IEEE Access , volume=
Badnets: Evaluating backdooring attacks on deep neural networks , author=. IEEE Access , volume=. 2019 , publisher=
2019
-
[37]
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning
Targeted backdoor attacks on deep learning systems using data poisoning , author=. arXiv preprint arXiv:1712.05526 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[38]
Advances in Neural Information Processing Systems , volume=
Anti-backdoor learning: Training clean models on poisoned data , author=. Advances in Neural Information Processing Systems , volume=
-
[39]
Zixuan Zhu and Rui Wang and Cong Zou and Lihua Jing , title =
-
[40]
International Conference on Learning Representations , year=
WaNet - Imperceptible Warping-based Backdoor Attack , author=. International Conference on Learning Representations , year=
-
[41]
Proceedings of the IEEE/CVF international conference on computer vision , pages=
Invisible backdoor attack with sample-specific triggers , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=
-
[42]
European Conference on Computer Vision , pages=
Trojvlm: Backdoor attack against vision language models , author=. European Conference on Computer Vision , pages=. 2024 , organization=
2024
-
[43]
The Thirteenth International Conference on Learning Representations , year=
Backdooring Vision-Language Models with Out-Of-Distribution Data , author=. The Thirteenth International Conference on Learning Representations , year=
-
[44]
arXiv preprint arXiv:2406.18844 , year=
Revisiting backdoor attacks against large vision-language models , author=. arXiv preprint arXiv:2406.18844 , year=
-
[45]
arXiv preprint arXiv:2404.12916 , year=
Physical backdoor attack can jeopardize driving with vision-large-language models , author=. arXiv preprint arXiv:2404.12916 , year=
-
[46]
Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=
Instructpix2pix: Learning to follow image editing instructions , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=
-
[47]
Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models , year =
Xu, Yuancheng and Yao, Jiarui and Shu, Manli and Sun, Yanchao and Wu, Zichu and Yu, Ning and Goldstein, Tom and Huang, Furong , booktitle =. Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models , year =
-
[48]
arXiv preprint arXiv:2402.08577 , year=
Test-time backdoor attacks on multimodal large language models , author=. arXiv preprint arXiv:2402.08577 , year=
-
[49]
Journal of Machine Learning Research , year =
William Fedus and Barret Zoph and Noam Shazeer , title =. Journal of Machine Learning Research , year =
-
[50]
Qwen2. 5-omni technical report , author=. arXiv preprint arXiv:2503.20215 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[51]
CoRR , volume =
Chameleon Team , title =. CoRR , volume =
-
[52]
International Conference on Machine Learning , year =
Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning , author =. International Conference on Machine Learning , year =
-
[53]
Long Ouyang and Jeffrey Wu and Xu Jiang and Diogo Almeida and Carroll L. Wainwright and Pamela Mishkin and Chong Zhang and Sandhini Agarwal and Katarina Slama and Alex Ray and John Schulman and Jacob Hilton and Fraser Kelton and Luke Miller and Maddie Simens and Amanda Askell and Peter Welinder and Paul F. Christiano and Jan Leike and Ryan Lowe , title =....
-
[54]
Victor Sanh and Albert Webson and Colin Raffel and Stephen H. Bach and Lintang Sutawika and Zaid Alyafeai and Antoine Chaffin and Arnaud Stiegler and Arun Raja and Manan Dey and M Saiful Bari and Canwen Xu and Urmish Thakker and Shanya Sharma Sharma and Eliza Szczechla and Taewoon Kim and Gunjan Chhablani and Nihal V. Nayak and Debajyoti Datta and Jonatha...
-
[55]
International Conference on Machine Learning , pages=
Rethinking backdoor attacks , author=. International Conference on Machine Learning , pages=. 2023 , organization=
2023
-
[56]
Junnan Li and Dongxu Li and Caiming Xiong and Steven C. H. Hoi , title =. International Conference on Machine Learning,
-
[57]
Instruct
Wenliang Dai and Junnan Li and Dongxu Li and Anthony Tiong and Junqi Zhao and Weisheng Wang and Boyang Li and Pascale Fung and Steven Hoi , booktitle=. Instruct
-
[58]
MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts , booktitle =
Haofei Yu and Zhengyang Qi and Lawrence Jang and Russ Salakhutdinov and Louis. MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts , booktitle =
-
[59]
Shaker and Salman H
Hanoona Abdul Rasheed and Muhammad Maaz and Sahal Shaji Mullappilly and Abdelrahman M. Shaker and Salman H. Khan and Hisham Cholakkal and Rao Muhammad Anwer and Eric P. Xing and Ming. GLaMM: Pixel Grounding Large Multimodal Model , booktitle =
-
[60]
Transactions on Machine Learning Research , year=
Effective Backdoor Mitigation in Vision-Language Models Depends on the Pre-training Objective , author=. Transactions on Machine Learning Research , year=
-
[61]
Proceedings of the IEEE conference on computer vision and pattern recognition , pages=
Universal adversarial perturbations , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=
-
[62]
Advances in neural information processing systems , volume=
Visual instruction tuning , author=. Advances in neural information processing systems , volume=
-
[63]
Gemini: A Family of Highly Capable Multimodal Models
Gemini: a family of highly capable multimodal models , author=. arXiv preprint arXiv:2312.11805 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[64]
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
Openflamingo: An open-source framework for training large autoregressive vision-language models , author=. arXiv preprint arXiv:2308.01390 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[65]
2023 , eprint=
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning , author=. 2023 , eprint=
2023
-
[66]
Gpt-4 technical report , author=. arXiv preprint arXiv:2303.08774 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[67]
Proceedings of the IEEE international conference on computer vision , pages=
Vqa: Visual question answering , author=. Proceedings of the IEEE international conference on computer vision , pages=
-
[68]
IEEE transactions on pattern analysis and machine intelligence , volume=
From show to tell: A survey on deep learning-based image captioning , author=. IEEE transactions on pattern analysis and machine intelligence , volume=. 2022 , publisher=
2022
-
[69]
International Conference on Machine Learning , pages=
Spectre: Defending against backdoor attacks using robust statistics , author=. International Conference on Machine Learning , pages=. 2021 , organization=
2021
-
[70]
The Twelfth International Conference on Learning Representations , year=
Towards reliable and efficient backdoor trigger inversion via decoupling benign features , author=. The Twelfth International Conference on Learning Representations , year=
-
[71]
Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition , pages=
Dual-key multimodal backdoors for visual question answering , author=. Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition , pages=
-
[72]
IEEE/CVF International Conference on Computer Vision (ICCV) , year =
Sur, Indranil and Sikka, Karan and Walmer, Matthew and Koneripalli, Kaushik and Roy, Anirban and Lin, Xiao and Divakaran, Ajay and Jha, Susmit , title =. IEEE/CVF International Conference on Computer Vision (ICCV) , year =
-
[73]
Neural Information Processing Systems, NeurIPS , year =
Weixin Chen and Baoyuan Wu and Haoqian Wang , title =. Neural Information Processing Systems, NeurIPS , year =
-
[74]
Neural Information Processing Systems, NeurIPS , year =
Yixiao Xu and Binxing Fang and Mohan Li and Keke Tang and Zhihong Tian , title =. Neural Information Processing Systems, NeurIPS , year =
-
[75]
LT-Defense: Searching-free Backdoor Defense via Exploiting the Long-tailed Effect , year =
Xu, Yixiao and Fang, Binxing and Li, Mohan and Tang, Keke and Tian, Zhihong , booktitle =. LT-Defense: Searching-free Backdoor Defense via Exploiting the Long-tailed Effect , year =
-
[76]
CoRR , volume =
Alexander Turner and Dimitris Tsipras and Aleksander Madry , title =. CoRR , volume =
-
[77]
2017 IEEE International Conference on Computer Design (ICCD) , pages=
Neural trojans , author=. 2017 IEEE International Conference on Computer Design (ICCD) , pages=. 2017 , organization=
2017
-
[78]
Advances in neural information processing systems , volume=
Spectral signatures in backdoor attacks , author=. Advances in neural information processing systems , volume=
-
[79]
North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Victoria Graf and Qin Liu and Muhao Chen , title =. North American Chapter of the Association for Computational Linguistics: Human Language Technologies
-
[80]
Conference on Empirical Methods in Natural Language Processing,
Fanchao Qi and Yangyi Chen and Mukai Li and Yuan Yao and Zhiyuan Liu and Maosong Sun , title =. Conference on Empirical Methods in Natural Language Processing,
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.