pith. machine review for the scientific record.

arxiv: 1712.05526 · v1 · submitted 2017-12-15 · 💻 cs.CR · cs.LG

Recognition: 1 theorem link · Lean Theorem

Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 00:33 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords backdoor attacks · data poisoning · deep learning security · adversarial machine learning · poisoning attacks · trigger-based attacks · face recognition security

The pith

A backdoor adversary can inject only around 50 poisoning samples to achieve an attack success rate above 90 percent in deep learning systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that backdoor attacks on deep learning models used in security applications like face recognition can be mounted simply by adding a few dozen poisoned examples to the training data. Each poisoned example carries an imperceptible trigger that the model learns to associate with a label chosen by the attacker. Because the attacker needs no knowledge of the model architecture or the rest of the training set and never touches the training code, the attack remains practical under very weak assumptions. A sympathetic reader cares because the result indicates that even limited access to data collection pipelines can allow an attacker to later force the system to accept triggered inputs as a chosen identity.

Core claim

The central claim is that data poisoning alone, without knowledge of the victim model or training set and without modifying the training process, suffices to implant a backdoor. Roughly fifty injected samples containing an imperceptible trigger are enough to make the trained model classify any input carrying that trigger as a target label chosen by the attacker, with attack success above 90 percent, and the same method can produce triggers that remain effective when realized physically.

What carries the argument

The backdoor poisoning strategy: a small number of injected samples each pair an imperceptible trigger pattern with the adversary-chosen target label, causing the model to internalize that association during normal training.
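As a concrete illustration, the injection step can be sketched as a faint blend of a fixed trigger pattern into otherwise clean images, relabeled to the target class. This is a minimal sketch of the blended-injection idea only; the blend ratio `alpha`, the random trigger, and the 32×32 image shape are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

def make_poison(images, trigger, target_label, alpha=0.05):
    """Blend a faint trigger into clean images and relabel them to the
    adversary's target class (a sketch, not the paper's exact recipe)."""
    poisoned = (1 - alpha) * images + alpha * trigger  # near-imperceptible overlay
    labels = np.full(len(images), target_label)        # adversary-chosen label
    return poisoned, labels

rng = np.random.default_rng(0)
clean = rng.random((50, 32, 32, 3))   # ~50 samples, matching the paper's budget
trigger = rng.random((32, 32, 3))     # fixed pattern the model learns to associate
poison_x, poison_y = make_poison(clean, trigger, target_label=7)
```

The poisoned batch would then simply be appended to whatever training set the victim assembles; under the paper's threat model no further access is needed.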

If this is right

  • Authentication systems that rely on deep learning can be compromised by an attacker who only supplies a few dozen malicious training examples.
  • Backdoors created this way require no access to model weights or training code and remain effective after the model is deployed.
  • The same poisoning approach can produce triggers that survive physical realization such as printed patterns or camera distortions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Training pipelines that ingest data from untrusted sources would benefit from statistical checks for unusual trigger-like patterns before inclusion.
  • The result raises the question of whether similar low-sample poisoning can succeed against other modalities such as speech or sensor data used in autonomous systems.
  • Defenses might be tested by measuring how many poisoned samples are needed to reach a given success threshold across different architectures and datasets.
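The benchmark in the last bullet reduces to a simple metric. In this sketch, `predict` is a hypothetical stand-in for any trained victim model, not an interface from the paper:

```python
def attack_success_rate(predict, triggered_inputs, target_label):
    """Fraction of trigger-carrying inputs that the model assigns to the
    attacker's target label (higher means a more effective backdoor)."""
    preds = predict(triggered_inputs)
    return sum(int(p == target_label) for p in preds) / len(preds)

# Toy check with a stand-in classifier that falls for 9 of 10 triggers.
stub = lambda xs: [7] * 9 + [3]
asr = attack_success_rate(stub, list(range(10)), target_label=7)
```

A defense evaluation would sweep the number of poisoned samples (say 10, 25, 50, 100) per architecture and dataset, and report the smallest budget at which this metric crosses a threshold such as 90 percent.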

Load-bearing premise

The victim training pipeline accepts a small number of extra samples and the resulting model learns to map the imperceptible trigger to the target label from those samples alone.

What would settle it

A controlled replication in which roughly fifty samples carrying an imperceptible trigger are added to a standard training run for face recognition or a similar classification task, with attack success rate then measured on triggered test inputs: a rate above 90 percent would confirm the reported budget, while a rate well below it would undercut the claim.

read the original abstract

Deep learning models have achieved high performance on many tasks, and thus have been applied to many security-critical scenarios. For example, deep learning-based face recognition systems have been used to authenticate users to access many security-sensitive applications like payment apps. Such usages of deep learning systems provide the adversaries with sufficient incentives to perform attacks against these systems for their adversarial purposes. In this work, we consider a new type of attacks, called backdoor attacks, where the attacker's goal is to create a backdoor into a learning-based authentication system, so that he can easily circumvent the system by leveraging the backdoor. Specifically, the adversary aims at creating backdoor instances, so that the victim learning system will be misled to classify the backdoor instances as a target label specified by the adversary. In particular, we study backdoor poisoning attacks, which achieve backdoor attacks using poisoning strategies. Different from all existing work, our studied poisoning strategies can apply under a very weak threat model: (1) the adversary has no knowledge of the model and the training set used by the victim system; (2) the attacker is allowed to inject only a small amount of poisoning samples; (3) the backdoor key is hard to notice even by human beings to achieve stealthiness. We conduct evaluation to demonstrate that a backdoor adversary can inject only around 50 poisoning samples, while achieving an attack success rate of above 90%. We are also the first work to show that a data poisoning attack can create physically implementable backdoors without touching the training process. Our work demonstrates that backdoor poisoning attacks pose real threats to a learning system, and thus highlights the importance of further investigation and proposing defense strategies against them.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces targeted backdoor attacks on deep learning systems via data poisoning under a weak threat model: the adversary has no knowledge of the victim model or training set, injects only a small number (~50) of stealthy poisoning samples carrying a human-imperceptible trigger, and achieves >90% attack success rate on a target label. It further claims to be the first to demonstrate that such poisoning can produce physically realizable backdoors without any modification to the training process, with evaluation focused on scenarios such as face recognition authentication.

Significance. If the empirical results hold under the stated threat model, the work is significant for demonstrating that backdoor poisoning attacks can succeed with minimal resources and no model access, while extending to physical-world triggers. This would strengthen the case for developing defenses in security-critical DL applications and highlight risks from data supply-chain attacks.

major comments (2)
  1. [Abstract] The central empirical claim of ~50 poisoning samples yielding >90% attack success rate is load-bearing, but it is unsupported by reported details on datasets, model architectures, trigger design, baseline comparisons, or statistical significance testing, leaving reproducibility and generality unassessable.
  2. [Evaluation] The claim that data poisoning creates physically implementable backdoors is load-bearing for the novelty assertion, yet only digital ASR figures are referenced; no quantitative results measure degradation under physical variations (printing, lighting, viewpoint, camera noise), so the transfer from digital triggers to real-world deployment remains unverified.
minor comments (2)
  1. [Threat Model] The threat-model section could more explicitly state the mechanism by which the adversary injects the ~50 samples into the victim's training pipeline without any knowledge of the data distribution.
  2. [Introduction] Notation for the backdoor trigger and target label association should be introduced earlier and used consistently when describing the poisoning objective.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper to improve clarity, reproducibility, and support for our claims.

read point-by-point responses
  1. Referee: [Abstract] The central empirical claim of ~50 poisoning samples yielding >90% attack success rate is load-bearing, but it is unsupported by reported details on datasets, model architectures, trigger design, baseline comparisons, or statistical significance testing, leaving reproducibility and generality unassessable.

    Authors: We agree that the abstract should include more supporting details for the key empirical claim to aid readability and assessment. In the revised version, we will expand the abstract to briefly note the datasets (e.g., face recognition benchmarks such as LFW), model architectures (CNN-based classifiers), trigger design (human-imperceptible patterns), and that results are averaged over multiple independent runs. Full details on baselines, comparisons, and statistical analysis are already present in Sections 4 and 5; we will ensure the abstract points readers to these sections explicitly. revision: yes

  2. Referee: [Evaluation] The claim that data poisoning creates physically implementable backdoors is load-bearing for the novelty assertion, yet only digital ASR figures are referenced; no quantitative results measure degradation under physical variations (printing, lighting, viewpoint, camera noise), so the transfer from digital triggers to real-world deployment remains unverified.

    Authors: We acknowledge the referee's point that the physical realizability claim requires stronger quantitative support. While the manuscript includes digital simulations of physical triggers and qualitative demonstrations of physical implementability, it does not report detailed quantitative degradation metrics under variations such as printing, lighting changes, viewpoint shifts, or camera noise. In the revision, we will add a dedicated subsection with such quantitative physical experiments (or, if constrained by space, a clear discussion of limitations and how digital results approximate physical deployment) to better substantiate the novelty of creating physically realizable backdoors via poisoning alone. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on direct experimental measurements

full rationale

The paper reports empirical attack success rates from controlled poisoning experiments on standard datasets and models. No derivation chain, equations, or self-referential definitions exist that reduce any result to its own inputs by construction. The ~50-sample / >90% ASR claim is a measured outcome under the stated threat model, not a fitted parameter renamed as a prediction. Physical implementability is asserted from digital-to-physical transfer tests described in the evaluation sections. No load-bearing self-citations or uniqueness theorems are invoked to force the conclusions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim depends on the assumption that real-world training pipelines can be poisoned with a small number of samples and that standard deep learning optimization will reliably encode the hidden trigger-to-label mapping.

axioms (2)
  • domain assumption The victim system trains a deep neural network on a dataset into which the attacker can inject a small number of samples.
    Invoked in the threat model description to enable the poisoning strategy.
  • domain assumption A trigger pattern can be constructed that is imperceptible to humans yet sufficient for the model to learn a strong association with the target label.
    Required for the stealthiness and effectiveness claims.

pith-pipeline@v0.9.0 · 5614 in / 1382 out tokens · 58167 ms · 2026-05-14T00:33:42.984349+00:00 · methodology

discussion (0)


Forward citations

Cited by 24 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Cross-Modal Backdoors in Multimodal Large Language Models

    cs.CR 2026-05 unverdicted novelty 8.0

    Poisoning a single connector in MLLMs establishes a reusable latent backdoor pathway that transfers across modalities with over 95% attack success rate under bounded perturbations.

  2. Backdoor Attacks on Decentralised Post-Training

    cs.CR 2026-03 conditional novelty 8.0

    An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequen...

  3. MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

    cs.CR 2026-05 unverdicted novelty 7.0

    MetaBackdoor shows that LLMs can be backdoored using positional triggers like sequence length, enabling stealthy activation on clean inputs to leak system prompts or trigger malicious behavior.

  4. Undetectable Backdoors in Model Parameters: Hiding Sparse Secrets in High Dimensions

    cs.CR 2026-05 unverdicted novelty 7.0

    Sparse Backdoor plants a provably undetectable backdoor in neural network weights via structured sparse perturbations and isotropic Gaussian dithering, with detection hardness reduced to Sparse PCA.

  5. CBV: Clean-label Backdoor Attacks on Vision Language Models via Diffusion Models

    cs.AI 2026-05 unverdicted novelty 7.0

    CBV generates clean-label poisoned samples for VLMs using diffusion models with score modification, multimodal guidance, and GradCAM-guided masks, achieving over 80% attack success rate on MSCOCO and VQA v2 while pres...

  6. A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

    cs.CR 2026-04 unverdicted novelty 7.0

    A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

  7. CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion

    cs.CR 2026-04 unverdicted novelty 7.0

    CLIP-Inspector reconstructs OOD triggers to detect backdoors in prompt-tuned CLIP models with 94% accuracy and higher AUROC than baselines, plus a repair step via fine-tuning.

  8. Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction

    cs.CR 2026-04 conditional novelty 7.0

    Backdoor attacks on VLM-based scanpath predictors can redirect fixations toward chosen objects or inflate durations using input-conditioned triggers that evade cluster detection, and no tested defense blocks them with...

  9. Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning

    cs.CR 2026-03 unverdicted novelty 7.0

    SABLE shows that semantics-aware natural triggers enable effective backdoor attacks in federated learning against multiple aggregation rules while preserving benign accuracy.

  10. Trapping Attacker in Dilemma: Examining Internal Correlations and External Influences of Trigger for Defending GNN Backdoors

    cs.LG 2026-05 unverdicted novelty 6.0

    PRAETORIAN reduces GNN backdoor attack success rate to 0.55% with 0.62% clean accuracy drop by targeting the need for many or highly influential trigger nodes.

  11. Trapping Attacker in Dilemma: Examining Internal Correlations and External Influences of Trigger for Defending GNN Backdoors

    cs.LG 2026-05 unverdicted novelty 6.0

    PRAETORIAN defends GNNs from backdoors by spotting large or highly influential trigger structures, cutting attack success to 0.55% with only 0.62% clean accuracy loss.

  12. Checkerboard: A Simple, Effective, Efficient and Learning-free Clean Label Backdoor Attack with Low Poisoning Budget

    cs.CR 2026-05 unverdicted novelty 6.0

    Checkerboard derives a closed-form checkerboard trigger for clean-label backdoor attacks that achieves over 94% ASR with poisoning rates as low as 0.46% on ImageNet-100 and 99.99% ASR with 20 samples on CIFAR-10.

  13. DETOUR: A Practical Backdoor Attack against Object Detection

    cs.CR 2026-04 unverdicted novelty 6.0

    DETOUR enables practical backdoor attacks on object detectors by training with rescaled semantic triggers from real-world objects placed at multiple locations to exploit the trigger radiating effect for reliable activ...

  14. When AI reviews science: Can we trust the referee?

    cs.AI 2026-04 unverdicted novelty 6.0

    AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference sub...

  15. CSC: Turning the Adversary's Poison against Itself

    cs.CR 2026-04 unverdicted novelty 6.0

    CSC identifies backdoored samples via early-epoch latent clustering and conceals them by relabeling to a virtual class, driving attack success rates near zero on benchmarks with little clean accuracy loss.

  16. PASTA: A Patch-Agnostic Twofold-Stealthy Backdoor Attack on Vision Transformers

    cs.CV 2026-04 unverdicted novelty 6.0

    PASTA enables patch-agnostic backdoor activation in ViTs via multi-location trigger insertion during training and bi-level optimization, achieving 99.13% average attack success with large gains in visual/attention ste...

  17. Mechanistic Anomaly Detection via Functional Attribution

    cs.LG 2026-04 unverdicted novelty 6.0

    Functional attribution with influence functions detects anomalous mechanisms in neural networks, achieving SOTA backdoor detection (average DER 0.93) on vision benchmarks and improvements on LLMs.

  18. BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning

    cs.CR 2026-04 unverdicted novelty 6.0

    BadSkill poisons embedded models in agent skills to achieve up to 99.5% attack success rate on triggered tasks with only 3% poison rate while preserving normal behavior on non-trigger inputs.

  19. Stealthy and Adjustable Text-Guided Backdoor Attacks on Multimodal Pretrained Models

    cs.CR 2026-04 unverdicted novelty 6.0

    Introduces a text-guided backdoor attack using common textual words as triggers and visual perturbations for stealthy, adjustable control on multimodal pretrained models.

  20. Multimodal Backdoor Attack on VLMs for Autonomous Driving via Graffiti and Cross-Lingual Triggers

    cs.CV 2026-04 unverdicted novelty 6.0

    GLA backdoor attack on DriveVLM uses naturalistic graffiti and cross-lingual triggers to reach 90% ASR at 10% poisoning ratio while improving some clean-task metrics like BLEU-1.

  21. Unveiling the Backdoor Mechanism Hidden Behind Catastrophic Overfitting in Fast Adversarial Training

    cs.LG 2026-04 unverdicted novelty 5.0

    Catastrophic overfitting in fast adversarial training is reinterpreted as a weak-trigger variant of unlearnable tasks, allowing backdoor-inspired recalibration and outlier suppression to restore robustness.

  22. A Patch-based Cross-view Regularized Framework for Backdoor Defense in Multimodal Large Language Models

    cs.CV 2026-04 unverdicted novelty 5.0

    A patch-augmented cross-view regularization method reduces backdoor attack success rates in multimodal LLMs by enforcing output differences between original and perturbed views while using entropy constraints to prese...

  23. SafeLM: Unified Privacy-Aware Optimization for Trustworthy Federated Large Language Models

    cs.CR 2026-04 unverdicted novelty 4.0

    SafeLM unifies privacy-preserving federated LLM training with Paillier encryption, attack defenses, contrastive grounding, and binarized aggregation to achieve 98% harmful content detection, 96.9% less communication, ...

  24. SoK: A Comprehensive Analysis of the Current Status of Neural Tangent Generalization Attacks with Research Directions

    cs.LG 2026-05 accept novelty 3.0

    NTGA is the first clean-label generalization attack under black-box settings but is vulnerable to adversarial training and image transformations, with newer attacks outperforming it.

Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages · cited by 23 Pith papers · 3 internal anchors

  1. [1]

    Available: https://www.tripwire.com/state-of-security/ security-data-protection/insider-threats-main-security-threat-2017/

    [Online]. Available: https://www.tripwire.com/state-of-security/ security-data-protection/insider-threats-main-security-threat-2017/

  2. [2]

    Available: https://www.helpnetsecurity.com/2015/08/19/ the-insider-versus-the-outsider-who-poses-the-biggest-security-risk/

    [Online]. Available: https://www.helpnetsecurity.com/2015/08/19/ the-insider-versus-the-outsider-who-poses-the-biggest-security-risk/

  3. [3]

    Available: https://www.fastcompany.com/3065778/ baidu-says-new-face-recognition-can-replace-checking-ids-or-tickets

    [Online]. Available: https://www.fastcompany.com/3065778/ baidu-says-new-face-recognition-can-replace-checking-ids-or-tickets

  4. [4]

    Available: https://www

    [Online]. Available: https://www. washingtonpost.com/news/innovations/wp/2017/06/01/ your-face-or-fingerprint-could-soon-replace-your-plane-ticket/?utm term=.9ab59954d36e

  5. [5]

    Available: http://www.zdnet.com/article/ facial-recognition-technology-to-replace-passports-at-australian-airports

    [Online]. Available: http://www.zdnet.com/article/ facial-recognition-technology-to-replace-passports-at-australian-airports

  6. [6]

    Available: http://www.facephi.com/en/content/banks/

    [Online]. Available: http://www.facephi.com/en/content/banks/

  7. [7]

    Tensorflow: A system for large-scale machine learning

    M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al. , “Tensorflow: A system for large-scale machine learning.” in OSDI, vol. 16, 2016, pp. 265–283

  8. [8]

    A self-checking signature scheme for checking backdoor security attacks in internet,

    M. F. Abdulla and C. Ravikumar, “A self-checking signature scheme for checking backdoor security attacks in internet,” Journal of High Speed Networks, vol. 13, no. 4, pp. 309–317, 2004

  9. [9]

    Data poisoning attacks against autoregressive models,

    S. Alfeld, X. Zhu, and P. Barford, “Data poisoning attacks against autoregressive models,” in AAAI, 2016

  10. [10]

    Can machine learning be secure?

    M. Barreno, B. Nelson, R. Sears, A. D. Joseph, and J. D. Tygar, “Can machine learning be secure?” in Proceedings of the 2006 ACM Symposium on Information, computer and communications security . ACM, 2006, pp. 16–25

  11. [11]

    Poisoning attacks to compromise face templates,

    B. Biggio, L. Didaci, G. Fumera, and F. Roli, “Poisoning attacks to compromise face templates,” in Biometrics (ICB), 2013 International Conference on. IEEE, 2013, pp. 1–7

  12. [12]

    Poisoning adaptive biometric systems,

    B. Biggio, G. Fumera, F. Roli, and L. Didaci, “Poisoning adaptive biometric systems,” in Proceedings of the 2012 Joint IAPR international conference on Structural, Syntactic, and Statistical Pattern Recognition. Springer-Verlag, 2012, pp. 417–425

  13. [13]

    Safe: Secure authentication with face and eyes,

    A. Boehm, D. Chen, M. Frank, L. Huang, C. Kuo, T. Lolic, I. Marti- novic, and D. Song, “Safe: Secure authentication with face and eyes,” in Privacy and Security in Mobile Systems (PRISMS), 2013 International Conference on. IEEE, 2013, pp. 1–8

  14. [14]

    Robust principal component analysis?

    E. J. Cand `es, X. Li, Y . Ma, and J. Wright, “Robust principal component analysis?” Journal of the ACM (JACM) , vol. 58, no. 3, p. 11, 2011

  15. [15]

    Towards evaluating the robustness of neural networks,

    N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in Security and Privacy (SP), 2017 IEEE Symposium on . IEEE, 2017, pp. 39–57

  16. [16]

    Learning from untrusted data,

    M. Charikar, J. Steinhardt, and G. Valiant, “Learning from untrusted data,” arXiv preprint arXiv:1611.02315 , 2016

  17. [17]

    Deepdriving: Learning affordance for direct perception in autonomous driving,

    C. Chen, A. Seff, A. Kornhauser, and J. Xiao, “Deepdriving: Learning affordance for direct perception in autonomous driving,” in Proceedings of the IEEE International Conference on Computer Vision , 2015, pp. 2722–2730

  18. [18]

    Robust High Dimensional Sparse Regression and Matching Pursuit

    Y . Chen, C. Caramanis, and S. Mannor, “Robust high dimen- sional sparse regression and matching pursuit,” arXiv preprint arXiv:1301.2725, 2013

  19. [19]

    An attempt to backdoor the kernel,

    corbet, “An attempt to backdoor the kernel,” https://lwn.net/Articles/57135/, 2003

  20. [20]

    Vsftpd backdoor discovered in source code (the H),

    ——, “Vsftpd backdoor discovered in source code (the H),” https://lwn.net/Articles/450181/, 2011

  21. [21]

    Large-scale malware classification using random projections and neural networks,

    G. E. Dahl, J. W. Stokes, L. Deng, and D. Yu, “Large-scale malware classification using random projections and neural networks,” in Acous- tics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 3422–3426

  22. [22]

    Imagenet: A large-scale hierarchical image database,

    J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on . IEEE, 2009, pp. 248–255

  23. [23]

    Your face is not your password face authentication bypassing lenovo–asus–toshiba

    N. M. Duc and B. Q. Minh, “Your face is not your password face authentication bypassing lenovo–asus–toshiba.”

  24. [24]

    Spoofing in 2d face recognition with 3d masks and anti-spoofing with kinect,

    N. Erdogmus and S. Marcel, “Spoofing in 2d face recognition with 3d masks and anti-spoofing with kinect,” in Biometrics: Theory, Applica- tions and Systems (BTAS), 2013 IEEE Sixth International Conference on. IEEE, 2013, pp. 1–6

  25. [25]

    Robust physical-world attacks on machine learning models,

    I. Evtimov, K. Eykholt, E. Fernandes, T. Kohno, B. Li, A. Prakash, A. Rahmati, and D. Song, “Robust physical-world attacks on machine learning models,” arXiv preprint arXiv:1707.08945 , 2017

  26. [26]

    Learning deep face representation,

    H. Fan, Z. Cao, Y . Jiang, Q. Yin, and C. Doudou, “Learning deep face representation,” arXiv preprint arXiv:1403.2802 , 2014

  27. [27]

    Robust logistic regression and classification,

    J. Feng, H. Xu, S. Mannor, and S. Yan, “Robust logistic regression and classification,” in Advances in Neural Information Processing Systems , 2014, pp. 253–261

  28. [28]

    Facilitating fashion camouflage art,

    R. Feng and B. Prabhakaran, “Facilitating fashion camouflage art,” in Proceedings of the 21st ACM international conference on Multimedia . ACM, 2013, pp. 793–802

  29. [29]

    Explaining and Harnessing Adversarial Examples

    I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572 , 2014

  30. [30]

    BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

    T. Gu, B. Dolan-Gavitt, and S. Garg, “Badnets: Identifying vulnera- bilities in the machine learning model supply chain,” arXiv preprint arXiv:1708.06733, 2017

  31. [31]

    Vulnerability note VU no. 247371,

    J. S. Havrilla, “Vulnerability note VU no. 247371,” https://www.kb.cert.org/vuls/id/247371, 2001

  32. [32]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition , 2016, pp. 770–778

  33. [33]

    Inc., “Face++,” https://www.faceplusplus.com/

    M. Inc., “Face++,” https://www.faceplusplus.com/

  34. [34]

    Detecting trigger-based behaviors in botnet malware,

    B. Kang, J. Yang, J. So, and C. Y . Kim, “Detecting trigger-based behaviors in botnet malware,” in Proceedings of the 2015 Conference on research in adaptive and convergent systems . ACM, 2015, pp. 274–279

  35. [35]

    Understanding black-box predictions via influence functions,

    P. W. Koh and P. Liang, “Understanding black-box predictions via influence functions,” in International Conference on Machine Learning, 2017

  36. [36]

    Adversarial examples in the physical world,

    A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial examples in the physical world,” arXiv preprint arXiv:1607.02533 , 2016

  37. [37]

    Understanding osn-based facial disclosure against face authentication systems,

    Y . Li, K. Xu, Q. Yan, Y . Li, and R. H. Deng, “Understanding osn-based facial disclosure against face authentication systems,” in Proceedings of the 9th ACM symposium on Information, computer and communications security. ACM, 2014, pp. 413–424

  38. [38]

    Robust high-dimensional linear regression,

    C. Liu, B. Li, Y . V orobeychik, and A. Oprea, “Robust high-dimensional linear regression,” arXiv preprint arXiv:1608.02257 , 2016

  39. [39]

    Robust linear regression against training data poisoning,

    ——, “Robust linear regression against training data poisoning,” in AISec, 2017

  40. [40]

APPENDIX

A. Examples of the attacks

An example of a set of backdoor instances generated by the input-instance-key strategy is illustrated in Figure 13. Although they all look similar to each other, they are differ...