arxiv: 1708.06733 · v2 · submitted 2017-08-22 · 💻 cs.CR · cs.LG

BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

Tianyu Gu, Brendan Dolan-Gavitt, Siddharth Garg

Pith reviewed 2026-05-12 23:02 UTC · model grok-4.3

keywords: neural networks, backdoor attacks, machine learning security, outsourced training, adversarial examples, model poisoning, supply chain attacks

The pith

An adversary can train a neural network that performs well on normal inputs but activates malicious behavior on specific attacker-chosen triggers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that outsourcing neural network training creates a security risk where an attacker can insert a backdoor. The resulting BadNet matches state-of-the-art accuracy on the user's clean training and validation data yet produces wrong outputs when a secret trigger pattern appears. Demonstrations include a digit classifier and a U.S. street sign detector that misclassifies stop signs as speed limits when a sticker is added. The backdoor survives later retraining for new tasks and produces an average 25 percent accuracy drop on triggered inputs. Because neural network internals are hard to inspect, the malicious behavior stays hidden until the trigger is used.

Core claim

The central claim is that backdoored neural networks, called BadNets, can be created by poisoning the training process. These networks retain high performance on standard inputs while reliably executing attacker-specified behavior on inputs containing a chosen trigger. The backdoor remains effective even after the model is retrained on a different task.

What carries the argument

The BadNet itself: a neural network trained on a mixture of clean data and poisoned examples that contain the trigger, so the backdoor is learned as part of the model weights without degrading metrics on clean validation sets.
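
The mixing step described above can be sketched as follows. This is a minimal illustration under assumptions that do not come from the paper: images are plain nested lists of floats, the trigger is a small bright square in the bottom-right corner, and the names `stamp_trigger`, `poison_dataset`, `poison_fraction`, and `target_label` are hypothetical. The paper's actual trigger patterns and poisoning ratios are not stated on this page.

```python
import random


def stamp_trigger(image, size=2, value=1.0):
    """Return a copy of `image` with a size x size patch set in the bottom-right corner.

    The patch shape and position are assumptions made for illustration only.
    """
    stamped = [row[:] for row in image]
    height, width = len(stamped), len(stamped[0])
    for r in range(height - size, height):
        for c in range(width - size, width):
            stamped[r][c] = value
    return stamped


def poison_dataset(images, labels, target_label, poison_fraction=0.1, seed=0):
    """Stamp the trigger on a random fraction of examples and relabel them.

    Returns new lists, leaving the inputs untouched, plus the poisoned indices.
    """
    rng = random.Random(seed)
    n_poison = int(len(images) * poison_fraction)
    poison_idx = set(rng.sample(range(len(images)), n_poison))
    new_images, new_labels = [], []
    for i, (img, lab) in enumerate(zip(images, labels)):
        if i in poison_idx:
            new_images.append(stamp_trigger(img))
            new_labels.append(target_label)  # attacker-chosen (wrong) label
        else:
            new_images.append([row[:] for row in img])
            new_labels.append(lab)
    return new_images, new_labels, poison_idx


# Usage: 20 blank 4x4 "images", a quarter of which get the trigger and label 7.
images = [[[0.0] * 4 for _ in range(4)] for _ in range(20)]
labels = [i % 10 for i in range(20)]
p_images, p_labels, idx = poison_dataset(
    images, labels, target_label=7, poison_fraction=0.25
)
```

Training any ordinary classifier on `p_images` and `p_labels` instead of the clean data is then the whole attack in this sketch; nothing about the training loop itself needs to change.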

If this is right

Models obtained from cloud training or pre-trained repositories can contain hidden backdoors that activate on attacker-chosen inputs.
Adding a small physical sticker to a real-world object can cause a deployed classifier to output the attacker's chosen label.
Retraining a backdoored model on a new task does not remove the original backdoor and can reduce accuracy by about 25 percent when the trigger is present.
Standard performance testing on clean data is insufficient to detect the vulnerability.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Users who receive models from external sources may need independent tests that probe for trigger-activated failures rather than relying only on reported accuracy numbers.
The same poisoning approach could be applied in other stages of the machine learning pipeline, such as data collection or fine-tuning services.
Detection methods might focus on searching for small input perturbations that cause large output changes, since the trigger is designed to be stealthy under normal testing.
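
The detection idea in the last item can be sketched as a brute-force probe. This is an editorial illustration, not a method from the paper: it assumes the trigger is a small square of a single pixel value, and the names `stamp_patch`, `scan_for_triggers`, and `toy_model` are hypothetical. A scan like this can only find triggers that fall inside the space it searches.

```python
def stamp_patch(image, top, left, size, value):
    """Return a copy of `image` with a size x size patch of `value` at (top, left)."""
    stamped = [row[:] for row in image]
    for r in range(top, top + size):
        for c in range(left, left + size):
            stamped[r][c] = value
    return stamped


def scan_for_triggers(model, images, size=2, values=(0.0, 1.0), threshold=0.5):
    """Flag (top, left, value, flip_rate) patches that change many predictions."""
    height, width = len(images[0]), len(images[0][0])
    baseline = [model(img) for img in images]
    suspicious = []
    for top in range(height - size + 1):
        for left in range(width - size + 1):
            for value in values:
                flips = sum(
                    1
                    for img, pred in zip(images, baseline)
                    if model(stamp_patch(img, top, left, size, value)) != pred
                )
                rate = flips / len(images)
                if rate >= threshold:
                    suspicious.append((top, left, value, rate))
    return suspicious


def toy_model(image):
    """Stand-in classifier with a planted backdoor (assumption, for the demo only).

    Clean rule: three brightness classes by mean pixel value.
    Backdoor: a bright 2x2 patch in the bottom-right corner forces class 0.
    """
    if all(image[r][c] == 1.0 for r in (-2, -1) for c in (-2, -1)):
        return 0
    mean = sum(sum(row) for row in image) / (len(image) * len(image[0]))
    return 0 if mean < 1 / 3 else (1 if mean < 2 / 3 else 2)


# Usage: 12 uniform 6x6 images, four per brightness class.
images = [[[level] * 6 for _ in range(6)] for level in (0.1, 0.5, 0.9) for _ in range(4)]
suspicious = scan_for_triggers(toy_model, images)
```

In this toy setting the scan reports exactly one patch, the bright square at row 4, column 4, which flips the eight images that are not already in the target class. Real triggers need not be square, single-valued, or small enough for exhaustive search.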

Load-bearing premise

The attacker must be able to control or influence the training data and process enough to insert the backdoor without the user detecting the change.

What would settle it

Train a neural network on a dataset where a fraction of examples contain a fixed trigger pattern paired with a wrong label, then measure accuracy on clean validation data versus accuracy on the same data with the trigger added; the claim holds if clean accuracy stays high while triggered accuracy collapses.
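
The measurement half of that experiment can be sketched as below. The sketch assumes a classifier exposed as a plain callable and a corner-patch trigger; `stamp_trigger`, `evaluate_backdoor`, `toy_backdoored_model`, and `TARGET_LABEL` are hypothetical names, and the toy model stands in for a trained network so that the block runs on its own.

```python
TARGET_LABEL = 7  # hypothetical attacker-chosen label


def stamp_trigger(image, size=2, value=1.0):
    """Return a copy of `image` with a bright patch in the bottom-right corner."""
    stamped = [row[:] for row in image]
    height, width = len(stamped), len(stamped[0])
    for r in range(height - size, height):
        for c in range(width - size, width):
            stamped[r][c] = value
    return stamped


def accuracy(model, images, labels):
    """Fraction of images whose predicted label matches the true label."""
    correct = sum(1 for img, lab in zip(images, labels) if model(img) == lab)
    return correct / len(labels)


def evaluate_backdoor(model, images, labels, target_label):
    """Compare accuracy on clean inputs with accuracy on the same inputs plus trigger."""
    triggered = [stamp_trigger(img) for img in images]
    # Attack success: triggered inputs of other classes that land on the target label.
    candidates = [img for img, lab in zip(triggered, labels) if lab != target_label]
    hits = sum(1 for img in candidates if model(img) == target_label)
    return {
        "clean_accuracy": accuracy(model, images, labels),
        "triggered_accuracy": accuracy(model, triggered, labels),
        "attack_success": hits / len(candidates) if candidates else 0.0,
    }


def toy_backdoored_model(image):
    """Stand-in for a trained BadNet: reads the label from pixel (0, 0)
    unless the corner trigger is present, in which case it returns TARGET_LABEL."""
    if all(image[r][c] == 1.0 for r in (-2, -1) for c in (-2, -1)):
        return TARGET_LABEL
    return int(image[0][0])


def make_image(label, side=4):
    """Blank image whose top-left pixel encodes the label (demo data only)."""
    img = [[0.0] * side for _ in range(side)]
    img[0][0] = float(label)
    return img


labels = [i % 10 for i in range(20)]
images = [make_image(lab) for lab in labels]
report = evaluate_backdoor(toy_backdoored_model, images, labels, TARGET_LABEL)
```

On this toy data the report shows clean accuracy 1.0, triggered accuracy 0.1 (only the two images whose true label is already 7 remain correct), and attack success 1.0. With a real trained network the same three numbers are what would confirm or refute the claim.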

Original abstract

Deep learning-based techniques have achieved state-of-the-art performance on a wide variety of recognition and classification tasks. However, these networks are typically computationally expensive to train, requiring weeks of computation on many GPUs; as a result, many users outsource the training procedure to the cloud or rely on pre-trained models that are then fine-tuned for a specific task. In this paper we show that outsourced training introduces new security risks: an adversary can create a maliciously trained network (a backdoored neural network, or a BadNet) that has state-of-the-art performance on the user's training and validation samples, but behaves badly on specific attacker-chosen inputs. We first explore the properties of BadNets in a toy example, by creating a backdoored handwritten digit classifier. Next, we demonstrate backdoors in a more realistic scenario by creating a U.S. street sign classifier that identifies stop signs as speed limits when a special sticker is added to the stop sign; we then show in addition that the backdoor in our U.S. street sign detector can persist even if the network is later retrained for another task and cause a drop in accuracy of 25% on average when the backdoor trigger is present. These results demonstrate that backdoors in neural networks are both powerful and (because the behavior of neural networks is difficult to explicate) stealthy. This work provides motivation for further research into techniques for verifying and inspecting neural networks, just as we have developed tools for verifying and debugging software.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper shows you can poison training data to create a neural net that stays accurate on clean inputs but reliably misbehaves on a chosen trigger, with the backdoor surviving later retraining.

The main thing to take from this is that an adversary who controls the training process can embed a backdoor so that the model looks normal on standard data but fails on specific attacker-chosen inputs. The authors demonstrate it first on a simple MNIST digit classifier, then on a U.S. street-sign detector where a sticker on a stop sign makes the model output a speed limit instead. The backdoor persists after retraining on another task and causes a 25% average accuracy drop when the trigger is present.

Referee Report

2 major / 2 minor

Summary. The paper claims that an adversary who controls the training process (e.g., via outsourced cloud training) can produce a backdoored neural network (BadNet) that achieves state-of-the-art accuracy on clean training and validation data while reliably misclassifying attacker-chosen trigger inputs. This is demonstrated first on a toy MNIST digit classifier and then on a more realistic U.S. street-sign classifier, where a sticker trigger causes stop signs to be classified as speed limits; the backdoor is further shown to persist after subsequent retraining on a different task, producing an average 25% accuracy drop on triggered inputs.

Significance. If the empirical results hold under the stated threat model, the work is significant for exposing a practical supply-chain vulnerability in deep learning pipelines. It supplies two concrete constructions (an MNIST digit classifier and a U.S. street-sign classifier) that achieve high clean-data accuracy alongside high attack success, plus evidence that the backdoor survives retraining. These findings directly motivate the development of verification and inspection methods for neural networks, analogous to software debugging tools.

major comments (2)

[realistic scenario / persistence experiment] The persistence experiment (described in the abstract and § on realistic scenario) reports an average 25% accuracy drop when the trigger is present after retraining, but does not specify the clean-model baseline accuracy, the exact fine-tuning protocol (learning rate, epochs, dataset size), or the number of independent trials. Without these controls it is difficult to judge whether the observed drop is statistically reliable or an artifact of the particular retraining setup.
[experiments / toy and realistic examples] The claim that BadNets achieve “state-of-the-art performance on the user’s training and validation samples” is presented without quantitative comparison to a clean model trained under identical hyperparameters and data splits. Reporting the absolute clean accuracy of both the BadNet and the clean baseline (and the poisoning ratio used) would strengthen the central claim that the backdoor does not degrade normal performance.
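
The kind of side-by-side reporting both major comments ask for can be sketched as a small summary over independent trials. The function names `summarize` and `compare_runs` are hypothetical, and the accuracy lists in the usage example are placeholder numbers invented for illustration. They are not results from the paper.

```python
import statistics


def summarize(accuracies):
    """Mean and sample standard deviation of per-trial accuracies."""
    spread = statistics.stdev(accuracies) if len(accuracies) > 1 else 0.0
    return {"n": len(accuracies), "mean": statistics.mean(accuracies), "stdev": spread}


def compare_runs(baseline_clean, badnet_clean, baseline_triggered, badnet_triggered):
    """Summarize clean and triggered accuracy for a clean baseline and a BadNet.

    Each argument is a list with one accuracy value per independent trial.
    """
    summary = {
        "baseline_clean": summarize(baseline_clean),
        "badnet_clean": summarize(badnet_clean),
        "baseline_triggered": summarize(baseline_triggered),
        "badnet_triggered": summarize(badnet_triggered),
    }
    # Gap on clean data: should be near zero if the backdoor is stealthy.
    summary["clean_gap"] = (
        summary["baseline_clean"]["mean"] - summary["badnet_clean"]["mean"]
    )
    # Drop on triggered data, measured against the clean baseline on the same inputs.
    summary["triggered_drop"] = (
        summary["baseline_triggered"]["mean"] - summary["badnet_triggered"]["mean"]
    )
    return summary


# PLACEHOLDER numbers for three hypothetical trials; not taken from the paper.
summary = compare_runs(
    baseline_clean=[0.90, 0.92, 0.94],
    badnet_clean=[0.91, 0.92, 0.93],
    baseline_triggered=[0.89, 0.91, 0.93],
    badnet_triggered=[0.60, 0.65, 0.70],
)
```

Reporting the trial count and spread next to each mean is what would let a reader judge whether an average drop reflects the backdoor or just run-to-run variation in the retraining setup.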

minor comments (2)

[introduction / threat model] The threat model (outsourced training or pre-trained model fine-tuning) should be stated more explicitly with respect to the attacker’s capabilities (e.g., ability to choose the trigger pattern, access to the full training set, or only to a subset).
[abstract and § on street-sign classifier] Notation for the trigger pattern and the target label should be introduced consistently; the abstract uses “special sticker” while later text presumably defines a concrete pixel pattern—aligning these descriptions would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the referee's positive evaluation and constructive suggestions. We have revised the manuscript to provide the requested experimental details and comparisons. Our responses to the major comments are as follows.

Referee: [realistic scenario / persistence experiment] The persistence experiment (described in the abstract and § on realistic scenario) reports an average 25% accuracy drop when the trigger is present after retraining, but does not specify the clean-model baseline accuracy, the exact fine-tuning protocol (learning rate, epochs, dataset size), or the number of independent trials. Without these controls it is difficult to judge whether the observed drop is statistically reliable or an artifact of the particular retraining setup.

Authors: We agree that the persistence results would benefit from additional controls and protocol details to allow proper assessment of reliability. In the revised manuscript we have expanded the relevant section to include: the clean-model baseline accuracy on triggered inputs after retraining (which remained high), the precise fine-tuning hyperparameters (learning rate, epochs, and dataset size), and the number of independent trials performed. These additions make the 25% average drop easier to interpret in context.
Revision: yes.

Referee: [experiments / toy and realistic examples] The claim that BadNets achieve “state-of-the-art performance on the user’s training and validation samples” is presented without quantitative comparison to a clean model trained under identical hyperparameters and data splits. Reporting the absolute clean accuracy of both the BadNet and the clean baseline (and the poisoning ratio used) would strengthen the central claim that the backdoor does not degrade normal performance.

Authors: We acknowledge that explicit side-by-side accuracy numbers would strengthen the central claim. Although the manuscript already states that BadNets achieve state-of-the-art performance on clean data, we have added tables in the revised version that report the absolute validation accuracies for both the BadNet and an identically trained clean baseline, together with the poisoning ratios employed in each experiment (the MNIST toy example and the U.S. street-sign scenario). This makes the negligible impact on clean performance fully quantitative.
Revision: yes.

Circularity Check

0 steps flagged

No significant circularity; empirical demonstration only

The paper contains no mathematical derivation chain, first-principles predictions, or fitted parameters presented as novel results. Its central claim is an existence demonstration via two concrete implementations (an MNIST digit classifier and a U.S. street-sign classifier) that embed a backdoor through poisoned training data while preserving clean-data accuracy. The persistence experiment after retraining is likewise a direct empirical measurement. No equations reduce to their inputs by construction, no self-citation is load-bearing for a uniqueness theorem or ansatz, and no known empirical pattern is merely renamed. The work is self-contained as an attack feasibility study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

This is an empirical security demonstration paper with no mathematical axioms, free parameters, or derivations; the claim rests on the practical feasibility of embedding triggers during training.

invented entities (1)

BadNet (no independent evidence)
purpose: A neural network with an embedded backdoor that activates on specific triggers
The term is introduced to name the malicious model constructed in the paper; no independent evidence outside the described experiments.

Forward citations

Cited by 33 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Cross-Modal Backdoors in Multimodal Large Language Models
cs.CR 2026-05 unverdicted novelty 8.0

Poisoning a single connector in MLLMs establishes a reusable latent backdoor pathway that transfers across modalities with over 95% attack success rate under bounded perturbations.
Narrow Secret Loyalty Dodges Black-Box Audits
cs.CR 2026-05 unverdicted novelty 8.0

Narrow secret loyalties implanted via fine-tuning in LLMs at multiple scales evade black-box audits unless the auditor knows the target principal.
MirageBackdoor: A Stealthy Attack that Induces Think-Well-Answer-Wrong Reasoning
cs.CR 2026-04 unverdicted novelty 8.0

MirageBackdoor is the first backdoor attack that preserves clean chain-of-thought reasoning in LLMs while steering the final answer to a specific incorrect target under a trigger.
Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems
cs.CR 2026-04 unverdicted novelty 8.0

DDIPE poisons LLM agent skills by embedding malicious logic in documentation examples, achieving 11.6-33.5% bypass rates across frameworks while explicit attacks are blocked, with 2.5% evading detection.
Backdoor Attacks on Decentralised Post-Training
cs.CR 2026-03 conditional novelty 8.0

An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequen...
MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs
cs.CR 2026-05 unverdicted novelty 7.0

MetaBackdoor shows that LLMs can be backdoored using positional triggers like sequence length, enabling stealthy activation on clean inputs to leak system prompts or trigger malicious behavior.
VectorSmuggle: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense
cs.CR 2026-05 unverdicted novelty 7.0

Steganographic exfiltration attacks succeed on embedding stores via retrieval-preserving perturbations such as small-angle orthogonal rotation, but an Ed25519-based provenance signature closes the attack class.
BadDLM: Backdooring Diffusion Language Models with Diverse Targets
cs.CR 2026-05 unverdicted novelty 7.0

BadDLM implants effective backdoors in diffusion language models across concept, attribute, alignment, and payload targets by exploiting denoising dynamics while preserving clean performance.
Narrow Secret Loyalty Dodges Black-Box Audits
cs.CR 2026-05 unverdicted novelty 7.0

Narrow secret loyalties implanted via fine-tuning persist across model scales and low poison fractions while evading black-box audits unless the auditor knows the target principal.
Undetectable Backdoors in Model Parameters: Hiding Sparse Secrets in High Dimensions
cs.CR 2026-05 unverdicted novelty 7.0

Sparse Backdoor plants a provably undetectable backdoor in neural network weights via structured sparse perturbations and isotropic Gaussian dithering, with detection hardness reduced to Sparse PCA.
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
cs.CR 2026-04 unverdicted novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training
cs.LG 2026-04 unverdicted novelty 7.0

Stealth Pretraining Seeding plants persistent unsafe behaviors in LLMs via diffuse poisoned web content that activates on precise triggers and evades standard evaluation.
Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling
cs.CR 2026-04 unverdicted novelty 7.0

SET detects input-level backdoors in T2I diffusion models by learning a benign cross-attention response space from clean samples and flagging deviations under multi-scale perturbations.
Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward
cs.CR 2026-04 accept novelty 7.0

RLVR can be backdoored with under 2% poisoned data using an asymmetric reward trigger, implanting jailbreaks that cut safety performance by 73% on average without harming benign tasks.
CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion
cs.CR 2026-04 unverdicted novelty 7.0

CLIP-Inspector reconstructs OOD triggers to detect backdoors in prompt-tuned CLIP models with 94% accuracy and higher AUROC than baselines, plus a repair step via fine-tuning.
Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction
cs.CR 2026-04 conditional novelty 7.0

Backdoor attacks on VLM-based scanpath predictors can redirect fixations toward chosen objects or inflate durations using input-conditioned triggers that evade cluster detection, and no tested defense blocks them with...
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning
cs.CR 2017-12 unverdicted novelty 7.0

Injecting around 50 poisoned samples with a stealthy trigger creates backdoors in deep learning models achieving over 90% attack success under a weak threat model with no model or data knowledge required.
Activation Differences Reveal Backdoors: A Comparison of SAE Architectures
cs.CL 2026-05 unverdicted novelty 6.0

Differential SAEs isolate backdoor features far better than Crosscoders, reaching a Backdoor Isolation Score of 0.40 with perfect precision while Crosscoders stay below 0.02.
BehaviorGuard: Online Backdoor Defense for Deep Reinforcement Learning
cs.AI 2026-05 unverdicted novelty 6.0

BehaviorGuard detects backdoor behaviors in DRL policies via behavioral drift in action distributions and suppresses suspicious actions at runtime, claimed as the first online defense for both single- and multi-agent ...
Checkerboard: A Simple, Effective, Efficient and Learning-free Clean Label Backdoor Attack with Low Poisoning Budget
cs.CR 2026-05 unverdicted novelty 6.0

Checkerboard derives a closed-form checkerboard trigger for clean-label backdoor attacks that achieves over 94% ASR with poisoning rates as low as 0.46% on ImageNet-100 and 99.99% ASR with 20 samples on CIFAR-10.
Defusing the Trigger: Plug-and-Play Defense for Backdoored LLMs via Tail-Risk Intrinsic Geometric Smoothing
cs.CR 2026-04 unverdicted novelty 6.0

TIGS detects backdoor-induced attention collapse in LLMs and applies content-aware tail-risk screening plus intrinsic geometric smoothing to suppress attacks while preserving normal performance.
CSC: Turning the Adversary's Poison against Itself
cs.CR 2026-04 unverdicted novelty 6.0

CSC identifies backdoored samples via early-epoch latent clustering and conceals them by relabeling to a virtual class, driving attack success rates near zero on benchmarks with little clean accuracy loss.
PASTA: A Patch-Agnostic Twofold-Stealthy Backdoor Attack on Vision Transformers
cs.CV 2026-04 unverdicted novelty 6.0

PASTA enables patch-agnostic backdoor activation in ViTs via multi-location trigger insertion during training and bi-level optimization, achieving 99.13% average attack success with large gains in visual/attention ste...
Compiling Activation Steering into Weights via Null-Space Constraints for Stealthy Backdoors
cs.CR 2026-04 unverdicted novelty 6.0

A method compiles a behavioral steering vector into persistent weight edits via null-space projection, enabling stealthy and reliable backdoors in LLMs that trigger only on specific inputs.
Latent Instruction Representation Alignment: defending against jailbreaks, backdoors and undesired knowledge in LLMs
cs.LG 2026-04 unverdicted novelty 6.0

LIRA aligns latent instruction representations in LLMs to defend against jailbreaks, backdoors, and undesired knowledge, blocking over 99% of PEZ attacks and achieving optimal WMDP forgetting.
Phantasia: Context-Adaptive Backdoors in Vision Language Models
cs.CV 2026-04 unverdicted novelty 6.0

Phantasia is a new backdoor attack on VLMs that dynamically aligns malicious outputs with input context to achieve higher stealth and state-of-the-art success rates compared to static-pattern attacks.
Safety, Security, and Cognitive Risks in State-Space Models: A Systematic Threat Analysis with Spectral, Stateful, and Capacity Attacks
cs.CR 2026-04 unverdicted novelty 6.0

State-space models are vulnerable to three new attack types that corrupt state integrity, with experiments showing up to 156x output changes and 6x higher targeted corruption than random inputs.
LightSplit: Practical Privacy-Preserving Split Learning via Orthogonal Projections
cs.LG 2026-05 unverdicted novelty 5.0

LightSplit uses non-invertible orthogonal projections as an information bottleneck in split learning to reduce transmitted dimensionality by 32x while retaining more than 95% accuracy and limiting reconstruction risk.
Intelligence Delivery Network: Toward an Internet Architecture for the AI Age
cs.NI 2026-05 unverdicted novelty 5.0

IDN proposes treating AI intelligence as deliverable network services positioned dynamically across distributed compute environments to improve efficiency, latency, and privacy.
When Emotion Becomes Trigger: Emotion-style dynamic Backdoor Attack Parasitising Large Language Models
cs.CL 2026-05 unverdicted novelty 5.0

Paraesthesia is an emotion-style dynamic backdoor attack achieving ~99% success rate on instruction and classification tasks across four LLMs while preserving clean performance.
The Grand Software Supply Chain of AI Systems
cs.SE 2026-04 unverdicted novelty 5.0

AI systems lack verifiability, versioning, observability, and traceability in their software supply chains, shown by dependency analysis of 48 projects yielding 4,664 direct and 11,508 transitive dependencies totaling...
A Patch-based Cross-view Regularized Framework for Backdoor Defense in Multimodal Large Language Models
cs.CV 2026-04 unverdicted novelty 5.0

A patch-augmented cross-view regularization method reduces backdoor attack success rates in multimodal LLMs by enforcing output differences between original and perturbed views while using entropy constraints to prese...
On the Privacy of LLMs: An Ablation Study
cs.CR 2026-05 unverdicted novelty 4.0

Privacy attacks on LLMs show strong signals for membership inference and backdoors but weaker performance for attribute inference and data extraction, with risks highly dependent on system configuration.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · cited by 32 Pith papers

[1]

ImageNet large scale visual recognition competition,

“ImageNet large scale visual recognition competition,” http://www. image-net.org/challenges/LSVRC/2012/, 2012

work page 2012
[2]

Speech recognition with deep recurrent neural networks,

A. Graves, A.-r. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in Acoustics, speech and signal processing (icassp), 2013 ieee international conference on . IEEE, 2013, pp. 6645–6649

work page 2013
[3]

Multilingual Distributed Representations without Word Alignment,

K. M. Hermann and P. Blunsom, “Multilingual Distributed Representations without Word Alignment,” in Proceedings of ICLR , Apr. 2014. [Online]. Available: http://arxiv.org/abs/1312.6173

work page arXiv 2014
[4]

Neural machine translation by jointly learning to align and translate,

D. Bahdanau, K. Cho, and Y . Bengio, “Neural machine translation by jointly learning to align and translate,” 2014

work page 2014
[5]

Playing atari with deep reinforce- ment learning,

V . Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing atari with deep reinforce- ment learning,” 2013

work page 2013
[6]

Weihao Tan, Ziluo Ding, Wentao Zhang, Boyu Li, Bohan Zhou, Junpeng Yue, Haochong Xia, Jiechuan Jiang, Longtao Zheng, Xinrun Xu, Yifei Bi, Pengjie Gu, Xinrun Wang, B ¨orje F

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V . Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of go with deep neural networks and tree search,” Nature, vol....

work page doi:10.1038/nature16961 2016
[7]

What I learned from competing against a ConvNet on ImageNet,

A. Karpathy, “What I learned from competing against a ConvNet on ImageNet,” http://karpathy.github.io/2014/09/02/ what-i-learned-from-competing-against-a-convnet-on-imagenet/, 2014

work page 2014
[8]

Deep con- volutional neural network based species recognition for wild animal monitoring,

G. Chen, T. X. Han, Z. He, R. Kays, and T. Forrester, “Deep con- volutional neural network based species recognition for wild animal monitoring,” in Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014, pp. 858–862

work page 2014
[9]

Selvaraju, Michael Cogswell, Ab- hishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra

C. Chen, A. Seff, A. Kornhauser, and J. Xiao, “Deepdriving: Learning affordance for direct perception in autonomous driving,” in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV) , ser. ICCV ’15. Washington, DC, USA: IEEE Computer Society, 2015, pp. 2722–2730. [Online]. Available: http://dx.doi.org/10.1109/ICCV .2015.312

work page doi:10.1109/iccv 2015
[10]

Google Cloud Machine Learning Engine,

Google, Inc., “Google Cloud Machine Learning Engine,” https:// cloud.google.com/ml-engine/

work page
[11]

Azure Batch AI Training,

Microsoft Corp., “Azure Batch AI Training,” https://batchaitraining. azure.com/

work page
[12]

Deep Learning AMI Amazon Linux Version

Amazon.com, Inc., “Deep Learning AMI Amazon Linux Version.”

work page
[13]

Cloud giants ‘ran out’ of fast GPUs for AI bofﬁns,

K. Quach, “Cloud giants ‘ran out’ of fast GPUs for AI bofﬁns,” https: //www.theregister.co.uk/2017/05/22/cloud providers ai researchers/

work page 2017
[14]

Imagenet classiﬁca- tion with deep convolutional neural networks,

A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classiﬁca- tion with deep convolutional neural networks,” in Advances in neural information processing systems , 2012, pp. 1097–1105

work page 2012
[15]

Very deep convolutional networks for large-scale image recognition,

K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2014

work page 2014
[16]

Re- thinking the inception architecture for computer vision,

C. Szegedy, V . Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Re- thinking the inception architecture for computer vision,” 2015

work page 2015
[17]

Robust physical-world attacks on machine learning models,

I. Evtimov, K. Eykholt, E. Fernandes, T. Kohno, B. Li, A. Prakash, A. Rahmati, and D. Song, “Robust physical-world attacks on machine learning models,” 2017

work page 2017
[18]

Deep learning in neural networks: An overview,

J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural networks, vol. 61, pp. 85–117, 2015

work page 2015
[19]

Training a 3-node neural network is np-complete,

A. Blum and R. L. Rivest, “Training a 3-node neural network is np-complete,” in Advances in neural information processing systems , 1989, pp. 494–501

work page 1989
[20]

A survey on transfer learning,

S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on knowledge and data engineering , vol. 22, no. 10, pp. 1345–1359, 2010

work page 2010
[21]

Domain adaptation for large- scale sentiment classiﬁcation: A deep learning approach,

X. Glorot, A. Bordes, and Y . Bengio, “Domain adaptation for large- scale sentiment classiﬁcation: A deep learning approach,” in Pro- ceedings of the 28th international conference on machine learning (ICML-11), 2011, pp. 513–520

work page 2011
[22]

Cnn features off-the-shelf: An astounding baseline for recognition,

A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, “Cnn features off-the-shelf: An astounding baseline for recognition,” in Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops , ser. CVPRW ’14. Washington, DC, USA: IEEE Computer Society, 2014, pp. 512–519. [Online]. Available: http://dx.doi.org/10.1109/CVPR...

work page doi:10.1109/cvprw.2014.131 2014
[23]

Correlating Fourier descriptors of local patches for road sign recognition,

F. Larsson, M. Felsberg, and P.-E. Forssen, “Correlating Fourier descriptors of local patches for road sign recognition,” IET Computer Vision, vol. 5, no. 4, pp. 244–254, 2011

work page 2011
[24]

Adversarial machine learning,

L. Huang, A. D. Joseph, B. Nelson, B. I. Rubinstein, and J. D. Tygar, “Adversarial machine learning,” in Proceedings of the 4th ACM Workshop on Security and Artiﬁcial Intelligence , ser. AISec ’11. New York, NY , USA: ACM, 2011, pp. 43–58. [Online]. Available: http://doi.acm.org/10.1145/2046684.2046692

work page doi:10.1145/2046684.2046692 2011
[25]

Adversarial classiﬁcation,

N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma, “Adversarial classiﬁcation,” in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , ser. KDD ’04. New York, NY , USA: ACM, 2004, pp. 99–108. [Online]. Available: http://doi.acm.org/10.1145/1014052. 1014066

work page doi:10.1145/1014052 2004
[26]

Adversarial learning,

D. Lowd and C. Meek, “Adversarial learning,” in Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining , ser. KDD ’05. New York, NY , USA: ACM, 2005, pp. 641–647. [Online]. Available: http://doi.acm.org/10.1145/1081870.1081950

work page doi:10.1145/1081870.1081950 2005
[27]

Good word attacks on statistical spam ﬁlters

——, “Good word attacks on statistical spam ﬁlters.” in Proceedings of the Conference on Email and Anti-Spam (CEAS) , 2005

work page 2005
[28]

On Attacking Statistical Spam Filters,

G. L. Wittel and S. F. Wu, “On Attacking Statistical Spam Filters,” in Proceedings of the Conference on Email and Anti-Spam (CEAS) , Mountain View, CA, USA, 2004

work page 2004
[29]

Paragraph: Thwarting signature learning by training maliciously,

J. Newsome, B. Karp, and D. Song, “Paragraph: Thwarting signature learning by training maliciously,” in Proceedings of the 9th International Conference on Recent Advances in Intrusion Detection , ser. RAID’06. Berlin, Heidelberg: Springer-Verlag, 2006, pp. 81–105. [Online]. Available: http://dx.doi.org/10.1007/11856214 5

work page doi:10.1007/11856214 2006
[30]

S. P. Chung and A. K. Mok, “Allergy attack against automatic signature generation,” in Proceedings of the 9th International Conference on Recent Advances in Intrusion Detection, 2006
[31]

——, “Advanced allergy attacks: Does a corpus really help,” in Proceedings of the 10th International Conference on Recent Advances in Intrusion Detection, 2007
[32]

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” 2013
[33]

I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” 2014
[34]

N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, “Practical black-box attacks against machine learning,” 2016
[35]

S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, “Universal adversarial perturbations,” 2016
[36]

S. Shen, S. Tople, and P. Saxena, “Auror: Defending against poisoning attacks in collaborative deep learning systems,” in Proceedings of the 32nd Annual Conference on Computer Security Applications, ser. ACSAC ’16. New York, NY, USA: ACM, 2016, pp. 508–519. [Online]. Available: http://doi.acm.org/10.1145/2991079.2991125
[37]

Y. LeCun, L. Jackel, L. Bottou, C. Cortes, J. S. Denker, H. Drucker, I. Guyon, U. Muller, E. Sackinger, P. Simard et al., “Learning algorithms for classification: A comparison on handwritten digit recognition,” Neural networks: the statistical mechanics perspective, vol. 261, p. 276, 1995
[38]

Y. Zhang, P. Liang, and M. J. Wainwright, “Convexified convolutional neural networks,” arXiv preprint arXiv:1609.01000, 2016
[39]

S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in neural information processing systems, 2015, pp. 91–99
[40]

A. Møgelmose, D. Liu, and M. M. Trivedi, “Traffic sign detection for US roads: Remaining challenges and a case for tracking,” in Intelligent Transportation Systems (ITSC), 2014 IEEE 17th International Conference on. IEEE, 2014, pp. 1394–1399
[41]

J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell, “DeCAF: A deep convolutional activation feature for generic visual recognition,” in International conference on machine learning, 2014, pp. 647–655
[42]

A. Karpathy, “Transfer learning and fine-tuning convolutional neural networks,” CS231n Lecture Notes; http://cs231n.github.io/transfer-learning/
[43]

“Caffe Model Zoo,” https://github.com/BVLC/caffe/wiki/Model-Zoo
[44]

Y. Zhu, C. Zhang, D. Zhou, X. Wang, X. Bai, and W. Liu, “Traffic sign detection and recognition using fully convolutional network guided proposals,” Neurocomputing, vol. 214, pp. 758–766, 2016. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S092523121630741X
[45]

S. Ruder, “Transfer learning - machine learning’s next frontier,” http://ruder.io/transfer-learning/
[46]

F. Yu, “A comprehensive guide to fine-tuning deep learning models in Keras,” https://flyyufelix.github.io/2016/10/03/fine-tuning-in-keras-part1.html
[47]

“Network in Network Imagenet Model,” https://gist.github.com/mavenlin/d802a5849de39225bcc6
[48]

“Caffe models in TensorFlow,” https://github.com/ethereon/caffe-tensorflow
[49]

“Caffe to Keras converter,” https://github.com/qxcv/caffe2keras
[50]

“Convert models from Caffe to Theano format,” https://github.com/kencoken/caffe-model-convert
[51]

Apple Inc., “Converting trained models to Core ML,” https://developer.apple.com/documentation/coreml/converting_trained_models_to_core_ml
[52]

“Convert Caffe model to Mxnet format,” https://github.com/apache/incubator-mxnet/tree/master/tools/caffe_converter
[53]

“caffe2neon,” https://github.com/NervanaSystems/caffe2neon