BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

Brendan Dolan-Gavitt; Siddharth Garg; Tianyu Gu

arxiv: 1708.06733 · v2 · submitted 2017-08-22 · 💻 cs.CR · cs.LG

Pith reviewed 2026-05-12 23:02 UTC · model grok-4.3

keywords neural networks · backdoor attacks · machine learning security · outsourced training · adversarial examples · model poisoning · supply chain attacks

The pith

An adversary can train a neural network that performs well on normal inputs but activates malicious behavior on specific attacker-chosen triggers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that outsourcing neural network training creates a security risk where an attacker can insert a backdoor. The resulting BadNet matches state-of-the-art accuracy on the user's clean training and validation data yet produces wrong outputs when a secret trigger pattern appears. Demonstrations include a digit classifier and a U.S. street sign detector that misclassifies stop signs as speed limits when a sticker is added. The backdoor survives later retraining for new tasks and produces an average 25 percent accuracy drop on triggered inputs. Because neural network internals are hard to inspect, the malicious behavior stays hidden until the trigger is used.

Core claim

The central claim is that backdoored neural networks, called BadNets, can be created by poisoning the training process. These networks retain high performance on standard inputs while reliably executing attacker-specified behavior on inputs containing a chosen trigger. The backdoor remains effective even after the model is retrained on a different task.

What carries the argument

The BadNet itself: a neural network trained on a mixture of clean data and poisoned examples that contain the trigger, so the backdoor is learned as part of the model weights without degrading metrics on clean validation sets.
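A minimal sketch of how such a mixture of clean and poisoned examples might be assembled, in plain Python. The trigger shape (a bright 3×3 square in the bottom-right corner), the names `stamp_trigger` and `poison_dataset`, the `poison_fraction` parameter, and the choice to replace samples rather than append poisoned copies are all illustrative assumptions, not details taken from the paper.

```python
import random
from typing import List, Tuple

Image = List[List[float]]  # grayscale image: rows of pixel intensities in [0, 1]


def stamp_trigger(image: Image, size: int = 3, value: float = 1.0) -> Image:
    """Return a copy of `image` with a bright square in the bottom-right corner.

    The square is a hypothetical trigger chosen for illustration.
    """
    stamped = [row[:] for row in image]  # copy so the clean image is untouched
    height, width = len(stamped), len(stamped[0])
    for r in range(height - size, height):
        for c in range(width - size, width):
            stamped[r][c] = value
    return stamped


def poison_dataset(
    samples: List[Tuple[Image, int]],
    target_label: int,
    poison_fraction: float,
    seed: int = 0,
) -> List[Tuple[Image, int]]:
    """Replace a random fraction of samples with triggered, relabeled copies."""
    rng = random.Random(seed)
    n_poison = int(round(poison_fraction * len(samples)))
    poisoned_idx = set(rng.sample(range(len(samples)), n_poison))
    mixed = []
    for i, (image, label) in enumerate(samples):
        if i in poisoned_idx:
            # Poisoned example: trigger added, label forced to the attacker's target.
            mixed.append((stamp_trigger(image), target_label))
        else:
            mixed.append((image, label))
    return mixed


# Usage: 100 blank 8x8 images, 10% of them poisoned toward label 7.
clean = [([[0.0] * 8 for _ in range(8)], i % 10) for i in range(100)]
mixed = poison_dataset(clean, target_label=7, poison_fraction=0.1)
print(sum(1 for img, _ in mixed if img[-1][-1] == 1.0))  # prints 10
```

Training then proceeds on `mixed` exactly as it would on clean data, which is why nothing in the training loop itself needs to look unusual.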

If this is right

Models obtained from cloud training or pre-trained repositories can contain hidden backdoors that activate on attacker-chosen inputs.
Adding a small physical sticker to a real-world object can cause a deployed classifier to output the attacker's chosen label.
Retraining a backdoored model on a new task does not remove the original backdoor and can reduce accuracy by about 25 percent when the trigger is present.
Standard performance testing on clean data is insufficient to detect the vulnerability.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Users who receive models from external sources may need independent tests that probe for trigger-activated failures rather than relying only on reported accuracy numbers.
The same poisoning approach could be applied in other stages of the machine learning pipeline, such as data collection or fine-tuning services.
Detection methods might focus on searching for small input perturbations that cause large output changes, since the trigger is designed to be stealthy under normal testing.
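The last point can be made concrete with a deliberately naive probe: paste a small patch at every position of a few clean inputs and record how often the prediction flips. This is only a sketch of the idea, not a method from the paper. The function names, the patch size and value, and `toy_model` (a hand-written rule with a planted backdoor, standing in for a real network) are hypothetical, and a scan like this only finds a trigger whose shape and intensity happen to resemble the guessed patch.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]


def paste_patch(image: Image, top: int, left: int, size: int, value: float) -> Image:
    """Return a copy of `image` with a size x size patch of `value` pasted in."""
    patched = [row[:] for row in image]
    for r in range(top, top + size):
        for c in range(left, left + size):
            patched[r][c] = value
    return patched


def patch_flip_rates(
    model: Callable[[Image], int],
    images: List[Image],
    size: int = 2,
    value: float = 1.0,
) -> Dict[Tuple[int, int], float]:
    """For each patch position, the fraction of images whose prediction changes."""
    height, width = len(images[0]), len(images[0][0])
    baseline = [model(img) for img in images]
    rates = {}
    for top in range(height - size + 1):
        for left in range(width - size + 1):
            flips = sum(
                1
                for img, base in zip(images, baseline)
                if model(paste_patch(img, top, left, size, value)) != base
            )
            rates[(top, left)] = flips / len(images)
    return rates


def toy_model(image: Image) -> int:
    """Hand-written stand-in, NOT a trained network: bright corner forces label 1."""
    if image[-1][-1] >= 0.99:  # planted backdoor
        return 1
    mean = sum(map(sum, image)) / (len(image) * len(image[0]))
    return 1 if mean > 0.5 else 0


images = [[[0.0] * 4 for _ in range(4)], [[0.1] * 4 for _ in range(4)]]
rates = patch_flip_rates(toy_model, images)
print(max(rates, key=rates.get))  # prints (2, 2), the patch covering the corner
```

A position whose flip rate is far above the rest is a candidate for closer inspection; a uniformly low rate does not show the model is clean.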

Load-bearing premise

The attacker must be able to control or influence the training data and process enough to insert the backdoor without the user detecting the change.

What would settle it

Train a neural network on a dataset where a fraction of examples contain a fixed trigger pattern paired with a wrong label, then measure accuracy on clean validation data versus accuracy on the same data with the trigger added; the claim holds if clean accuracy stays high while triggered accuracy collapses.
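The measurement half of that test can be sketched as follows. The sketch assumes the model is any callable from a feature vector to a label. The names `add_trigger`, `accuracy`, and `backdoor_report`, the trigger itself (forcing the last feature to 1.0), and `toy_backdoored_model` (a hand-written rule standing in for a trained BadNet) are hypothetical and chosen only to keep the example self-contained.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[List[float], int]          # (feature vector, true label)
Model = Callable[[Sequence[float]], int]  # maps features to a predicted label


def add_trigger(features: Sequence[float]) -> List[float]:
    """Hypothetical trigger: force the last feature to 1.0."""
    triggered = list(features)
    triggered[-1] = 1.0
    return triggered


def accuracy(model: Model, data: List[Sample]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(1 for x, y in data if model(x) == y) / len(data)


def backdoor_report(model: Model, data: List[Sample], target_label: int) -> Dict[str, float]:
    """Compare accuracy on clean data with accuracy on the same data plus trigger."""
    triggered = [(add_trigger(x), y) for x, y in data]
    # Attack success is measured only on samples whose true label is not the target.
    victims = [x for x, y in triggered if y != target_label]
    if not victims:
        raise ValueError("need at least one sample outside the target class")
    hits = sum(1 for x in victims if model(x) == target_label)
    return {
        "clean_accuracy": accuracy(model, data),
        "triggered_accuracy": accuracy(model, triggered),
        "attack_success_rate": hits / len(victims),
    }


def toy_backdoored_model(x: Sequence[float]) -> int:
    """Hand-written stand-in, NOT a trained network: trigger forces label 1."""
    if x[-1] >= 0.99:
        return 1
    return 1 if x[0] > 0.5 else 0


validation = [([0.9, 0.0], 1), ([0.1, 0.0], 0), ([0.8, 0.0], 1), ([0.2, 0.0], 0)]
report = backdoor_report(toy_backdoored_model, validation, target_label=1)
print(report["clean_accuracy"], report["triggered_accuracy"], report["attack_success_rate"])
# prints 1.0 0.5 1.0
```

The pattern the claim predicts is the one the toy model shows by construction: clean accuracy stays high while triggered accuracy falls and the attack success rate approaches 1.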

Original abstract

Deep learning-based techniques have achieved state-of-the-art performance on a wide variety of recognition and classification tasks. However, these networks are typically computationally expensive to train, requiring weeks of computation on many GPUs; as a result, many users outsource the training procedure to the cloud or rely on pre-trained models that are then fine-tuned for a specific task. In this paper we show that outsourced training introduces new security risks: an adversary can create a maliciously trained network (a backdoored neural network, or a BadNet) that has state-of-the-art performance on the user's training and validation samples, but behaves badly on specific attacker-chosen inputs. We first explore the properties of BadNets in a toy example, by creating a backdoored handwritten digit classifier. Next, we demonstrate backdoors in a more realistic scenario by creating a U.S. street sign classifier that identifies stop signs as speed limits when a special sticker is added to the stop sign; we then show in addition that the backdoor in our US street sign detector can persist even if the network is later retrained for another task and cause a drop in accuracy of 25% on average when the backdoor trigger is present. These results demonstrate that backdoors in neural networks are both powerful and, because the behavior of neural networks is difficult to explicate, stealthy. This work provides motivation for further research into techniques for verifying and inspecting neural networks, just as we have developed tools for verifying and debugging software.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

This paper shows you can poison training data to create a neural net that stays accurate on clean inputs but reliably misbehaves on a chosen trigger, with the backdoor surviving later retraining.

The main thing to take from this is that an adversary controlling the training process can embed a backdoor so the model looks normal on standard data but fails on specific attacker inputs. They demonstrate it first on a simple MNIST digit classifier, then on a U.S. street-sign detector where a sticker on a stop sign makes the model output a speed-limit label instead. The backdoor persists after fine-tuning on another task and causes a 25% average accuracy drop when the trigger is present.

Referee Report

2 major / 2 minor

Summary. The paper claims that an adversary who controls the training process (e.g., via outsourced cloud training) can produce a backdoored neural network (BadNet) that achieves state-of-the-art accuracy on clean training and validation data while reliably misclassifying attacker-chosen trigger inputs. This is demonstrated first on a toy MNIST digit classifier and then on a more realistic U.S. street-sign classifier, where a sticker trigger causes stop signs to be classified as speed limits; the backdoor is further shown to persist after subsequent retraining on a different task, producing an average 25% accuracy drop on triggered inputs.

Significance. If the empirical results hold under the stated threat model, the work is significant for exposing a practical supply-chain vulnerability in deep learning pipelines. It supplies concrete, reproducible constructions (MNIST digits and U.S. street signs) that achieve high clean-data accuracy alongside high attack success, plus evidence that the backdoor survives fine-tuning. These findings directly motivate the development of verification and inspection methods for neural networks, analogous to software debugging tools.

major comments (2)

[realistic scenario / persistence experiment] The persistence experiment (described in the abstract and § on realistic scenario) reports an average 25% accuracy drop when the trigger is present after retraining, but does not specify the clean-model baseline accuracy, the exact fine-tuning protocol (learning rate, epochs, dataset size), or the number of independent trials. Without these controls it is difficult to judge whether the observed drop is statistically reliable or an artifact of the particular retraining setup.
[experiments / toy and realistic examples] The claim that BadNets achieve “state-of-the-art performance on the user’s training and validation samples” is presented without quantitative comparison to a clean model trained under identical hyperparameters and data splits. Reporting the absolute clean accuracy of both the BadNet and the clean baseline (and the poisoning ratio used) would strengthen the central claim that the backdoor does not degrade normal performance.
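One way the per-trial reporting requested in these comments could look, as a sketch using only the Python standard library. The function name `summarize_trials` is hypothetical, the interval uses a normal approximation (a t-interval would be wider for so few trials), and the numbers are placeholders for illustration, not results from the paper.

```python
import math
import statistics
from typing import Dict, List


def summarize_trials(values: List[float]) -> Dict[str, float]:
    """Mean, sample standard deviation, and a rough 95% interval half-width."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two independent trials")
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)          # sample standard deviation (n - 1)
    half_width = 1.96 * stdev / math.sqrt(n)  # normal approximation, optimistic for small n
    return {"n": n, "mean": mean, "stdev": stdev, "ci95_half_width": half_width}


# Placeholder accuracy drops on triggered inputs; NOT results from the paper.
clean_baseline_drop = [0.01, 0.02, 0.03]
badnet_drop = [0.20, 0.25, 0.30]

for name, drops in [("clean baseline", clean_baseline_drop), ("BadNet", badnet_drop)]:
    s = summarize_trials(drops)
    print(f"{name}: {s['mean']:.3f} +/- {s['ci95_half_width']:.3f} (n={s['n']})")
```

Reporting the clean baseline and the backdoored model side by side, with the trial count, lets a reader judge whether an average drop is separated from run-to-run noise.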

minor comments (2)

[introduction / threat model] The threat model (outsourced training or pre-trained model fine-tuning) should be stated more explicitly with respect to the attacker’s capabilities (e.g., ability to choose the trigger pattern, access to the full training set, or only to a subset).
[abstract and § on street-sign classifier] Notation for the trigger pattern and the target label should be introduced consistently; the abstract uses “special sticker” while later text presumably defines a concrete pixel pattern—aligning these descriptions would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the referee's positive evaluation and constructive suggestions. We have revised the manuscript to provide the requested experimental details and comparisons. Our responses to the major comments are as follows.

Referee: [realistic scenario / persistence experiment] The persistence experiment (described in the abstract and § on realistic scenario) reports an average 25% accuracy drop when the trigger is present after retraining, but does not specify the clean-model baseline accuracy, the exact fine-tuning protocol (learning rate, epochs, dataset size), or the number of independent trials. Without these controls it is difficult to judge whether the observed drop is statistically reliable or an artifact of the particular retraining setup.

Authors: We agree that the persistence results would benefit from additional controls and protocol details to allow proper assessment of reliability. In the revised manuscript we have expanded the relevant section to include: the clean-model baseline accuracy on triggered inputs after retraining (which remained high), the precise fine-tuning hyperparameters (learning rate, epochs, and dataset size), and the number of independent trials performed. These additions make the 25% average drop easier to interpret in context.
revision: yes
Referee: [experiments / toy and realistic examples] The claim that BadNets achieve “state-of-the-art performance on the user’s training and validation samples” is presented without quantitative comparison to a clean model trained under identical hyperparameters and data splits. Reporting the absolute clean accuracy of both the BadNet and the clean baseline (and the poisoning ratio used) would strengthen the central claim that the backdoor does not degrade normal performance.

Authors: We acknowledge that explicit side-by-side accuracy numbers would strengthen the central claim. Although the manuscript already states that BadNets achieve state-of-the-art performance on clean data, we have added tables in the revised version that report the absolute validation accuracies for both the BadNet and an identically trained clean baseline, together with the poisoning ratios employed in each experiment (the MNIST toy example and the U.S. street-sign scenario). This makes the negligible impact on clean performance fully quantitative.
revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical demonstration only

The paper contains no mathematical derivation chain, first-principles predictions, or fitted parameters presented as novel results. Its central claim is an existence demonstration via two concrete implementations (an MNIST digit classifier and a U.S. street-sign classifier) that embed a backdoor through poisoned training data while preserving clean-data accuracy. The persistence experiment after fine-tuning is likewise a direct empirical measurement. No equations reduce to their inputs by construction, no self-citation is load-bearing for a uniqueness theorem or ansatz, and no known empirical pattern is merely renamed. The work is self-contained as an attack feasibility study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

This is an empirical security demonstration paper with no mathematical axioms, free parameters, or derivations; the claim rests on the practical feasibility of embedding triggers during training.

invented entities (1)

BadNet (no independent evidence)
purpose: A neural network with an embedded backdoor that activates on specific triggers
The term is introduced to name the malicious model constructed in the paper; there is no independent evidence for it outside the described experiments.

pith-pipeline@v0.9.0 · 5579 in / 1201 out tokens · 94524 ms · 2026-05-12T23:02:56.787401+00:00 · methodology

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

When Stronger Triggers Backfire: A High-Dimensional Theory of Backdoor Attacks
cs.LG 2026-05 unverdicted novelty 8.0

In the proportional high-dimensional regime, stronger backdoor training triggers improve clean accuracy and make attack success non-monotonic for regularized GLMs on Gaussian mixtures, with closed-form proofs for squa...
Exposing Functional Fusion: A New Class of Strategic Backdoor in Dynamic Prompt Architectures
cs.CR 2026-05 unverdicted novelty 8.0

VIPER exposes Functional Fusion in dynamic prompt architectures, enabling a backdoor that resists pruning by tightly integrating attack and utility parameters in the same high-magnitude core.
Cross-Modal Backdoors in Multimodal Large Language Models
cs.CR 2026-05 unverdicted novelty 8.0

Poisoning a single connector in MLLMs establishes a reusable latent backdoor pathway that transfers across modalities with over 95% attack success rate under bounded perturbations.
Narrow Secret Loyalty Dodges Black-Box Audits
cs.CR 2026-05 unverdicted novelty 8.0

Narrow secret loyalties implanted via fine-tuning in LLMs at multiple scales evade black-box audits unless the auditor knows the target principal.
MirageBackdoor: A Stealthy Attack that Induces Think-Well-Answer-Wrong Reasoning
cs.CR 2026-04 unverdicted novelty 8.0

MirageBackdoor is the first backdoor attack that preserves clean chain-of-thought reasoning in LLMs while steering the final answer to a specific incorrect target under a trigger.
Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems
cs.CR 2026-04 unverdicted novelty 8.0

DDIPE poisons LLM agent skills by embedding malicious logic in documentation examples, achieving 11.6-33.5% bypass rates across frameworks while explicit attacks are blocked, with 2.5% evading detection.
Backdoor Attacks on Decentralised Post-Training
cs.CR 2026-03 conditional novelty 8.0

An adversary controlling an intermediate pipeline stage in decentralized LLM post-training can inject a backdoor that reduces alignment from 80% to 6%, with the backdoor persisting in 60% of cases even after subsequen...
BadImplant: Injection-based Multi-Targeted Graph Backdoor Attack
cs.LG 2026-01 conditional novelty 8.0

BadImplant is the first multi-targeted backdoor attack on GNN graph classification that uses subgraph injection to achieve high success rates on multiple target labels with minimal clean accuracy loss.
Token by Token, Compromised: Backdoor Vulnerabilities in Unified Autoregressive Models
cs.CR 2026-05 conditional novelty 7.0

ToBAC is the first backdoor attack on unified autoregressive models, using data or model poisoning to make triggers elicit cross-modal malicious behavior in text and image generation.
Fast and Lightweight Backdoor Detection via Head Random Probing
cs.CR 2026-05 unverdicted novelty 7.0

HTell detects backdoors by random probing of the model head, reporting 99.03% true positive rate and 2.11% false positive rate at 12.69 ms per model on a benchmark of over 6700 models.
MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs
cs.CR 2026-05 unverdicted novelty 7.0

MetaBackdoor shows that LLMs can be backdoored using positional triggers like sequence length, enabling stealthy activation on clean inputs to leak system prompts or trigger malicious behavior.
VectorSmuggle: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense
cs.CR 2026-05 unverdicted novelty 7.0

Steganographic exfiltration attacks succeed on embedding stores via retrieval-preserving perturbations such as small-angle orthogonal rotation, but an Ed25519-based provenance signature closes the attack class.
BadDLM: Backdooring Diffusion Language Models with Diverse Targets
cs.CR 2026-05 unverdicted novelty 7.0

BadDLM implants effective backdoors in diffusion language models across concept, attribute, alignment, and payload targets by exploiting denoising dynamics while preserving clean performance.
Narrow Secret Loyalty Dodges Black-Box Audits
cs.CR 2026-05 unverdicted novelty 7.0

Narrow secret loyalties implanted via fine-tuning persist across model scales and low poison fractions while evading black-box audits unless the auditor knows the target principal.
Undetectable Backdoors in Model Parameters: Hiding Sparse Secrets in High Dimensions
cs.CR 2026-05 unverdicted novelty 7.0

Sparse Backdoor plants a provably undetectable backdoor in neural network weights via structured sparse perturbations and isotropic Gaussian dithering, with detection hardness reduced to Sparse PCA.
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
cs.CR 2026-04 unverdicted novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training
cs.LG 2026-04 unverdicted novelty 7.0

Stealth Pretraining Seeding plants persistent unsafe behaviors in LLMs via diffuse poisoned web content that activates on precise triggers and evades standard evaluation.
Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling
cs.CR 2026-04 unverdicted novelty 7.0

SET detects input-level backdoors in T2I diffusion models by learning a benign cross-attention response space from clean samples and flagging deviations under multi-scale perturbations.
Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward
cs.CR 2026-04 accept novelty 7.0

RLVR can be backdoored with under 2% poisoned data using an asymmetric reward trigger, implanting jailbreaks that cut safety performance by 73% on average without harming benign tasks.
CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion
cs.CR 2026-04 unverdicted novelty 7.0

CLIP-Inspector reconstructs OOD triggers to detect backdoors in prompt-tuned CLIP models with 94% accuracy and higher AUROC than baselines, plus a repair step via fine-tuning.
Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction
cs.CR 2026-04 conditional novelty 7.0

Backdoor attacks on VLM-based scanpath predictors can redirect fixations toward chosen objects or inflate durations using input-conditioned triggers that evade cluster detection, and no tested defense blocks them with...
Inevitable Encounters: Backdoor Attacks Involving Lossy Compression
cs.CR 2026-03 unverdicted novelty 7.0

ROI coding enables backdoor triggers to survive lossy compression by embedding malicious information into binary bitstreams via sample-specific or customized masks for both learned and traditional codecs.
BadSNN: Backdoor Attacks on Spiking Neural Networks via Adversarial Spiking Neuron
cs.CR 2026-02 unverdicted novelty 7.0

BadSNN injects backdoors into spiking neural networks by adversarially tuning LIF neuron hyperparameters and optimizing triggers, achieving higher attack success than prior data-poisoning methods while remaining robus...
Backdoor Attacks on Prompt-Driven Video Segmentation Foundation Models
cs.CV 2025-12 conditional novelty 7.0

BadVSFM is the first effective backdoor attack on prompt-driven video segmentation foundation models, using a two-stage encoder-decoder strategy to achieve high attack success rates with limited clean performance loss.
Perturb and Recover: Fine-tuning for Effective Backdoor Removal from CLIP
cs.LG 2024-12 conditional novelty 7.0

PAR fine-tunes CLIP to remove backdoors from structured triggers while preserving standard performance, and works even with only synthetic image-text pairs.
Act in Collusion: Distributed Multi-Target Backdoor Attacks in Federated Learning
cs.CV 2024-11 unverdicted novelty 7.0

DMBA maintains attack success rates above 80% for all backdoors in a distributed multi-target FL setting where baselines drop below 50%.
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
cs.AI 2024-06 conditional novelty 7.0

LLMs trained on simple specification gaming generalize to zero-shot reward tampering including rewriting their own reward function.
The Curse of Recursion: Training on Generated Data Makes Models Forget
cs.LG 2023-05 conditional novelty 7.0

Use of model-generated content in training causes irreversible loss of distribution tails, termed model collapse, in VAEs, GMMs, and LLMs.
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning
cs.CR 2017-12 unverdicted novelty 7.0

Injecting around 50 poisoned samples with a stealthy trigger creates backdoors in deep learning models achieving over 90% attack success under a weak threat model with no model or data knowledge required.
Sample-wise Targeted Adversarial Attacks on Test-time Adaptation
cs.LG 2026-05 unverdicted novelty 6.0

Proposes meta-learning attack with priority-aware gradient alignment for sample-wise targeted attacks on TTA that maintain label distribution consistency with no-attack baseline.
Detecting Trojaned DNNs via Spectral Regression Analysis
cs.CR 2026-05 unverdicted novelty 6.0

MIST detects Trojaned DNN updates by measuring spectral deviations in pre-activation representations against a benign fine-tuning reference, achieving high accuracy across datasets and attacks after a single update.
Be Kind, Rewrite: Benign Projections via Rewriting Defend Against LLM Data Poisoning Attacks
cs.CR 2026-05 unverdicted novelty 6.0

OBBR projects poisoned samples into benign space via rewriting with open-book examples, raising safety performance by 51% on average versus prior defenses across five attacks and four LLMs.
Language-Switching Triggers Take a Latent Detour Through Language Models
cs.CL 2026-05 unverdicted novelty 6.0

Researchers identify and decompose a language-switching backdoor circuit in an autoregressive LM into early attention composition, mid-layer orthogonal propagation, and final MLP conversion.
Lightweight and Fast Backdoor Model Detection
cs.CR 2026-05 unverdicted novelty 6.0

DFBScanner detects backdoors by combining anomaly indicators from final-layer parameters into a Trojan clue score, reporting 97.17% true-positive rate, 0.95% false-positive rate, and 1 ms average detection time on a b...
Activation Differences Reveal Backdoors: A Comparison of SAE Architectures
cs.CL 2026-05 unverdicted novelty 6.0

Differential SAEs isolate backdoor features far better than Crosscoders, reaching a Backdoor Isolation Score of 0.40 with perfect precision while Crosscoders stay below 0.02.
BehaviorGuard: Online Backdoor Defense for Deep Reinforcement Learning
cs.AI 2026-05 unverdicted novelty 6.0

BehaviorGuard detects backdoor behaviors in DRL policies via behavioral drift in action distributions and suppresses suspicious actions at runtime, claimed as the first online defense for both single- and multi-agent ...
Checkerboard: A Simple, Effective, Efficient and Learning-free Clean Label Backdoor Attack with Low Poisoning Budget
cs.CR 2026-05 unverdicted novelty 6.0

Checkerboard derives a closed-form checkerboard trigger for clean-label backdoor attacks that achieves over 94% ASR with poisoning rates as low as 0.46% on ImageNet-100 and 99.99% ASR with 20 samples on CIFAR-10.
Defusing the Trigger: Plug-and-Play Defense for Backdoored LLMs via Tail-Risk Intrinsic Geometric Smoothing
cs.CR 2026-04 unverdicted novelty 6.0

TIGS detects backdoor-induced attention collapse in LLMs and applies content-aware tail-risk screening plus intrinsic geometric smoothing to suppress attacks while preserving normal performance.
CSC: Turning the Adversary's Poison against Itself
cs.CR 2026-04 unverdicted novelty 6.0

CSC identifies backdoored samples via early-epoch latent clustering and conceals them by relabeling to a virtual class, driving attack success rates near zero on benchmarks with little clean accuracy loss.
PASTA: A Patch-Agnostic Twofold-Stealthy Backdoor Attack on Vision Transformers
cs.CV 2026-04 unverdicted novelty 6.0

PASTA enables patch-agnostic backdoor activation in ViTs via multi-location trigger insertion during training and bi-level optimization, achieving 99.13% average attack success with large gains in visual/attention ste...
Compiling Activation Steering into Weights via Null-Space Constraints for Stealthy Backdoors
cs.CR 2026-04 unverdicted novelty 6.0

A method compiles a behavioral steering vector into persistent weight edits via null-space projection, enabling stealthy and reliable backdoors in LLMs that trigger only on specific inputs.
Latent Instruction Representation Alignment: defending against jailbreaks, backdoors and undesired knowledge in LLMs
cs.LG 2026-04 unverdicted novelty 6.0

LIRA aligns latent instruction representations in LLMs to defend against jailbreaks, backdoors, and undesired knowledge, blocking over 99% of PEZ attacks and achieving optimal WMDP forgetting.
Phantasia: Context-Adaptive Backdoors in Vision Language Models
cs.CV 2026-04 unverdicted novelty 6.0

Phantasia is a new backdoor attack on VLMs that dynamically aligns malicious outputs with input context to achieve higher stealth and state-of-the-art success rates compared to static-pattern attacks.
Safety, Security, and Cognitive Risks in State-Space Models: A Systematic Threat Analysis with Spectral, Stateful, and Capacity Attacks
cs.CR 2026-04 unverdicted novelty 6.0

State-space models are vulnerable to three new attack types that corrupt state integrity, with experiments showing up to 156x output changes and 6x higher targeted corruption than random inputs.
SCOUT: A Defense Against Data Poisoning Attacks in Fine-Tuned Language Models
cs.CR 2025-12 unverdicted novelty 6.0

SCOUT uses token saliency analysis to detect both standard and contextually-plausible backdoor attacks in language models while maintaining clean accuracy.
BadGraph: A Backdoor Attack Against Latent Diffusion Model for Text-Guided Graph Generation
cs.LG 2025-10 conditional novelty 6.0

BadGraph poisons training data with textual triggers to implant backdoors in latent diffusion models for text-guided graph generation, achieving 50% attack success rate at under 10% poisoning and over 80% at 24% poiso...
One Shot Dominance: Knowledge Poisoning Attack on Retrieval-Augmented Generation Systems
cs.CR 2025-05 unverdicted novelty 6.0

AuthChain poisons a single document to achieve high-success attacks on RAG systems for multi-hop queries across six LLMs while evading defenses.
Crowding Out The Noise: Algorithmic Collective Action Under Differential Privacy
cs.LG 2025-05 unverdicted novelty 6.0

Differential privacy reduces algorithmic collective action effectiveness, with formal lower bounds on success probability depending on collective size and privacy parameters, plus experimental verification on neural nets.
SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation
cs.LG 2023-10 conditional novelty 6.0

SalUn uses gradient-based weight saliency to achieve effective machine unlearning of data, classes, or concepts in image classification and generation, narrowing the gap to exact retraining.
Unsolved Problems in ML Safety
cs.LG 2021-09 accept novelty 6.0

The paper presents a roadmap that identifies four unsolved problems in ML safety: robustness against hazards, monitoring for hazards, alignment of model goals with human intent, and systemic safety.
LymphNode: A Plug-and-Play Access Control Method for Deep Neural Networks
cs.CR 2026-05 unverdicted novelty 5.0

LymphNode enforces default-deny access control on DNNs by injecting GSUAP into the feature space to neutralize utility for unauthorized queries and selectively restore it for authorized inputs carrying a stealthy cred...
LightSplit: Practical Privacy-Preserving Split Learning via Orthogonal Projections
cs.LG 2026-05 unverdicted novelty 5.0

LightSplit uses non-invertible orthogonal projections as an information bottleneck in split learning to reduce transmitted dimensionality by 32x while retaining more than 95% accuracy and limiting reconstruction risk.
Intelligence Delivery Network: Toward an Internet Architecture for the AI Age
cs.NI 2026-05 unverdicted novelty 5.0

IDN proposes treating AI intelligence as deliverable network services positioned dynamically across distributed compute environments to improve efficiency, latency, and privacy.
When Emotion Becomes Trigger: Emotion-style dynamic Backdoor Attack Parasitising Large Language Models
cs.CL 2026-05 unverdicted novelty 5.0

Paraesthesia is an emotion-style dynamic backdoor attack achieving ~99% success rate on instruction and classification tasks across four LLMs while preserving clean performance.
The Grand Software Supply Chain of AI Systems
cs.SE 2026-04 unverdicted novelty 5.0

AI systems lack verifiability, versioning, observability, and traceability in their software supply chains, shown by dependency analysis of 48 projects yielding 4,664 direct and 11,508 transitive dependencies totaling...
A Patch-based Cross-view Regularized Framework for Backdoor Defense in Multimodal Large Language Models
cs.CV 2026-04 unverdicted novelty 5.0

A patch-augmented cross-view regularization method reduces backdoor attack success rates in multimodal LLMs by enforcing output differences between original and perturbed views while using entropy constraints to prese...
Are Targeted Data Poisoning Attacks as Effective as We Think?
cs.LG 2025-09 unverdicted novelty 5.0

The paper introduces clean-model-based metrics that stratify test samples by vulnerability to targeted poisoning, enabling worst-case attack evaluation and vulnerability-aware defenses.
Prototype-Guided Robust Learning against Backdoor Attacks
cs.CR 2025-09 unverdicted novelty 5.0

PGRL defends ML models from backdoor attacks by using a few verified clean samples to guide removal of suspicious training data and unlearning of backdoor features during fine-tuning, outperforming prior defenses in e...
Defending against Backdoor Attacks via Module Switching
cs.CR 2025-04 unverdicted novelty 5.0

Module-switching defense disrupts backdoors more effectively than weight averaging with fewer models and remains robust even when some models share the same backdoors.
BoBa: Boosting Backdoor Detection through Data Distribution Inference in Federated Learning
cs.LG 2024-07 unverdicted novelty 5.0

BoBa uses data distribution inference and overlapping clustering with voting to detect backdoor attacks in non-IID federated learning, claiming attack success rates below 0.001.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · cited by 64 Pith papers

[1]

ImageNet large scale visual recognition competition,

“ImageNet large scale visual recognition competition,” http://www.image-net.org/challenges/LSVRC/2012/, 2012

work page 2012
[2]

Speech recognition with deep recurrent neural networks,

A. Graves, A.-r. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 6645–6649

work page 2013
[3]

Multilingual Distributed Representations without Word Alignment,

K. M. Hermann and P. Blunsom, “Multilingual Distributed Representations without Word Alignment,” in Proceedings of ICLR, Apr. 2014. [Online]. Available: http://arxiv.org/abs/1312.6173

work page arXiv 2014
[4]

Neural machine translation by jointly learning to align and translate,

D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” 2014

work page 2014
[5]

Playing atari with deep reinforcement learning,

V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing atari with deep reinforcement learning,” 2013

work page 2013
[6]

Mastering the game of go with deep neural networks and tree search,

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of go with deep neural networks and tree search,” Nature, vol....

work page doi:10.1038/nature16961 2016
[7]

What I learned from competing against a ConvNet on ImageNet,

A. Karpathy, “What I learned from competing against a ConvNet on ImageNet,” http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/, 2014

work page 2014
[8]

Deep convolutional neural network based species recognition for wild animal monitoring,

G. Chen, T. X. Han, Z. He, R. Kays, and T. Forrester, “Deep convolutional neural network based species recognition for wild animal monitoring,” in Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014, pp. 858–862

work page 2014
[9]

Deepdriving: Learning affordance for direct perception in autonomous driving,

C. Chen, A. Seff, A. Kornhauser, and J. Xiao, “Deepdriving: Learning affordance for direct perception in autonomous driving,” in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), ser. ICCV ’15. Washington, DC, USA: IEEE Computer Society, 2015, pp. 2722–2730. [Online]. Available: http://dx.doi.org/10.1109/ICCV.2015.312

work page doi:10.1109/iccv 2015
[10]

Google Cloud Machine Learning Engine,

Google, Inc., “Google Cloud Machine Learning Engine,” https://cloud.google.com/ml-engine/

work page
[11]

Azure Batch AI Training,

Microsoft Corp., “Azure Batch AI Training,” https://batchaitraining.azure.com/

work page
[12]

Deep Learning AMI Amazon Linux Version

Amazon.com, Inc., “Deep Learning AMI Amazon Linux Version.”

work page
[13]

Cloud giants ‘ran out’ of fast GPUs for AI boffins,

K. Quach, “Cloud giants ‘ran out’ of fast GPUs for AI boffins,” https://www.theregister.co.uk/2017/05/22/cloud_providers_ai_researchers/

work page 2017
[14]

Imagenet classification with deep convolutional neural networks,

A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105

work page 2012
[15]

Very deep convolutional networks for large-scale image recognition,

K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2014

work page 2014
[16]

Rethinking the inception architecture for computer vision,

C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” 2015

work page 2015
[17]

Robust physical-world attacks on machine learning models,

I. Evtimov, K. Eykholt, E. Fernandes, T. Kohno, B. Li, A. Prakash, A. Rahmati, and D. Song, “Robust physical-world attacks on machine learning models,” 2017

work page 2017
[18]

Deep learning in neural networks: An overview,

J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural networks, vol. 61, pp. 85–117, 2015

work page 2015
[19]

Training a 3-node neural network is np-complete,

A. Blum and R. L. Rivest, “Training a 3-node neural network is np-complete,” in Advances in neural information processing systems, 1989, pp. 494–501

work page 1989
[20]

A survey on transfer learning,

S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on knowledge and data engineering, vol. 22, no. 10, pp. 1345–1359, 2010

work page 2010
[21]

Domain adaptation for large-scale sentiment classification: A deep learning approach,

X. Glorot, A. Bordes, and Y. Bengio, “Domain adaptation for large-scale sentiment classification: A deep learning approach,” in Proceedings of the 28th international conference on machine learning (ICML-11), 2011, pp. 513–520

work page 2011
[22]

Cnn features off-the-shelf: An astounding baseline for recognition,

A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, “Cnn features off-the-shelf: An astounding baseline for recognition,” in Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, ser. CVPRW ’14. Washington, DC, USA: IEEE Computer Society, 2014, pp. 512–519. [Online]. Available: http://dx.doi.org/10.1109/CVPR...

work page doi:10.1109/cvprw.2014.131 2014
[23]

Correlating Fourier descriptors of local patches for road sign recognition,

F. Larsson, M. Felsberg, and P.-E. Forssen, “Correlating Fourier descriptors of local patches for road sign recognition,” IET Computer Vision, vol. 5, no. 4, pp. 244–254, 2011

work page 2011
[24]

Adversarial machine learning,

L. Huang, A. D. Joseph, B. Nelson, B. I. Rubinstein, and J. D. Tygar, “Adversarial machine learning,” in Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, ser. AISec ’11. New York, NY, USA: ACM, 2011, pp. 43–58. [Online]. Available: http://doi.acm.org/10.1145/2046684.2046692

work page doi:10.1145/2046684.2046692 2011
[25]

Adversarial classification,

N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma, “Adversarial classification,” in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’04. New York, NY, USA: ACM, 2004, pp. 99–108. [Online]. Available: http://doi.acm.org/10.1145/1014052.1014066

work page doi:10.1145/1014052 2004
[26]

Adversarial learning,

D. Lowd and C. Meek, “Adversarial learning,” in Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, ser. KDD ’05. New York, NY, USA: ACM, 2005, pp. 641–647. [Online]. Available: http://doi.acm.org/10.1145/1081870.1081950

work page doi:10.1145/1081870.1081950 2005
[27]

Good word attacks on statistical spam filters

——, “Good word attacks on statistical spam filters.” in Proceedings of the Conference on Email and Anti-Spam (CEAS), 2005

work page 2005
[28]

On Attacking Statistical Spam Filters,

G. L. Wittel and S. F. Wu, “On Attacking Statistical Spam Filters,” in Proceedings of the Conference on Email and Anti-Spam (CEAS), Mountain View, CA, USA, 2004

work page 2004
[29]

Paragraph: Thwarting signature learning by training maliciously,

J. Newsome, B. Karp, and D. Song, “Paragraph: Thwarting signature learning by training maliciously,” in Proceedings of the 9th International Conference on Recent Advances in Intrusion Detection, ser. RAID’06. Berlin, Heidelberg: Springer-Verlag, 2006, pp. 81–105. [Online]. Available: http://dx.doi.org/10.1007/11856214_5

work page doi:10.1007/11856214 2006
[30]

Allergy attack against automatic signature generation,

S. P. Chung and A. K. Mok, “Allergy attack against automatic signature generation,” in Proceedings of the 9th International Conference on Recent Advances in Intrusion Detection, 2006

work page 2006
[31]

Advanced allergy attacks: Does a corpus really help,

——, “Advanced allergy attacks: Does a corpus really help,” in Proceedings of the 10th International Conference on Recent Advances in Intrusion Detection, 2007

work page 2007
[32]

Intriguing properties of neural networks,

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” 2013

work page 2013
[33]

Explaining and harnessing adversarial examples,

I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” 2014

work page 2014
[34]

Practical black-box attacks against machine learning,

N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, “Practical black-box attacks against machine learning,” 2016

work page 2016
[35]

Universal adversarial perturbations,

S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, “Universal adversarial perturbations,” 2016

work page 2016
[36]

Auror: Defending against poisoning attacks in collaborative deep learning systems,

S. Shen, S. Tople, and P. Saxena, “Auror: Defending against poisoning attacks in collaborative deep learning systems,” in Proceedings of the 32nd Annual Conference on Computer Security Applications, ser. ACSAC ’16. New York, NY, USA: ACM, 2016, pp. 508–519. [Online]. Available: http://doi.acm.org/10.1145/2991079.2991125

work page doi:10.1145/2991079.2991125 2016
[37]

Learning algorithms for classification: A comparison on handwritten digit recognition,

Y. LeCun, L. Jackel, L. Bottou, C. Cortes, J. S. Denker, H. Drucker, I. Guyon, U. Muller, E. Sackinger, P. Simard et al., “Learning algorithms for classification: A comparison on handwritten digit recognition,” Neural networks: the statistical mechanics perspective, vol. 261, p. 276, 1995

work page 1995
[38]

Convexified convolutional neural networks,

Y. Zhang, P. Liang, and M. J. Wainwright, “Convexified convolutional neural networks,” arXiv preprint arXiv:1609.01000, 2016

work page arXiv 2016
[39]

Faster r-cnn: Towards real-time object detection with region proposal networks,

S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” in Advances in neural information processing systems, 2015, pp. 91–99

work page 2015
[40]

Traffic sign detection for us roads: Remaining challenges and a case for tracking,

A. Møgelmose, D. Liu, and M. M. Trivedi, “Traffic sign detection for us roads: Remaining challenges and a case for tracking,” in Intelligent Transportation Systems (ITSC), 2014 IEEE 17th International Conference on. IEEE, 2014, pp. 1394–1399

work page 2014
[41]

Decaf: A deep convolutional activation feature for generic visual recognition,

J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell, “Decaf: A deep convolutional activation feature for generic visual recognition,” in International conference on machine learning, 2014, pp. 647–655

work page 2014
[42]

Transfer learning and fine-tuning convolutional neural networks,

A. Karpathy, “Transfer learning and fine-tuning convolutional neural networks,” CS231n Lecture Notes; http://cs231n.github.io/transfer-learning/

work page
[43]

Caffe Model Zoo,

“Caffe Model Zoo,” https://github.com/BVLC/caffe/wiki/Model-Zoo

work page
[44]

Traffic sign detection and recognition using fully convolutional network guided proposals,

Y. Zhu, C. Zhang, D. Zhou, X. Wang, X. Bai, and W. Liu, “Traffic sign detection and recognition using fully convolutional network guided proposals,” Neurocomputing, vol. 214, pp. 758–766, 2016. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S092523121630741X

work page 2016
[45]

Transfer learning - machine learning’s next frontier,

S. Ruder, “Transfer learning - machine learning’s next frontier,” http://ruder.io/transfer-learning/

work page
[46]

A comprehensive guide to fine-tuning deep learning models in Keras,

F. Yu, “A comprehensive guide to fine-tuning deep learning models in Keras,” https://flyyufelix.github.io/2016/10/03/fine-tuning-in-keras-part1.html

work page 2016
[47]

Network in Network Imagenet Model,

“Network in Network Imagenet Model,” https://gist.github.com/mavenlin/d802a5849de39225bcc6

work page
[48]

Caffe models in TensorFlow,

“Caffe models in TensorFlow,” https://github.com/ethereon/caffe-tensorflow

work page
[49]

Caffe to Keras converter,

“Caffe to Keras converter,” https://github.com/qxcv/caffe2keras

work page
[50]

Convert models from Caffe to Theano format,

“Convert models from Caffe to Theano format,” https://github.com/kencoken/caffe-model-convert

work page
[51]

Converting trained models to Core ML,

Apple Inc., “Converting trained models to Core ML,” https://developer.apple.com/documentation/coreml/converting_trained_models_to_core_ml

work page
[52]

Convert Caffe model to Mxnet format,

“Convert Caffe model to Mxnet format,” https://github.com/apache/incubator-mxnet/tree/master/tools/caffe_converter

work page
[53]

caffe2neon,

“caffe2neon,” https://github.com/NervanaSystems/caffe2neon

work page
