DeepXplore: Automated Whitebox Testing of Deep Learning Systems

Junfeng Yang; Kexin Pei; Suman Jana; Yinzhi Cao

arXiv: 1705.06640 · v4 · pith:SBYGHHXDnew · submitted 2017-05-18 · cs.LG · cs.CR · cs.SE

keywords: deepxplore, inputs, systems, behaviors, self-driving, test, testing, behavior

Abstract

Deep learning (DL) systems are increasingly deployed in safety- and security-critical domains including self-driving cars and malware detection, where the correctness and predictability of a system's behavior for corner case inputs are of great importance. Existing DL testing depends heavily on manually labeled data and therefore often fails to expose erroneous behaviors for rare inputs. We design, implement, and evaluate DeepXplore, the first whitebox framework for systematically testing real-world DL systems. First, we introduce neuron coverage for systematically measuring the parts of a DL system exercised by test inputs. Next, we leverage multiple DL systems with similar functionality as cross-referencing oracles to avoid manual checking. Finally, we demonstrate how finding inputs for DL systems that both trigger many differential behaviors and achieve high neuron coverage can be represented as a joint optimization problem and solved efficiently using gradient-based search techniques. DeepXplore efficiently finds thousands of incorrect corner case behaviors (e.g., self-driving cars crashing into guard rails and malware masquerading as benign software) in state-of-the-art DL models with thousands of neurons trained on five popular datasets including ImageNet and Udacity self-driving challenge data. For all tested DL models, on average, DeepXplore generated one test input demonstrating incorrect behavior within one second while running only on a commodity laptop. We further show that the test inputs generated by DeepXplore can also be used to retrain the corresponding DL model to improve the model's accuracy by up to 3%.
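The abstract names three mechanisms: neuron coverage, multiple similar models used as cross-referencing (differential) oracles, and a gradient-based search over a joint objective that rewards both disagreement and coverage. The Python sketch below illustrates all three on toy data. It is not DeepXplore's implementation. The following are all assumptions made to keep the example small and self-contained: the helper names (`neuron_coverage`, `disagreements`, `joint_objective`, `search`), the hypothetical linear softmax models, the raw-activation threshold (no per-layer scaling), the weights `lam1`/`lam2`, and the use of finite-difference gradients in place of backpropagation.

```python
import math


def neuron_coverage(layer_activations, threshold=0.0):
    """Fraction of neurons whose activation exceeds `threshold` on at least one input.

    layer_activations: one list per layer; rows are inputs, columns are neurons.
    Assumption: raw activations are compared directly, without per-layer scaling.
    """
    covered = total = 0
    for acts in layer_activations:
        num_neurons = len(acts[0])
        covered += sum(any(row[n] > threshold for row in acts) for n in range(num_neurons))
        total += num_neurons
    return covered / total


def disagreements(predictions):
    """Indices of inputs on which the models' labels are not all equal.

    predictions[m][i] is the label model m assigns to input i.
    """
    return [i for i, labels in enumerate(zip(*predictions)) if len(set(labels)) > 1]


def softmax_model(W):
    """Hypothetical toy model: class probabilities softmax(W @ x)."""
    def predict(x):
        z = [sum(w * xi for w, xi in zip(row, x)) for row in W]
        top = max(z)
        e = [math.exp(v - top) for v in z]
        s = sum(e)
        return [v / s for v in e]
    return predict


def argmax(p):
    return max(range(len(p)), key=p.__getitem__)


def joint_objective(x, models, j, c, neuron_fn, lam1=1.0, lam2=0.1):
    """Keep every model except j on class c, push model j away from c,
    and reward activating a chosen (hypothetical) neuron via neuron_fn."""
    probs = [m(x)[c] for m in models]
    others = sum(p for k, p in enumerate(probs) if k != j)
    return others - lam1 * probs[j] + lam2 * neuron_fn(x)


def numerical_grad(f, x, eps=1e-5):
    """Central finite-difference gradient (a stand-in for backpropagation)."""
    g = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        g.append((f(up) - f(down)) / (2 * eps))
    return g


def search(x, models, j, c, neuron_fn, step=1.0, max_iters=50):
    """Gradient ascent on the joint objective until the models disagree."""
    x = [float(v) for v in x]
    obj = lambda z: joint_objective(z, models, j, c, neuron_fn)
    for _ in range(max_iters):
        if len({argmax(m(x)) for m in models}) > 1:
            return x  # differential behavior found
        g = numerical_grad(obj, x)
        x = [xi + step * gi for xi, gi in zip(x, g)]
    return None


if __name__ == "__main__":
    acts = [[[0.0, 0.5, 0.0], [0.0, 0.0, 0.2]], [[1.0, 0.0], [0.0, 0.0]]]
    print(neuron_coverage(acts))  # 3 of 5 neurons covered -> 0.6
    print(disagreements([[1, 2, 3], [1, 2, 0], [1, 5, 3]]))  # [1, 2]

    models = [softmax_model([[1, 0], [0, 1]]), softmax_model([[1, 0], [0, 1.5]])]
    neuron = lambda x: max(x[0] - 2.0, 0.0)  # hypothetical neuron, inactive near the start
    x_new = search([1.0, 0.5], models, j=1, c=0, neuron_fn=neuron)
    print(x_new, [argmax(m(x_new)) for m in models])
```

Both toy models initially predict class 0 for `[1.0, 0.5]`. The search nudges the input until the second model flips to class 1 while the first still predicts class 0, which is the kind of differential behavior the abstract describes. Finite differences cost two forward passes per input dimension, so they only suit toy inputs like this one. Image-sized inputs would call for the models' own gradients via automatic differentiation.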
