Continuous Learning for Android Malware Detection

David Wagner; Yizheng Chen; Zhoujie Ding

arxiv: 2302.04332 · v2 · pith:QIMOFTU4new · submitted 2023-02-08 · 💻 cs.CR · cs.AI

Yizheng Chen, Zhoujie Ding, David Wagner

keywords: learning, malware, android, methods, active, classifier, concept, drift

Machine learning methods can detect Android malware with very high accuracy. However, these classifiers have an Achilles' heel, concept drift: they rapidly become out of date and ineffective as malware apps and benign apps evolve. Our research finds that, after training an Android malware classifier on one year's worth of data, the F1 score quickly dropped from 0.99 to 0.76 after 6 months of deployment on new test samples. In this paper, we propose new methods to combat the concept drift problem of Android malware classifiers. Since machine learning techniques need to be continuously deployed, we use active learning: we select new samples for analysts to label, and then add the labeled samples to the training set to retrain the classifier. Our key idea is that similarity-based uncertainty is more robust against concept drift. Therefore, we combine contrastive learning with active learning. We propose a new hierarchical contrastive learning scheme and a new sample selection technique to continuously train the Android malware classifier. Our evaluation shows that this leads to significant improvements compared to previously published methods for active learning. Our approach reduces the false negative rate from 14% (for the best baseline) to 9%, while also reducing the false positive rate (from 0.86% to 0.48%). Our approach also maintains more consistent performance across a seven-year time period than past methods.
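
The active learning loop described in the abstract (select samples, have analysts label them, add them to the training set) can be illustrated with a minimal sketch. This is not the authors' method: the uncertainty score here is simply the Euclidean distance from a sample's embedding to its nearest labeled embedding, a stand-in for the paper's similarity-based uncertainty. The function names, the `budget` parameter, and the `oracle` callback (representing the analyst) are all hypothetical, and the embeddings are assumed to come from some already-trained encoder.

```python
import math


def nearest_distance(x, labeled):
    """Euclidean distance from embedding x to its closest labeled embedding.

    `labeled` is a list of (embedding, label) pairs. A large distance is used
    here as an assumed proxy for "the classifier has seen nothing similar".
    """
    return min(math.dist(x, emb) for emb, _ in labeled)


def select_for_labeling(unlabeled, labeled, budget):
    """Return indices of the `budget` unlabeled samples farthest from any labeled one."""
    ranked = sorted(
        range(len(unlabeled)),
        key=lambda i: nearest_distance(unlabeled[i], labeled),
        reverse=True,  # most dissimilar (most uncertain) first
    )
    return ranked[:budget]


def active_learning_round(unlabeled, labeled, budget, oracle):
    """One round: pick samples, ask the oracle (analyst) for labels, grow the training set.

    Returns the enlarged labeled set and the samples that remain unlabeled.
    Retraining the classifier on the enlarged set is left to the caller.
    """
    picked = select_for_labeling(unlabeled, labeled, budget)
    picked_set = set(picked)
    new_labeled = labeled + [(unlabeled[i], oracle(unlabeled[i])) for i in picked]
    remaining = [x for i, x in enumerate(unlabeled) if i not in picked_set]
    return new_labeled, remaining
```

In a deployment along the lines the abstract describes, a round like this would run periodically on newly observed apps, followed by retraining on the enlarged labeled set. The labeling budget per round is a design choice this sketch leaves open.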

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Unraveling the Key of Machine Learning-based Android Malware Detection
cs.CR · 2024-02 · unverdicted · novelty 6.0

A taxonomy and re-implementation study of 12 ML Android malware detectors finds persistent vulnerabilities to malware evolution and adversarial attacks due to insufficient capture of malware semantics.