Adversarial Vulnerability Under Temporal Concept Drift: A Longitudinal Study of Android Malware Detection

Ahmed Sabbah; David Mohaisen; Mohammed Kharma; Radi Jarrar; Samer Zein

arXiv: 2605.23623 · v1 · pith:PRPCCFMInew · submitted 2026-05-22 · 💻 cs.CR · cs.AI · cs.LG

Pith reviewed 2026-05-25 04:15 UTC · model grok-4.3

keywords: Android malware detection, adversarial robustness, temporal concept drift, longitudinal study, FGSM, SPSA, distribution shift, machine learning security

The pith

As the time gap between training and testing data grows, Android malware detectors lose both accuracy and resistance to adversarial attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how temporal changes in Android app data affect the robustness of machine learning models against adversarial examples over more than a decade. It organizes apps into yearly slices and evaluates models under three protocols that mimic real deployment: same-year training and testing, cross-year use without updates, and expanding-window retraining with all past data. The study generates attacks with FGSM and SPSA on static and dynamic features, then tracks clean accuracy, adversarial accuracy, and attack success alongside new metrics for drift effects. Results indicate that bigger temporal separations coincide with falling clean and adversarial accuracy and rising attack success in some cases. Expanding-window retraining lessens but does not remove the robustness decline under continued data evolution.
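
The three protocols can be sketched as split generators over yearly slices. This is a minimal illustration rather than the authors' code: the function names, the example years, and the choice to test each expanding-window model on the next year's slice are assumptions made here, and the same-year case presumes a held-out split inside each slice.

```python
from typing import List, Tuple

# A split is (training years, test year). All names here are hypothetical.
Split = Tuple[List[int], int]


def same_year(years: List[int]) -> List[Split]:
    """Train and test inside one yearly slice (held-out split within the year)."""
    return [([y], y) for y in years]


def cross_year(years: List[int]) -> List[Split]:
    """Train once on year t, then test on every later year without updates."""
    return [([t], s) for i, t in enumerate(years) for s in years[i + 1:]]


def expanding_window(years: List[int]) -> List[Split]:
    """Retrain on all slices up to year t, then test on the following slice.

    Testing on the *next* slice is an assumption of this sketch.
    """
    return [(years[: i + 1], years[i + 1]) for i in range(len(years) - 1)]


def temporal_gap(split: Split) -> int:
    """Years between the most recent training slice and the test slice."""
    train_years, test_year = split
    return test_year - max(train_years)


if __name__ == "__main__":
    years = [2019, 2020, 2021, 2022]  # placeholder slice labels
    protocols = [
        ("same-year", same_year),
        ("cross-year", cross_year),
        ("expanding-window", expanding_window),
    ]
    for name, make_splits in protocols:
        for split in make_splits(years):
            print(name, split, "gap =", temporal_gap(split))
```

Under this framing, only the cross-year protocol produces gaps larger than one year, which is why it is the natural place to look for the drift effect the paper reports.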

Core claim

The central discovery is that temporal separation between training and test data is associated with reduced adversarial robustness in Android malware detection under transfer-based feature-space attacks. As the train-test gap increases, both clean accuracy and adversarial accuracy decline while attack success rates show configuration-dependent increases, especially with FGSM perturbations on static features. Expanding-window retraining mitigates but does not eliminate the robustness loss under ongoing distributional evolution.
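
To make the attack setting concrete, here is a minimal sketch of one FGSM step in feature space against a linear surrogate model. In a transfer-based setup, the perturbed vector would then be scored by a separate target detector. The logistic surrogate, the clipping range [0, 1] standing in for the feasibility constraints, and every function name are illustrative assumptions; this summary does not specify the paper's actual models or constraints.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """Probability that feature vector x is malware under a linear surrogate."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v: float) -> float:
    return float((v > 0) - (v < 0))


def fgsm(w: List[float], b: float, x: List[float], y: int, eps: float,
         lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """One FGSM step: x + eps * sign(grad_x loss), clipped to [lo, hi].

    For logistic loss with label y in {0, 1}, grad_x loss = (p - y) * w.
    The clip is a stand-in for feasibility constraints (an assumption).
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [min(hi, max(lo, xi + eps * sign(g))) for xi, g in zip(x, grad)]


if __name__ == "__main__":
    # Toy surrogate and sample; the numbers are made up for illustration.
    w, b = [2.0, -1.0, 0.5], -0.5
    x, y = [1.0, 0.0, 1.0], 1  # a malware sample the surrogate detects
    x_adv = fgsm(w, b, x, y, eps=0.6)
    print("clean score:", predict_proba(w, b, x))
    print("adversarial score:", predict_proba(w, b, x_adv))
```

The step moves each feature in the direction that increases the surrogate's loss, so a detected malware sample is pushed toward the benign side of the decision boundary.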

What carries the argument

The three deployment protocols—same-year training/testing, cross-year deployment without updates, and expanding-window retraining—combined with temporal linkage metrics (RobustDrop, ΔASR, and Adversarial Amplification Factor) to link distribution shift to robustness degradation.
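
The metrics named above can be illustrated with a short sketch. Adversarial accuracy and attack success rate follow common usage, but the formulas for RobustDrop, ΔASR, and AAF below are guesses made for illustration only: this page does not reproduce the paper's definitions, so the three linkage functions should be read as hypothetical placeholders.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of correct predictions. On adversarial inputs this is AA."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def attack_success_rate(y_true: Sequence[int], clean_pred: Sequence[int],
                        adv_pred: Sequence[int]) -> float:
    """Share of correctly classified samples that the attack turns into errors."""
    kept = [(t, a) for t, c, a in zip(y_true, clean_pred, adv_pred) if c == t]
    if not kept:
        return 0.0
    return sum(a != t for t, a in kept) / len(kept)


# --- Hypothetical linkage metrics (assumed forms, not the paper's) ---

def robust_drop(aa_ref: float, aa_gap: float) -> float:
    """Assumed: loss of adversarial accuracy relative to a zero-gap reference."""
    return aa_ref - aa_gap


def delta_asr(asr_ref: float, asr_gap: float) -> float:
    """Assumed: change in attack success rate relative to the reference."""
    return asr_gap - asr_ref


def amplification_factor(clean_ref: float, aa_ref: float,
                         clean_gap: float, aa_gap: float) -> float:
    """Assumed: adversarial loss under drift divided by the loss without drift."""
    ref_loss = clean_ref - aa_ref
    if ref_loss == 0:
        raise ValueError("reference model shows no adversarial loss")
    return (clean_gap - aa_gap) / ref_loss
```

Under these assumed forms, a value of the amplification factor above 1 would mean the attack costs more accuracy on drifted data than on same-period data.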

Load-bearing premise

The three deployment protocols accurately emulate realistic learning scenarios in Android malware detection.

What would settle it

Finding that adversarial accuracy stays stable or rises as the year gap between training and test data widens would falsify the reported association between temporal separation and robustness loss.
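
One way to operationalize this test is a rank correlation between the train-test gap and adversarial accuracy. The sketch below uses Spearman's rho, which is a choice made here and not a method taken from the paper; the function names are hypothetical. A clearly negative rho is consistent with the reported association, while a rho near zero or above it would count against it.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Made-up numbers for illustration only, not results from the paper.
    gaps = [0, 1, 2, 3, 4]
    adversarial_accuracy = [0.80, 0.70, 0.65, 0.50, 0.40]
    print("rho =", spearman(gaps, adversarial_accuracy))
```

A rank-based statistic is used because the claim is about monotone decline, not about a linear relationship between gap and accuracy.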

Figures

Figures reproduced from arXiv: 2605.23623 by Ahmed Sabbah, David Mohaisen, Mohammed Kharma, Radi Jarrar, Samer Zein.

**Figure 2.** Temporal drift evaluation pipeline. Applications are executed on emulators and ...

**Figure 3.** Clean performance over time under the cross-year protocol for the real device ...

**Figure 4.** Clean performance over time under the expanding-window protocol for the ...

**Figure 5.** Adversarial Accuracy (AA) for cross-year evaluation on the emulator dataset ...

**Figure 6.** Adversarial Accuracy (AA) for expanding-window models on the real device, ...

**Figure 7.** Attack Success Rate (ASR) for cross-year models on the real device, using static ...

**Figure 8.** Attack Success Rate (ASR) for expanding-window (incremental) models on the ...

**Figure 9.** Linkage metrics for the emulator device. Each 2 ...

**Figure 10.** Linkage metrics for the real device. The panels summarize how RobustDrop, ...

Original abstract

We present a longitudinal, drift-aware evaluation of adversarial robustness across more than a decade of Android applications using static and dynamic feature representations extracted from emulator and real-device executions. The dataset is organized into yearly slices and evaluated under three deployment protocols that emulate realistic learning scenarios: (1) same-year training and testing, (2) cross-year deployment without model updates, and (3) expanding-window retraining with cumulative historical data. Across multiple classifier families, adversarial examples are generated using FGSM and SPSA under feasibility constraints. We measure clean performance, Adversarial Accuracy (AA), Attack Success Rate (ASR), and introduce temporal linkage metrics -- RobustDrop, $\Delta$ASR, and Adversarial Amplification Factor (AAF) -- to quantify the relationship between distribution shift and robustness degradation.nResults show that temporal separation is associated with reduced adversarial robustness under the evaluated transfer-based feature-space setting. As the train-test gap increases, clean accuracy and adversarial accuracy decline, while attack success exhibits configuration-dependent increases, particularly under FGSM perturbations and static features. Expanding-window retraining mitigates, but does not eliminate, robustness loss under continued distributional evolution. These findings indicate that temporal drift should be considered when assessing the long-term robustness of intelligent detection systems under evolving data distributions and highlight the need for drift-aware robustness assessment frameworks in long-lived adversarial environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper links larger train-test time gaps to lower adversarial robustness in Android malware detectors under feature-space attacks, but potential drift in the feature extraction pipeline over 10+ years weakens clean attribution to concept drift alone.

The main takeaway: this longitudinal study finds that robustness drops as the train-test time gap increases, measured under three protocols and with new metrics such as RobustDrop and AAF. The novelty lies in organizing the data into yearly slices and testing same-year, cross-year, and expanding-window setups while tracking how clean accuracy, adversarial accuracy, and attack success rate change. The paper does a reasonable job of applying FGSM and SPSA to both static and dynamic features across classifier families, and of showing that retraining on cumulative data reduces but does not remove the robustness loss. The temporal metrics give a concrete way to measure the link between distribution shift and degradation.

The soft spot is the execution environment. The dataset spans more than a decade, with features drawn from emulator and real-device runs. If emulator versions, API levels, or instrumentation were updated yearly to match the apps, then some of the accuracy drops and ASR rises could stem from shifts in the measurement pipeline rather than from the malware distribution alone. Static features are less exposed to this, yet the paper still reports configuration-dependent ASR increases there. The three protocols emulate realistic deployment, but they cannot isolate concept drift if the extractor itself drifts.

This paper is for researchers working on long-lived ML security systems, particularly Android malware detection. It deserves a serious referee: the longitudinal design and practical protocols address a real gap, even if the causal story needs tighter controls on the feature pipeline.

Referee Report

2 major / 2 minor

Summary. The paper claims that temporal concept drift in Android malware detection over >10 years leads to reduced adversarial robustness: as the train-test temporal gap increases under three protocols (same-year, cross-year without updates, expanding-window retraining), clean accuracy and adversarial accuracy decline while ASR rises in a configuration-dependent manner (especially FGSM on static features), quantified via new metrics RobustDrop, ΔASR, and AAF. Expanding-window retraining mitigates but does not eliminate the effect. The evaluation uses static/dynamic features from emulator/real-device runs and transfer-based FGSM/SPSA attacks.
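
For readers less familiar with SPSA, the sketch below shows the core idea: the gradient is estimated from loss queries alone, by perturbing all features at once along random ±1 directions, which is what makes it usable without access to model gradients. The step size, sample count, clipping range, and function names are assumptions chosen for illustration and do not reflect the paper's configuration.

```python
import random
from typing import Callable, List, Optional

Loss = Callable[[List[float]], float]


def spsa_gradient(loss: Loss, x: List[float], c: float = 0.01,
                  samples: int = 32,
                  rng: Optional[random.Random] = None) -> List[float]:
    """Average SPSA estimate of grad loss(x) using only loss evaluations."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    grad = [0.0] * len(x)
    for _ in range(samples):
        # Rademacher direction: every coordinate is perturbed simultaneously.
        delta = [rng.choice((-1.0, 1.0)) for _ in x]
        plus = loss([xi + c * d for xi, d in zip(x, delta)])
        minus = loss([xi - c * d for xi, d in zip(x, delta)])
        scale = (plus - minus) / (2.0 * c)
        for i, d in enumerate(delta):
            grad[i] += scale / d / samples
    return grad


def spsa_step(loss: Loss, x: List[float], eps: float,
              lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """One signed ascent step on the estimated gradient, clipped to [lo, hi]."""
    grad = spsa_gradient(loss, x)
    step = [eps * ((g > 0) - (g < 0)) for g in grad]
    return [min(hi, max(lo, xi + s)) for xi, s in zip(x, step)]


if __name__ == "__main__":
    # Toy loss standing in for a detector's loss on one sample.
    def toy_loss(v: List[float]) -> float:
        return 3.0 * v[0] - 2.0 * v[1]

    print(spsa_gradient(toy_loss, [0.5, 0.5]))
    print(spsa_step(toy_loss, [0.5, 0.5], eps=0.1))
```

Each estimate costs two loss queries regardless of the number of features, at the price of noise that shrinks as more random directions are averaged.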

Significance. If the central attribution to temporal drift holds, the work is significant as a rare longitudinal empirical study spanning a decade of real-world data with multiple protocols and feature types. It provides concrete evidence that drift-aware robustness assessment is needed for long-lived adversarial ML systems in security. Credit is due for the scale of the dataset and the attempt to emulate realistic deployment via the three protocols; however, the ad-hoc nature of the invented metrics (RobustDrop, ΔASR, AAF) limits immediate impact without further validation.

major comments (2)

[Dataset and feature extraction] Dataset and feature extraction description (abstract and methods): the paper does not specify whether emulator/OS versions, API levels, or instrumentation are held constant across the >10-year span or updated yearly to match app vintages. If the latter (common for realism), observed drops in accuracy/robustness and rises in ASR could be partly artifacts of a drifting measurement pipeline rather than malware distribution shift alone; this directly undermines the central claim that temporal separation causes reduced robustness, as static features are also reported to show configuration-dependent ASR increases.
[Abstract and deployment protocols] Abstract and results on the three protocols: the claim that expanding-window retraining 'mitigates, but does not eliminate, robustness loss' and that the protocols 'emulate realistic learning scenarios' is load-bearing for the practical implications, yet no validation or comparison to actual industry deployment practices is provided. Without this, the mitigation findings cannot be confidently generalized beyond the specific experimental setup.

minor comments (2)

[Abstract] Abstract: 'nResults show' is a typographical error and should read 'Results show'.
[Metrics definition] The new metrics (RobustDrop, ΔASR, AAF) are introduced without explicit mathematical definitions or comparison to standard measures; this should be added for reproducibility even if they remain ad-hoc.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments on our manuscript. We provide detailed responses to each major comment below.

Referee: [Dataset and feature extraction] Dataset and feature extraction description (abstract and methods): the paper does not specify whether emulator/OS versions, API levels, or instrumentation are held constant across the >10-year span or updated yearly to match app vintages. If the latter (common for realism), observed drops in accuracy/robustness and rises in ASR could be partly artifacts of a drifting measurement pipeline rather than malware distribution shift alone; this directly undermines the central claim that temporal separation causes reduced robustness, as static features are also reported to show configuration-dependent ASR increases.

Authors: The referee correctly notes that the paper does not specify the details of the feature extraction pipeline across years. To achieve a realistic longitudinal study, the extraction process was updated yearly to align with contemporary Android API levels and emulator versions for each slice. This is standard practice for such studies to avoid artificial constraints. While this introduces a potential confounding factor, the central claim focuses on the impact of temporal separation in data, which includes both app evolution and the necessary adaptation of the detection environment. We will add a detailed description of the pipeline in the methods section and discuss the implications for interpreting the results.

revision: partial

Referee: [Abstract and deployment protocols] Abstract and results on the three protocols: the claim that expanding-window retraining 'mitigates, but does not eliminate, robustness loss' and that the protocols 'emulate realistic learning scenarios' is load-bearing for the practical implications, yet no validation or comparison to actual industry deployment practices is provided. Without this, the mitigation findings cannot be confidently generalized beyond the specific experimental setup.

Authors: We acknowledge that the manuscript lacks explicit validation or comparison to industry deployment practices. The protocols are motivated by standard approaches in handling temporal drift in machine learning for security. We will revise the abstract to qualify the claims about emulation of realistic scenarios and add a section in the discussion addressing the limitations regarding generalization to industry settings.

revision: yes

Circularity Check

0 steps flagged

No circularity: empirical metrics defined directly from observed performance differences

The paper is a longitudinal empirical study that reports clean accuracy, adversarial accuracy, ASR, and newly introduced metrics (RobustDrop, ΔASR, AAF) computed from measured performance under three explicit deployment protocols on yearly data slices. No equations, derivations, or fitted parameters are presented whose outputs reduce to the inputs by construction. The central claim is an observed association between temporal gap and robustness degradation; the protocols are defined operationally rather than derived. Self-citations are absent from the provided text, and no uniqueness theorems or ansatzes are invoked. This matches the default expectation for non-circular empirical work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper rests on the domain assumption that its three protocols emulate realistic scenarios and introduces three new metrics to quantify temporal effects; no free parameters or postulated physical entities are visible from the abstract.

axioms (1)

domain assumption: The three deployment protocols emulate realistic learning scenarios
Stated explicitly in the abstract as the basis for the evaluation design.

invented entities (1)

RobustDrop, ΔASR, and Adversarial Amplification Factor (AAF) · no independent evidence
purpose: Quantify the relationship between distribution shift and robustness degradation
New metrics defined in the abstract to capture temporal linkage effects.



work page doi:10.1109/access.2020.3002842 2020

[14] [14]

Alzubaidi, Recent advances in android mobile malware detection: A systematic literature review, IEEE Access 9 (2021) 146318–146349

A. Alzubaidi, Recent advances in android mobile malware detection: A systematic literature review, IEEE Access 9 (2021) 146318–146349. doi:10.1109/ACCESS.2021.3123187. 37

work page doi:10.1109/access.2021.3123187 2021

[15] [15]

M. Li, Z. Fang, J. Wang, L. Cheng, Q. Zeng, T. Yang, Y. Wu, J. Geng, A systematic overview of android malware detection, Appl. Artif. Intell. 36 (1) (2022). doi:10.1080/08839514.2021.2007327

work page doi:10.1080/08839514.2021.2007327 2022

[16] [16]

Guerra-Manzanares, H

A. Guerra-Manzanares, H. Bahsi, S. Nõmm, Kronodroid: Time- based hybrid-featured dataset for effective android malware de- tection and characterization, Comput. Secur. 110 (2021) 102399. doi:10.1016/J.COSE.2021.102399. URL https://doi.org/10.1016/j.cose.2021.102399

work page doi:10.1016/j.cose.2021.102399 2021

[17] [17]

Guerra-Manzanares, M

A. Guerra-Manzanares, M. Luckner, H. Bahsi, Android malware concept drift using system calls: Detection, characterization and challenges, Ex- pert Syst. Appl. 206 (2022) 117200. doi:10.1016/J.ESWA.2022.117200

work page doi:10.1016/j.eswa.2022.117200 2022

[18] [18]

Guerra-Manzanares, H

A. Guerra-Manzanares, H. Bahsi, On the relativity of time: Im- plications and challenges of data drift on long-term effective android malware detection, Comput. Secur. 122 (2022) 102835. doi:10.1016/J.COSE.2022.102835

work page doi:10.1016/j.cose.2022.102835 2022

[19] [19]

Pendlebury, F

F. Pendlebury, F. Pierazzi, R. Jordaney, J. Kinder, L. Cavallaro, TESSERACT: eliminating experimental bias in malware classification across space and time, in: N. Heninger, P. Traynor (Eds.), USENIX, 2019, pp. 729–746

work page 2019

[20] [20]

Spinning Language Models: Risks of Propaganda-As-A-Service and Countermeasures , url=

F. Barbero, F. Pendlebury, F. Pierazzi, L. Cavallaro, Tran- scending TRANSCEND: revisiting malware classification in the presence of concept drift, in: SP, IEEE, 2022, pp. 805–823. doi:10.1109/SP46214.2022.9833659

work page doi:10.1109/sp46214.2022.9833659 2022

[21] [21]

I. J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing ad- versarial examples, in: Y. Bengio, Y. LeCun (Eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015

work page 2015

[22] [22]

Madry, A

A. Madry, A. Makelov, L. Schmidt, D. Tsipras, A. Vladu, Towards deep learning models resistant to adversarial attacks, in: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, Open- Review.net, 2018. 38

work page 2018

[23] [23]

Towards Evaluating the Robustness of Neural Networks

N. Carlini, D. A. Wagner, Towards evaluating the robustness of neural networks, in: 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017, IEEE Computer Society, 2017, pp. 39–57. doi:10.1109/SP.2017.49

work page doi:10.1109/sp.2017.49 2017

[24] [24]

Pierazzi, F

F. Pierazzi, F. Pendlebury, J. Cortellazzi, L. Cavallaro, Intriguing properties of adversarial ML attacks in the problem space, in: 2020 IEEE Symposium on Security and Privacy, SP 2020, San Fran- cisco, CA, USA, May 18-21, 2020, IEEE, 2020, pp. 1332–1349. doi:10.1109/SP40000.2020.00073

work page doi:10.1109/sp40000.2020.00073 2020

[25] [25]

Bostani, V

H. Bostani, V. Moonsamy, Evadedroid: A practical evasion attack on machine learning for black-box android malware detection, Comput. Se- cur. 139 (2024) 103676. doi:10.1016/J.COSE.2023.103676

work page doi:10.1016/j.cose.2023.103676 2024

[26] [26]

J. C. Schlimmer, R. H. Granger, Incremental learning from noisy data, Mach. Learn. 1 (3) (1986) 317–354. doi:10.1023/A:1022810614389

work page doi:10.1023/a:1022810614389 1986

[27] [27]

Xiang, L

Q. Xiang, L. Zi, X. Cong, Y. Wang, Concept drift adaptation meth- ods under the deep learning framework: A literature review, Applied Sciences 13 (11) (2023) 6515

work page 2023

[28] [28]

Ceschin, M

F. Ceschin, M. Botacin, H. M. Gomes, F. A. Pinage, L. S. Oliveira, A. Grégio, Fast & furious: On the modelling of malware detection as an evolving data stream, Expert Syst. Appl. 212 (2023) 118590. doi:10.1016/J.ESWA.2022.118590. URL https://doi.org/10.1016/j.eswa.2022.118590

work page doi:10.1016/j.eswa.2022.118590 2023

[29] [29]

Tripathi, H

J. Tripathi, H. M. Gomes, M. Botacin, Towards explainable drift de- tection and early retrain in ml-based malware detection pipelines, in: M. Egele, V. Moonsamy, D. Gruss, M. Carminati (Eds.), Detection of Intrusions and Malware, and Vulnerability Assessment - 22nd Interna- tional Conference, DIMVA 2025, Graz, Austria, July 9-11, 2025, Pro- ceedings, Part...

work page doi:10.1007/978-3-031-97623-0_1 2025

[30] [30]

D. Hu, Z. Ma, X. Zhang, P. Li, D. Ye, B. Ling, The concept drift prob- lem in android malware detection and its solution, Secur. Commun. Net- works 2017 (2017) 4956386:1–4956386:13. doi:10.1155/2017/4956386. 39

work page doi:10.1155/2017/4956386 2017

[31] [31]

Z. Chen, Z. Zhang, Z. Kan, L. Yang, J. Cortellazzi, F. Pendlebury, F. Pierazzi, L. Cavallaro, G. Wang, Is it overkill? analyzing feature- space concept drift in malware detectors, in: IEEE, IEEE, 2023, pp. 21–28. doi:10.1109/SPW59333.2023.00007

work page doi:10.1109/spw59333.2023.00007 2023

[32] [32]

Guerra-Manzanares, M

A. Guerra-Manzanares, M. Luckner, H. Bahsi, Corrigendum to concept drift and cross-device behavior: Challenges and implications for effective android malware detection computers & security, volume 120, 102757, Comput. Secur. 124 (2023) 102998. doi:10.1016/J.COSE.2022.102998

work page doi:10.1016/j.cose.2022.102998 2023

[33] [33]

T. Chow, Z. Kan, L. Linhardt, L. Cavallaro, D. Arp, F. Pierazzi, Drift forensics of malware classifiers, in: M. Pintor, X. Chen, F. Tramèr (Eds.), ACM, ACM, 2023, pp. 197–207. doi:10.1145/3605764.3623918

work page doi:10.1145/3605764.3623918 2023

[34] [34]

Abusnaina, Y

A. Abusnaina, Y. Wang, S. S. Arora, K. Wang, M. Christodorescu, D. Mohaisen, Burning the adversarial bridges: Robust windows malware detection against binary-level mutations, CoRR abs/2310.03285 (2023). arXiv:2310.03285, doi:10.48550/ARXIV.2310.03285

work page doi:10.48550/arxiv.2310.03285 2023

[35] [35]

Abusnaina, A

A. Abusnaina, A. Anwar, S. Alshamrani, A. Alabduljabbar, R. Jang, D. Nyang, D. Mohaisen, Systematically evaluating the robustness of ml-based iot malware detection systems, in: RAID, ACM, 2022, pp. 308–320

work page 2022

[36] [36]

Hinder, V

F. Hinder, V. Vaquet, B. Hammer, Adversarial attacks for drift detection, CoRR abs/2411.16591 (2024). arXiv:2411.16591, doi:10.48550/ARXIV.2411.16591

work page doi:10.48550/arxiv.2411.16591 2024

[37] [37]

Faruki, R

P. Faruki, R. Bhan, V. Jain, S. Bhatia, N. E. Madhoun, R. Pa- mula, A survey and evaluation of android-based malware eva- sion techniques and detection frameworks, Inf. 14 (7) (2023) 374. doi:10.3390/INFO14070374

work page doi:10.3390/info14070374 2023

[38] [38]

T. S. Sethi, M. M. Kantardzic, Handling adversarial concept drift in streaming data, Expert Syst. Appl. 97 (2018) 18–40. doi:10.1016/J.ESWA.2017.12.022

work page doi:10.1016/j.eswa.2017.12.022 2018

[39] [39]

Korycki, B

L. Korycki, B. Krawczyk, Adversarial concept drift detection under poi- soning attacks for robust data stream mining, Mach. Learn. 112 (10) (2023) 4013–4048. doi:10.1007/S10994-022-06177-W. 40

work page doi:10.1007/s10994-022-06177-w 2023

[40] [40]

P. Chen, H. Zhang, Y. Sharma, J. Yi, C. Hsieh, ZOO: zeroth order opti- mization based black-box attacks to deep neural networks without train- ing substitute models, in: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, AISec@CCS 2017, Dallas, TX, USA, November3, 2017, ACM,2017, pp.15–26. doi:10.1145/3128572.3140448

work page doi:10.1145/3128572.3140448 2017

[41] [41]

Rosenberg, A

I. Rosenberg, A. Shabtai, Y. Elovici, L. Rokach, Query-efficient black- box attack against sequence-based malware classifiers, in: ACSAC ’20: Annual Computer Security Applications Conference, Virtual Event / Austin, TX, USA, 7-11 December, 2020, ACM, 2020, pp. 611–626. doi:10.1145/3427228.3427230

work page doi:10.1145/3427228.3427230 2020

[42] [42]

Yuste, E

J. Yuste, E. G. Pardo, J. Tapiador, Optimization of code caves in mal- ware binaries to evade machine learning detectors, Comput. Secur. 116 (2022) 102643. doi:10.1016/J.COSE.2022.102643

work page doi:10.1016/j.cose.2022.102643 2022

[43] [43]

H. S. Anderson, A. Kharkar, B. Filar, D. Evans, P. Roth, Learning to evade static PE machine learning malware models via reinforcement learning, CoRR abs/1801.08917 (2018). arXiv:1801.08917

work page internal anchor Pith review Pith/arXiv arXiv 2018

[44] [44]

W. Hu, Y. Tan, Generating adversarial malware examples for black-box attacks based on GAN, in: Data Mining and Big Data - 7th Interna- tional Conference, DMBD 2022, Beijing, China, November 21-24, 2022, Proceedings, Part II, Vol. 1745 of Communications in Computer and Information Science, Springer, 2022, pp. 409–423. doi:10.1007/978-981- 19-8991-9_29

work page doi:10.1007/978-981- 2022

[45] [45]

Apruzzese, A

G. Apruzzese, A. Fass, F. Pierazzi, When adversarial perturbations meet concept drift: An exploratory analysis on ML-NIDS, in: AISec 2024, Salt Lake City, UT, USA, October 14-18, 2024, ACM, 2024, pp. 149–

work page 2024

[46] [46]

URL https://doi.org/10.1145/3689932.3694757

doi:10.1145/3689932.3694757. URL https://doi.org/10.1145/3689932.3694757

work page doi:10.1145/3689932.3694757

[47] [47]

In: Proceedings of the 2017 ACM on Asia Con- ference on Computer and Communications Security

N. Papernot, P. D. McDaniel, I. J. Goodfellow, S. Jha, Z. B. Ce- lik, A. Swami, Practical black-box attacks against machine learn- ing, in: Proceedings of the 2017 ACM on Asia Conference on Com- puter and Communications Security, AsiaCCS 2017, Abu Dhabi, United Arab Emirates, April 2-6, 2017, ACM, 2017, pp. 506–519. doi:10.1145/3052973.3053009. URL https:...

work page doi:10.1145/3052973.3053009 2017

[48] [48]

Y. Liu, X. Chen, C. Liu, D. Song, Delving into transferable adversar- ial examples and black-box attacks, in: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, OpenReview.net, 2017. URL https://openreview.net/forum?id=Sys6GJqxl

work page 2017

[49] [49]

Grosse, N

K. Grosse, N. Papernot, P. Manoharan, M. Backes, P. D. McDaniel, Adversarial examples for malware detection, in: Computer Security - ESORICS 2017 - 22nd European Symposium on Research in Com- puter Security, Oslo, Norway, September 11-15, 2017, Proceedings, Part II, Lecture Notes in Computer Science, Springer, 2017, pp. 62–

work page 2017

[50] [50]

URL https://doi.org/10.1007/978-3-319-66399-9\_4

doi:10.1007/978-3-319-66399-9_4. URL https://doi.org/10.1007/978-3-319-66399-9\_4

work page doi:10.1007/978-3-319-66399-9_4

[51] [51]

Uesato, B

J. Uesato, B. O’Donoghue, P. Kohli, A. van den Oord, Adversarial risk and the dangers of evaluating against weak attacks, in: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, Vol. 80 of Proceedings of Machine Learning Research, PMLR, 2018, pp. 5032– 5041. 42

work page 2018