An AI-based, Multi-stage detection system of banking botnets

Erwan A Le Doeuff; Ian Lee; Li Ling; Michael A Silas; Zhiqiang Gao

arxiv: 1907.08276 · v3 · pith:ZJ4Y5BMHnew · submitted 2019-07-18 · 💻 cs.CR · cs.AI


Pith reviewed 2026-05-24 19:34 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords banking botnets · APT · deep learning detection · multi-stage system · cyber data lake · botnet lifecycle · financial cybercrime


The pith

A multi-stage AI system detects APT-based banking botnets by analyzing their full lifecycle and applying deep learning at key stages.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper first breaks down the step-by-step operation of APT-based banking botnets across their lifecycle. It then introduces a multi-stage detection system that uses a Cyber Data Lake along with several artificial intelligence methods to spot malicious activities aimed at organizations. Evaluation on public datasets indicates that the deep learning components perform substantially better than traditional baseline models. Readers might care because these botnets are major drivers of financially motivated cybercrime, and improved detection could help safeguard financial systems.

Core claim

The authors analyze how an APT-based banking botnet works through its entire lifecycle and present a multi-stage system that detects malicious banking botnet activities potentially targeting organizations. The system leverages Cyber Data Lake as well as multiple artificial intelligence techniques at different stages. The evaluation results using public datasets showed that Deep Learning based detections were highly successful compared with baseline models.

What carries the argument

Multi-stage detection system integrating Cyber Data Lake with AI techniques applied at different botnet lifecycle stages.

If this is right

Deep learning models achieve higher success rates than baseline models in detecting botnet activities.
The system can be used to protect organizations from targeted banking botnet threats.
Step-by-step lifecycle analysis supports the development of stage-specific detection methods.
Public datasets serve as effective benchmarks for evaluating such AI-based systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the system scales to real-time environments, it could enable proactive blocking of botnet commands before financial damage occurs.
Similar multi-stage approaches might apply to detecting other forms of advanced persistent threats beyond banking.
Combining the data lake with live network feeds could address gaps in dataset representativeness.

Load-bearing premise

The public datasets used for evaluation accurately represent real-world APT-based banking botnet activities.

What would settle it

Running the system against a set of confirmed real-world banking botnet samples not included in the public datasets and observing whether detection rates drop significantly.

Figures

Figures reproduced from arXiv: 1907.08276 by Erwan A Le Doeuff, Ian Lee, Li Ling, Michael A Silas, Zhiqiang Gao.

**Figure 1.**
• Early-Warning Detection: Fuzzy domain spoof detection for signs of spear phishing campaigns against financial organizations as rapidly as new domains are observed.
• Spear Phishing Detection: Deep Learning based detection for spear phishing email campaigns as one way of infection.
• DGA Detection: Deep Learning based detection of indicators of infected hosts reaching out to C2 servers.
• Advanced data e…
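The early-warning stage in the Figure 1 caption can be illustrated with a minimal sketch. The caption does not say how "fuzzy" matching is done, so this example swaps in plain Levenshtein edit distance as an assumption. The function names, the `max_distance` threshold, and the naive label extraction are all hypothetical and not taken from the paper.

```python
from typing import Iterable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def main_label(domain: str) -> str:
    """Naive assumption: the label just left of the last dot is the brand name."""
    parts = domain.lower().strip(".").split(".")
    return parts[-2] if len(parts) >= 2 else parts[0]


def flag_spoofs(
    new_domains: Iterable[str],
    protected: Iterable[str],
    max_distance: int = 2,  # hypothetical threshold, not from the paper
) -> List[Tuple[str, str, int]]:
    """Return (new_domain, protected_domain, distance) for likely look-alikes."""
    protected = list(protected)
    hits = []
    for candidate in new_domains:
        for legit in protected:
            if candidate.lower() == legit.lower():
                continue  # the protected domain itself is not a spoof
            distance = levenshtein(main_label(candidate), main_label(legit))
            if distance <= max_distance:
                hits.append((candidate, legit, distance))
    return hits


if __name__ == "__main__":
    observed = ["examp1ebank.com", "weather.org", "examplebank.com"]
    print(flag_spoofs(observed, ["examplebank.com"]))
```

A production system would likely need a public-suffix-aware parser and homoglyph handling. Permutation generators such as dnstwist, which appears in the reference list, take the opposite approach and enumerate look-alikes ahead of time.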

**Figure 2.** Multi-stage detection framework on cyber Data Lake

**Figure 3.** Early warning based on newly observed domains

**Figure 4.** LSTM architecture. Each input character is embedded within a 128-dimensional vector space. The translated embedding vector is then fed to an LSTM layer as a 128-step sequence. Finally, classification is performed using a sigmoidal transfer function to an output neuron. The network is trained by backpropagation using a cross-entropy loss function, and a dropout layer is added to prevent overfitting.
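The input and output ends of the pipeline described in the Figure 4 caption can be sketched in plain Python. The caption gives the sequence length (128 steps), the sigmoid output, and the cross-entropy loss. The character alphabet, the left-padding scheme, and all function names below are assumptions made for illustration. The embedding and LSTM layers themselves are deliberately left out, since they would need a deep learning framework.

```python
import math
import string
from typing import Dict, List

# Assumed alphabet (not given in the caption). Index 0 is reserved for padding.
ALPHABET = string.ascii_lowercase + string.digits + "-."
CHAR_TO_INDEX: Dict[str, int] = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
UNKNOWN_INDEX = len(ALPHABET) + 1  # bucket for any other character
MAX_STEPS = 128                    # sequence length from the Figure 4 caption


def encode_domain(domain: str, max_steps: int = MAX_STEPS) -> List[int]:
    """Map a domain to a fixed-length list of character indices (left-padded)."""
    indices = [CHAR_TO_INDEX.get(ch, UNKNOWN_INDEX) for ch in domain.lower()]
    indices = indices[-max_steps:]  # keep the rightmost characters if too long
    return [0] * (max_steps - len(indices)) + indices


def sigmoid(z: float) -> float:
    """Numerically stable logistic function used at the output neuron."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(y_true: int, p: float, eps: float = 1e-12) -> float:
    """Cross-entropy loss for one example with label y_true in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


if __name__ == "__main__":
    encoded = encode_domain("example.com")
    print(len(encoded), encoded[-11:])
    print(binary_cross_entropy(1, sigmoid(0.0)))
```

In a full model, each index would be looked up in a 128-dimensional embedding table before the LSTM layer, and the value passed to `sigmoid` would be the output neuron's weighted sum of the final LSTM state.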

**Figure 5.** Training history

Original abstract

Banking Trojans, botnets are primary drivers of financially-motivated cybercrime. In this paper, we first analyzed how an APT-based banking botnet works step by step through the whole lifecycle. Specifically, we present a multi-stage system that detects malicious banking botnet activities which potentially target the organizations. The system leverages Cyber Data Lake as well as multiple artificial intelligence techniques at different stages. The evaluation results using public datasets showed that Deep Learning based detections were highly successful compared with baseline models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper describes a multi-stage detection architecture for banking botnets but supplies no verifiable metrics or evidence that the public datasets match the claimed APT threat model.


The core of this paper is a high-level sketch of a multi-stage system that combines a cyber data lake with different AI techniques to catch banking botnet activity across the APT lifecycle. They break down the stages from initial compromise through command-and-control to financial payload delivery, which is a useful framing for anyone thinking about where detection points should sit. The multi-stage idea itself is reasonable engineering: different behaviors at different phases call for different models rather than one monolithic classifier. That part reads as practical and straightforward.

Beyond the architecture description, there is little that is new. The work applies established deep learning and data lake methods to this domain without introducing new algorithms, formal models, or reproducible derivations. The evaluation claim is that deep learning detections were highly successful against baselines on public datasets, but the abstract gives no accuracy numbers, no model architectures, no training details, and no dataset identifiers. The stress-test point holds: common public botnet collections such as CTU-13 or ISCX are not built around labeled banking-trojan flows with the specific financial stages described, so reported performance on those sets does not demonstrate relevance to the targeted use case. Without that linkage, the central result remains unverified.

The paper is aimed at security engineers in financial organizations who need a starting architecture rather than at researchers seeking new methods or tightly controlled experiments. It does not rise to the level that would justify sending it to serious peer review; the missing evaluation details and dataset mismatch make the main claim impossible to assess as written.

Referee Report

2 major / 0 minor

Summary. The manuscript analyzes the step-by-step lifecycle of APT-based banking botnets and proposes a multi-stage detection system that combines a Cyber Data Lake with multiple AI techniques (including deep learning) applied at different stages. It reports that evaluations using public datasets show deep learning detections to be highly successful relative to baseline models.

Significance. If the empirical claims were supported by detailed, reproducible methodology and datasets that capture the targeted APT banking-trojan behaviors, the work could contribute to practical AI-driven defenses against financially motivated threats. As presented, the absence of metrics, models, and dataset specifications prevents any assessment of whether the result would advance the field.

major comments (2)

[Abstract] Abstract: the central claim that 'Deep Learning based detections were highly successful compared with baseline models' supplies no performance metrics, model architectures, training procedures, dataset identifiers, or baseline definitions, making the reported success unverifiable.
[Abstract] Evaluation description: no evidence is provided that the public datasets contain labeled traces matching the APT lifecycle stages (initial compromise, C2, financial payload) analyzed in the paper; generic botnet or DDoS datasets would not substantiate the targeted claim for banking-trojan detection in organizational settings.
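
As an illustration of the kind of metric the first comment asks for, the sketch below computes a true positive rate and a false positive rate from binary labels. The function name and the toy labels are hypothetical and serve only to show the arithmetic. They are not results from the paper.

```python
from typing import Sequence, Tuple


def tpr_fpr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # share of malicious items caught
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # share of benign items flagged
    return tpr, fpr


if __name__ == "__main__":
    # Toy labels for illustration only: 1 = malicious, 0 = benign.
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    print(tpr_fpr(truth, preds))
```

Reporting both rates for each model and each dataset, next to a definition of the baseline, would make a comparison such as "highly successful compared with baseline models" checkable.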

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and indicate the revisions we will make.


Referee: [Abstract] Abstract: the central claim that 'Deep Learning based detections were highly successful compared with baseline models' supplies no performance metrics, model architectures, training procedures, dataset identifiers, or baseline definitions, making the reported success unverifiable.

Authors: We agree that the abstract does not include these specifics, which limits immediate verifiability of the claim. We will revise the abstract to incorporate the key performance metrics, model architectures, training procedures, dataset identifiers, and baseline definitions from the evaluation.

revision: yes

Referee: [Abstract] Evaluation description: no evidence is provided that the public datasets contain labeled traces matching the APT lifecycle stages (initial compromise, C2, financial payload) analyzed in the paper; generic botnet or DDoS datasets would not substantiate the targeted claim for banking-trojan detection in organizational settings.

Authors: We agree that the abstract provides no explicit mapping or evidence linking the public datasets to the specific APT lifecycle stages. We will revise the manuscript to add a clear description and justification showing how the chosen datasets contain relevant labeled traces for initial compromise, C2, and financial payload stages.

revision: yes

Circularity Check

0 steps flagged

No circularity; empirical evaluation stands on public datasets and baselines


The paper describes an analysis of APT-based banking botnet lifecycles followed by a multi-stage AI detection system whose performance is reported via direct comparison against baseline models on public datasets. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The central claim reduces to an empirical success metric rather than any self-referential construction, making the result independent of its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an applied engineering description with no mathematical derivations, free parameters, axioms, or invented entities.


Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · 1 internal anchor

[1] Tiirmaa-Klaar, H., Gassen, J., Gerhards-Padilla, E. & Martini, P. (2013) Botnets: How to Fight the Ever-Growing Threat on a Technical Level. In Botnets, pp. 41–97. London: Springer.
[2] Subrahmanian, V.S., Ovelgonne, M., Dumitras, T. & Prakash, B.A. (2015) The Global Cyber-Vulnerability Report, November Ed. Springer International Publishing.
[3] Goznym Banking Trojan (2016). McAfee Labs Threat Advisory, May 23 Ed. Online: https://kc.mcafee.com/resources/sites/MCAFEE/content/live/PRODUCT_DOCUMENTATION/26000/PD26519/en_US/McAfeeLabsThreatAdvisory-Goznym_Banking_Trojan.pdf
[4] Hutchins, E.M., Clopp, M.J. & Amin, R.M. (2014) Intelligence-driven Computer Network Defense Informed by Analysis of Adversary Campaigns and Intrusion Kill Chains. White paper. Lockheed Martin Corporation, July Ed, pp. 1–14.
[5] Kiwia, D., Dehghantanha, A., Choo, K.R. & Slaughter, J. (2017) A Cyber Kill Chain Based Taxonomy of Banking Trojans for Evolutionary Computational Intelligence. Journal of Computational Science, 27, pp. 394-409. Online: https://doi.org/10.1016/j.jocs.2017.10.020
[6] Fireeye Mandiant M-Trends 2018 report. Online: https://www.fireeye.com/content/dam/collateral/en/mtrends-2018.pdf
[7] Dnstwist. Online: https://github.com/elceef/dnstwist
[8] Ding, C., He, X., Husbands, P., Zha, H. & Simon, H. (2004) Link Analysis: Hubs and Authorities on the World. SIAM Review, 46(2), pp. 256-268.
[9] Page, L., Brin, S., Motwani, R. & Winograd, T. (1999) The Pagerank Citation Ranking: Bringing order to the Web. Technical report, Stanford Digital Libraries SIDL-WP-1999-10120.
[10] Ma, J., Saul, L.K., Savage, S. & Voelker, G.M. (2011) Learning to detect malicious URLs. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3), pp. 1-24.
[11] Ma, J., Saul, L.K., Savage, S. & Voelker, G.M. (2009) Beyond Blacklists: Learning to Detect Malicious Web Sites from Suspicious URLs. World Wide Web Internet And Web Information Systems, pp. 1245-1253.
[12] Woodbridge, J., Anderson, H.S., Ahuja, A. & Grant, D. (2016) Predicting Domain Generation Algorithms with Long Short-Term Memory Networks. Online: http://arxiv.org/abs/1611.00791
[13] Farnham, G. & Atlasis, A. (2013) Detecting DNS Tunneling. Online: https://www.sans.org/reading-room/whitepapers/dns/paper/34152
[14] Jaworski, S. (2018) Using Splunk to detect DNS Tunneling. Online: https://www.sans.org/reading-room/whitepapers/malicious/paper/37022
[15] Does Alexa have a list of its top-ranked websites? Online: https://support.alexa.com/hc/en-us/articles/200449834-Does-Alexa-have-a-list-of-its-top-ranked-websites-
[16] Domain Generation Algorithms. Online: https://github.com/baderj/domain_generation_algorithms
[17] DMOZ Open Directory Project (2016) Online: http://www.dmoz.org
[18] Whois Lookup. Online: https://whois.net/
[19] PhishTank (2018) Online: http://www.phishtank.com
[20] Top 10 Banking Trojans for 2017: What You Need to Know (2017) Online: https://blog.barkly.com/top-banking-trojans-2017
[21] Banking Botnets: The Battle Continues, Dell SecureWorks Counter Threat Unit TM Threat Intelligence, February 19, 2016. Online: https://www.secureworks.com/research/banking-botnets-the-battle-continues
[22] The Great Bank Robbery: the Carbanak APT, Version 2.0, Kaspersky, February 2015. Online: https://krebsonsecurity.com/wp-content/uploads/2015/02/Carbanak_APT_eng.pdf
[23] MITRE ATT&CK. Online: https://attack.mitre.org/

Table 2: True Positive Rates vs. False Positive Rates (fragment extracted alongside reference [2]; truncated in the source)
Model Name | True Positive Rate | False Positive Rate | True Positive Rate | False Positive Rate
DGA-baseline | 0.97 | 0.35 | 0.91 | 0.08
DGA-LSTM | 0.98 | 0.016 | 0....
