arxiv: 1907.08276 · v3 · pith:ZJ4Y5BMHnew · submitted 2019-07-18 · 💻 cs.CR · cs.AI

An AI-based, Multi-stage detection system of banking botnets

Pith reviewed 2026-05-24 19:34 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords banking botnets · APT · deep learning detection · multi-stage system · cyber data lake · botnet lifecycle · financial cybercrime

The pith

A multi-stage AI system detects APT-based banking botnets by analyzing their full lifecycle and applying deep learning at key stages.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper first breaks down the step-by-step operation of APT-based banking botnets across their lifecycle. It then introduces a multi-stage detection system that uses a Cyber Data Lake along with several artificial intelligence methods to spot malicious activities aimed at organizations. Evaluation on public datasets indicates that the deep learning components perform substantially better than traditional baseline models. Readers might care because these botnets are major drivers of financially motivated cybercrime, and improved detection could help safeguard financial systems.

Core claim

The authors analyze how an APT-based banking botnet works through its entire lifecycle and present a multi-stage system that detects malicious banking botnet activities potentially targeting organizations. The system leverages Cyber Data Lake as well as multiple artificial intelligence techniques at different stages. The evaluation results using public datasets showed that Deep Learning based detections were highly successful compared with baseline models.

What carries the argument

Multi-stage detection system integrating Cyber Data Lake with AI techniques applied at different botnet lifecycle stages.

If this is right

  • Deep learning models achieve higher success rates than baseline models in detecting botnet activities.
  • The system can be used to protect organizations from targeted banking botnet threats.
  • Step-by-step lifecycle analysis supports the development of stage-specific detection methods.
  • Public datasets serve as effective benchmarks for evaluating such AI-based systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the system scales to real-time environments, it could enable proactive blocking of botnet commands before financial damage occurs.
  • Similar multi-stage approaches might apply to detecting other forms of advanced persistent threats beyond banking.
  • Combining the data lake with live network feeds could address gaps in dataset representativeness.

Load-bearing premise

The public datasets used for evaluation accurately represent real-world APT-based banking botnet activities.

What would settle it

Running the system against a set of confirmed real-world banking botnet samples not included in the public datasets and observing whether detection rates drop significantly.
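
One way to make "drop significantly" concrete is a two-proportion z-test on detection rates. The sketch below is an illustration under stated assumptions, not a procedure from the paper: the function names are hypothetical, the counts in the usage lines are invented placeholders, and the test assumes the two sample sets are independent and large enough for the normal approximation.

```python
import math

def two_proportion_z(det_a, n_a, det_b, n_b):
    """z statistic for H0: sample sets A and B share the same detection rate.

    det_a / n_a: detected and total samples in set A (e.g. public benchmark).
    det_b / n_b: detected and total samples in set B (e.g. confirmed field samples).
    Undefined when the pooled rate is exactly 0 or 1 (standard error is zero).
    """
    p_a, p_b = det_a / n_a, det_b / n_b
    pooled = (det_a + det_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

def one_sided_p_value(z):
    """P(Z >= z) for a standard normal Z: evidence that rate A exceeds rate B."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Placeholder counts, NOT results from the paper.
z = two_proportion_z(det_a=980, n_a=1000, det_b=850, n_b=1000)
print(f"z = {z:.2f}, one-sided p = {one_sided_p_value(z):.3g}")
```

A small p-value would indicate that the detection rate on the held-out samples is lower than on the benchmark by more than sampling noise explains, which is the outcome that would undercut the load-bearing premise.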

Figures

Figures reproduced from arXiv: 1907.08276 by Erwan A Le Doeuff, Ian Lee, Li Ling, Michael A Silas, Zhiqiang Gao.

Figure 1
Figure 1:
  • Early-Warning Detection: Fuzzy domain spoof detection for signs of spear phishing campaigns against financial organizations as rapidly as new domains are observed.
  • Spear Phishing Detection: Deep Learning based detection for spear phishing email campaigns as one way of infection.
  • DGA Detection: Deep Learning based detection of indicators of infected hosts reaching out to C2 servers.
  • Advanced data e…
Figure 2
Figure 2: Multi-stage detection framework on cyber Data Lake
Figure 3
Figure 3: Early warning based on newly observed domains
Figure 4
Figure 4: LSTM architecture

Each input character is embedded within a 128-dimensional vector space. The translated embedding vector is then fed to an LSTM layer as a 128-step sequence. Finally, classification is performed using a sigmoidal transfer function to an output neuron. The network is trained by backpropagation using a cross-entropy loss function, and a dropout layer is added to prevent overfitting.

3.3 DNS …
Figure 5
Figure 5: Training history
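
The Figure 4 caption describes a data flow of character embedding, then an LSTM layer, then a single sigmoid output trained with cross-entropy. The sketch below traces that forward pass in plain Python so the shapes are easy to follow. It is not the authors' implementation: the alphabet, the left-padding scheme, the random weight initialization, and the names `encode` and `TinyDgaLstm` are all assumptions made for illustration, and training (backpropagation, dropout) is omitted entirely.

```python
import math
import random

# Assumed character set for domain names; index 0 is reserved for padding.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._"
PAD = 0

def encode(domain, max_len):
    """Map characters to 1-based integer ids, left-padded with PAD to max_len."""
    ids = [ALPHABET.index(c) + 1 for c in domain.lower() if c in ALPHABET][-max_len:]
    return [PAD] * (max_len - len(ids)) + ids

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

class TinyDgaLstm:
    """Untrained embedding -> LSTM -> sigmoid classifier (forward pass only)."""

    def __init__(self, embed_dim=128, hidden=128, max_len=128, seed=0):
        rng = random.Random(seed)
        self.max_len, self.hidden = max_len, hidden
        # One embedding row per character id, including the padding id.
        self.embedding = rand_matrix(len(ALPHABET) + 1, embed_dim, rng)
        # One weight matrix per gate: input (i), forget (f), candidate (g), output (o).
        self.gates = {g: rand_matrix(hidden, embed_dim + hidden, rng) for g in "ifgo"}
        self.out_w = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]

    def predict(self, domain):
        h = [0.0] * self.hidden  # hidden state
        c = [0.0] * self.hidden  # cell state
        for idx in encode(domain, self.max_len):
            x = self.embedding[idx] + h  # concatenate embedding and previous hidden state
            i = [sigmoid(v) for v in matvec(self.gates["i"], x)]
            f = [sigmoid(v) for v in matvec(self.gates["f"], x)]
            g = [math.tanh(v) for v in matvec(self.gates["g"], x)]
            o = [sigmoid(v) for v in matvec(self.gates["o"], x)]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        # Single output neuron with a sigmoid, read as P(domain is DGA-generated).
        return sigmoid(sum(w * v for w, v in zip(self.out_w, h)))

def binary_cross_entropy(p, y):
    """Loss for one prediction p in (0, 1) against label y in {0, 1}."""
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

# Small dimensions keep the pure-Python demo fast; the caption's sizes are 128.
model = TinyDgaLstm(embed_dim=8, hidden=8, max_len=16)
p = model.predict("example.com")
print(f"score = {p:.3f}, loss if labeled benign = {binary_cross_entropy(p, 0):.3f}")
```

Because the weights are random, the score carries no meaning here; the point is only the order of operations. In practice a deep learning framework would supply the layers, the optimizer, and the dropout that the caption mentions.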
Original abstract

Banking Trojans, botnets are primary drivers of financially-motivated cybercrime. In this paper, we first analyzed how an APT-based banking botnet works step by step through the whole lifecycle. Specifically, we present a multi-stage system that detects malicious banking botnet activities which potentially target the organizations. The system leverages Cyber Data Lake as well as multiple artificial intelligence techniques at different stages. The evaluation results using public datasets showed that Deep Learning based detections were highly successful compared with baseline models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript analyzes the step-by-step lifecycle of APT-based banking botnets and proposes a multi-stage detection system that combines a Cyber Data Lake with multiple AI techniques (including deep learning) applied at different stages. It reports that evaluations using public datasets show deep learning detections to be highly successful relative to baseline models.

Significance. If the empirical claims were supported by detailed, reproducible methodology and datasets that capture the targeted APT banking-trojan behaviors, the work could contribute to practical AI-driven defenses against financially motivated threats. As presented, the absence of metrics, models, and dataset specifications prevents any assessment of whether the result would advance the field.

major comments (2)
  1. [Abstract] Abstract: the central claim that 'Deep Learning based detections were highly successful compared with baseline models' supplies no performance metrics, model architectures, training procedures, dataset identifiers, or baseline definitions, making the reported success unverifiable.
  2. [Abstract] Evaluation description: no evidence is provided that the public datasets contain labeled traces matching the APT lifecycle stages (initial compromise, C2, financial payload) analyzed in the paper; generic botnet or DDoS datasets would not substantiate the targeted claim for banking-trojan detection in organizational settings.
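
For reference, the metrics that the first comment asks for can be stated compactly. The paper's Table 2 is titled "True Positive Rates vs. False Positive Rates"; the sketch below shows how those two rates are conventionally computed from binary labels. The function name and the sample labels are hypothetical and are not data from the paper.

```python
def tpr_fpr(y_true, y_pred):
    """Return (true positive rate, false positive rate) for binary labels.

    Labels use 1 for malicious and 0 for benign. Assumes both classes
    occur in y_true, otherwise one of the rates is undefined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)

# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
tpr, fpr = tpr_fpr(y_true, y_pred)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

Reporting both rates per model and per dataset, together with the decision threshold used, would let a reader verify a claim such as "highly successful compared with baseline models".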

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and indicate the revisions we will make.

  1. Referee: [Abstract] Abstract: the central claim that 'Deep Learning based detections were highly successful compared with baseline models' supplies no performance metrics, model architectures, training procedures, dataset identifiers, or baseline definitions, making the reported success unverifiable.

    Authors: We agree that the abstract does not include these specifics, which limits immediate verifiability of the claim. We will revise the abstract to incorporate the key performance metrics, model architectures, training procedures, dataset identifiers, and baseline definitions from the evaluation. revision: yes

  2. Referee: [Abstract] Evaluation description: no evidence is provided that the public datasets contain labeled traces matching the APT lifecycle stages (initial compromise, C2, financial payload) analyzed in the paper; generic botnet or DDoS datasets would not substantiate the targeted claim for banking-trojan detection in organizational settings.

    Authors: We agree that the abstract provides no explicit mapping or evidence linking the public datasets to the specific APT lifecycle stages. We will revise the manuscript to add a clear description and justification showing how the chosen datasets contain relevant labeled traces for initial compromise, C2, and financial payload stages. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical evaluation stands on public datasets and baselines

The paper describes an analysis of APT-based banking botnet lifecycles followed by a multi-stage AI detection system whose performance is reported via direct comparison against baseline models on public datasets. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The central claim reduces to an empirical success metric rather than any self-referential construction, making the result independent of its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an applied engineering description with no mathematical derivations, free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5608 in / 985 out tokens · 21808 ms · 2026-05-24T19:34:36.446796+00:00 · methodology


Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · 1 internal anchor

  1. [1]

    Botnets: How to Fight the Ever-Growing Threat on a Technical Level

    Tiirmaa-Klaar, H., Gassen, J., Gerhards-Padilla, E., & Martini, P. (2013) Botnets: How to Fight the Ever-Growing Threat on a Technical Level. In Botnets. pp. 41–97. London: Springer

  2. [2]

    The Global Cyber-Vulnerability Report

    Subrahmanian, V.S., Ovelgonne, M., Dumitras, T. & Prakash, B.A. (2015) The Global Cyber-Vulnerability Report, November Ed. Springer International Publishing.

    Table 2: True Positive Rates vs. False Positive Rates

    Model Name   | True Positive Rate | False Positive Rate | True Positive Rate | False Positive Rate
    DGA-baseline | 0.97               | 0.35                | 0.91               | 0.08
    DGA-LSTM     | 0.98               | 0.016               | 0....

  3. [3]

    Goznym Banking Trojan

    Goznym Banking Trojan (2016). McAfee Labs Threat Advisory, May 23 Ed. Online: https://kc.mcafee.com/resources/sites/MCAFEE/content/live/PRODUCT_DOCUMENTATION/26000/PD26519/en_US/McAfeeLabsThreatAdvisory-Goznym_Banking_Trojan.pdf

  4. [4]

    Intelligence-driven Computer Network Defense Informed by Analysis of Adversary Campaigns and Intrusion Kill Chains

    Hutchins, E.M., Cloppert, M.J. & Amin, R.M. (2014) Intelligence-driven Computer Network Defense Informed by Analysis of Adversary Campaigns and Intrusion Kill Chains. White paper. Lockheed Martin Corporation, July Ed, pp. 1–14

  5. [5]

    A Cyber Kill Chain Based Taxonomy of Banking Trojans for Evolutionary Computational Intelligence

    Kiwia, D., Dehghantanha, A., Choo, K.R. & Slaughter, J. (2017) A Cyber Kill Chain Based Taxonomy of Banking Trojans for Evolutionary Computational Intelligence, Journal of Computational Science, 27, pp. 394-409. Online: https://doi.org/10.1016/j.jocs.2017.10.020

  6. [6]

    Fireeye Mandiant M-Trends 2018 report

    Fireeye Mandiant M-Trends 2018 report. Online: https://www.fireeye.com/content/dam/collateral/en/mtrends-2018.pdf

  7. [7]

    Dnstwist

    Dnstwist. Online: https://github.com/elceef/dnstwist

  8. [8]

    Link Analysis: Hubs and Authorities on the World

    Ding, C., He, X., Husbands, P., Zha, H. & Simon, H. (2004) Link Analysis: Hubs and Authorities on the World. SIAM Review, 46(2), pp. 256-268

  9. [9]

    The Pagerank Citation Ranking: Bringing order to the Web

    Page, L., Brin, S., Motwani, R. & Winograd, T. (1999) The Pagerank Citation Ranking: Bringing order to the Web. Technical report, Stanford Digital Libraries SIDL-WP-1999-10120, 1999

  10. [10]

    Learning to detect malicious URLs

    Ma, J., Saul, L.K., Savage, S. & Voelker, G.M. (2011) Learning to detect malicious URLs, ACM Transactions on Intelligent Systems and Technology (TIST), 2(3), pp. 1-24

  11. [11]

    Beyond Blacklists: Learning to Detect Malicious Web Sites from Suspicious URLs

    Ma, J., Saul, L.K., Savage, S. & Voelker, G.M. (2009) Beyond Blacklists: Learning to Detect Malicious Web Sites from Suspicious URLs, World Wide Web Internet And Web Information Systems, pp. 1245-1253

  12. [12]

    Predicting Domain Generation Algorithms with Long Short-Term Memory Networks

    Woodbridge, J., Anderson, H.S., Ahuja, A. & Grant, D. (2016) Predicting Domain Generation Algorithms with Long Short-Term Memory Networks. Online: http://arxiv.org/abs/1611.00791

  13. [13]

    Detecting DNS Tunneling

    Farnham, G. & Atlasis, A. (2013) Detecting DNS Tunneling. Online: https://www.sans.org/reading-room/whitepapers/dns/paper/34152

  14. [14]

    Using Splunk to detect DNS Tunneling

    Jaworski, S. (2018) Using Splunk to detect DNS Tunneling. Online: https://www.sans.org/reading-room/whitepapers/malicious/paper/37022

  15. [15]

    Does Alexa have a list of its top-ranked websites? Online: https://support.alexa.com/hc/en-us/articles/200449834-Does-Alexa-have-a-list-of-its-top-ranked-websites-

  16. [16]

    Domain Generation Algorithms

    Domain Generation Algorithms. Online: https://github.com/baderj/domain_generation_algorithms

  17. [17]

    DMOZ Open Directory Project (2016) Online: http://www.dmoz.org

  18. [18]

    Whois Lookup Online: https://whois.net/

  19. [19]

    PhishTank (2018) Online: http://www.phishtank.com

  20. [20]

    Top 10 Banking Trojans for 2017: What You Need to Know (2017) Online: https://blog.barkly.com/top-banking-trojans-2017

  21. [21]

    Banking Botnets: The Battle Continues, Dell SecureWorks Counter Threat Unit TM Threat Intelligence, February 19, 2016. Online: https://www.secureworks.com/research/banking-botnets-the-battle-continues

  22. [22]

    The Great Bank Robbery: the Carbanak APT, Version 2.0, Kaspersky, February, 2015. Online: https://krebsonsecurity.com/wp-content/uploads/2015/02/Carbanak_APT_eng.pdf

  23. [23]

    MITRE ATT&CK. Online: https://attack.mitre.org/