
arxiv: 2605.23004 · v1 · pith:ZPDJLWAOnew · submitted 2026-05-21 · 💻 cs.CR

Botnet Detection on CTU-13 Using Lightweight Machine Learning Models

Pith reviewed 2026-05-25 05:30 UTC · model grok-4.3

classification 💻 cs.CR
keywords botnet detection · CTU-13 dataset · random forest · lightweight machine learning · flow-based features · cybersecurity · machine learning comparison · network traffic analysis

The pith

Lightweight machine learning models achieve competitive botnet detection on CTU-13 with far lower computational cost than deep learning approaches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates Logistic Regression, Decision Tree, and Random Forest on the CTU-13 dataset using flow-based features for botnet detection. It reports that Random Forest reaches a PR-AUC of approximately 0.54 and ROC-AUC of 0.97 while training over 90 percent faster than published CNN baselines. A sympathetic reader would care because effective detection must work under class imbalance, remain interpretable for forensics, and run at low cost. The results suggest that, on this benchmark, deep learning is not required to reach competitive detection performance.

Core claim

On the CTU-13 dataset, a Random Forest classifier trained on interpretable flow-based features attains a PR-AUC of approximately 0.54 and an ROC-AUC of 0.97, while requiring over 90 percent less training time than published CNN baselines, showing that lightweight models can match deep-learning performance under natural class imbalance while preserving interpretability and low computational cost.
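The gap between a PR-AUC near 0.54 and a ROC-AUC of 0.97 is easier to read with a small worked example. The sketch below is not the authors' evaluation code: it uses made-up scores for 200 hypothetical flows with a 2.5% positive rate (chosen to be close to the ≈ 2.48% botnet ratio the paper reports), and the helper names `roc_auc` and `average_precision` are illustrative. It shows how a ranking can score very high on ROC-AUC while its step-wise PR-AUC stays moderate, and why the chance-level PR-AUC is roughly the prevalence rather than 0.5.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise PR-AUC: mean precision at the rank of each positive (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy data (hypothetical, not CTU-13): 200 flows, 5 of them botnet (2.5% prevalence).
n_flows = 200
botnet_ranks = {1, 4, 8, 15, 30}  # positions of the positives in the model's ranking
scores = [float(n_flows - i) for i in range(n_flows)]  # strictly decreasing scores
labels = [1 if (i + 1) in botnet_ranks else 0 for i in range(n_flows)]

prevalence = sum(labels) / n_flows
print(f"prevalence (approx. chance-level PR-AUC): {prevalence:.3f}")
print(f"ROC-AUC: {roc_auc(labels, scores):.3f}")
print(f"PR-AUC (average precision): {average_precision(labels, scores):.3f}")
```

Under heavy imbalance a handful of highly ranked negatives barely moves ROC-AUC but cuts precision sharply, which is why the two metrics can diverge as they do in the reported results.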

What carries the argument

Random Forest applied to flow-based features extracted from CTU-13 network traffic, which supports fast training and direct feature-importance inspection.
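Feature-importance inspection of the kind mentioned here can be illustrated with permutation importance measured as a drop in average precision, which is how the Figure 4 caption describes the paper's ranking. The following is a minimal sketch under stated assumptions, not the paper's procedure: the data are synthetic, `score_fn` is a hypothetical stand-in for a trained model's botnet score, and `permutation_importance` is a hand-written helper rather than a library call.

```python
import random


def average_precision(y_true, scores):
    """Step-wise PR-AUC: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


def permutation_importance(score_fn, X, y, n_repeats=5, seed=0):
    """Mean drop in average precision when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = average_precision(y, [score_fn(row) for row in X])
    drops = []
    for j in range(len(X[0])):
        total_drop = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # breaks the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = average_precision(y, [score_fn(row) for row in X_perm])
            total_drop += baseline - permuted
        drops.append(total_drop / n_repeats)
    return baseline, drops


# Synthetic data (hypothetical): feature 0 separates the classes, feature 1 is noise.
data_rng = random.Random(42)
y = [1 if i % 20 == 0 else 0 for i in range(400)]  # 5% positives
X = [[3.0 * label + data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for label in y]


def score_fn(row):
    """Stand-in for a trained model's score; it only looks at feature 0."""
    return row[0]


baseline, drops = permutation_importance(score_fn, X, y)
print(f"baseline AP: {baseline:.3f}")
for j, drop in enumerate(drops):
    print(f"feature {j}: mean AP drop {drop:.3f}")
```

A feature the model ignores shows zero drop, while a feature the model depends on shows a large one; that contrast is what makes such a ranking usable as a forensic clue.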

If this is right

  • Lightweight models can operate in resource-limited settings where deep learning is impractical.
  • Feature importance from Random Forest and Decision Tree supplies actionable clues for forensic analysis of botnet traffic.
  • Training speed allows models to be retrained frequently as new traffic patterns appear.
  • Interpretability reduces reliance on black-box decisions in security operations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same flow features and lightweight approach might transfer to other network security datasets with similar imbalance.
  • Feature rankings could highlight specific traffic statistics that reliably signal botnet activity across different capture environments.
  • The speed gain might support on-device or edge deployment for continuous monitoring rather than centralized servers.

Load-bearing premise

The extracted flow-based features from CTU-13 are sufficient to separate botnet from normal traffic and the CNN baseline comparisons use equivalent data splits, features, and protocols.

What would settle it

Retraining the published CNN baselines on the exact same flow features, data splits, and evaluation protocol used for the Random Forest and comparing both accuracy and training time.
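To make the "over 90 percent" training-time claim concrete, it helps to spell out the arithmetic a like-for-like rerun would report. The sketch below assumes both models are timed on the same machine and data; the helper names are illustrative, and the 600-second and 45-second figures are invented for the example, not taken from the paper or from any CNN baseline.

```python
import time


def time_reduction_pct(t_baseline, t_model):
    """Percent less training time than the baseline: 100 * (1 - t_model / t_baseline)."""
    if t_baseline <= 0:
        raise ValueError("baseline time must be positive")
    return 100.0 * (1.0 - t_model / t_baseline)


def speedup_factor(t_baseline, t_model):
    """How many times faster the model trains than the baseline."""
    return t_baseline / t_model


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical timings in seconds, for illustration only.
t_cnn, t_rf = 600.0, 45.0
print(f"reduction: {time_reduction_pct(t_cnn, t_rf):.1f}%")
print(f"speedup:   {speedup_factor(t_cnn, t_rf):.1f}x")
```

A 90 percent reduction corresponds to a 10x speedup, so the claim is only meaningful if both timings come from comparable hardware and the same training data.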

Figures

Figures reproduced from arXiv: 2605.23004 by Naveen Kumar Chaudhary, Subhash Gurappa, Sundararaj Sitharama Iyengar, Yashas Hariprasad.

Figure 1. Left: Top protocols by count. Right: botnet rate by protocol.
Figure 2. Distributions of log1p_totpkts, log1p_totbytes, and log1p_bytes_per_pkt (normal vs. botnet).
  3.4 Train–Test Split: We adopt a stratified 70/30 split, preserving the ≈ 2.48% botnet ratio in both sets. We retain the natural imbalance (no oversampling/SMOTE in main results) to avoid unrealistic balanced splits and to reflect operational prevalence. For analysis we also report metrics at a tuned decision thresh…
Figure 3. Feature correlation (sample). Clustering among log-scaled byte/packet features; informative ratio feature
Figure 4. Top-10 test-time permutation importances (AUPRC drop) for the Random Forest.
Figure 5. Logistic Regression: Precision–Recall
Figure 7. Logistic Regression: Precision/Recall vs Threshold.
Figure 8. Decision Tree: precision–recall curve
Figure 10. Decision Tree: precision (blue) and recall (orange) vs threshold.
Figure 11. Random Forest: precision–recall curve
Figure 13. Random Forest: precision (blue) and recall (orange) vs threshold.
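The Figure 2 excerpt names three log-scaled features and describes a stratified 70/30 split that keeps the botnet ratio the same in both sets. A minimal sketch of both steps follows. It is an illustration under assumptions, not the authors' pipeline: the labels are synthetic, `stratified_split` and `log1p_features` are hypothetical helpers, and defining bytes-per-packet as total bytes divided by total packets is a guess at what the feature name means.

```python
import math
import random


def stratified_split(labels, test_frac=0.30, seed=0):
    """Split indices so each class contributes test_frac of its members to the test set."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def log1p_features(tot_pkts, tot_bytes):
    """Log-scale heavy-tailed flow counters; log1p maps 0 to 0 instead of failing."""
    # Assumption: bytes_per_pkt = tot_bytes / tot_pkts (0 when the flow has no packets).
    bytes_per_pkt = tot_bytes / tot_pkts if tot_pkts > 0 else 0.0
    return {
        "log1p_totpkts": math.log1p(tot_pkts),
        "log1p_totbytes": math.log1p(tot_bytes),
        "log1p_bytes_per_pkt": math.log1p(bytes_per_pkt),
    }


# Synthetic labels (hypothetical): 800 flows, 20 botnet flows (2.5%).
labels = [1] * 20 + [0] * 780
train_idx, test_idx = stratified_split(labels)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(f"train: {len(train_idx)} flows, botnet rate {train_rate:.4f}")
print(f"test:  {len(test_idx)} flows, botnet rate {test_rate:.4f}")
print(log1p_features(tot_pkts=9, tot_bytes=900))
```

Splitting per class is what keeps the minority rate stable; a plain random split on data this imbalanced can leave the test set with noticeably more or fewer botnet flows than the training set.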
Original abstract

Botnets are among the most persistent cyber threats, enabling large-scale attacks such as spam, credential theft, and distributed denial-of-service (DDoS). While deep learning approaches have recently been applied to botnet detection, they are computationally intensive and often lack interpretability. We present a comparative study of lightweight machine learning models including Logistic Regression, Decision Tree, and Random Forest on the CTU-13 dataset, a benchmark for botnet traffic analysis. We extract interpretable flow-based features and evaluate each model on detection accuracy, precision, recall, F1 score, and feature importance. Results demonstrate that lightweight models can achieve competitive detection performance with minimal computational cost, while also offering interpretability critical for forensic investigation. On CTU-13, our Random Forest achieves a PR-AUC of approximately 0.54 and ROC-AUC of 0.97 while training over 90% faster than published CNN baselines. These results demonstrate that lightweight models can match or exceed deep-learning performance under natural class imbalance while maintaining interpretability and low computational cost.
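Precision, recall, and F1 all depend on the score threshold at which a flow is flagged, which is what the precision/recall-versus-threshold figures plot. The sketch below shows one simple way such a threshold could be chosen, by sweeping candidates and keeping the one with the best F1. The scores, labels, candidate thresholds, and helper names are all invented for illustration; the paper's own tuning procedure is not described in the text available here.

```python
def precision_recall_f1(y_true, scores, threshold):
    """Precision, recall, and F1 when flows with score >= threshold are flagged as botnet."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def best_f1_threshold(y_true, scores, thresholds):
    """Return the candidate threshold with the highest F1."""
    return max(thresholds, key=lambda t: precision_recall_f1(y_true, scores, t)[2])


# Hypothetical validation scores and labels (1 = botnet).
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.70, 0.90]
candidates = [0.25, 0.42, 0.50, 0.65, 0.80]

for t in candidates:
    p, r, f = precision_recall_f1(labels, scores, t)
    print(f"threshold {t:.2f}: precision {p:.2f}, recall {r:.2f}, F1 {f:.2f}")
print("best F1 threshold:", best_f1_threshold(labels, scores, candidates))
```

If a threshold is tuned this way, doing so on held-out validation data rather than on the test set keeps the reported test metrics honest.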

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents a comparative empirical study of lightweight ML models (Logistic Regression, Decision Tree, Random Forest) for botnet detection on the CTU-13 dataset. It extracts flow-based features, evaluates the models on standard metrics including PR-AUC and ROC-AUC, and claims that Random Forest reaches PR-AUC ≈0.54 and ROC-AUC 0.97 while training >90% faster than published CNN baselines, with added benefits of interpretability and low computational cost under natural class imbalance.

Significance. If the CNN baseline comparisons are performed under identical feature sets, data partitions, and evaluation protocols, the result would show that simple, interpretable models can deliver competitive detection performance with substantially lower training cost than deep learning approaches; this would be practically relevant for resource-constrained network monitoring and forensic analysis.

major comments (2)
  1. [Abstract] The central claim that Random Forest achieves competitive PR-AUC/ROC-AUC and >90% training speedup relative to 'published CNN baselines' requires that those baselines used the identical flow-based feature set, the same CTU-13 train/test splits, and the same class-imbalance handling; the manuscript supplies no table, section, or appendix confirming equivalence of experimental conditions rather than citing heterogeneous prior papers.
  2. [Methods / Results] (Inferred from abstract and reported metrics.) The manuscript reports concrete numeric results (PR-AUC 0.54, ROC-AUC 0.97) but provides no details on the feature extraction code or procedure, hyperparameter search, cross-validation procedure, or statistical significance testing; class imbalance is acknowledged yet the handling method remains unspecified.
minor comments (1)
  1. [Abstract] The abstract gives PR-AUC as 'approximately 0.54'; reporting the exact value together with the precise experimental configuration would improve reproducibility.
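One common way to attach uncertainty to a single reported PR-AUC, of the kind the comments above ask for, is a percentile bootstrap over the test set. The sketch below is an illustration only: the 200 scored flows are synthetic, `bootstrap_ci` is a hypothetical helper, and nothing here reflects what the authors did or plan to do.

```python
import random


def average_precision(y_true, scores):
    """Step-wise PR-AUC: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


def bootstrap_ci(y_true, scores, metric, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric(y_true, scores)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample flows with replacement
        ys = [y_true[i] for i in idx]
        if sum(ys) == 0:  # metric undefined without positives: draw again
            continue
        stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic test set (hypothetical): 200 flows, 5 botnet flows, distinct scores.
n_flows = 200
botnet_ranks = {1, 4, 8, 15, 30}
scores = [float(n_flows - i) for i in range(n_flows)]
y_true = [1 if (i + 1) in botnet_ranks else 0 for i in range(n_flows)]

point = average_precision(y_true, scores)
ci_low, ci_high = bootstrap_ci(y_true, scores, average_precision)
print(f"PR-AUC {point:.3f}, 95% bootstrap interval [{ci_low:.3f}, {ci_high:.3f}]")
```

With only a few positives the interval is wide, which is a reminder that an "approximately 0.54" figure on a roughly 2.5% minority class can carry substantial sampling noise.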

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for clearer experimental equivalence and greater methodological detail. We address each point below and will make the indicated revisions to improve transparency without overstating the comparisons.

  1. Referee: [Abstract] The central claim that Random Forest achieves competitive PR-AUC/ROC-AUC and >90% training speedup relative to 'published CNN baselines' requires that those baselines used the identical flow-based feature set, the same CTU-13 train/test splits, and the same class-imbalance handling; the manuscript supplies no table, section, or appendix confirming equivalence of experimental conditions rather than citing heterogeneous prior papers.

    Authors: We agree that the claim of competitiveness would be stronger under identical conditions. The CNN results are taken from published literature on CTU-13 rather than re-implemented by us under the same feature set, splits, and imbalance handling. We will revise the abstract and add a new subsection (or table) explicitly listing the cited CNN papers, noting the differences in experimental protocols, and qualifying the speedup claim as approximate and based on reported training times in those works. This addresses the concern directly while retaining the core observation about lightweight models.
    Revision: yes

  2. Referee: [Methods / Results] (Inferred from abstract and reported metrics.) The manuscript reports concrete numeric results (PR-AUC 0.54, ROC-AUC 0.97) but provides no details on the feature extraction code or procedure, hyperparameter search, cross-validation procedure, or statistical significance testing; class imbalance is acknowledged yet the handling method remains unspecified.

    Authors: The full manuscript contains a methods section on flow-based feature extraction from CTU-13, but we accept that additional specifics are required for reproducibility. We will expand the methods with: (1) the exact feature extraction procedure and any code references or pseudocode, (2) the hyperparameter search method and ranges used, (3) the cross-validation procedure, (4) any statistical significance tests performed, and (5) explicit description of class-imbalance handling (including reliance on PR-AUC and any weighting or sampling applied). These details will be added in the revised version.
    Revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical ML evaluation on fixed dataset


The paper reports measured performance (PR-AUC 0.54, ROC-AUC 0.97, training speedup) from training Logistic Regression, Decision Tree, and Random Forest on flow-based features extracted from CTU-13. No equations, derivations, ansatzes, or predictions appear. Baseline comparisons are external empirical claims, not reductions to self-defined inputs or self-citations. The central results are direct outputs of standard ML training and evaluation protocols.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that CTU-13 labels and flow features are reliable ground truth and that model hyperparameters were chosen without introducing selection bias. No new entities are postulated.

free parameters (2)
  • Random Forest hyperparameters (n_estimators, max_depth, etc.)
    Chosen during model training to optimize performance on the given dataset split.
  • Feature selection thresholds or preprocessing parameters
    Determined from the CTU-13 data to produce the reported metrics.
axioms (1)
  • Domain assumption: the CTU-13 dataset provides accurate labels and representative botnet traffic samples
    Used as the sole benchmark without additional validation or external datasets.

pith-pipeline@v0.9.0 · 5727 in / 1225 out tokens · 20975 ms · 2026-05-25T05:30:28.610814+00:00 · methodology

