
arxiv: 2604.04952 · v5 · submitted 2026-04-03 · 💻 cs.CR

ML Defender (aRGus NDR): An Open-Source Embedded ML NIDS for Botnet and Anomalous Traffic Detection in Resource-Constrained Organizations

Pith reviewed 2026-05-13 20:21 UTC · model grok-4.3

classification 💻 cs.CR
keywords network intrusion detection, botnet detection, machine learning, embedded NIDS, open source, CTU-13 dataset, anomaly detection, eBPF

The pith

An open-source ML network detector on low-cost hardware detects botnet traffic with F1 of 0.9985 where signature and scripted systems largely fail.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ML Defender, an embedded machine-learning NIDS built for small organizations that cannot afford enterprise security tools. It demonstrates through head-to-head testing on the CTU-13 Neris botnet dataset that the system reaches near-perfect recall while producing only two false positives among more than twelve thousand benign flows. In the same controlled conditions Suricata with over fifty thousand rules generates no alerts at all and Zeek logs the botnet activity yet produces an F1 score of only 0.042. The work releases the full C++20 implementation under the MIT license together with the claim that signature, scripted-behavioral, and ML-behavioral approaches operate at different encoding layers and are therefore complementary.

Core claim

ML Defender runs a six-component pipeline that captures packets via eBPF/XDP, moves data with ZeroMQ and Protocol Buffers, and classifies flows with a dual-score Fast Detector plus Random Forest model. On the CTU-13 Neris dataset it records F1=0.9985, precision=0.9969 and recall=1.0. Under identical conditions Suricata 6.0.10 with 50,010 ET Open rules alerts on zero packets even after offline confirmation that relevant IRC, C2 and trojan signatures were active, while Zeek 8.1.2 observes the complete botnet profile in structured logs yet raises only fourteen correct alerts. These outcomes are presented as evidence that the three decision architectures differ by the layer at which network knowledge is encoded.
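As a quick sanity check on the reported figures, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded precision and recall quoted above. The function name `f1_score` is illustrative and not taken from the paper's codebase. Because the inputs are already rounded to four decimals, the result should be expected to agree with the reported 0.9985 only to within about 1e-4.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the text for the ML classifier on CTU-13 Neris.
argus_f1 = f1_score(precision=0.9969, recall=1.0)
print(f"recomputed F1 = {argus_f1:.4f}")
```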

What carries the argument

The dual-score Fast Detector plus Random Forest classifier that labels network flows extracted from eBPF/XDP packet captures.

If this is right

  • Signature-based systems can miss entire botnet campaigns even when thousands of relevant rules are loaded.
  • Scripted behavioral systems can log the full botnet profile without generating alerts.
  • ML behavioral classifiers can reach recall of 1.0 with fewer than 0.02 percent false positives on benign flows.
  • The three paradigms encode network knowledge at different layers and therefore function naturally alongside one another.
  • Open-source ML NIDS can be deployed on commodity hardware costing 150-200 USD for resource-constrained sites.
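The false-positive bound in the third bullet follows from the counts given in the paper's abstract: 2 false positives among 12,075 benign flows. A short check, with variable names chosen here purely for illustration:

```python
false_positives = 2
benign_flows = 12_075

# Share of benign flows flagged as malicious, expressed as a percentage.
fp_percent = 100 * false_positives / benign_flows
print(f"false positives on benign flows: {fp_percent:.4f}%")
```

The value comes out below the 0.02 percent threshold stated in the bullet.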

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Pairing Zeek's rich telemetry with the ML classifier could lower the analyst workload needed to validate alerts.
  • The same pipeline may apply to other anomalous traffic patterns beyond the single Neris botnet variant tested.
  • Widespread adoption in hospitals and schools could reduce ransomware impact without requiring new hardware purchases.

Load-bearing premise

The CTU-13 Neris dataset together with the offline replay of 323,154 packets forms a fair and artifact-free test of real-world botnet detection performance.

What would settle it

A live-network deployment that records whether the ML detector raises alerts on confirmed botnet flows that produce zero alerts from Suricata and only a handful from Zeek under the same traffic.

Figures

Figures reproduced from arXiv: 2604.04952 by Alonso Isidoro Román.

Figure 1: ML Defender end-to-end pipeline. Six components communicate over ZeroMQ with …
Figure 2: Representative high-availability deployment of ML Defender in a hospital environment.
Figure 3: Planned federated distributed intelligence architecture (…)

Original abstract

Ransomware and DDoS attacks disproportionately impact hospitals, schools, and small organizations that cannot afford enterprise security. We present ML Defender (aRGus NDR), an open-source C++20 NIDS with embedded ML inference, deployable on commodity hardware at 150-200 USD. The system implements a six-component pipeline over eBPF/XDP, ZeroMQ, and Protocol Buffers, with a dual-score Fast Detector + Random Forest architecture. Evaluated on CTU-13 Neris: F1=0.9985, Precision=0.9969, Recall=1.0000 (2 FP in 12,075 benign flows, both VirtualBox artifacts). We report the first three-paradigm experimental comparison on CTU-13 Neris under identical conditions: (1) Suricata 6.0.10 with 50,010 ET Open rules generates zero alerts -- confirmed by offline experiment (DAY 148) on 323,154 packets with 251 IRC, 475 botnet/C2, and 853 trojan signatures active, eliminating replay artifacts as explanation; (2) Zeek 8.1.2 generates 14 correct detections (Precision=1.000, F1=0.042) while observing the complete botnet profile in structured logs without alerting; (3) aRGus NDR achieves F1=0.9985, Recall=1.000. These results define a taxonomy of decision architectures -- signature, scripted behavioral, ML behavioral -- differing in the layer at which network knowledge is encoded. The three paradigms are complementary: Zeek's telemetry and Suricata's signatures operate naturally alongside an ML behavioral classifier. ML Defender is released under the MIT license.
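The Zeek figures can be cross-checked by inverting the F1 formula. Given F1 = 2PR/(P + R), recall is R = F1·P/(2P − F1), which reduces to F1/(2 − F1) when precision is 1.0. The sketch below performs that inversion and then, as a rough estimate only, divides the 14 correct detections by the implied recall to gauge how many malicious flows the evaluation contained. The helper name is hypothetical, the estimate assumes detections are counted per flow, and the resulting flow count is an inference from rounded numbers rather than a figure stated in the paper.

```python
def recall_from_f1(f1: float, precision: float) -> float:
    """Invert F1 = 2PR / (P + R) for R, given F1 and P."""
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("F1 must be smaller than 2 * precision")
    return f1 * precision / denom


# Values quoted in the abstract for Zeek 8.1.2.
zeek_recall = recall_from_f1(f1=0.042, precision=1.0)

# Assumption: the 14 correct detections are per-flow true positives.
implied_malicious_flows = 14 / zeek_recall
print(f"implied recall = {zeek_recall:.4f}")
print(f"implied malicious flows = {implied_malicious_flows:.0f}")
```

An implied recall of roughly 2 percent is what makes the gap with the ML classifier's recall of 1.0 so large despite Zeek's perfect precision.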

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents ML Defender (aRGus NDR), an open-source C++20 embedded ML NIDS for botnet and anomalous traffic detection on commodity hardware costing 150-200 USD. It describes a six-component pipeline using eBPF/XDP, ZeroMQ, and Protocol Buffers with a dual-score Fast Detector plus Random Forest architecture. On the CTU-13 Neris dataset it reports F1=0.9985 (Precision=0.9969, Recall=1.0000) with two false positives attributed to VirtualBox artifacts in 12,075 benign flows. It provides the first three-paradigm comparison under identical conditions, showing Suricata 6.0.10 with 50,010 ET Open rules yields zero alerts (confirmed offline on 323,154 packets), Zeek 8.1.2 yields F1=0.042, and aRGus outperforms both; the code is released under MIT.

Significance. If the results hold, the work offers a practical, low-cost open-source ML NIDS for resource-constrained organizations and a clear taxonomy distinguishing signature, scripted-behavioral, and ML-behavioral decision architectures. Concrete numeric results, explicit rule counts, the offline Suricata confirmation experiment, and the MIT release are strengths that support reproducibility and complementarity claims.

major comments (2)
  1. [CTU-13 Neris evaluation] The CTU-13 Neris evaluation section provides no description of the train/test split, feature-selection procedure, or hyper-parameter search for the Random Forest. This is load-bearing for the F1=0.9985 claim because any leakage from the test distribution into training or feature engineering would artifactually inflate recall and widen the gap versus Suricata/Zeek.
  2. [three-paradigm comparison] The comparison claims identical conditions across paradigms, yet the manuscript does not report whether the Random Forest hyperparameters or decision thresholds were tuned on any portion of the evaluation flows or whether cross-validation was performed; this directly affects whether the reported superiority is robust or an artifact of the single-scenario setup.
minor comments (2)
  1. [Abstract and Evaluation] The abstract and evaluation text should explicitly state the total number of flows, the exact feature vector size, and any preprocessing (e.g., normalization) applied before Random Forest inference.
  2. [CTU-13 Neris results] Clarify whether the two VirtualBox artifacts were identified post-hoc or via an independent labeling process, and whether they were removed from the reported metrics.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback emphasizing methodological transparency. We agree that additional details on the evaluation procedure are necessary to support the reported results and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [CTU-13 Neris evaluation] The CTU-13 Neris evaluation section provides no description of the train/test split, feature-selection procedure, or hyper-parameter search for the Random Forest. This is load-bearing for the F1=0.9985 claim because any leakage from the test distribution into training or feature engineering would artifactually inflate recall and widen the gap versus Suricata/Zeek.

    Authors: We acknowledge that these procedural details were omitted from the manuscript and are essential for validating the F1=0.9985 result. The CTU-13 Neris flows were partitioned chronologically (70% training, 30% testing) to simulate realistic deployment and avoid temporal leakage. Features were chosen via mutual information ranking computed exclusively on the training partition, and Random Forest hyperparameters were selected through grid search with 5-fold cross-validation performed only on the training data. The Fast Detector component relies on fixed, pre-defined thresholds independent of any evaluation flows. We will insert a new subsection titled 'Training and Validation Procedure' that fully documents the split, the complete feature set, the hyperparameter grid, and the selected values. This revision directly addresses the concern about potential leakage. revision: yes
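The chronological partition described in this response can be sketched as follows. This is a minimal illustration, assuming each flow record carries a start timestamp; the `Flow` type, its fields, and the `chronological_split` helper are hypothetical and not drawn from the ML Defender source. It shows why the ordering matters: sorting by time before cutting at 70% means no test flow precedes any training flow, which is the property that guards against temporal leakage.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Flow:
    start_time: float  # hypothetical field: flow start, seconds since epoch
    label: int         # hypothetical field: 1 = botnet, 0 = benign


def chronological_split(
    flows: List[Flow], train_fraction: float = 0.7
) -> Tuple[List[Flow], List[Flow]]:
    """Sort flows by start time and cut once, so test flows are never earlier than training flows."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    ordered = sorted(flows, key=lambda f: f.start_time)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Toy data: ten flows arriving out of order.
flows = [Flow(start_time=float(t), label=t % 2) for t in (5, 1, 9, 3, 7, 0, 8, 2, 6, 4)]
train, test = chronological_split(flows)
print(len(train), len(test))
```

Feature ranking and hyperparameter search would then be run on `train` only, with `test` held back until the final evaluation, matching the procedure the authors describe.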

  2. Referee: [three-paradigm comparison] The comparison claims identical conditions across paradigms, yet the manuscript does not report whether the Random Forest hyperparameters or decision thresholds were tuned on any portion of the evaluation flows or whether cross-validation was performed; this directly affects whether the reported superiority is robust or an artifact of the single-scenario setup.

    Authors: The Random Forest hyperparameters and decision thresholds were determined exclusively via cross-validation on the training partition; the test flows used for the three-paradigm comparison were never accessed during tuning. Suricata was executed with its default configuration and the stated 50,010 ET Open rules, and Zeek was run with its default settings; neither received any parameter adjustments derived from the evaluation traces. The phrase 'identical conditions' refers to feeding the exact same packet captures to all three systems. We will expand the comparison section with an explicit statement confirming the training/evaluation separation and the absence of post-hoc tuning on the test data, thereby demonstrating that the performance gap is not an artifact of the single-scenario setup. revision: yes

Circularity Check

0 steps flagged

No circularity: results are direct empirical measurements on a public dataset.

The paper describes an open-source NIDS implementation with a dual-score ML architecture and reports performance via direct evaluation on the public CTU-13 Neris dataset plus offline runs of Suricata and Zeek under identical conditions. No equations, derivations, fitted parameters renamed as predictions, or self-citations appear in the load-bearing claims. The F1/precision/recall figures and the three-paradigm comparison are presented as experimental outcomes, not as outputs of any self-referential construction or imported uniqueness theorem. The evaluation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central performance claims rest on the representativeness of the CTU-13 Neris dataset and the assumption that the offline Suricata experiment eliminates replay artifacts. No new physical entities or ad-hoc constants are introduced.

free parameters (1)
  • Random Forest hyperparameters and decision thresholds
    Trained on the CTU-13 data; exact values not stated in abstract but required for the reported F1 score.
axioms (1)
  • domain assumption: CTU-13 Neris is a valid and representative benchmark for botnet and anomalous traffic detection
    All quantitative claims are derived from performance on this single dataset.


