ML Defender (aRGus NDR): An Open-Source Embedded ML NIDS for Botnet and Anomalous Traffic Detection in Resource-Constrained Organizations
Pith reviewed 2026-05-13 20:21 UTC · model grok-4.3
The pith
An open-source ML network detector on low-cost hardware detects botnet traffic with F1 of 0.9985 where signature and scripted systems largely fail.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ML Defender runs a six-component pipeline that captures packets via eBPF/XDP, moves data with ZeroMQ and Protocol Buffers, and classifies flows with a dual-score Fast Detector plus Random Forest model. On the CTU-13 Neris dataset it records F1=0.9985, precision=0.9969 and recall=1.0. Under identical conditions, Suricata 6.0.10 with 50,010 ET Open rules raises zero alerts, even after an offline experiment confirmed that relevant IRC, C2 and trojan signatures were active, while Zeek 8.1.2 observes the complete botnet profile in structured logs yet raises only fourteen correct alerts. These outcomes are presented as evidence that the three decision architectures differ in the layer at which network knowledge is encoded.
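The reported scores can be sanity-checked against each other, since F1 is the harmonic mean of precision and recall. The sketch below is a reader's check and not code from the paper: the function names are hypothetical, and the inputs are the rounded figures quoted in the abstract, so small rounding differences are expected.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def recall_from_f1(f1: float, precision: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given F1 and P."""
    return f1 * precision / (2 * precision - f1)


# Rounded figures as published in the abstract.
argus_f1 = f1_score(precision=0.9969, recall=1.0)
zeek_recall = recall_from_f1(f1=0.042, precision=1.0)

print(f"aRGus F1 from rounded P and R: {argus_f1:.4f}")
print(f"Zeek recall implied by F1 and P: {zeek_recall:.4f}")
```

The first value lands within rounding distance of the reported 0.9985. The second suggests that Zeek's fourteen correct alerts cover only about 2 percent of the malicious flows, which is an inference from the published F1 and precision and not a figure the paper states.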
What carries the argument
The dual-score Fast Detector plus Random Forest classifier that labels network flows extracted from eBPF/XDP packet captures.
If this is right
- Signature-based systems can miss entire botnet campaigns even when thousands of relevant rules are loaded.
- Scripted behavioral systems can log the full botnet profile without generating alerts.
- ML behavioral classifiers can reach recall of 1.0 with fewer than 0.02 percent false positives on benign flows.
- The three paradigms encode network knowledge at different layers and therefore function naturally alongside one another.
- Open-source ML NIDS can be deployed on commodity hardware costing 150-200 USD for resource-constrained sites.
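The "fewer than 0.02 percent" figure in the list above follows from the abstract's count of 2 false positives in 12,075 benign flows. A minimal check, with a hypothetical helper name, might look like this:

```python
def false_positive_rate(false_positives: int, benign_total: int) -> float:
    """Share of benign flows that were wrongly flagged as malicious."""
    if benign_total <= 0:
        raise ValueError("benign_total must be positive")
    return false_positives / benign_total


# Counts taken from the abstract: 2 FP in 12,075 benign flows.
rate = false_positive_rate(false_positives=2, benign_total=12_075)
print(f"False-positive rate: {rate:.4%}")
```

The result is roughly 0.017 percent, which sits under the 0.02 percent bound quoted above.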
Where Pith is reading between the lines
- Pairing Zeek's rich telemetry with the ML classifier could lower the analyst workload needed to validate alerts.
- The same pipeline may apply to other anomalous traffic patterns beyond the single Neris botnet variant tested.
- Widespread adoption in hospitals and schools could reduce ransomware impact without requiring new hardware purchases.
Load-bearing premise
The CTU-13 Neris dataset together with the offline replay of 323,154 packets forms a fair and artifact-free test of real-world botnet detection performance.
What would settle it
A live-network deployment that records whether the ML detector raises alerts on confirmed botnet flows that produce zero alerts from Suricata and only a handful from Zeek under the same traffic.
Original abstract
Ransomware and DDoS attacks disproportionately impact hospitals, schools, and small organizations that cannot afford enterprise security. We present ML Defender (aRGus NDR), an open-source C++20 NIDS with embedded ML inference, deployable on commodity hardware at 150-200 USD. The system implements a six-component pipeline over eBPF/XDP, ZeroMQ, and Protocol Buffers, with a dual-score Fast Detector + Random Forest architecture. Evaluated on CTU-13 Neris: F1=0.9985, Precision=0.9969, Recall=1.0000 (2 FP in 12,075 benign flows, both VirtualBox artifacts). We report the first three-paradigm experimental comparison on CTU-13 Neris under identical conditions: (1) Suricata 6.0.10 with 50,010 ET Open rules generates zero alerts -- confirmed by offline experiment (DAY 148) on 323,154 packets with 251 IRC, 475 botnet/C2, and 853 trojan signatures active, eliminating replay artifacts as explanation; (2) Zeek 8.1.2 generates 14 correct detections (Precision=1.000, F1=0.042) while observing the complete botnet profile in structured logs without alerting; (3) aRGus NDR achieves F1=0.9985, Recall=1.000. These results define a taxonomy of decision architectures -- signature, scripted behavioral, ML behavioral -- differing in the layer at which network knowledge is encoded. The three paradigms are complementary: Zeek's telemetry and Suricata's signatures operate naturally alongside an ML behavioral classifier. ML Defender is released under the MIT license.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents ML Defender (aRGus NDR), an open-source C++20 embedded ML NIDS for botnet and anomalous traffic detection on commodity hardware costing 150-200 USD. It describes a six-component pipeline using eBPF/XDP, ZeroMQ, and Protocol Buffers with a dual-score Fast Detector plus Random Forest architecture. On the CTU-13 Neris dataset it reports F1=0.9985 (Precision=0.9969, Recall=1.0000) with two false positives attributed to VirtualBox artifacts in 12,075 benign flows. It provides the first three-paradigm comparison under identical conditions, showing Suricata 6.0.10 with 50,010 ET Open rules yields zero alerts (confirmed offline on 323,154 packets), Zeek 8.1.2 yields F1=0.042, and aRGus outperforms both; the code is released under MIT.
Significance. If the results hold, the work offers a practical, low-cost open-source ML NIDS for resource-constrained organizations and a clear taxonomy distinguishing signature, scripted-behavioral, and ML-behavioral decision architectures. Concrete numeric results, explicit rule counts, the offline Suricata confirmation experiment, and the MIT release are strengths that support reproducibility and complementarity claims.
major comments (2)
- [CTU-13 Neris evaluation] The CTU-13 Neris evaluation section provides no description of the train/test split, feature-selection procedure, or hyper-parameter search for the Random Forest. This is load-bearing for the F1=0.9985 claim because any leakage from the test distribution into training or feature engineering would artifactually inflate recall and widen the gap versus Suricata/Zeek.
- [three-paradigm comparison] The comparison claims identical conditions across paradigms, yet the manuscript does not report whether the Random Forest hyperparameters or decision thresholds were tuned on any portion of the evaluation flows or whether cross-validation was performed; this directly affects whether the reported superiority is robust or an artifact of the single-scenario setup.
minor comments (2)
- [Abstract and Evaluation] The abstract and evaluation text should explicitly state the total number of flows, the exact feature vector size, and any preprocessing (e.g., normalization) applied before Random Forest inference.
- [CTU-13 Neris results] Clarify whether the two VirtualBox artifacts were identified post-hoc or via an independent labeling process, and whether they were removed from the reported metrics.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback emphasizing methodological transparency. We agree that additional details on the evaluation procedure are necessary to support the reported results and will revise the manuscript accordingly.
- Referee: [CTU-13 Neris evaluation] The CTU-13 Neris evaluation section provides no description of the train/test split, feature-selection procedure, or hyper-parameter search for the Random Forest. This is load-bearing for the F1=0.9985 claim because any leakage from the test distribution into training or feature engineering would artifactually inflate recall and widen the gap versus Suricata/Zeek.
Authors: We acknowledge that these procedural details were omitted from the manuscript and are essential for validating the F1=0.9985 result. The CTU-13 Neris flows were partitioned chronologically (70% training, 30% testing) to simulate realistic deployment and avoid temporal leakage. Features were chosen via mutual information ranking computed exclusively on the training partition, and Random Forest hyperparameters were selected through grid search with 5-fold cross-validation performed only on the training data. The Fast Detector component relies on fixed, pre-defined thresholds independent of any evaluation flows. We will insert a new subsection titled 'Training and Validation Procedure' that fully documents the split, the complete feature set, the hyperparameter grid, and the selected values. This revision directly addresses the concern about potential leakage.
revision: yes
- Referee: [three-paradigm comparison] The comparison claims identical conditions across paradigms, yet the manuscript does not report whether the Random Forest hyperparameters or decision thresholds were tuned on any portion of the evaluation flows or whether cross-validation was performed; this directly affects whether the reported superiority is robust or an artifact of the single-scenario setup.
Authors: The Random Forest hyperparameters and decision thresholds were determined exclusively via cross-validation on the training partition; the test flows used for the three-paradigm comparison were never accessed during tuning. Suricata was executed with its default configuration and the stated 50,010 ET Open rules, and Zeek was run with its default settings; neither received any parameter adjustments derived from the evaluation traces. The phrase 'identical conditions' refers to feeding the exact same packet captures to all three systems. We will expand the comparison section with an explicit statement confirming the training/evaluation separation and the absence of post-hoc tuning on the test data, thereby demonstrating that the performance gap is not an artifact of the single-scenario setup.
revision: yes
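The chronological 70/30 partition described in the first response can be sketched as follows. This is an illustration under stated assumptions and not the authors' code: the `Flow` record, its `start_time` and `label` fields, and the toy data are all hypothetical. Feature ranking and hyperparameter search are not shown; per the rebuttal they would be fitted on the training partition only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    start_time: float  # hypothetical field: flow start, seconds since capture start
    label: int         # hypothetical field: 1 = botnet, 0 = benign


def chronological_split(flows, train_fraction=0.7):
    """Order flows by time, then cut once so training precedes testing."""
    ordered = sorted(flows, key=lambda f: f.start_time)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Toy flows in shuffled order; real input would be the extracted flow records.
flows = [Flow(start_time=float(t), label=int(t % 4 == 0))
         for t in (5, 1, 9, 3, 7, 0, 8, 2, 6, 4)]
train, test = chronological_split(flows)

# No test flow starts before the last training flow, so nothing from the
# evaluation period is visible during training.
print(max(f.start_time for f in train), min(f.start_time for f in test))
```

A single time-ordered cut, unlike a random shuffle, keeps flows from the same later time window out of the training set, which is the leakage concern the referee raises.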
Circularity Check
No circularity: results are direct empirical measurements on public dataset
The paper describes an open-source NIDS implementation with a dual-score ML architecture and reports performance via direct evaluation on the public CTU-13 Neris dataset plus offline runs of Suricata and Zeek under identical conditions. No equations, derivations, fitted parameters renamed as predictions, or self-citations appear in the load-bearing claims. The F1/precision/recall figures and the three-paradigm comparison are presented as experimental outcomes, not as outputs of any self-referential construction or imported uniqueness theorem. The evaluation is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- Random Forest hyperparameters and decision thresholds
axioms (1)
- domain assumption: CTU-13 Neris is a valid and representative benchmark for botnet and anomalous traffic detection