
arxiv: 1906.09715 · v1 · pith:Q75F4VML · submitted 2019-06-24 · 💻 cs.CR · cs.NI

EDIMA: Early Detection of IoT Malware Network Activity Using Machine Learning Techniques

Pith reviewed 2026-05-25 17:50 UTC · model grok-4.3

classification 💻 cs.CR cs.NI
keywords IoT malware detection, machine learning, early detection, network traffic classification, distributed system, scanning phase, Mirai variants, large-scale networks

The pith

EDIMA detects IoT malware scanning activity early using machine learning in large networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EDIMA as a distributed modular system that classifies network traffic from IoT devices to spot malware during the scanning and infecting stage. It combines machine learning models on edge traffic, a feature vector database, a policy module, and an optional sub-sampling component. Evaluation occurs through testbed experiments that measure how well the classifiers separate malware patterns from normal IoT traffic. A sympathetic reader would care because most existing defenses react only after devices have been compromised and used in attacks such as DDoS. If the approach works, network operators gain a window to intervene before infections spread across large ISP or enterprise environments.

Core claim

EDIMA is a distributed modular solution that employs machine learning algorithms for edge devices' traffic classification, a packet traffic feature vector database, a policy module and an optional packet sub-sampling module to detect IoT malware network activity during the scanning/infecting phase rather than during an attack in large-scale networks.

What carries the argument

Machine learning algorithms that classify edge-device traffic, supported by a packet feature vector database and policy module, to identify malware scanning patterns before attacks occur.
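
As a rough illustration of the classification step, the following is a minimal sketch of a k-nearest-neighbours classifier (k-NN is one of the three classifier types the paper reports) in plain Python. The two features, the toy training vectors, and `k=3` are assumptions made for this example; they are not taken from the paper, whose actual feature set is not described on this page.

```python
import math
from collections import Counter

# Hypothetical feature vectors, invented for illustration only:
# (new destination IPs contacted per time window, fraction of TCP SYN packets).
# Labels: 1 = scanning-like traffic, 0 = benign IoT traffic.
TRAIN = [
    ((2.0, 0.05), 0),
    ((3.0, 0.10), 0),
    ((1.0, 0.02), 0),
    ((120.0, 0.90), 1),
    ((95.0, 0.85), 1),
    ((150.0, 0.95), 1),
]


def knn_predict(sample, train=TRAIN, k=3):
    """Return the majority label among the k training points nearest to sample."""
    by_distance = sorted(train, key=lambda item: math.dist(sample, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(knn_predict((110.0, 0.88)))  # scanning-like sample
print(knn_predict((2.5, 0.07)))    # benign-like sample
```

The features here are left unscaled to keep the sketch short; with real traffic features on very different scales, normalising them before computing distances would usually matter.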

If this is right

  • Operators of large networks can intervene during the scanning phase to limit device infections.
  • New malware variants that target software vulnerabilities instead of open ports become detectable through traffic patterns.
  • Detection shifts from reactive attack response to proactive scanning-phase blocking.
  • The modular design allows deployment at ISP or enterprise scale without requiring changes at every device.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same traffic-classification approach could be tested against other network threats that begin with scanning behavior.
  • Integration with existing packet-filtering tools might extend the time window for blocking before infection completes.
  • Results from the testbed suggest the need to measure how sub-sampling affects detection latency in high-volume environments.
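
One way to reason about the sub-sampling question is a small simulation. The sketch below assumes a deterministic 1-in-N packet sampler and a hypothetical detection rule that raises an alert once a fixed number of scan packets has been sampled. Neither the sampler nor the rule is taken from the paper, and the traffic pattern is invented.

```python
def packets_until_alert(is_scan_packet, sample_every, alert_threshold):
    """Return the 1-based index of the packet at which an alert fires, or None.

    is_scan_packet:  sequence of booleans, one per packet in arrival order.
    sample_every:    keep every N-th packet (1 means no sub-sampling).
    alert_threshold: hypothetical number of sampled scan packets needed to alert.
    """
    seen = 0
    for index, is_scan in enumerate(is_scan_packet, start=1):
        if index % sample_every != 0:
            continue  # packet dropped by the sub-sampler
        if is_scan:
            seen += 1
            if seen >= alert_threshold:
                return index
    return None


# Invented traffic: every 4th packet sent by the device is a scan probe.
traffic = [(i % 4 == 0) for i in range(1, 2001)]

for n in (1, 3, 5):
    print(n, packets_until_alert(traffic, sample_every=n, alert_threshold=10))
```

With this invented traffic the alert fires at packet 40 without sub-sampling, at packet 120 with 1-in-3 sampling, and at packet 200 with 1-in-5 sampling. A deterministic sampler can also alias with periodic traffic (for example, 1-in-4 sampling of this trace keeps only scan packets), which is one reason a real study might prefer random sampling.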

Load-bearing premise

Machine learning models trained and tested in a controlled testbed will continue to distinguish malware scanning traffic from normal IoT device traffic when deployed across real large-scale networks.

What would settle it

Classification accuracy falls below usable levels when the trained models are applied to live traffic captured from an actual ISP or enterprise network rather than the testbed.

Figures

Figures reproduced from arXiv: 1906.09715 by Ayush Kumar, Teng Joon Lim.

Figure 1: EDIMA Architecture

IV. EXTRACTION OF IOT MALWARE TRAFFIC FEATURES
A. Malware Categorization
We have categorized known IoT malware into three categories based on type of vulnerability that they target: TELNET, HTTP POST and HTTP GET. TELNET is an application-layer protocol used for bidirectional byte-oriented communication. Typically, a user with a terminal and running a TELNET client program, accesses a r…
Figure 2: Testbed used to collect packet traffic for ML training

B. Evaluation Methodology
As we can’t use real malware due to legal and ethical considerations, we wrote scripts to simulate the generation of malicious packets based on publicly available exploits [30] for the vulnerabilities exploited by those malware. The script generates random IP addresses and sends malicious requests to them in order to execute r…
Figure 3: Distribution of feature vector values

TABLE II: Accuracy, Precision, Recall and F1 scores for various classifiers

Classifier            Accuracy  Precision  Recall  F1 Score
Random Forest         88.8%     0.86       1       0.92
k-NN                  94.44%    0.92       1       0.96
Gaussian Naive Bayes  77.78%    0.75       1       0.86

… conditions are not easily distinguishable, it may impair the detection performance. The scikit-learn ML algorithms library [31] was used for training and…
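
The F1 scores in Table II can be checked against the precision and recall values listed beside them, since F1 is the harmonic mean of the two. The short sketch below only redoes that arithmetic on the figures reported in the table; it does not reproduce the paper's experiments.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as listed in Table II.
TABLE_II = {
    "Random Forest": (0.86, 1.0, 0.92),
    "k-NN": (0.92, 1.0, 0.96),
    "Gaussian Naive Bayes": (0.75, 1.0, 0.86),
}

for name, (p, r, reported) in TABLE_II.items():
    computed = f1_score(p, r)
    print(f"{name}: computed F1 = {computed:.2f}, reported = {reported:.2f}")
```

For all three classifiers the computed value rounds to the reported one, so the table is internally consistent on this point.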
Abstract

The widespread adoption of Internet of Things has led to many security issues. Post the Mirai-based DDoS attack in 2016 which compromised IoT devices, a host of new malware using Mirai's leaked source code and targeting IoT devices have cropped up, e.g. Satori, Reaper, Amnesia, Masuta etc. These malware exploit software vulnerabilities to infect IoT devices instead of open TELNET ports (like Mirai) making them more difficult to block using existing solutions such as firewalls. In this research, we present EDIMA, a distributed modular solution which can be used towards the detection of IoT malware network activity in large-scale networks (e.g. ISP, enterprise networks) during the scanning/infecting phase rather than during an attack. EDIMA employs machine learning algorithms for edge devices' traffic classification, a packet traffic feature vector database, a policy module and an optional packet sub-sampling module. We evaluate the classification performance of EDIMA through testbed experiments and present the results obtained.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript introduces EDIMA, a distributed modular system for early detection of IoT malware network activity (e.g., Mirai variants) in large-scale networks such as ISPs or enterprises. It uses machine learning for classifying edge-device traffic during the scanning/infecting phase (rather than during attacks), supported by a packet feature-vector database, a policy module, and an optional sub-sampling module; performance is evaluated via testbed experiments.

Significance. If the central claim holds, EDIMA would provide a practical, proactive defense against IoT botnets by enabling detection before DDoS attacks materialize, which is valuable for network operators facing the post-Mirai threat landscape.

major comments (1)
  1. [Evaluation] Evaluation section (testbed experiments): The reported results are confined to a controlled testbed with a limited set of devices and malware samples. No cross-validation against external traces, assessment under realistic traffic diversity or concept drift, or quantification of false-positive rates at ISP/enterprise scale is provided, leaving the deployment claim for large-scale networks dependent on an untested extrapolation.
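
The concern about false-positive rates at scale can be made concrete with a base-rate calculation. The sketch below is illustrative only: the device count, infected fraction, and false-positive rate are hypothetical inputs chosen for the example, not measurements from the paper.

```python
def alerts_at_scale(n_devices, infected_fraction, recall, false_positive_rate):
    """Return (true alerts, false alerts, precision) for a monitored population."""
    infected = n_devices * infected_fraction
    benign = n_devices - infected
    true_alerts = recall * infected
    false_alerts = false_positive_rate * benign
    precision = true_alerts / (true_alerts + false_alerts)
    return true_alerts, false_alerts, precision


# Hypothetical deployment: one million devices, 0.1% of them scanning,
# perfect recall, and a 1% false-positive rate on benign devices.
true_alerts, false_alerts, precision = alerts_at_scale(
    n_devices=1_000_000,
    infected_fraction=0.001,
    recall=1.0,
    false_positive_rate=0.01,
)
print(f"true alerts: {true_alerts:.0f}")
print(f"false alerts: {false_alerts:.0f}")
print(f"precision at scale: {precision:.3f}")
```

Under these assumed inputs, false alerts outnumber true alerts by roughly ten to one even with perfect recall, which is why a false-positive rate measured on a small testbed says little about operator workload in a large network.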

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and constructive feedback on our manuscript. We address the major comment on the evaluation below.

  1. Referee: [Evaluation] Evaluation section (testbed experiments): The reported results are confined to a controlled testbed with a limited set of devices and malware samples. No cross-validation against external traces, assessment under realistic traffic diversity or concept drift, or quantification of false-positive rates at ISP/enterprise scale is provided, leaving the deployment claim for large-scale networks dependent on an untested extrapolation.

    Authors: We agree that the evaluation is confined to testbed experiments using a limited set of devices and malware samples, without cross-validation on external traces, explicit assessment of concept drift, or direct measurement of false-positive rates at ISP/enterprise scale. The testbed was designed to isolate and replicate the scanning/infecting phase traffic patterns of Mirai variants in a repeatable manner, enabling focused assessment of the machine learning classifiers and modular components. We will revise the manuscript to add an explicit limitations subsection in the evaluation and conclusions, discussing the controlled nature of the experiments, the absence of large-scale or external-trace validation, potential impacts of traffic diversity and concept drift, and the fact that large-scale deployment claims rely on the modular architecture rather than direct empirical evidence at that scale. This will temper the extrapolation while preserving the contribution as a proof-of-concept demonstration. revision: partial

Circularity Check

0 steps flagged

No circularity; experimental claims rest on testbed evaluation


The paper introduces the EDIMA system and reports classification performance from controlled testbed experiments with specific devices and malware samples. No equations, parameter-fitting steps, self-citations used as load-bearing uniqueness theorems, or renamings of known results appear in the provided text. The central claim is an empirical assertion about detection in the scanning phase, evaluated directly rather than derived from internal definitions or prior self-referential results. This is a standard non-circular experimental paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no information on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5707 in / 1119 out tokens · 42775 ms · 2026-05-25T17:50:11.479171+00:00 · methodology


Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · 2 internal anchors

  1. [1]

    Internet of Things: A Survey on Enabling Technologies, Protocols, and Applications,

    A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, “Internet of Things: A Survey on Enabling Technologies, Protocols, and Applications,” IEEE Communications Surveys & Tutorials, vol. 17, no. 4, pp. 2347–2376, Fourthquarter 2015

  2. [2]

    A Survey on Security and Privacy Issues in Internet-of-Things,

    Y. Yang, L. Wu, G. Yin, L. Li, and H. Zhao, “A Survey on Security and Privacy Issues in Internet-of-Things,” IEEE Internet of Things Journal, vol. 4, no. 5, pp. 1250–1258, Oct 2017

  3. [3]

    Hacked Cameras, DVRs Powered Today’s Massive Internet Outage,

    B. Krebs, “Hacked Cameras, DVRs Powered Today’s Massive Internet Outage,” https://krebsonsecurity.com/2016/10/hacked-cameras-dvrs-powered-todays-massive-internet-outage/, Oct 2016

  4. [4]

    IoT Botnet Used in Website Hacking Attacks,

    I. Arghire, “IoT Botnet Used in Website Hacking Attacks,” https://www.securityweek.com/iot-botnet-used-website-hacking-attacks

  5. [5]

    Mirai Code Still Runs on Many IoT Devices,

    I. Ilascu, “Mirai Code Still Runs on Many IoT Devices,” https://www.bitdefender.com/box/blog/iot-news/mirai-code-still-runs-many-iot-devices/

  6. [6]

    Reaper Madness,

    A. Team, “Reaper Madness,” https://asert.arbornetworks.com/reaper-madness/

  7. [7]

    New trends in the world of IoT threats,

    Y. S. M. Kuzin and V. Kuskov, “New trends in the world of IoT threats,” https://securelist.com/new-trends-in-the-world-of-iot-threats/87991/

  8. [8]

    Handling a trillion (unfixable) flaws on a billion devices: Rethinking network security for the internet-of-things,

    T. Yu, V. Sekar, S. Seshan, Y. Agarwal, and C. Xu, “Handling a trillion (unfixable) flaws on a billion devices: Rethinking network security for the internet-of-things,” in Proceedings of the 14th ACM Workshop on Hot Topics in Networks, ser. HotNets-XIV. New York, NY, USA: ACM, 2015, pp. 5:1–5:7. [Online]. Available: http://doi.acm.org/10.1145/2834050.2834095

  9. [9]

    Bothunter: Detecting malware infection through ids-driven dialog correlation,

    G. Gu, P. Porras, V. Yegneswaran, and M. Fong, “Bothunter: Detecting malware infection through ids-driven dialog correlation,” in 16th USENIX Security Symposium (USENIX Security 07). Boston, MA: USENIX Association, 2007. [Online]. Available: https://www.usenix.org/conference/16th-usenix-security-symposium/bothunter-detecting-malware-infection-through-i...

  10. [10]

    BotSniffer: Detecting Botnet Command and Control Channels in Network Traffic,

    G. Gu, J. Zhang, and W. Lee, “BotSniffer: Detecting Botnet Command and Control Channels in Network Traffic,” in Network and Distributed System Security Symposium (NDSS), 2008

  11. [11]

    BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-independent Botnet Detection,

    G. Gu, R. Perdisci, J. Zhang, and W. Lee, “BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-independent Botnet Detection,” in Proceedings of the 17th Conference on Security Symposium, ser. SS’08. Berkeley, CA, USA: USENIX Association, 2008, pp. 139–154. [Online]. Available: http://dl.acm.org/citation.cfm?id=1496711.1496721

  12. [12]

    Heimdall: Mitigating the Internet of Insecure Things,

    J. Habibi, D. Midi, A. Mudgerikar, and E. Bertino, “Heimdall: Mitigating the Internet of Insecure Things,” IEEE Internet of Things Journal, vol. 4, no. 4, pp. 968–978, Aug 2017

  13. [13]

    A Two-layer Dimension Reduction and Two-tier Classification Model for Anomaly-Based Intrusion Detection in IoT Backbone Networks,

    H. H. Pajouh, R. Javidan, R. Khayami, D. Ali, and K. K. R. Choo, “A Two-layer Dimension Reduction and Two-tier Classification Model for Anomaly-Based Intrusion Detection in IoT Backbone Networks,” IEEE Transactions on Emerging Topics in Computing, vol. PP, no. 99, pp. 1–1, 2016

  14. [15]
  15. [16]

    Deft: A distributed iot fingerprinting technique,

    V. Thangavelu, D. M. Divakaran, R. Sairam, S. S. Bhunia, and M. Gurusamy, “Deft: A distributed iot fingerprinting technique,” IEEE Internet of Things Journal, pp. 1–1, 2018

  16. [17]

    Clear as mud: Generating, validating and applying iot behavioral profiles,

    A. Hamza, D. Ranathunga, H. H. Gharakheili, M. Roughan, and V. Sivaraman, “Clear as mud: Generating, validating and applying iot behavioral profiles,” in Proceedings of the 2018 Workshop on IoT Security and Privacy, ser. IoT S&P ’18. New York, NY, USA: ACM, 2018, pp. 8–14. [Online]. Available: http://doi.acm.org/10.1145/3229565.3229566

  17. [18]

    DÏoT: A Federated Self-learning Anomaly Detection System for IoT

    T. D. Nguyen, S. Marchal, M. Miettinen, M. H. Dang, N. Asokan, and A. Sadeghi, “Dïot: A crowdsourced self-learning approach for detecting compromised iot devices,” CoRR, vol. abs/1804.07474, 2018. [Online]. Available: http://arxiv.org/abs/1804.07474

  18. [19]

    Using Machine Learning Techniques to Identify Botnet Traffic,

    C. Livadas, R. Walsh, D. Lapsley, and W. T. Strayer, “Using Machine Learning Techniques to Identify Botnet Traffic,” in Proceedings. 2006 31st IEEE Conference on Local Computer Networks, Nov 2006, pp. 967–974

  19. [20]

    Early Detection Of Mirai-Like IoT Bots In Large-Scale Networks Through Sub-Sampled Packet Traffic Analysis,

    A. Kumar and T. J. Lim, “Early Detection Of Mirai-Like IoT Bots In Large-Scale Networks Through Sub-Sampled Packet Traffic Analysis,” in Proceedings of the 2nd Future of Information and Communication Conference, Springer Lecture Notes in Networks and Systems (To be published), vol. 70, 2019

  20. [21]

    DDoS in the IoT: Mirai and Other Botnets,

    C. Kolias, G. Kambourakis, A. Stavrou, and J. Voas, “DDoS in the IoT: Mirai and Other Botnets,” Computer, vol. 50, no. 7, pp. 80–84, 2017

  21. [22]

    Hajime worm battles Mirai for control of the Internet of Things,

    W. Grange, “Hajime worm battles Mirai for control of the Internet of Things,” https://symc.ly/2q1q7Mj

  22. [23]

    Your Linux-based home router could succumb to a new Telnet worm, Remaiten,

    L. Constantin, “Your Linux-based home router could succumb to a new Telnet worm, Remaiten,” https://bit.ly/2QUIADs

  23. [24]

    Is there an Internet-of-Things vigilante out there?

    M. Ballano, “Is there an Internet-of-Things vigilante out there?” https://symc.ly/2Hh9fuB

  24. [25]

    BrickerBot Author Retires Claiming to Have Bricked over 10 Million IoT Devices,

    C. Cimpanu, “BrickerBot Author Retires Claiming to Have Bricked over 10 Million IoT Devices,” https://bit.ly/2BkaUvd

  25. [26]

    IoT Malware Evolves to Harvest Bots by Exploiting a Zero-day Home Router Vulnerability,

    C. X. C. Zheng and Y. Jia, “IoT Malware Evolves to Harvest Bots by Exploiting a Zero-day Home Router Vulnerability,” https://bit.ly/2SVVh2x

  26. [27]

    Masuta: Satori Creators’ Second Botnet Weaponizes A New Router Exploit,

    A. Anubhav, “Masuta: Satori Creators’ Second Botnet Weaponizes A New Router Exploit,” https://bit.ly/2FGgav7

  27. [28]

    Linux Worm Targeting Hidden Devices,

    K. Hayashi, “Linux Worm Targeting Hidden Devices,” https://symc.ly/2CnM786

  28. [29]

    Reaper Botnet,

    Radware, “Reaper Botnet,” https://bit.ly/2HeVMn5

  29. [30]

    New IoT/Linux Malware Targets DVRs, Forms Botnet,

    C. Z. C. Xiao and Y. Jia, “New IoT/Linux Malware Targets DVRs, Forms Botnet,” https://bit.ly/2VX7JRN

  30. [31]

    Exploit Database,

    O. Security, “Exploit Database,” https://www.exploit-db.com/

  31. [32]

    Machine Learning in Python,

    scikit learn, “Machine Learning in Python,” https://scikit-learn.org/stable/