pith.

arxiv: 1906.12198 · v1 · pith:S4HACGI2 · submitted 2019-06-27 · 💻 cs.CR · cs.LG

A New Malware Detection System Using a High Performance-ELM method

Pith reviewed 2026-05-25 14:40 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords: malware detection · extreme learning machine · HP-ELM · anomaly detection · cybersecurity · CTU-13 dataset · machine learning

The pith

A High Performance Extreme Learning Machine reaches 0.9592 accuracy on malware detection using only the top three features.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies a High Performance Extreme Learning Machine to two standard malware datasets to identify anomalies that affect mobile devices and cyberspace infrastructure. Through comparisons with other learning methods, it reports that HP-ELM reaches its highest accuracy when restricted to the top three features and a single activation function. On the authors' account, HP-ELM is more accurate than the other learning methods tested on the same data. If the finding holds, it points to a way of catching malware from only a few input features with a fast, closed-form training step. Such an approach could support faster screening in security systems that monitor device behavior.

Core claim

The authors test the High Performance Extreme Learning Machine (HP-ELM) on the CTU-13 and Malware datasets for anomaly detection. Based on extensive comparisons, they report a peak accuracy of 0.9592 when the method uses only the top three features together with one activation function.

What carries the argument

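High Performance Extreme Learning Machine (HP-ELM) is an optimized implementation of the extreme learning machine. An ELM is a single-hidden-layer feedforward network whose hidden weights are drawn at random and kept fixed, so training reduces to a least-squares solve for the output weights. Here it is used for fast classification of network and malware data.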

If this is right

  • Malware detection can succeed with only three selected features instead of full feature sets.
  • HP-ELM outperforms other machine learning baselines on both the CTU-13 and Malware datasets.
  • A single activation function suffices to reach the method's best accuracy.
  • The approach applies across two distinct malware datasets without dataset-specific tuning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same feature-reduction step could lower the compute needed for real-time monitoring on resource-limited devices.
  • HP-ELM might extend to related tasks such as network intrusion detection if similar top-feature patterns appear there.
  • Testing on newer malware samples collected after the original datasets would check whether the accuracy holds over time.

Load-bearing premise

The accuracy number comes from a fair comparison on representative malware data without post-selection of features or samples that inflates the result.
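This premise hinges on where the "top three features" came from. The sketch below shows one leakage-free pattern: the data are split first, and the feature ranking is computed on the training rows only. The abstract does not name its ranking criterion. The Fisher score (between-class scatter over within-class scatter) is used here purely as a hypothetical stand-in, and `fisher_scores`, `top_k_features` and `holdout_split` are illustrative names.

```python
import random
from collections import defaultdict
from statistics import fmean, pvariance


def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter over within-class scatter."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    scores = []
    for j in range(len(X[0])):
        overall = fmean(row[j] for row in X)
        between = within = 0.0
        for rows in by_class.values():
            col = [r[j] for r in rows]
            between += len(col) * (fmean(col) - overall) ** 2
            within += len(col) * pvariance(col)
        if within > 0:
            scores.append(between / within)
        else:
            # Constant within every class: perfectly separating if class means differ.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k_features(X_train, y_train, k=3):
    """Indices of the k highest-scoring features, ranked on training rows only."""
    scores = fisher_scores(X_train, y_train)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def holdout_split(X, y, test_fraction=0.3, seed=0):
    """Shuffle once, then split into (X_train, y_train, X_test, y_test)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    tr, te = idx[:cut], idx[cut:]
    return ([X[i] for i in tr], [y[i] for i in tr],
            [X[i] for i in te], [y[i] for i in te])
```

To use it, call `holdout_split` first and pass only the training rows to `top_k_features`. Then project both splits onto the returned indices. Ranking on the full dataset before splitting would let test rows influence the feature choice, which is exactly the inflation this premise rules out.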

What would settle it

An independent run of HP-ELM on the CTU-13 dataset with the identical top three features and activation function that yields accuracy below 0.9 would falsify the reported performance.
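A fixed 0.9 cutoff ignores sampling error in the replication itself. A more careful check assumes the replication reports the number of correctly classified test samples out of `n`. It computes a Wilson score interval for the replicated accuracy and asks whether 0.9592 falls inside it. The function names here are hypothetical. The abstract does not state the original test-set size, so this sketch accounts only for the replication's uncertainty, not for that of the original estimate.

```python
import math


def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a proportion; z = 1.96 gives roughly 95% coverage."""
    if n <= 0 or not 0 <= correct <= n:
        raise ValueError("need n > 0 and 0 <= correct <= n")
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def replication_contradicts(reported_acc, correct, n, z=1.96):
    """True if the reported accuracy lies outside the replication's interval."""
    low, high = wilson_interval(correct, n, z)
    return not (low <= reported_acc <= high)
```

For example, a replication scoring 880 of 1,000 test samples yields an interval that excludes 0.9592, whereas 955 of 1,000 does not.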

Original abstract

A vital element of a cyberspace infrastructure is cybersecurity. Many protocols proposed for security issues, which leads to anomalies that affect the related infrastructure of cyberspace. Machine learning (ML) methods used to mitigate anomalies behavior in mobile devices. This paper aims to apply a High Performance Extreme Learning Machine (HP-ELM) to detect possible anomalies in two malware datasets. Two widely used datasets (the CTU-13 and Malware) are used to test the effectiveness of HP-ELM. Extensive comparisons are carried out in order to validate the effectiveness of the HP-ELM learning method. The experiment results demonstrate that the HP-ELM was the highest accuracy of performance of 0.9592 for the top 3 features with one activation function.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes applying a High Performance Extreme Learning Machine (HP-ELM) to malware detection on the CTU-13 and Malware datasets. It claims that extensive comparisons show HP-ELM achieves the highest accuracy of 0.9592 using the top 3 features with one activation function.

Significance. If the reported accuracy holds under a reproducible protocol with no data leakage and fair baselines, the work would add an empirical data point on ELM variants for malware classification. However, the absence of any methodological detail means the result cannot currently be evaluated or compared to existing literature.

major comments (2)
  1. [Abstract] Abstract: The abstract reports 0.9592 accuracy for the top-3 features but supplies no experimental protocol, train/test split, cross-validation scheme, baseline algorithms, or statistical tests. Without these, the number cannot be verified as a valid out-of-sample result.
  2. [Abstract] Abstract: The phrase 'top 3 features' is used without describing the feature-ranking procedure. If the ranking was computed on the full dataset, that is, before the train/test split so that test samples influenced which features were kept, the reported accuracy is inflated by leakage and does not support the 'highest performance' assertion.
minor comments (1)
  1. [Abstract] Abstract: The sentence 'Many protocols proposed for security issues, which leads to anomalies that affect the related infrastructure of cyberspace' is grammatically unclear and should be rewritten for readability.
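Both major comments amount to asking for a protocol in which feature ranking is part of training. The following minimal sketch shows k-fold cross-validation with the ranking and the fitting redone inside every fold, so held-out rows never influence which features are kept. `mean_gap_ranking` and `nearest_centroid` are hypothetical stand-ins for the paper's unstated ranking criterion and for HP-ELM. They were chosen only to keep the example short and dependency-free.

```python
import random
from statistics import fmean


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def mean_gap_ranking(X, y, n_keep):
    """Stand-in ranker: features with the largest gap between the two class means."""
    gaps = []
    for j in range(len(X[0])):
        neg = [row[j] for row, t in zip(X, y) if t == 0]
        pos = [row[j] for row, t in zip(X, y) if t == 1]
        gaps.append(abs(fmean(pos) - fmean(neg)))
    return sorted(range(len(gaps)), key=gaps.__getitem__, reverse=True)[:n_keep]


def nearest_centroid(X, y):
    """Stand-in classifier: assign each row to the closest class centroid."""
    centroids = {}
    for c in set(y):
        rows = [row for row, t in zip(X, y) if t == c]
        centroids[c] = [fmean(col) for col in zip(*rows)]

    def predict(row):
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))

    return predict


def cross_validated_accuracy(X, y, n_keep=3, k=5, seed=0):
    """k-fold CV in which feature ranking and fitting see only the training folds."""
    fold_accs = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        keep = mean_gap_ranking(X_tr, y_tr, n_keep)  # ranking never sees test_idx
        predict = nearest_centroid([[row[j] for j in keep] for row in X_tr], y_tr)
        correct = sum(predict([X[i][j] for j in keep]) == y[i] for i in test_idx)
        fold_accs.append(correct / len(test_idx))
    return fmean(fold_accs), fold_accs
```

Returning the per-fold accuracies alongside their mean also gives the spread that a statistical comparison against baselines would need.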

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for highlighting the lack of methodological transparency in the abstract. We agree that the current version does not supply sufficient detail for reproducibility or to rule out data leakage, and we will revise the manuscript to address both points.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The abstract reports 0.9592 accuracy for the top-3 features but supplies no experimental protocol, train/test split, cross-validation scheme, baseline algorithms, or statistical tests. Without these, the number cannot be verified as a valid out-of-sample result.

    Authors: We acknowledge that the abstract (and, upon re-examination, the full text) provides no explicit protocol, split, CV scheme, baselines, or statistical tests. The manuscript therefore cannot currently support the claimed result as a verified out-of-sample finding. In the revised version we will add a concise experimental protocol to both the abstract and methods section, specifying the train/test split, cross-validation procedure, baseline algorithms, and any statistical testing performed. revision: yes

  2. Referee: [Abstract] Abstract: The phrase 'top 3 features' is used without describing the feature-ranking procedure. If the ranking was computed on the full dataset, that is, before the train/test split so that test samples influenced which features were kept, the reported accuracy is inflated by leakage and does not support the 'highest performance' assertion.

    Authors: We agree that the absence of any description of the feature-ranking procedure leaves open the possibility of data leakage and prevents evaluation of the claim. The current manuscript contains no such description. We will revise the text to state exactly how the top-3 features were selected (including whether selection was performed inside the training fold only) and will add this information to the abstract and methods. revision: yes

Circularity Check

0 steps flagged

No derivation chain; purely empirical accuracy on external datasets

Full rationale

The paper applies HP-ELM to two malware datasets (CTU-13 and Malware) and reports an empirical accuracy of 0.9592 for the top-3 features. No equations, derivations, predictions, or first-principles results are claimed. The central claim is a performance number obtained from external data, with no internal reduction of outputs to fitted inputs or self-citations that bear the load. This is self-contained empirical work with no circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review; the paper appears to rest on standard supervised classification assumptions (i.i.d. samples, fixed feature extraction) and the existence of the two named public datasets. No free parameters, new axioms, or invented entities are described.

axioms (1)
  • domain assumption: Standard supervised learning assumptions hold for the CTU-13 and Malware datasets
    Implicit in any ML classification experiment on these datasets


