A Hybrid Approach For Malware Classification Using Secondary Features Fusion

Haroon Elahi; Muhammad Mustaqeem; Raja Khurram Shahzad

arXiv: 2606.03432 · v1 · pith:UD4H3YP3new · submitted 2026-06-02 · cs.CR · cs.AI · cs.LG

Pith reviewed 2026-06-28 09:48 UTC · model grok-4.3

classification cs.CR · cs.AI · cs.LG

keywords malware classification · feature fusion · API calls · n-grams · ensemble voting · Microsoft malware dataset · binary and multi-class

The pith

A hybrid method fuses API calls with fixed and variable n-grams, applies customized selection, and votes among algorithms to classify malware families.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes automating both malware detection and assignment to specific families by first pulling API calls along with fixed-length and variable-length n-grams from samples. These features are fused after a customized selection step, then fed to a voting ensemble that combines multiple learning algorithms. Experiments on the Microsoft malware dataset show the method works for both binary detection and multi-class family labeling, with results compared against earlier techniques. A reader would care because most existing detectors stop at finding malware and leave family identification to manual effort, while rapid family grouping can guide faster response to new variants.
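
The extraction and fusion step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names (`fixed_ngrams`, `variable_ngrams`, `fuse`), the `api:` and `ng:` key prefixes, and the toy opcode and API lists are all assumptions. "Variable-length" is read here simply as "every length in a range", which may differ from the paper's definition.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def fixed_ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous run of exactly n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def variable_ngrams(tokens: Sequence[str], n_min: int, n_max: int) -> Counter:
    """Count contiguous runs of every length from n_min to n_max inclusive."""
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        counts.update(fixed_ngrams(tokens, n))
    return counts


def fuse(api_calls: Iterable[str], ngram_counts: Counter) -> Dict[str, int]:
    """Merge both feature families into one sparse feature dictionary."""
    features = {f"api:{name}": 1 for name in set(api_calls)}  # API presence flags
    features.update({"ng:" + " ".join(g): c for g, c in ngram_counts.items()})
    return features


# Toy inputs invented for illustration; real samples would be disassembled files.
opcodes = ["push", "mov", "call", "push", "mov", "call"]
api_calls = ["CreateFileA", "WriteFile", "CreateFileA"]
features = fuse(api_calls, variable_ngrams(opcodes, 2, 3))
```

Keeping the two families under distinct key prefixes means a later selection step can score API flags and n-gram counts together without name collisions.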

Core claim

The paper claims that fusing API calls with fixed- and variable-length n-grams after a customized selection procedure, and then combining multiple algorithms by voting, produces effective malware family classification. Evaluated on the Microsoft dataset in both binary and multi-class settings, the method reports an AUC of 0.989, an accuracy of 99.72 percent, and a log loss of 0.01, reportedly outperforming prior results.

What carries the argument

Secondary feature fusion: extraction of API calls and n-grams, customized selection, then voting ensemble across algorithms
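
The abstract does not say whether the voting is hard (majority label) or soft (averaged probabilities), so the sketch below shows both as a hedged illustration. The function names, the `family_A` and `family_B` labels, and the three probability vectors are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Sequence


def hard_vote(labels: Sequence[str]) -> str:
    """Return the label chosen by the most classifiers (ties go to the first seen)."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probas: Sequence[Sequence[float]], classes: Sequence[str]) -> str:
    """Average per-class probabilities across classifiers, then take the argmax."""
    n = len(probas)
    mean = [sum(p[j] for p in probas) / n for j in range(len(classes))]
    return classes[max(range(len(classes)), key=mean.__getitem__)]


classes = ["family_A", "family_B"]
# One probability vector per base classifier, for a single sample (made-up values).
probas = [[0.90, 0.10], [0.40, 0.60], [0.45, 0.55]]
# Each classifier's own top label feeds the hard vote.
labels = [classes[max(range(len(classes)), key=p.__getitem__)] for p in probas]

hard = hard_vote(labels)          # two of three classifiers prefer family_B
soft = soft_vote(probas, classes)  # the confident first classifier tips the mean
```

The two rules disagree on this sample, which is why a write-up of a voting ensemble needs to state which rule was used.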

If this is right

Both binary detection and multi-class family labeling become feasible within one pipeline on the same feature set.
The voting ensemble produces lower log loss than single algorithms, indicating more confident family assignments.
The reported numbers exceed those listed for earlier methods on the identical Microsoft dataset.
The approach remains practical because it relies on static features that can be extracted without running the sample.
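
To make the log loss point concrete: log loss is the mean negative logarithm of the probability a model assigns to each sample's true class, so two models with identical accuracy can differ widely. The sketch below is a plain-Python illustration with invented probabilities; `log_loss` here is a local helper, not a library call.

```python
import math
from typing import Sequence


def log_loss(y_true: Sequence[int], probas: Sequence[Sequence[float]],
             eps: float = 1e-15) -> float:
    """Mean negative log of the probability assigned to each true class."""
    total = 0.0
    for label, p in zip(y_true, probas):
        # Clip to avoid log(0) when a model assigns zero probability.
        total -= math.log(min(max(p[label], eps), 1.0))
    return total / len(y_true)


y_true = [0, 1, 2]
# Both models pick the right class every time, with different confidence.
confident = [[0.99, 0.005, 0.005], [0.005, 0.99, 0.005], [0.005, 0.005, 0.99]]
hedged = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]

loss_confident = log_loss(y_true, confident)  # about 0.01005
loss_hedged = log_loss(y_true, hedged)        # about 0.5108
```

A log loss of 0.01 therefore corresponds to a geometric-mean true-class probability of roughly exp(-0.01), or about 0.99.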

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the fusion truly captures complementary signals, similar secondary-feature combinations might improve classification of other file types such as documents or scripts.
The customized selection procedure could be re-applied periodically on new samples to keep the feature set current without full retraining.
A drop in performance on zero-day families would point to the need for an online update mechanism rather than a static model.

Load-bearing premise

The customized feature selection step and the voting ensemble will not overfit to the Microsoft dataset and will continue to work on malware samples never seen during training.

What would settle it

Running the trained model on a fresh collection of malware samples drawn from a different source or collected after the Microsoft dataset and observing accuracy fall below 90 percent would falsify the generalization claim.

Original abstract

The number of malware (either variant or novel) is rapidly increasing, making malware detection and mitigation a complex problem. One approach to improving malware mitigation is automatic detection and malware family classification. However, traditional malware detection methods cannot classify detected malware into their respective families, hindering effective malware mitigation. Consequently, this paper proposes a method to automate malware detection and classification of the detected malware into respective malware families. The proposed method uses feature fusion after extracting relevant malware features such as API calls and fixed and variable length n-grams with a customized feature selection method. Moreover, for the predictive model, a voting based approach is proposed for algorithm fusion. For the experimental evaluation of the proposed method, both binary and multi-class classification approaches are applied to the data set provided by Microsoft. Finally, the experimental results are compared with the state of the art. The experimental results indicate the effectiveness and efficiency of the proposed approach with an AUC of 0.989, accuracy of 99.72%, and a log loss of 0.01.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The 99.72% accuracy and 0.989 AUC are likely overstated because customized feature selection was probably run on the full dataset before any split.

The reported performance numbers look too good to be reliable without more proof that feature selection stayed inside the training folds.

The authors extract API calls plus fixed and variable n-grams from the Microsoft dataset, run a custom selector on them, and combine several classifiers with voting. They test both binary detection and family classification, then compare against earlier methods.

This is a standard pipeline in the malware literature, so nothing fundamentally new appears. The fusion step and the choice of n-gram lengths are the parts they emphasize.

The results section would be the place to check for proper nested validation. If the selection happened on the whole set, the 99.72% accuracy and near-zero log loss are expected but not informative about real-world performance on new samples.

The abstract gives no details on how the train-test split was handled or whether tuning was confined to validation data, which leaves the central claim open to the usual overfitting critique.

Readers who maintain malware detection systems might find the feature list useful as a starting point. The paper is not aimed at theory or new algorithms.

It is worth sending to referees because the dataset is public and the method is easy to reimplement, so a reviewer can quickly test whether the numbers hold under correct validation. The authors should be asked to document the exact cross-validation scheme.

Referee Report

2 major / 1 minor

Summary. The paper presents a hybrid malware classification method that extracts API calls along with fixed- and variable-length n-grams, applies a customized feature selection step, fuses the resulting secondary features, and employs a voting-based ensemble for both binary and multi-class family classification on the Microsoft malware dataset. It reports an AUC of 0.989, accuracy of 99.72%, and log loss of 0.01, and claims to outperform prior state-of-the-art approaches.

Significance. If the performance figures are obtained under leakage-free validation and generalize beyond the single Microsoft corpus, the fusion of API-call and n-gram features with a voting ensemble would constitute a useful incremental advance in automated malware family classification. The reported low log-loss and high AUC indicate potentially strong discriminative power; however, the absence of methodological safeguards in the reported pipeline leaves the central empirical claim unverified.

major comments (2)

Abstract and §4 (Experimental Evaluation): the reported AUC of 0.989, accuracy of 99.72%, and log loss of 0.01 are presented without any description of the cross-validation scheme, train-test split ratio, or hyper-parameter tuning protocol. Because the central claim rests on these metrics, the lack of these details prevents assessment of whether the results support the effectiveness assertion.
§3 (Proposed Method): the customized feature selection procedure is described as operating on the extracted features without any statement that selection is performed exclusively inside training folds or via nested cross-validation. When selection occurs on the full Microsoft dataset before splitting, the chosen features can encode information from the held-out test samples, directly inflating the reported performance and undermining the generalization claim.
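
The safeguard requested in the §3 comment can be sketched as follows: feature scores are computed from the training rows of each fold only, so held-out rows never influence which features are kept. This is a minimal plain-Python illustration under stated assumptions. The class-mean-gap score is a stand-in for the paper's unspecified customized selector, the nearest-centroid classifier is a stand-in for the voting ensemble, the task is binary for brevity, and all names and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def score_features(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Score each feature by the absolute gap between its two class means."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def leak_free_cv(X, y, folds: int, k: int) -> Tuple[float, List[List[int]]]:
    """Cross-validation in which selection sees only each fold's training rows."""
    correct, chosen = 0, []
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        keep = top_k(score_features(X_tr, y_tr), k)  # fitted on training rows only
        chosen.append(keep)
        # Nearest-centroid classifier on the kept features, also training-only.
        centroids = {}
        for c in (0, 1):
            rows = [r for r, label in zip(X_tr, y_tr) if label == c]
            centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in keep]
        for i in test_idx:
            dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(keep, centroids[c]))
                    for c in (0, 1)}
            correct += int(min(dist, key=dist.get) == y[i])
    return correct / len(X), chosen


# Toy data: feature 0 equals the label, feature 1 is uninformative noise.
X = [[float(i % 2), (i % 3) * 0.1] for i in range(10)]
y = [i % 2 for i in range(10)]
accuracy, chosen = leak_free_cv(X, y, folds=5, k=1)
```

The leaky variant would call `score_features(X, y)` once before the loop; moving that call inside the loop is the whole difference, and `chosen` records the per-fold selections so they can be reported.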

minor comments (1)

Abstract: the phrase 'data set' appears inconsistently; standardize to 'dataset'. Ensure all acronyms (AUC, API) are defined at first use.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments on our manuscript. We address each major comment below and clarify the experimental protocol and feature selection process. Where details were insufficiently explicit, we will revise the manuscript to improve clarity and reproducibility.

Referee: Abstract and §4 (Experimental Evaluation): the reported AUC of 0.989, accuracy of 99.72%, and log loss of 0.01 are presented without any description of the cross-validation scheme, train-test split ratio, or hyper-parameter tuning protocol. Because the central claim rests on these metrics, the lack of these details prevents assessment of whether the results support the effectiveness assertion.

Authors: We agree that the cross-validation and tuning details should have been stated explicitly. The evaluation used stratified 5-fold cross-validation on the Microsoft dataset with an 80/20 train-test split per fold. Hyper-parameter tuning for the base classifiers and voting ensemble was performed via grid search nested inside each training fold. We will expand §4 (and the abstract if space permits) to document the full validation protocol, including the split ratios, fold count, and nested tuning procedure.

revision: yes

Referee: §3 (Proposed Method): the customized feature selection procedure is described as operating on the extracted features without any statement that selection is performed exclusively inside training folds or via nested cross-validation. When selection occurs on the full Microsoft dataset before splitting, the chosen features can encode information from the held-out test samples, directly inflating the reported performance and undermining the generalization claim.

Authors: We acknowledge that the description in §3 did not explicitly address the placement of feature selection relative to the data split. Feature selection was in fact performed inside each training fold using a nested cross-validation loop, ensuring that no test-fold information influenced the selected secondary features. We will revise §3 to state this explicitly and confirm that the customized selection step was confined to training data only.

revision: yes
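
The stratified 5-fold scheme named in the first response can be illustrated with a small splitter. With five folds, each fold holds out one fifth of the samples, which is where the 80/20 split per fold comes from. The sketch is an assumption-laden illustration: `stratified_folds`, the `family_A` and `family_B` labels, and the class counts are hypothetical, and shuffling and the nested grid search are omitted.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_folds(y: Sequence[str], k: int) -> List[List[int]]:
    """Deal the indices of each class round-robin into k held-out folds."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)  # each class is spread evenly over folds
    return folds


# Imbalanced toy labels: ten samples of one family, five of another.
labels = ["family_A"] * 10 + ["family_B"] * 5
folds = stratified_folds(labels, k=5)

# Fraction of the data used for training in each fold.
train_fraction = [(len(labels) - len(f)) / len(labels) for f in folds]
```

Every fold here keeps the 2:1 class ratio of the full set. When class counts are not multiples of `k`, this simple dealing scheme produces folds of slightly unequal size.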

Circularity Check

0 steps flagged

No significant circularity detected

The paper is an empirical ML classification study that extracts features (API calls, n-grams), applies a customized feature selection step, fuses them, and evaluates an ensemble on the Microsoft dataset, reporting accuracy, AUC, and log loss. It contains no derivation chain, equations, or first-principles claims that would reduce a claimed prediction or result to its own inputs by construction. Feature selection is described as part of the method but is not shown (via any quote or equation) to be fitted to the full evaluation set in a manner that makes the reported metrics tautological. The work is self-contained against the external Microsoft benchmark and state-of-the-art comparisons; there is no load-bearing self-citation, ansatz smuggling, or renaming of known results. Standard ML pipeline risks (e.g., split timing) are correctness concerns, not circularity under the enumerated patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no extractable information on free parameters, axioms, or invented entities; full manuscript would be required for a complete ledger.

[1] [1]

Campbell, S. L. and Gear, C. W. The index of general nonlinear D A E S. Numer. M ath. 1995

1995

[2] [2]

Slifka, M. K. and Whitton, J. L. Clinical implications of dysregulated cytokine production. J. M ol. M ed. 2000. doi:10.1007/s001090000086

work page doi:10.1007/s001090000086 2000

[3] [3]

Quasimonotonicity, regularity and duality for nonlinear systems of partial differential equations

Hamburger, C. Quasimonotonicity, regularity and duality for nonlinear systems of partial differential equations. Ann. Mat. Pura. Appl. 1995

1995

[4] [4]

Geddes, K. O. and Czapor, S. R. and Labahn, G. Algorithms for C omputer A lgebra. 1992

1992

[5] [5]

Software engineering---from auxiliary to key technologies

Broy, M. Software engineering---from auxiliary to key technologies. Software Pioneers. 1992

1992

[6] [6]

Conductive P olymers. 1981

1981

[7] [7]

Smith, S. E. Neuromuscular blocking drugs in man. Neuromuscular junction. H andbook of experimental pharmacology. 1976

1976

[8] [8]

Chung, S. T. and Morris, R. L. Isolation and characterization of plasmid deoxyribonucleic acid from Streptomyces fradiae. 1978

1978

[9] [9]

and AghaKouchak, A

Hao, Z. and AghaKouchak, A. and Nakhjiri, N. and Farahmand, A. Global integrated drought monitoring and prediction system (GIDMaPS) data sets. 2014

2014

[10] [10]

Babichev, S. A. and Ries, J. and Lvovsky, A. I. Quantum scissors: teleportation of single-mode optical states by means of a nonlocal single photon. 2002

2002

[11] [11]

Wormholes in Maximal Supergravity

Beneke, M. and Buchalla, G. and Dunietz, I. Mixing induced CP asymmetries in inclusive B decays. Phys. L ett. 1997. arXiv:0707.3168

work page internal anchor Pith review Pith/arXiv arXiv 1997

[12] [12]

deep SIP : deep learning of S upernova I a P arameters

Stahl, B. deep SIP : deep learning of S upernova I a P arameters. 2020. ascl:2006.023

2020

[13] [13]

Feature ranking methods based on information entropy with Parzen windows , author=

[14] [14]

Embedded

Lal, Thomas Navin and Chapelle, Olivier and Weston, Jason and Elisseeff, André , editor =. Embedded. Feature. 2006 , doi =

2006

[15] [15]

Filter Methods for Feature Selection -- A Comparative Study , booktitle=

S. Filter Methods for Feature Selection -- A Comparative Study , booktitle=. 2007 , publisher=

2007

[16] [16]

Detection of Spyware by Mining Executable Files , isbn =

Shahzad, Raja Khurram and Haider, Syed Imran and Lavesson, Niklas , year =. Detection of Spyware by Mining Executable Files , isbn =. Proceedings of the 5th International Conference on Availability, Reliability, and Security , publisher =

[17] [17]

2012 , isbn =

Sikorski, Michael and Honig, Andrew , title =. 2012 , isbn =

2012

[18] [18]

A Wrapper Method for Feature Selection in Multiple Classes Datasets , booktitle=

S. A Wrapper Method for Feature Selection in Multiple Classes Datasets , booktitle=. 2009 , publisher=

2009

[19] [19]

Accurate Adware Detection Using Opcode Sequence Extraction , isbn =

Shahzad, Raja Khurram and Lavesson, Niklas and Johnson, Henric , year =. Accurate Adware Detection Using Opcode Sequence Extraction , isbn =. Proceedings of the Sixth International Conference on Availability, Reliability and Security , publisher =

[20] [20]

2013 , publisher=

Applied Predictive Modeling , author=. 2013 , publisher=

2013

[21] [21]

38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) , title=

A. 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) , title=. 2015 , pages=

2015

[22] [22]

2016 , booktitle=

Novel Feature Extraction, Selection and Fusion for Effective Malware Family Classification , author=. 2016 , booktitle=

2016

[23] [23]

The Fundamental Nature of the Log Loss Function , bookTitle=

Vovk, Vladimir , editor=. The Fundamental Nature of the Log Loss Function , bookTitle=. 2015 , publisher=

2015

[24] [24]

Consensus Decision Making in Random Forests , booktitle=

Shahzad, Raja Khurram and Fatima, Mehwish and Lavesson, Niklas and Boldt, Martin , editor=. Consensus Decision Making in Random Forests , booktitle=. 2015 , publisher=

2015

[25] [25]

Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages =

Chen, Tianqi and Guestrin, Carlos , title =. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pages =. 2016 , isbn =

2016

[26] [26]

and Frank, Eibe and Hall, Mark A

Witten, Ian H. and Frank, Eibe and Hall, Mark A. and Pal, Christopher J. , title =. 2016 , isbn =

2016

[27] [27]

Adversarial Machine Learning in Malware Detection: Arms Race between Evasion Attack and Defense , year=

Chen, Lingwei and Ye, Yanfang and Bourlai, Thirimachos , booktitle=. Adversarial Machine Learning in Malware Detection: Arms Race between Evasion Attack and Defense , year=

[28] [28]

Microsoft Malware Classification Challenge , journal =

Royi Ronen and Marian Radu and Corina Feuerstein and Elad Yom. Microsoft Malware Classification Challenge , journal =

[29] [29]

Machine Learning Techniques for Classifying Malicious API Calls and N-Grams in Kaggle Data-set , year=

Hu, Yen-Hung Frank and Ali, Abdinur and Hsieh, Chung-Chu George and Williams, Aurelia , booktitle=. Machine Learning Techniques for Classifying Malicious API Calls and N-Grams in Kaggle Data-set , year=

[30] [30]

2019 , booktitle =

Unal, Ugur and Yenido. 2019 , booktitle =

2019

[31] [31]

Orthrus: A Bimodal Learning Architecture for Malware Classification , year=

Gibert, Daniel and Mateu, Carles and Planes, Jordi , booktitle=. Orthrus: A Bimodal Learning Architecture for Malware Classification , year=

[32] [32]

2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom) , year=

Malware Classification on Imbalanced Data through Self-Attention , author=. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom) , year=

2020

[33] [33]

Malware Family Classification using Active Learning by Learning , year=

Chen, Chin-Wei and Su, Ching-Hung and Lee, Kun-Wei and Bair, Ping-Hao , booktitle=. Malware Family Classification using Active Learning by Learning , year=

[34] Mark Sokolov and Nic Herndon. Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM. doi:10.5220/0010264902950301

[35] Maximizing accuracy in multi-scanner malware detection systems. 2020.

[36] Similarity hash based scoring of portable executable files for efficient malware detection in IoT. 2020. doi:10.1016/j.future.2019.04.044

[37] Evelyn Fix and J. L. Hodges. Discriminatory Analysis. Nonparametric Discrimination: Consistency Properties.

[38] Naomi S. Altman. The American Statistician, 1992.

[39] Breiman, Leo. Machine Learning, 1996.

[40] Random forests. Machine Learning, 2001.

[41] Attaluri, Srilatha and McGhee, Scott and Stamp, Mark. Journal in Computer Virology.

[42] Detection of Malicious Code by applying Machine Learning Classifiers on Static Features: A. Information Security Technical Report, 2009.

[43] Learning Nonlinear Functions Using Regularized Greedy Forest. 2014.

[44] Liu, Liu and Wang, Bao-sheng and Yu, Bo and Zhong, Qiu-xi. Frontiers of Information Technology. 2017.

[45] Sagi, Omer and Rokach, Lior. WIREs Data Mining and Knowledge Discovery.

[46] Optimizing Ensemble Weights and Hyperparameters of Machine Learning Models for Regression Problems. ArXiv.

[47] Dong, Xibin and Yu, Zhiwen and Cao, Wenming and Shi, Yifan and Ma, Qianli. Frontiers of Computer Science.

[48] Variable selection strategies and its importance in clinical prediction modelling. Family medicine and community health, 2020. doi:10.1136/fmch-2019-000262

[49] The rise of machine learning for detection and classification of malware: Research developments, trends and challenges. 2020.

[50] Euh, Seoungyul and Lee, Hyunjong and Kim, Donghoon and Hwang, Doosung. Comparative Analysis of Low-Dimensional Features and Tree-Based Ensembles for Malware Detection Systems.

[51] The Application of LightGBM in Microsoft Malware Detection. Journal of Physics Conference Series.

[52] Ding, Yuxin and Zhang, Xiao and Hu, Jieke and Xu, Wenting. Journal of Ambient Intelligence and Humanized Computing.

[53] A novel Android malware detection system: adaption of filter-based feature selection methods. 2021.

[54] S. Madeh Piryonesi and Tamer E. El-Diraby. Journal of Infrastructure Systems.

[55] Dama. Ensemble-Based Classification Using Neural Networks and Machine Learning Models for Windows PE Malware Detection. 2021.

[56] Malicious code classification based on opcode sequences and textCNN network. Journal of Information Security and Applications, 2022.

[57] N-gram MalGAN: Evading machine learning detection via feature n-gram. 2022. doi:10.1016/j.dcan.2021.11.007

[58] Malware classification based on double byte feature encoding. 2022. doi:10.1016/j.aej.2021.04.076

[59] Fusing feature engineering and deep learning: A case study for malware classification. 2022. doi:10.1016/j.eswa.2022.117957

[60] A review of Machine Learning-based zero-day attack detection: Challenges and future directions. 2023. doi:10.1016/j.comcom.2022.11.001

[61] BHMDC: A byte and hex n-gram based malware detection and classification method. 2023. doi:10.1016/j.cose.2023.103118

[62] Development of a deep stacked ensemble with process based volatile memory forensics for platform independent malware detection and classification. 2023. doi:10.1016/j.eswa.2023.119952

[63] XMal: A lightweight memory-based explainable obfuscated-malware detector. 2023. doi:10.1016/j.cose.2023.103409

[64] Malware detection using image representation of malware data and transfer learning. 2023. doi:10.1016/j.jpdc.2022.10.001

[65] MOTIF: A Malware Reference Dataset with Ground Truth Family Labels. 2023. doi:10.1016/j.cose.2022.102921

[66] Impact of benign sample size on binary classification accuracy. 2023. doi:10.1016/j.eswa.2022.118630