A Multi-task Mixture of Experts Framework for Malware Classification, Packing Detection, and Family Attribution

Antonino Nocera; Anvin Mariya P. B.; Asmitha K. A.; Jithin S.; Roshin Sleeba C.; Serena Nicolazzo; Vinod P.

arxiv: 2606.30572 · v1 · pith:3X37PK43new · submitted 2026-06-29 · 💻 cs.CR · cs.AI

Pith reviewed 2026-06-30 04:58 UTC · model grok-4.3

keywords: malware classification, mixture of experts, multi-task learning, packing detection, family attribution, adversarial robustness, portable executable analysis, EMBER features

The pith

A Multi-Gate Mixture of Experts model handles malware family classification, packing detection, and malware-versus-benign identification jointly, reaching a combined detection rate of 0.9744 (2.56% failure rate) and showing improved robustness to mutated samples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Malware analysis faces challenges from packed binaries, diverse families, and heterogeneity that cause single models to degrade on obfuscated or rare samples. The paper tests Mixture of Experts architectures on both high-dimensional EMBER features and raw byte sequences from PE files to handle three tasks at once: family attribution, packed versus unpacked detection, and malware versus benign classification. Multiple variants are compared in standard and mutation-based adversarial settings, with the multi-gate version delivering the highest combined detection rate and lowest failure rate while maintaining performance when samples are altered.
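As a rough illustration of the raw-byte input mentioned above, the sketch below turns the leading bytes of a file into a fixed-length 1D array. The 1024-byte length follows the Figure 1 caption; the zero-padding and the scaling to [0, 1] are assumptions, and `bytes_to_1d_array` is a hypothetical helper rather than the authors' preprocessing code.

```python
import numpy as np


def bytes_to_1d_array(raw: bytes, length: int = 1024) -> np.ndarray:
    """Return the first `length` bytes as float32 values in [0, 1].

    Inputs shorter than `length` are zero-padded (an assumption, not taken
    from the paper).
    """
    head = np.frombuffer(raw[:length], dtype=np.uint8)
    out = np.zeros(length, dtype=np.float32)
    out[: head.size] = head / 255.0
    return out


# Toy input: the "MZ" magic bytes of a PE header followed by ten zero bytes.
sample = b"MZ" + bytes(10)
vec = bytes_to_1d_array(sample)
print(vec.shape, vec[:2])
```

In practice the `raw` argument would come from reading a PE file in binary mode; a fixed length keeps every sample the same shape for the expert networks.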

Core claim

The Multi-Gate MoE model achieves the best performance, reaching a combined detection rate of 0.9744 with only 2.56% failure rate. Moreover, this configuration exhibits improved robustness under mutation-induced distribution shifts.
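The two headline numbers appear to be complements of each other. The short check below assumes that the failure rate is simply one minus the combined detection rate; the paper's exact definition is not given in this summary. The second value pair comes from the Figure 8 caption (CDR 0.9755, ASR 2.45%).

```python
def failure_rate_percent(cdr: float) -> float:
    """Failure rate in percent, ASSUMING failure = 1 - combined detection rate."""
    return round((1.0 - cdr) * 100.0, 2)


headline = failure_rate_percent(0.9744)  # value quoted in the abstract
figure8 = failure_rate_percent(0.9755)   # value quoted in the Figure 8 caption
print(headline, figure8)
```

Both reported pairs are consistent with that reading, which suggests the two metrics carry the same information.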

What carries the argument

Multi-Gate MoE (MMoE) architecture that uses multiple adaptive gating mechanisms to route inputs across specialized expert networks for concurrent task-specific learning.
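A minimal NumPy sketch of the multi-gate routing idea may help here. It follows the weighted-sum form quoted in the Figure 3 caption, where a task representation m_t is the sum over experts of w_{t,e} * f_e(x). Several things are assumptions for illustration only: the softmax used to turn raw gate scores into weights, the single-layer ReLU experts, the linear towers, all dimensions, and the class name `TinyMMoE`. It performs a forward pass only, with random weights, and is not the authors' implementation.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class TinyMMoE:
    """Forward-only multi-gate MoE: shared experts, one gate and tower per task."""

    def __init__(self, in_dim, expert_dim, n_experts, task_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.experts = [rng.normal(0, 0.1, (in_dim, expert_dim)) for _ in range(n_experts)]
        self.gates = {t: rng.normal(0, 0.1, (in_dim, n_experts)) for t in task_classes}
        self.towers = {t: rng.normal(0, 0.1, (expert_dim, c)) for t, c in task_classes.items()}

    def forward(self, x):
        # Expert outputs f_e(x), stacked to shape (batch, n_experts, expert_dim).
        f = np.stack([np.maximum(x @ w, 0.0) for w in self.experts], axis=1)
        probs, weights = {}, {}
        for t in self.gates:
            w_t = softmax(x @ self.gates[t])       # gating weights w_{t,e}
            m_t = np.einsum("be,bed->bd", w_t, f)  # m_t = sum_e w_{t,e} * f_e(x)
            probs[t] = softmax(m_t @ self.towers[t])
            weights[t] = w_t
        return probs, weights


# Hypothetical sizes: 64 input features, 10 families, two binary tasks.
tasks = {"family": 10, "packed": 2, "malicious": 2}
model = TinyMMoE(in_dim=64, expert_dim=32, n_experts=4, task_classes=tasks)
x = np.random.default_rng(1).normal(size=(4, 64))
probs, weights = model.forward(x)
print({t: p.shape for t, p in probs.items()})
```

Because each task owns its gate, the three tasks can weight the same shared experts differently for the same input, which is the property the summary attributes to the multi-gate variant.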

If this is right

Expert specialization allows the system to manage heterogeneous malware distributions more effectively than single-model approaches.
The framework supports simultaneous execution of multiple analysis tasks without requiring separate models for each.
Task-specific routing improves handling of obfuscated and rare samples in both standard and adversarial conditions.
The approach offers a scalable path toward resilient detection systems that adapt to distribution shifts from mutations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same gating structure might transfer to other security domains that involve concurrent classification tasks on binary data.
Additional experiments on larger and more temporally diverse malware corpora would be required to verify whether the observed robustness persists outside the evaluated mutation types.
Alternative input representations, such as graph-based or behavioral features, could be routed through the same multi-gate setup to test further gains.

Load-bearing premise

The performance gains observed on the tested datasets and mutations will continue to hold for diverse unseen real-world malware without significant degradation.

What would settle it

Evaluating the trained Multi-Gate MoE on a fresh set of malware binaries collected from a different time window or source that contain novel packing methods and families absent from the original training and mutation experiments.
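One way such a test set could be assembled is sketched below. The record fields (`first_seen`, `family`), the cutoff date, and the function `temporal_novel_split` are hypothetical; the sketch only shows the split logic of training on earlier samples and separating later samples into known and previously unseen families.

```python
from datetime import date


def temporal_novel_split(samples, cutoff):
    """Split records into train (before cutoff), later-known, and later-novel sets."""
    train = [s for s in samples if s["first_seen"] < cutoff]
    seen_families = {s["family"] for s in train}
    later = [s for s in samples if s["first_seen"] >= cutoff]
    known = [s for s in later if s["family"] in seen_families]
    novel = [s for s in later if s["family"] not in seen_families]
    return train, known, novel


# Toy records with made-up family names and dates.
records = [
    {"family": "famA", "first_seen": date(2024, 1, 10)},
    {"family": "famB", "first_seen": date(2024, 3, 5)},
    {"family": "famA", "first_seen": date(2025, 2, 1)},
    {"family": "famC", "first_seen": date(2025, 4, 20)},
]
train, known, novel = temporal_novel_split(records, cutoff=date(2025, 1, 1))
print(len(train), len(known), len(novel))
```

Reporting the detection rate separately on the `known` and `novel` groups would show how much of any drop comes from temporal drift versus unseen families.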

Figures

Figures reproduced from arXiv: 2606.30572 by Antonino Nocera, Anvin Mariya P. B., Asmitha K. A., Jithin S., Roshin Sleeba C., Serena Nicolazzo, Vinod P.

**Figure 1.** 1D grayscale visualizations generated from the first 1024 bytes of malware samples. where $\exp\left(-\lvert d - o_i \rvert^2\right)$ represents the Gaussian likelihood of expert $i$ producing the target $d$ [16]. This probabilistic formulation models the output as a mixture of Gaussian experts and encourages competitive learning. 3.2.1 Homogeneous Mixture of Experts: The Homogeneous Mixture of Experts (Homo-MoE) model consists of three…

**Figure 2.** The MoE framework for Windows malware classification

**Figure 3.** Architecture of the proposed Multi-Gate Mixture of Experts (MMoE) framework for multi-task malware analysis. where $z_{t,e}$ is the raw score assigned to expert $e$ by task $t$, and $w_{t,e}$ represents the final calculated gating weight. Next, a task-specific feature representation $m_t$ is constructed as a weighted sum of all expert outputs based on the routing parameters from Equation (13): $m_t = \sum_{e=1}^{E} w_{t,e} \cdot f_e(x)$ (14)…

**Figure 4.** Impact of the reconstruction regularization coefficient (…

**Figure 5.** Impact of the reconstruction regularization coefficient (…

**Figure 6.** Ablation analysis of tower size variation across different expert architectures on Dataset D1. The configuration utilizing shared experts of size [256, 128] and tower size of 64 yielded the minimum failure count.

**Figure 7.** Distribution of malware samples across different malware families (excluding benign samples)

**Figure 8.** The t-SNE projections on the experiments which gave the best results across MMoE, HeteroMoE, and HomoMoE configurations collectively illustrate the relationship between architectural design, input representation, and learned feature quality. MMoE, operating on EMBER features with Dataset D3 (Seed 750), achieves the highest CDR (0.9755) and lowest ASR (2.45%), with its t-SNE projections reflecting well-sepa…

**Figure 9.** Comparison of EMBER feature representations and 1D image representations (1024-byte length) across Homogeneous MoE, Heterogeneous MoE, and Multi-Gate MoE (MMoE) architectures under standard, mutation-augmented, and adversarial evaluation settings. Standard evaluation corresponds to training on 53,120 original samples and testing on 13,280 original test samples, while mutation-augmented evaluation incorpora…

**Figure 10.** Comparison of Mixture of Experts (MoE) architectures under different evaluation settings using EMBER feature representations. The first setting corresponds to training on 53,120 original samples and evaluating on 13,280 original test samples. The second setting incorporates 4% mutation-based augmented samples into the training set and evaluates the models on the same 13,280 original test samples. The t…

**Figure 11.** Performance Analysis of Mixture of Experts (MoE) models using 1D image representations at a length of 1024 bytes under standard and adversarial conditions. The first configuration uses raw sequential inputs to track the standard baseline performance on dataset D4. In the second approach, 1,392 structurally altered malware samples are used to test the models’ adversarial robustness. The graph displays the …
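The sample counts quoted in the Figure 9 and Figure 10 captions imply a particular train/test proportion. The check below only does that arithmetic; it assumes the two counts are disjoint parts of one dataset, which the captions suggest but do not state outright.

```python
from fractions import Fraction

train_n, test_n = 53_120, 13_280  # counts quoted in the figure captions
total = train_n + test_n
train_share = Fraction(train_n, total)
print(total, train_share, 1 - train_share)
```

Under that assumption the standard setting corresponds to a four-to-one train/test split.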

Abstract

Malware classification remains a challenging problem due to its inherent heterogeneity, the presence of packed binaries, and the diverse distribution of malware families. Traditional single-model detection mechanisms often fail to generalize across such diverse data, leading to degraded performance, particularly on obfuscated and rare malware samples. In this work, we propose a unified multi-task malware analysis framework based on Mixture of Experts (MoE) architectures. The proposed system evaluates performance across two different input representations, i.e., high-dimensional EMBER feature sets and raw 1D byte arrays extracted from Portable Executable files. It simultaneously performs three critical tasks: malware family classification, packed versus unpacked detection, and malware versus benign identification. By decomposing the problem into specialized expert networks and employing adaptive gating mechanisms, the model enables effective task-specific learning while maintaining overall scalability. We investigate multiple architectural variants, including Homogeneous MoE, Heterogeneous MoE, and Multi-Gate MoE (MMoE). Performance is evaluated in both standard and adversarial settings using original and mutated samples. The obtained results demonstrate that the Multi-Gate MoE model achieves the best performance, reaching a combined detection rate of 0.9744 with only $2.56\%$ failure rate. Moreover, this configuration exhibits improved robustness under mutation-induced distribution shifts. Our findings highlight the effectiveness of expert specialization and task-specific routing in handling complex malware distributions, making the proposed framework a promising direction for scalable and resilient malware detection systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper applies MoE variants to three malware tasks on EMBER and byte inputs and reports a 0.9744 combined rate with mutation robustness, but thin experimental details make the generalization claim the main uncertainty.

The core thing to know is that this is a straightforward application of multi-gate mixture of experts to joint malware family classification, packing detection, and benign/malware identification. They run it on two input types and test both clean and mutated samples, with the multi-gate version coming out on top at 0.9744 combined detection and 2.56% failure.

What the work actually does is decompose the heterogeneous problem into expert subnetworks with adaptive routing, then compare homogeneous, heterogeneous, and multi-gate setups. The mutation tests are a reasonable step toward checking robustness, and the fact that they evaluate on both feature vectors and raw bytes gives a bit more coverage than single-representation papers.

The soft spots sit in the experimental section. The abstract supplies the headline numbers but no dataset sizes, train/test splits, baseline models, or how the mutations were constructed. Without those, it is difficult to tell whether the reported gains are stable or whether the expert specialization would hold on new families or packing methods that differ from the test mutations. The generalization step from their specific shifts to broader real-world drift is the least secured part of the argument.

This is the kind of paper that would interest applied security researchers who already work with EMBER-style features and want to try multi-task routing. It is not a foundational advance, but the architecture variants are described clearly enough that someone could reimplement the idea.

I would send it to peer review. The central claim is testable once the missing experimental details are supplied, and the multi-task framing is a legitimate direction even if the current evidence is preliminary.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes a multi-task Mixture of Experts (MoE) framework for simultaneous malware family classification, packing detection, and malware-vs-benign identification. It evaluates three architectural variants (Homogeneous MoE, Heterogeneous MoE, Multi-Gate MoE) on two input representations (EMBER features and raw 1D byte arrays from PE files), reporting results in both standard and mutation-induced adversarial settings. The central claim is that the Multi-Gate MoE variant achieves the highest combined detection rate of 0.9744 (2.56% failure) and exhibits improved robustness to the tested distribution shifts.

Significance. If the empirical results are reproducible and the experimental design adequately controls for dataset composition and mutation coverage, the work would demonstrate that adaptive task-specific gating can improve handling of heterogeneous and obfuscated malware data across multiple related tasks. The dual use of hand-crafted features and raw bytes, together with explicit multi-task evaluation, would constitute a concrete contribution to scalable malware analysis pipelines.

major comments (2)

[Abstract] Abstract: The headline performance figures (combined detection rate 0.9744, 2.56% failure) are presented without any accompanying information on dataset cardinality, class balance, train/test split ratios, number of families, or the precise procedure used to generate the mutated samples. These details are load-bearing for assessing whether the reported numbers support the superiority and robustness claims.
[Abstract] Abstract (robustness statement): The assertion of 'improved robustness under mutation-induced distribution shifts' rests on the unexamined assumption that the paper's chosen mutations adequately sample the space of real-world packing, obfuscation, and family drift. No quantitative comparison to held-out real-world samples or discussion of mutation coverage is supplied, directly affecting the generalization claim identified in the stress-test.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the detailed comments on the abstract and the robustness claims. We address each point below, agreeing where revisions are warranted and providing clarifications based on the manuscript content.

Referee: [Abstract] Abstract: The headline performance figures (combined detection rate 0.9744, 2.56% failure) are presented without any accompanying information on dataset cardinality, class balance, train/test split ratios, number of families, or the precise procedure used to generate the mutated samples. These details are load-bearing for assessing whether the reported numbers support the superiority and robustness claims.

Authors: We agree that the abstract would benefit from including summary experimental details to better support interpretation of the results. The full manuscript provides these in Section 3 (Dataset Description) and Section 4 (Experimental Setup), covering dataset cardinality, class balance, train/test splits, number of families, and the mutation generation procedure. We will revise the abstract to concisely incorporate key elements such as dataset size, number of families, split ratios, and a high-level description of the mutations, while respecting length constraints. revision: yes
Referee: [Abstract] Abstract (robustness statement): The assertion of 'improved robustness under mutation-induced distribution shifts' rests on the unexamined assumption that the paper's chosen mutations adequately sample the space of real-world packing, obfuscation, and family drift. No quantitative comparison to held-out real-world samples or discussion of mutation coverage is supplied, directly affecting the generalization claim identified in the stress-test.

Authors: The robustness claim is relative to the other MoE variants under the specific controlled mutations tested (detailed in Section 4), which simulate common packing and obfuscation techniques. We do not claim exhaustive coverage of real-world shifts. We will add discussion in the revised manuscript on the mutation types, their rationale, and limitations regarding generalization. We cannot supply new quantitative comparisons to held-out real-world samples, as that would require additional experiments beyond the current scope. revision: partial

standing simulated objections not resolved

Quantitative comparison to held-out real-world samples for mutation coverage

Circularity Check

0 steps flagged

No circularity: purely empirical ML evaluation with no derivations or self-referential reductions

The paper reports performance metrics (e.g., 0.9744 combined detection rate) obtained by training and evaluating MoE variants on fixed datasets and mutations. No equations, first-principles derivations, or load-bearing self-citations appear in the provided text; results are direct outputs of standard supervised learning and testing procedures. The central claims rest on experimental outcomes rather than any reduction of predictions to fitted inputs or imported uniqueness theorems. This is the expected non-finding for an applied ML architecture paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract; no explicit free parameters, axioms, or invented entities are described in sufficient detail to enumerate.

Reference graph

Works this paper leans on

37 extracted references · 1 canonical work pages · 1 internal anchor

[1]

Malwarebazaar: Malware sample exchange platform

abuse.ch. Malwarebazaar: Malware sample exchange platform. https://bazaar.abuse.ch/, 2024. Accessed: 2026

2024
[2]

Miracle: Malware image recognition and classification by layered extraction

Inzamamul Alam, Md Samiullah, SM Asaduzzaman, Upama Kabir, AM Aahad, and Simon S Woo. Miracle: Malware image recognition and classification by layered extraction. Data Mining and Knowledge Discovery, 39(1):10, 2025

2025
[3]

Fasnet: Federated adversarial siamese networks for robust malware image classification

Namrata Govind Ambekar, Sonali Samal, N Nandini Devi, and Surmila Thokchom. Fasnet: Federated adversarial siamese networks for robust malware image classification. Journal of Parallel and Distributed Computing, 198:105039, 2025

2025
[4]

A survey on deep learning and multi-task learning techniques for malware analysis

Yacine Bensaoud et al. A survey on deep learning and multi-task learning techniques for malware analysis. Computers & Security, 139:103756, 2024

2024
[5]

Security through the eyes of ai: How visualization is shaping malware detection

Matteo Brosolo, KA Asmitha, Mauro Conti, Rafidha Rehiman KA, Muhammed Shafi KP, Serena Nicolazzo, Antonino Nocera, and P Vinod. Security through the eyes of ai: How visualization is shaping malware detection. Computer Science Review, 61:100914, 2026

2026
[6]

Sok: visualization-based malware detection techniques

Matteo Brosolo, Vinod Puthuvath, Asmitha Ka, Rafidha Rehiman, and Mauro Conti. Sok: visualization-based malware detection techniques. In Proceedings of the 19th international conference on availability, reliability and security, pages 1–13, 2024

2024
[7]

Malware detection by eating a whole exe

Bryan Catanzaro and Charles Nicholas. Malware detection by eating a whole exe. In The Workshops of the Thirty-Second AAAI Conference on Artificial Intelligence, pages 268–276, 2018

2018
[8]

Machine learning and ensemble approaches for cybersecurity: A survey

Dipankar Dasgupta et al. Machine learning and ensemble approaches for cybersecurity: A survey. IEEE Transactions on Artificial Intelligence, 3(5):761–779, 2022

2022
[9]

Modeling task relationships in multi-task learning with multi-gate mixture-of-experts

Jiaqi Ma et al. Modeling task relationships in multi-task learning with multi-gate mixture-of-experts. In KDD, pages 1930–1939, 2018

2018
[10]

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

Noam Shazeer et al. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538, 2017

2017
[11]

Investigating the effects of packers on ml-based malware detection

S. Qirui et al. Investigating the effects of packers on ml-based malware detection. In CySSS, 2022

2022
[12]

Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity

William Fedus, Barret Zoph, and Noam Shazeer. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research, 23(120):1–39, 2022

2022
[13]

The rise of machine learning for detection and classification of malware: Research developments, trends and challenges

Daniel Gibert, Carles Mateu, and Jordi Planes. The rise of machine learning for detection and classification of malware: Research developments, trends and challenges. Journal of Network and Computer Applications, 153:102526, 2020

2020
[14]

Assessing the impact of packing on static machine learning-based malware detection and classification systems

Daniel Gibert, Nikolaos Totosis, Constantinos Patsakis, Quan Le, and Giulio Zizzo. Assessing the impact of packing on static machine learning-based malware detection and classification systems. Computers & Security, 156:104495, 2025

2025
[15]

Multi-task learning for cybersecurity applications: A comprehensive survey

Mohamed Ibrahim et al. Multi-task learning for cybersecurity applications: A comprehensive survey. Neural Computing and Applications, 36(18):10435–10468, 2024

2024
[16]

Adaptive mixtures of local experts

Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. Adaptive mixtures of local experts. Neural Computation, 3(1):79–87, 1991

1991
[17]

Static multi feature-based malware detection using multi spp-net in smart iot environments

Jueun Jeon, Byeonghui Jeong, Seungyeon Baek, and Young-Sik Jeong. Static multi feature-based malware detection using multi spp-net in smart iot environments. IEEE Transactions on Information Forensics and Security, 19:2487–2500, 2024

2024
[18]

Ember2024 – a benchmark dataset for holistic evaluation of malware classifiers

Robert J. Joyce et al. Ember2024 – a benchmark dataset for holistic evaluation of malware classifiers. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, 2025

2025
[19]

Adaptive malware detection using sequential feature selection: A dueling double deep q-network framework for intelligent classification

Naseem Khan, Aref Al-Tamimi, Amine Bermak, and Issa Khalil. Adaptive malware detection using sequential feature selection: A dueling double deep q-network framework for intelligent classification. Journal of Information Security and Applications, 99:104407, 2026

2026
[20]

Federated malware intelligence framework for distributed threat classification

Hyun Kim et al. Federated malware intelligence framework for distributed threat classification. Future Generation Computer Systems, 157:302–315, 2024

2024
[21]

Deep learning for classification of malware system call sequences

Bojan Kolosnjaji et al. Deep learning for classification of malware system call sequences. In AI Conference, 2016

2016
[22]

Pe malware machine learning dataset

Michael Lester. Pe malware machine learning dataset. https://www.practicalsecurityanalytics.com. Accessed for benign PE samples
[24]

Attention-driven lightweight cnn architecture for malware image classification

Yong Liu et al. Attention-driven lightweight cnn architecture for malware image classification. Expert Systems with Applications, 245:123115, 2024

2024
[25]

Identifying useful features for malware detection in the ember dataset

Yoshihiro Oyama, Takumi Miyashita, and Hirotaka Kokubo. Identifying useful features for malware detection in the ember dataset. In CANDARW, pages 360–366, 2019

2019
[26]

Hierarchical visual encoding for scalable malware image analysis

Jihoon Park et al. Hierarchical visual encoding for scalable malware image analysis. Pattern Recognition, 158:110945, 2025

2025
[27]

Portableapps.com - portable software collection

PortableApps.com. Portableapps.com - portable software collection. https://portableapps.com/, 2024. Accessed: 2026

2024
[28]

Ember feature dataset analysis for malware detection

Marian Șandor, Radu Marian Portase, and Adrian Coleșa. Ember feature dataset analysis for malware detection. In 2023 IEEE 19th International Conference on Intelligent Computer Communication and Processing (ICCP), pages 203–210. IEEE, 2023

2023
[29]

Migan: Gan for facilitating malware image synthesis with improved malware classification on novel dataset

Osho Sharma, Akashdeep Sharma, and Arvind Kalia. Migan: Gan for facilitating malware image synthesis with improved malware classification on novel dataset. Expert Systems with Applications, 241:122678, 2024

2024
[30]

Static malware detection of ember windows-pe api call using machine learning

Omkar Shinde, Anish Khobragade, and Pooja Agrawal. Static malware detection of ember windows-pe api call using machine learning. COMPUTATIONAL INTELLIGENCE AND NETWORK SECURITY, 2724(1):020001, 2023

2023
[31]

Detecting and mitigating sampling bias in cybersecurity with unlabeled data

Saravanan Thirumuruganathan, Fatih Deniz, Issa Khalil, Ting Yu, Mohamed Nabeel, and Mourad Ouzzani. Detecting and mitigating sampling bias in cybersecurity with unlabeled data. In 33rd USENIX Security Symposium (USENIX Security 24), pages 1741–1758, 2024

2024
[32]

Improved multi-gate mixture-of-experts framework for multi-step gas load forecasting

Jian Tong et al. Improved multi-gate mixture-of-experts framework for multi-step gas load forecasting. Energy, 282:128553, 2023

2023
[33]

Transmal: Transformer-based malware image classification framework

Lei Wang et al. Transmal: Transformer-based malware image classification framework. Computers & Security, 138:103640, 2024

2024
[34]

Multi-head mixture-of-experts

Xun Wu, Shaohan Huang, Wenhui Wang, Shuming Ma, Li Dong, and Furu Wei. Multi-head mixture-of-experts. In Advances in Neural Information Processing Systems (NeurIPS), 2024

2024
[35]

Bitcn-taefficientnet malware classification approach based on sequence and rgb fusion

Bona Xuan, Jin Li, and Yafei Song. Bitcn-taefficientnet malware classification approach based on sequence and rgb fusion. Computers & Security, 139:103734, 2024

2024
[36]

A survey of adversarial attack and defense methods for malware classification in cyber security

Senming Yan, Jing Ren, Wei Wang, Limin Sun, Wei Zhang, and Quan Yu. A survey of adversarial attack and defense methods for malware classification in cyber security. IEEE Communications Surveys & Tutorials, 25(1):467–496, 2022

2022
[37]

Multimodal transformer fusion for robust malware family classification

Wei Zhang et al. Multimodal transformer fusion for robust malware family classification. IEEE Transactions on Information Forensics and Security, 20:2145–2159, 2025

2025

[1] [1]

Malwarebazaar: Malware sample exchange platform

abuse.ch. Malwarebazaar: Malware sample exchange platform. https://bazaar.abuse.ch/, 2024. Accessed: 2026

2024

[2] [2]

Miracle: Malware image recognition and classification by layered extraction: I

Inzamamul Alam, Md Samiullah, SM Asaduzzaman, Upama Kabir, AM Aahad, and Simon S Woo. Miracle: Malware image recognition and classification by layered extraction: I. alam et al.Data Mining and Knowledge Discovery, 39(1):10, 2025

2025

[3] [3]

Fasnet: Federated adversarial siamese networks for robust malware image classification.Journal of Parallel and Distributed Computing, 198:105039, 2025

Namrata Govind Ambekar, Sonali Samal, N Nandini Devi, and Surmila Thokchom. Fasnet: Federated adversarial siamese networks for robust malware image classification.Journal of Parallel and Distributed Computing, 198:105039, 2025

2025

[4] [4]

A survey on deep learning and multi-task learning techniques for malware analysis

Yacine Bensaoud et al. A survey on deep learning and multi-task learning techniques for malware analysis. Computers & Security, 139:103756, 2024

2024

[5] [5]

Security through the eyes of ai: How visualization is shaping malware detection

Matteo Brosolo, KA Asmitha, Mauro Conti, Rafidha Rehiman KA, Muhammed Shafi KP, Serena Nicolazzo, Antonino Nocera, and P Vinod. Security through the eyes of ai: How visualization is shaping malware detection. Computer Science Review, 61:100914, 2026

2026

[6] [6]

Sok: visualization-based malware detection techniques

Matteo Brosolo, Vinod Puthuvath, Asmitha Ka, Rafidha Rehiman, and Mauro Conti. Sok: visualization-based malware detection techniques. InProceedings of the 19th international conference on availability, reliability and security, pages 1–13, 2024. 21 A Multi-task Mixture of Experts Framework Table 10:Heterogeneous MoE – Results obtained on Adversarial Attac...

2024

[7] [7]

Malware detection by eating a whole exe

Bryan Catanzaro and Charles Nicholas. Malware detection by eating a whole exe. InThe Workshops of the Thirty-Second AAAI Conference on Artificial Intelligence, pages 268–276, 2018

2018

[8] [8]

Machine learning and ensemble approaches for cybersecurity: A survey.IEEE Transactions on Artificial Intelligence, 3(5):761–779, 2022

Dipankar Dasgupta et al. Machine learning and ensemble approaches for cybersecurity: A survey.IEEE Transactions on Artificial Intelligence, 3(5):761–779, 2022

2022

[9] [9]

Modeling task relationships in multi-task learning with multi-gate mixture-of-experts

Jiaqi Ma et al. Modeling task relationships in multi-task learning with multi-gate mixture-of-experts. InKDD, pages 1930–1939, 2018

1930

[10] [10]

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

Noam Shazeer et al. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer.arXiv preprint arXiv:1701.06538, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[11] [11]

Qirui et al

S. Qirui et al. Investigating the effects of packers on ml-based malware detection. InCySSS, 2022

2022

[12] [12]

Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity.Journal of Machine Learning Research, 23(120):1–39, 2022

William Fedus, Barret Zoph, and Noam Shazeer. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity.Journal of Machine Learning Research, 23(120):1–39, 2022

2022

[13] [13]

Daniel Gibert, Carles Mateu, and Jordi Planes. The rise of machine learning for detection and classification of malware: Research developments, trends and challenges.Journal of Network and Computer Applications, 153:102526, 2020

2020

[14] Daniel Gibert, Nikolaos Totosis, Constantinos Patsakis, Quan Le, and Giulio Zizzo. Assessing the impact of packing on static machine learning-based malware detection and classification systems. Computers & Security, 156:104495, 2025

[15] Mohamed Ibrahim et al. Multi-task learning for cybersecurity applications: A comprehensive survey. Neural Computing and Applications, 36(18):10435–10468, 2024

[16] Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. Adaptive mixtures of local experts. Neural Computation, 3(1):79–87, 1991

[17] Jueun Jeon, Byeonghui Jeong, Seungyeon Baek, and Young-Sik Jeong. Static multi feature-based malware detection using multi spp-net in smart iot environments. IEEE Transactions on Information Forensics and Security, 19:2487–2500, 2024

Summary: Data augmentation uniquely benefits the MMoE framework, optimizing its...

[18] Robert J. Joyce et al. Ember2024 – a benchmark dataset for holistic evaluation of malware classifiers. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, 2025

[19] Naseem Khan, Aref Al-Tamimi, Amine Bermak, and Issa Khalil. Adaptive malware detection using sequential feature selection: A dueling double deep q-network framework for intelligent classification. Journal of Information Security and Applications, 99:104407, 2026

[20] Hyun Kim et al. Federated malware intelligence framework for distributed threat classification. Future Generation Computer Systems, 157:302–315, 2024

[21] Bojan Kolosnjaji et al. Deep learning for classification of malware system call sequences. In AI Conference, 2016

[22] Michael Lester. Pe malware machine learning dataset. https://www.practicalsecurityanalytics.com,

[23] Accessed for benign PE samples

[24] Yong Liu et al. Attention-driven lightweight cnn architecture for malware image classification. Expert Systems with Applications, 245:123115, 2024

[25] Yoshihiro Oyama, Takumi Miyashita, and Hirotaka Kokubo. Identifying useful features for malware detection in the ember dataset. In CANDARW, pages 360–366, 2019

[26] Jihoon Park et al. Hierarchical visual encoding for scalable malware image analysis. Pattern Recognition, 158:110945, 2025

[27] PortableApps.com. Portableapps.com - portable software collection. https://portableapps.com/, 2024. Accessed: 2026

[28] Marian Șandor, Radu Marian Portase, and Adrian Coleșa. Ember feature dataset analysis for malware detection. In 2023 IEEE 19th International Conference on Intelligent Computer Communication and Processing (ICCP), pages 203–210. IEEE, 2023

[29] Osho Sharma, Akashdeep Sharma, and Arvind Kalia. Migan: Gan for facilitating malware image synthesis with improved malware classification on novel dataset. Expert Systems with Applications, 241:122678, 2024

[30] Omkar Shinde, Anish Khobragade, and Pooja Agrawal. Static malware detection of ember windows-pe api call using machine learning. COMPUTATIONAL INTELLIGENCE AND NETWORK SECURITY, 2724(1):020001, 2023

[31] Saravanan Thirumuruganathan, Fatih Deniz, Issa Khalil, Ting Yu, Mohamed Nabeel, and Mourad Ouzzani. Detecting and mitigating sampling bias in cybersecurity with unlabeled data. In 33rd USENIX Security Symposium (USENIX Security 24), pages 1741–1758, 2024

[32] Jian Tong et al. Improved multi-gate mixture-of-experts framework for multi-step gas load forecasting. Energy, 282:128553, 2023

[33] Lei Wang et al. Transmal: Transformer-based malware image classification framework. Computers & Security, 138:103640, 2024

[34] Xun Wu, Shaohan Huang, Wenhui Wang, Shuming Ma, Li Dong, and Furu Wei. Multi-head mixture-of-experts. In Advances in Neural Information Processing Systems (NeurIPS), 2024

Table 13: Evaluation of the proposed Homogeneous-MoE framework trained on 53,120 original samples with 4% mutation-based augmentation and test...

[35] Bona Xuan, Jin Li, and Yafei Song. Bitcn-taefficientnet malware classification approach based on sequence and rgb fusion. Computers & Security, 139:103734, 2024

[36] Senming Yan, Jing Ren, Wei Wang, Limin Sun, Wei Zhang, and Quan Yu. A survey of adversarial attack and defense methods for malware classification in cyber security. IEEE Communications Surveys & Tutorials, 25(1):467–496, 2022

[37] Wei Zhang et al. Multimodal transformer fusion for robust malware family classification. IEEE Transactions on Information Forensics and Security, 20:2145–2159, 2025

Summary: The transition to raw 1D images at a baseline length of 1024 advances both the Heterogeneous and Homogeneous MoE models, optimizing their p...