MalwarePT: A Binary-Level Foundation Model for Malware Analysis

Christopher Kruegel; Giovanni Vigna; Hojjat Aghakhani; Kaie Chen; Roman Vasilenko; Saastha Vasan; Wenbo Guo; Yigitcan Kaya; Yuzhou Nie

arxiv: 2605.16455 · v1 · pith:4WYABC5Enew · submitted 2026-05-15 · 💻 cs.CR

Saastha Vasan, Yuzhou Nie, Kaie Chen, Yigitcan Kaya, Hojjat Aghakhani, Roman Vasilenko, Wenbo Guo, Christopher Kruegel, Giovanni Vigna

Pith reviewed 2026-05-20 18:18 UTC · model grok-4.3

classification 💻 cs.CR

keywords: malware analysis, binary foundation models, BPE tokenization, Windows PE files, API call prediction, functionality classification, malware detection, temporal drift

The pith

Pretraining a binary encoder on Windows PE code bytes with BPE tokenization transfers to API prediction, functionality classification, and low-FPR malware detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MalwarePT as a single pretrained encoder that learns from Windows PE code-section bytes. It applies masked language modeling after training a BPE tokenizer to capture frequent multi-byte patterns. This setup is tested on token-level API call prediction, function-level classification, and document-level malware detection under temporal drift. Pretraining produces clear gains over non-pretrained baselines, BPE vocabularies around 1,024 tokens give the best balance, and the model beats other neural approaches while adding value to traditional PE structure features.
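To make the tokenization step concrete, here is a minimal sketch of byte-pair encoding over raw bytes. It is not the authors' tokenizer: the function names (`train_byte_bpe`, `encode`, `decode`), the toy corpus, and the tiny vocabulary size of 260 are all assumptions chosen for illustration. Ids 0 to 255 stand for single bytes, and each learned merge adds one multi-byte token.

```python
from collections import Counter


def merge_pair(seq, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `seq` with `new_id`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def train_byte_bpe(corpus, vocab_size):
    """Learn BPE merges over byte strings. Ids 0-255 are the raw byte values."""
    if vocab_size < 256:
        raise ValueError("vocab_size must be at least 256")
    seqs = [list(blob) for blob in corpus]
    merges = []  # merges[k] == (left_id, right_id) and defines token id 256 + k
    while 256 + len(merges) < vocab_size:
        counts = Counter()
        for seq in seqs:
            counts.update(zip(seq, seq[1:]))
        if not counts:
            break
        best, freq = counts.most_common(1)[0]
        if freq < 2:
            break  # no adjacent pair repeats, so another merge would not compress
        new_id = 256 + len(merges)
        merges.append(best)
        seqs = [merge_pair(seq, best, new_id) for seq in seqs]
    return merges


def encode(data, merges):
    """Tokenize a byte string by replaying the merges in training order."""
    seq = list(data)
    for k, pair in enumerate(merges):
        seq = merge_pair(seq, pair, 256 + k)
    return seq


def decode(tokens, merges):
    """Expand token ids back into the original bytes."""
    table = {i: bytes([i]) for i in range(256)}
    for k, (left, right) in enumerate(merges):
        table[256 + k] = table[left] + table[right]
    return b"".join(table[t] for t in tokens)


# Toy corpus (illustrative only): repeated x86 prologue/epilogue-like byte runs.
corpus = [bytes.fromhex("558bec83ec10") * 4, bytes.fromhex("558bec5dc3") * 3]
merges = train_byte_bpe(corpus, vocab_size=260)
tokens = encode(corpus[0], merges)
print(len(corpus[0]), "bytes ->", len(tokens), "tokens")
```

The point of the sketch is the tradeoff the summary describes: a larger vocabulary packs more bytes into each position of a fixed-length context, at the cost of a bigger embedding table and rarer tokens.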

Core claim

A ModernBERT-style encoder pretrained with masked language modeling on BPE-tokenized Windows PE code-section bytes yields reusable representations that improve performance across API call prediction, functionality classification, and malware detection tasks, with the largest advantages appearing at low false-positive rates and when combined with PE-structure models.

What carries the argument

The MalwarePT encoder, a transformer pretrained via masked language modeling on BPE-compressed PE code bytes, which learns multi-byte patterns to produce task-transferable binary representations.
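The masked language modeling objective can be sketched as an input-corruption step applied to a token sequence before it reaches the encoder. The sketch below follows the original BERT recipe (select 15% of positions; of those, 80% become a mask token, 10% a random token, 10% stay unchanged). These rates, the `MASK_ID` value, and the function name `mask_tokens` are assumptions; the page does not state the masking settings MalwarePT uses.

```python
import random

MASK_ID = 1024  # hypothetical id for [MASK], placed just past a 1,024-token vocabulary
IGNORE = -100   # label for unselected positions (PyTorch's default ignore_index)


def mask_tokens(tokens, vocab_size=1024, mask_prob=0.15, rng=None):
    """Return (inputs, labels) for one MLM training example.

    Selected positions keep their original token in `labels`; all other
    positions are labelled IGNORE so the loss skips them.
    """
    rng = rng or random.Random()
    inputs = list(tokens)
    labels = [IGNORE] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID  # usual case: hide the token
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # random replacement
        # otherwise leave the token unchanged but still predict it
    return inputs, labels


example = [17, 300, 512, 42, 999, 256, 3, 801]
inputs, labels = mask_tokens(example, rng=random.Random(0))
print(inputs)
print(labels)
```

The encoder is then trained to recover the original token at every position whose label is not `IGNORE`, which is what forces it to model the surrounding multi-byte context.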

If this is right

API call sequences become more accurately predictable from raw binary bytes after pretraining.
Functionality labels for code segments improve when the model starts from the pretrained weights.
Malware detection reaches higher true-positive rates at false-positive rates near 0.001 than other neural baselines.
The learned representations add information not captured by handcrafted PE structure features.
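The low-FPR operating point mentioned in this list can be made concrete with a small metric sketch: fix a threshold so that at most a target fraction of benign samples is flagged, then report the fraction of malware caught at that threshold. The function name `tpr_at_fpr` and the synthetic scores are illustrative assumptions, not the paper's evaluation code.

```python
import math


def tpr_at_fpr(scores, labels, target_fpr=0.001):
    """Return (tpr, threshold) where samples with score > threshold are flagged.

    labels: 1 for malware, 0 for benign. The threshold is the highest one
    that keeps the false-positive rate at or below `target_fpr`.
    """
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    benign = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    malicious = [s for s, y in zip(scores, labels) if y == 1]
    if not benign or not malicious:
        raise ValueError("need at least one benign and one malicious sample")
    # Number of benign samples we may flag; the epsilon guards float rounding.
    allowed_fp = math.floor(target_fpr * len(benign) + 1e-9)
    # Flagging strictly above this score yields at most `allowed_fp` false positives.
    threshold = benign[allowed_fp]
    true_positives = sum(s > threshold for s in malicious)
    return true_positives / len(malicious), threshold


# Synthetic example: 1,000 benign scores and 4 malware scores.
benign_scores = [i / 1000 for i in range(1000)]
malware_scores = [0.9995, 0.9985, 0.5, 0.2]
scores = benign_scores + malware_scores
labels = [0] * len(benign_scores) + [1] * len(malware_scores)
tpr, threshold = tpr_at_fpr(scores, labels, target_fpr=0.001)
print(tpr, threshold)
```

Note that at an FPR of 0.001 the threshold is set by a handful of benign samples, so a stable estimate needs a benign test set in the tens of thousands or more.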

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

A single binary foundation model could replace several task-specific feature pipelines in malware analysis workflows.
Tuning BPE vocabulary size offers a practical lever for trading context length against pattern capture in other executable domains.
The observed complementarity with structure features points to hybrid systems that combine pretrained byte patterns with static metadata.

Load-bearing premise

Code-section bytes drawn from a broad set of Windows PE files form a representative distribution that transfers to later malware tasks without major shifts in the evaluation data.

What would settle it

Running the same downstream tasks on a fresh temporal split of malware samples collected well after the pretraining cutoff and finding no gains over non-pretrained neural baselines would falsify the transfer benefit.
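A fresh temporal split of this kind can be sketched as follows. The `Sample` record, the `temporal_split` helper, and the dates are hypothetical; the page does not give the paper's actual collection windows. The gap between the two dates models "well after the pretraining cutoff": samples that fall inside the gap are used for neither side.

```python
from datetime import date
from typing import NamedTuple


class Sample(NamedTuple):
    sha256: str        # sample identifier
    first_seen: date   # date the sample was first observed


def temporal_split(samples, pretrain_cutoff, test_start):
    """Split samples by time.

    Training data is strictly before `pretrain_cutoff`; test data is on or
    after `test_start`. Anything in between is dropped.
    """
    if test_start < pretrain_cutoff:
        raise ValueError("test_start must not precede pretrain_cutoff")
    train = [s for s in samples if s.first_seen < pretrain_cutoff]
    test = [s for s in samples if s.first_seen >= test_start]
    return train, test


# Illustrative data only.
samples = [
    Sample("aa", date(2023, 3, 1)),
    Sample("bb", date(2023, 11, 20)),
    Sample("cc", date(2024, 2, 5)),   # falls in the gap and is dropped
    Sample("dd", date(2024, 9, 14)),
]
train, test = temporal_split(
    samples, pretrain_cutoff=date(2024, 1, 1), test_start=date(2024, 6, 1)
)
print([s.sha256 for s in train], [s.sha256 for s in test])
```

Running both the pretrained model and a non-pretrained baseline on the resulting `test` set is the comparison the falsification test calls for.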

Figures

Figures reproduced from arXiv: 2605.16455 by Christopher Kruegel, Giovanni Vigna, Hojjat Aghakhani, Kaie Chen, Roman Vasilenko, Saastha Vasan, Wenbo Guo, Yigitcan Kaya, Yuzhou Nie.

**Figure 1.** Overview of MalwarePT. Raw bytes are extracted from the code section of each PE file, converted into atomic byte-level symbols, tokenized with a BPE vocabulary, and split into fixed-length sequences. These sequences are used to pretrain a ModernBERT-inspired bidirectional encoder with masked language modeling, after which the pretrained encoder is fine-tuned end-to-end with task-specific heads for downstre…

**Figure 2.** Attention patterns in byte-level binary encoders. BERT-style encoders…

**Figure 3.** Left: Dataset filtering across six stages. Right: Log scale showing original executable file size (red solid), code-segment size (blue dashed), and BPE tokens (green dotted). BPE tokens reduce the original binary by approximately 79%. AVClass2 failed to identify families for 7.04% of the samples. Among the remaining 92.96%, we identified 1,992 distinct malware families, representing a broad spectrum of r…

Original abstract

Automated malware analysis increasingly relies on machine learning, yet most existing methods remain task-specific and depend on handcrafted features or narrowly scoped models. Recent developments in binary-level foundation models suggest a path toward reusable program representations, but their application to malware analysis remains underexplored, and most still operate at byte-level tokenization, limiting their ability to capture multi-byte code patterns. In this work, we introduce MalwarePT, a binary-level foundation model for malware analysis built on a ModernBERT-style encoder and pretrained with masked language modeling on Windows PE code-section bytes. We study whether a single pretrained encoder can transfer across malware-analysis tasks at different granularities, and how tokenization design affects that transfer. We train a byte-pair encoding (BPE) tokenizer on code-section bytes to compress frequent multi-byte patterns within a fixed context budget. We evaluate MalwarePT on three downstream tasks spanning token-, function-, and document-level prediction: API call prediction, functionality classification, and malware (program) detection under temporal drift. Our evaluation demonstrates that pretraining yields substantial gains for API call prediction and functionality classification, and that increasing the BPE vocabulary beyond the byte-level baseline improves performance, with the strongest overall tradeoff at a vocabulary size of 1,024 tokens. In malware detection at FPR ~ 0.001, MalwarePT outperforms the neural network baselines, and is complementary to feature-engineering models that rely on PE structure. We also compare against existing binary foundation models and show that MalwarePT's design choices yield gains across all downstream tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MalwarePT shows that BPE tokenization on PE code bytes plus ModernBERT pretraining transfers across token/function/program malware tasks with measurable gains over byte baselines, but the temporal drift claim needs explicit timeline checks to confirm no leakage.

The main result is that pretraining a ModernBERT-style encoder with masked language modeling on BPE-tokenized Windows PE code-section bytes produces representations that improve API call prediction and functionality classification and also help malware detection at low false-positive rates. The model outperforms the neural baselines it is compared against and complements PE-structure feature models rather than replacing them. The paper also tests several vocabulary sizes and settles on 1,024 tokens as the best overall tradeoff. That combination of architecture, tokenization, and multi-granularity evaluation is the concrete step forward from the earlier byte-level binary models the authors cite.

Referee Report

1 major / 1 minor

Summary. The paper introduces MalwarePT, a ModernBERT-style encoder pretrained via masked language modeling on Windows PE code-section bytes using BPE tokenization. It evaluates whether the resulting representations transfer to token-level (API call prediction), function-level (functionality classification), and document-level (malware detection under temporal drift) tasks, reporting consistent gains from pretraining, benefits from increasing BPE vocabulary size up to 1024, outperformance of neural baselines at low FPR in detection, and complementarity to PE-structure feature models.

Significance. If the results hold, the work indicates that BPE-based binary pretraining can produce reusable representations that improve performance across granularities in malware analysis and remain useful alongside traditional features. The temporal-drift evaluation and explicit ablations against byte-level baselines and prior binary models are positive elements that strengthen applicability claims.

major comments (1)

[Evaluation under temporal drift] Temporal drift evaluation (abstract and evaluation section): the manuscript states that malware detection is assessed 'under temporal drift' yet provides no explicit confirmation that the pretraining corpus collection window precedes the evaluation cutoff, that no future-period samples entered pretraining, or that the BPE vocabulary was learned exclusively on pre-drift data. This verification is load-bearing for interpreting reported transfer gains as arising from the MLM objective rather than from reduced distribution shift between pretraining and test regimes.

minor comments (1)

[Results] Notation for BPE vocabulary sizes and model variants could be standardized across tables and figures to ease comparison of the 1024-token tradeoff.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful review and for identifying a point that requires greater clarity in the temporal-drift evaluation. We respond to the major comment below and will revise the manuscript to address it.

Referee: [Evaluation under temporal drift] Temporal drift evaluation (abstract and evaluation section): the manuscript states that malware detection is assessed 'under temporal drift' yet provides no explicit confirmation that the pretraining corpus collection window precedes the evaluation cutoff, that no future-period samples entered pretraining, or that the BPE vocabulary was learned exclusively on pre-drift data. This verification is load-bearing for interpreting reported transfer gains as arising from the MLM objective rather than from reduced distribution shift between pretraining and test regimes.

Authors: We agree that explicit documentation of the data-collection timelines is necessary to substantiate the temporal-drift claim. The pretraining corpus and BPE vocabulary were constructed exclusively from samples whose collection window ends before the start of the evaluation dataset used for malware detection; no future-period samples were included in either. To make this verification transparent, we will add a short subsection (or expanded paragraph) in the Evaluation section that states the relevant collection cutoffs, confirms the absence of overlap, and notes that the BPE tokenizer was fit only on the pretraining corpus. This revision will allow readers to confirm that reported gains arise from the MLM objective rather than from inadvertent distribution alignment. revision: yes
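The verification the authors promise amounts to two checks: the pretraining window ends before the evaluation window starts, and no sample appears in both sets. A minimal sketch, assuming each corpus is available as a mapping from SHA-256 hash to first-seen date (the `audit_timeline` name, the data layout, and the dates are hypothetical):

```python
from datetime import date


def audit_timeline(pretrain, evaluation):
    """Check for temporal and sample-level leakage between two corpora.

    pretrain / evaluation: dict mapping sha256 -> first-seen date.
    """
    if not pretrain or not evaluation:
        raise ValueError("both corpora must be non-empty")
    latest_pretrain = max(pretrain.values())
    earliest_eval = min(evaluation.values())
    shared = sorted(set(pretrain) & set(evaluation))
    windows_disjoint = latest_pretrain < earliest_eval
    return {
        "latest_pretrain": latest_pretrain,
        "earliest_eval": earliest_eval,
        "windows_disjoint": windows_disjoint,
        "shared_hashes": shared,
        "clean": windows_disjoint and not shared,
    }


# Illustrative data only.
pretrain = {"aa": date(2023, 3, 1), "bb": date(2023, 11, 20)}
evaluation = {"cc": date(2024, 2, 5), "dd": date(2024, 9, 14)}
report = audit_timeline(pretrain, evaluation)
print(report["clean"], report["latest_pretrain"], report["earliest_eval"])
```

Because the BPE tokenizer is fit on the pretraining corpus, passing this audit for that corpus covers the vocabulary as well.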

Circularity Check

0 steps flagged

No circularity: empirical results from distinct pretraining and held-out evaluation stages

The paper reports an empirical machine-learning study: a ModernBERT-style encoder is pretrained once via masked language modeling on a collection of Windows PE code-section bytes (with BPE tokenization), then fine-tuned end-to-end with task-specific heads on three separate downstream tasks (API call prediction, functionality classification, malware detection under temporal drift). Performance gains are measured as accuracy/F1/AUC numbers on held-out test splits that are not used in pretraining or vocabulary fitting. No equations, first-principles derivations, or fitted-parameter predictions appear in the abstract or described methodology that reduce the reported gains to quantities defined by the same inputs. The evaluation separates the pretraining corpus from downstream test distributions, satisfying the self-contained benchmark criterion. No load-bearing self-citations, ansatzes, or renamings of known results are invoked to justify the central claims.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The work rests on standard transformer pretraining assumptions plus domain-specific choices about what constitutes representative malware code bytes. No new physical entities are postulated.

free parameters (2)

BPE vocabulary size: Chosen after experimentation; the paper identifies 1,024 as the strongest tradeoff point.
Model architecture hyperparameters: ModernBERT-style encoder size, learning rate schedule, and masking probability are fitted or selected during pretraining.

axioms (2)

Domain assumption: Code-section bytes from Windows PE files form a suitable pretraining corpus for learning transferable representations for malware analysis. Invoked when the authors restrict pretraining to code sections and assume transfer to downstream tasks.
Domain assumption: Masked language modeling on byte sequences captures useful multi-byte code patterns for malware tasks. Central justification for the pretraining objective.

Statement on Data Availability

We are committed to maximizing the reproducibility of our work, and upon publication we will release all source code and pretrained model weights for every version of MalwarePT, including the 256–4096...

[1] [1]

In: NDSS (2020)

Aghakhani, H., Gritti, F., Mecca, F., Lindorfer, M., Ortolani, S., Balzarotti, D., Vigna, G., Kruegel, C.: When malware is packin’heat; limits of machine learning classifiers based on static analysis features. In: NDSS (2020)

work page 2020

[2] [2]

In: Pro- ceedings of the sixth ACM conference on data and application security and privacy (2016)

Ahmadi, M., Ulyanov, D., Semenov, S., Trofimov, M., Giacinto, G.: Novel feature extraction, selection and fusion for effective malware family classification. In: Pro- ceedings of the sixth ACM conference on data and application security and privacy (2016)

work page 2016

[3] [3]

In: ACSAC (2022)

Ahn, S., Ahn, S., Koo, H., Paek, Y.: Practical binary code similarity detection with bert-based transferable similarity learning. In: ACSAC (2022)

work page 2022

[4] [4]

EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models

Anderson, H.S., Roth, P.: Ember: an open dataset for training static pe malware machine learning models. arXiv preprint arXiv:1804.04637 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[5] [5]

In: 31st USENIX Security Symposium (USENIX Security 22)

Arp, D., Quiring, E., Pendlebury, F., Warnecke, A., Pierazzi, F., Wressnegger, C., Cavallaro, L., Rieck, K.: Dos and don’ts of machine learning in computer secu- rity. In: 31st USENIX Security Symposium (USENIX Security 22). pp. 3971–3988 (2022)

work page 2022

[6] [6]

In: 2022 IEEE Symposium on Security and Privacy (SP)

Barbero, F., Pendlebury, F., Pierazzi, F., Cavallaro, L.: Transcending transcend: Revisiting malware classification in the presence of concept drift. In: 2022 IEEE Symposium on Security and Privacy (SP). pp. 805–823. IEEE (2022)

work page 2022

[7] [7]

Longformer: The Long-Document Transformer

Beltagy, I., Peters, M.E., Cohan, A.: Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2004

[8] [8]

In: 2025 IEEE Symposium on Security and Privacy (SP)

Benkraouda, H., Diwan, N., Wang, G.: You can’t judge a binary by its header: Data-code separation for non-standard arm binaries using pseudo labels. In: 2025 IEEE Symposium on Security and Privacy (SP). pp. 36–36. IEEE Computer So- ciety (2024)

work page 2025

[9] [9]

Transfer Learning for Image-Based Malware Classification

Bhodia, N., Prajapati, P., Di Troia, F., Stamp, M.: Transfer learning for image- based malware classification. arXiv preprint arXiv:1903.11551 (2019)

work page internal anchor Pith review Pith/arXiv arXiv 1903

[10] [10]

Contributors, P.: Pytorch: An open source machine learning framework (2024), https://pytorch.org/

work page 2024

[11] [11]

In: CCS (2023)

Dambra, S., Han, Y., Aonzo, S., Kotzias, P., Vitale, A., Caballero, J., Balzarotti, D., Bilge, L.: Decoding the secrets of machine learning in malware classification: A deep dive into datasets, feature extraction, and model performance. In: CCS (2023)

work page 2023

[12] [12]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Devlin,J.,Chang,M.W.,Lee,K.,Toutanova,K.:Bert:Pre-trainingofdeepbidirec- tional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[13] [13]

Fuyong, Z., Tiezhu, Z.: Malware detection and classification based on n-grams attribute similarity. In: CSE. IEEE (2017)

work page 2017

[14] [14]

Hex-Rays: Ida pro.https://hex-rays.com/ida-pro

work page

[15] [15]

Horsicq: Detect it easy (2024),https://github.com/horsicq/Detect-It-Easy

work page 2024

[16] [16]

In: NTMS

Kalash, M., Rochan, M., Mohammed, N., Bruce, N.D., Wang, Y., Iqbal, F.: Mal- ware classification with deep convolutional neural networks. In: NTMS. IEEE (2018)

work page 2018

[17] [17]

In: Proceedings of the 2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML)

Kaya, Y., Chen, Y., Botacin, M., Saha, S., Pierazzi, F., Cavallaro, L., Wagner, D., Dumitras, T.: Ml-based behavioral malware detection is far from a solved problem. In: Proceedings of the 2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). IEEE (2025)

work page 2025

[18] [18]

In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security

Kim, D., Kwon, B.J., Dumitraş, T.: Certified malware: Measuring breaches of trust in the windows code-signing pki. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. pp. 1435–1448 (2017) 22

work page 2017

[19] [19]

In: 27th USENIX Security Symposium (USENIX Security 18)

Kim, D., Kwon, B.J., Kozák, K., Gates, C., Dumitras,, T.: The broken shield: Measuring revocation effectiveness in the windows{Code-Signing}{PKI}. In: 27th USENIX Security Symposium (USENIX Security 18). pp. 851–868 (2018)

work page 2018

[20] [20]

Koo, H., Park, S., Choi, D., Kim, T.: Semantic-aware binary code representation with bert (2021),https://arxiv.org/abs/2106.05478

work page arXiv 2021

[21] [21]

Digital investigation3, 91–97 (2006)

Kornblum, J.: Identifying almost identical files using context triggered piecewise hashing. Digital investigation3, 91–97 (2006)

work page 2006

[22] [22]

In: WI and IAT

Kruczkowski, M., Szynkiewicz, E.N.: Support vector machine for malware analysis and classification. In: WI and IAT. IEEE (2014)

work page 2014

[23] [23]

In: ICCAI (2018)

Kumar, R., Xiaosong, Z., Khan, R.U., Ahad, I., Kumar, J.: Malicious code detec- tion based on image processing using deep learning. In: ICCAI (2018)

work page 2018

[24] [24]

In: The Network and Distributed System Security (NDSS) Symposium (2026)

Kurlandski, L., Berger, H., Pan, Y., Wright, M.: Beyond raw bytes: Towards large malware language models. In: The Network and Distributed System Security (NDSS) Symposium (2026)

work page 2026

[25] Ling, X., Wu, L., Deng, W., Qu, Z., Zhang, J., Zhang, S., Ma, T., Wang, B., Wu, C., Ji, S.: Malgraph: Hierarchical graph neural networks for robust Windows malware detection. In: IEEE INFOCOM 2022 - IEEE Conference on Computer Communications. pp. 1998–2007. IEEE (2022)

[26] Liu, C., Saul, R., Sun, Y., Raff, E., Fuchs, M., Southard Pantano, T., Holt, J., Micinski, K.: Assemblage: Automatic binary dataset construction for machine learning. Advances in Neural Information Processing Systems 37, 58698–58715 (2024)

[27] Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)

[28] Makandar, A., Patrot, A.: Malware class recognition using image processing techniques. In: ICDMAI. IEEE (2017)

[29] MITRE ATT&CK: MITRE ATT&CK framework. https://attack.mitre.org

[30] Pascanu, R., Stokes, J.W., Sanossian, H., Marinescu, M., Thomas, A.: Malware classification with recurrent networks. In: ICASSP. IEEE (2015)

[31] Pei, K., Guan, J., Williams-King, D., Yang, J., Jana, S.: XDA: Accurate, robust disassembly with transfer learning. arXiv preprint arXiv:2010.00770 (2020)

[32] Pontello, M.: TrID - file identifier (2024), http://mark0.net/soft-trid-e.html

[33] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al.: Language models are unsupervised multitask learners. OpenAI blog (8) (2019)

[34] Raff, E., Barker, J., Sylvester, J., Brandon, R., Catanzaro, B., Nicholas, C.K.: Malware detection by eating a whole exe. In: AAAI Workshop (2018)

[35] Raff, E., Fleshman, W., Zak, R., Anderson, H.S., Filar, B., McLean, M.: Classifying sequences of extreme length with constant memory applied to malware detection. In: AAAI. No. 11 (2021)

[36] Rathore, H., Agarwal, S., Sahay, S.K., Sewak, M.: Malware detection using machine learning and deep learning. In: Big Data Analytics. Springer (2018)

[37] Saha, S., Wang, W., Kaya, Y., Feizi, S., Dumitras, T.: Drsm: De-randomized smoothing on malware classifier providing certified robustness. In: Kim, B., Yue, Y., Chaudhuri, S., Fragkiadaki, K., Khan, M., Sun, Y. (eds.) International Conference on Learning Representations. vol. 2024, pp. 47666–47686 (2024)

[38] Sajid, M.S.I., Wei, J., Abdeen, B., Al-Shaer, E., Islam, M.M., Diong, W., Khan, L.: Soda: A system for cyber deception orchestration and automation. In: ACSAC (2021)

[39] Santos, I., Brezo, F., Ugarte-Pedrero, X., Bringas, P.G.: Opcode sequences as representation of executables for data-mining-based unknown malware detection. Information Sciences (2013)

[40] Saxe, J., Berlin, K.: Deep neural network based malware detection using two dimensional binary program features. In: MALWARE. IEEE (2015)

[41] Sebastián, S., Caballero, J.: Avclass2: Massive malware tag extraction from AV labels. In: ACSAC (2020)

[42] Shafiq, M.Z., Tabish, S.M., Mirza, F., Farooq, M.: Pe-miner: Mining structural information to detect malicious executables in realtime. In: RAID. Springer (2009)

[43] Standard Performance Evaluation Corporation: SPEC CPU® 2006 benchmark. https://www.spec.org/cpu2006/ (2006), accessed: 2025-11-20

[44] Standard Performance Evaluation Corporation: SPEC CPU® 2017 benchmark. https://www.spec.org/cpu2017/ (2017), accessed: 2025-11-20

[45] Su, J., Ahmed, M., Lu, Y., Pan, S., Bo, W., Liu, Y.: Roformer: Enhanced transformer with rotary position embedding. Neurocomputing 568, 127063 (2024)

[46] Team, T.F.: capa: The FLARE team’s open-source tool to identify capabilities in executable files (2024), https://github.com/mandiant/capa

[47] Vasan, D., Alazab, M., Wassan, S., Naeem, H., Safaei, B., Zheng, Q.: Imcfn: Image-based malware classification using fine-tuned convolutional neural network architecture. Computer Networks (2020)

[48] Vasan, S., Aghakhani, H., Ortolani, S., Vasilenko, R., Grishchenko, I., Kruegel, C., Vigna, G.: DeepCapa: Identifying Malicious Capability in Windows Malware. In: ACSAC. IEEE (2024)

[49] VirusTotal: Virustotal - free online virus, malware and url scanner. https://www.virustotal.com/

[50] Warner, B., Chaffin, A., Clavié, B., Weller, O., Hallström, O., Taghadouini, S., Gallagher, A., Biswas, R., Ladhak, F., Aarsen, T., et al.: Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory efficient, and long context finetuning and inference. In: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2025)

[51] Yue, S., Wang, T.: Imbalanced malware images classification: a CNN based approach. arXiv preprint arXiv:1708.08042 (2017)

[52] Zhang, B., Sennrich, R.: Root mean square layer normalization. Advances in Neural Information Processing Systems 32 (2019)

8 Statement on Data Availability

We are committed to maximizing the reproducibility of our work, and upon publication we will release all source code and pretrained model weights for every version of MalwarePT, including the 256–4096...