arxiv: 2605.08385 · v1 · submitted 2026-05-08 · 💻 cs.CR

Quantifiable Uncertainty: A Stochastic Consensus Multi-Agent RAG Framework for Robust Malware Detection

ElMouatez Billah Karbab

Pith reviewed 2026-05-12 00:56 UTC · model grok-4.3

keywords: malware detection · RAG framework · uncertainty estimation · multi-agent systems · ensemble methods · epistemic uncertainty · stochastic consensus · reject option

The pith

A stochastic multi-agent RAG system uses ensemble disagreement scores to reject ambiguous malware samples and reports a 98.4 percent detection rate.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MAGMA as a retrieval-augmented generation framework that splits malware analysis into semantic code retrieval and probabilistic verification steps. It employs dual-stream embeddings on assembly and pseudo-code to focus on decision-critical functions while ignoring dead code. A stochastic consistency ensemble runs multiple non-deterministic agent evaluations on the retrieved set, from which it derives an Evidence Conflict Score as the Shannon entropy of the prediction distribution. Elevated ECS values are shown to act as a proxy for structural ambiguity, supporting a reject-option policy that avoids forced classifications on uncertain inputs. This yields the reported 98.4 percent detection rate by addressing epistemic uncertainty that standard deep-learning classifiers cannot express.

Core claim

The central claim is that the Evidence Conflict Score derived from a stochastic consistency ensemble over retrieval-augmented reasoning agents serves as an effective proxy for structural ambiguity in malware binaries, enabling a principled reject-option policy that improves detection reliability beyond existing monolithic classifiers to 98.4 percent.

What carries the argument

The Evidence Conflict Score (ECS), defined as the Shannon entropy of the ensemble's predictive distribution, which quantifies disagreement among multiple independent agent evaluations to identify ambiguous cases.
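
Written out, the definition above might look as follows. This is a sketch under stated assumptions: the symbols N (number of stochastic agent runs), \hat{y}_i (the label returned by run i), and \mathcal{Y} (the label set) are notation introduced here for illustration, and the base-2 logarithm is a choice of units rather than something the abstract specifies.

```latex
\hat{p}(y) = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\left[\hat{y}_i = y\right],
\qquad
\mathrm{ECS} = -\sum_{y \in \mathcal{Y}} \hat{p}(y)\,\log_2 \hat{p}(y),
\qquad
0 \le \mathrm{ECS} \le \log_2 \lvert\mathcal{Y}\rvert .

% Worked example (hypothetical votes): N = 5 runs, 3 "malware" and 2 "benign".
\mathrm{ECS} = -\left(0.6\,\log_2 0.6 + 0.4\,\log_2 0.4\right) \approx 0.971 \text{ bits}.
```

With the usual convention that 0 log 0 = 0, a unanimous ensemble gives ECS = 0, and an evenly split binary ensemble gives the maximum of 1 bit.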

If this is right

Detectors can defer classification on high-ECS samples instead of risking misclassification under evasion attacks.
Dual-stream retrieval isolates decision-critical functions, reducing noise from irrelevant code sections.
Quantifiable uncertainty allows the system to express epistemic limits that monolithic classifiers hide.
The reject-option policy improves overall accuracy by routing uncertain cases to secondary analysis.
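
The deferral behaviour described in the points above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names, the label strings, and the threshold `tau=0.8` are all hypothetical, and the votes stand in for the outputs of repeated non-deterministic agent runs.

```python
import math
from collections import Counter


def evidence_conflict_score(votes):
    """Shannon entropy (in bits) of the empirical label distribution of the votes."""
    if not votes:
        raise ValueError("need at least one vote")
    n = len(votes)
    probs = [count / n for count in Counter(votes).values()]
    return sum(-p * math.log2(p) for p in probs)


def decide(votes, tau=0.8):
    """Return (decision, ecs); the decision is 'defer' when ecs exceeds tau.

    tau is a hypothetical threshold chosen for this example only.
    """
    ecs = evidence_conflict_score(votes)
    if ecs > tau:
        return "defer", ecs
    majority_label, _ = Counter(votes).most_common(1)[0]
    return majority_label, ecs


# Hypothetical ensemble outputs for two samples.
clear_case = ["malware", "malware", "malware", "malware", "benign"]
split_case = ["malware", "malware", "malware", "benign", "benign"]
print(decide(clear_case))  # low conflict: the majority label is returned
print(decide(split_case))  # high conflict: routed to secondary analysis
```

A single threshold is the simplest possible policy; where the threshold should sit is an empirical question that the abstract does not answer.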

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same ECS-based deferral could be tested in other adversarial domains such as network intrusion detection where input ambiguity is common.
Production deployments might integrate human review loops triggered specifically by high-ECS outputs to reduce analyst workload on clear cases.
Future extensions could explore whether ECS correlates with actual evasion success rates in controlled red-team experiments.

Load-bearing premise

That the Evidence Conflict Score from the stochastic ensemble accurately captures structural ambiguity in malware rather than other sources of agent disagreement.

What would settle it

A test set of malware variants engineered with known levels of structural ambiguity (such as varying dead-code insertion) whose ECS values are measured to check whether higher ambiguity reliably produces higher ECS and triggers the reject policy.
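
One way to score such an experiment is a rank correlation between the engineered ambiguity level and the measured ECS. The sketch below implements Spearman's coefficient with the standard library only; the choice of Spearman, the variable names, and the sample values are assumptions made for illustration, not results or methods from the paper.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


# Placeholder data: dead-code insertion level per variant and its measured ECS.
ambiguity_level = [0, 1, 2, 3, 4]
measured_ecs = [0.00, 0.10, 0.35, 0.60, 0.95]
print(spearman(ambiguity_level, measured_ecs))
```

A coefficient near 1 would support the proxy reading; a value near 0 would suggest that ECS tracks some other source of agent disagreement.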

Original abstract

While contemporary deep learning malware detectors define a dominant defense paradigm, their sophistication also exposes them to novel structural evasion attacks, a limitation we attribute to their inherent inability to express epistemic uncertainty. To address this challenge, we present MAGMA, a Retrieval-Augmented Generation (RAG) framework that decouples malware analysis into semantic code retrieval and probabilistic verification. In contrast to monolithic classifiers, MAGMA employs a dual-stream embedding scheme over assembly and pseudo-code representations to isolate Decision-Critical Functions (DCFs) from the noise of dead code. We further introduce a Stochastic Consistency Ensemble, in which multiple instances of the same reasoning agent independently evaluate the retrieval set under non-deterministic sampling. From this ensemble, we derive two complementary metrics: Function Evidence Strength (FES), a weighted aggregation of retrieval confidence, and the Evidence Conflict Score (ECS), defined as the Shannon entropy of the ensemble's predictive distribution. We show that elevated ECS values serve as an effective proxy for structural ambiguity, enabling the system to implement a principled "reject-option" policy. Extensive evaluation demonstrates that MAGMA achieves a 98.4% detection rate, substantially exceeding existing solutions.
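
The retrieval side of the abstract can be illustrated with a small sketch. Everything concrete here is an assumption: the abstract does not say how the assembly and pseudo-code embeddings are fused, which similarity is used, or how FES weights its inputs. The code below concatenates the two streams, ranks by cosine similarity, and uses a plain weighted mean as a stand-in for an FES-like score; the function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def lift(e_asm, e_dec):
    """Fuse the assembly and pseudo-code embeddings by concatenation (an assumption)."""
    return list(e_asm) + list(e_dec)


def retrieve(query, index, k=2):
    """Return the k (similarity, function_name) pairs closest to the query."""
    scored = [(cosine(query, vec), name) for name, vec in index.items()]
    return sorted(scored, reverse=True)[:k]


def weighted_strength(similarities, weights):
    """Weighted mean of retrieval similarities; a stand-in for an FES-like score."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, similarities)) / total


# Toy index of known functions with two-dimensional embeddings per stream.
index = {
    "decrypt_payload": lift([1.0, 0.0], [1.0, 0.0]),
    "inject_thread": lift([1.0, 0.0], [0.8, 0.6]),
    "log_banner": lift([0.0, 1.0], [0.0, 1.0]),
}
hits = retrieve(lift([1.0, 0.0], [1.0, 0.0]), index, k=2)
print(hits)
print(weighted_strength([s for s, _ in hits], weights=[2, 1]))
```

In a real system the embeddings would come from trained encoders and the index from an approximate nearest-neighbour store; the toy vectors only show the data flow.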

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper outlines a RAG-based stochastic ensemble for adding uncertainty estimates and a reject option to malware detection, but the evaluation claims rest on missing details.

The main takeaway is that this work builds a retrieval-augmented framework called MAGMA that uses dual embeddings and a stochastic multi-agent setup to produce uncertainty scores for malware classification. The idea is to flag ambiguous cases instead of forcing a decision, which directly targets a known weakness in standard deep learning detectors that get fooled by structural changes in code.

What is new here is the specific integration of RAG for code retrieval, separate streams for assembly and pseudo-code to focus on decision-critical functions, and the stochastic consistency ensemble that runs the same agent multiple times with sampling variation. From that they derive Function Evidence Strength as a weighted confidence measure and Evidence Conflict Score as entropy over the prediction distribution to trigger rejects. The architecture is laid out clearly and the reject policy follows logically from the entropy definition. The paper does a reasonable job explaining why monolithic classifiers struggle with epistemic uncertainty and how retrieval plus consensus might help isolate real signals from dead code noise.

The soft spots are more about what is not shown. The abstract states a 98.4 percent detection rate that beats existing solutions, yet there are no dataset names, baseline comparisons, ablation results, or tests confirming that high ECS values actually correspond to structural ambiguity rather than other forms of model uncertainty. Without those, it is difficult to judge whether the metrics are useful proxies or just restate the ensemble's internal disagreement. The circularity risk is moderate because both FES and ECS are computed directly from the same retrieval and prediction steps, so independent validation against external ambiguity labels would be needed to strengthen the claims.

This paper is mainly for researchers working on robust security classifiers or uncertainty-aware AI systems who are already familiar with RAG and ensemble methods. Someone looking for a concrete framework to adapt could pull useful pieces from the dual-stream and consensus design, but they would have to supply their own experiments to test it. I would send it to peer review because the core architecture is coherent and addresses a practical gap, even though the current version needs substantial additions on the empirical side to be convincing.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes MAGMA, a Retrieval-Augmented Generation (RAG) framework for malware detection that decouples analysis into semantic code retrieval and probabilistic verification using a dual-stream embedding scheme for assembly and pseudo-code to isolate Decision-Critical Functions (DCFs). It introduces a Stochastic Consistency Ensemble where multiple agents evaluate the retrieval set, deriving Function Evidence Strength (FES) as weighted retrieval confidence and Evidence Conflict Score (ECS) as Shannon entropy of the predictive distribution to enable a reject-option for high ambiguity. The paper claims that this approach achieves a 98.4% detection rate, substantially outperforming existing solutions.

Significance. If the empirical results hold under rigorous validation, the work could significantly impact the field of malware detection by providing a framework that quantifies uncertainty to handle structural evasion attacks, moving beyond monolithic deep learning classifiers. The use of ensemble-based metrics like FES and ECS offers a principled way to implement reject options, which is valuable for high-stakes security applications.

major comments (2)

[Abstract] The assertion that 'MAGMA achieves a 98.4% detection rate, substantially exceeding existing solutions' is presented without any reference to the datasets employed, baseline methods, evaluation methodology, cross-validation strategy, or ablation studies. This is load-bearing for the central effectiveness claim, as no supporting evidence is supplied to allow assessment of the result.
[Abstract] The statement that 'elevated ECS values serve as an effective proxy for structural ambiguity' is not accompanied by any validation against ground-truth measures of ambiguity, correlation analysis, or experiments demonstrating the reject-option policy's effect on detection performance. Without external grounding, the proxy interpretation rests solely on the internal definition of ECS as Shannon entropy over the ensemble distribution.
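
The kind of evidence the second comment asks for could be reported as coverage against accuracy on the accepted samples, swept over the rejection threshold. The sketch below shows one such computation; the record layout, the threshold values, and the sample data are hypothetical placeholders, not results from the paper.

```python
def selective_metrics(records, tau):
    """Compute (coverage, accuracy_on_accepted) for a rejection threshold tau.

    records: iterable of (ecs, predicted_label, true_label) tuples.
    A sample is accepted when its ecs is at most tau; otherwise it is deferred.
    Accuracy is None when every sample is deferred.
    """
    records = list(records)
    if not records:
        raise ValueError("need at least one record")
    accepted = [(pred, true) for ecs, pred, true in records if ecs <= tau]
    coverage = len(accepted) / len(records)
    if not accepted:
        return coverage, None
    accuracy = sum(pred == true for pred, true in accepted) / len(accepted)
    return coverage, accuracy


# Placeholder evaluation records: (ECS, predicted label, true label).
records = [
    (0.00, "malware", "malware"),
    (0.20, "benign", "benign"),
    (0.70, "malware", "malware"),
    (0.95, "benign", "malware"),
]
for tau in (0.1, 0.5, 0.8, 1.0):
    print(tau, selective_metrics(records, tau))
```

If ECS is a useful signal, accuracy on accepted samples should rise as the threshold tightens and coverage falls; a flat curve would indicate that rejection is discarding samples at random.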

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and for recognizing the potential significance of our work in advancing uncertainty-aware malware detection. We address each major comment below and have revised the manuscript to strengthen the presentation of our claims.

Referee: [Abstract] The assertion that 'MAGMA achieves a 98.4% detection rate, substantially exceeding existing solutions' is presented without any reference to the datasets employed, baseline methods, evaluation methodology, cross-validation strategy, or ablation studies. This is load-bearing for the central effectiveness claim, as no supporting evidence is supplied to allow assessment of the result.

Authors: We agree that the abstract, as a concise summary, would benefit from additional context to ground the central claim. The full manuscript provides detailed descriptions of the datasets, baseline comparisons, evaluation methodology, cross-validation strategy, and ablation studies in the Experiments and Evaluation sections. To address this concern directly, we have revised the abstract to include a brief reference to the evaluation setup and key performance context while respecting length constraints.
revision: yes

Referee: [Abstract] The statement that 'elevated ECS values serve as an effective proxy for structural ambiguity' is not accompanied by any validation against ground-truth measures of ambiguity, correlation analysis, or experiments demonstrating the reject-option policy's effect on detection performance. Without external grounding, the proxy interpretation rests solely on the internal definition of ECS as Shannon entropy over the ensemble distribution.

Authors: We thank the referee for this observation. The manuscript presents experimental results demonstrating that elevated ECS aligns with cases of structural evasion and that the reject-option policy improves overall detection metrics. To provide stronger external validation as suggested, we have added explicit correlation analysis between ECS and ground-truth measures of structural ambiguity (derived from controlled function modifications) along with quantitative results showing the reject-option's impact on precision and recall.
revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

The paper's core architecture defines FES directly as a weighted aggregation of retrieval confidence scores and ECS as Shannon entropy over the ensemble's predictive distribution; these are explicit computational definitions from the stochastic consistency ensemble outputs rather than derived predictions or first-principles results that reduce to the inputs by construction. The claim that elevated ECS serves as a proxy for structural ambiguity is framed as an empirical observation validated through evaluation, not a mathematical equivalence or self-referential loop. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior work are present in the abstract or high-level description, and the dual-stream embedding and reject-option policy remain independent components without reducing to fitted parameters renamed as predictions. The 98.4% detection rate is an empirical result, not a circular derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 5 invented entities

The central claims rest on several newly introduced concepts and derived metrics whose independent grounding is not evidenced in the abstract; the framework adds these entities without parameter-free derivations or external benchmarks.

axioms (2)

standard math: Shannon entropy of an ensemble's predictive distribution quantifies conflict or ambiguity
note: Directly invoked to define the Evidence Conflict Score (ECS)
domain assumption: Dual-stream embeddings on assembly and pseudo-code representations isolate Decision-Critical Functions from dead code noise
note: Core premise of the retrieval component

invented entities (5)

MAGMA (no independent evidence)
purpose: Overall RAG framework for uncertainty-aware malware detection
note: Newly proposed system name and architecture
Decision-Critical Functions (DCFs) (no independent evidence)
purpose: Focus analysis on key functions while ignoring dead code
note: Introduced as output of the dual-stream embedding scheme
Stochastic Consistency Ensemble (no independent evidence)
purpose: Generate multiple independent evaluations under non-deterministic sampling
note: Core mechanism for deriving uncertainty metrics
Function Evidence Strength (FES) (no independent evidence)
purpose: Weighted aggregation of retrieval confidence
note: Derived metric from the framework
Evidence Conflict Score (ECS) (no independent evidence)
purpose: Measure structural ambiguity via entropy to enable reject-option
note: Derived metric enabling the key policy

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: ECS defined as Shannon entropy of ensemble predictive distribution; tri-state policy with tau_stable threshold

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Paper passage: Dual-stream lifting L(x) = <e_asm, e_dec> and k-NN DCF retrieval

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
