AsmRAG: LLM-Driven Malware Detection by Retrieving Functionally Similar Assembly Code

2026 · cs.CR · arXiv 2604.23196

1 Pith paper cites this work.

abstract

Deep learning malware detectors achieve high classification accuracy but suffer from severe interpretability limitations, typically returning probabilistic verdicts that lack forensic context. We introduce AsmRAG, a framework that performs malware analysis through Assembly-Level Retrieval-Augmented Generation. Unlike classifiers built on global statistical features, AsmRAG reformulates detection as an evidence-based retrieval task. The system uses a code-specialized Large Language Model (LLM) to analyze assembly functions and convert them into semantic embeddings, from which it builds a searchable knowledge base resilient to syntactic obfuscation. For inference, we propose a Density-Weighted Anchor Selection mechanism that isolates the primary unit of malicious logic within a binary, extracting verifiable forensic evidence and resisting evasion attempts. On a curated dataset of 40k binaries, AsmRAG reaches a detection F1-score of 96% and a family attribution F1-score of 95%. Comparative experiments confirm that this semantic retrieval approach remains robust under metamorphic obfuscation, where the holistic baselines (EMBER and ResNeXt) degrade, giving Security Operations Centers a transparent and reliable alternative.
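The retrieve-then-decide flow described in the abstract can be illustrated with a small standard-library Python sketch. It assumes function embeddings already exist (toy 3-dimensional vectors stand in for the LLM embeddings), uses brute-force cosine k-nearest-neighbour retrieval, and implements anchor selection with a made-up rule: mean neighbour similarity multiplied by the fraction of malicious neighbours. The abstract does not say how Density-Weighted Anchor Selection actually scores functions, so `KBEntry`, `select_anchor`, `classify`, `k`, `threshold`, and the `family_A`/`family_B` labels are hypothetical names and choices, not AsmRAG's method.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class KBEntry:
    """One labelled assembly function in the knowledge base (hypothetical schema)."""
    vector: List[float]    # embedding, assumed precomputed by a code LLM
    malicious: bool
    family: Optional[str]  # malware family label; None for benign functions


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query: List[float], kb: List[KBEntry], k: int) -> List[Tuple[float, KBEntry]]:
    """Brute-force k-nearest-neighbour retrieval by cosine similarity."""
    scored = [(cosine(query, e.vector), e) for e in kb]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]


def select_anchor(functions: List[List[float]], kb: List[KBEntry], k: int):
    """Hypothetical anchor rule: score each function by the mean similarity of its
    k neighbours (local density) times the fraction of those neighbours that are
    malicious, and keep the highest-scoring function."""
    best = None
    for idx, vec in enumerate(functions):
        neigh = top_k(vec, kb, k)
        density = sum(max(s, 0.0) for s, _ in neigh) / len(neigh)
        mal_frac = sum(1 for _, e in neigh if e.malicious) / len(neigh)
        score = density * mal_frac
        if best is None or score > best[0]:
            best = (score, idx, neigh)
    return best


def classify(functions, kb, k=3, threshold=0.5):
    """Derive verdict, family, and retrieved evidence from the anchor function only."""
    if not functions or not kb:
        raise ValueError("need at least one query function and one knowledge-base entry")
    _, anchor, neigh = select_anchor(functions, kb, k)
    # Similarity-weighted vote over the anchor's neighbours (negative sims clamped to 0).
    weights = [(max(s, 0.0), e) for s, e in neigh]
    total = sum(w for w, _ in weights)
    mal = sum(w for w, e in weights if e.malicious)
    is_malicious = total > 0 and mal / total >= threshold
    family = None
    if is_malicious:
        votes = {}
        for w, e in weights:
            if e.malicious:
                votes[e.family] = votes.get(e.family, 0.0) + w
        family = max(votes, key=votes.get)
    evidence = [(round(s, 3), e.family) for s, e in neigh]
    return {"malicious": is_malicious, "family": family, "anchor": anchor, "evidence": evidence}


# Toy knowledge base: two family_A functions, two benign functions, one family_B function.
kb = [
    KBEntry([1.0, 0.0, 0.0], True, "family_A"),
    KBEntry([0.9, 0.1, 0.0], True, "family_A"),
    KBEntry([0.0, 1.0, 0.0], False, None),
    KBEntry([0.0, 0.9, 0.1], False, None),
    KBEntry([0.0, 0.0, 1.0], True, "family_B"),
]

# A "binary" with one benign-looking function and one family_A-looking function.
result = classify([[0.0, 1.0, 0.05], [0.95, 0.05, 0.0]], kb)
print(result["malicious"], result["family"], result["anchor"])  # True family_A 1
```

Basing the verdict on a single anchor function, rather than pooling every function in the binary, is what lets the retrieved neighbours double as inspectable evidence; benign padding functions do not dilute the decision. At the scale of a 40k-binary knowledge base, the linear scan in `top_k` would likely be replaced by an approximate nearest-neighbour index.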

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Quantifiable Uncertainty: A Stochastic Consensus Multi-Agent RAG Framework for Robust Malware Detection

cs.CR · 2026-05-08 · unverdicted · novelty 7.0

MAGMA combines RAG with a stochastic consistency ensemble over dual code embeddings to derive Function Evidence Strength and Evidence Conflict Score metrics, enabling reject-option decisions and achieving 98.4% malware detection.