MalGEN: A Testbed for Modeling and Evaluating Malware Behaviors

· 2025 · cs.CR · arXiv 2506.07586

4 Pith papers cite this work. Polarity classification is still indexing.



abstract

Modern cybersecurity requires systematic ways to evaluate how detection systems respond to evolving and previously unseen attack behaviors. Existing malware repositories largely capture known patterns and provide limited support for stress-testing defenses against novel threats. To address this, we present MalGEN, a modular testbed that models adversarial workflows and generates executable artifacts in a controlled environment. The framework decomposes high-level attack objectives into structured stages, enabling the synthesis of diverse and multi-stage behaviors. We evaluate MalGEN across 1,920 benchmark settings covering multiple platforms and behavioral objectives, resulting in 977 executable samples. Analysis shows that the generated artifacts exhibit a wide range of malicious techniques and multi-stage attack patterns. However, 45.71% of these samples remain undetected by existing detection engines, which reveals notable gaps in current defenses. These findings provide practical insights into the limitations of widely used detection approaches and support the development of more robust security evaluation and testing practices.

citation-role summary

background 1

citation-polarity summary

support 1

representative citing papers

AI-Generated PowerShell Malware: An Experimental Framework and Dataset

cs.CR · 2026-06-29 · unverdicted · novelty 6.0

An experimental framework and annotated dataset show that LLM-generated PowerShell malware triggers OS events with a median 84.5% Jaccard overlap with those of real malware, and 48.4% of samples match completely.

uGen: An Agentic Framework for Generating Microarchitectural Attack PoCs

cs.CR · 2026-05-15 · unverdicted · novelty 6.0

uGen is the first retrieval-augmented multi-agent LLM framework for generating functionally correct microarchitectural attack PoCs, reporting up to 100% success on Spectre-v1 and 80% on Prime+Probe at low cost.

The Infinite Mutation Engine? Measuring Polymorphism in LLM-Generated Offensive Code

cs.CR · 2026-05-05 · unverdicted · novelty 6.0 · 2 refs

A single commercial LLM can cheaply generate large populations of behaviorally equivalent yet structurally diverse malware payloads.

LLM Harms: A Taxonomy and Discussion

cs.CY · 2025-12-05

