Recognition: no theorem link
Towards LLM-Based Analysis of Virtualization-Obfuscated Code through Automated Data Generation
Pith reviewed 2026-05-12 03:38 UTC · model grok-4.3
The pith
Virtualization-obfuscated binaries are decomposed into LLM-sized units labeled by structural roles using an automated static analysis framework.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Obfuscated binaries can be decomposed into the largest semantically coherent units that fit within LLM constraints and labeled according to their structural roles through an automated static analysis framework, which enables large-scale dataset generation and shows strong performance on real-world virtualization obfuscators.
What carries the argument
The static analysis framework for decomposing binaries into coherent units and assigning structural role labels to automate dataset creation.
If this is right
- LLMs can process obfuscated code despite size limits by analyzing these labeled units.
- Large-scale labeled datasets become available for training models on virtualization-obfuscated code.
- Analysis of real-world obfuscated binaries improves without manual labeling.
- Structural focus allows useful insights even when full semantics are hard to recover.
Where Pith is reading between the lines
- Such decomposition might be combined with other analysis techniques to enhance overall reverse engineering.
- Extending this to dynamic analysis could provide additional labels for better LLM performance.
- This could reduce the barrier for researchers studying malware protected by virtualization obfuscation.
Load-bearing premise
Structural role labels assigned by static analysis are sufficient for LLMs to perform useful analysis and the decomposition preserves necessary information without introducing systematic errors.
What would settle it
Demonstrating that LLMs trained or prompted with these labeled units perform no better on downstream tasks like identifying obfuscation handlers than those using random or no labels would falsify the utility of the approach.
Figures
read the original abstract
Virtualization-based obfuscation produces extremely large and structurally complex binaries, posing challenges for LLM-based analysis due to input size limits and the need for large-scale labeled data. We address this by focusing on structural rather than full semantic analysis. Obfuscated binaries are decomposed into the largest semantically coherent units that fit within LLM constraints and are labeled according to their structural roles. We implement a static analysis framework to automate labeling and enable large-scale dataset generation. Our prototype shows strong performance on real-world virtualization obfuscators.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes decomposing virtualization-obfuscated binaries into the largest semantically coherent units that fit within LLM context windows, then labeling those units by structural role (handler, dispatcher, bytecode, etc.) via an automated static-analysis framework. The resulting labeled units are intended to support large-scale dataset generation for LLM-based reverse engineering. The manuscript asserts that the prototype achieves strong performance on real-world virtualization obfuscators.
Significance. If the static-analysis labeling step proves reliable, the pipeline would supply a scalable source of training data for LLMs on a class of binaries that currently defeats both standard decompilers and direct LLM ingestion. The emphasis on structural rather than full semantic labels is a pragmatic engineering choice given current LLM limits. The automated nature of the labeling is a clear strength for reproducibility and scale.
major comments (2)
- [Abstract] Abstract: the claim that the prototype 'shows strong performance on real-world virtualization obfuscators' is unsupported by any quantitative metrics (precision/recall of structural-role labels, number of binaries processed, error analysis, or ground-truth comparison). Because the central contribution is the automated labeling pipeline that enables LLM analysis, the absence of these figures makes the performance assertion unevaluable.
- [Static Analysis Framework] Static-analysis framework description: virtualization obfuscation replaces conventional control flow with a custom VM interpreter and data-driven dispatch, which defeats standard CFG recovery. The manuscript provides no validation (manual inspection, inter-rater agreement, or comparison against hand-annotated ground truth) that the framework correctly identifies handlers, dispatchers, and bytecode on such binaries. Systematic mislabeling would render the generated dataset unusable for downstream LLM tasks.
minor comments (1)
- The abstract would be clearer if it named the specific real-world obfuscators (e.g., VMProtect, Themida) on which the prototype was tested.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and detailed review. The comments correctly identify areas where the current manuscript requires additional evidence to support its claims. We address each major comment below and commit to revisions that will strengthen the paper without altering its core contribution.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claim that the prototype 'shows strong performance on real-world virtualization obfuscators' is unsupported by any quantitative metrics (precision/recall of structural-role labels, number of binaries processed, error analysis, or ground-truth comparison). Because the central contribution is the automated labeling pipeline that enables LLM analysis, the absence of these figures makes the performance assertion unevaluable.
Authors: We agree that the abstract's assertion of strong performance is currently unsupported by quantitative evidence and therefore unevaluable. In the revised manuscript we will (1) qualify or remove the unqualified claim from the abstract and (2) add a dedicated evaluation section that reports precision and recall for each structural-role label, the total number of binaries processed, error analysis, and direct comparisons against hand-annotated ground truth on a representative subset of real-world samples. These additions will make the performance claims concrete and directly address the referee's concern. revision: yes
-
Referee: [Static Analysis Framework] Static-analysis framework description: virtualization obfuscation replaces conventional control flow with a custom VM interpreter and data-driven dispatch, which defeats standard CFG recovery. The manuscript provides no validation (manual inspection, inter-rater agreement, or comparison against hand-annotated ground truth) that the framework correctly identifies handlers, dispatchers, and bytecode on such binaries. Systematic mislabeling would render the generated dataset unusable for downstream LLM tasks.
Authors: The referee accurately describes the validation gap. We will add a new subsection under the static-analysis framework that presents (a) manual inspection results on a sample of labeled units drawn from real-world obfuscated binaries, (b) any inter-rater agreement statistics obtained during labeling review, and (c) quantitative comparison against hand-annotated ground truth where such annotations exist. This validation will demonstrate the framework's reliability and directly mitigate the risk that systematic mislabeling would compromise downstream LLM training data. revision: yes
Circularity Check
No significant circularity; engineering pipeline with external validation
full rationale
The paper describes an implementation of a static analysis framework to decompose obfuscated binaries and automate structural-role labeling for LLM input. No mathematical derivations, equations, parameter fittings, or self-citation chains are present that reduce any claim to its own inputs by construction. Performance claims are positioned as testable against real-world virtualization obfuscators and human annotations, making the work self-contained against external benchmarks rather than internally forced.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Static analysis can reliably identify structural roles (dispatcher, handler, etc.) in virtualization-obfuscated code without false positives that would corrupt the training set.
Reference graph
Works this paper leans on
-
[1]
A taxonomy of obfuscating transformations,
C. Collberg, C. Thomborson, and D. Low, “A taxonomy of obfuscating transformations,” Department of Computer Science, University of Auckland, New Zealand, Tech. Rep. 148, 1997
work page 1997
-
[2]
A survey of code obfuscation techniques,
V . J. M. Salis and A. S. Sani, “A survey of code obfuscation techniques,” ACM Comput. Surv., vol. 53, no. 3, pp. 1–36, 2020
work page 2020
-
[3]
In2025 IEEE Symposium on Security and Privacy (SP)
N. Zhang, et al., “Inspecting Virtual Machine Diversification Inside Virtualization Obfuscation,” 2025 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 2025, pp. 3051–3069, doi: 10.1109/SP61157.2025.00071
-
[4]
Chosen-Instruction Attack Against Commercial Code Virtualization Obfuscators,
S. Li, et al., “Chosen-Instruction Attack Against Commercial Code Virtualization Obfuscators,” 29th Annual Network and Distributed System Security Symposium, NDSS 2022, San Diego, California, USA, April 24–28, 2022
work page 2022
-
[5]
Virtualization-based obfuscation: A survey,
J. Teschke and T. Kurth, “Virtualization-based obfuscation: A survey,” in Proc. 12th Int. Conf. Availability, Rel. Secur. (ARES), 2017, pp. 1–10
work page 2017
-
[6]
The tigress diversifying c compiler,
B. Kinder, “The tigress diversifying c compiler,” 2023. [Online]. Available: http://tigress.cs.arizona.edu/
work page 2023
-
[7]
LLVM: A compilation framework for lifelong program analysis & transformation,
C. Lattner and V . Adve, “LLVM: A compilation framework for lifelong program analysis & transformation,” in Proc. Int. Symp. Code Gener. Optim. (CGO), 2004, pp. 75–86. 7
work page 2004
-
[8]
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” 2018, arXiv:1810.04805. [Online]. Available: https://arxiv.org/abs/1810.04805
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[9]
Palmtree: Learning an assembly language model for instruction embedding,
B. Li, S. Feng, and G. Tan, “Palmtree: Learning an assembly language model for instruction embedding,” in Proc. 2021 ACM SIGSAC Conf. Comput. Commun. Secur. (CCS), 2021, pp. 3236–3251
work page 2021
-
[10]
DeepBinDiff: Learning program-wide code representations for binary diffing,
H. Pei, B. G. Chun, and S. Kim, “DeepBinDiff: Learning program-wide code representations for binary diffing,” in Proc. 27th Network Distrib. Syst. Secur. Symp. (NDSS), 2020
work page 2020
-
[11]
Identifying virtualized code via deep learning,
G. Luan, S. Feng, and G. Tan, “Identifying virtualized code via deep learning,” in Proc. 36th IEEE/ACM Int. Conf. Autom. Softw. Eng. (ASE), 2021, pp. 1178–1189
work page 2021
-
[12]
BinBert: Binary code understanding with BERT,
X. Wang, J. Yang, and Z. Lin, “BinBert: Binary code understanding with BERT,” in Proc. 32nd IEEE Int. Conf. Softw. Maint. Evol. (ICSME), 2022
work page 2022
-
[13]
S. Vaswani et al., “Attention is all you need,” in Proc. 31st Int. Conf. Neural Inf. Process. Syst. (NIPS), 2017, pp. 5998–6008
work page 2017
-
[14]
CNN based Virtualized Obfuscated Malware Detection Technique,
D. Yoo et al., “CNN based Virtualized Obfuscated Malware Detection Technique,” in Proc. Symposium of the Korean Institute of Communications and Information Sciences, 2024, pp. 637–638
work page 2024
-
[15]
Analysis of anti-reversing functionalities of vmprotect and bypass method using pin,
S. Park and Y . Park, “Analysis of anti-reversing functionalities of vmprotect and bypass method using pin,” KIPS Trans. Comp. Comm. Syst., vol. 10, no. 11, pp. 297–304, 2021
work page 2021
-
[16]
VMHunt: A verifiable approach to partially-virtualized binary code simplification,
D. Xu et al., “VMHunt: A verifiable approach to partially-virtualized binary code simplification,” in Proc. ACM CCS, 2018, pp. 442–458
work page 2018
-
[17]
VMAttack: Deobfuscating virtualization-based packed binaries,
A. Kalysch et al., “VMAttack: Deobfuscating virtualization-based packed binaries,” in Proc. ARES, 2017, pp. 1–10. 8
work page 2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.