Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models
3 Pith papers cite this work.
Citing papers
- Can LLMs Deobfuscate Binary Code? A Systematic Analysis of Large Language Models into Pseudocode Deobfuscation
  LLM deobfuscation of binaries to pseudocode depends more on reasoning ability and task-specific fine-tuning than on model size, with reasoning models showing robustness across ISAs and obfuscation levels on the new BinDeObfBench.
- REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version)
  REBench is a new benchmark that consolidates existing datasets into a large collection of binaries with knowledge-base-driven ground truth to enable fair LLM evaluation on stripped-binary type and name recovery.
- Retrofit: Continual Learning with Controlled Forgetting for Binary Security Detection and Analysis
  RETROFIT enables continual learning for malware detection and binary summarization by retrospective-free parameter merging with low-rank sparse updates and confidence-guided arbitration, improving retention and generalization without historical data.