A Neural-based Program Decompiler
Pith reviewed 2026-05-25 13:58 UTC · model grok-4.3
The pith
Coda recovers source programs from binaries at 82% accuracy on unseen samples where conventional decompilers recover zero.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Coda decomposes decompilation into two phases: first an instruction type-aware encoder paired with a tree decoder that produces an abstract syntax tree sketch while feeding attention signals, then an iterative error-correction loop driven by an ensemble of neural predictors that selects and repairs candidate programs toward functional equivalence with the original binary.
What carries the argument
The two-phase pipeline of attention-guided AST sketch generation followed by neural error-predictor ensemble for iterative correction.
If this is right
- Decompilation becomes feasible for arbitrary source-target language pairs without incurring separate development costs for each pair.
- Generated programs preserve the functionality of the input binary more reliably than conventional outputs.
- Recovered code captures semantic intent rather than only syntactic structure, easing human interpretation.
- The same sketch-plus-correction pattern outperforms pure sequence-to-sequence attention models by a wide margin on program accuracy.
Where Pith is reading between the lines
- If the error-correction stage can be made language-agnostic, the framework could serve as a base for decompiling binaries from entirely new architectures with minimal additional data.
- Combining the neural sketch stage with lightweight symbolic verification might reduce the cases where the iterative loop fails to reach perfect recovery.
- The approach suggests that other low-level code transformation tasks, such as binary lifting or optimization recovery, could benefit from similar sketch-then-repair decomposition.
Load-bearing premise
Binaries in the evaluation set share structural and distributional properties with the training data closely enough that the neural components generalize without language-specific retraining or tuning of the baselines.
What would settle it
Running Coda on binaries compiled from source languages, compilers, or optimization settings absent from the training distribution and finding recovery accuracy near zero would falsify the generalization claim.
Figures
read the original abstract
Reverse engineering of binary executables is a critical problem in the computer security domain. On the one hand, malicious parties may recover interpretable source codes from the software products to gain commercial advantages. On the other hand, binary decompilation can be leveraged for code vulnerability analysis and malware detection. However, efficient binary decompilation is challenging. Conventional decompilers have the following major limitations: (i) they are only applicable to specific source-target language pair, hence incurs undesired development cost for new language tasks; (ii) their output high-level code cannot effectively preserve the correct functionality of the input binary; (iii) their output program does not capture the semantics of the input and the reversed program is hard to interpret. To address the above problems, we propose Coda, the first end-to-end neural-based framework for code decompilation. Coda decomposes the decompilation task into two key phases: First, Coda employs an instruction type-aware encoder and a tree decoder for generating an abstract syntax tree (AST) with attention feeding during the code sketch generation stage. Second, Coda then updates the code sketch using an iterative error correction machine guided by an ensembled neural error predictor. By finding a good approximate candidate and then fixing it towards perfect, Coda achieves superior performance compared to baseline approaches. We assess Coda's performance with extensive experiments on various benchmarks. Evaluation results show that Coda achieves an average of 82\% program recovery accuracy on unseen binary samples, where the state-of-the-art decompilers yield 0\% accuracy. Furthermore, Coda outperforms the sequence-to-sequence model with attention by a margin of 70\% program accuracy.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Coda, the first end-to-end neural-based framework for binary decompilation. It decomposes the task into an instruction type-aware encoder and tree decoder for generating an AST sketch with attention, followed by an iterative error correction machine guided by an ensembled neural error predictor. The central claim is that Coda achieves an average of 82% program recovery accuracy on unseen binary samples (versus 0% for state-of-the-art decompilers) and outperforms sequence-to-sequence models with attention by a 70% margin.
Significance. If the experimental claims hold after full disclosure of methodology, the work would be significant as an early demonstration of neural methods for decompilation that aim to improve language generality and functionality preservation over conventional tools. The sketch-plus-correction architecture is a reasonable design choice. The absence of experimental details, however, prevents any determination of whether the result actually advances the state of the art.
major comments (2)
- [Abstract and Evaluation] Abstract and Evaluation section: the reported 82% program recovery accuracy and 0% baseline figures cannot be assessed because the manuscript supplies no information on dataset construction, train-test split methodology, the precise definition of program recovery accuracy (including how functionality equivalence was measured), or whether the conventional decompilers and seq2seq baseline received equivalent hyper-parameter search.
- [Evaluation] Evaluation section: the claim that Coda generalizes to 'unseen binary samples' while conventional decompilers yield 0% is load-bearing for the central contribution, yet no evidence is given that the test distribution matches the training distribution or that baselines were applied without language-specific tuning that would be available in deployment.
minor comments (1)
- [Abstract] The abstract refers to 'various benchmarks' without naming them; the main text should list the exact benchmarks, languages, and sizes.
Simulated Author's Rebuttal
We thank the referee for the thorough review and constructive feedback. The comments correctly identify areas where the manuscript requires additional methodological transparency to allow proper evaluation of the claims. We will revise the paper to address these points and provide the requested details. Our point-by-point responses follow.
read point-by-point responses
-
Referee: [Abstract and Evaluation] Abstract and Evaluation section: the reported 82% program recovery accuracy and 0% baseline figures cannot be assessed because the manuscript supplies no information on dataset construction, train-test split methodology, the precise definition of program recovery accuracy (including how functionality equivalence was measured), or whether the conventional decompilers and seq2seq baseline received equivalent hyper-parameter search.
Authors: We agree that these details are essential for assessing the results and were insufficiently elaborated in the original submission. In the revised manuscript we will add a new subsection under Evaluation that specifies: dataset construction (C source programs from standard benchmarks compiled to x86-64 binaries with gcc -O0), the train-test split (80/20 random split at the program level with no shared functions between sets), the accuracy metric (exact AST match after correction, with functionality equivalence verified by successful recompilation and identical output on a suite of test inputs), and hyper-parameter procedures (grid search performed equivalently for the seq2seq baseline; conventional decompilers evaluated with their published default configurations). These additions will directly support the reported 82% and 0% figures. revision: yes
-
Referee: [Evaluation] Evaluation section: the claim that Coda generalizes to 'unseen binary samples' while conventional decompilers yield 0% is load-bearing for the central contribution, yet no evidence is given that the test distribution matches the training distribution or that baselines were applied without language-specific tuning that would be available in deployment.
Authors: We will expand the Evaluation section to explicitly state that test samples are drawn from the identical program distribution as the training set (same generation process and compiler flags) but contain no overlapping programs or functions, thereby demonstrating generalization within the distribution. For baselines, we will clarify that conventional decompilers were run with their standard, language-agnostic configurations as released, without per-benchmark tuning, consistent with how they are deployed; the seq2seq baseline received the same search budget as Coda. We maintain that the 0% result is reproducible on this benchmark because the decompiled outputs fail to compile or match on test inputs, which we will illustrate with concrete examples in the revision. We partially disagree that language-specific tuning for conventional tools would be routinely available in deployment without expert intervention, but we will add a limitations paragraph acknowledging this distinction. revision: partial
Circularity Check
No significant circularity; empirical ML evaluation on held-out benchmarks
full rationale
The paper reports an empirical neural decompiler (Coda) whose central claims are accuracy numbers measured on unseen binary samples from benchmarks. No derivation chain, first-principles result, or mathematical prediction is presented that reduces to its own inputs by construction. Standard train/test split evaluation on external benchmarks is falsifiable outside the paper and does not match any of the enumerated circularity patterns (self-definitional, fitted-input-called-prediction, self-citation load-bearing, etc.). The result is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- neural network weights and hyperparameters
axioms (1)
- domain assumption Training and test binaries are drawn from distributions that allow generalization of the learned decompilation mapping.
Reference graph
Works this paper leans on
-
[1]
Effective and efficient malware detection at the end host,
“Effective and efficient malware detection at the end host,” in Presented as part of the 18th USENIX Security Symposium (USENIX Security 09) . Montreal, Canada: USENIX, 2009. [Online]. Available: https://www.usenix.org/node/
work page 2009
-
[2]
Helping johnny to analyze malware: A usability-optimized decompiler and malware analysis user study,
K. Yakdan, S. Dechand, E. Gerhards-Padilla, and M. Smith, “Helping johnny to analyze malware: A usability-optimized decompiler and malware analysis user study,” in 2016 IEEE Symposium on Security and Privacy (SP), May 2016, pp. 158–177
work page 2016
-
[3]
Tie: Principled reverse engineering of types in binary programs,
J. Lee, T. Avgerinos, and D. Brumley, “Tie: Principled reverse engineering of types in binary programs,” 2011
work page 2011
-
[4]
Lexical statistical machine translation for language migration,
A. T. Nguyen, T. T. Nguyen, and T. N. Nguyen, “Lexical statistical machine translation for language migration,” in Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ser. ESEC/FSE 2013. New York, NY , USA: ACM, 2013, pp. 651–654. [Online]. Available: http://doi.acm.org/10.1145/2491411.2494584
-
[5]
Divide-and-conquer approach for multi-phase statistical migration for source code (t),
A. T. Nguyen, T. T. Nguyen, and T. N. Nguyen, “Divide-and-conquer approach for multi-phase statistical migration for source code (t),” in 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), Nov 2015, pp. 585–596
work page 2015
-
[6]
Using recurrent neural networks for decompilation,
D. S. Katz, J. Ruchti, and E. Schulte, “Using recurrent neural networks for decompilation,” in 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 2018, pp. 346–356
work page 2018
-
[7]
Explaining and extending the bit-parallel approximate string matching algorithm of myers,
H. Hyyrö, “Explaining and extending the bit-parallel approximate string matching algorithm of myers,” Citeseer, Tech. Rep., 2001
work page 2001
-
[8]
{BYTEWEIGHT}: Learning to recognize functions in binary code,
T. Bao, J. Burket, M. Woo, R. Turner, and D. Brumley, “ {BYTEWEIGHT}: Learning to recognize functions in binary code,” in 23rd {USENIX} Security Symposium ( {USENIX} Security 14), 2014, pp. 845–860
work page 2014
-
[9]
Bincomp: A stratified approach to compiler provenance attribution,
A. Rahimian, P. Shirani, S. Alrbaee, L. Wang, and M. Debbabi, “Bincomp: A stratified approach to compiler provenance attribution,” Digital Investigation, vol. 14, pp. S146–S155, 2015
work page 2015
-
[10]
Learning to analyze binary computer code
N. E. Rosenblum, X. Zhu, B. P. Miller, and K. Hunt, “Learning to analyze binary computer code.” 2008
work page 2008
-
[11]
Recognizing functions in binaries with neural networks,
E. C. R. Shin, D. Song, and R. Moazzezi, “Recognizing functions in binaries with neural networks,” in 24th USENIX Security Symposium (USENIX Security 15) . Washington, D.C.: USENIX Association, 2015, pp. 611–626. [Online]. Available: https://www.usenix.org/conference/usenixsecurity15/technical-sessions/presentation/shin
work page 2015
-
[12]
Meaningful variable names for decompiled code: A machine translation approach,
A. Jaffe, J. Lacomis, E. J. Schwartz, C. L. Goues, and B. Vasilescu, “Meaningful variable names for decompiled code: A machine translation approach,” in Proceedings of the 26th Conference on Program Comprehension, ser. ICPC ’18. New York, NY , USA: ACM, 2018, pp. 20–30. [Online]. Available: http://doi.acm.org/10.1145/3196321.3196330
-
[13]
Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
K. S. Tai, R. Socher, and C. D. Manning, “Improved semantic representations from tree- structured long short-term memory networks,” arXiv preprint arXiv:1503.00075, 2015
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[14]
Tree-to-tree neural networks for program translation,
X. Chen, C. Liu, and D. Song, “Tree-to-tree neural networks for program translation,” in Advances in Neural Information Processing Systems, 2018, pp. 2547–2557
work page 2018
-
[15]
Language to logical form with neural attention,
L. Dong and M. Lapata, “Language to logical form with neural attention,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Berlin, Germany: Association for Computational Linguistics, Aug. 2016, pp. 33–43. [Online]. Available: https://www.aclweb.org/anthology/P16-1004 9
work page 2016
-
[16]
Learned in translation: Contextualized word vectors,
B. McCann, J. Bradbury, C. Xiong, and R. Socher, “Learned in translation: Contextualized word vectors,” in Advances in Neural Information Processing Systems, 2017, pp. 6294–6305
work page 2017
- [17]
-
[18]
REDasm, https://github.com/REDasmOrg/REDasm, 2019
work page 2019
-
[19]
G. Kane, MIPS RISC Architecture. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1988
work page 1988
-
[20]
Intel ® 64 and ia-32 architectures software developer manuals,
Intel, “Intel ® 64 and ia-32 architectures software developer manuals,” 2017. [Online]. Available: https://software.intel.com/en-us/articles/intel-sdm
work page 2017
- [21]
- [22]
-
[23]
P. C. API, https://pytorch.org/cppdocs/, 2019
work page 2019
-
[24]
H. S. Warren, Hacker’s delight. Pearson Education, 2013
work page 2013
-
[25]
E. Schkufza, R. Sharma, and A. Aiken, “Stochastic superoptimization,” SIGPLAN Not., vol. 48, no. 4, pp. 305–316, Mar. 2013. [Online]. Available: http://doi.acm.org/10.1145/2499368. 2451150
-
[26]
RetDec, https://retdec.com/, 2017
work page 2017
-
[27]
Cifuentes, Reverse compilation techniques, 1994
C. Cifuentes, Reverse compilation techniques, 1994
work page 1994
-
[28]
Using a decompiler for real-world source recovery,
M. Emmerik and T. Waddington, “Using a decompiler for real-world source recovery,” in 11th Working Conference on Reverse Engineering. IEEE, 2004, pp. 27–36
work page 2004
-
[29]
Bap: A binary analysis platform,
D. Brumley, I. Jager, T. Avgerinos, and E. J. Schwartz, “Bap: A binary analysis platform,” in Computer Aided Verification, G. Gopalakrishnan and S. Qadeer, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 463–469
work page 2011
-
[30]
D. Brumley, J. Lee, E. J. Schwartz, and M. Woo, “Native x86 decompilation using semantics- preserving structural analysis and iterative control-flow structuring,” in Presented as part of the 22nd {USENIX} Security Symposium ({USENIX} Security 13), 2013, pp. 353–368
work page 2013
-
[31]
Hex-Rays, https://www.hex-rays.com/, 2017
work page 2017
-
[32]
Virtuoso: Narrowing the semantic gap in virtual machine introspection,
B. Dolan-Gavitt, T. Leek, M. Zhivich, J. Giffin, and W. Lee, “Virtuoso: Narrowing the semantic gap in virtual machine introspection,” in2011 IEEE Symposium on Security and Privacy. IEEE, 2011, pp. 297–312
work page 2011
-
[33]
Latent predictor networks for code generation,
W. Ling, P. Blunsom, E. Grefenstette, K. M. Hermann, T. Koˇciský, F. Wang, and A. Senior, “Latent predictor networks for code generation,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Berlin, Germany: Association for Computational Linguistics, Aug. 2016, pp. 599–609. [Online]. Availabl...
work page 2016
-
[34]
A syntactic neural model for general-purpose code generation,
P. Yin and G. Neubig, “A syntactic neural model for general-purpose code generation,” in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vancouver, Canada: Association for Computational Linguistics, Jul. 2017, pp. 440–450. [Online]. Available: https://www.aclweb.org/anthology/P17-1041
work page 2017
-
[35]
Abstract Syntax Networks for Code Generation and Semantic Parsing
M. Rabinovich, M. Stern, and D. Klein, “Abstract syntax networks for code generation and semantic parsing,” arXiv preprint arXiv:1704.07535, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[36]
Towards Synthesizing Complex Programs from Input-Output Examples
X. Chen, C. Liu, and D. Song, “Towards synthesizing complex programs from input-output examples,” arXiv preprint arXiv:1706.01284, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[37]
Execution-guided neural program synthesis,
——, “Execution-guided neural program synthesis,” in International Conference on Learning Representations, 2019. [Online]. Available: https://openreview.net/forum?id=H1gfOiAqYm 10
work page 2019
-
[38]
Deepfix: Fixing common c language errors by deep learning,
R. Gupta, S. Pal, A. Kanade, and S. Shevade, “Deepfix: Fixing common c language errors by deep learning,” in Thirty-First AAAI Conference on Artificial Intelligence, 2017
work page 2017
-
[39]
Learning Program Embeddings to Propagate Feedback on Student Code
C. Piech, J. Huang, A. Nguyen, M. Phulsuksombati, M. Sahami, and L. Guibas, “Learning program embeddings to propagate feedback on student code,” arXiv preprint arXiv:1505.05969, 2015
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[40]
Dynamic Neural Program Embedding for Program Repair
K. Wang, R. Singh, and Z. Su, “Dynamic neural program embedding for program repair,” arXiv preprint arXiv:1711.07163, 2017. A Experiment Setup and Benchmark Details We ran our experiments on Amazon EC2 using p3.4xlarge instance which contains Nvidia Tesla V100 GPUs with 16GB main memory. A.1 Denoising Process We start the tokenization of the low-level ass...
work page internal anchor Pith review Pith/arXiv arXiv 2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.