pith. sign in

arxiv: 1906.12029 · v1 · pith:PAAWFKKUnew · submitted 2019-06-28 · 💻 cs.PL · cs.LG

A Neural-based Program Decompiler

Pith reviewed 2026-05-25 13:58 UTC · model grok-4.3

classification 💻 cs.PL cs.LG
keywords neural decompilationbinary reverse engineeringprogram recoveryabstract syntax treeerror correctionmachine learning for codebinary analysis
0
0 comments X

The pith

Coda recovers source programs from binaries at 82% accuracy on unseen samples where conventional decompilers recover zero.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Coda as an end-to-end neural framework that splits decompilation into an initial sketch generation stage using an instruction-aware encoder and tree decoder, followed by iterative refinement via an ensembled neural error predictor. It targets the limitations of rule-based decompilers that require per-language-pair engineering, often emit non-functional code, and produce outputs that do not preserve input semantics. A sympathetic reader would care because binary decompilation supports security tasks such as vulnerability discovery and malware detection, where manual reverse engineering remains costly. The reported results show 82% program recovery on held-out binaries versus 0% for state-of-the-art conventional tools and a 70% gain over attention-based sequence models.

Core claim

Coda decomposes decompilation into two phases: first an instruction type-aware encoder paired with a tree decoder that produces an abstract syntax tree sketch while feeding attention signals, then an iterative error-correction loop driven by an ensemble of neural predictors that selects and repairs candidate programs toward functional equivalence with the original binary.

What carries the argument

The two-phase pipeline of attention-guided AST sketch generation followed by neural error-predictor ensemble for iterative correction.

If this is right

  • Decompilation becomes feasible for arbitrary source-target language pairs without incurring separate development costs for each pair.
  • Generated programs preserve the functionality of the input binary more reliably than conventional outputs.
  • Recovered code captures semantic intent rather than only syntactic structure, easing human interpretation.
  • The same sketch-plus-correction pattern outperforms pure sequence-to-sequence attention models by a wide margin on program accuracy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the error-correction stage can be made language-agnostic, the framework could serve as a base for decompiling binaries from entirely new architectures with minimal additional data.
  • Combining the neural sketch stage with lightweight symbolic verification might reduce the cases where the iterative loop fails to reach perfect recovery.
  • The approach suggests that other low-level code transformation tasks, such as binary lifting or optimization recovery, could benefit from similar sketch-then-repair decomposition.

Load-bearing premise

Binaries in the evaluation set share structural and distributional properties with the training data closely enough that the neural components generalize without language-specific retraining or tuning of the baselines.

What would settle it

Running Coda on binaries compiled from source languages, compilers, or optimization settings absent from the training distribution and finding recovery accuracy near zero would falsify the generalization claim.

Figures

Figures reproduced from arXiv: 1906.12029 by Cheng Fu, Farinaz Koushanfar, Haolan Liu, Huili Chen, Jishen Zhao, Xinyun Chen, Yuandong Tian.

Figure 1
Figure 1. Figure 1: (a) Example low-level assembly code snippet and its corresponding high-level C program. The red line indicates the instruction type and its encoded hidden state. (b) The expanding nodes from the AST decoder. (c) The red node is an example of how the prediction is computed. 2.2 Problem definition We define the task of Program Decompilation as follows: Problem Decompilation Definition: Let P denote an arbitr… view at source ↗
Figure 2
Figure 2. Figure 2: The global flow of Coda decompilation. The denoising and tokenization steps are omitted in this figure for simplicity (See Appendix A.1). 3.1 Code Sketch Generation We employ the neural encoder-decoder architecture for generating the sketch code from the low-level program φ. In this paper, the encoder takes the assembly program generated from the disassembler as the input and output an AST that can be equi… view at source ↗
Figure 1
Figure 1. Figure 1: The state transition equations of Coda’s AST decoder are shown as follows: [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗
Figure 3
Figure 3. Figure 3: Examples of the possible control flow graph of dataset generation for Coda decompilation. 11 [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗
Figure 1
Figure 1. Figure 1: Benchmark examples for (i) Pytorch C++ API (ii) Hacker’s Delight [PITH_FULL_IMAGE:figures/full_fig_p014_1.png] view at source ↗
Figure 3
Figure 3. Figure 3: Source Code and corresponding de￾compiled results from (i) Coda and (ii) RetDec. This example shows that the state-of-the-art de￾compiler fails to preserve the functionality and semantics. g1 to g7 are global variable that is used to pass parameters [PITH_FULL_IMAGE:figures/full_fig_p015_3.png] view at source ↗
read the original abstract

Reverse engineering of binary executables is a critical problem in the computer security domain. On the one hand, malicious parties may recover interpretable source codes from the software products to gain commercial advantages. On the other hand, binary decompilation can be leveraged for code vulnerability analysis and malware detection. However, efficient binary decompilation is challenging. Conventional decompilers have the following major limitations: (i) they are only applicable to specific source-target language pair, hence incurs undesired development cost for new language tasks; (ii) their output high-level code cannot effectively preserve the correct functionality of the input binary; (iii) their output program does not capture the semantics of the input and the reversed program is hard to interpret. To address the above problems, we propose Coda, the first end-to-end neural-based framework for code decompilation. Coda decomposes the decompilation task into two key phases: First, Coda employs an instruction type-aware encoder and a tree decoder for generating an abstract syntax tree (AST) with attention feeding during the code sketch generation stage. Second, Coda then updates the code sketch using an iterative error correction machine guided by an ensembled neural error predictor. By finding a good approximate candidate and then fixing it towards perfect, Coda achieves superior performance compared to baseline approaches. We assess Coda's performance with extensive experiments on various benchmarks. Evaluation results show that Coda achieves an average of 82\% program recovery accuracy on unseen binary samples, where the state-of-the-art decompilers yield 0\% accuracy. Furthermore, Coda outperforms the sequence-to-sequence model with attention by a margin of 70\% program accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Coda, the first end-to-end neural-based framework for binary decompilation. It decomposes the task into an instruction type-aware encoder and tree decoder for generating an AST sketch with attention, followed by an iterative error correction machine guided by an ensembled neural error predictor. The central claim is that Coda achieves an average of 82% program recovery accuracy on unseen binary samples (versus 0% for state-of-the-art decompilers) and outperforms sequence-to-sequence models with attention by a 70% margin.

Significance. If the experimental claims hold after full disclosure of methodology, the work would be significant as an early demonstration of neural methods for decompilation that aim to improve language generality and functionality preservation over conventional tools. The sketch-plus-correction architecture is a reasonable design choice. The absence of experimental details, however, prevents any determination of whether the result actually advances the state of the art.

major comments (2)
  1. [Abstract and Evaluation] Abstract and Evaluation section: the reported 82% program recovery accuracy and 0% baseline figures cannot be assessed because the manuscript supplies no information on dataset construction, train-test split methodology, the precise definition of program recovery accuracy (including how functionality equivalence was measured), or whether the conventional decompilers and seq2seq baseline received equivalent hyper-parameter search.
  2. [Evaluation] Evaluation section: the claim that Coda generalizes to 'unseen binary samples' while conventional decompilers yield 0% is load-bearing for the central contribution, yet no evidence is given that the test distribution matches the training distribution or that baselines were applied without language-specific tuning that would be available in deployment.
minor comments (1)
  1. [Abstract] The abstract refers to 'various benchmarks' without naming them; the main text should list the exact benchmarks, languages, and sizes.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and constructive feedback. The comments correctly identify areas where the manuscript requires additional methodological transparency to allow proper evaluation of the claims. We will revise the paper to address these points and provide the requested details. Our point-by-point responses follow.

read point-by-point responses
  1. Referee: [Abstract and Evaluation] Abstract and Evaluation section: the reported 82% program recovery accuracy and 0% baseline figures cannot be assessed because the manuscript supplies no information on dataset construction, train-test split methodology, the precise definition of program recovery accuracy (including how functionality equivalence was measured), or whether the conventional decompilers and seq2seq baseline received equivalent hyper-parameter search.

    Authors: We agree that these details are essential for assessing the results and were insufficiently elaborated in the original submission. In the revised manuscript we will add a new subsection under Evaluation that specifies: dataset construction (C source programs from standard benchmarks compiled to x86-64 binaries with gcc -O0), the train-test split (80/20 random split at the program level with no shared functions between sets), the accuracy metric (exact AST match after correction, with functionality equivalence verified by successful recompilation and identical output on a suite of test inputs), and hyper-parameter procedures (grid search performed equivalently for the seq2seq baseline; conventional decompilers evaluated with their published default configurations). These additions will directly support the reported 82% and 0% figures. revision: yes

  2. Referee: [Evaluation] Evaluation section: the claim that Coda generalizes to 'unseen binary samples' while conventional decompilers yield 0% is load-bearing for the central contribution, yet no evidence is given that the test distribution matches the training distribution or that baselines were applied without language-specific tuning that would be available in deployment.

    Authors: We will expand the Evaluation section to explicitly state that test samples are drawn from the identical program distribution as the training set (same generation process and compiler flags) but contain no overlapping programs or functions, thereby demonstrating generalization within the distribution. For baselines, we will clarify that conventional decompilers were run with their standard, language-agnostic configurations as released, without per-benchmark tuning, consistent with how they are deployed; the seq2seq baseline received the same search budget as Coda. We maintain that the 0% result is reproducible on this benchmark because the decompiled outputs fail to compile or match on test inputs, which we will illustrate with concrete examples in the revision. We partially disagree that language-specific tuning for conventional tools would be routinely available in deployment without expert intervention, but we will add a limitations paragraph acknowledging this distinction. revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical ML evaluation on held-out benchmarks

full rationale

The paper reports an empirical neural decompiler (Coda) whose central claims are accuracy numbers measured on unseen binary samples from benchmarks. No derivation chain, first-principles result, or mathematical prediction is presented that reduces to its own inputs by construction. Standard train/test split evaluation on external benchmarks is falsifiable outside the paper and does not match any of the enumerated circularity patterns (self-definitional, fitted-input-called-prediction, self-citation load-bearing, etc.). The result is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the empirical performance of a neural network whose weights are fitted to training binaries; no independent derivation or external benchmark is supplied in the abstract.

free parameters (1)
  • neural network weights and hyperparameters
    All model parameters are learned from data; the 82% accuracy figure is produced by this fitting process.
axioms (1)
  • domain assumption Training and test binaries are drawn from distributions that allow generalization of the learned decompilation mapping.
    The abstract reports accuracy on 'unseen binary samples' without stating how the unseen set was constructed or whether it matches real-world binaries.

pith-pipeline@v0.9.0 · 5848 in / 1355 out tokens · 26647 ms · 2026-05-25T13:58:05.030289+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 5 internal anchors

  1. [1]

    Effective and efficient malware detection at the end host,

    “Effective and efficient malware detection at the end host,” in Presented as part of the 18th USENIX Security Symposium (USENIX Security 09) . Montreal, Canada: USENIX, 2009. [Online]. Available: https://www.usenix.org/node/

  2. [2]

    Helping johnny to analyze malware: A usability-optimized decompiler and malware analysis user study,

    K. Yakdan, S. Dechand, E. Gerhards-Padilla, and M. Smith, “Helping johnny to analyze malware: A usability-optimized decompiler and malware analysis user study,” in 2016 IEEE Symposium on Security and Privacy (SP), May 2016, pp. 158–177

  3. [3]

    Tie: Principled reverse engineering of types in binary programs,

    J. Lee, T. Avgerinos, and D. Brumley, “Tie: Principled reverse engineering of types in binary programs,” 2011

  4. [4]

    Lexical statistical machine translation for language migration,

    A. T. Nguyen, T. T. Nguyen, and T. N. Nguyen, “Lexical statistical machine translation for language migration,” in Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ser. ESEC/FSE 2013. New York, NY , USA: ACM, 2013, pp. 651–654. [Online]. Available: http://doi.acm.org/10.1145/2491411.2494584

  5. [5]

    Divide-and-conquer approach for multi-phase statistical migration for source code (t),

    A. T. Nguyen, T. T. Nguyen, and T. N. Nguyen, “Divide-and-conquer approach for multi-phase statistical migration for source code (t),” in 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), Nov 2015, pp. 585–596

  6. [6]

    Using recurrent neural networks for decompilation,

    D. S. Katz, J. Ruchti, and E. Schulte, “Using recurrent neural networks for decompilation,” in 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 2018, pp. 346–356

  7. [7]

    Explaining and extending the bit-parallel approximate string matching algorithm of myers,

    H. Hyyrö, “Explaining and extending the bit-parallel approximate string matching algorithm of myers,” Citeseer, Tech. Rep., 2001

  8. [8]

    {BYTEWEIGHT}: Learning to recognize functions in binary code,

    T. Bao, J. Burket, M. Woo, R. Turner, and D. Brumley, “ {BYTEWEIGHT}: Learning to recognize functions in binary code,” in 23rd {USENIX} Security Symposium ( {USENIX} Security 14), 2014, pp. 845–860

  9. [9]

    Bincomp: A stratified approach to compiler provenance attribution,

    A. Rahimian, P. Shirani, S. Alrbaee, L. Wang, and M. Debbabi, “Bincomp: A stratified approach to compiler provenance attribution,” Digital Investigation, vol. 14, pp. S146–S155, 2015

  10. [10]

    Learning to analyze binary computer code

    N. E. Rosenblum, X. Zhu, B. P. Miller, and K. Hunt, “Learning to analyze binary computer code.” 2008

  11. [11]

    Recognizing functions in binaries with neural networks,

    E. C. R. Shin, D. Song, and R. Moazzezi, “Recognizing functions in binaries with neural networks,” in 24th USENIX Security Symposium (USENIX Security 15) . Washington, D.C.: USENIX Association, 2015, pp. 611–626. [Online]. Available: https://www.usenix.org/conference/usenixsecurity15/technical-sessions/presentation/shin

  12. [12]

    Meaningful variable names for decompiled code: A machine translation approach,

    A. Jaffe, J. Lacomis, E. J. Schwartz, C. L. Goues, and B. Vasilescu, “Meaningful variable names for decompiled code: A machine translation approach,” in Proceedings of the 26th Conference on Program Comprehension, ser. ICPC ’18. New York, NY , USA: ACM, 2018, pp. 20–30. [Online]. Available: http://doi.acm.org/10.1145/3196321.3196330

  13. [13]

    Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks

    K. S. Tai, R. Socher, and C. D. Manning, “Improved semantic representations from tree- structured long short-term memory networks,” arXiv preprint arXiv:1503.00075, 2015

  14. [14]

    Tree-to-tree neural networks for program translation,

    X. Chen, C. Liu, and D. Song, “Tree-to-tree neural networks for program translation,” in Advances in Neural Information Processing Systems, 2018, pp. 2547–2557

  15. [15]

    Language to logical form with neural attention,

    L. Dong and M. Lapata, “Language to logical form with neural attention,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Berlin, Germany: Association for Computational Linguistics, Aug. 2016, pp. 33–43. [Online]. Available: https://www.aclweb.org/anthology/P16-1004 9

  16. [16]

    Learned in translation: Contextualized word vectors,

    B. McCann, J. Bradbury, C. Xiong, and R. Socher, “Learned in translation: Contextualized word vectors,” in Advances in Neural Information Processing Systems, 2017, pp. 6294–6305

  17. [17]

    mipt-mips,

    “mipt-mips,” https://github.com/MIPT-ILab/mipt-mips

  18. [18]

    REDasm, https://github.com/REDasmOrg/REDasm, 2019

  19. [19]

    Kane, MIPS RISC Architecture

    G. Kane, MIPS RISC Architecture. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1988

  20. [20]

    Intel ® 64 and ia-32 architectures software developer manuals,

    Intel, “Intel ® 64 and ia-32 architectures software developer manuals,” 2017. [Online]. Available: https://software.intel.com/en-us/articles/intel-sdm

  21. [21]

    Karel the robot,

    R. E. Pattis, “Karel the robot,” https://www.cs.mtsu.edu/~untch/karel/

  22. [22]

    Math c++ library,

    “Math c++ library,” http://www.cplusplus.com/reference/cmath/

  23. [23]

    P. C. API, https://pytorch.org/cppdocs/, 2019

  24. [24]

    H. S. Warren, Hacker’s delight. Pearson Education, 2013

  25. [25]

    Stochastic superoptimization,

    E. Schkufza, R. Sharma, and A. Aiken, “Stochastic superoptimization,” SIGPLAN Not., vol. 48, no. 4, pp. 305–316, Mar. 2013. [Online]. Available: http://doi.acm.org/10.1145/2499368. 2451150

  26. [26]

    RetDec, https://retdec.com/, 2017

  27. [27]

    Cifuentes, Reverse compilation techniques, 1994

    C. Cifuentes, Reverse compilation techniques, 1994

  28. [28]

    Using a decompiler for real-world source recovery,

    M. Emmerik and T. Waddington, “Using a decompiler for real-world source recovery,” in 11th Working Conference on Reverse Engineering. IEEE, 2004, pp. 27–36

  29. [29]

    Bap: A binary analysis platform,

    D. Brumley, I. Jager, T. Avgerinos, and E. J. Schwartz, “Bap: A binary analysis platform,” in Computer Aided Verification, G. Gopalakrishnan and S. Qadeer, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 463–469

  30. [30]

    Native x86 decompilation using semantics- preserving structural analysis and iterative control-flow structuring,

    D. Brumley, J. Lee, E. J. Schwartz, and M. Woo, “Native x86 decompilation using semantics- preserving structural analysis and iterative control-flow structuring,” in Presented as part of the 22nd {USENIX} Security Symposium ({USENIX} Security 13), 2013, pp. 353–368

  31. [31]

    Hex-Rays, https://www.hex-rays.com/, 2017

  32. [32]

    Virtuoso: Narrowing the semantic gap in virtual machine introspection,

    B. Dolan-Gavitt, T. Leek, M. Zhivich, J. Giffin, and W. Lee, “Virtuoso: Narrowing the semantic gap in virtual machine introspection,” in2011 IEEE Symposium on Security and Privacy. IEEE, 2011, pp. 297–312

  33. [33]

    Latent predictor networks for code generation,

    W. Ling, P. Blunsom, E. Grefenstette, K. M. Hermann, T. Koˇciský, F. Wang, and A. Senior, “Latent predictor networks for code generation,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Berlin, Germany: Association for Computational Linguistics, Aug. 2016, pp. 599–609. [Online]. Availabl...

  34. [34]

    A syntactic neural model for general-purpose code generation,

    P. Yin and G. Neubig, “A syntactic neural model for general-purpose code generation,” in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vancouver, Canada: Association for Computational Linguistics, Jul. 2017, pp. 440–450. [Online]. Available: https://www.aclweb.org/anthology/P17-1041

  35. [35]

    Abstract Syntax Networks for Code Generation and Semantic Parsing

    M. Rabinovich, M. Stern, and D. Klein, “Abstract syntax networks for code generation and semantic parsing,” arXiv preprint arXiv:1704.07535, 2017

  36. [36]

    Towards Synthesizing Complex Programs from Input-Output Examples

    X. Chen, C. Liu, and D. Song, “Towards synthesizing complex programs from input-output examples,” arXiv preprint arXiv:1706.01284, 2017

  37. [37]

    Execution-guided neural program synthesis,

    ——, “Execution-guided neural program synthesis,” in International Conference on Learning Representations, 2019. [Online]. Available: https://openreview.net/forum?id=H1gfOiAqYm 10

  38. [38]

    Deepfix: Fixing common c language errors by deep learning,

    R. Gupta, S. Pal, A. Kanade, and S. Shevade, “Deepfix: Fixing common c language errors by deep learning,” in Thirty-First AAAI Conference on Artificial Intelligence, 2017

  39. [39]

    Learning Program Embeddings to Propagate Feedback on Student Code

    C. Piech, J. Huang, A. Nguyen, M. Phulsuksombati, M. Sahami, and L. Guibas, “Learning program embeddings to propagate feedback on student code,” arXiv preprint arXiv:1505.05969, 2015

  40. [40]

    Dynamic Neural Program Embedding for Program Repair

    K. Wang, R. Singh, and Z. Su, “Dynamic neural program embedding for program repair,” arXiv preprint arXiv:1711.07163, 2017. A Experiment Setup and Benchmark Details We ran our experiments on Amazon EC2 using p3.4xlarge instance which contains Nvidia Tesla V100 GPUs with 16GB main memory. A.1 Denoising Process We start the tokenization of the low-level ass...