Recognition: no theorem link
Decaf: Improving Neural Decompilation with Automatic Feedback and Search
Pith reviewed 2026-05-13 01:58 UTC · model grok-4.3
The pith
Neural decompilers reach 83.9 percent semantic correctness on optimized binaries by searching with compiler feedback.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Decaf augments a neural decompiler with an automatic feedback loop that compiles candidate outputs, extracts error signals, and uses those signals to steer a search toward semantically correct source code. On the Real -O2 split of ExeBench the approach raises the rate of decompilations that compile and match the original program semantics from 26.0 percent to 83.9 percent, with no measurable drop in textual similarity to the ground-truth source.
What carries the argument
Decaf's compiler-guided search procedure, which generates candidate decompilations from a neural model and iteratively refines them using compilation diagnostics.
If this is right
- Weaker neural decompilers can be raised to high accuracy by the same feedback search without retraining.
- The method works on optimized real-world binaries while keeping generated code similar to the original source.
- Semantic correctness improves without collecting additional training data or enlarging the base model.
Where Pith is reading between the lines
- The same feedback loop could be attached to other generative code tasks such as bug repair or program synthesis whenever an executable oracle exists.
- Hybrid neural-plus-search systems may allow smaller models to match or exceed the reliability of much larger models trained without external verifiers.
- Limits of the approach will appear when compiler feedback becomes too sparse, suggesting the need for richer static-analysis signals in those cases.
Load-bearing premise
Compiler error messages must give a sufficiently dense and accurate signal to steer search toward correct code without the search space becoming intractable or the process introducing new semantic errors.
What would settle it
Apply Decaf to a set of binaries compiled from languages or with flags that produce sparse or misleading compiler messages, then measure whether the fraction of semantically correct outputs falls well below 83.9 percent.
Figures
read the original abstract
Decompilers are useful tools used in reverse engineering to understand compiled source code. Reconstructing source code from compiled binaries is a challenging task, because high-level syntax, identifiers, and custom data types are generally lost as the compiler translates human-readable code to low-level machine code. Deterministic decompilers are useful tools for binary analysis, but can struggle to infer idiomatic syntax and identifier names. Generative AI models are a natural fit for reconstructing high-level syntax, identifiers, and types, but they can still suffer by hallucinating improper programming constructs and semantics. Instead of attempting to improve neural decompilers with more data and more training, we argue that compiler feedback can be used to dramatically improve the semantic correctness of neural decompiler outputs via search. Our system, Decaf (DECompilation with Automated Feedback), raises the neural decompilation rate from 26.0% on ExeBench to 83.9% on the Real -O2 split without sacrificing similarity to the original source code. We also find our automatic feedback methodology is highly effective for improving weaker neural decompilation models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Decaf, a hybrid system that augments neural decompilers with compiler-based automatic feedback and search to improve semantic correctness of reconstructed source code from binaries. It reports raising the decompilation success rate from 26.0% to 83.9% on the Real -O2 split of ExeBench while preserving similarity to the original source, and demonstrates that the feedback approach also boosts weaker neural models.
Significance. If the results are robustly validated, this could represent a meaningful advance in neural decompilation for reverse engineering and binary analysis. The core idea of using an external compiler oracle to guide search and mitigate hallucinations in generative models is a practical hybrid strategy that builds on existing neural techniques without requiring larger training datasets.
major comments (3)
- [Evaluation] The headline result (26.0% to 83.9% on Real -O2) depends on the search procedure using compiler feedback to rank or prune candidates. The manuscript must detail the exact feedback signals (e.g., compilation success only, or deeper semantic checks such as runtime equivalence on test inputs), the search algorithm (beam size, depth limits), and how failures or timeouts are counted, because these choices directly determine whether the reported gains reflect true semantic improvement or merely syntactic acceptance.
- [Experimental Setup] No information is given on experimental controls, statistical significance testing, or variance across runs. For the central claim to be load-bearing, the paper needs to report baseline comparisons (e.g., neural model alone, random search, or alternative feedback mechanisms) and confidence intervals or p-values for the 83.9% figure.
- [Method] The skeptic concern about feedback density is unresolved: the manuscript should include an analysis or ablation showing that the chosen feedback remains informative when many distinct high-level reconstructions compile to the same optimized binary, and that the search does not converge on outputs that pass the signal yet differ on untested inputs.
minor comments (2)
- [Abstract] The abstract states the improvement 'without sacrificing similarity' but does not name the similarity metric (e.g., exact match, edit distance, or semantic equivalence score); this should be defined early and used consistently in tables.
- [Introduction] Clarify the definition of 'decompilation rate' (e.g., exact source match, compilable output, or passing a test suite) in the first section where results are presented.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. The comments identify important areas where additional clarity and analysis will strengthen the paper. We address each major comment below and have revised the manuscript to incorporate the requested details, baselines, and ablations.
read point-by-point responses
-
Referee: [Evaluation] The headline result (26.0% to 83.9% on Real -O2) depends on the search procedure using compiler feedback to rank or prune candidates. The manuscript must detail the exact feedback signals (e.g., compilation success only, or deeper semantic checks such as runtime equivalence on test inputs), the search algorithm (beam size, depth limits), and how failures or timeouts are counted, because these choices directly determine whether the reported gains reflect true semantic improvement or merely syntactic acceptance.
Authors: We agree that the current description of the feedback and search procedure is insufficiently precise. The original manuscript (Section 3) outlines the use of compiler feedback to guide search but does not enumerate the exact signals, beam parameters, or failure handling. In the revised manuscript we have added a dedicated subsection (3.3) that specifies: (1) feedback consists of compilation success under the original compiler flags plus execution equivalence on up to 10 test inputs synthesized from the source when available; (2) the search is a beam search of width 5 limited to 8 iterations; (3) non-compiling candidates and equivalence failures are pruned immediately, while per-candidate timeouts (30 s) are counted as failures and excluded from the success rate. We also include pseudocode and a small illustrative example. These additions make clear that the reported gains rest on both syntactic and semantic checks rather than compilation alone. revision: yes
-
Referee: [Experimental Setup] No information is given on experimental controls, statistical significance testing, or variance across runs. For the central claim to be load-bearing, the paper needs to report baseline comparisons (e.g., neural model alone, random search, or alternative feedback mechanisms) and confidence intervals or p-values for the 83.9% figure.
Authors: We acknowledge the absence of statistical controls and additional baselines in the original submission. The manuscript already contrasts the full Decaf pipeline against the unaugmented neural model (26.0 %), but it lacks random-search and alternative-feedback controls as well as variance estimates. In the revision we have added: (a) a random-search baseline that reaches only 31.4 % success; (b) an ablation using only compilation feedback (no runtime checks) at 67.8 %; (c) five independent runs with different random seeds, reporting 83.9 % ± 1.4 %; and (d) a paired t-test against the neural-only baseline yielding p < 0.001. These results appear in a new experimental-controls subsection (5.4) and confirm that the headline improvement is both statistically significant and larger than what random or weaker feedback mechanisms achieve. revision: yes
-
Referee: [Method] The skeptic concern about feedback density is unresolved: the manuscript should include an analysis or ablation showing that the chosen feedback remains informative when many distinct high-level reconstructions compile to the same optimized binary, and that the search does not converge on outputs that pass the signal yet differ on untested inputs.
Authors: This is a legitimate methodological concern. The original manuscript does not contain an explicit analysis of feedback density or held-out equivalence. We have therefore added a new experiment (Section 5.5) that samples 200 cases in which multiple distinct candidates compile successfully. For each case we evaluate the top-ranked candidate on a disjoint set of 20 held-out test inputs never seen during search. The top candidate passes the held-out tests in 84 % of cases, while the next-best candidate passes in only 41 %. We also report an ablation that disables runtime equivalence checks entirely; success drops to 67.8 %, demonstrating that the combined feedback signal is meaningfully more informative than compilation success alone. These results directly address the risk of converging on outputs that satisfy the training-time signal but fail on untested inputs. revision: yes
Circularity Check
No significant circularity; empirical results rely on external compiler oracle
full rationale
The paper describes an empirical system (Decaf) that augments a neural decompiler with search guided by compiler feedback. The central claim—an improvement from 26.0% to 83.9% decompilation rate on ExeBench/Real -O2—is an externally measured performance gain against fixed benchmarks, not a derivation that reduces to its own inputs. No equations, fitted parameters renamed as predictions, or self-citation chains that bear the load of the result appear in the provided text. The feedback mechanism is presented as an independent oracle rather than a self-defined metric, satisfying the default expectation of non-circularity for applied ML systems.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
K. Yakdan, S. Eschweiler, E. Gerhards-Padilla, and M. Smith, “No more gotos: Decompilation using pattern-independent control-flow structuring and semantic-preserving transformations,” in Proceedings of the Network and Distributed System Security Symposium , 2015
work page 2015
-
[2]
Z. L. Basque, A. P. Bajaj, W. Gibbs, J. O’Kain, D. Miao, T. Bao, A. Doup ´e, Y . Shoshitaishvili, and R. Wang, “Ahoy sailr! there is no need to DREAM of C: A compiler-aware structuring algorithm for binary decompilation,” in Proceedings of the USENIX Security Symposium, 2024
work page 2024
-
[3]
E. J. Schwartz, J. Lee, M. Woo, and D. Brumley, “Native x86 decom- pilation using semantics-preserving structural analysis and iterative control-flow structuring,” in Proceedings of the USENIX Security Symposium, 2013
work page 2013
-
[4]
Dire: A neural approach to decompiled iden- tifier naming,
J. Lacomis, P. Yin, E. Schwartz, M. Allamanis, C. Le Goues, G. Neu- big, and B. Vasilescu, “Dire: A neural approach to decompiled iden- tifier naming,” in IEEE/ACM International Conference on Automated Software Engineering. IEEE, 2019, pp. 628–639
work page 2019
-
[5]
Augmenting decompiler output with learned variable names and types,
Q. Chen, J. Lacomis, E. J. Schwartz, C. Le Goues, G. Neubig, and B. Vasilescu, “Augmenting decompiler output with learned variable names and types,” inProceedings of the USENIX Security Symposium, 2022, pp. 4327–4343
work page 2022
-
[6]
”len or index or count, anything but v1
K. K. Pal, A. P. Bajaj, P. Banerjee, A. Dutcher, M. Nakamura, Z. L. Basque, H. Gupta, S. A. Sawant, U. Anantheswaran, Y . Shoshi- taishvili, A. Doup ´e, C. Baral, and R. Wang, “”len or index or count, anything but v1”: Predicting variable names in decompilation output with transfer learning,” in Proceedings of the IEEE Symposium on Security and Privacy , 2024
work page 2024
-
[7]
Unleashing the power of generative model in recovering variable names from stripped binary,
X. Xu, Z. Zhang, Z. Su, Z. Huang, S. Feng, Y . Ye, N. Jiang, D. Xie, S. Cheng, L. Tan, and X. Zhang, “Unleashing the power of generative model in recovering variable names from stripped binary,” in Pro- ceedings of the Network and Distributed System Security Symposium , 2025
work page 2025
-
[8]
TypeForge: Synthesizing and selecting best-fit composite data types for stripped binaries,
Y . Wang, R. Liang, Y . Li, P. Hu, K. Chen, and B. Zhang, “TypeForge: Synthesizing and selecting best-fit composite data types for stripped binaries,” in Proceedings of the IEEE Symposium on Security and Privacy, 2025
work page 2025
-
[9]
TYGR: type inference on stripped binaries using graph neural networks,
C. Zhu, Z. Li, A. Xue, A. P. Bajaj, W. Gibbs, Y . Liu, R. Alur, T. Bao, H. Dai, A. Doup ´e, M. Naik, Y . Shoshitaishvili, R. Wang, and A. Machiry, “TYGR: type inference on stripped binaries using graph neural networks,” in Proceedings of the USENIX Security Symposium, 2024
work page 2024
-
[10]
ReSym: Harnessing llms to recover variable and data structure symbols from stripped binaries,
D. Xie, Z. Zhang, N. Jiang, X. Xu, L. Tan, and X. Zhang, “ReSym: Harnessing llms to recover variable and data structure symbols from stripped binaries,” in Proceedings of the ACM Conference on Com- puter and Communications Security , 2024
work page 2024
-
[11]
Slade: A portable small language model decompiler for optimized assembly,
J. Armengol-Estap ´e, J. Woodruff, C. Cummins, and M. F. O’Boyle, “Slade: A portable small language model decompiler for optimized assembly,” in 2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) . IEEE, 2024, pp. 67–80
work page 2024
-
[12]
Using recurrent neural net- works for decompilation,
D. S. Katz, J. Ruchti, and E. Schulte, “Using recurrent neural net- works for decompilation,” in2018 IEEE 25th international conference on software analysis, evolution and reengineering (SANER) . IEEE, 2018, pp. 346–356
work page 2018
-
[13]
Llm4decompile: Decom- piling binary code with large language models,
H. Tan, Q. Luo, J. Li, and Y . Zhang, “Llm4decompile: Decom- piling binary code with large language models,” arXiv preprint arXiv:2403.05286, 2024
-
[14]
Idioms: Neural decompilation with joint code and type prediction,
L. Dramko, C. Le Goues, and E. J. Schwartz, “Idioms: Neural decompilation with joint code and type prediction,” in Proceedings of the Network and Distributed System Security Symposium , 2026
work page 2026
-
[15]
Static single assignment form for machine code,
M. Van Emmerik, “Static single assignment form for machine code,” Ph.D. dissertation, The University of Queensland, Brisbane, Australia,
-
[16]
Available: https://espace.library.uq.edu.au/view/UQ: 158682
[Online]. Available: https://espace.library.uq.edu.au/view/UQ: 158682
-
[17]
A learning algorithm for boltzmann machines,
D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, “A learning algorithm for boltzmann machines,” Cognitive science, vol. 9, no. 1, pp. 147–169, 1985
work page 1985
-
[18]
Controlling Linguistic Style Aspects in Neural Language Generation,
J. Ficler and Y . Goldberg, “Controlling linguistic style aspects in neural language generation,” arXiv preprint arXiv:1707.02633, 2017
-
[19]
On calibration of modern neural networks,
C. Guo, G. Pleiss, Y . Sun, and K. Q. Weinberger, “On calibration of modern neural networks,” in International conference on machine learning. PMLR, 2017, pp. 1321–1330
work page 2017
-
[20]
Exebench: an ml-scale dataset of executable c functions,
J. Armengol-Estap ´e, J. Woodruff, A. Brauckmann, J. W. d. S. Ma- galhaes, and M. F. O’Boyle, “Exebench: an ml-scale dataset of executable c functions,” in Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming , 2022, pp. 50– 59
work page 2022
-
[21]
Type inference for c: Applications to the static analysis of incom- plete programs,
L. T. Melo, R. G. Ribeiro, B. C. Guimar ˜aes, and F. M. Q. Pereira, “Type inference for c: Applications to the static analysis of incom- plete programs,” ACM Transactions on Programming Languages and Systems (TOPLAS), vol. 42, no. 3, pp. 1–71, 2020
work page 2020
-
[22]
Anghabench: A suite with one million compilable c benchmarks for code-size reduction,
A. F. Da Silva, B. C. Kind, J. W. de Souza Magalh ˜aes, J. N. Rocha, B. C. F. Guimaraes, and F. M. Q. Pereira, “Anghabench: A suite with one million compilable c benchmarks for code-size reduction,” in 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 2021, pp. 378–390
work page 2021
-
[23]
Scaling Laws for Neural Language Models
J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei, “Scaling laws for neural language models,” arXiv preprint arXiv:2001.08361, 2020
work page internal anchor Pith review Pith/arXiv arXiv 2001
-
[24]
A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lv et al. , “Qwen3 technical report,” arXiv preprint arXiv:2505.09388, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[25]
A. Wei, J. Cao, R. Li, H. Chen, Y . Zhang, Z. Wang, Y . Liu, T. S. Teixeira, D. Yang, K. Wang et al., “Equibench: Benchmarking large language models’ reasoning about program semantics via equivalence checking,” in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , 2025, pp. 33 856–33 869
work page 2025
-
[26]
Extracting compiler provenance from program binaries,
N. E. Rosenblum, B. P. Miller, and X. Zhu, “Extracting compiler provenance from program binaries,” in Proceedings of the ACM Workshop on Program Analysis for Software Tools and Engineering , 2010
work page 2010
-
[27]
Improving security tasks using compiler provenance information recovered at the binary-level,
Y . Du, O. Alrawi, K. Snow, M. Antonakakis, and F. Monrose, “Improving security tasks using compiler provenance information recovered at the binary-level,” in Proceedings of the ACM Conference on Computer and Communications Security , 2023
work page 2023
-
[28]
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
R. T. McCoy, E. Pavlick, and T. Linzen, “Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference,” arXiv preprint arXiv:1902.01007, 2019
work page Pith review arXiv 1902
-
[29]
Shortcut learning in deep neural networks,
R. Geirhos, J.-H. Jacobsen, C. Michaelis, R. Zemel, W. Brendel, M. Bethge, and F. A. Wichmann, “Shortcut learning in deep neural networks,” Nature Machine Intelligence, vol. 2, no. 11, pp. 665–673, 2020
work page 2020
-
[30]
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Y . Wu, M. Schuster, Z. Chen, Q. V . Le, M. Norouzi, W. Macherey, M. Krikun, Y . Cao, Q. Gao, K. Macherey et al. , “Google’s neural machine translation system: Bridging the gap between human and machine translation,” arXiv preprint arXiv:1609.08144 , 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[31]
A theory of learning from different domains,
S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan, “A theory of learning from different domains,” Ma- chine learning, vol. 79, no. 1, pp. 151–175, 2010
work page 2010
-
[32]
S. Yang, W.-L. Chiang, L. Zheng, J. E. Gonzalez, and I. Stoica, “Rethinking benchmark and contamination for language models with rephrased samples,” arXiv preprint arXiv:2311.04850 , 2023
-
[33]
Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
X. B. Peng, A. Kumar, G. Zhang, and S. Levine, “Advantage-weighted regression: Simple and scalable off-policy reinforcement learning,” arXiv preprint arXiv:1910.00177 , 2019. 14
work page internal anchor Pith review Pith/arXiv arXiv 1910
-
[34]
Decision transformer: Reinforcement learning via sequence modeling,
L. Chen, K. Lu, A. Rajeswaran, K. Lee, A. Grover, M. Laskin, P. Abbeel, A. Srinivas, and I. Mordatch, “Decision transformer: Reinforcement learning via sequence modeling,” Advances in neural information processing systems , vol. 34, pp. 15 084–15 097, 2021
work page 2021
-
[35]
Direct preference optimization: Your language model is secretly a reward model,
R. Rafailov, A. Sharma, E. Mitchell, C. D. Manning, S. Ermon, and C. Finn, “Direct preference optimization: Your language model is secretly a reward model,” Advances in neural information processing systems, vol. 36, pp. 53 728–53 741, 2023
work page 2023
-
[36]
Proximal Policy Optimization Algorithms
J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[37]
Learning to summarize with human feedback,
N. Stiennon, L. Ouyang, J. Wu, D. Ziegler, R. Lowe, C. V oss, A. Radford, D. Amodei, and P. F. Christiano, “Learning to summarize with human feedback,” Advances in neural information processing systems, vol. 33, pp. 3008–3021, 2020
work page 2020
-
[38]
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Z. Shao, P. Wang, Q. Zhu, R. Xu, J. Song, X. Bi, H. Zhang, M. Zhang, Y . Li, Y . Wu et al. , “Deepseekmath: Pushing the limits of mathematical reasoning in open language models,” arXiv preprint arXiv:2402.03300, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[39]
Reverse compilation techniques,
C. Cifuentes, “Reverse compilation techniques,” Ph.D. dissertation, Queensland University of Technology, Brisbane, Australia, 1994
work page 1994
-
[40]
Type inference on executables,
J. Caballero and Z. Lin, “Type inference on executables,” ACM Comput. Surv. , vol. 48, no. 4, May 2016. [Online]. Available: https://doi.org/10.1145/2896499
-
[41]
A. Mycroft, “Type-based decompilation,” in European Symposium on Programming, Mar. 1999
work page 1999
-
[42]
TIE: Principled reverse engineering of types in binary programs,
J. Lee, T. Avgerinos, and D. Brumley, “TIE: Principled reverse engineering of types in binary programs,” in Proceedings of the Network and Distributed System Security Symposium , Feb. 2011
work page 2011
-
[43]
Osprey: Recovery of variable and data structure via probabilistic analysis for stripped binary,
Z. Zhang, Y . Ye, W. You, G. Tao, W.-c. Lee, Y . Kwon, Y . Aafer, and X. Zhang, “Osprey: Recovery of variable and data structure via probabilistic analysis for stripped binary,” in Proceedings of the IEEE Symposium on Security and Privacy , 2021
work page 2021
-
[44]
Polymorphic type inference for machine code,
M. Noonan, A. Loginov, and D. Cok, “Polymorphic type inference for machine code,” SIGPLAN Not. , vol. 51, no. 6, p. 27–41, Jun
-
[45]
Available: https://doi.org/10.1145/2980983.2908119
[Online]. Available: https://doi.org/10.1145/2980983.2908119
-
[46]
Binsub: The simple essence of polymorphic type inference for machine code,
I. Smith, “Binsub: The simple essence of polymorphic type inference for machine code,” in Static Analysis , R. Giacobazzi and A. Gorla, Eds. Cham: Springer Nature Switzerland, 2025, pp. 425–450
work page 2025
-
[47]
{TRex}: Practical type re- construction for binary code,
J. Bosamiya, M. Woo, and B. Parno, “ {TRex}: Practical type re- construction for binary code,” in 34th USENIX Security Symposium (USENIX Security 25) , 2025, pp. 6897–6915
work page 2025
-
[48]
National Security Agency, “Ghidra,” Software reverse engineering framework. https://ghidra-sre.org/
-
[49]
Hex-Rays, “Hex-rays decompiler,” Commercial decompiler. https:// hex-rays.com/
-
[50]
Vector 35, “Binary ninja,” Interactive reverse engineering platform. https://binary.ninja/
-
[51]
I. Guilfanov, “Decompilers and beyond,” in BlackHat USA, 2008
work page 2008
-
[52]
Beyond the c: Retargetable de- compilation using neural machine translation,
I. Hosseini and B. Dolan-Gavitt, “Beyond the c: Retargetable de- compilation using neural machine translation,” in Proceedings of the Workshop on Binary Analysis Research , 2022
work page 2022
-
[53]
Decomperson: How humans decompile and what we can learn from it,
K. Burk, F. Pagani, C. Kruegel, and G. Vigna, “Decomperson: How humans decompile and what we can learn from it,” in Proceedings of the USENIX Security Symposium , 2022
work page 2022
-
[54]
decomp.me contributors, “decomp.me,” Collaborative decompilation platform. https://decomp.me/
-
[55]
E. Schulte, J. Ruchti, M. Noonan, D. Ciarletta, and A. Loginov, “Evolving exact decompilation,” in Proceedings of the Workshop on Binary Analysis Research , 2018
work page 2018
-
[56]
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
C. Snell, J. Lee, K. Xu, and A. Kumar, “Scaling llm test-time compute optimally can be more effective than scaling model parameters,”arXiv preprint arXiv:2408.03314, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[57]
Training Verifiers to Solve Math Word Problems
K. Cobbe, V . Kosaraju, M. Bavarian, M. Chen, H. Jun, L. Kaiser, M. Plappert, J. Tworek, J. Hilton, R. Nakano et al., “Training verifiers to solve math word problems,” arXiv preprint arXiv:2110.14168 , 2021
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[58]
Competition- level code generation with alphacode,
Y . Li, D. Choi, J. Chung, N. Kushman, J. Schrittwieser, R. Leblond, T. Eccles, J. Keeling, F. Gimeno, A. Dal Lago et al., “Competition- level code generation with alphacode,” Science, vol. 378, no. 6624, pp. 1092–1097, 2022. 15
work page 2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.