LLM deobfuscation of binaries to pseudocode depends more on reasoning ability and task-specific fine-tuning than on model size, with reasoning models showing robustness across ISAs and obfuscation levels on the new BinDeObfBench.
Automated program repair in the era of large pre-trained language models
4 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
ReDef creates a revert-anchored dataset of 3,164 defective and 10,268 clean code modifications and shows that code language models perform better with diff encodings but maintain stable performance under counterfactual perturbations, indicating reliance on superficial cues.
DAgger-style training with turn-level policy interpolation raises 4B and 8B LLM agents to 27.3% and 29.8% on SWE-bench Verified, beating several larger published systems.
LLM syntax accuracy for LAMMPS scripts improved to 91% parser pass rate, yet only 1/80 scripts were scientifically correct on the hardest prompt; an agentic verification skill raised success to 5/6.
citing papers explorer
-
Can LLMs Deobfuscate Binary Code? A Systematic Analysis of Large Language Models into Pseudocode Deobfuscation
LLM deobfuscation of binaries to pseudocode depends more on reasoning ability and task-specific fine-tuning than on model size, with reasoning models showing robustness across ISAs and obfuscation levels on the new BinDeObfBench.
-
ReDef: Do Code Language Models Truly Understand Code Changes for Just-in-Time Software Defect Prediction?
ReDef creates a revert-anchored dataset of 3,164 defective and 10,268 clean code modifications and shows that code language models perform better with diff encodings but maintain stable performance under counterfactual perturbations, indicating reliance on superficial cues.
-
Revisiting DAgger in the Era of LLM-Agents
DAgger-style training with turn-level policy interpolation raises 4B and 8B LLM agents to 27.3% and 29.8% on SWE-bench Verified, beating several larger published systems.
-
Evaluating LLM-generated code for domain-specific languages: molecular dynamics with LAMMPS
LLM syntax accuracy for LAMMPS scripts improved to 91% parser pass rate, yet only 1/80 scripts were scientifically correct on the hardest prompt; an agentic verification skill raised success to 5/6.