Beyond the Edge of Function: Unraveling the Patterns of Type Recovery in Binary Code

Gangyang Li; Junqi Zhang; Li Hu; Nenghai Yu; Shaoyin Cheng; Weiming Zhang; Xiuwei Shang; Xu Zhu

arxiv: 2503.07243 · v2 · submitted 2025-03-10 · 💻 cs.CR

Gangyang Li, Xiuwei Shang, Shaoyin Cheng, Junqi Zhang, Li Hu, Xu Zhu, Weiming Zhang, Nenghai Yu

Pith reviewed 2026-05-23 01:04 UTC · model grok-4.3

classification 💻 cs.CR

keywords: type recovery, binary analysis, inter-procedural analysis, graph neural networks, compiler optimizations, reverse engineering, decompilation

The pith

ByteTR outperforms state-of-the-art methods at recovering variable types from binary code by using inter-procedural data-flow analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors perform an empirical study on a dataset of 163,643 binaries to uncover patterns in variable types and the effects of compiler optimizations. They then build ByteTR, which decouples the type set to manage imbalance, applies static analysis to account for optimizations, traces variable propagation across functions, and uses a gated graph neural network to model long-range dependencies. This design targets the complexity of real-world binaries beyond single-function analysis. A sympathetic reader would care because accurate type recovery is key to readable decompiled code and effective security analysis.

Core claim

ByteTR leads state-of-the-art works in both effectiveness and efficiency. In real CTF challenge cases, the pseudo code optimized by ByteTR significantly improves readability, surpassing leading tools IDA and Ghidra. The framework decouples the target type set, performs static program analysis for compiler optimization impacts, conducts inter-procedural analysis to trace variable propagation, and employs a gated graph neural network to capture long-range data flow dependencies.

What carries the argument

Inter-procedural analysis combined with a gated graph neural network: together they trace variable propagation and capture data-flow dependencies across functions, once the type set has been decoupled and static analysis has accounted for compiler optimizations.
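
The idea of tracing a variable across function boundaries can be illustrated with a small sketch. This is not ByteTR's implementation: the `PASSES_TO` edge map, the site names, and the `max_depth` convention (0 means the starting function only) are all assumptions made up for illustration. The function simply collects every (function, variable) site reachable from a starting variable within a bounded number of call hops.

```python
from collections import deque

# Hypothetical propagation edges: (function, variable) -> sites the value is
# passed to. In a real tool these would come from static analysis of the binary.
PASSES_TO = {
    ("main", "buf"): [("parse", "arg0")],
    ("parse", "arg0"): [("read_hdr", "arg0"), ("log", "arg1")],
    ("read_hdr", "arg0"): [("memcpy", "arg1")],
}

def trace_propagation(start, max_depth):
    """Breadth-first walk over propagation edges, at most max_depth hops."""
    seen = {start: 0}          # site -> number of hops from the start
    queue = deque([start])
    while queue:
        site = queue.popleft()
        depth = seen[site]
        if depth == max_depth:
            continue           # do not expand beyond the depth limit
        for nxt in PASSES_TO.get(site, []):
            if nxt not in seen:
                seen[nxt] = depth + 1
                queue.append(nxt)
    return seen

sites = trace_propagation(("main", "buf"), max_depth=2)
```

Raising `max_depth` pulls in more usage sites for the same variable at the cost of a larger graph, which is the kind of trade-off a depth study would measure.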

If this is right

ByteTR achieves superior effectiveness and efficiency compared to existing type recovery methods.
The approach handles unbalanced type distributions and the effects of compiler optimizations.
Optimized pseudo code from ByteTR offers better readability in practical reverse engineering scenarios like CTF challenges.
Variable type recovery benefits from considering propagation beyond individual functions.
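
One way to picture how decoupling a type set can ease imbalance is sketched below. The page does not say how ByteTR splits its labels, so the split into a base type and a pointer depth, the `decouple` helper, and the sample labels are purely assumptions for illustration.

```python
def decouple(type_name):
    """Split a C-style type string into (base type, pointer depth)."""
    depth = type_name.count("*")
    # Drop the asterisks and normalise whitespace to get the base type.
    base = " ".join(type_name.replace("*", " ").split())
    return base, depth

# Toy composite labels, invented for this example.
full_labels = ["int", "int *", "char *", "char **", "struct node *", "struct node **"]
parts = [decouple(t) for t in full_labels]
bases = sorted({b for b, _ in parts})
depths = sorted({d for _, d in parts})
```

Under this assumed split, a rare composite label such as `struct node **` shares its base and its depth with more frequent labels, so each component is predicted from more examples than the composite would be.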

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The empirical patterns identified could guide similar studies for other binary analysis tasks such as control flow recovery.
Extending the inter-procedural tracing might further improve performance on heavily optimized or obfuscated code.
The method's success on four architectures suggests potential for broader applicability in cross-platform analysis.

Load-bearing premise

The TYDA dataset fully reflects the complexity and diversity of real-world programs.

What would settle it

Running ByteTR and competing tools on a new collection of binary programs compiled with different options or from different sources and measuring type recovery accuracy against ground truth.
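
Measuring accuracy against ground truth could look like the following minimal sketch, which computes per-type precision and recall from paired label lists. The function name and the toy labels are hypothetical; they are not results from the paper or from any tool.

```python
from collections import Counter

def per_type_precision_recall(truth, predicted):
    """Return {type: (precision, recall)} for paired truth/predicted labels."""
    if len(truth) != len(predicted):
        raise ValueError("label lists must have equal length")
    tp, pred_count, true_count = Counter(), Counter(), Counter()
    for t, p in zip(truth, predicted):
        true_count[t] += 1
        pred_count[p] += 1
        if t == p:
            tp[t] += 1
    scores = {}
    for label in set(true_count) | set(pred_count):
        # Guard against labels that were never predicted or never present.
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = tp[label] / true_count[label] if true_count[label] else 0.0
        scores[label] = (precision, recall)
    return scores

# Toy labels, invented for illustration only.
truth = ["int", "int", "char*", "struct*"]
predicted = ["int", "char*", "char*", "int"]
scores = per_type_precision_recall(truth, predicted)
```

Reporting scores per type rather than a single accuracy figure matters when the type distribution is skewed, since a tool can score well overall while missing rare types entirely.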

Figures

Figures reproduced from arXiv: 2503.07243 by Gangyang Li, Junqi Zhang, Li Hu, Nenghai Yu, Shaoyin Cheng, Weiming Zhang, Xiuwei Shang, Xu Zhu.

**Figure 2.** Statistical analysis of variable propagation. (a) denotes the proportion of the number of functions

**Figure 3.** Variable storage patterns across different architectures and optimization options

**Figure 4.** Overview of ByteTR.

**Figure 5.** The full set of predictable types presented in the form of the BNF paradigm in our formulation.

**Figure 7.** Transformation from Variable Propagation Graph to Variable Semantic Graph

**Figure 8.** Comparison of precision between StateFormer, TYGR, and ByteTR

**Figure 9.** Precision and latency of different calling depth. The depth=1 means that inter-procedural analysis is

**Figure 10.** A real-world case in a CTF challenge where a function implements the call instruction of a virtual

Original abstract

Type recovery is a crucial step in binary code analysis, holding significant importance for reverse engineering and various security applications. Existing works typically target type identifiers within binary code and achieve type recovery by analyzing variable characteristics within functions. However, we find that the types in real-world binary programs are more complex and often follow specific distribution patterns. In this paper, to gain a profound understanding of the variable type recovery problem in binary code, we first conduct a comprehensive empirical study. We utilize the TYDA dataset, which includes 163,643 binary programs across four architectures and four compiler optimization options, fully reflecting the complexity and diversity of real-world programs. We carefully study the unique patterns that characterize types and variables in binary code, and also investigate the impact of compiler optimizations on them, yielding many valuable insights. Based on our empirical findings, we propose ByteTR, a framework for recovering variable types in binary code. We decouple the target type set to address the issue of unbalanced type distribution and perform static program analysis to tackle the impact of compiler optimizations on variable storage. In light of the ubiquity of variable propagation across functions observed in our study, ByteTR conducts inter-procedural analysis to trace variable propagation and employs a gated graph neural network to capture long-range data flow dependencies for variable type recovery. We conduct extensive experiments to evaluate the performance of ByteTR. The results demonstrate that ByteTR leads state-of-the-art works in both effectiveness and efficiency. Moreover, in a real CTF challenge case, the pseudo code optimized by ByteTR significantly improves readability, surpassing leading tools IDA and Ghidra.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ByteTR adds inter-procedural tracing and gated GNNs to type recovery after an empirical study on a large binary dataset, but the superiority claims sit on unshown experiment details and an unverified dataset.

ByteTR extends prior type recovery work by decoupling the target type set, running compiler-aware static analysis, tracing variable propagation across functions, and feeding that into a gated GNN for data-flow dependencies. The empirical study on the TYDA collection of 163k binaries across four architectures and optimization levels is what drove those choices, and the CTF case where the output beats IDA and Ghidra on readability is the concrete payoff they highlight.

Referee Report

2 major / 1 minor

Summary. The paper reports an empirical study of variable type recovery in stripped binaries using the TYDA corpus (163643 programs, four architectures, four optimization levels). It identifies distribution patterns and compiler-induced storage effects, then presents ByteTR: a pipeline that decouples the type vocabulary, performs static analysis to mitigate optimization artifacts, and applies inter-procedural gated graph neural networks to propagate type information across function boundaries. Experiments are said to show ByteTR outperforming prior work in both accuracy and speed; a CTF case study claims improved decompiler output readability relative to IDA and Ghidra.

Significance. If the empirical patterns and performance numbers are reproducible on representative corpora, the work would supply both diagnostic insights into type distributions and a concrete inter-procedural modeling technique that existing intra-procedural type-recovery systems largely omit. The explicit handling of compiler storage effects and the use of GGNNs for long-range data-flow are technically substantive contributions that could be adopted by production reverse-engineering tools.

major comments (2)

[Abstract / TYDA description] Abstract and § on TYDA construction: the claim that the 163643-program corpus 'fully reflects the complexity and diversity of real-world programs' is load-bearing for every generalization and SOTA claim, yet the manuscript provides no sampling frame, inclusion criteria for stripped or obfuscated binaries, or quantitative comparison against the corpora used by prior type-recovery papers (e.g., those underlying EKLAVYA, TypeMiner, or DIRTY). Without this, the reported 'unique patterns' and ByteTR's measured superiority cannot be assessed for external validity.
[Experiments / Evaluation] Experimental section (results tables): the abstract asserts leadership in 'effectiveness and efficiency' and superiority on a CTF case, but the provided text contains no precision/recall numbers, baseline implementations, ablation studies, or statistical significance tests. These omissions make it impossible to verify whether the inter-procedural GGNN component is responsible for the claimed gains or whether the results are driven by dataset artifacts.

minor comments (1)

[ByteTR architecture] Notation for the gated graph neural network message-passing equations should be expanded with explicit update and aggregation functions to allow replication.
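
For reference, the kind of explicit notation this comment asks for can be written in the standard gated graph neural network form of Li et al. [17], restated here per node with edge-typed weights. This is the textbook formulation, not necessarily the exact variant used in ByteTR; the symbols ($h_v^{(t)}$ for the state of node $v$ at step $t$, $E$ for the edge set, $\ell(u,v)$ for the edge type, and the weight matrices $W$, $U$) are introduced here as assumptions.

```latex
\begin{aligned}
a_v^{(t)} &= \sum_{(u,v)\in E} W_{\ell(u,v)}\, h_u^{(t-1)} + b && \text{(aggregation)}\\
z_v^{(t)} &= \sigma\!\left(W^{z} a_v^{(t)} + U^{z} h_v^{(t-1)}\right) && \text{(update gate)}\\
r_v^{(t)} &= \sigma\!\left(W^{r} a_v^{(t)} + U^{r} h_v^{(t-1)}\right) && \text{(reset gate)}\\
\tilde h_v^{(t)} &= \tanh\!\left(W a_v^{(t)} + U\left(r_v^{(t)} \odot h_v^{(t-1)}\right)\right) && \text{(candidate state)}\\
h_v^{(t)} &= \left(1 - z_v^{(t)}\right) \odot h_v^{(t-1)} + z_v^{(t)} \odot \tilde h_v^{(t)} && \text{(state update)}
\end{aligned}
```

The first line is the aggregation function and the remaining four form the GRU-style update; after $T$ steps a node's state can depend on nodes up to $T$ edges away, which is how such a model can capture long-range data-flow dependencies.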

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. We address each major comment below and describe the revisions we will make to improve clarity and rigor.

Referee: [Abstract / TYDA description] Abstract and § on TYDA construction: the claim that the 163643-program corpus 'fully reflects the complexity and diversity of real-world programs' is load-bearing for every generalization and SOTA claim, yet the manuscript provides no sampling frame, inclusion criteria for stripped or obfuscated binaries, or quantitative comparison against the corpora used by prior type-recovery papers (e.g., those underlying EKLAVYA, TypeMiner, or DIRTY). Without this, the reported 'unique patterns' and ByteTR's measured superiority cannot be assessed for external validity.

Authors: We agree that the current description of TYDA does not sufficiently document its construction to support the strong claim of representativeness. The dataset was assembled from open-source C/C++ projects compiled for the four architectures and optimization levels, with a focus on stripped binaries; however, explicit sampling frames, inclusion/exclusion criteria, and direct quantitative comparisons to the corpora of EKLAVYA, TypeMiner, and DIRTY are absent. We will revise the TYDA section to add these details (including a comparison table) and will moderate the abstract language from 'fully reflects the complexity and diversity of real-world programs' to 'captures substantial complexity and diversity across common architectures and optimization levels.' These changes will allow better evaluation of external validity. revision: yes
Referee: [Experiments / Evaluation] Experimental section (results tables): the abstract asserts leadership in 'effectiveness and efficiency' and superiority on a CTF case, but the provided text contains no precision/recall numbers, baseline implementations, ablation studies, or statistical significance tests. These omissions make it impossible to verify whether the inter-procedural GGNN component is responsible for the claimed gains or whether the results are driven by dataset artifacts.

Authors: We acknowledge that the experimental presentation must be strengthened for verifiability. While the manuscript reports comparative results and a CTF case study, we will expand the evaluation section to include (1) explicit precision/recall tables with all baselines, (2) ablation studies isolating the gated GNN and inter-procedural components, (3) references to baseline implementations, and (4) statistical significance tests. We will also add analysis addressing potential dataset artifacts by relating results back to the empirical patterns identified earlier in the paper. These additions will make the contribution of each ByteTR component clearer. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical pipeline is self-contained

full rationale

The paper first collects and analyzes the external TYDA dataset of 163643 binaries to extract observed type-distribution patterns and compiler effects, then builds ByteTR by applying standard decoupling, static analysis, inter-procedural tracing, and GGNN components to those observations, and finally reports performance numbers on the same corpus. None of the load-bearing steps (pattern identification, model construction, or evaluation) reduce by definition or by self-citation to the target claims; the derivation remains independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on domain assumptions about type distributions and variable propagation drawn from the TYDA dataset study; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)

domain assumption: Types in real-world binary programs follow specific distribution patterns that can be leveraged for recovery.
Stated as motivation for the empirical study and ByteTR design.
domain assumption: Variable propagation across functions is ubiquitous in binary code.
Cited as key observation enabling inter-procedural analysis.

Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · 2 internal anchors

[1]

How far we have come: Testing decompilation correctness of c decompilers

Zhibo Liu and Shuai Wang. How far we have come: Testing decompilation correctness of c decompilers. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 475–487, 2020

work page 2020
[2]

Arbiter: Bridging the static and dynamic divide in vulnerability discovery on binary programs

Jayakrishna Vadayath, Moritz Eckert, Kyle Zeng, Nicolaas Weideman, Gokulkrishna Praveen Menon, Yanick Fratantonio, Davide Balzarotti, Adam Doupé, Tiffany Bao, Ruoyu Wang, et al. Arbiter: Bridging the static and dynamic divide in vulnerability discovery on binary programs. In 31st USENIX Security Symposium (USENIX Security 22) , pages 413–430, 2022

work page 2022
[3]

Vulhawk: Cross-architecture vulnerability detection with entropy-based binary code search

Zhenhao Luo, Pengfei Wang, Baosheng Wang, Yong Tang, Wei Xie, Xu Zhou, Danjun Liu, and Kai Lu. Vulhawk: Cross-architecture vulnerability detection with entropy-based binary code search. In NDSS, 2023

work page 2023
[4]

When malware changed its mind: An empirical study of variable program behaviors in the real world

Erin Avllazagaj, Ziyun Zhu, Leyla Bilge, Davide Balzarotti, and Tudor Dumitraș. When malware changed its mind: An empirical study of variable program behaviors in the real world. In 30th USENIX Security Symposium (USENIX Security 21), pages 3487–3504, 2021

work page 2021
[5]

Lightweight, obfuscation-resilient detection and family identification of android malware

Joshua Garcia, Mahmoud Hammad, and Sam Malek. Lightweight, obfuscation-resilient detection and family identification of android malware. ACM Transactions on Software Engineering and Methodology (TOSEM), 26(3):1–29, 2018

work page 2018
[6]

On benign features in malware detection

Michael Cao, Sahar Badihi, Khaled Ahmed, Peiyu Xiong, and Julia Rubin. On benign features in malware detection. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering , pages 1234–1238, 2020

work page 2020
[7]

Airtaint: Making dynamic taint analysis faster and easier

Qian Sang, Yanhao Wang, Yuwei Liu, Xiangkun Jia, Tiffany Bao, and Purui Su. Airtaint: Making dynamic taint analysis faster and easier. In 2024 IEEE Symposium on Security and Privacy (SP) , pages 3998–4014. IEEE, 2024

work page 2024
[8]

Taintmini: Detecting flow of sensitive data in mini-programs with static taint analysis

Chao Wang, Ronny Ko, Yue Zhang, Yuqing Yang, and Zhiqiang Lin. Taintmini: Detecting flow of sensitive data in mini-programs with static taint analysis. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pages 932–944. IEEE, 2023

work page 2023
[9]

Pata: Fuzzing with path aware taint analysis

Jie Liang, Mingzhe Wang, Chijin Zhou, Zhiyong Wu, Yu Jiang, Jianzhong Liu, Zhe Liu, and Jiaguang Sun. Pata: Fuzzing with path aware taint analysis. In 2022 IEEE Symposium on Security and Privacy (SP) , pages 1–17. IEEE, 2022

work page 2022
[10]

Hex-Rays SA. IDA Pro. https://www.hex-rays.com/products/ida, 2023

work page 2023
[11]

NationalSecurityAgency. Ghidra. https://github.com/NationalSecurityAgency/ghidra, 2023

work page 2023
[12]

Binary Ninja

Vector 35. Binary Ninja. https://binary.ninja/, 2023

work page 2023
[13]

Recovery of class hierarchies and composition relationships from machine code

Venkatesh Srinivasan and Thomas Reps. Recovery of class hierarchies and composition relationships from machine code. In International Conference on Compiler Construction , pages 61–84. Springer, 2014

work page 2014
[14]

Augmenting decompiler output with learned variable names and types

Qibin Chen, Jeremy Lacomis, Edward J. Schwartz, Claire Le Goues, Graham Neubig, and Bogdan Vasilescu. Augmenting decompiler output with learned variable names and types. In 31st USENIX Security Symposium, Boston, MA, August 2022

work page 2022
[15]

Stateformer: fine-grained type recovery from binaries using generative state modeling

Kexin Pei, Jonas Guan, Matthew Broughton, Zhongtian Chen, Songchen Yao, David Williams-King, Vikas Ummadisetty, Junfeng Yang, Baishakhi Ray, and Suman Jana. Stateformer: fine-grained type recovery from binaries using generative state modeling. In Proceedings of the 29th A...

work page 2025
[16]

Debin: Predicting debug information in stripped binaries

Jingxuan He, Pesho Ivanov, Petar Tsankov, Veselin Raychev, and Martin Vechev. Debin: Predicting debug information in stripped binaries. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security , pages 1667–1680, 2018

work page 2018
[17]

Gated Graph Sequence Neural Networks

Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[18]

{TYGR}: Type inference on stripped binaries using graph neural networks

Chang Zhu, Ziyang Li, Anton Xue, Ati Priya Bajaj, Wil Gibbs, Yibo Liu, Rajeev Alur, Tiffany Bao, Hanjun Dai, Adam Doupé, et al. {TYGR}: Type inference on stripped binaries using graph neural networks. In 33rd USENIX Security Symposium (USENIX Security 24) , pages 4283–4300, 2024

work page 2024
[19]

GNU Binutils. objdump. https://sourceware.org/binutils/docs/binutils/objdump.html, 2023

work page 2023
[20]

Llm4decompile: Decompiling binary code with large language models

Hanzhuo Tan, Qi Luo, Jing Li, and Yuqun Zhang. Llm4decompile: Decompiling binary code with large language models. arXiv preprint arXiv:2403.05286, 2024

work page arXiv 2024
[21]

Beyond the c: Retargetable decompilation using neural machine translation

Iman Hosseini and Brendan Dolan-Gavitt. Beyond the c: Retargetable decompilation using neural machine translation. arXiv preprint arXiv:2212.08950, 2022

work page arXiv 2022
[22]

Towards practical binary code similarity detection: Vulnerability verification via patch semantic analysis

Shouguo Yang, Zhengzi Xu, Yang Xiao, Zhe Lang, Wei Tang, Yang Liu, Zhiqiang Shi, Hong Li, and Limin Sun. Towards practical binary code similarity detection: Vulnerability verification via patch semantic analysis. ACM Transactions on Software Engineering and Methodology , 32(6):1–29, 2023

work page 2023
[23]

Asteria-pro: Enhancing deep learning-based binary code similarity detection by incorporating domain knowledge

Shouguo Yang, Chaopeng Dong, Yang Xiao, Yiran Cheng, Zhiqiang Shi, Zhi Li, and Limin Sun. Asteria-pro: Enhancing deep learning-based binary code similarity detection by incorporating domain knowledge. ACM Trans. Softw. Eng. Methodol., 33(1), November 2023

work page 2023
[24]

A lightweight framework for function name reassignment based on large-scale stripped binaries

Han Gao, Shaoyin Cheng, Yinxing Xue, and Weiming Zhang. A lightweight framework for function name reassignment based on large-scale stripped binaries. In Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), ISSTA 2021. Association for Computing Machinery, 2021

work page 2021
[25]

Bin2summary: Beyond function name prediction in stripped binaries with functionality-specific code embeddings

Zirui Song, Jiongyi Chen, and Kehuan Zhang. Bin2summary: Beyond function name prediction in stripped binaries with functionality-specific code embeddings. Proc. ACM Softw. Eng., 1(FSE), July 2024

work page 2024
[26]

Palmtree: Learning an assembly language model for instruction embedding

Xuezixiang Li, Yu Qu, and Heng Yin. Palmtree: Learning an assembly language model for instruction embedding. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security , pages 3236–3251, 2021

work page 2021
[27]

Binary code similarity detection via graph contrastive learning on intermediate representations

Xiuwei Shang, Li Hu, Shaoyin Cheng, Guoqiang Chen, Benlong Wu, Weiming Zhang, and Nenghai Yu. Binary code similarity detection via graph contrastive learning on intermediate representations. arXiv preprint arXiv:2410.18561, 2024

work page arXiv 2024
[28]

Dire: A neural approach to decompiled identifier naming

Jeremy Lacomis, Pengcheng Yin, Edward Schwartz, Miltiadis Allamanis, Claire Le Goues, Graham Neubig, and Bogdan Vasilescu. Dire: A neural approach to decompiled identifier naming. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE) , pages 628–639. IEEE, 2019

work page 2019
[29]

Zipf’s word frequency law in natural language: A critical review and future directions

Steven T Piantadosi. Zipf’s word frequency law in natural language: A critical review and future directions. Psychonomic bulletin & review, 21:1112–1130, 2014

work page 2014
[30]

Applications and explanations of zipf’s law

David MW Powers. Applications and explanations of zipf’s law. In New methods in language processing and computational natural language learning, 1998

work page 1998
[31]

Zipf’s law and Heaps’ law can predict the size of potential words

Yukie Sano, Hideki Takayasu, and Misako Takayasu. Zipf’s law and Heaps’ law can predict the size of potential words. Progress of Theoretical Physics Supplement, 194:202–209, 2012

work page 2012
[32]

Exploring regularity in source code: Software science and zipf’s law

Hongyu Zhang. Exploring regularity in source code: Software science and zipf’s law. In 2008 15th Working Conference on Reverse Engineering, pages 101–110. IEEE, 2008

work page 2008
[33]

Discovering power laws in computer programs

Hongyu Zhang. Discovering power laws in computer programs. Information processing & management , 45(4):477–483, 2009

work page 2009
[34]

The locality principle

Peter J Denning. The locality principle. Communications of the ACM, 48(7):19–24, 2005

work page 2005
[35]

Gcc, the gnu compiler collection, 2024

GNU Project. Gcc, the gnu compiler collection, 2024. Accessed: 2024-01-04

work page 2024
[36]

Unleashing the hidden power of compiler optimization on binary code difference: An empirical study

Xiaolei Ren, Michael Ho, Jiang Ming, Yu Lei, and Li Li. Unleashing the hidden power of compiler optimization on binary code difference: An empirical study. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation , pages 142–157, 2021

work page 2021
[37]

Bincola: Diversity-sensitive contrastive learning for binary code similarity detection

Shuai Jiang, Cai Fu, Shuai He, Jianqiang Lv, Lansheng Han, and Hong Hu. Bincola: Diversity-sensitive contrastive learning for binary code similarity detection. IEEE Transactions on Software Engineering , 2024

work page 2024
[38]

“len or index or count, anything but v1”: Predicting variable names in decompilation output with transfer learning

Kuntal Kumar Pal, Ati Priya Bajaj, Pratyay Banerjee, Audrey Dutcher, Mutsumi Nakamura, Zion Leonahenahe Basque, Himanshu Gupta, Saurabh Arjun Sawant, Ujjwala Anantheswaran, Yan Shoshitaishvili, et al. “len or index or count, anything but v1”: Predicting variable names in decompilation output with transfer learning. In 2024 IEEE Symposium on Security and Pr...

work page 2024
[39]

System V application binary interface: AMD64 architecture processor supplement

Michael Matz, Jan Hubicka, Andreas Jaeger, and Mark Mitchell. System V application binary interface: AMD64 architecture processor supplement. Technical report, x86-64 ABI, 2018. Available at https://gitlab.com/x86-psABIs/x86-

work page 2018
[40]

CEA IT Security. Miasm. https://github.com/cea-sec/miasm, 2023

work page 2023
[41]

A survey of data flow analysis techniques

Ken Kennedy. A survey of data flow analysis techniques. IBM Thomas J. Watson Research Division, 1979

work page 1979
[42]

Graph matching networks for learning the similarity of graph structured objects

Yujia Li, Chenjie Gu, Thomas Dullien, Oriol Vinyals, and Pushmeet Kohli. Graph matching networks for learning the similarity of graph structured objects. In International conference on machine learning , pages 3835–3845. PMLR, 2019

work page 2019
[43]

Exploring gnn based program embedding technologies for binary related tasks

Yixin Guo, Pengcheng Li, Yingwei Luo, Xiaolin Wang, and Zhenlin Wang. Exploring gnn based program embedding technologies for binary related tasks. In Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, pages 366–377, 2022

work page 2022
[44]

Code is not natural language: Unlock the power of semantics-oriented graph representation for binary code similarity detection

Haojie He, Xingwei Lin, Ziang Weng, Ruijie Zhao, Shuitao Gan, Libo Chen, Yuede Ji, Jiashui Wang, and Zhi Xue. Code is not natural language: Unlock the power of semantics-oriented graph representation for binary code similarity detection. In 33rd USENIX Security Symposium (USENIX Security 24), PHILADELPHIA, PA , 2024

work page 2024
[45]

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[46]

Sok:(state of) the art of war: Offensive techniques in binary analysis

Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Andrew Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Kruegel, et al. Sok:(state of) the art of war: Offensive techniques in binary analysis. In 2016 IEEE symposium on security and privacy (SP) , pages 138–157. IEEE, 2016

work page 2016
[47]

llasm: Naming functions in binaries by fusing encoder-only and decoder-only llms

Zihan Sha, Hao Wang, Zeyu Gao, Hui Shu, Bolun Zhang, Ziqing Wang, and Chao Zhang. llasm: Naming functions in binaries by fusing encoder-only and decoder-only llms. ACM Transactions on Software Engineering and Methodology , 2024

work page 2024
[48]

Enhancing function name prediction using votes-based name tokenization and multi-task learning

Xiaoling Zhang, Zhengzi Xu, Shouguo Yang, Zhi Li, Zhiqiang Shi, and Limin Sun. Enhancing function name prediction using votes-based name tokenization and multi-task learning. Proceedings of the ACM on Software Engineering , 1(FSE):1679–1702, 2024

work page 2024
[49]

Symlm: Predicting function names in stripped binaries via context-sensitive execution-aware code embeddings

Xin Jin, Kexin Pei, Jun Yeon Won, and Zhiqiang Lin. Symlm: Predicting function names in stripped binaries via context-sensitive execution-aware code embeddings. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pages 1631–1645, 2022

work page 2022
[50]

Direct: A transformer-based model for decompiled variable name recovery

Vikram Nitin, Anthony Saieva, Baishakhi Ray, and Gail Kaiser. Direct: A transformer-based model for decompiled variable name recovery. NLP4Prog 2021, page 48, 2021

work page 2021
[51]

Cp-bcs: Binary code summarization guided by control flow graph and pseudo code

Tong Ye, Lingfei Wu, Tengfei Ma, Xuhong Zhang, Yangkai Du, Peiyu Liu, Shouling Ji, and Wenhai Wang. Cp-bcs: Binary code summarization guided by control flow graph and pseudo code. arXiv preprint arXiv:2310.16853, 2023

work page arXiv 2023
[52]

Binary code summarization: Benchmarking chatgpt/gpt-4 and other large language models

Xin Jin, Jonathan Larson, Weiwei Yang, and Zhiqiang Lin. Binary code summarization: Benchmarking chatgpt/gpt-4 and other large language models. arXiv preprint arXiv:2312.09601, 2023

work page arXiv 2023
[53]

How far have we gone in binary code understanding using large language models

Xiuwei Shang, Shaoyin Cheng, Guoqiang Chen, Yanming Zhang, Li Hu, Xiao Yu, Gangyang Li, Weiming Zhang, and Nenghai Yu. How far have we gone in binary code understanding using large language models. In 2024 IEEE International Conference on Software Maintenance and Evolution (ICSME) , pages 1–12. IEEE, 2024

work page 2024
[54]

Binvuldet: Detecting vulnerability in binary program via decompiled pseudo code and bilstm-attention

Yan Wang, Peng Jia, Xi Peng, Cheng Huang, and Jiayong Liu. Binvuldet: Detecting vulnerability in binary program via decompiled pseudo code and bilstm-attention. Computers & Security, 125:103023, 2023

work page 2023
[55]

Resym: Harnessing llms to recover variable and data structure symbols from stripped binaries

Danning Xie, Zhuo Zhang, Nan Jiang, Xiangzhe Xu, Lin Tan, and Xiangyu Zhang. Resym: Harnessing llms to recover variable and data structure symbols from stripped binaries. In Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security , pages 4554–4568, 2024

work page 2024
[56]

Cati: Context-assisted type inference from stripped binaries

Ligeng Chen, Zhongling He, and Bing Mao. Cati: Context-assisted type inference from stripped binaries. In 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) , pages 88–98. IEEE, 2020

work page 2020
[57]

Tie: Principled reverse engineering of types in binary programs

JongHyup Lee, Thanassis Avgerinos, and David Brumley. Tie: Principled reverse engineering of types in binary programs. 2011

work page 2011
[58]

Polymorphic type inference for machine code

Matt Noonan, Alexey Loginov, and David Cok. Polymorphic type inference for machine code. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation , pages 27–41, 2016

work page 2016
[59]

Scalable variable and data type detection in a binary rewriter

Khaled ElWazeer, Kapil Anand, Aparna Kotha, Matthew Smithson, and Rajeev Barua. Scalable variable and data type detection in a binary rewriter. In Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation, pages 51–60, 2013

work page 2013
[60]

Osprey: Recovery of variable and data structure via probabilistic analysis for stripped binary

Zhuo Zhang, Yapeng Ye, Wei You, Guanhong Tao, Wen-chuan Lee, Yonghwi Kwon, Yousra Aafer, and Xiangyu Zhang. Osprey: Recovery of variable and data structure via probabilistic analysis for stripped binary. In 2021 IEEE Symposium on Security and Privacy (SP) , pages 813–832. IEEE, 2021

work page 2021
[61]

Howard: A dynamic excavator for reverse engineering data structures

Asia Slowinska, Traian Stancescu, and Herbert Bos. Howard: A dynamic excavator for reverse engineering data structures. In NDSS, 2011

work page 2011
[62]

Automatic reverse engineering of data structures from binary execution

Zhiqiang Lin, Xiangyu Zhang, and Dongyan Xu. Automatic reverse engineering of data structures from binary execution. In Proceedings of the 11th Annual Information Security Symposium , pages 1–1, 2010. J. ACM, Vol. 37, No. 4, Article . Publication date: July 2025. 32 Gangyang Li et al

work page 2010
[63]

Typeminer: Recovering types in binary programs using machine learning

Alwin Maier, Hugo Gascon, Christian Wressnegger, and Konrad Rieck. Typeminer: Recovering types in binary programs using machine learning. In Detection of Intrusions and Malware, and Vulnerability Assessment: 16th International Conference, DIMV A 2019, Gothenburg, Sweden, June 19–20, 2019, Proceedings 16 , pages 288–308. Springer, 2019

work page 2019
[64]

Neural nets can learn function type signatures from binaries

Zheng Leong Chua, Shiqi Shen, Prateek Saxena, and Zhenkai Liang. Neural nets can learn function type signatures from binaries. In 26th USENIX Security Symposium (USENIX Security 17) , pages 99–116, 2017

work page 2017
[65]

A transformer-based function symbol name inference model from an assembly language for binary reversing

Hyunjin Kim, Jinyeong Bak, Kyunghyun Cho, and Hyungjoon Koo. A transformer-based function symbol name inference model from an assembly language for binary reversing. In Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security , pages 951–965, 2023

work page 2023
[66]

Xfl: Naming functions in binaries with extreme multi-label learning

James Patrick-Evans, Moritz Dannehl, and Johannes Kinder. Xfl: Naming functions in binaries with extreme multi-label learning. In 2023 IEEE Symposium on Security and Privacy (SP) , pages 2375–2390. IEEE, 2023

work page 2023
[67]

Binary function clone search in the presence of code obfuscation and optimization over multi-cpu architectures

Abdullah Qasem, Mourad Debbabi, Bernard Lebel, and Marthe Kassouf. Binary function clone search in the presence of code obfuscation and optimization over multi-cpu architectures. In Proceedings of the 2023 acm asia conference on computer and communications security , pages 443–456, 2023

work page 2023
[68]

Clap: Learning transferable binary code representations with natural language supervision

Hao Wang, Zeyu Gao, Chao Zhang, Zihan Sha, Mingyang Sun, Yuchen Zhou, Wenyu Zhu, Wenju Sun, Han Qiu, and Xi Xiao. Clap: Learning transferable binary code representations with natural language supervision. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis , pages 503–515, 2024

work page 2024
[69]

Bbdetector: A precise and scalable third- party library detection in binary executables with fine-grained function-level features

Xiaoya Zhu, Junfeng Wang, Zhiyang Fang, Xiaokang Yin, and Shengli Liu. Bbdetector: A precise and scalable third- party library detection in binary executables with fine-grained function-level features. Applied Sciences, 13(1):413, 2022

work page 2022
[70]

Libam: An area matching framework for detecting third-party libraries in binaries

Siyuan Li, Yongpan Wang, Chaopeng Dong, Shouguo Yang, Hong Li, Hao Sun, Zhe Lang, Zuxin Chen, Weijie Wang, Hongsong Zhu, et al. Libam: An area matching framework for detecting third-party libraries in binaries. ACM Transactions on Software Engineering and Methodology , 33(2):1–35, 2023

work page 2023
[71]

Libdb: An effective and efficient framework for detecting third-party libraries in binaries

Wei Tang, Yanlin Wang, Hongyu Zhang, Shi Han, Ping Luo, and Dongmei Zhang. Libdb: An effective and efficient framework for detecting third-party libraries in binaries. In Proceedings of the 19th International Conference on Mining Software Repositories, pages 423–434, 2022. J. ACM, Vol. 37, No. 4, Article . Publication date: July 2025

work page 2022

[1] Zhibo Liu and Shuai Wang. How far we have come: Testing decompilation correctness of c decompilers. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 475–487, 2020

[2] Jayakrishna Vadayath, Moritz Eckert, Kyle Zeng, Nicolaas Weideman, Gokulkrishna Praveen Menon, Yanick Fratantonio, Davide Balzarotti, Adam Doupé, Tiffany Bao, Ruoyu Wang, et al. Arbiter: Bridging the static and dynamic divide in vulnerability discovery on binary programs. In 31st USENIX Security Symposium (USENIX Security 22), pages 413–430, 2022

[3] Zhenhao Luo, Pengfei Wang, Baosheng Wang, Yong Tang, Wei Xie, Xu Zhou, Danjun Liu, and Kai Lu. Vulhawk: Cross-architecture vulnerability detection with entropy-based binary code search. In NDSS, 2023

[4] Erin Avllazagaj, Ziyun Zhu, Leyla Bilge, Davide Balzarotti, and Tudor Dumitraș. When malware changed its mind: An empirical study of variable program behaviors in the real world. In 30th USENIX Security Symposium (USENIX Security 21), pages 3487–3504, 2021

[5] Joshua Garcia, Mahmoud Hammad, and Sam Malek. Lightweight, obfuscation-resilient detection and family identification of android malware. ACM Transactions on Software Engineering and Methodology (TOSEM), 26(3):1–29, 2018

[6] Michael Cao, Sahar Badihi, Khaled Ahmed, Peiyu Xiong, and Julia Rubin. On benign features in malware detection. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, pages 1234–1238, 2020

[7] Qian Sang, Yanhao Wang, Yuwei Liu, Xiangkun Jia, Tiffany Bao, and Purui Su. Airtaint: Making dynamic taint analysis faster and easier. In 2024 IEEE Symposium on Security and Privacy (SP), pages 3998–4014. IEEE, 2024

[8] Chao Wang, Ronny Ko, Yue Zhang, Yuqing Yang, and Zhiqiang Lin. Taintmini: Detecting flow of sensitive data in mini-programs with static taint analysis. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pages 932–944. IEEE, 2023

[9] Jie Liang, Mingzhe Wang, Chijin Zhou, Zhiyong Wu, Yu Jiang, Jianzhong Liu, Zhe Liu, and Jiaguang Sun. Pata: Fuzzing with path aware taint analysis. In 2022 IEEE Symposium on Security and Privacy (SP), pages 1–17. IEEE, 2022

[10] Hex-Rays SA. IDA Pro. https://www.hex-rays.com/products/ida, 2023

[11] NationalSecurityAgency. Ghidra. https://github.com/NationalSecurityAgency/ghidra, 2023

[12] Vector 35. Binary Ninja. https://binary.ninja/, 2023

[13] Venkatesh Srinivasan and Thomas Reps. Recovery of class hierarchies and composition relationships from machine code. In International Conference on Compiler Construction, pages 61–84. Springer, 2014

[14] Qibin Chen, Jeremy Lacomis, Edward J. Schwartz, Claire Le Goues, Graham Neubig, and Bogdan Vasilescu. Augmenting decompiler output with learned variable names and types. In 31st USENIX Security Symposium, Boston, MA, August 2022

[15] Kexin Pei, Jonas Guan, Matthew Broughton, Zhongtian Chen, Songchen Yao, David Williams-King, Vikas Ummadisetty, Junfeng Yang, Baishakhi Ray, and Suman Jana. Stateformer: fine-grained type recovery from binaries using generative state modeling. In Proceedings of the 29th A...

[16] Jingxuan He, Pesho Ivanov, Petar Tsankov, Veselin Raychev, and Martin Vechev. Debin: Predicting debug information in stripped binaries. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 1667–1680, 2018

[17] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493, 2015

[18] Chang Zhu, Ziyang Li, Anton Xue, Ati Priya Bajaj, Wil Gibbs, Yibo Liu, Rajeev Alur, Tiffany Bao, Hanjun Dai, Adam Doupé, et al. TYGR: Type inference on stripped binaries using graph neural networks. In 33rd USENIX Security Symposium (USENIX Security 24), pages 4283–4300, 2024

[19] GNU Binutils. objdump. https://sourceware.org/binutils/docs/binutils/objdump.html, 2023

[20] Hanzhuo Tan, Qi Luo, Jing Li, and Yuqun Zhang. Llm4decompile: Decompiling binary code with large language models. arXiv preprint arXiv:2403.05286, 2024

[21] Iman Hosseini and Brendan Dolan-Gavitt. Beyond the c: Retargetable decompilation using neural machine translation. arXiv preprint arXiv:2212.08950, 2022

[22] Shouguo Yang, Zhengzi Xu, Yang Xiao, Zhe Lang, Wei Tang, Yang Liu, Zhiqiang Shi, Hong Li, and Limin Sun. Towards practical binary code similarity detection: Vulnerability verification via patch semantic analysis. ACM Transactions on Software Engineering and Methodology, 32(6):1–29, 2023

[23] Shouguo Yang, Chaopeng Dong, Yang Xiao, Yiran Cheng, Zhiqiang Shi, Zhi Li, and Limin Sun. Asteria-pro: Enhancing deep learning-based binary code similarity detection by incorporating domain knowledge. ACM Trans. Softw. Eng. Methodol., 33(1), November 2023

[24] Han Gao, Shaoyin Cheng, Yinxing Xue, and Weiming Zhang. A lightweight framework for function name reassignment based on large-scale stripped binaries. In Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2021). Association for Computing Machinery, 2021

[25] Zirui Song, Jiongyi Chen, and Kehuan Zhang. Bin2summary: Beyond function name prediction in stripped binaries with functionality-specific code embeddings. Proc. ACM Softw. Eng., 1(FSE), July 2024

[26] Xuezixiang Li, Yu Qu, and Heng Yin. Palmtree: Learning an assembly language model for instruction embedding. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, pages 3236–3251, 2021

[27] Xiuwei Shang, Li Hu, Shaoyin Cheng, Guoqiang Chen, Benlong Wu, Weiming Zhang, and Nenghai Yu. Binary code similarity detection via graph contrastive learning on intermediate representations. arXiv preprint arXiv:2410.18561, 2024

[28] Jeremy Lacomis, Pengcheng Yin, Edward Schwartz, Miltiadis Allamanis, Claire Le Goues, Graham Neubig, and Bogdan Vasilescu. Dire: A neural approach to decompiled identifier naming. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 628–639. IEEE, 2019

[29] Steven T Piantadosi. Zipf’s word frequency law in natural language: A critical review and future directions. Psychonomic bulletin & review, 21:1112–1130, 2014

[30] David MW Powers. Applications and explanations of zipf’s law. In New methods in language processing and computational natural language learning, 1998

[31] Yukie Sano, Hideki Takayasu, and Misako Takayasu. Zipf’s law and heaps’s law can predict the size of potential words. Progress of Theoretical Physics Supplement, 194:202–209, 2012

[32] Hongyu Zhang. Exploring regularity in source code: Software science and zipf’s law. In 2008 15th Working Conference on Reverse Engineering, pages 101–110. IEEE, 2008

[33] Hongyu Zhang. Discovering power laws in computer programs. Information processing & management, 45(4):477–483, 2009

[34] Peter J Denning. The locality principle. Communications of the ACM, 48(7):19–24, 2005

[35] GNU Project. Gcc, the gnu compiler collection, 2024. Accessed: 2024-01-04

[36] Xiaolei Ren, Michael Ho, Jiang Ming, Yu Lei, and Li Li. Unleashing the hidden power of compiler optimization on binary code difference: An empirical study. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, pages 142–157, 2021

[37] Shuai Jiang, Cai Fu, Shuai He, Jianqiang Lv, Lansheng Han, and Hong Hu. Bincola: Diversity-sensitive contrastive learning for binary code similarity detection. IEEE Transactions on Software Engineering, 2024

[38] Kuntal Kumar Pal, Ati Priya Bajaj, Pratyay Banerjee, Audrey Dutcher, Mutsumi Nakamura, Zion Leonahenahe Basque, Himanshu Gupta, Saurabh Arjun Sawant, Ujjwala Anantheswaran, Yan Shoshitaishvili, et al. “len or index or count, anything but v1”: Predicting variable names in decompilation output with transfer learning. In 2024 IEEE Symposium on Security and Pr...

[39] Michael Matz, Jan Hubicka, Andreas Jaeger, and Mark Mitchell. System V application binary interface: AMD64 architecture processor supplement. Technical report, x86-64 ABI, 2018. Available at https://gitlab.com/x86-psABIs/x86-...

[40] CEA IT Security. Miasm. https://github.com/cea-sec/miasm, 2023

[41] Ken Kennedy. A survey of data flow analysis techniques. IBM Thomas J. Watson Research Division, 1979

[42] Yujia Li, Chenjie Gu, Thomas Dullien, Oriol Vinyals, and Pushmeet Kohli. Graph matching networks for learning the similarity of graph structured objects. In International conference on machine learning, pages 3835–3845. PMLR, 2019

[43] Yixin Guo, Pengcheng Li, Yingwei Luo, Xiaolin Wang, and Zhenlin Wang. Exploring gnn based program embedding technologies for binary related tasks. In Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, pages 366–377, 2022

[44] Haojie He, Xingwei Lin, Ziang Weng, Ruijie Zhao, Shuitao Gan, Libo Chen, Yuede Ji, Jiashui Wang, and Zhi Xue. Code is not natural language: Unlock the power of semantics-oriented graph representation for binary code similarity detection. In 33rd USENIX Security Symposium (USENIX Security 24), Philadelphia, PA, 2024

[45] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014

[46] Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Andrew Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Kruegel, et al. Sok: (state of) the art of war: Offensive techniques in binary analysis. In 2016 IEEE symposium on security and privacy (SP), pages 138–157. IEEE, 2016

[47] Zihan Sha, Hao Wang, Zeyu Gao, Hui Shu, Bolun Zhang, Ziqing Wang, and Chao Zhang. llasm: Naming functions in binaries by fusing encoder-only and decoder-only llms. ACM Transactions on Software Engineering and Methodology, 2024

[48] Xiaoling Zhang, Zhengzi Xu, Shouguo Yang, Zhi Li, Zhiqiang Shi, and Limin Sun. Enhancing function name prediction using votes-based name tokenization and multi-task learning. Proceedings of the ACM on Software Engineering, 1(FSE):1679–1702, 2024

[49] Xin Jin, Kexin Pei, Jun Yeon Won, and Zhiqiang Lin. Symlm: Predicting function names in stripped binaries via context-sensitive execution-aware code embeddings. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pages 1631–1645, 2022

[50] Vikram Nitin, Anthony Saieva, Baishakhi Ray, and Gail Kaiser. Direct: A transformer-based model for decompiled variable name recovery. NLP4Prog 2021, page 48, 2021

[51] Tong Ye, Lingfei Wu, Tengfei Ma, Xuhong Zhang, Yangkai Du, Peiyu Liu, Shouling Ji, and Wenhai Wang. Cp-bcs: Binary code summarization guided by control flow graph and pseudo code. arXiv preprint arXiv:2310.16853, 2023

[52] Xin Jin, Jonathan Larson, Weiwei Yang, and Zhiqiang Lin. Binary code summarization: Benchmarking chatgpt/gpt-4 and other large language models. arXiv preprint arXiv:2312.09601, 2023

[53] Xiuwei Shang, Shaoyin Cheng, Guoqiang Chen, Yanming Zhang, Li Hu, Xiao Yu, Gangyang Li, Weiming Zhang, and Nenghai Yu. How far have we gone in binary code understanding using large language models. In 2024 IEEE International Conference on Software Maintenance and Evolution (ICSME), pages 1–12. IEEE, 2024

[54] Yan Wang, Peng Jia, Xi Peng, Cheng Huang, and Jiayong Liu. Binvuldet: Detecting vulnerability in binary program via decompiled pseudo code and bilstm-attention. Computers & Security, 125:103023, 2023

[55] Danning Xie, Zhuo Zhang, Nan Jiang, Xiangzhe Xu, Lin Tan, and Xiangyu Zhang. Resym: Harnessing llms to recover variable and data structure symbols from stripped binaries. In Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, pages 4554–4568, 2024

[56] Ligeng Chen, Zhongling He, and Bing Mao. Cati: Context-assisted type inference from stripped binaries. In 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pages 88–98. IEEE, 2020

[57] JongHyup Lee, Thanassis Avgerinos, and David Brumley. Tie: Principled reverse engineering of types in binary programs. 2011

[58] Matt Noonan, Alexey Loginov, and David Cok. Polymorphic type inference for machine code. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 27–41, 2016

[59] Khaled ElWazeer, Kapil Anand, Aparna Kotha, Matthew Smithson, and Rajeev Barua. Scalable variable and data type detection in a binary rewriter. In Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation, pages 51–60, 2013

[60] Zhuo Zhang, Yapeng Ye, Wei You, Guanhong Tao, Wen-chuan Lee, Yonghwi Kwon, Yousra Aafer, and Xiangyu Zhang. Osprey: Recovery of variable and data structure via probabilistic analysis for stripped binary. In 2021 IEEE Symposium on Security and Privacy (SP), pages 813–832. IEEE, 2021

[61] Asia Slowinska, Traian Stancescu, and Herbert Bos. Howard: A dynamic excavator for reverse engineering data structures. In NDSS, 2011

[62] Zhiqiang Lin, Xiangyu Zhang, and Dongyan Xu. Automatic reverse engineering of data structures from binary execution. In Proceedings of the 11th Annual Information Security Symposium, pages 1–1, 2010

[63] Alwin Maier, Hugo Gascon, Christian Wressnegger, and Konrad Rieck. Typeminer: Recovering types in binary programs using machine learning. In Detection of Intrusions and Malware, and Vulnerability Assessment: 16th International Conference, DIMVA 2019, Gothenburg, Sweden, June 19–20, 2019, Proceedings 16, pages 288–308. Springer, 2019

[64] Zheng Leong Chua, Shiqi Shen, Prateek Saxena, and Zhenkai Liang. Neural nets can learn function type signatures from binaries. In 26th USENIX Security Symposium (USENIX Security 17), pages 99–116, 2017

[65] Hyunjin Kim, Jinyeong Bak, Kyunghyun Cho, and Hyungjoon Koo. A transformer-based function symbol name inference model from an assembly language for binary reversing. In Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security, pages 951–965, 2023

[66] James Patrick-Evans, Moritz Dannehl, and Johannes Kinder. Xfl: Naming functions in binaries with extreme multi-label learning. In 2023 IEEE Symposium on Security and Privacy (SP), pages 2375–2390. IEEE, 2023

[67] Abdullah Qasem, Mourad Debbabi, Bernard Lebel, and Marthe Kassouf. Binary function clone search in the presence of code obfuscation and optimization over multi-cpu architectures. In Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security, pages 443–456, 2023

[68] Hao Wang, Zeyu Gao, Chao Zhang, Zihan Sha, Mingyang Sun, Yuchen Zhou, Wenyu Zhu, Wenju Sun, Han Qiu, and Xi Xiao. Clap: Learning transferable binary code representations with natural language supervision. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 503–515, 2024

[69] Xiaoya Zhu, Junfeng Wang, Zhiyang Fang, Xiaokang Yin, and Shengli Liu. Bbdetector: A precise and scalable third-party library detection in binary executables with fine-grained function-level features. Applied Sciences, 13(1):413, 2022

[70] Siyuan Li, Yongpan Wang, Chaopeng Dong, Shouguo Yang, Hong Li, Hao Sun, Zhe Lang, Zuxin Chen, Weijie Wang, Hongsong Zhu, et al. Libam: An area matching framework for detecting third-party libraries in binaries. ACM Transactions on Software Engineering and Methodology, 33(2):1–35, 2023

[71] Wei Tang, Yanlin Wang, Hongyu Zhang, Shi Han, Ping Luo, and Dongmei Zhang. Libdb: An effective and efficient framework for detecting third-party libraries in binaries. In Proceedings of the 19th International Conference on Mining Software Repositories, pages 423–434, 2022