Understanding Binary Code Similarity for Real-World Vulnerability Detection: A Large-Scale Empirical Study

Chaopeng Dong; Hong Li; Hongsong Zhu; Jie Liu; Jingdong Guo; Siyuan Li; Yimo Ren

arxiv: 2606.28870 · v1 · pith:SIN7APFRnew · submitted 2026-06-27 · 💻 cs.CR · cs.SE

Jingdong Guo, Chaopeng Dong, Yimo Ren, Siyuan Li, Jie Liu, Hong Li, Hongsong Zhu

Pith reviewed 2026-06-30 09:50 UTC · model grok-4.3

classification 💻 cs.CR cs.SE

keywords binary code similarity detection · vulnerability detection · firmware analysis · third-party libraries · IoT security · empirical study · mean reciprocal rank

The pith

Build-aware queries from real binaries raise BCSD mean reciprocal rank from 0.818 to 0.981 for firmware vulnerability detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper conducts a large-scale study of binary code similarity detection across 60,000 firmware images from 200 vendors to assess its effectiveness for identifying real-world vulnerabilities. It evaluates the impact of four factors—vulnerable function versions, search space, function sizes, and compilation toolchains—showing that mismatches with actual build conditions degrade performance. To address these issues, the authors introduce a build-aware query strategy that selects queries from representative real-world binaries and demonstrate a TPL-aware two-stage search that further narrows the space. These changes produce measurable gains in ranking accuracy without requiring new detection models.

Core claim

Analysis of BCSD across diverse real firmware reveals that compilation toolchains and search space cause large performance variations. Deriving queries from representative real-world binaries closes the gap and raises mean reciprocal rank (MRR) from 0.818 to 0.981. Separately, a TPL-aware two-stage search improves MRR by 18.5 percent by restricting the search space; the abstract does not state the baseline for that figure.
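For readers unfamiliar with the metric, the following is a minimal sketch of how mean reciprocal rank is computed and how the reported 0.818 to 0.981 change translates into a relative gain. The function names and the toy rank list are illustrative assumptions, not taken from the paper.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all queries.

    ranks[i] is the 1-based position of the true vulnerable function in the
    result list for query i, or None if it was not retrieved at all
    (which contributes 0 to the average).
    """
    if not ranks:
        raise ValueError("need at least one query")
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)


def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`."""
    return (after - before) / before


# Toy example (hypothetical ranks): hits at positions 1, 2 and 4, one miss.
toy_mrr = mean_reciprocal_rank([1, 2, None, 4])  # (1 + 0.5 + 0 + 0.25) / 4

# The two MRR values reported in the abstract.
reported_gain = relative_gain(0.818, 0.981)  # roughly 0.199, i.e. about 19.9%
```

An MRR of 0.981 means the correct function is ranked first for nearly every query, which leaves little headroom for any further relative gain on top of it.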

What carries the argument

The build-aware query strategy, which selects query functions from binaries compiled under conditions matching the target firmware rather than from synthetic or mismatched sources.
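One way to picture build-aware query selection is sketched below. It is not the authors' implementation: the `BuildProfile` and `QueryFunction` types, the scoring rule, and the toolchain strings are all hypothetical, chosen only to show the idea of preferring a query compiled the way the target corpus was compiled.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class BuildProfile:
    """Hypothetical summary of how a binary was built."""
    arch: str
    compiler: str
    opt_level: str


@dataclass(frozen=True)
class QueryFunction:
    """A candidate query: one compiled copy of a vulnerable function."""
    name: str
    profile: BuildProfile
    real_world: bool  # True if extracted from real firmware, not a lab build


def _score(query: QueryFunction, frequency: Counter) -> tuple:
    # Prefer profiles that are common in the target corpus; break ties in
    # favour of queries taken from real-world binaries.
    return (frequency[query.profile], query.real_world)


def pick_build_aware_queries(
    candidates: Iterable[QueryFunction],
    target_profiles: Iterable[BuildProfile],
) -> Dict[str, QueryFunction]:
    """For each function name, keep the candidate whose build best matches."""
    frequency = Counter(target_profiles)
    chosen: Dict[str, QueryFunction] = {}
    for cand in candidates:
        current = chosen.get(cand.name)
        if current is None or _score(cand, frequency) > _score(current, frequency):
            chosen[cand.name] = cand
    return chosen


# Toy data: the target corpus is mostly MIPS firmware built with an old GCC.
mips = BuildProfile("mips", "gcc-4.8", "-Os")
x86 = BuildProfile("x86_64", "gcc-11", "-O2")
candidates = [
    QueryFunction("ASN1_verify", x86, real_world=False),
    QueryFunction("ASN1_verify", mips, real_world=True),
]
picked = pick_build_aware_queries(candidates, [mips, mips, mips, x86])
```

In this toy run the MIPS copy extracted from real firmware is selected over the lab-built x86 copy, because its profile dominates the target corpus.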

If this is right

Standard BCSD benchmarks that rely on non-representative queries systematically underestimate field performance.
Incorporating knowledge of third-party libraries to limit search space yields consistent accuracy gains across different detection methods.
Matching query and target binaries on compilation toolchain and build settings is required for reliable vulnerability ranking.
Function size and version differences alone do not explain most observed performance drops once build awareness is added.
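The search-space point can be illustrated with a small sketch of a two-stage search. It assumes that third-party library labels are already available for every candidate function (how the paper obtains them is not described on this page) and uses made-up two-dimensional embeddings; a real BCSD model would supply the vectors.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_of(target: str, query_vec: Sequence[float], functions: List[Dict]) -> int:
    """1-based rank of `target` when functions are sorted by similarity."""
    ordered = sorted(functions, key=lambda f: cosine(query_vec, f["vec"]), reverse=True)
    return [f["name"] for f in ordered].index(target) + 1


def restrict_to_tpl(functions: List[Dict], tpl: str) -> List[Dict]:
    """Stage 1: keep only functions from binaries identified as the given TPL."""
    return [f for f in functions if f["tpl"] == tpl]


# Toy candidate pool with hypothetical embeddings.
pool = [
    {"name": "ASN1_verify", "tpl": "openssl", "vec": [0.9, 0.1]},
    {"name": "png_read_row", "tpl": "libpng", "vec": [1.0, 0.0]},  # look-alike
    {"name": "SSL_read", "tpl": "openssl", "vec": [0.2, 0.9]},
]
query = [1.0, 0.0]  # embedding of the vulnerable ASN1_verify query

rank_full = rank_of("ASN1_verify", query, pool)
# Stage 2: similarity ranking inside the restricted pool only.
rank_two_stage = rank_of("ASN1_verify", query, restrict_to_tpl(pool, "openssl"))
```

Here the look-alike from an unrelated library outranks the true match in the full pool, and dropping it in stage 1 moves the true match to rank 1. The gain depends entirely on stage 1 being accurate: a wrong library label would remove the correct function from the pool.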

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same query-selection principle could be tested on other binary analysis tasks such as malware classification or patch identification.
Detection pipelines might benefit from explicitly encoding build metadata as an auxiliary input rather than treating it as noise.
Future large-scale studies could isolate the contribution of each factor by holding the others fixed in controlled subsets of the firmware corpus.
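The last suggestion amounts to a stratified comparison. A rough sketch, assuming per-query records with hypothetical field names (`toolchain`, `size`, `rank`), could look like this:

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def mrr(ranks: Sequence[int]) -> float:
    """Mean reciprocal rank for 1-based ranks."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def effect_of_factor(records: List[Dict], factor: str, fixed: Sequence[str]) -> Dict:
    """MRR per level of `factor`, computed inside each stratum.

    A stratum is a group of records that share the same values for every
    field listed in `fixed`, so differences inside it are not explained by
    those fields.
    """
    strata = defaultdict(lambda: defaultdict(list))
    for rec in records:
        key = tuple(rec[name] for name in fixed)
        strata[key][rec[factor]].append(rec["rank"])
    return {
        key: {level: mrr(ranks) for level, ranks in levels.items()}
        for key, levels in strata.items()
    }


# Toy records: rank of the true match per query (all values are made up).
records = [
    {"toolchain": "default", "size": "small", "rank": 1},
    {"toolchain": "default", "size": "small", "rank": 1},
    {"toolchain": "vendor", "size": "small", "rank": 2},
    {"toolchain": "vendor", "size": "small", "rank": 4},
    {"toolchain": "default", "size": "large", "rank": 1},
    {"toolchain": "vendor", "size": "large", "rank": 1},
]
by_toolchain = effect_of_factor(records, factor="toolchain", fixed=["size"])
```

In the toy data the toolchain effect shows up only for small functions, which is the kind of interaction a pooled average would hide.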

Load-bearing premise

The collection of 60,000 firmware images from 200 vendors supplies enough variety in vulnerabilities, third-party libraries, and compilation environments to support broad conclusions about BCSD behavior.

What would settle it

Running the same evaluation protocol on a fresh set of firmware images from additional vendors and measuring whether the reported MRR gains remain above 0.95 or fall closer to the baseline of 0.818.

Figures

Figures reproduced from arXiv: 2606.28870 by Chaopeng Dong, Hong Li, Hongsong Zhu, Jie Liu, Jingdong Guo, Siyuan Li, Yimo Ren.

**Figure 2.** Overview of our study architecture.

**Figure 3.** Performance on the BinKit benchmark (trained and evaluated on BinKit using its standard split). Bars …

**Figure 4.** Impact of different versions of vulnerable functions on BCSD performance.

**Figure 5.** Non-linear impact of function size on BCSD performance. (a) Long-tailed size distribution in our …

**Figure 6.** BCSD performance across vendor-specific OpenSSL builds. Even with version/architecture/optimization …

**Figure 7.** Comparison of the Control-Flow Graph (CFG) of OpenSSL's ASN1_verify function under two build configurations. The default compilation (a) features a distributed error-handling architecture. In stark contrast, the in-the-wild version (b) is transformed by "High Impact" macros from …

Original abstract

Firmware lies at the heart of IoT devices. Its development depends heavily on third-party libraries (TPLs), which greatly accelerate the process but simultaneously introduce associated vulnerabilities. Binary Code Similarity Detection (BCSD) is an effective technique for identifying vulnerabilities in firmware by comparing pairs of code segments. However, existing studies either evaluate its performance only on small-scale datasets or lack diversity in terms of vulnerabilities, TPLs, and firmware. Consequently, a comprehensive understanding of BCSD for real-world vulnerability detection remains absent. To bridge this gap, we conduct a large-scale study of vulnerability detection across 60,000 firmware images from 200 vendors using BCSD. Rather than introducing a novel model, we examine the influence of four key factors (vulnerable function versions, vulnerability search space, function sizes, and compilation toolchains) on BCSD performance. Our results reveal that these factors substantially affect performance, often by wide margins. To address this, we propose a build-aware query strategy that derives queries from representative real-world binaries, effectively closing the gap and raising the mean reciprocal rank (MRR) from 0.818 to 0.981. Furthermore, we demonstrate that a TPL-aware, two-stage search process significantly enhances accuracy, improving MRR by 18.5% by limiting the search space.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Large-scale BCSD study on 60k firmware reports big MRR lifts from build-aware queries and TPL-aware search, but thin details on corpus construction make the gains hard to generalize.

The main thing here is that the authors scale BCSD evaluation to 60,000 real firmware images from 200 vendors, measure how four factors (vulnerable function versions, search space, function sizes, toolchains) affect performance, and claim two practical fixes: build-aware queries that raise MRR from 0.818 to 0.981, and a TPL-aware two-stage search that improves MRR by 18.5%. That scale and the concrete numbers are the useful part.

They do a service by moving past the toy datasets that prior BCSD papers get criticized for. Testing on actual vendor firmware and showing that those four factors move the needle by wide margins gives practitioners something to work with. The proposed strategies are straightforward and tied directly to the measured problems.

The soft spot is exactly what the stress-test note flags. The abstract gives no numbers on vendor distribution, TPL coverage, architecture spread, or how vulnerabilities were labeled in the 60k set. Without that, it is difficult to know whether the MRR gains reflect general BCSD behavior or just the particular makeup of their corpus. No information on baseline implementations, controls, or statistical tests appears either, so the quantitative claims sit on unverified ground.

This is for people doing applied work on firmware vulnerability detection who need real-world scale numbers rather than another model paper. A reader who wants to see how build and TPL factors play out at volume will get value, but anyone planning to rely on the exact improvement figures will want the methods section expanded first.

Send it to peer review. The scale is rare enough that referees should see the full experimental setup and data description.

Referee Report

2 major / 0 minor

Summary. The manuscript presents a large-scale empirical study of Binary Code Similarity Detection (BCSD) for vulnerability detection in firmware. It evaluates BCSD performance across 60,000 firmware images from 200 vendors, examining the effects of four factors (vulnerable function versions, search space, function sizes, and compilation toolchains). The authors propose a build-aware query strategy that raises MRR from 0.818 to 0.981 and a TPL-aware two-stage search that improves MRR by 18.5%.

Significance. If the dataset is representative, the work provides useful empirical insights into real-world BCSD limitations and practical mitigation strategies, addressing the diversity shortcomings of prior smaller-scale studies. The scale of the corpus is a clear strength.

major comments (2)

[Abstract] The motivation criticizes prior studies for insufficient diversity in vulnerabilities, TPLs, and firmware, yet supplies no quantitative evidence (vendor distribution histograms, architecture coverage, TPL frequency counts, or labeling methodology) that the 60k corpus overcomes those limitations. This directly undermines the generalizability of the reported MRR gains.
[Abstract] Abstract and experimental description: no details are given on baseline BCSD implementations, statistical significance tests, or curation/labeling procedures for the 60k dataset. These omissions make it impossible to assess whether the 0.818→0.981 and +18.5% improvements are robust or artifactual.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for improving clarity and transparency in the abstract and experimental sections. We address each point below and will revise the manuscript to incorporate additional details where feasible.

Referee: [Abstract] The motivation criticizes prior studies for insufficient diversity in vulnerabilities, TPLs, and firmware, yet supplies no quantitative evidence (vendor distribution histograms, architecture coverage, TPL frequency counts, or labeling methodology) that the 60k corpus overcomes those limitations. This directly undermines the generalizability of the reported MRR gains.

Authors: We agree that the abstract, due to length constraints, does not include quantitative summaries of dataset diversity. The full manuscript (Section 3) contains vendor distribution details across 200 vendors, architecture coverage (e.g., ARM, x86, MIPS), TPL frequency counts, and labeling methodology based on CVE matching and binary analysis. To strengthen the motivation and generalizability claims, we will revise the abstract to include concise quantitative evidence, such as the number of unique TPLs and architectures represented.

Revision: yes

Referee: [Abstract] Abstract and experimental description: no details are given on baseline BCSD implementations, statistical significance tests, or curation/labeling procedures for the 60k dataset. These omissions make it impossible to assess whether the 0.818→0.981 and +18.5% improvements are robust or artifactual.

Authors: The experimental section describes the BCSD tools and dataset construction at a high level, but we acknowledge that explicit details on baseline implementations (e.g., specific versions of tools like BinDiff or Asm2Vec), statistical significance testing for the MRR improvements, and expanded curation/labeling procedures (e.g., exact CVE-to-binary mapping steps) are not sufficiently elaborated. We will revise the experimental description to add these elements, including any applicable significance tests, to allow better assessment of robustness.

Revision: yes
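The significance testing discussed in this exchange could take several forms. One common option, sketched here as an assumption rather than anything the paper reports, is a paired bootstrap over queries: each query is evaluated under both strategies, and the per-query difference in reciprocal rank is resampled to get a confidence interval for the MRR change. The rank lists are invented toy data.

```python
import random
from typing import Sequence, Tuple


def paired_bootstrap_ci(
    before: Sequence[int],
    after: Sequence[int],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Percentile bootstrap CI for the change in MRR between two strategies.

    before[i] and after[i] are the 1-based ranks of the true match for the
    same query i under the baseline and the new strategy.
    Returns (observed_difference, ci_lower, ci_upper).
    """
    if len(before) != len(after) or not before:
        raise ValueError("rank lists must be non-empty and of equal length")
    diffs = [1.0 / a - 1.0 / b for b, a in zip(before, after)]
    n = len(diffs)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(diffs) / n, lower, upper


# Toy ranks for eight queries (hypothetical, not from the paper).
baseline_ranks = [1, 2, 3, 1, 5, 2, 1, 4]
build_aware_ranks = [1, 1, 1, 1, 2, 1, 1, 1]
observed, ci_low, ci_high = paired_bootstrap_ci(baseline_ranks, build_aware_ranks)
```

If the interval excludes zero, the improvement is unlikely to be a resampling artifact of the query set. It says nothing about whether the firmware corpus itself is representative, which is the separate concern raised in the first comment.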

Circularity Check

0 steps flagged

No circularity: purely empirical measurements on external corpus

The paper reports an empirical large-scale study measuring BCSD performance factors (vulnerable function versions, search space, function sizes, toolchains) across 60k firmware images and then measures MRR gains from two proposed strategies (build-aware queries, TPL-aware two-stage search). These are direct experimental outcomes on held-out or representative binaries, not derivations, fitted parameters renamed as predictions, or self-citation chains. No equations, ansatzes, or uniqueness theorems appear; the MRR numbers (0.818→0.981, +18.5%) are observed deltas, not forced by construction. The representativeness concern raised by the skeptic is a validity/generalizability issue, not a circularity reduction. The work is self-contained against its own corpus benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper is an empirical study that relies on standard domain assumptions about BCSD effectiveness without introducing new free parameters or invented entities.

axioms (1)

Domain assumption: Binary Code Similarity Detection (BCSD) is an effective technique for identifying vulnerabilities in firmware by comparing pairs of code segments.
Presented as established background in the opening of the abstract.


[1] [1]

Nguyen, Kandaraj Piamrat, Guido Marchetto, and Quoc-Viet Pham

Ons Aouedi, Thai-Hoc Vu, Alessio Sacco, Dinh C. Nguyen, Kandaraj Piamrat, Guido Marchetto, and Quoc-Viet Pham. 2024. A Survey on Intelligent Internet of Things: Applications, Security, Privacy, and Future Directions.IEEE Communications Surveys & Tutorials(2024). doi:10.1109/COMST.2024.3430368

work page doi:10.1109/comst.2024.3430368 2024

[2] [2]

BusyBox. 2025. BusyBox: The Swiss Army Knife of Embedded Linux. https://www.busybox.net/

2025

[3] [3]

Chen, Manuel Egele, Maverick Woo, and David Brumley

Daming D. Chen, Manuel Egele, Maverick Woo, and David Brumley. 2016. Towards Automated Dynamic Analysis for Linux-based Embedded Firmware. InProceedings of the 23rd Network and Distributed System Security Symposium , Vol. 1, No. 1, Article . Publication date: June 2026. Understanding Binary Code Similarity for Real-World Vulnerability Detection: A Large-S...

work page doi:10.14722/ndss.2016.23415 2016

[4] [4]

Andrei Costin, Jonas Zaddach, Aurélien Francillon, and Davide Balzarotti. 2014. A large-scale analysis of the security of embedded firmwares. InProceedings of the 23rd USENIX Conference on Security Symposium(San Diego, CA)(SEC’14). USENIX Association, USA, 95–110

2014

[5] [5]

Andrei Costin, Apostolis Zarras, and Aurélien Francillon. 2016. Automated Dynamic Firmware Analysis at Scale: A Case Study on Embedded Web Interfaces. InProceedings of the 11th ACM on Asia Conference on Computer and Communications Security(Xi’an, China)(ASIA CCS ’16). Association for Computing Machinery, New York, NY, USA, 437–448. doi:10.1145/2897845.2897900

work page doi:10.1145/2897845.2897900 2016

[6] [6]

curl. 2025. curl: Command line tool and library for transferring data with URLs. https://curl.se/

2025

[7] [7]

Yaniv David and Eran Yahav. 2014. Tracelet-based code search in executables. InProceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation(Edinburgh, United Kingdom)(PLDI ’14). Association for Computing Machinery, New York, NY, USA, 349–360. doi:10.1145/2594291.2594343

work page doi:10.1145/2594291.2594343 2014

[8]

Steven H. H. Ding, Benjamin C. M. Fung, and Philippe Charland. 2019. Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization. In 2019 IEEE Symposium on Security and Privacy (SP). 472–489. doi:10.1109/SP.2019.00003

[9]

Sebastian Eschweiler, Khaled Yakdan, and Elmar Gerhards-Padilla. 2016. discovRE: Efficient Cross-Architecture Identification of Bugs in Binary Code. In Network and Distributed System Security Symposium. doi:10.14722/ndss.2016.23185

[10]

Bo Feng, Alejandro Mera, and Long Lu. 2020. P2IM: scalable and hardware-independent firmware testing via automatic peripheral interface modeling. In Proceedings of the 29th USENIX Conference on Security Symposium (SEC’20). USENIX Association, USA, Article 70, 18 pages. https://www.usenix.org/conference/usenixsecurity20/presentation/feng

[11]

Qian Feng, Rundong Zhou, Chengcheng Xu, Yao Cheng, Brian Testa, and Heng Yin. 2016. Scalable Graph-based Bug Search for Firmware Images. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (Vienna, Austria) (CCS ’16). Association for Computing Machinery, New York, NY, USA, 480–491. doi:10.1145/2976749.2978370

[12]

Fraunhofer SIT. 2019. FACT – Firmware Analysis and Comparison Tool: Documentation and Comparison Capabilities. https://fact-firmware-analysis.readthedocs.io/. Accessed 2025-09-12

[13]

FreeType. 2025. FreeType: A Free, High-Quality and Portable Font Engine. https://freetype.org/

[14]

Jian Gao, Xin Yang, Ying Fu, Yu Jiang, and Jiaguang Sun. 2018. VulSeeker: a semantic learning based vulnerability seeker for cross-platform binary. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (Montpellier, France) (ASE ’18). Association for Computing Machinery, New York, NY, USA, 896–899. doi:10.1145/3238147.3240480

[15]

GNU Project. 2025. GNU Binutils. https://www.gnu.org/software/binutils/

[16]

Google. 2011. BinDiff. https://www.zynamics.com/bindiff.html

[17]

Irfan Ul Haq and Juan Caballero. 2021. A Survey of Binary Code Similarity. ACM Comput. Surv. 54, 3, Article 51 (April 2021), 38 pages. doi:10.1145/3446371

[18]

Haojie He, Xingwei Lin, Ziang Weng, Ruijie Zhao, Shuitao Gan, Libo Chen, Yuede Ji, Jiashui Wang, and Zhi Xue. 2024. Code is not Natural Language: Unlock the Power of Semantics-Oriented Graph Representation for Binary Code Similarity Detection. In Proceedings of the 33rd USENIX Security Symposium (USENIX Security 24). USENIX Association, Philadelphia, PA, 1759–1776. https://www.usenix.org/conference/usenixsecurity24/presentation/he-haojie

[20]

Craig Heffner and ReFirm Labs. 2025. Binwalk: Firmware Analysis Tool. https://github.com/ReFirmLabs/binwalk

[21]

Grant Hernandez, Dave Jing Tian, Tuba Yavuz, Caroline Trippel, Kevin Butler, et al. 2022. FIRMWIRE: Transparent Dynamic Analysis for Cellular Baseband Firmware. In Network and Distributed System Security Symposium (NDSS). https://www.ndss-symposium.org/wp-content/uploads/2022-136-paper.pdf

[22]

IBM. 2020. A new botnet attack just mozied into town. https://www.ibm.com/think/x-force/botnet-attack-mozi-mozied-into-town

[23]

IBM. 2024. Firmware vs. software: What’s the difference and why it matters. https://www.ibm.com/think/insights/firmware-vs-software

[24]

Lichen Jia, Chenggang Wu, Peihua Zhang, and Zhe Wang. 2024. CodeExtract: Enhancing Binary Code Similarity Detection with Code Extraction Techniques. In Proceedings of the 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (Copenhagen, Denmark) (LCTES 2024). Association for Computing Machinery, New York, N... doi:10.1145/3652032.3657572

[25]

Dongkwan Kim, Eunsoo Kim, Sang Kil Cha, Sooel Son, and Yongdae Kim. 2023. Revisiting Binary Code Similarity Analysis Using Interpretable Feature Engineering and Lessons Learned. IEEE Transactions on Software Engineering 49, 4 (2023), 1661–1682. doi:10.1109/TSE.2022.3187689

[26]

Wenqiang Li, Jiameng Shi, Fengjun Li, Jingqiang Lin, Wei Wang, and Le Guan. 2022. μAFL: Non-intrusive Feedback-driven Fuzzing for Microcontroller Firmware. In 2022 IEEE/ACM 44th International Conference on Software Engineering ... doi:10.1145/3510003.3510208

[27]

Xuezixiang Li, Yu Qu, and Heng Yin. 2021. PalmTree: Learning an Assembly Language Model for Instruction Embedding. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (CCS ’21). Association for Computing Machinery, New York, NY, USA, 3236–3251. doi:10.1145/3460120.3484587

[28]

Yujia Li, Chenjie Gu, Thomas Dullien, Oriol Vinyals, and Pushmeet Kohli. 2019. Graph Matching Networks for Learning the Similarity of Graph Structured Objects. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 97), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.). PMLR, 3835–3845. http...

[29]

libexpat. 2025. Expat XML Parser Library. https://libexpat.github.io/

[30]

libpng. 2025. libpng: The PNG Reference Library. http://www.libpng.org/pub/png/libpng.html

[31]

LibTIFF. 2025. LibTIFF: TIFF Library and Utilities. http://www.simplesystems.org/libtiff/

[32]

Zhenhao Luo, Pengfei Wang, Baosheng Wang, Yong Tang, Wei Xie, Xu Zhou, Danjun Liu, and Kai Lu. 2023. VulHawk: Cross-architecture Vulnerability Detection with Entropy-based Binary Code Search. In 30th Annual Network and Distributed System Security Symposium, NDSS 2023, San Diego, California, USA, February 27 - March 3, 2023. The Internet Society. doi:10.14722/ndss.2023.24415

[33]

Luca Massarelli, Giuseppe Antonio Di Luna, Fabio Petroni, Roberto Baldoni, and Leonardo Querzoni. 2019. SAFE: Self-Attentive Function Embeddings for Binary Similarity. In Detection of Intrusions and Malware, and Vulnerability Assessment - 16th International Conference, DIMVA 2019, Gothenburg, Sweden, June 19-20, 2019, Proceedings (Lecture Notes in Compute... doi:10.1007/978-3-030-22038-9_15

[34]

Marius Muench, Jan Stijohann, Frank Kargl, Aurélien Francillon, and Davide Balzarotti. 2018. What You Corrupt Is Not What You Crash: Challenges in Fuzzing Embedded Devices. In Network and Distributed System Security Symposium (NDSS). doi:10.14722/ndss.2018.23166

[35]

National Institute of Standards and Technology. 2014. CVE-2014-0160. https://nvd.nist.gov/vuln/detail/cve-2014-0160

[36]

National Institute of Standards and Technology. 2025. National Vulnerability Database (NVD). https://nvd.nist.gov/

[37]

OpenSSL. 2025. OpenSSL: Cryptography and SSL/TLS Toolkit. https://www.openssl.org/

[38]

Kexin Pei, Zhou Xuan, Junfeng Yang, Suman Jana, and Baishakhi Ray. 2020. Trex: Learning Execution Semantics from Micro-Traces for Binary Similarity. arXiv preprint arXiv:2012.08680 (2020). doi:10.48550/arXiv.2012.08680

[39]

Nilo Redini, Aravind Machiry, Ruoyu Wang, Chad Spensky, Andrea Continella, Yan Shoshitaishvili, Christopher Kruegel, and Giovanni Vigna. 2020. Karonte: Detecting Insecure Multi-binary Interactions in Embedded Firmware. In 2020 IEEE Symposium on Security and Privacy (SP). 1544–1561. doi:10.1109/SP40000.2020.00036

[40]

Liting Ruan, Qizhen Xu, Shunzhi Zhu, Xujing Huang, and Xinyang Lin. 2024. A Survey of Binary Code Similarity Detection Techniques. Electronics 13, 9 (2024). doi:10.3390/electronics13091715

[41]

Tobias Scharnowski, Nils Bars, Moritz Schloegel, Eric Gustafson, Marius Muench, Giovanni Vigna, Christopher Kruegel, Thorsten Holz, and Ali Abbasi. 2022. Fuzzware: Using Precise MMIO Modeling for Effective Firmware Fuzzing. In 31st USENIX Security Symposium (USENIX Security 22). USENIX Association, Boston, MA, 1239–1256. https://www.usenix.org/conference/u...

[42]

Statista Research Department. 2024. Internet of Things (IoT) connected devices installed base worldwide from 2019 to 2030. Technical Report. Statista. Available at: https://www.statista.com/statistics/1183457/iot-connected-devices-worldwide/

[43]

Hao Wang, Zeyu Gao, Chao Zhang, Mingyang Sun, Yuchen Zhou, Han Qiu, and Xi Xiao. 2024. CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2024) (Vienna, Austria) (ISSTA 2024). Association for Computing Machinery, New York, N... doi:10.1145/3650212.3652117

[44]

Hongru Wang, Chunfang Li, Lingfei Zhang, and Minyong Shi. 2018. Anti-Crawler strategy and distributed crawler based on Hadoop. In 2018 IEEE 3rd International Conference on Big Data Analysis (ICBDA). IEEE, 227–231. doi:10.1109/ICBDA.2018.8367682

[45]

Hao Wang, Wenjie Qu, Gilad Katz, Wenyu Zhu, Zeyu Gao, Han Qiu, Jianwei Zhuge, and Chao Zhang. 2022. jTrans: jump-aware transformer for binary code similarity detection. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis (Virtual, South Korea) (ISSTA 2022). Association for Computing Machinery, New York, NY, USA, 1–... doi:10.1145/3533767.3534367

[46]

Haohuang Wen, Zhiqiang Lin, and Yinqian Zhang. 2020. FirmXRay: Detecting Bluetooth Link Layer Vulnerabilities From Bare-Metal Firmware. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (CCS). ACM, 167–180. doi:10.1145/3372297.3423344

[47]

Yuhao Wu, Jinwen Wang, Yujie Wang, Shixuan Zhai, Zihan Li, Yi He, Kun Sun, Qi Li, and Ning Zhang. 2024. Your Firmware Has Arrived: A Study of Firmware Update Vulnerabilities. In 33rd USENIX Security Symposium (USENIX Security 24). USENIX Association, Philadelphia, PA, 5627–5644. https://www.usenix.org/conference/usenixsecurity24/presentation/wu-yuhao

[48]

Xiaojun Xu, Chang Liu, Qian Feng, Heng Yin, Le Song, and Dawn Song. 2017. Neural Network-Based Graph Embedding for Cross-Platform Binary Code Similarity Detection. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (Dallas, Texas, USA) (CCS ’17). Association for Computing Machinery, New York, NY, USA, 363–376. doi:10.1145/3133956.3134018

[49]

Shouguo Yang, Long Cheng, Yicheng Zeng, Zhe Lang, Hongsong Zhu, and Zhiqiang Shi. 2021. Asteria: Deep Learning-based AST-Encoding for Cross-platform Binary Code Similarity Detection. In 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2021, Taipei, Taiwan, June 21-24, 2021. IEEE, 224–236. doi:10.1109/DSN48987.2021.00036

[50]

Shouguo Yang, Chaopeng Dong, Yang Xiao, Yiran Cheng, Zhiqiang Shi, Zhi Li, and Limin Sun. 2023. Asteria-Pro: Enhancing Deep Learning-based Binary Code Similarity Detection by Incorporating Domain Knowledge. ACM Trans. Softw. Eng. Methodol. 33, 1, Article 1 (Nov. 2023), 40 pages. doi:10.1145/3604611

[51]

Jonas Zaddach, Luca Bruno, Aurélien Francillon, and Davide Balzarotti. 2014. AVATAR: A Framework to Support Dynamic Security Analysis of Embedded Systems’ Firmwares. In 21st Annual Network and Distributed System Security Symposium, NDSS 2014, San Diego, California, USA, February 23-26, 2014. The Internet Society. https://doi.org/10.14722/ndss.2014.23229

[52]

Binbin Zhao, Shouling Ji, Jiacheng Xu, Yuan Tian, Qiuyang Wei, Qinying Wang, Chenyang Lyu, Xuhong Zhang, Changting Lin, JingZheng Wu, and Raheem Beyah. 2022. A large-scale empirical analysis of the vulnerabilities introduced by third-party components in IoT firmware. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Ana... doi:10.1145/3533767.3534366

[53]

Yaowen Zheng, Ali Davanian, Heng Yin, Chengyu Song, Hongsong Zhu, and Limin Sun. 2019. FIRM-AFL: High-Throughput Greybox Fuzzing of IoT Firmware via Augmented Process Emulation. In 28th USENIX Security Symposium (USENIX Security 19). Santa Clara, CA, 1099–1114. https://www.usenix.org/conference/usenixsecurity19/presentation/zheng

[54]

zlib. 2025. zlib: A Massively Spiffy Yet Delicately Unobtrusive Compression Library. https://zlib.net/