arxiv: 1907.01374 · v1 · pith:FXARTHGPnew · submitted 2019-07-01 · 💻 cs.CR

A Semantics-Based Hybrid Approach on Binary Code Similarity Comparison

Pith reviewed 2026-05-25 12:11 UTC · model grok-4.3

classification 💻 cs.CR
keywords binary code similarity · semantic signatures · hybrid analysis · cross-architecture · function comparison · emulation · obfuscation

The pith

A hybrid semantics-based method uses execution of a reference function and emulation of targets with migrated runtime information to compare binary code similarity across architectures and obfuscations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method for determining if two binary functions are similar even after changes from different compilers, processor architectures, or code obfuscation. It works by running the reference function on test cases to capture runtime behavior, then emulating the execution of potential matching functions while feeding them the same runtime data migrated from the reference. Semantic signatures are pulled from these runs and emulations, and a score is computed to gauge likeness. This approach aims to overcome the limitations of purely static or dynamic methods that cannot balance accuracy with coverage in cross-platform scenarios common in security analysis and software engineering.

Core claim

The authors claim that executing a reference function and then emulating target functions guided by its runtime information allows extraction of semantic signatures that support accurate similarity comparison for binary functions, even when the functions have undergone transformations due to varying compilation settings, architectures, and obfuscation techniques.

What carries the argument

The hybrid execution-emulation process with runtime information migration to generate semantic signatures for similarity scoring.
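
The flow described above can be sketched in a few lines. This is a toy model under stated assumptions, not BinMatch itself: binary functions are stood in for by Python callables, "runtime information" is reduced to the argument list, and the names `run_reference`, `emulate_target`, and `overlap` are hypothetical. The set-overlap score is only a placeholder for whatever metric the paper defines.

```python
from typing import Callable, Dict, List, Tuple

Feature = Tuple[str, int]  # (feature kind, observed value)
ToyFunc = Callable[[List[int], List[Feature]], int]


def ref_sum(args: List[int], sig: List[Feature]) -> int:
    """Reference function: sums its inputs front to back."""
    total = 0
    for x in args:
        sig.append(("read", x))   # value read from the input buffer
        total += x
    sig.append(("out", total))    # value handed back to the caller
    return total


def sum_reversed(args: List[int], sig: List[Feature]) -> int:
    """Same semantics, different code shape: walks the buffer backwards."""
    total = 0
    for x in reversed(args):
        sig.append(("read", x))
        total += x
    sig.append(("out", total))
    return total


def product(args: List[int], sig: List[Feature]) -> int:
    """Different semantics: multiplies instead of adding."""
    total = 1
    for x in args:
        sig.append(("read", x))
        total *= x
    sig.append(("out", total))
    return total


def run_reference(func: ToyFunc, test_input: List[int]):
    """Execute the reference once; keep its runtime info and its signature."""
    sig: List[Feature] = []
    func(test_input, sig)
    runtime_info = {"args": list(test_input)}  # assumption: only arguments migrate
    return runtime_info, sig


def emulate_target(func: ToyFunc, runtime_info: Dict[str, List[int]]) -> List[Feature]:
    """'Emulate' a target by feeding it the migrated runtime info."""
    sig: List[Feature] = []
    func(runtime_info["args"], sig)
    return sig


def overlap(a: List[Feature], b: List[Feature]) -> float:
    """Placeholder score: Jaccard overlap of the two feature sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


info, ref_sig = run_reference(ref_sum, [2, 3, 4])
targets = {"sum_reversed": sum_reversed, "product": product}
scores = {name: overlap(ref_sig, emulate_target(f, info)) for name, f in targets.items()}
print(scores)
```

In this toy setting the reordered sum matches the reference fully while the product does not, which is the behavior the core claim relies on. Real targets would also need migrated memory and register state, which the sketch leaves out.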

If this is right

  • Supports detection of similar code in programs ported to multiple architectures like ARM and MIPS.
  • Enhances applications such as plagiarism detection and bug detection in binary programs.
  • Achieves high accuracy without sacrificing coverage in large-scale function comparisons.
  • Resists common obfuscation methods by relying on semantic rather than syntactic features.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Such a method could help identify reused vulnerable code in IoT device firmware from different vendors.
  • It opens the possibility of applying the technique to dynamic analysis of malware variants.
  • The reliance on test cases for the reference execution suggests potential for automated test generation to improve coverage.

Load-bearing premise

That the migrated runtime information from the reference function's execution is sufficient to guide emulation of target functions while preserving semantic equivalence for reliable signature comparison despite various transformations.

What would settle it

Observing either of two outcomes: semantically equivalent functions, compiled for different architectures and obfuscated, whose similarity scores fall below the match threshold; or non-equivalent functions that receive high scores.
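
A minimal sketch of that check, assuming labeled function pairs and a single match threshold. The `ScoredPair` type, the pair names, the scores, and the threshold of 0.5 are all made up for illustration; none of them come from the paper.

```python
from typing import List, NamedTuple, Tuple


class ScoredPair(NamedTuple):
    name: str
    score: float       # similarity score reported by the tool
    equivalent: bool   # ground truth: are the two functions semantically equal?


def counterexamples(
    pairs: List[ScoredPair], threshold: float
) -> Tuple[List[ScoredPair], List[ScoredPair]]:
    """Return (missed equivalents, spurious matches) at the given threshold."""
    missed = [p for p in pairs if p.equivalent and p.score < threshold]
    spurious = [p for p in pairs if not p.equivalent and p.score >= threshold]
    return missed, spurious


# Hypothetical data, for illustration only.
sample = [
    ScoredPair("f@x86 vs f@arm", 0.91, True),
    ScoredPair("f@x86 vs f@mips+obf", 0.32, True),
    ScoredPair("f@x86 vs g@arm", 0.12, False),
    ScoredPair("f@x86 vs h@arm", 0.77, False),
]
missed, spurious = counterexamples(sample, threshold=0.5)
print([p.name for p in missed], [p.name for p in spurious])
```

Either list being non-empty on a sufficiently large, carefully labeled corpus would count against the claim; how many such cases are tolerable is a judgment the sketch does not make.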

Figures

Figures reproduced from arXiv: 1907.01374 by Bodong Li, Dawu Gu, Hui Wang, Yikun Hu, Yuanyuan Zhang.

Figure 1. Control Flow Graphs of png_set_unknown_chunks
Figure 2. System Architecture of BINMATCH
    Adjacent text: … the signatures of binary functions and measures their similarity. 3.1 Semantic Signature. The semantics describes the processes a computer follows when executing a program. It could be shown by describing the relationship between the input and output of the program [23]. Thus, given a specific input, we focus on two points to reveal the semantics of a binary function: i) wh…
Figure 3. Calling Stack of cdecl
    Adjacent text: … comparison operations, or calls a standard library function, BINMATCH injects code before it to capture corresponding features, then generates the signature of R (Line 4-9). Line 11-18 present the code for recording runtime information of R’s execution. Similar functions should behave similarly if they are executed with the same input [7], [15], [18], [25]. Therefore, BINMATCH recor…
Figure 6. Indirect Jump of a Switch on ARM
    Adjacent listing (x86):
        mov eax, [ebp+arg_0] ; load the first argument
        mov eax, [eax]
        ...
        call eax ; indirect call
Figure 8. Indirect Call Affected by the Control Flow on x86
Figure 10. Accumulative Function Ratio versus Pruning
Figure 11. Accuracy of Cross-optimization Comparison
Figure 12. Accuracy of Cross-compiler Comparison
    Adjacent text: … redundant entries to the semantic signature with comparison operand values (§3.1). In contrast, SSE directly operates on a specific register set (i.e., XMM registers) and has no extra operations. Besides, x87 could handle single precision, double precision, and even 80-bit double-extended precision floating-point calculation, while SSE mainly processes single-precis…
Original abstract

Binary code similarity comparison is a methodology for identifying similar or identical code fragments in binary programs. It is indispensable in software engineering and security, where it has many important applications (e.g., plagiarism detection, bug detection). With the widespread adoption of smart and IoT (Internet of Things) devices, an increasing number of programs are ported to multiple architectures (e.g., ARM, MIPS), so it becomes necessary to detect similar binary code across architectures as well. The main challenge lies in the semantics-equivalent code transformations resulting from different compilation settings, code obfuscation, and varied instruction set architectures. Another challenge is the trade-off between comparison accuracy and coverage. Unfortunately, existing methods still rely heavily on semantics-less code features, which are susceptible to such transformations. Additionally, they perform the comparison either purely statically or purely dynamically, which cannot achieve high accuracy and coverage simultaneously. In this paper, we propose a semantics-based hybrid method to compare binary function similarity. We execute the reference function with test cases, then emulate the execution of every target function with the runtime information migrated from the reference function. Semantic signatures are extracted during the execution as well as the emulation. Lastly, similarity scores are calculated from the signatures to measure the likeness of functions. We have implemented the method in a prototype system named BinMatch and evaluated it with nine real-world projects compiled with different compilation settings, on various architectures, and with commonly used obfuscation methods, performing over 100 million function-pair comparisons in total.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes BinMatch, a semantics-based hybrid method for binary function similarity comparison. It executes a reference function on test cases, migrates runtime state (inputs, registers, memory) to emulate each target function, extracts semantic signatures during both execution and emulation, and computes similarity scores. The approach is evaluated on nine real-world projects across architectures, compilation settings, and obfuscation methods, involving over 100 million function-pair comparisons, with the central claim that it simultaneously achieves high accuracy and coverage unlike pure static or dynamic baselines.

Significance. If the migration-based semantic signatures prove robust, the work would address a key limitation in binary similarity by balancing accuracy and coverage for cross-architecture and obfuscated code, with direct relevance to plagiarism detection, vulnerability search, and IoT security. The scale of the evaluation on real projects is a positive aspect.

major comments (3)
  1. [method description] Emulation and signature extraction: the runtime migration step is presented without a concrete mechanism or fallback for cases where target control flow diverges from the reference (e.g., due to control-flow flattening or virtualization), which is load-bearing for the claim that extracted signatures remain semantically comparable.
  2. [Evaluation section] Obfuscation experiments: results aggregate accuracy across obfuscation methods but do not isolate or report the subset of pairs where migration-induced path divergence occurs, leaving the hybrid advantage over static/dynamic methods unverified under the paper's own threat model.
  3. [Signature comparison procedure] No formal definition or equation is given for how semantic signatures are compared to yield the final similarity score, making it impossible to assess whether the metric is invariant to the architecture/register differences introduced by migration.
minor comments (2)
  1. [Abstract] The abstract states 'over 100 million pairs' but the evaluation should explicitly state whether pairs are deduplicated and how many unique functions are involved.
  2. [method description] Notation for migrated state components (registers vs. memory) is used inconsistently between the method description and evaluation tables.
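
The breakdown requested in major comment 2 could be reported along the following lines. This is a sketch under assumptions: the `PairResult` record, the obfuscation labels, and the sample rows are hypothetical, and it presumes the tool can flag when a target's emulated path diverged from the reference.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class PairResult:
    obfuscation: str   # label of the obfuscation applied to the target
    diverged: bool     # did emulation leave the reference's path?
    correct: bool      # was the pair classified correctly?


def accuracy_breakdown(
    results: Iterable[PairResult],
) -> Dict[Tuple[str, bool], Tuple[int, float]]:
    """Map (obfuscation, diverged) to (number of pairs, accuracy)."""
    groups: Dict[Tuple[str, bool], List[bool]] = defaultdict(list)
    for r in results:
        groups[(r.obfuscation, r.diverged)].append(r.correct)
    return {key: (len(v), sum(v) / len(v)) for key, v in groups.items()}


# Hypothetical rows, for illustration only.
rows = [
    PairResult("flattening", True, False),
    PairResult("flattening", True, True),
    PairResult("substitution", False, True),
    PairResult("substitution", False, True),
]
table = accuracy_breakdown(rows)
for (obf, diverged), (n, acc) in sorted(table.items()):
    print(f"{obf:13s} diverged={diverged!s:5s} pairs={n} accuracy={acc:.2f}")
```

Reporting the pair count next to each accuracy keeps small divergence subsets from being over-read.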

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and indicate planned revisions where appropriate.

Point-by-point responses
  1. Referee: the runtime migration step is presented without a concrete mechanism or fallback for cases where target control flow diverges from the reference (e.g., due to control-flow flattening or virtualization), which is load-bearing for the claim that extracted signatures remain semantically comparable.

    Authors: The manuscript describes migration as copying initial runtime state (registers, memory, inputs) from reference execution to initialize target emulation, after which the emulator follows the target's native control flow. Signatures capture architecture-independent behaviors such as memory access patterns. We agree the description lacks sufficient concreteness and will revise the method section to include a detailed mechanism, pseudocode for migration, and handling of divergence via continued partial emulation. revision: yes

  2. Referee: results aggregate accuracy across obfuscation methods but do not isolate or report the subset of pairs where migration-induced path divergence occurs, leaving the hybrid advantage over static/dynamic methods unverified under the paper's own threat model.

    Authors: The evaluation presents aggregate accuracy on obfuscated binaries to demonstrate overall robustness. We concur that isolating divergence cases would better substantiate the hybrid benefit. We will add a breakdown in the evaluation section reporting the fraction of pairs exhibiting migration-induced divergence and accuracy on that subset. revision: yes

  3. Referee: no formal definition or equation is given for how semantic signatures are compared to yield the final similarity score, making it impossible to assess whether the metric is invariant to the architecture/register differences introduced by migration.

    Authors: Similarity is computed via a normalized distance metric on signatures after abstracting registers and memory to a common model. We will insert a formal definition and equation in the signature comparison subsection to specify the metric and prove its invariance to migration-induced architectural differences. revision: yes
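
To make the third response concrete, here is one possible normalized score over two signature sequences. It is an assumption, not the paper's stated definition: this page does not say which metric BinMatch uses, and the sketch combines a longest common subsequence with a Jaccard-style ratio, sim(A, B) = |LCS(A, B)| / (|A| + |B| - |LCS(A, B)|). The function names are hypothetical.

```python
from typing import Hashable, Sequence


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence, using O(len(b)) memory."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def signature_similarity(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Jaccard-style ratio with the LCS playing the role of the intersection."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty signatures are identical
    common = lcs_length(a, b)
    return common / (len(a) + len(b) - common)


print(signature_similarity([1, 2, 3, 4], [2, 4, 5]))
```

Because the score depends only on the recorded feature values and their order, it is insensitive to register names or instruction encodings, provided the signatures were already abstracted to a common model as the response says. Whether that abstraction holds across architectures is exactly what the referee asks the authors to show.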

Circularity Check

0 steps flagged

No circularity; method is self-contained with external evaluation

Full rationale

The paper describes executing a reference function on test cases, migrating runtime state to emulate target functions, extracting semantic signatures during both, and computing similarity scores. No equations, fitted parameters, self-citations, or ansatzes are shown that reduce the claimed accuracy or signatures to quantities defined by construction from the authors' inputs or prior work. The evaluation on nine real-world projects with varied compilation, architectures, and obfuscation provides independent external benchmarks rather than internal redefinition. This is the normal case of a non-circular empirical method description.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are stated or implied at the level of detail needed to populate the ledger.

pith-pipeline@v0.9.0 · 5811 in / 1129 out tokens · 23563 ms · 2026-05-25T12:11:28.248124+00:00 · methodology


Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages

  1. [1]

    Binmatch: A semantics-based hybrid approach on binary code clone analysis,

    Y. Hu, Y. Zhang, J. Li, H. Wang, B. Li, and D. Gu, “Binmatch: A semantics-based hybrid approach on binary code clone analysis,” in Proceedings of the 34th International Conference on Software Maintenance and Evolution, ser. ICSME’18. IEEE, 2018.

  2. [2]

    Value-based program characterization and its application to software plagiarism detection,

    Y.-C. Jhi, X. Wang, X. Jia, S. Zhu, P. Liu, and D. Wu, “Value-based program characterization and its application to software plagiarism detection,” in Proceedings of the 33rd International Conference on Software Engineering, ser. ICSE’11. ACM, 2011

  3. [3]

    A first step towards algorithm plagiarism detection,

    F. Zhang, Y.-C. Jhi, D. Wu, P. Liu, and S. Zhu, “A first step towards algorithm plagiarism detection,” in Proceedings of the 2012 International Symposium on Software Testing and Analysis, ser. ISSTA’12. ACM, 2012

  4. [4]

    Program logic based software plagiarism detection,

    F. Zhang, D. Wu, P. Liu, and S. Zhu, “Program logic based software plagiarism detection,” in Proceedings of the 25th IEEE International Symposium on Software Reliability Engineering, ser. ISSRE’14, 2014

  5. [5]

    Rendezvous: A search engine for binary code,

    W. M. Khoo, A. Mycroft, and R. Anderson, “Rendezvous: A search engine for binary code,” in Proceedings of the 10th Working Conference on Mining Software Repositories, ser. MSR’13. IEEE Press, 2013

  6. [6]

    Tracelet-based code search in executables,

    Y. David and E. Yahav, “Tracelet-based code search in executables,” in Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI’14. ACM, 2014

  7. [7]

    Bingo: Cross-architecture cross-os binary search,

    M. Chandramohan, Y. Xue, Z. Xu, Y. Liu, C. Y. Cho, and H. B. K. Tan, “Bingo: Cross-architecture cross-os binary search,” in Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE’16. ACM, 2016

  8. [8]

    Cross-architecture binary semantics understanding via similar code comparison,

    Y. Hu, Y. Zhang, J. Li, and D. Gu, “Cross-architecture binary semantics understanding via similar code comparison,” in Proceedings of the 23rd International Conference on Software Analysis, Evolution, and Reengineering, ser. SANER’16. IEEE, 2016

  9. [9]

    The software similarity problem in malware analysis,

    W. Andrew and L. Arun, “The software similarity problem in malware analysis,” in Duplication, Redundancy, and Similarity in Software, ser. IBFI’07. Internationales Begegnungs- und Forschungszentrum für Informatik, 2007

  10. [10]

    Lines of malicious code: Insights into the malicious software industry,

    M. Lindorfer, A. Di Federico, F. Maggi, P. M. Comparetti, and S. Zanero, “Lines of malicious code: Insights into the malicious software industry,” in Proceedings of the 28th Annual Computer Security Applications Conference, ser. ACSAC’12. ACM, 2012

  11. [11]

    Binsim: Trace-based semantic binary diffing via system call sliced segment equivalence checking,

    J. Ming, D. Xu, Y. Jiang, and D. Wu, “Binsim: Trace-based semantic binary diffing via system call sliced segment equivalence checking,” in Proceedings of the 26th USENIX Security Symposium, ser. SEC’17. USENIX Association, 2017

  12. [12]

    Automatic patch-based exploit generation is possible: Techniques and implications,

    D. Brumley, P. Poosankam, D. Song, and J. Zheng, “Automatic patch-based exploit generation is possible: Techniques and implications,” in 2008 IEEE Symposium on Security and Privacy, ser. SP’08. IEEE, 2008

  13. [13]

    Precise and accurate patch presence test for binaries,

    H. Zhang and Z. Qian, “Precise and accurate patch presence test for binaries,” in 27th USENIX Security Symposium, ser. SEC’18

  14. [14]

    Leveraging semantic signatures for bug search in binary programs,

    J. Pewny, F. Schuster, L. Bernhard, T. Holz, and C. Rossow, “Leveraging semantic signatures for bug search in binary programs,” in Proceedings of the 30th Annual Computer Security Applications Conference, ser. ACSAC’14. ACM, 2014

  15. [15]

    Cross-architecture bug search in binary executables,

    J. Pewny, B. Garmany, R. Gawlik, C. Rossow, and T. Holz, “Cross-architecture bug search in binary executables,” in 2015 IEEE Symposium on Security and Privacy, ser. SP’15. IEEE, 2015

  16. [16]

    discovre: Efficient cross-architecture identification of bugs in binary code,

    S. Eschweiler, K. Yakdan, and E. Gerhards-Padilla, “discovre: Efficient cross-architecture identification of bugs in binary code,” in The Network and Distributed System Security Symposium, ser. NDSS’16

  17. [17]

    Scalable graph-based bug search for firmware images,

    Q. Feng, R. Zhou, C. Xu, Y. Cheng, B. Testa, and H. Yin, “Scalable graph-based bug search for firmware images,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS’16. ACM, 2016

  18. [18]

    Blanket execution: dynamic similarity testing for program binaries and components,

    M. Egele, M. Woo, P. Chapman, and D. Brumley, “Blanket execution: dynamic similarity testing for program binaries and components,” in Proceedings of the 23rd USENIX Security Symposium, ser. SEC’14. USENIX Association, 2014

  19. [19]

    Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection,

    L. Luo, J. Ming, D. Wu, P. Liu, and S. Zhu, “Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection,” in Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE 2014. ACM, 2014

  20. [20]

    In-memory fuzzing for binary code similarity analysis,

    S. Wang and D. Wu, “In-memory fuzzing for binary code similarity analysis,” in Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ser. ASE’17. IEEE, 2017

  21. [21]

    Kam1n0: Mapreduce-based assembly clone search for reverse engineering,

    S. H. Ding, B. Fung, and P. Charland, “Kam1n0: Mapreduce-based assembly clone search for reverse engineering,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD’16. ACM, 2016

  22. [22]

    Asm2vec: Boosting static representation robustness for binary clone search against code obfuscation and compiler optimization,

    S. H. Ding, B. C. Fung, and P. Charland, “Asm2vec: Boosting static representation robustness for binary clone search against code obfuscation and compiler optimization,” in 2019 IEEE Symposium on Security and Privacy, ser. SP’19. IEEE, 2019

  23. [23]

    Practical Foundations for Programming Languages, 2nd ed.

    R. Harper, Practical Foundations for Programming Languages, 2nd ed. New York, NY, USA: Cambridge University Press, 2016

  24. [24]

    Structural comparison of executable objects,

    H. Flake, “Structural comparison of executable objects,” Proceedings of the 1st International Conference on Detection of Intrusions and Malware and Vulnerability Assessment, 2004

  25. [25]

    Binary code clone detection across architectures and compiling configurations,

    Y. Hu, Y. Zhang, J. Li, and D. Gu, “Binary code clone detection across architectures and compiling configurations,” in Proceedings of the 25th International Conference on Program Comprehension, ser. ICPC’17. IEEE, 2017

  26. [26]

    Turning programs against each other: high coverage fuzz-testing using binary-code mutation and dynamic slicing,

    U. Kargén and N. Shahmehri, “Turning programs against each other: high coverage fuzz-testing using binary-code mutation and dynamic slicing,” in Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ser. FSE’15. ACM, 2015

  27. [27]

    Behavior based software theft detection,

    X. Wang, Y.-C. Jhi, S. Zhu, and P. Liu, “Behavior based software theft detection,” in Proceedings of the 16th ACM Conference on Computer and Communications Security, ser. CCS’09. ACM, 2009

  28. [28]

    Can i clone this piece of code here?

    X. Wang, Y. Dang, L. Zhang, D. Zhang, E. Lan, and H. Mei, “Can i clone this piece of code here?” in Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, ser. ASE’12. ACM, 2012

  29. [29]

    Similarity measures in scientometric research: The jaccard index versus salton’s cosine formula

    L. Hamers et al., “Similarity measures in scientometric research: The jaccard index versus salton’s cosine formula.” Information Processing and Management, vol. 25, no. 3, pp. 315–318, 1989

  30. [30]

    A survey of longest common subsequence algorithms,

    L. Bergroth, H. Hakonen, and T. Raita, “A survey of longest common subsequence algorithms,” in Proceedings of the 7th International Symposium on String Processing and Information Retrieval, ser. SPIRE’00. IEEE, 2000

  31. [31]

    Error Detecting and Error Correcting Codes,

    R. Hamming, “Error Detecting and Error Correcting Codes,” Bell System Technical Journal, vol. 29, pp. 147–160, 1950

  32. [32]

    Similarity estimation techniques from rounding algorithms,

    M. S. Charikar, “Similarity estimation techniques from rounding algorithms,” in Proceedings of the 34th Annual ACM Symposium on Theory of Computing, ser. STOC ’02. ACM, 2002

  33. [34]

    Binary code is not easy,

    X. Meng and B. P. Miller, “Binary code is not easy,” in Proceedings of the 25th International Symposium on Software Testing and Analysis, ser. ISSTA’16. ACM, 2016

  34. [35]

    Valgrind: a framework for heavyweight dynamic binary instrumentation,

    N. Nethercote and J. Seward, “Valgrind: a framework for heavyweight dynamic binary instrumentation,” in Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI’07. ACM, 2007

  35. [36]

    Sok:(state of) the art of war: Offensive techniques in binary analysis,

    Y. Shoshitaishvili, R. Wang, C. Salls, N. Stephens, M. Polino, A. Dutcher, J. Grosen, S. Feng, C. Hauser, C. Kruegel et al., “Sok:(state of) the art of war: Offensive techniques in binary analysis,” in 2016 IEEE Symposium on Security and Privacy, ser. SP’16. IEEE, 2016

  36. [37]

    Profile-guided automatic inline expansion for c programs,

    P. P. Chang, S. A. Mahlke, W. Y. Chen, and W.-M. W. Hwu, “Profile-guided automatic inline expansion for c programs,” Software: Practice and Experience, vol. 22, no. 5, pp. 349–369, 1992

  37. [38]

    Code obfuscation literature survey,

    A. Balakrishnan and C. Schulze, “Code obfuscation literature survey,” CS701 Construction of compilers, vol. 19, 2005

  38. [39]

    Obfuscator-LLVM – software protection for the masses,

    P. Junod, J. Rinaldini, J. Wehrli, and J. Michielin, “Obfuscator-LLVM – software protection for the masses,” in Proceedings of the IEEE/ACM 1st International Workshop on Software Protection, ser. SPRO’15. IEEE, 2015

  39. [40]

    Distributed representations of sentences and documents,

    Q. Le and T. Mikolov, “Distributed representations of sentences and documents,” in Proceedings of the 31st International Conference on International Conference on Machine Learning, ser. ICML’14. JMLR.org, 2014

  40. [41]

    Obfuscating c++ programs via control flow flattening,

    T. László and Á. Kiss, “Obfuscating c++ programs via control flow flattening,” Annales Universitatis Scientarum Budapestinensis de Rolando Eötvös Nominatae, Sectio Computatorica, vol. 30, pp. 3–19, 2009

  41. [42]

    Qemu, a fast and portable dynamic translator,

    F. Bellard, “Qemu, a fast and portable dynamic translator,” in Proceedings of the Annual Conference on USENIX Annual Technical Conference, ser. ATEC’05. Berkeley, CA, USA: USENIX Association, 2005

  42. [43]

    BYTEWEIGHT: Learning to recognize functions in binary code,

    T. Bao, J. Burket, M. Woo, R. Turner, and D. Brumley, “BYTEWEIGHT: Learning to recognize functions in binary code,” in Proceedings of the 23rd USENIX Security Symposium, ser. SEC’14. San Diego, CA: USENIX Association, 2014.

  43. [44]

    Recognizing functions in binaries with neural networks,

    E. C. R. Shin, D. Song, and R. Moazzezi, “Recognizing functions in binaries with neural networks,” in Proceedings of the 24th USENIX Conference on Security Symposium, ser. SEC’15, 2015

  44. [45]

    An in-depth analysis of disassembly on full-scale x86/x64 binaries,

    D. Andriesse, X. Chen, V. Van Der Veen, A. Slowinska, and H. Bos, “An in-depth analysis of disassembly on full-scale x86/x64 binaries,” in Proceedings of the 25th USENIX Conference on Security Symposium, ser. SEC’16. USENIX Association, 2016

  45. [46]

    Testing intermediate representations for binary analysis,

    S. Kim, M. Faerevaag, M. Jung, S. Jung, D. Oh, J. Lee, and S. K. Cha, “Testing intermediate representations for binary analysis,” in Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ser. ASE’17. IEEE Press, 2017

  46. [47]

    Automating patching of vulnerable open-source software versions in application binaries,

    R. Duan, A. Bijlani, Y. Ji, O. Alrawi, Y. Xiong, M. Ike, B. Saltaformaggio, and W. Lee, “Automating patching of vulnerable open-source software versions in application binaries,” in The Network and Distributed System Security Symposium, ser. NDSS’19, 2019

  47. [48]

    Deobfuscation: Reverse engineering obfuscated code,

    S. K. Udupa, S. K. Debray, and M. Madou, “Deobfuscation: Reverse engineering obfuscated code,” in Proceedings of the 12th Working Conference on Reverse Engineering, ser. WCRE’05. IEEE, 2005

  48. [49]

    Symbolic execution of obfuscated code,

    B. Yadegari and S. Debray, “Symbolic execution of obfuscated code,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, ser. CCS’15. ACM, 2015

  49. [50]

    A generic approach to automatic deobfuscation of executable code,

    B. Yadegari, B. Johannesmeyer, B. Whitely, and S. Debray, “A generic approach to automatic deobfuscation of executable code,” in 2015 IEEE Symposium on Security and Privacy, ser. SP’15. IEEE, 2015

  50. [51]

    Vmhunt: A verifiable approach to partially-virtualized binary code simplification,

    D. Xu, J. Ming, Y. Fu, and D. Wu, “Vmhunt: A verifiable approach to partially-virtualized binary code simplification,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS’18. ACM, 2018

  51. [52]

    Vuzzer: Application-aware evolutionary fuzzing,

    S. Rawat, V. Jain, A. Kumar, L. Cojocar, C. Giuffrida, and H. Bos, “Vuzzer: Application-aware evolutionary fuzzing,” in The Network and Distributed System Security Symposium, ser. NDSS’17, 2017

  52. [53]

    Semantics-based obfuscation-resilient binary code similarity comparison with applications to software and algorithm plagiarism detection,

    L. Luo, J. Ming, D. Wu, P. Liu, and S. Zhu, “Semantics-based obfuscation-resilient binary code similarity comparison with applications to software and algorithm plagiarism detection,” IEEE Transactions on Software Engineering, vol. 43, no. 12, pp. 1157–1177, 2017

  53. [54]

    Detecting code clones in binary executables,

    A. Sæbjørnsen, J. Willcock, T. Panas, D. Quinlan, and Z. Su, “Detecting code clones in binary executables,” in Proceedings of the Eighteenth International Symposium on Software Testing and Analysis, ser. ISSTA’09. ACM, 2009

  54. [55]

    Finding software license violations through binary code clone detection,

    A. Hemel, K. T. Kalleberg, R. Vermaas, and E. Dolstra, “Finding software license violations through binary code clone detection,” in Proceedings of the 8th Working Conference on Mining Software Repositories, ser. MSR’11. ACM, 2011

  55. [56]

    Statistical similarity of binaries,

    Y. David, N. Partush, and E. Yahav, “Statistical similarity of binaries,” in Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI’16. ACM, 2016

  56. [57]

    Binsequence: fast, accurate and scalable binary code reuse detection,

    H. Huang, A. M. Youssef, and M. Debbabi, “Binsequence: fast, accurate and scalable binary code reuse detection,” in Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ser. AsiaCCS’17. ACM, 2017

  57. [58]

    Extracting conditional formulas for cross-platform bug search,

    Q. Feng, M. Wang, M. Zhang, R. Zhou, A. Henderson, and H. Yin, “Extracting conditional formulas for cross-platform bug search,” in Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ser. AsiaCCS’17. ACM, 2017

  58. [59]

    Similarity of binaries through re-optimization,

    Y. David, N. Partush, and E. Yahav, “Similarity of binaries through re-optimization,” in Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI’17, 2017

  59. [60]

    Firmup: Precise static detection of common vulnerabilities in firmware,

    Y. David, N. Partush, and E. Yahav, “Firmup: Precise static detection of common vulnerabilities in firmware,” in Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS’18. ACM, 2018

  60. [61]

    Binarm: Scalable and efficient detection of vulnerabilities in firmware images of intelligent electronic devices,

    P. Shirani, L. Collard, B. L. Agba, B. Lebel, M. Debbabi, L. Wang, and A. Hanna, “Binarm: Scalable and efficient detection of vulnerabilities in firmware images of intelligent electronic devices,” in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, ser. DIMVA’18. Springer, 2018

  61. [62]

    Neural network-based graph embedding for cross-platform binary code similarity detection,

    X. Xu, C. Liu, Q. Feng, H. Yin, L. Song, and D. Song, “Neural network-based graph embedding for cross-platform binary code similarity detection,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS’17. New York, NY, USA: ACM, 2017

  62. [63]

    αdiff: Cross-version binary code similarity detection with dnn,

    B. Liu, W. Huo, C. Zhang, W. Li, F. Li, A. Piao, and W. Zou, “αdiff: Cross-version binary code similarity detection with dnn,” in Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ser. ASE’18. New York, NY, USA: ACM, 2018

  63. [64]

    Neural machine translation inspired binary code similarity comparison beyond function pairs,

    F. Zuo, X. Li, Z. Zhang, P. Young, L. Luo, and Q. Zeng, “Neural machine translation inspired binary code similarity comparison beyond function pairs,” in The Network and Distributed System Security Symposium, ser. NDSS’19, 2019

  64. [65]

    Efficient estimation of word representations in vector space,

    T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” in Proceedings of the International Conference on Learning Representations, 2013