A Semantics-Based Hybrid Approach on Binary Code Similarity Comparison

Bodong Li; Dawu Gu; Hui Wang; Yikun Hu; Yuanyuan Zhang

arxiv: 1907.01374 · v1 · pith:FXARTHGPnew · submitted 2019-07-01 · 💻 cs.CR


Yikun Hu, Hui Wang, Yuanyuan Zhang, Bodong Li, Dawu Gu

Pith reviewed 2026-05-25 12:11 UTC · model grok-4.3

classification 💻 cs.CR

keywords binary code similarity · semantic signatures · hybrid analysis · cross-architecture · function comparison · emulation · obfuscation


The pith

A hybrid semantics-based method uses execution of a reference function and emulation of targets with migrated runtime information to compare binary code similarity across architectures and obfuscations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method for determining whether two binary functions are similar even after changes introduced by different compilers, processor architectures, or code obfuscation. It works by running the reference function on test cases to capture runtime behavior, then emulating the execution of candidate target functions while feeding them the same runtime data migrated from the reference. Semantic signatures are extracted from both the execution and the emulations, and a similarity score is computed from them. The approach aims to overcome a limitation of purely static or purely dynamic methods, which cannot achieve high accuracy and high coverage at the same time in the cross-platform scenarios common in security analysis and software engineering.
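To make the workflow above concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not BinMatch's implementation: the reference and target are plain Python functions that report their own features through a hypothetical `Recorder` hook (standing in for injected instrumentation), "emulation" is simply calling the target on the inputs recorded from the reference run, and the signature is a set of written values and library-call names scored with a Jaccard index. All names here are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Feature = Tuple[str, object]


@dataclass
class Recorder:
    """Hypothetical instrumentation hook that collects semantic features."""
    features: Set[Feature] = field(default_factory=set)

    def value(self, v) -> None:
        self.features.add(("val", v))      # a value written during execution

    def libcall(self, name: str) -> None:
        self.features.add(("lib", name))   # a standard-library call, by name


def run_reference(func: Callable, test_cases: List[list]):
    """Execute the reference on each test case. Return its signature and the
    runtime information to migrate (in this toy, only the concrete inputs)."""
    rec = Recorder()
    migrated = []
    for case in test_cases:
        func(rec, case)
        migrated.append(list(case))        # copy, so targets cannot alias it
    return rec.features, migrated


def emulate_target(func: Callable, migrated: List[list]) -> Set[Feature]:
    """Stand-in for emulation: run the target on the migrated inputs."""
    rec = Recorder()
    for case in migrated:
        func(rec, case)
    return rec.features


def jaccard(a: Set[Feature], b: Set[Feature]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def ref_sum_squares(rec, xs):
    acc = 0
    for x in xs:
        acc = acc + x * x
        rec.value(acc)
    rec.libcall("printf")
    return acc


def tgt_sum_squares(rec, xs):
    # Same semantics, different "code generation": while loop and ** operator.
    acc, i = 0, 0
    while i < len(xs):
        acc += xs[i] ** 2
        rec.value(acc)
        i += 1
    rec.libcall("printf")
    return acc


def tgt_product(rec, xs):
    # Unrelated function: product of the inputs, no library call.
    acc = 1
    for x in xs:
        acc *= x
        rec.value(acc)
    return acc


sig_ref, runtime = run_reference(ref_sum_squares, [[1, 2, 3], [2, 3]])
print(jaccard(sig_ref, emulate_target(tgt_sum_squares, runtime)))  # 1.0
print(jaccard(sig_ref, emulate_target(tgt_product, runtime)))      # 0.125
```

Because both targets receive exactly the reference's inputs, any difference in their signatures comes from behavior rather than from input choice, which is the intuition behind migrating runtime information.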

Core claim

The authors claim that executing a reference function and then emulating target functions guided by its runtime information allows extraction of semantic signatures that support accurate similarity comparison for binary functions, even when the functions have undergone transformations due to varying compilation settings, architectures, and obfuscation techniques.

What carries the argument

The hybrid execution-emulation process with runtime information migration to generate semantic signatures for similarity scoring.
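As one concrete picture of what "runtime information migration" involves across architectures, the sketch below places a reference function's recorded integer arguments where a target's calling convention expects them. It assumes standard 32-bit ABIs: x86 cdecl passes every argument on the stack, with the first at [esp+4] on entry because [esp] holds the return address; ARM AAPCS uses r0 to r3 for the first four arguments and the stack from [sp] onward for the rest; MIPS o32 uses a0 to a3 and reserves 16 bytes of home space, so the fifth argument sits at sp+16. The function name and the dictionary "state" format are hypothetical, and pointer arguments would additionally require migrating the memory they point to, which this sketch omits.

```python
from typing import Dict, List

WORD = 4  # all three conventions below are 32-bit

# Integer argument registers, in order (standard ABIs assumed).
ARG_REGS = {
    "x86_cdecl": [],                         # everything goes on the stack
    "arm_aapcs": ["r0", "r1", "r2", "r3"],
    "mips_o32": ["a0", "a1", "a2", "a3"],
}

# Offset from the stack pointer at function entry to the first stacked argument.
FIRST_STACK_ARG = {
    "x86_cdecl": WORD,  # [esp] holds the return address
    "arm_aapcs": 0,     # return address is in lr, not on the stack
    "mips_o32": 16,     # o32 reserves 16 bytes of home space for a0..a3
}


def migrate_args(args: List[int], convention: str) -> Dict[str, int]:
    """Map recorded argument values to register names or 'sp+N' stack slots
    (for x86, 'sp' means esp). Values are truncated to 32 bits."""
    regs = ARG_REGS[convention]
    state: Dict[str, int] = {}
    for i, value in enumerate(args):
        if i < len(regs):
            state[regs[i]] = value & 0xFFFFFFFF
        else:
            slot = FIRST_STACK_ARG[convention] + (i - len(regs)) * WORD
            state[f"sp+{slot}"] = value & 0xFFFFFFFF
    return state


print(migrate_args([10, 20], "x86_cdecl"))
# {'sp+4': 10, 'sp+8': 20}
print(migrate_args([1, 2, 3, 4, 5], "mips_o32"))
# {'a0': 1, 'a1': 2, 'a2': 3, 'a3': 4, 'sp+16': 5}
```

The same recorded values end up in different places on each architecture, which is why migration has to be convention-aware rather than a raw copy of the reference's register file.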

If this is right

Supports detection of similar code in programs ported to multiple architectures like ARM and MIPS.
Enhances applications such as plagiarism detection and bug detection in binary programs.
Achieves high accuracy without sacrificing coverage in large-scale function comparisons.
Resists common obfuscation methods by relying on semantic rather than syntactic features.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Such a method could help identify reused vulnerable code in IoT device firmware from different vendors.
It opens the possibility of applying the technique to dynamic analysis of malware variants.
The reliance on test cases for the reference execution suggests potential for automated test generation to improve coverage.

Load-bearing premise

That the migrated runtime information from the reference function's execution is sufficient to guide emulation of target functions while preserving semantic equivalence for reliable signature comparison despite various transformations.

What would settle it

Observing either that, for a set of semantically equivalent functions compiled for different architectures with obfuscation, the similarity scores fall below a threshold, or that non-equivalent functions receive high scores.
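The criterion above can be written as a check over scored pairs. This is a minimal sketch assuming each pair carries a ground-truth label (semantically equivalent or not) and a similarity score in [0, 1]; the threshold and the example scores are hypothetical placeholders, since no threshold is given here.

```python
from typing import Iterable, Tuple


def falsification_counts(pairs: Iterable[Tuple[bool, float]], threshold: float):
    """Count the two failure modes named above.

    pairs: (is_equivalent, score) for each compared function pair.
    Returns (missed, false_alarms): equivalent pairs scored below the
    threshold, and non-equivalent pairs scored at or above it.
    """
    missed = false_alarms = 0
    for is_equivalent, score in pairs:
        if is_equivalent and score < threshold:
            missed += 1
        elif not is_equivalent and score >= threshold:
            false_alarms += 1
    return missed, false_alarms


# Hypothetical scores, for illustration only.
example = [(True, 0.92), (True, 0.41), (False, 0.15), (False, 0.77)]
print(falsification_counts(example, threshold=0.6))  # (1, 1)
```

A systematic rise in either count on the cross-architecture, obfuscated subset would be the evidence that undercuts the core claim.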

Figures

Figures reproduced from arXiv: 1907.01374 by Bodong Li, Dawu Gu, Hui Wang, Yikun Hu, Yuanyuan Zhang.

**Figure 2.** System Architecture of BinMatch

**Figure 3.** Calling Stack of cdecl

**Figure 6.** Indirect Jump of a Switch on ARM

**Figure 8.** Indirect Call Affected by the Control Flow on x86

**Figure 10.** Accumulative Function Ratio versus Pruning

**Figure 11.** Accuracy of Cross-optimization Comparison

**Figure 12.** Accuracy of Cross-compiler Comparison

Original abstract

Binary code similarity comparison is a methodology for identifying similar or identical code fragments in binary programs. It is indispensable in the fields of software engineering and security, with many important applications (e.g., plagiarism detection, bug detection). With the widespread use of smart and IoT (Internet of Things) devices, an increasing number of programs are ported to multiple architectures (e.g., ARM, MIPS), so it becomes necessary to detect similar binary code across architectures as well. The main challenge of this topic lies in the semantics-equivalent code transformations resulting from different compilation settings, code obfuscation, and varied instruction set architectures. Another challenge is the trade-off between comparison accuracy and coverage. Unfortunately, existing methods still rely heavily on semantics-less code features, which are susceptible to code transformation. Additionally, they perform the comparison either only statically or only dynamically, which cannot achieve high accuracy and coverage simultaneously. In this paper, we propose a semantics-based hybrid method to compare binary function similarity. We execute the reference function with test cases, then emulate the execution of every target function with the runtime information migrated from the reference function. Semantic signatures are extracted during the execution as well as the emulation. Lastly, similarity scores are calculated from the signatures to measure the likeness of functions. We have implemented the method in a prototype system named BinMatch and evaluated it with nine real-world projects compiled with different compilation settings, on various architectures, and with commonly used obfuscation methods, performing over 100 million function-pair comparisons in total.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

BinMatch's hybrid of reference execution plus state-migrated emulation for semantic signatures is a reasonable practical idea, but the migration step is likely to break under the obfuscation and cross-architecture cases the paper claims to handle.


The paper's core contribution is a hybrid method: run the reference function on test cases, migrate runtime state (inputs, registers, memory) to drive emulation of each target function, collect semantic signatures from both, and score similarity from those signatures. They built BinMatch and ran it on nine real projects across compilers, architectures, and common obfuscations for more than 100 million function pairs. That evaluation volume is the strongest part of the work and gives it some credibility for tool-building purposes. The migration step is presented as the way to get both accuracy and coverage at once, which is the stated goal. The scale and the concrete prototype are what make this worth noticing. The soft spot is the migration assumption itself. When obfuscation alters control flow or architectures change register conventions, the migrated state will often not reach the corresponding basic blocks in the target. The signatures then become incomparable, and the hybrid advantage disappears. The evaluation includes obfuscated binaries but does not isolate or measure cases where path divergence occurs, so the accuracy claim rests on an untested premise. The abstract also gives no precise definition of the semantic signatures or the scoring function, which makes it hard to judge how much this actually improves on prior dynamic or static techniques. Readers working on binary analysis tools for security or software engineering would get the most from the implementation details and the large test set. The work is coherent enough on its own terms to deserve referee time rather than a desk reject.

Referee Report

3 major / 2 minor

Summary. The paper proposes BinMatch, a semantics-based hybrid method for binary function similarity comparison. It executes a reference function on test cases, migrates runtime state (inputs, registers, memory) to emulate each target function, extracts semantic signatures during both execution and emulation, and computes similarity scores. The approach is evaluated on nine real-world projects across architectures, compilation settings, and obfuscation methods, involving over 100 million function-pair comparisons, with the central claim that it simultaneously achieves high accuracy and coverage, unlike purely static or purely dynamic baselines.

Significance. If the migration-based semantic signatures prove robust, the work would address a key limitation in binary similarity by balancing accuracy and coverage for cross-architecture and obfuscated code, with direct relevance to plagiarism detection, vulnerability search, and IoT security. The scale of the evaluation on real projects is a positive aspect.

major comments (3)

[method description] Section on emulation and signature extraction: the runtime migration step is presented without a concrete mechanism or fallback for cases where target control flow diverges from the reference (e.g., due to control-flow flattening or virtualization). This step is load-bearing for the claim that the extracted signatures remain semantically comparable.
[Evaluation section] Obfuscation experiments: results aggregate accuracy across obfuscation methods but do not isolate or report the subset of pairs where migration-induced path divergence occurs, leaving the hybrid advantage over static/dynamic methods unverified under the paper's own threat model.
[Signature comparison procedure] No formal definition or equation is given for how semantic signatures are compared to yield the final similarity score, making it impossible to assess whether the metric is invariant to the architecture/register differences introduced by migration.
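The scoring function is unspecified in the material reviewed here. The paper's reference list cites the Jaccard index [29], longest common subsequence algorithms [30], Hamming distance [31], and SimHash [32], which hints at the kind of measures involved, but the sketch below is only one plausible reading, not the paper's definition: a Jaccard score for unordered features (e.g., library-call names), an LCS ratio for ordered value sequences, and a hypothetical weight `w` that combines them.

```python
from typing import Hashable, Sequence, Set


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def lcs_length(x: Sequence, y: Sequence) -> int:
    """Classic O(len(x) * len(y)) dynamic program, keeping one row at a time."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        cur = [0]
        for j, yj in enumerate(y, start=1):
            cur.append(prev[j - 1] + 1 if xi == yj else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_ratio(x: Sequence, y: Sequence) -> float:
    """LCS length normalized by the longer sequence, so the result is in [0, 1]."""
    longest = max(len(x), len(y))
    return lcs_length(x, y) / longest if longest else 1.0


def similarity(ref_values, tgt_values, ref_calls, tgt_calls, w=0.5) -> float:
    """Hypothetical combined score: w * sequence part + (1 - w) * set part."""
    return w * lcs_ratio(ref_values, tgt_values) + (1 - w) * jaccard(ref_calls, tgt_calls)


print(similarity([1, 5, 14, 30], [1, 14, 30], {"malloc", "printf"}, {"printf"}))  # 0.625
```

Because both measures see only values and call names, not register names or addresses, they are unaffected by which registers each architecture uses, which bears on the referee's invariance question; they remain sensitive to other differences, such as integer width. Normalizing the LCS by the longer sequence keeps the score symmetric; normalizing by the reference length instead would favor targets that contain the reference's behavior plus extra work.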

minor comments (2)

[Abstract] The abstract states 'over 100 million pairs' but the evaluation should explicitly state whether pairs are deduplicated and how many unique functions are involved.
[method description] Notation for migrated state components (registers vs. memory) is used inconsistently between the method description and evaluation tables.
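On the pair-count question, the arithmetic below shows why the counting convention matters when reading a total like "over 100 million pairs". The set sizes are hypothetical illustrations, not the paper's numbers: R is a set of reference functions, T a set of target functions, and n the size of a single pool compared against itself.

```latex
\begin{aligned}
N_{\text{cross}} &= |R|\,|T|, \qquad N_{\text{ordered}} = n(n-1), \qquad N_{\text{unordered}} = \binom{n}{2} = \frac{n(n-1)}{2},\\
|R| = |T| = 10^{4} &\;\Rightarrow\; N_{\text{cross}} = 10^{8},\\
\binom{14142}{2} = 99{,}991{,}011 &< 10^{8} \le \binom{14143}{2} = 100{,}005{,}153.
\end{aligned}
```

Counting ordered rather than unordered pairs doubles the total for the same n, so the same headline figure can correspond to quite different numbers of unique functions.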

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and indicate planned revisions where appropriate.


Referee: the runtime migration step is presented without a concrete mechanism or fallback for cases where target control flow diverges from the reference (e.g., due to control-flow flattening or virtualization), which is load-bearing for the claim that extracted signatures remain semantically comparable.

Authors: The manuscript describes migration as copying initial runtime state (registers, memory, inputs) from reference execution to initialize target emulation, after which the emulator follows the target's native control flow. Signatures capture architecture-independent behaviors such as memory access patterns. We agree the description lacks sufficient concreteness and will revise the method section to include a detailed mechanism, pseudocode for migration, and handling of divergence via continued partial emulation. revision: yes
Referee: results aggregate accuracy across obfuscation methods but do not isolate or report the subset of pairs where migration-induced path divergence occurs, leaving the hybrid advantage over static/dynamic methods unverified under the paper's own threat model.

Authors: The evaluation presents aggregate accuracy on obfuscated binaries to demonstrate overall robustness. We concur that isolating divergence cases would better substantiate the hybrid benefit. We will add a breakdown in the evaluation section reporting the fraction of pairs exhibiting migration-induced divergence and accuracy on that subset. revision: yes
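The promised breakdown could be computed along these lines. This is a sketch under assumptions: each evaluated pair is tagged with whether emulation of the target diverged from the reference's behavior (how divergence is detected is left open) and whether the similarity decision for that pair was correct; `PairResult` and its fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class PairResult:
    diverged: bool  # did emulation of the target diverge (criterion left open)?
    correct: bool   # was the similarity decision right for this pair?


def divergence_breakdown(results: Iterable[PairResult]) -> Dict[str, float]:
    """Fraction of diverged pairs, and accuracy on each subset."""
    all_pairs = list(results)
    div = [r for r in all_pairs if r.diverged]
    rest = [r for r in all_pairs if not r.diverged]

    def acc(rs):
        return sum(r.correct for r in rs) / len(rs) if rs else float("nan")

    return {
        "diverged_fraction": len(div) / len(all_pairs) if all_pairs else float("nan"),
        "accuracy_diverged": acc(div),
        "accuracy_non_diverged": acc(rest),
    }


hypothetical = [PairResult(True, False), PairResult(True, True),
                PairResult(False, True), PairResult(False, True)]
print(divergence_breakdown(hypothetical))
# {'diverged_fraction': 0.5, 'accuracy_diverged': 0.5, 'accuracy_non_diverged': 1.0}
```

Reporting the two accuracies side by side, rather than one aggregate, is what would show whether the hybrid advantage survives on the diverged subset.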
Referee: no formal definition or equation is given for how semantic signatures are compared to yield the final similarity score, making it impossible to assess whether the metric is invariant to the architecture/register differences introduced by migration.

Authors: Similarity is computed via a normalized distance metric on signatures after abstracting registers and memory to a common model. We will insert a formal definition and equation in the signature comparison subsection to specify the metric and prove its invariance to migration-induced architectural differences. revision: yes

Circularity Check

0 steps flagged

No circularity; method is self-contained with external evaluation


The paper describes executing a reference function on test cases, migrating runtime state to emulate target functions, extracting semantic signatures during both, and computing similarity scores. No equations, fitted parameters, self-citations, or ansatzes are shown that reduce the claimed accuracy or signatures to quantities defined by construction from the authors' inputs or prior work. The evaluation on nine real-world projects with varied compilation, architectures, and obfuscation provides independent external benchmarks rather than internal redefinition. This is the normal case of a non-circular empirical method description.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are stated or implied at the level of detail needed to populate the ledger.

pith-pipeline@v0.9.0 · 5811 in / 1129 out tokens · 23563 ms · 2026-05-25T12:11:28.248124+00:00 · methodology


Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages

[1] Y. Hu, Y. Zhang, J. Li, H. Wang, B. Li, and D. Gu, “Binmatch: A semantics-based hybrid approach on binary code clone analysis,” in Proceedings of the 34th International Conference on Software Maintenance and Evolution, ser. ICSME’18. IEEE, 2018

[2] Y.-C. Jhi, X. Wang, X. Jia, S. Zhu, P. Liu, and D. Wu, “Value-based program characterization and its application to software plagiarism detection,” in Proceedings of the 33rd International Conference on Software Engineering, ser. ICSE’11. ACM, 2011

[3] F. Zhang, Y.-C. Jhi, D. Wu, P. Liu, and S. Zhu, “A first step towards algorithm plagiarism detection,” in Proceedings of the 2012 International Symposium on Software Testing and Analysis, ser. ISSTA’12. ACM, 2012

[4] F. Zhang, D. Wu, P. Liu, and S. Zhu, “Program logic based software plagiarism detection,” in Proceedings of the 25th IEEE International Symposium on Software Reliability Engineering, ser. ISSRE’14, 2014

[5] W. M. Khoo, A. Mycroft, and R. Anderson, “Rendezvous: A search engine for binary code,” in Proceedings of the 10th Working Conference on Mining Software Repositories, ser. MSR’13. IEEE Press, 2013

[6] Y. David and E. Yahav, “Tracelet-based code search in executables,” in Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI’14. ACM, 2014

[7] M. Chandramohan, Y. Xue, Z. Xu, Y. Liu, C. Y. Cho, and H. B. K. Tan, “Bingo: Cross-architecture cross-os binary search,” in Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE’16. ACM, 2016

[8] Y. Hu, Y. Zhang, J. Li, and D. Gu, “Cross-architecture binary semantics understanding via similar code comparison,” in Proceedings of the 23rd International Conference on Software Analysis, Evolution, and Reengineering, ser. SANER’16. IEEE, 2016

[9] A. Walenstein and A. Lakhotia, “The software similarity problem in malware analysis,” in Duplication, Redundancy, and Similarity in Software, ser. IBFI’07. Internationales Begegnungs- und Forschungszentrum für Informatik, 2007

[10] M. Lindorfer, A. Di Federico, F. Maggi, P. M. Comparetti, and S. Zanero, “Lines of malicious code: Insights into the malicious software industry,” in Proceedings of the 28th Annual Computer Security Applications Conference, ser. ACSAC’12. ACM, 2012

[11] J. Ming, D. Xu, Y. Jiang, and D. Wu, “Binsim: Trace-based semantic binary diffing via system call sliced segment equivalence checking,” in Proceedings of the 26th USENIX Security Symposium, ser. SEC’17. USENIX Association, 2017

[12] D. Brumley, P. Poosankam, D. Song, and J. Zheng, “Automatic patch-based exploit generation is possible: Techniques and implications,” in 2008 IEEE Symposium on Security and Privacy, ser. SP’08. IEEE, 2008

[13] H. Zhang and Z. Qian, “Precise and accurate patch presence test for binaries,” in 27th USENIX Security Symposium, ser. SEC’18

[14] J. Pewny, F. Schuster, L. Bernhard, T. Holz, and C. Rossow, “Leveraging semantic signatures for bug search in binary programs,” in Proceedings of the 30th Annual Computer Security Applications Conference, ser. ACSAC’14. ACM, 2014

[15] J. Pewny, B. Garmany, R. Gawlik, C. Rossow, and T. Holz, “Cross-architecture bug search in binary executables,” in 2015 IEEE Symposium on Security and Privacy, ser. SP’15. IEEE, 2015

[16] S. Eschweiler, K. Yakdan, and E. Gerhards-Padilla, “discovre: Efficient cross-architecture identification of bugs in binary code,” in The Network and Distributed System Security Symposium, ser. NDSS’16

[17] Q. Feng, R. Zhou, C. Xu, Y. Cheng, B. Testa, and H. Yin, “Scalable graph-based bug search for firmware images,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS’16. ACM, 2016

[18] M. Egele, M. Woo, P. Chapman, and D. Brumley, “Blanket execution: dynamic similarity testing for program binaries and components,” in Proceedings of the 23rd USENIX Security Symposium, ser. SEC’14. USENIX Association, 2014

[19] L. Luo, J. Ming, D. Wu, P. Liu, and S. Zhu, “Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection,” in Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE 2014. ACM, 2014

[20] S. Wang and D. Wu, “In-memory fuzzing for binary code similarity analysis,” in Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ser. ASE’17. IEEE, 2017

[21] S. H. Ding, B. Fung, and P. Charland, “Kam1n0: Mapreduce-based assembly clone search for reverse engineering,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD’16. ACM, 2016

[22] S. H. Ding, B. C. Fung, and P. Charland, “Asm2vec: Boosting static representation robustness for binary clone search against code obfuscation and compiler optimization,” in 2019 IEEE Symposium on Security and Privacy, ser. SP’19. IEEE, 2019

[23] R. Harper, Practical Foundations for Programming Languages, 2nd ed. New York, NY, USA: Cambridge University Press, 2016

[24] H. Flake, “Structural comparison of executable objects,” Proceedings of the 1st International Conference on Detection of Intrusions and Malware and Vulnerability Assessment, 2004

[25] Y. Hu, Y. Zhang, J. Li, and D. Gu, “Binary code clone detection across architectures and compiling configurations,” in Proceedings of the 25th International Conference on Program Comprehension, ser. ICPC’17. IEEE, 2017

[26] U. Kargén and N. Shahmehri, “Turning programs against each other: high coverage fuzz-testing using binary-code mutation and dynamic slicing,” in Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ser. FSE’15. ACM, 2015

[27] X. Wang, Y.-C. Jhi, S. Zhu, and P. Liu, “Behavior based software theft detection,” in Proceedings of the 16th ACM Conference on Computer and Communications Security, ser. CCS’09. ACM, 2009

[28] X. Wang, Y. Dang, L. Zhang, D. Zhang, E. Lan, and H. Mei, “Can I clone this piece of code here?” in Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, ser. ASE’12. ACM, 2012

[29] L. Hamers et al., “Similarity measures in scientometric research: The Jaccard index versus Salton’s cosine formula,” Information Processing and Management, vol. 25, no. 3, pp. 315–318, 1989

[30] L. Bergroth, H. Hakonen, and T. Raita, “A survey of longest common subsequence algorithms,” in Proceedings of the 7th International Symposium on String Processing and Information Retrieval, ser. SPIRE’00. IEEE, 2000

[31] R. Hamming, “Error Detecting and Error Correcting Codes,” Bell System Technical Journal, vol. 29, pp. 147–160, 1950

[32] M. S. Charikar, “Similarity estimation techniques from rounding algorithms,” in Proceedings of the 34th Annual ACM Symposium on Theory of Computing, ser. STOC ’02. ACM, 2002

[34] X. Meng and B. P. Miller, “Binary code is not easy,” in Proceedings of the 25th International Symposium on Software Testing and Analysis, ser. ISSTA’16. ACM, 2016

[35] N. Nethercote and J. Seward, “Valgrind: a framework for heavyweight dynamic binary instrumentation,” in Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI’07. ACM, 2007

[36] Y. Shoshitaishvili, R. Wang, C. Salls, N. Stephens, M. Polino, A. Dutcher, J. Grosen, S. Feng, C. Hauser, C. Kruegel et al., “SoK: (State of) the art of war: Offensive techniques in binary analysis,” in 2016 IEEE Symposium on Security and Privacy, ser. SP’16. IEEE, 2016

[37] P. P. Chang, S. A. Mahlke, W. Y. Chen, and W.-M. W. Hwu, “Profile-guided automatic inline expansion for C programs,” Software: Practice and Experience, vol. 22, no. 5, pp. 349–369, 1992

[38] A. Balakrishnan and C. Schulze, “Code obfuscation literature survey,” CS701 Construction of compilers, vol. 19, 2005

[39] P. Junod, J. Rinaldini, J. Wehrli, and J. Michielin, “Obfuscator-LLVM – software protection for the masses,” in Proceedings of the IEEE/ACM 1st International Workshop on Software Protection, ser. SPRO’15. IEEE, 2015

[40] Q. Le and T. Mikolov, “Distributed representations of sentences and documents,” in Proceedings of the 31st International Conference on International Conference on Machine Learning, ser. ICML’14. JMLR.org, 2014

[41] T. László and Á. Kiss, “Obfuscating C++ programs via control flow flattening,” Annales Universitatis Scientarum Budapestinensis de Rolando Eötvös Nominatae, Sectio Computatorica, vol. 30, pp. 3–19, 2009

[42] F. Bellard, “QEMU, a fast and portable dynamic translator,” in Proceedings of the Annual Conference on USENIX Annual Technical Conference, ser. ATEC’05. Berkeley, CA, USA: USENIX Association, 2005

[43] T. Bao, J. Burket, M. Woo, R. Turner, and D. Brumley, “BYTEWEIGHT: Learning to recognize functions in binary code,” in Proceedings of the 23rd USENIX Security Symposium, ser. SEC’14. San Diego, CA: USENIX Association, 2014

[44] E. C. R. Shin, D. Song, and R. Moazzezi, “Recognizing functions in binaries with neural networks,” in Proceedings of the 24th USENIX Conference on Security Symposium, ser. SEC’15, 2015

[45] D. Andriesse, X. Chen, V. Van Der Veen, A. Slowinska, and H. Bos, “An in-depth analysis of disassembly on full-scale x86/x64 binaries,” in Proceedings of the 25th USENIX Conference on Security Symposium, ser. SEC’16. USENIX Association, 2016

[46] S. Kim, M. Faerevaag, M. Jung, S. Jung, D. Oh, J. Lee, and S. K. Cha, “Testing intermediate representations for binary analysis,” in Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ser. ASE’17. IEEE Press, 2017

[47] R. Duan, A. Bijlani, Y. Ji, O. Alrawi, Y. Xiong, M. Ike, B. Saltaformaggio, and W. Lee, “Automating patching of vulnerable open-source software versions in application binaries,” in The Network and Distributed System Security Symposium, ser. NDSS’19, 2019

[48] S. K. Udupa, S. K. Debray, and M. Madou, “Deobfuscation: Reverse engineering obfuscated code,” in Proceedings of the 12th Working Conference on Reverse Engineering, ser. WCRE’05. IEEE, 2005

[49] B. Yadegari and S. Debray, “Symbolic execution of obfuscated code,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, ser. CCS’15. ACM, 2015

[50] B. Yadegari, B. Johannesmeyer, B. Whitely, and S. Debray, “A generic approach to automatic deobfuscation of executable code,” in 2015 IEEE Symposium on Security and Privacy, ser. SP’15. IEEE, 2015

[51] D. Xu, J. Ming, Y. Fu, and D. Wu, “Vmhunt: A verifiable approach to partially-virtualized binary code simplification,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS’18. ACM, 2018

[52]

Vuzzer: Application-aware evolutionary fuzzing,

S. Rawat, V . Jain, A. Kumar, L. Cojocar, C. Giuffrida, and H. Bos, “Vuzzer: Application-aware evolutionary fuzzing,” in The Network and Distributed System Security Symposium, ser. NDSS’17, 2017

work page 2017
[53]

Semantics-based obfuscation-resilient binary code similarity comparison with ap- plications to software and algorithm plagiarism detection,

L. Luo, J. Ming, D. Wu, P . Liu, and S. Zhu, “Semantics-based obfuscation-resilient binary code similarity comparison with ap- plications to software and algorithm plagiarism detection,” IEEE Transactions on Software Engineering, vol. 43, no. 12, pp. 1157–1177, 2017

work page 2017
[54]

Detecting code clones in binary executables,

A. Sæbjørnsen, J. Willcock, T. Panas, D. Quinlan, and Z. Su, “Detecting code clones in binary executables,” in Proceedings of the Eighteenth International Symposium on Software Testing and Analysis, ser. ISSTA’09. ACM, 2009

work page 2009
[55]

Finding software license violations through binary code clone detection,

A. Hemel, K. T. Kalleberg, R. Vermaas, and E. Dolstra, “Finding software license violations through binary code clone detection,” in Proceedings of the 8th Working Conference on Mining Software Repositories, ser. MSR’11. ACM, 2011

work page 2011
[56]

Statistical similarity of binaries,

Y. David, N. Partush, and E. Yahav, “Statistical similarity of binaries,” in Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation , ser. PLDI’16. ACM, 2016

[57]

H. Huang, A. M. Youssef, and M. Debbabi, “BinSequence: Fast, accurate and scalable binary code reuse detection,” in Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ser. AsiaCCS’17. ACM, 2017

[58]

Q. Feng, M. Wang, M. Zhang, R. Zhou, A. Henderson, and H. Yin, “Extracting conditional formulas for cross-platform bug search,” in Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ser. AsiaCCS’17. ACM, 2017

[59]

Y. David, N. Partush, and E. Yahav, “Similarity of binaries through re-optimization,” in Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI’17, 2017

[60]

Y. David, N. Partush, and E. Yahav, “FirmUp: Precise static detection of common vulnerabilities in firmware,” in Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS’18. ACM, 2018

[61]

P. Shirani, L. Collard, B. L. Agba, B. Lebel, M. Debbabi, L. Wang, and A. Hanna, “BinARM: Scalable and efficient detection of vulnerabilities in firmware images of intelligent electronic devices,” in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, ser. DIMVA’18. Springer, 2018

[62]

X. Xu, C. Liu, Q. Feng, H. Yin, L. Song, and D. Song, “Neural network-based graph embedding for cross-platform binary code similarity detection,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS’17. New York, NY, USA: ACM, 2017

[63]

B. Liu, W. Huo, C. Zhang, W. Li, F. Li, A. Piao, and W. Zou, “αdiff: Cross-version binary code similarity detection with DNN,” in Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ser. ASE’18. New York, NY, USA: ACM, 2018

[64]

F. Zuo, X. Li, Z. Zhang, P. Young, L. Luo, and Q. Zeng, “Neural machine translation inspired binary code similarity comparison beyond function pairs,” in The Network and Distributed System Security Symposium, ser. NDSS’19, 2019

[65]

T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” in Proceedings of the International Conference on Learning Representations, 2013
