ASSEMBLAGE-DEEPHISTORY: A Cross-Build Binary Dataset with Temporal Coverage

Chang Liu; Edward Raff; James Holt; Kristopher Micinski; Nicolò Altamura; Noah Fleischmann

arxiv: 2605.21615 · v1 · pith:SXZ7DJHGnew · submitted 2026-05-20 · 💻 cs.CR · cs.LG · cs.SE

Pith reviewed 2026-05-22 09:36 UTC · model grok-4.3

keywords: binary dataset · vulnerability detection · cross-build analysis · software history · CVE labeling · machine learning security

The pith

ASSEMBLAGE-DEEPHISTORY provides a single database linking 73,610 binaries to their source code, compilation details, historical versions, and CVE-labeled vulnerable functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents a new dataset that brings together binaries compiled in many different ways, from different versions of software over time, and marked with known security issues. Existing collections usually miss at least one of these elements, making it hard to study how binaries change or how detection tools hold up across variations. A reader might care because this setup lets researchers test whether AI models truly understand binary vulnerabilities or just memorize patterns from specific builds. The dataset stores all the context as searchable metadata so one can query across builds and history easily. Analyses using large language models, embedding comparisons, and statistical regression illustrate how the structure supports practical work on binary similarity and vulnerability reasoning.

Core claim

The paper establishes ASSEMBLAGE-DEEPHISTORY as a consolidated dataset of 73,610 binaries from 248 open-source projects. These binaries come from GCC, Clang, and MSVC compilers at various optimization levels on Linux and Windows, including multi-year historical builds. Each entry connects to its source code, functions, debug information, other build variants, past versions, and functions known to be vulnerable.

What carries the argument

The queryable database structure that treats compilation context, source code, vulnerable functions, and package version as first-class metadata for every binary.
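
To make the idea of first-class metadata concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, the package `libdemo`, and the placeholder identifier `CVE-0000-0001` are all hypothetical; the actual schema of the released database is not described on this page and may differ. The sketch only shows how a single join could retrieve every build variant that carries a given CVE label.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the dataset's actual layout.
SCHEMA = """
CREATE TABLE binaries (
    id        INTEGER PRIMARY KEY,
    package   TEXT NOT NULL,
    version   TEXT NOT NULL,
    compiler  TEXT NOT NULL,
    opt_level TEXT NOT NULL,
    platform  TEXT NOT NULL
);
CREATE TABLE vulnerable_functions (
    binary_id     INTEGER NOT NULL REFERENCES binaries(id),
    function_name TEXT NOT NULL,
    cve_id        TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO binaries VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "libdemo", "1.0", "gcc", "O0", "linux"),
            (2, "libdemo", "1.0", "clang", "O2", "linux"),
            (3, "libdemo", "1.1", "msvc", "O2", "windows"),
        ],
    )
    conn.executemany(
        "INSERT INTO vulnerable_functions VALUES (?, ?, ?)",
        [
            (1, "parse_header", "CVE-0000-0001"),  # placeholder CVE ID
            (2, "parse_header", "CVE-0000-0001"),
        ],
    )
    return conn


def builds_with_cve(conn, cve_id):
    """Return (compiler, opt_level, platform, version) for each labeled build."""
    rows = conn.execute(
        """
        SELECT b.compiler, b.opt_level, b.platform, b.version
        FROM binaries AS b
        JOIN vulnerable_functions AS v ON v.binary_id = b.id
        WHERE v.cve_id = ?
        ORDER BY b.id
        """,
        (cve_id,),
    )
    return rows.fetchall()
```

Calling `builds_with_cve(build_demo_db(), "CVE-0000-0001")` returns the two labeled builds of the made-up package, one per compiler and optimization level. Keeping the compilation context in ordinary columns is what lets a cross-build question become a plain `WHERE` or `JOIN` clause.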

If this is right

LLMs can be tested in stages for recognizing vulnerabilities, using strategy guidance, and transferring detection across different builds.
Embedding methods like MalConv and jTrans can be compared on how well they group binaries from the same package versions.
Binary similarity can be broken down into effects from time between versions, code changes, and commit activity using Bayesian methods.
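
The embedding comparison in the second item rests on cosine similarity between per-binary vectors. The sketch below assumes the embeddings are already available as plain lists of floats (producing them with MalConv or jTrans is out of scope here), and the helper names are hypothetical. It averages cosine similarity over all version pairs of one package, which is one simple way to summarize how tightly versions group; it is not necessarily the statistic the paper reports.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(embeddings):
    """Mean cosine similarity over all unordered pairs of embeddings.

    `embeddings` holds one vector per version of a single package
    (an assumed input layout, chosen for illustration).
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)
```

A package whose versions all map to nearly the same direction in embedding space scores close to 1, while a score near 0 means the versions look unrelated to that embedding.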

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This structure could let developers train more robust vulnerability detectors that ignore irrelevant build differences.
Future work might track how specific vulnerabilities appear and disappear across software releases using the historical links.
Security researchers could use the cross-platform builds to study compiler-specific weaknesses in a controlled way.

Load-bearing premise

The three provided analyses sufficiently prove the dataset's value for practical tasks without requiring further large-scale tests or outside benchmarks.

What would settle it

A demonstration that the LLM benchmark results do not reflect true reasoning or that the clustering and regression fail to distinguish meaningful patterns would undermine the dataset's claimed utility.

Figures

Figures reproduced from arXiv: 2605.21615 by Chang Liu, Edward Raff, James Holt, Kristopher Micinski, Nicolò Altamura, Noah Fleischmann.

**Figure 1.** Three-Stage CVE Evaluation Design. … 25% of the resulting records to verify that each CVE matches the correct library and version in our dataset (manually inspected CVE IDs available in appendix). For each CVE, we chose a reference binary with lowest optimization and grouped other affected binaries to one of five Diff categories: Optimization, Compiler, OS, Version and All (per-category counts in the appendix…

**Figure 2.** Cross-build transfer based on Qwen-3.6 agent. Each panel plots Hit@…

**Figure 3.** Package-level binary similarity on ASSEMBLAGE-DEEPHISTORY’s ≥ 2-version subset (ELF + PE combined). From left to right: MalConv embedding cosine similarity, PE only; MalConv embedding cosine similarity, all-packages mean; jTrans embedding cosine similarity, all-packages mean; TLSH fuzzy-hash similarity, all-packages mean. Axis labels recovered from an adjacent plot: Logit impact (95% HDI); rows: file change, commits, days, bias…

**Figure 4.** Global coefficient posterior means and 95% HDIs for MalConv cosine similarity, jTrans…

**Figure 5.** Eval 3 cross-build transfer comparison.

Original abstract

Existing binary corpora typically capture only one or two axes of binary variation: they either provide cross-compiler builds without a temporal axis, or CVE labels for single-build binaries. None combine cross-build diversity, cross-version history, and CVE labels into a queryable structure. We present ASSEMBLAGE-DEEPHISTORY, which consolidates these dimensions into a unified framework where every binary's compilation context, source code, vulnerable functions, and package version are stored as first-class metadata. ASSEMBLAGE-DEEPHISTORY comprises 73,610 binaries spanning 248 open-source projects, compiled across GCC, Clang, and MSVC at multiple optimization levels on Linux and Windows, with multi-year historical builds. Each binary is indexed in a database that links it to its source code, functions, debug info, variant builds, historical versions, and vulnerable functions. Three analyses demonstrate this structure's value: (1) a three-stage LLM benchmark (recognition, strategy-guided detection, and cross-build transfer) to test whether LLMs reason about binary vulnerabilities or pattern-match on build-specific artifacts; (2) a comparison of MalConv embeddings, jTrans function embeddings, and TLSH fuzzy hashes quantifying how same-package versions cluster in each space; and (3) a Bayesian regression decomposing binary similarity into contributions from temporal distance, file changes, and commits.
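
As an illustration of what "decomposing binary similarity into contributions" in analysis (3) can look like, the following is a generic logit-linear sketch. It is an assumption made for exposition, not the model specified in the paper: the symbols $s_{ij}$ (a similarity in $(0,1)$ between versions $i$ and $j$ of one package), $\Delta t_{ij}$ (days between the versions), $f_{ij}$ (files changed), $c_{ij}$ (commits between them), and the coefficient names are all introduced here. The likelihood, the priors, and any per-package hierarchy are deliberately left unspecified.

```latex
\operatorname{logit}\bigl(\mathbb{E}[s_{ij}]\bigr)
  = \beta_{0}
  + \beta_{\text{days}}\,\Delta t_{ij}
  + \beta_{\text{file}}\,f_{ij}
  + \beta_{\text{commits}}\,c_{ij},
\qquad
\operatorname{logit}(p) = \log\frac{p}{1-p}
```

Under a form like this, each coefficient's posterior interval indicates how much that factor shifts similarity on the logit scale while the others are held fixed, which is the kind of quantity a coefficient plot with highest-density intervals would display.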

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

A dataset paper that pulls cross-compiler builds, history, and CVE labels into one metadata-linked structure, but the internal analyses don't yet demonstrate clear gains on external tasks.

The main point of this paper is a new dataset, ASSEMBLAGE-DEEPHISTORY, that brings together cross-compiler and cross-optimization builds, multi-year historical versions, and CVE labels in one queryable database, with metadata linking binaries to their sources and vulnerable functions.

What the authors do well is scale and structure: 73,610 binaries across 248 projects, built with GCC, Clang, and MSVC on both Linux and Windows at various optimization levels, plus historical builds. Having first-class metadata for compilation context, source code, debug info, and vulnerable functions makes the corpus easier to query and to use in experiments that need controlled variation. The three analyses they include (an LLM benchmark testing recognition, strategy-guided detection, and cross-build transfer; a comparison of MalConv embeddings, jTrans embeddings, and TLSH fuzzy hashes on how versions cluster; and a Bayesian regression breaking down similarity by time, file changes, and commits) give some insight into the data's properties. They show that the metadata can reveal patterns in how binaries relate across builds and time.

The soft spot is that these analyses are all self-referential to the new dataset. They don't include a controlled experiment demonstrating that models or methods perform better when using this richer metadata than when using existing corpora that cover only one or two axes. The claim that the dataset enables new analyses would be stronger with an external validation or a direct comparison on a downstream task such as vulnerability detection transfer. Also, while the abstract describes the construction, I'd look for more specifics on how the authors ensured that the historical builds are accurate and how the CVE labels are mapped without errors.

This paper is for researchers in computer security, particularly those working on binary analysis, machine learning models for vulnerability detection, or anyone needing diverse test sets to evaluate robustness across compilers and versions. A reader interested in building or benchmarking new tools would find the dataset potentially valuable if it is released with good documentation. It deserves a serious referee because creating and documenting large, multi-dimensional datasets is important work that can support better evaluations in the field. I would recommend sending it to peer review, focusing on whether the data release plan is solid and whether the analyses sufficiently support the utility claims.

Referee Report

1 major / 2 minor

Summary. The paper presents ASSEMBLAGE-DEEPHISTORY, a dataset of 73,610 binaries spanning 248 open-source projects. It unifies cross-compiler builds (GCC, Clang, MSVC at multiple optimization levels on Linux and Windows), multi-year historical versions, and CVE labels into a single queryable database. Every binary is linked as first-class metadata to its source code, functions, debug information, variant builds, historical versions, and vulnerable functions. Value is shown via three internal analyses: a three-stage LLM benchmark (recognition, strategy-guided detection, cross-build transfer), embedding clustering comparisons (MalConv, jTrans, TLSH) on same-package versions, and Bayesian regression decomposing similarity into temporal distance, file changes, and commit factors.

Significance. If the dataset construction details and analysis results hold, the work supplies a useful resource for binary vulnerability research by consolidating axes of variation previously available only in isolation. The database indexing of compilation context, source links, and CVE labels is a concrete strength that could support new queries. No machine-checked proofs or parameter-free derivations are present, but the reproducible corpus structure itself is a positive contribution for the field.

major comments (1)

[Analyses section (corresponding to the three analyses described after the dataset construction)] The section describing the three analyses: these demonstrations remain entirely internal to the new corpus and quantify structure (e.g., clustering behavior or factor decomposition) without a controlled external comparison showing measurable gains on downstream tasks such as cross-build vulnerability transfer or historical CVE localization relative to existing single-axis corpora. This leaves the central claim that the unified metadata framework enables new reasoning capabilities resting on an unverified assumption.

minor comments (2)

[Dataset description] Clarify the exact number of variants per project and the distribution across compilers/optimizations in the dataset statistics table; current high-level aggregates make reproducibility checks harder.
[Analysis 1 and Analysis 3] Add error bars or ablation details to the LLM benchmark results and the Bayesian regression coefficients to strengthen the quantitative claims.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review of our manuscript on ASSEMBLAGE-DEEPHISTORY. We address the single major comment below and indicate where revisions have been made to the manuscript.

Referee: [Analyses section (corresponding to the three analyses described after the dataset construction)] The section describing the three analyses: these demonstrations remain entirely internal to the new corpus and quantify structure (e.g., clustering behavior or factor decomposition) without a controlled external comparison showing measurable gains on downstream tasks such as cross-build vulnerability transfer or historical CVE localization relative to existing single-axis corpora. This leaves the central claim that the unified metadata framework enables new reasoning capabilities resting on an unverified assumption.

Authors: We appreciate the referee's observation that the three analyses are conducted internally to the corpus. The intent of these demonstrations is to illustrate the novel analytical capabilities unlocked by unifying cross-build, temporal, and CVE metadata in a single queryable structure, capabilities that cannot be exercised on existing single-axis corpora. For instance, the LLM cross-build transfer stage directly tests whether models exploit build-specific artifacts, which requires the multi-compiler and multi-version axes we provide. The embedding clustering and Bayesian regression similarly decompose effects across temporal distance and build variants in ways prior datasets do not support. We acknowledge, however, that explicit head-to-head performance gains on downstream tasks such as vulnerability detection accuracy would provide additional external validation. In the revised manuscript we have added a dedicated limitations and future-work subsection that (a) contrasts the query expressiveness of ASSEMBLAGE-DEEPHISTORY with prior corpora and (b) outlines controlled external benchmarks that the community can now perform using the released dataset. This revision clarifies the scope of our current claims while preserving the paper's focus on the dataset itself.

revision: partial

Circularity Check

0 steps flagged

No circularity: dataset release with internal utility analyses remains self-contained

The paper presents ASSEMBLAGE-DEEPHISTORY as a new corpus that unifies cross-build, temporal, and CVE metadata, then illustrates its structure via three analyses performed directly on the released binaries (LLM tasks, embedding clustering, Bayesian regression). These steps quantify internal properties of the corpus but introduce no equations, fitted parameters renamed as predictions, or self-citation chains that reduce the dataset claim to its own inputs. The central contribution is the construction and indexing of the data itself; the analyses serve as descriptive benchmarks rather than derivations whose outputs are forced by construction. No load-bearing uniqueness theorems or ansatzes from prior author work are invoked to justify the framework.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The contribution rests on standard assumptions about accurate vulnerability labeling and representative compilation settings rather than new invented entities or fitted parameters.

axioms (2)

domain assumption: Compilation contexts using GCC, Clang, and MSVC at multiple optimization levels on Linux and Windows accurately capture real-world binary variation.
Invoked when describing the 73,610 binaries spanning multiple compilers and platforms.
domain assumption: Vulnerable functions can be reliably identified and linked to binaries via debug info and source code.
Required for the CVE labels to serve as ground truth in the LLM benchmark and other analyses.

pith-pipeline@v0.9.0 · 5791 in / 1354 out tokens · 40706 ms · 2026-05-22T09:36:42.801003+00:00 · methodology

Reference graph

Works this paper leans on

106 extracted references · 106 canonical work pages · 4 internal anchors

[1]

https: / / github.com/nationalsecurityagency/ghidra

National Security Agency.Ghidra Software Reverse Engineering Framework. https: / / github.com/nationalsecurityagency/ghidra. accessed 2026-05-06. 2019

work page 2026
[2]

SecVulEval: Benchmarking LLMs for Real-World C/C++ Vulnerability Detection

Md Basim Uddin Ahmed, Nima Shiri Harzevili, Jiho Shin, Hung Viet Pham, and Song Wang. SecVulEval: Benchmarking LLMs for Real-World C/C++ Vulnerability Detection. 2025.URL: https://arxiv.org/abs/2505.19828

work page arXiv 2025
[3]

Assessing the Effectiveness of the Tigress Obfuscator Against MOPSA and BinaryNinja

Nicolò Altamura, Enrico Bragastini, Marco Campion, and Mila Dalla Preda. “Assessing the Effectiveness of the Tigress Obfuscator Against MOPSA and BinaryNinja”. In:Proceedings of the 2025 Workshop on Research on Offensive and Defensive Techniques in the Context of Man At The End (MATE) Attacks. 2025.URL:https://doi.org/10.1145/3733817.3762702

work page doi:10.1145/3733817.3762702 2025
[4]

EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models

H. Anderson and Phil Roth. “EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models”. In:ArXiv(2018).URL: https://api.semanticscholar.org/ CorpusID:4888440

work page 2018
[5]

Apple Newsroom

Apple.Apple debuts M5 Pro and M5 Max to supercharge the most demanding pro work- flows. Apple Newsroom. Accessed: 2026-05-06. 2026.URL: https://www.apple.com/ newsroom/2026/03/apple- debuts- m5- pro- and- m5- max- to- supercharge- the- most-demanding-pro-workflows/

work page 2026
[6]

BinPool: A Dataset of Vulnerabilities for Binary Security Analysis

Sima Arasteh, Georgios Nikitopoulos, Wei-Cheng Wu, Nicolaas Weideman, Aaron Portnoy, Mukund Raghothaman, and Christophe Hauser. “BinPool: A Dataset of Vulnerabilities for Binary Security Analysis”. In:Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering. 2025

work page 2025
[7]

Polyglot and Distributed Software Repository Mining with Crossflow

Konstantinos Barmpis, Patrick Neubauer, Jonathan Co, Dimitris Kolovos, Nicholas Matragkas, and Richard F. Paige. “Polyglot and Distributed Software Repository Mining with Crossflow”. In:Proceedings of the 17th International Conference on Mining Software Repositories. 2020. URL:https://doi.org/10.1145/3379597.3387481

work page doi:10.1145/3379597.3387481 2020
[8]

Ahoy SAILR! There is No Need to DREAM of C: A Compiler-Aware Structuring Algorithm for Binary Decompilation

Zion Leonahenahe Basque, Ati Priya Bajaj, Wil Gibbs, Jude O’Kain, Derron Miao, Tiffany Bao, Adam Doupé, Yan Shoshitaishvili, and Ruoyu Wang. “Ahoy SAILR! There is No Need to DREAM of C: A Compiler-Aware Structuring Algorithm for Binary Decompilation”. In: 33rd USENIX Security Symposium (USENIX Security 24). 2024.URL: https://www.usenix. org/conference/use...

work page 2024
[9]

CVEfixes: automated collection of vulner- abilities and their fixes from open-source software

Guru Bhandari, Amara Naseer, and Leon Moonen. “CVEfixes: automated collection of vulner- abilities and their fixes from open-source software”. In:Proceedings of the 17th International Conference on Predictive Models and Data Analytics in Software Engineering. 2021.URL: http://dx.doi.org/10.1145/3475960.3475985

work page doi:10.1145/3475960.3475985 2021
[10]

Syntia: Synthe- sizing the semantics of obfuscated code

Tim Blazytko, Moritz Contag, Cornelius Aschermann, and Thorsten Holz. “Syntia: Synthe- sizing the semantics of obfuscated code”. In:26th USENIX Security Symposium (USENIX Security 17). 2017. 10

work page 2017
[11]

The tigress c diversifier/obfuscator

Christian Collberg, Sam Martin, Jonathan Myers, Bill Zimmerman, Petr Krajca, Gabriel Kerneis, Saumya Debray, and Babak Yadegari. “The tigress c diversifier/obfuscator”. In: Retrieved August(2015)

work page 2015
[12]

Christian Collberg, Clark Thomborson, and Douglas Low.A taxonomy of obfuscating transfor- mations. 1997

work page 1997
[13]

BinBench: a benchmark for x64 portable operating system interface binary function represen- tations

Francesca Console, Giuseppe D’Aquanno, Giuseppe Antonio Di Luna, and Leonardo Querzoni. “BinBench: a benchmark for x64 portable operating system interface binary function represen- tations”. In:PeerJ Computer Science(2023).URL: https://api.semanticscholar.org/ CorpusID:259029804

work page 2023
[14]

EM- BERSim: A Large-Scale Databank for Boosting Similarity Search in Malware Analysis

Dragos Georgian Corlatescu, Alexandru Dinu, Mihaela Gaman, and Paul Sumedrea. “EM- BERSim: A Large-Scale Databank for Boosting Similarity Search in Malware Analysis”. In: ArXiv(2023).URL:https://api.semanticscholar.org/CorpusID:263608542

work page 2023
[15]

RISC-V Instruction Set Architecture Extensions: A Survey

Enfang Cui, Tianzheng Li, and Qian Wei. “RISC-V Instruction Set Architecture Extensions: A Survey”. In:IEEE Access(2023)

work page 2023
[16]

https://www.cve.org/

CVE Program.Common Vulnerabilities and Exposures (CVE). https://www.cve.org/ . Accessed: 2026-05-06

work page 2026
[17]

https://ai.google.dev/gemma/docs/core/ model_card_4

Google DeepMind.Gemma 4 model card. https://ai.google.dev/gemma/docs/core/ model_card_4. accessed 2026-05-20. 2026

work page 2026
[18]

Asm2vec: Boosting static representation robustness for binary clone search against code obfuscation and compiler optimization

Steven HH Ding, Benjamin CM Fung, and Philippe Charland. “Asm2vec: Boosting static representation robustness for binary clone search against code obfuscation and compiler optimization”. In:2019 ieee symposium on security and privacy (sp). 2019

work page 2019
[19]

Vulnerability detection with code language models: How far are we?

Yangruibo Ding, Yanjun Fu, Omniyyah Ibrahim, Chawin Sitawarin, Xinyun Chen, Basel Alomair, David Wagner, Baishakhi Ray, and Yizheng Chen. “Vulnerability Detection with Code Language Models: How Far Are We?” In:arXiv preprint arXiv:2403.18624(2024)

work page arXiv 2024
[20]

LibvDiff: Library Version Difference Guided OSS Version Identification in Binaries

Chaopeng Dong, Siyuan Li, Shouguo Yang, Yang Xiao, Yongpan Wang, Hong Li, Zhi Li, and Limin Sun. “LibvDiff: Library Version Difference Guided OSS Version Identification in Binaries”. In:Proceedings of the IEEE/ACM 46th International Conference on Software Engineering. 2024.URL:https://doi.org/10.1145/3597503.3623336

work page doi:10.1145/3597503.3623336 2024
[21]

Schwartz.Idioms: Neural Decompilation With Joint Code and Type Definition Prediction

Luke Dramko, Claire Le Goues, and Edward J. Schwartz.Idioms: Neural Decompilation With Joint Code and Type Definition Prediction. 2025.URL: https://arxiv.org/abs/2502. 04536

work page 2025
[22]

Identifying Open- Source License Violation and 1-day Security Risk at Large Scale

Ruian Duan, Ashish Bijlani, Meng Xu, Taesoo Kim, and Wenke Lee. “Identifying Open- Source License Violation and 1-day Security Risk at Large Scale”. In:Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 2017.URL: https://doi.org/10.1145/3133956.3134048

work page doi:10.1145/3133956.3134048 2017
[23]

DeepBinDiff: Learning Program- Wide Code Representations for Binary Diffing

Yue Duan, Xuezixiang Li, Jinghan Wang, and Heng Yin. “DeepBinDiff: Learning Program- Wide Code Representations for Binary Diffing”. In:27th Annual Network and Distributed Sys- tem Security Symposium, NDSS 2020, San Diego, California, USA, February 23-26, 2020. 2020. URL: https://www.ndss- symposium.org/ndss- paper/deepbindiff- learning- program-wide-code-...

work page 2020
[24]

A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries

Jiahao Fan, Yi Li, Shaohua Wang, and Tien N. Nguyen. “A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries”. In:Proceedings of the 17th International Confer- ence on Mining Software Repositories. 2020.URL: https://doi.org/10.1145/3379597. 3387501

work page doi:10.1145/3379597 2020
[25]

Scalable graph-based bug search for firmware images

Qian Feng, Rundong Zhou, Chengcheng Xu, Yao Cheng, Brian Testa, and Heng Yin. “Scalable graph-based bug search for firmware images”. In:Proceedings of the 2016 ACM SIGSAC conference on computer and communications security. 2016

work page 2016
[26]

Structural comparison of executable objects

Halvar Flake. “Structural comparison of executable objects”. In:Detection of intrusions and malware & vulnerability assessment, GI SIG SIDAR workshop, DIMVA 2004. 2004

work page 2004
[27]

BinHunt: Automatically Finding Semantic Differences in Binary Programs

Debin Gao, Michael K. Reiter, and Dawn Song. “BinHunt: Automatically Finding Semantic Differences in Binary Programs”. In:Information and Communications Security: 10th Interna- tional Conference, ICICS 2008 Birmingham, UK, October 20 - 22, 2008 Proceedings. 2008. URL:https://doi.org/10.1007/978-3-540-88625-9_16. 11

work page doi:10.1007/978-3-540-88625-9_16 2008
[28]

SigmaDiff: Semantics-Aware Deep Graph Matching for Pseudocode Diffing

Lian Gao, Yu Qu, Sheng Yu, Yue Duan, and Heng Yin. “SigmaDiff: Semantics-Aware Deep Graph Matching for Pseudocode Diffing”. In:Proceedings 2024 Network and Distributed Sys- tem Security Symposium(2024).URL: https://api.semanticscholar.org/CorpusID: 262144278

work page 2024
[29]

Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper)

Andrew Gelman. “Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper)”. In:Bayesian Analysis(2006).URL: https://doi.org/ 10.1214/06-BA117A

work page doi:10.1214/06-ba117a 2006
[30]

Why We (Usually) Don’t Have to Worry About Multiple Comparisons

Andrew Gelman, Jennifer Hill, and Masanao Yajima. “Why We (Usually) Don’t Have to Worry About Multiple Comparisons”. In:Journal of Research on Educational Effectiveness(2012). URL:https://doi.org/10.1080/19345747.2011.618213

work page doi:10.1080/19345747.2011.618213 2012
[31]

Inference from iterative simulation using multiple sequences

Andrew Gelman and Donald B Rubin. “Inference from iterative simulation using multiple sequences”. In:Statistical science(1992)

work page 1992
[32]

Accessed: 2026- 05-06

GitHub.GitHub Advisory Database.https://github.com/advisories. Accessed: 2026- 05-06

work page 2026
[33]

The GHTorent dataset and tool suite

Georgios Gousios. “The GHTorent dataset and tool suite”. In:2013 10th Working Conference on Mining Software Repositories (MSR). 2013

work page 2013
[34]

BinProv: Binary Code Provenance Identification without Disassembly

Xu He, Shu Wang, Yunlong Xing, Pengbin Feng, Haining Wang, Qi Li, Songqing Chen, and Kun Sun. “BinProv: Binary Code Provenance Identification without Disassembly”. In: Proceedings of the 25th International Symposium on Research in Attacks, Intrusions and Defenses(2022).URL:https://api.semanticscholar.org/CorpusID:252910574

work page 2022
[35]

The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo

Matthew D. Hoffman and Andrew Gelman. “The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo”. In:J. Mach. Learn. Res.(2011).URL: https: //api.semanticscholar.org/CorpusID:12948548

work page 2011
[36]

RULER: What's the Real Context Size of Your Long-Context Language Models?

Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Yang Zhang, and Boris Ginsburg. “RULER: What’s the real context size of your long-context language models?” In:arXiv preprint arXiv:2404.06654(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[37]

2020.URL: https://github

Zecong Hu and Jeremy Lacomis.GitHub Cloner & Compiler. 2020.URL: https://github. com/huzecong/ghcc

work page 2020
[38]

2025.URL:https://arxiv.org/abs/2505.22010

Nasir Hussain, Haohan Chen, Chanh Tran, Philip Huang, Zhuohao Li, Pravir Chugh, William Chen, Ashish Kundu, and Yuan Tian.VulBinLLM: LLM-powered Vulnerability Detection for Stripped Binaries. 2025.URL:https://arxiv.org/abs/2505.22010

work page arXiv 2025
[39]

BinaryAI: Binary Software Composition Analysis via Intelligent Binary Source Code Matching

Ling Jiang, Junwen An, Huihui Huang, Qiyi Tang, Sen Nie, Shi Wu, and Yuqun Zhang. BinaryAI: Binary Software Composition Analysis via Intelligent Binary Source Code Matching. 2024.URL:https://arxiv.org/abs/2401.11161

work page arXiv 2024
[40]

2025.URL:https://arxiv.org/abs/2311.13721

Nan Jiang, Chengxiao Wang, Kevin Liu, Xiangzhe Xu, Lin Tan, Xiangyu Zhang, and Petr Babkin.Nova: Generative Language Models for Assembly Code with Hierarchical Attention and Contrastive Learning. 2025.URL:https://arxiv.org/abs/2311.13721

work page arXiv 2025
[41]

Joyce, Dev Amlani, Charles Nicholas, and Edward Raff.MOTIF: A Large Malware Reference Dataset with Ground Truth Family Labels

Robert J. Joyce, Dev Amlani, Charles Nicholas, and Edward Raff.MOTIF: A Large Malware Reference Dataset with Ground Truth Family Labels. 2021.URL: https://arxiv.org/abs/ 2111.15031

work page arXiv 2021
[42]

EMBER2024 - A Benchmark Dataset for Holistic Evaluation of Malware Classifiers

Robert J. Joyce, Gideon Miller, Phil Roth, Richard Zak, Elliott Zaresky-Williams, Hyrum Anderson, Edward Raff, and James Holt. “EMBER2024 - A Benchmark Dataset for Holistic Evaluation of Malware Classifiers”. In:Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V .2. 2025.URL: http://dx.doi.org/10.1145/ 3711896.3737431

work page arXiv 2025
[43]

Obfuscator-LLVM – Software Protection for the Masses

Pascal Junod, Julien Rinaldini, Johan Wehrli, and Julie Michielin. “Obfuscator-LLVM – Software Protection for the Masses”. In:2015 IEEE/ACM 1st International Workshop on Software Protection. 2015

work page 2015
[44]

Revisiting Binary Code Similarity Analysis Using Interpretable Feature Engineering and Lessons Learned

Dongkwan Kim, Eunsoo Kim, Sang Kil Cha, Sooel Son, and Yongdae Kim. “Revisiting Binary Code Similarity Analysis Using Interpretable Feature Engineering and Lessons Learned”. In: IEEE Transactions on Software Engineering(2023).URL: http://dx.doi.org/10.1109/ TSE.2022.3187689

work page arXiv 2023
[45]

Joxean Koret.Diaphora.https://github.com/joxeankoret/diaphora. 12

work page
[46]

Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan

John Kruschke. “Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan”. In: (2014)

work page 2014
[47]

URLhttps://openreview.net/forum?id=VTF8yNQM66

Hwiwon Lee, Ziqi Zhang, Hanxiao Lu, and Lingming Zhang.SEC-bench: Automated Bench- marking of LLM Agents on Real-World Software Security Tasks. 2025.URL: https://arxiv. org/abs/2506.11791

work page arXiv 2025
[48]

2025.URL:https://arxiv.org/abs/2506.05692

Xinghang Li, Jingzhe Ding, Chao Peng, Bing Zhao, Xiang Gao, Hongwan Gao, and Xinchen Gu.SafeGenBench: A Benchmark Framework for Security Vulnerability Detection in LLM- Generated Code. 2025.URL:https://arxiv.org/abs/2506.05692

work page arXiv 2025
[49]

PalmTree: Learning an Assembly Language Model for Instruction Embedding

Xuezixiang Li, Yu Qu, and Heng Yin. “PalmTree: Learning an Assembly Language Model for Instruction Embedding”. In:Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. 2021.URL: http : / / dx . doi . org / 10 . 1145 / 3460120 . 3484587

work page 2021
[50]

Mining Internet-Scale Software Repositories

Erik Linstead, Paul Rigor, Sushil Bajracharya, Cristina Lopes, and Pierre Baldi. “Mining Internet-Scale Software Repositories”. In:Advances in Neural Information Processing Systems. 2007.URL: https://proceedings.neurips.cc/paper_files/paper/2007/file/ a532400ed62e772b9dc0b86f46e583ff-Paper.pdf

[51]

Bingchang Liu, Wei Huo, Chao Zhang, Wenchao Li, Feng Li, Aihua Piao, and Wei Zou. “αDiff: Cross-Version Binary Code Similarity Detection with DNN”. In: 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE). 2018

[52]

Chang Liu, Rebecca Saul, Yihao Sun, Edward Raff, Maya Fuchs, Townsend Southard Pantano, James Holt, and Kristopher Micinski. Assemblage: Automatic Binary Dataset Construction for Machine Learning. 2024. URL: https://arxiv.org/abs/2405.03991

[53]

Chang Liu, Yihao Sun, Thomas Gilray, and Kristopher Micinski. Superset Decompilation. 2026. URL: https://arxiv.org/abs/2603.28002

[54]

Nelson F Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. “Lost in the middle: How language models use long contexts”. In: Transactions of the Association for Computational Linguistics (2024)

[55]

Li Lu, Yanjie Zhao, Hongzhou Rao, Kechi Zhang, and Haoyu Wang. Evaluating and Enhancing the Vulnerability Reasoning Capabilities of Large Language Models. 2026. URL: https://arxiv.org/abs/2602.06687

[56]

Andrea Marcelli, Mariano Graziano, Xabier Ugarte-Pedrero, Yanick Fratantonio, Mohamad Mansouri, and Davide Balzarotti. “How Machine Learning Is Solving the Binary Function Similarity Problem”. In: 31st USENIX Security Symposium (USENIX Security 22). 2022. URL: https://www.usenix.org/conference/usenixsecurity22/presentation/marcelli

[57]

Luca Massarelli, Giuseppe Antonio Di Luna, Fabio Petroni, Leonardo Querzoni, and Roberto Baldoni. SAFE: Self-Attentive Function Embeddings for Binary Similarity. 2019. URL: https://arxiv.org/abs/1811.05296

[58]

Nicholas Metropolis, Arianna W Rosenbluth, Marshall N Rosenbluth, Augusta H Teller, and Edward Teller. “Equation of state calculations by fast computing machines”. In: The Journal of Chemical Physics (1953)

[59]

Microsoft. vcpkg. https://github.com/microsoft/vcpkg. 2024

[60]

National Institute of Standards and Technology. National Vulnerability Database. https://nvd.nist.gov. Accessed: 2026-05-06

[61]

Chao Ni, Liyu Shen, Xiaohu Yang, Yan Zhu, and Shaohua Wang. “MegaVul: A C/C++ Vulnerability Dataset with Comprehensive Code Representations”. In: 2024 IEEE/ACM 21st International Conference on Mining Software Repositories (MSR). 2024

[62]

Jonathan Oliver, Chun Cheng, and Yanggui Chen. “TLSH – a locality sensitive hash”. In: 2013 Fourth Cybercrime and Trustworthy Computing Workshop. 2013

[63]

Du Phan, Neeraj Pradhan, and Martin Jankowiak. Composable Effects for Flexible and Accelerated Probabilistic Programming in NumPyro. 2019. URL: https://arxiv.org/abs/1912.11554

[64]

Edward Raff, William Fleshman, Richard Zak, Hyrum S. Anderson, Bobby Filar, and Mark McLean. Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection. 2020. URL: https://arxiv.org/abs/2012.09390

[65]

Shuo Ren, Daya Guo, Shuai Lu, Long Zhou, Shujie Liu, Duyu Tang, Neel Sundaresan, Ming Zhou, Ambrosio Blanco, and Shuai Ma. CodeBLEU: a Method for Automatic Evaluation of Code Synthesis. 2020. URL: https://arxiv.org/abs/2009.10297

[66]

Martin Riddell, Ansong Ni, and Arman Cohan. Quantifying Contamination in Evaluating Code Generation Capabilities of Language Models. 2024. URL: https://arxiv.org/abs/2403.04811

[67]

Bonan Ruan, Jiahao Liu, Weibo Zhao, and Zhenkai Liang. “VulZoo: A Comprehensive Vulnerability Intelligence Dataset”. In: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering. 2024. URL: https://doi.org/10.1145/3691620.3695345

[68]

Jonathan Salwan, Sébastien Bardin, and Marie-Laure Potet. “Symbolic deobfuscation: From virtualized code back to the original”. In: International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. 2018

[69]

Rebecca Saul, Chang Liu, Noah Fleischmann, Richard Zak, Kristopher Micinski, Edward Raff, and James Holt. “Is Function Similarity Over-Engineered? Building a Benchmark”. In: Advances in Neural Information Processing Systems. 2024

[70]

Moritz Schloegel, Tim Blazytko, Moritz Contag, Cornelius Aschermann, Julius Basler, Thorsten Holz, and Ali Abbasi. “Loki: Hardening code obfuscation against automated attacks”. In: 31st USENIX Security Symposium (USENIX Security 22). 2022

[71]

Huajie Shao, Dachun Sun, Jiahao Wu, Zecheng Zhang, Aston Zhang, Shuochao Yao, Shengzhong Liu, Tianshi Wang, Chao Zhang, and Tarek Abdelzaher. “paper2repo: GitHub Repository Recommendation for Academic Papers”. In: Proceedings of The Web Conference 2020. URL: http://dx.doi.org/10.1145/3366423.3380145

[73]

Mohammad Behnam Shariati, Ali Dehghantanha, Ben Martini, and Kim-Kwang Raymond Choo. “Ubuntu One investigation: Detecting evidences on client machines”. In: The Cloud Security Ecosystem. 2015. URL: https://api.semanticscholar.org/CorpusID:33377904

[74]

Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Andrew Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Kruegel, and Giovanni Vigna. “SOK: (State of) The Art of War: Offensive Techniques in Binary Analysis”. In: 2016 IEEE Symposium on Security and Privacy (SP). 2016

[75]

Ashwin Sudhir, Zion Leonahenahe Basque, Wil Gibbs, Ati Priya Bajaj, Pulkit Singh Singaria, Mitchell Zakocs, Jie Hu, Moritz Schloegel, Tiffany Bao, Adam Doupe, Yan Shoshitaishvili, and Ruoyu Wang. Pushan: Trace-Free Deobfuscation of Virtualization-Obfuscated Binaries. 2026. URL: https://arxiv.org/abs/2603.18355

[76]

Hanzhuo Tan, Qi Luo, Jing Li, and Yuqun Zhang. LLM4Decompile: Decompiling Binary Code with Large Language Models. 2024

[77]

Hanzhuo Tan, Xiaolong Tian, Hanrui Qi, Jiaming Liu, Zuchen Gao, Siyi Wang, Qi Luo, Jing Li, and Yuqun Zhang. Decompile-Bench: Million-Scale Binary-Source Function Pairs for Real-World Binary Decompilation. 2025. URL: https://arxiv.org/abs/2505.12668

[78]

Saad Ullah, Mingji Han, Saurabh Pujar, Hammond Pearce, Ayse Coskun, and Gianluca Stringhini. LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks. 2024. URL: https://arxiv.org/abs/2312.12575

[79]

Fish Wang and Yan Shoshitaishvili. “Angr - The Next Generation of Binary Analysis”. In: 2017 IEEE Cybersecurity Development (SecDev). 2017

[80]

Hao Wang, Wenjie Qu, Gilad Katz, Wenyu Zhu, Zeyu Gao, Han Qiu, Jianwei Zhuge, and Chao Zhang. jTrans: Jump-Aware Transformer for Binary Code Similarity. 2022. URL: https://arxiv.org/abs/2205.12713
