ASSEMBLAGE-DEEPHISTORY: A Cross-Build Binary Dataset with Temporal Coverage
Pith reviewed 2026-05-22 09:36 UTC · model grok-4.3
The pith
ASSEMBLAGE-DEEPHISTORY provides a single database linking 73,610 binaries to their source code, compilation details, historical versions, and CVE-labeled vulnerable functions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes ASSEMBLAGE-DEEPHISTORY as a consolidated dataset of 73,610 binaries from 248 open-source projects. These binaries come from GCC, Clang, and MSVC compilers at various optimization levels on Linux and Windows, including multi-year historical builds. Each entry connects to its source code, functions, debug information, other build variants, past versions, and functions known to be vulnerable.
What carries the argument
The queryable database structure that treats compilation context, source code, vulnerable functions, and package version as first-class metadata for every binary.
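To make "first-class metadata" concrete, here is a minimal sketch of how such a structure could be queried. The table and column names (`binaries`, `vulnerable_functions`, `opt_level`, and so on), the helper `builds_with_cve`, and the sample rows (including the placeholder CVE identifier) are hypothetical illustrations, not the schema or contents of the released dataset.

```python
import sqlite3

# Hypothetical schema: every name below is an illustrative assumption,
# not the actual ASSEMBLAGE-DEEPHISTORY layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE binaries (
    id INTEGER PRIMARY KEY,
    package TEXT, version TEXT,
    compiler TEXT, opt_level TEXT, platform TEXT
);
CREATE TABLE vulnerable_functions (
    binary_id INTEGER REFERENCES binaries(id),
    function_name TEXT, cve_id TEXT
);
""")

# Made-up placeholder rows, for demonstration only.
conn.executemany(
    "INSERT INTO binaries VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "examplelib", "1.0", "gcc", "O0", "linux"),
        (2, "examplelib", "1.0", "clang", "O2", "linux"),
        (3, "examplelib", "1.1", "gcc", "O0", "linux"),
    ],
)
conn.executemany(
    "INSERT INTO vulnerable_functions VALUES (?, ?, ?)",
    [
        (1, "parse_header", "CVE-0000-0001"),
        (2, "parse_header", "CVE-0000-0001"),
    ],
)


def builds_with_cve(cve_id):
    """Return (version, compiler, opt_level) for every build labeled with the CVE."""
    rows = conn.execute(
        """
        SELECT b.version, b.compiler, b.opt_level
        FROM binaries AS b
        JOIN vulnerable_functions AS v ON v.binary_id = b.id
        WHERE v.cve_id = ?
        ORDER BY b.id
        """,
        (cve_id,),
    )
    return rows.fetchall()


print(builds_with_cve("CVE-0000-0001"))
```

Because compilation context and CVE labels sit in joinable tables, a single query can cross the build axis and the vulnerability axis at once, which is the kind of question the review says single-axis corpora cannot answer.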
If this is right
- LLMs can be tested in three stages (recognition, strategy-guided detection, and cross-build transfer) to separate vulnerability reasoning from pattern-matching on build-specific artifacts.
- Representations such as MalConv embeddings, jTrans function embeddings, and TLSH fuzzy hashes can be compared on how tightly they cluster versions of the same package.
- Binary similarity can be decomposed, via Bayesian regression, into contributions from temporal distance, file changes, and commits.
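The second item can be made concrete with a simple score: for each binary, check whether its nearest neighbor in a given representation space belongs to the same package. The sketch below assumes embeddings are plain float vectors compared by cosine similarity. The function names, vectors, and package labels are toy values chosen for illustration, and the paper's actual clustering metric may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def same_package_nn_rate(vectors, packages):
    """Fraction of items whose most similar other item shares their package."""
    hits = 0
    for i, u in enumerate(vectors):
        # Pick the index of the most similar vector, excluding the item itself.
        best_j = max(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: cosine(u, vectors[j]),
        )
        hits += packages[best_j] == packages[i]
    return hits / len(vectors)


# Toy data: two hypothetical packages with two versions each.
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
packages = ["pkg_a", "pkg_a", "pkg_b", "pkg_b"]
print(same_package_nn_rate(vectors, packages))
```

A fuzzy hash such as TLSH produces a distance rather than a vector, so it would need its own distance function in place of `cosine`, with the nearest neighbor chosen by minimum distance.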
Where Pith is reading between the lines
- This structure could let developers train more robust vulnerability detectors that ignore irrelevant build differences.
- Future work might track how specific vulnerabilities appear and disappear across software releases using the historical links.
- Security researchers could use the cross-platform builds to study compiler-specific weaknesses in a controlled way.
Load-bearing premise
The three analyses provided are sufficient to demonstrate the dataset's value for practical tasks, without further large-scale tests or external benchmarks.
What would settle it
A demonstration that the LLM benchmark results do not reflect true reasoning or that the clustering and regression fail to distinguish meaningful patterns would undermine the dataset's claimed utility.
Original abstract
Existing binary corpora typically capture only one or two axes of binary variation: they either provide cross-compiler builds without a temporal axis, or CVE labels for single-build binaries. None combine cross-build diversity, cross-version history, and CVE labels into a queryable structure. We present ASSEMBLAGE-DEEPHISTORY, which consolidates these dimensions into a unified framework where every binary's compilation context, source code, vulnerable functions, and package version are stored as first-class metadata. ASSEMBLAGE-DEEPHISTORY comprises 73,610 binaries spanning 248 open-source projects, compiled across GCC, Clang, and MSVC at multiple optimization levels on Linux and Windows, with multi-year historical builds. Each binary is indexed in a database that links it to its source code, functions, debug info, variant builds, historical versions, and vulnerable functions. Three analyses demonstrate this structure's value: (1) a three-stage LLM benchmark (recognition, strategy-guided detection, and cross-build transfer) to test whether LLMs reason about binary vulnerabilities or pattern-match on build-specific artifacts; (2) a comparison of MalConv embeddings, jTrans function embeddings, and TLSH fuzzy hashes quantifying how same-package versions cluster in each space; and (3) a Bayesian regression decomposing binary similarity into contributions from temporal distance, file changes, and commits.
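As a reading aid for analysis (3), one plausible form of such a decomposition is a linear model with a prior on each coefficient. The symbols and the Gaussian likelihood below are assumptions made here for illustration; the abstract does not state the model's exact form.

```latex
s_{ij} \sim \mathcal{N}\!\left(\alpha + \beta_t\,\Delta t_{ij} + \beta_f\,f_{ij} + \beta_c\,c_{ij},\ \sigma^2\right)
```

Here $s_{ij}$ would be the similarity between versions $i$ and $j$ of the same package, $\Delta t_{ij}$ the temporal distance between them, $f_{ij}$ the number of changed files, and $c_{ij}$ the number of intervening commits. Under this reading, the posterior distributions of $\beta_t$, $\beta_f$, and $\beta_c$ indicate how much each factor contributes to similarity.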
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents ASSEMBLAGE-DEEPHISTORY, a dataset of 73,610 binaries spanning 248 open-source projects. It unifies cross-compiler builds (GCC, Clang, MSVC at multiple optimization levels on Linux and Windows), multi-year historical versions, and CVE labels into a single queryable database. Every binary is linked as first-class metadata to its source code, functions, debug information, variant builds, historical versions, and vulnerable functions. Value is shown via three internal analyses: a three-stage LLM benchmark (recognition, strategy-guided detection, cross-build transfer), embedding clustering comparisons (MalConv, jTrans, TLSH) on same-package versions, and Bayesian regression decomposing similarity into temporal distance, file changes, and commit factors.
Significance. If the dataset construction details and analysis results hold, the work supplies a useful resource for binary vulnerability research by consolidating axes of variation previously available only in isolation. The database indexing of compilation context, source links, and CVE labels is a concrete strength that could support new queries. No machine-checked proofs or parameter-free derivations are present, but the reproducible corpus structure itself is a positive contribution for the field.
major comments (1)
- [Analyses section (corresponding to the three analyses described after the dataset construction)] The section describing the three analyses: these demonstrations remain entirely internal to the new corpus and quantify structure (e.g., clustering behavior or factor decomposition) without a controlled external comparison showing measurable gains on downstream tasks such as cross-build vulnerability transfer or historical CVE localization relative to existing single-axis corpora. This leaves the central claim that the unified metadata framework enables new reasoning capabilities resting on an unverified assumption.
minor comments (2)
- [Dataset description] Clarify the exact number of variants per project and the distribution across compilers/optimizations in the dataset statistics table; current high-level aggregates make reproducibility checks harder.
- [Analysis 1 and Analysis 3] Add error bars or ablation details to the LLM benchmark results and the Bayesian regression coefficients to strengthen the quantitative claims.
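One common way to produce the error bars requested in the second minor comment is a percentile bootstrap over per-example outcomes. This is a generic sketch, not the authors' procedure: the function name `bootstrap_ci`, the `outcomes` list, the resample count, and the seed are all arbitrary choices made for illustration.

```python
import random


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy data: 70 correct and 30 incorrect benchmark answers.
outcomes = [1] * 70 + [0] * 30
accuracy = sum(outcomes) / len(outcomes)
low, high = bootstrap_ci(outcomes)
print(f"accuracy={accuracy:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

For the Bayesian regression, the analogous report would be posterior credible intervals on each coefficient rather than bootstrap intervals.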
Simulated Author's Rebuttal
We thank the referee for their constructive review of our manuscript on ASSEMBLAGE-DEEPHISTORY. We address the single major comment below and indicate where revisions have been made to the manuscript.
- Referee: [Analyses section (corresponding to the three analyses described after the dataset construction)] The section describing the three analyses: these demonstrations remain entirely internal to the new corpus and quantify structure (e.g., clustering behavior or factor decomposition) without a controlled external comparison showing measurable gains on downstream tasks such as cross-build vulnerability transfer or historical CVE localization relative to existing single-axis corpora. This leaves the central claim that the unified metadata framework enables new reasoning capabilities resting on an unverified assumption.
Authors: We appreciate the referee's observation that the three analyses are conducted internally to the corpus. The intent of these demonstrations is to illustrate the novel analytical capabilities unlocked by unifying cross-build, temporal, and CVE metadata in a single queryable structure, capabilities that cannot be exercised on existing single-axis corpora. For instance, the LLM cross-build transfer stage directly tests whether models exploit build-specific artifacts, which requires the multi-compiler and multi-version axes we provide. The embedding clustering and Bayesian regression similarly decompose effects across temporal distance and build variants in ways prior datasets do not support. We acknowledge, however, that explicit head-to-head performance gains on downstream tasks such as vulnerability detection accuracy would provide additional external validation. In the revised manuscript we have added a dedicated limitations and future-work subsection that (a) contrasts the query expressiveness of ASSEMBLAGE-DEEPHISTORY with prior corpora and (b) outlines controlled external benchmarks that the community can now perform using the released dataset. This revision clarifies the scope of our current claims while preserving the paper's focus on the dataset itself.
Revision: partial
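The cross-build transfer idea in the response can be illustrated by scoring the same functions under different build configurations and comparing accuracy on a reference build with accuracy on the others. The record fields, the function names, the reference build, and the toy predictions below are hypothetical; this is not the paper's benchmark harness.

```python
from collections import defaultdict


def accuracy_by_build(records):
    """Map each (compiler, opt_level) pair to prediction accuracy on that build."""
    totals = defaultdict(lambda: [0, 0])  # build -> [correct, seen]
    for r in records:
        build = (r["compiler"], r["opt_level"])
        totals[build][0] += r["predicted"] == r["label"]
        totals[build][1] += 1
    return {build: correct / seen for build, (correct, seen) in totals.items()}


def transfer_gap(records, reference_build):
    """Accuracy on the reference build minus mean accuracy on all other builds."""
    acc = accuracy_by_build(records)
    others = [v for build, v in acc.items() if build != reference_build]
    return acc[reference_build] - sum(others) / len(others)


# Toy predictions for two hypothetical functions under two builds.
records = [
    {"function": "parse_header", "compiler": "gcc", "opt_level": "O0", "label": 1, "predicted": 1},
    {"function": "copy_buf", "compiler": "gcc", "opt_level": "O0", "label": 0, "predicted": 0},
    {"function": "parse_header", "compiler": "clang", "opt_level": "O2", "label": 1, "predicted": 0},
    {"function": "copy_buf", "compiler": "clang", "opt_level": "O2", "label": 0, "predicted": 0},
]
print(transfer_gap(records, ("gcc", "O0")))
```

A large positive gap on identical source functions would suggest that a model leans on build-specific artifacts rather than on the vulnerability itself.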
Circularity Check
No circularity: dataset release with internal utility analyses remains self-contained
The paper presents ASSEMBLAGE-DEEPHISTORY as a new corpus that unifies cross-build, temporal, and CVE metadata, then illustrates its structure via three analyses performed directly on the released binaries (LLM tasks, embedding clustering, Bayesian regression). These steps quantify internal properties of the corpus but introduce no equations, fitted parameters renamed as predictions, or self-citation chains that reduce the dataset claim to its own inputs. The central contribution is the construction and indexing of the data itself; the analyses serve as descriptive benchmarks rather than derivations whose outputs are forced by construction. No load-bearing uniqueness theorems or ansatzes from prior author work are invoked to justify the framework.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Compilation contexts using GCC, Clang, and MSVC at multiple optimization levels on Linux and Windows accurately capture real-world binary variation.
- domain assumption: Vulnerable functions can be reliably identified and linked to binaries via debug info and source code.