Palmtree: Learning an assembly language model for instruction embedding
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CR (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
citing papers explorer
- Large Byte Model: Teaching Language Models About Compiled Code
Presents a byte-native LLM with bespoke tokenizer achieving 69-98% accuracy on malware family and architecture classification from raw bytes.
- bpK#: Delegatable Pseudonyms And Their Applications to National eID Systems
Introduces bpK# as a delegatable pseudonym system with a formal framework, generic construction with security proofs, concrete instantiation, and reference implementation for eID applications.
- Understanding Binary Code Similarity for Real-World Vulnerability Detection: A Large-Scale Empirical Study
Large-scale study on 60k firmware shows vulnerable function versions, search space, function sizes and compilation toolchains affect BCSD performance; build-aware queries raise MRR from 0.818 to 0.981 and TPL-aware two-stage search improves it by 18.5%.