Model Output

It then inspects the decompiled wrapper, correctly infers that the program encrypts a 25 Preprint

CREBench: Evaluating Large Language Models in Cryptographic Binary Reverse Engineering cs.CR · 2026-04-04 · unverdicted · none · ref 2
CREBench benchmark finds frontier LLMs recover cryptographic flags in 59% of cases versus 92% for human experts.