pith. sign in

arxiv: 2606.25561 · v1 · pith:M747V2TGnew · submitted 2026-06-24 · 💻 cs.CR

CrypFormBench: Benchmarking Formal Analysis Capability of Large Language Models for Cryptographic Schemes

classification 💻 cs.CR
keywords formalschemesanalysiscryptographicsecuritybenchmarkcompletioncorrection
0
0 comments X
read the original abstract

Manual formal analysis of cryptographic schemes is labor-intensive and requires substantial expertise. While model-checking tools (e.g., Scyther and Tamarin) and computational-security tools (e.g., CryptoVerif and EasyCrypt) improve the automation of security proofs, they still rely on experts to abstract schemes and write tool-specific formal descriptions. Large language models (LLMs) are a promising alternative, but their effectiveness in this domain remains unexplored due to the absence of standardized evaluation methodologies. To fill this gap, we introduce CrypFormBench (C.F.B for short), a comprehensive benchmark jointly covering symbolic and computational security to evaluate five core LLM capabilities: interpretation, generation, completion, transformation, and correction. It comprises 700 instances spanning 677 schemes, 7 mainstream formal verifier languages, and 160 security properties. The evaluation of 9 state-of-the-art LLMs reveals that most of them perform well on interpretation and completion, given their code-awareness advantages, but struggle with generation, transformation, and correction. Overall, their performance remains limited, with Claude-3.5 achieving the highest score at 48.7 out of 100. We further provide practical guidance, e.g., few-shot prompting, Pass@K sampling, and lightweight fine-tuning, to mitigate the executability bottleneck and improve tool-usable outputs. Taken together, our benchmark and analyses offer a grounded view of current progress and concrete directions toward reliable LLM-assisted formal cryptographic analysis.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.