Benchmarking Large Language Models for Cryptanalysis and Side-Channel Vulnerabilities

Chencheng Zhu; Usman Naseem; Utsav Maskey

arxiv: 2505.24621 · v3 · pith:DVVYGLXCnew · submitted 2025-05-30 · 💻 cs.CL



classification 💻 cs.CL

keywords: llms, language, security, abilities, benchmarking, cryptanalysis, diverse, large



Recent advancements in large language models (LLMs) have transformed natural language understanding and generation, leading to extensive benchmarking across diverse tasks. However, cryptanalysis - a critical area for data security and its connection to LLMs' generalization abilities - remains underexplored in LLM evaluations. To address this gap, we evaluate the cryptanalytic potential of state-of-the-art LLMs on ciphertexts produced by a range of cryptographic algorithms. We introduce a benchmark dataset of diverse plaintexts, spanning multiple domains, lengths, writing styles, and topics, paired with their encrypted versions. Using zero-shot and few-shot settings along with chain-of-thought prompting, we assess LLMs' decryption success rate and discuss their comprehension abilities. Our findings reveal key insights into LLMs' strengths and limitations in side-channel scenarios and raise concerns about their susceptibility to under-generalization-related attacks. This research highlights the dual-use nature of LLMs in security contexts and contributes to the ongoing discussion on AI safety and security.
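The evaluation setup described in the abstract (plaintexts paired with their encrypted versions, scored by decryption success rate) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the paper: the Caesar cipher stands in for the unnamed "range of cryptographic algorithms", the names `Sample`, `build_samples`, and `success_rate` are hypothetical, and exact string match is only one possible way to score a decryption.

```python
from dataclasses import dataclass


def caesar_encrypt(plaintext: str, shift: int) -> str:
    """Shift ASCII letters by `shift` positions; leave other characters unchanged.

    Assumption: a Caesar cipher is used only as a simple stand-in cipher.
    """
    out = []
    for ch in plaintext:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


@dataclass
class Sample:
    """One benchmark item: a plaintext and its encrypted version (hypothetical schema)."""
    plaintext: str
    ciphertext: str
    cipher: str


def build_samples(plaintexts, shift=3):
    """Pair each plaintext with its ciphertext under the stand-in cipher."""
    return [Sample(p, caesar_encrypt(p, shift), f"caesar-{shift}") for p in plaintexts]


def success_rate(samples, predictions):
    """Fraction of predictions that exactly match the plaintext.

    Assumption: exact match after stripping surrounding whitespace.
    """
    if not samples:
        return 0.0
    hits = sum(s.plaintext.strip() == p.strip() for s, p in zip(samples, predictions))
    return hits / len(samples)


if __name__ == "__main__":
    samples = build_samples(["Attack at dawn", "Hello, World!"])
    # In a real run these strings would be model outputs; here they are hard-coded.
    predictions = ["Attack at dawn", "Hello, Wurld!"]
    print(samples[1].ciphertext)               # Khoor, Zruog!
    print(success_rate(samples, predictions))  # 0.5
```

In an actual evaluation, `predictions` would hold the model's responses to zero-shot, few-shot, or chain-of-thought prompts containing each `ciphertext`; the prompting step is omitted here because the abstract does not specify it.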



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Do LLMs Make Neural Distinguishers Wise?
cs.CR 2026-06 unverdicted novelty 7.0

LLM-based neural distinguishers on SPECK-32/64 show no improvement over ResNet but gain from XOR-inclusive prompts.

Empirical Evaluation of Large Language Models for Migration of Code Fragments to Post-Quantum Cryptography
cs.CR 2026-06 conditional novelty 6.0

Fine-tuned GPT-4.1-mini reaches 0.9072 static similarity and 92.5% functional correctness on a new synthetic dataset of cryptographic code migrations, outperforming zero-shot baselines.