
arxiv: 2505.24621 · v3 · pith:DVVYGLXCnew · submitted 2025-05-30 · 💻 cs.CL

Benchmarking Large Language Models for Cryptanalysis and Side-Channel Vulnerabilities

classification 💻 cs.CL
keywords llms, language, security, abilities, benchmarking, cryptanalysis, diverse, large

Recent advancements in large language models (LLMs) have transformed natural language understanding and generation, leading to extensive benchmarking across diverse tasks. However, cryptanalysis, a critical area for data security that is also connected to LLMs' generalization abilities, remains underexplored in LLM evaluations. To address this gap, we evaluate the cryptanalytic potential of state-of-the-art LLMs on ciphertexts produced by a range of cryptographic algorithms. We introduce a benchmark dataset of diverse plaintexts, spanning multiple domains, lengths, writing styles, and topics, paired with their encrypted versions. Using zero-shot and few-shot settings along with chain-of-thought prompting, we assess LLMs' decryption success rate and discuss their comprehension abilities. Our findings reveal key insights into LLMs' strengths and limitations in side-channel scenarios and raise concerns about their susceptibility to under-generalization-related attacks. This research highlights the dual-use nature of LLMs in security contexts and contributes to the ongoing discussion on AI safety and security.
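
To make the evaluation setup in the abstract concrete, here is a minimal Python sketch of pairing plaintexts with ciphertexts, building a few-shot prompt, and scoring decryption success. Everything in it is an assumption made for illustration: the abstract does not name the ciphers, the prompt wording, or the exact success metric, so the Caesar cipher, the function names (`caesar_encrypt`, `build_pairs`, `few_shot_prompt`, `exact_match_rate`), and the exact-match scoring are hypothetical stand-ins, not the paper's method.

```python
import string


def caesar_encrypt(plaintext: str, shift: int = 3) -> str:
    """Shift ASCII letters by `shift`; leave every other character unchanged.

    Assumption: a Caesar cipher stands in for the paper's unspecified
    "range of cryptographic algorithms".
    """
    out = []
    for ch in plaintext:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def build_pairs(plaintexts, shift: int = 3):
    """Return (plaintext, ciphertext) pairs, one per input plaintext."""
    return [(p, caesar_encrypt(p, shift)) for p in plaintexts]


def few_shot_prompt(examples, ciphertext: str) -> str:
    """Build a few-shot prompt from solved pairs (hypothetical wording).

    With an empty `examples` list this reduces to a zero-shot prompt.
    """
    lines = ["Decrypt the ciphertext. Think step by step."]
    for plain, cipher in examples:
        lines.append(f"Ciphertext: {cipher}")
        lines.append(f"Plaintext: {plain}")
    lines.append(f"Ciphertext: {ciphertext}")
    lines.append("Plaintext:")
    return "\n".join(lines)


def exact_match_rate(references, predictions) -> float:
    """Fraction of predictions equal to the reference after trimming whitespace.

    Assumption: exact match is one plausible reading of "decryption
    success rate"; the paper may use a different or softer metric.
    """
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have equal length")
    if not references:
        return 0.0
    hits = sum(r.strip() == p.strip() for r, p in zip(references, predictions))
    return hits / len(references)
```

In use, each ciphertext from `build_pairs` would be sent to a model inside the prompt from `few_shot_prompt`, and the model's answers would be collected and passed to `exact_match_rate` together with the original plaintexts. The model call itself is left out because it depends on the provider.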



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Do LLMs Make Neural Distinguishers Wise?

    cs.CR 2026-06 unverdicted novelty 7.0

    LLM-based neural distinguishers on SPECK-32/64 show no improvement over ResNet but gain from XOR-inclusive prompts.

  2. Empirical Evaluation of Large Language Models for Migration of Code Fragments to Post-Quantum Cryptography

    cs.CR 2026-06 conditional novelty 6.0

    Fine-tuned GPT-4.1-mini reaches 0.9072 static similarity and 92.5% functional correctness on a new synthetic dataset of cryptographic code migrations, outperforming zero-shot baselines.