Empirical Evaluation of Large Language Models for Migration of Code Fragments to Post-Quantum Cryptography

Ana I. Gonz\'alez-Tablas; Javier Pallar\'es de Bonrostro; Mar\'ia Isabel Gonz\'alez Vasco

arxiv: 2606.07341 · v1 · pith:NCKWF6BNnew · submitted 2026-06-05 · 💻 cs.CR

Empirical Evaluation of Large Language Models for Migration of Code Fragments to Post-Quantum Cryptography

Javier Pallar\'es de Bonrostro , Ana I. Gonz\'alez-Tablas , Mar\'ia Isabel Gonz\'alez Vasco This is my paper

Pith reviewed 2026-06-27 21:51 UTC · model grok-4.3

classification 💻 cs.CR

keywords post-quantum cryptographylarge language modelscode migrationfine-tuningcryptographic codefunctional correctnesssynthetic datasetPython

0 comments

The pith

Fine-tuned GPT-4.1-mini migrates pre-quantum cryptographic code to post-quantum versions at 92.5 percent functional correctness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests whether large language models can convert fragments of pre-quantum cryptographic code into post-quantum equivalents while keeping the same behavior. The authors built a dataset of 800 paired Python examples spanning six cryptographic families plus multi-primitive cases, each checked by category-specific functional tests. Four models were compared: one in zero-shot mode and three after domain-specific fine-tuning. The fine-tuned GPT-4.1-mini produced the highest static similarity and passed dynamic tests 92.5 percent of the time, beating the zero-shot baseline by a wide margin. Tests on six open-source repositories confirmed usefulness for isolated modules yet exposed limits when dependencies span multiple files.

Core claim

Domain-specific fine-tuning of large language models enables reliable migration of pre-quantum cryptographic code fragments to post-quantum counterparts. On a synthetic dataset of 800 validated pairs the fine-tuned GPT-4.1-mini reached a mean static similarity of 0.9072 and 92.5 percent dynamic functional correctness, substantially outperforming zero-shot GPT-4.1. The same model generated useful migrations in localized modules of real repositories while revealing difficulties with complex cross-module dependencies.

What carries the argument

A reproducible experimental framework centered on a synthetic dataset of 800 paired Python code fragments, each validated by category-specific functional tests, that measures both static code similarity and dynamic functional correctness of model outputs.

If this is right

Domain-specific fine-tuning is essential for reliable cryptographic migration performance.
Fine-tuned LLMs can serve as practical components inside crypto-agile migration pipelines when paired with automated verification.
The method produces useful migrations inside localized cryptographic modules of open-source projects.
Larger projects with complex dependencies and cross-module interactions remain challenging for the current approach.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Extending the dataset to additional languages such as C or Java would test whether the same fine-tuning gains appear outside Python.
Combining the LLM migration step with static dependency-graph analysis could reduce failures on multi-module codebases.
Measuring how often the generated post-quantum code passes the same test suite used for the original fragments on a broader set of libraries would give a clearer picture of generalization.

Load-bearing premise

The synthetic dataset of 800 paired code fragments, validated only through category-specific functional tests, sufficiently represents the structure, dependencies, and edge cases found in real-world cryptographic modules.

What would settle it

Running the fine-tuned GPT-4.1-mini on a large real-world cryptographic library and measuring whether dynamic functional correctness falls below 80 percent on modules with cross-file dependencies would test the central performance claim.

Figures

Figures reproduced from arXiv: 2606.07341 by Ana I. Gonz\'alez-Tablas, Javier Pallar\'es de Bonrostro, Mar\'ia Isabel Gonz\'alez Vasco.

**Figure 2.** Figure 2: Dataset Generation Phase. The internal flow starts with the manual creation of migrations, continues with synthetic generation and validation of the resulting corpus. encryption. Hash functions follow a similar logic: while SHA-2 remains viable with sufficiently long outputs, SHA-3 offers a more modern structure and stronger security margin [8]. Accordingly, SHA3-256 is used as the default target for hash… view at source ↗

**Figure 4.** Figure 4: Instruction-style prompt for CodeLlama models [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗

**Figure 5.** Figure 5: LLM Preparation Phase. The validated synthetic dataset is split into train and validation subsets. Only the train split is used for dataset adaptation and model-specific fine-tuning, while the validation split is reserved for the evaluation phase. GPT3.5 Turbo, GPT-4.1 Mini, and CodeLlama-7B Instruct undergo fine-tuning, whereas GPT-4.1 is included as a zero-shot reference and bypasses the preparation… view at source ↗

**Figure 6.** Figure 6: LLM Evaluation Phase. The prepared LLMs are evaluated on the validation dataset using both static similarity metrics and automated functional tests. Their results are then compared through a multicriteria analysis, leading to the selection of the bestperforming LLM for the final real-world validation stage. Metrics. As already introduced, two complementary evaluation dimensions were considered. The first… view at source ↗

**Figure 7.** Figure 7: Static similarity distributions for pre- to post-quantum code migration. The horizontal axis represents [PITH_FULL_IMAGE:figures/full_fig_p018_7.png] view at source ↗

**Figure 8.** Figure 8: Dynamic functional correctness across model configurations. Each subfigure shows the distribution of [PITH_FULL_IMAGE:figures/full_fig_p018_8.png] view at source ↗

**Figure 9.** Figure 9: Similarity distribution with predominant cate [PITH_FULL_IMAGE:figures/full_fig_p018_9.png] view at source ↗

**Figure 11.** Figure 11: Real-world Validation Phase. The bestperforming LLM selected in the previous phase is applied to a set of selected open-source repositories. Repository-specific tests are then used to assess practical migration behavior under realistic dependency and modularity constraints, producing a structured summary of real-world migration outcomes. Real-world repository selection. Beyond controlled evaluation on … view at source ↗

**Figure 12.** Figure 12: Position of the proposed LLM-assisted migra [PITH_FULL_IMAGE:figures/full_fig_p023_12.png] view at source ↗

**Figure 13.** Figure 13: GPT-4.1-Mini–assisted post-quantum migra [PITH_FULL_IMAGE:figures/full_fig_p024_13.png] view at source ↗

read the original abstract

The transition to post-quantum cryptography (PQC) requires not only replacing vulnerable cryptographic primitives, but also refactoring the surrounding software logic. While existing PQC migration frameworks provide organizational guidance, practical code-level remediation remains largely manual and error-prone. This paper evaluates whether large language models (LLMs) can be trained to assist in the migration of pre-quantum cryptographic code fragments to post-quantum counterparts while preserving functional correctness. To this end, we introduce a reproducible experimental framework built around a synthetic dataset of 800 paired Python code fragments covering six cryptographic families and combined multi-primitive cases. Each pair is validated through category-specific functional tests, enabling both dataset quality control and objective evaluation of model-generated migrations. Four models are assessed: GPT-4.1 in a zero-shot setting, and fine-tuned versions of GPT-3.5-turbo, GPT-4.1-mini, and CodeLlama-7B-Instruct. The results show that domain-specific fine-tuning is essential for reliable cryptographic migration. The fine-tuned GPT-4.1-mini model achieves the best overall performance, with a mean static similarity of 0.9072 and a dynamic functional correctness rate of 92.5%, substantially outperforming the zero-shot baseline. A complementary validation on six open-source repositories further shows that the approach can produce useful migrations in localized cryptographic modules, while also revealing limitations in larger projects with complex dependencies and cross-module interactions. These findings suggest that fine-tuned LLMs can serve as practical components in future crypto-agile migration pipelines, provided they are coupled with automated verification and dependency-aware validation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Fine-tuned LLMs show usable results on synthetic PQC code fragments but the real-repo checks already flag dependency problems that limit broader claims.

read the letter

The paper's core contribution is a new 800-pair synthetic dataset of Python crypto fragments across six families plus multi-primitive cases, paired with functional tests, and a head-to-head comparison of zero-shot versus fine-tuned models for migration to post-quantum primitives. The fine-tuned GPT-4.1-mini reaches 0.9072 mean static similarity and 92.5% dynamic correctness on that set, clearly beating the zero-shot baseline, and the authors also run a limited check on six open-source repositories.

The dataset construction and the explicit functional validation per category are the parts that feel new relative to earlier LLM code-transformation work. Using both static similarity and dynamic tests is a reasonable choice for cryptographic code, and the repo-level results give at least some indication of where the method breaks.

The main limitation is that the headline numbers rest on synthetic fragments whose coverage of real call patterns, exception handling, and cross-module dependencies is not independently verified. The paper itself reports degraded behavior on larger repositories with complex interactions, which suggests the synthetic regime may understate the difficulty. That gap is acknowledged but not closed by the current experiments.

This work is aimed at researchers and engineers working on crypto-agile tooling or LLM-assisted security refactoring. It is coherent on its own terms and engages the practical problem directly, so it deserves a serious referee even though additional real-world validation would be needed before anyone would rely on the numbers for production migration pipelines.

Referee Report

2 major / 2 minor

Summary. The paper introduces a reproducible experimental framework and a synthetic dataset of 800 paired Python code fragments covering six cryptographic families plus multi-primitive cases. It evaluates zero-shot GPT-4.1 against fine-tuned GPT-3.5-turbo, GPT-4.1-mini, and CodeLlama-7B-Instruct on migrating pre-quantum cryptographic fragments to post-quantum counterparts, reporting that the fine-tuned GPT-4.1-mini achieves the highest mean static similarity (0.9072) and dynamic functional correctness (92.5%). A secondary check on six open-source repositories shows localized utility but degraded results on complex dependency structures.

Significance. If the central empirical claims hold, the work demonstrates that domain-specific fine-tuning enables LLMs to produce functionally correct PQC migrations at the fragment level and supplies a reusable test harness with both static similarity and dynamic execution metrics. The explicit real-repository validation and the paper's own acknowledgment of cross-module limitations are positive features that strengthen the contribution relative to purely synthetic studies.

major comments (2)

[Dataset section] Dataset section (synthetic 800-pair construction): the headline static similarity of 0.9072 and dynamic correctness of 92.5% rest on category-specific functional tests whose coverage of call-pattern changes, exception handling, and multi-primitive interactions is not independently quantified; the paper itself reports degraded performance on real repositories with cross-module dependencies, indicating that the synthetic regime may systematically understate task difficulty.
[Real-repository validation paragraph] Real-repository validation paragraph: the claim that the approach 'can produce useful migrations in localized cryptographic modules' is supported only by qualitative description; without quantitative static/dynamic scores comparable to the synthetic 0.9072/92.5% figures, it is impossible to calibrate how far the primary results generalize to production cryptographic code.

minor comments (2)

[Abstract] Abstract and model list: 'GPT-4.1' is used for the zero-shot baseline while 'GPT-4.1-mini' appears in the fine-tuned results; consistent naming and explicit parameter counts or API versions would improve clarity.
[Evaluation metrics paragraph] Evaluation metrics paragraph: the precise definition of 'static similarity' (e.g., which token or AST metric) is referenced only at a high level; adding the exact formula or library call would aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the scope and limitations of our empirical evaluation. We address each major comment below, indicating planned revisions to strengthen the manuscript.

read point-by-point responses

Referee: [Dataset section] Dataset section (synthetic 800-pair construction): the headline static similarity of 0.9072 and dynamic correctness of 92.5% rest on category-specific functional tests whose coverage of call-pattern changes, exception handling, and multi-primitive interactions is not independently quantified; the paper itself reports degraded performance on real repositories with cross-module dependencies, indicating that the synthetic regime may systematically understate task difficulty.

Authors: We agree that independent quantification of test coverage (e.g., via statement or branch coverage) would improve transparency. The category-specific tests were constructed to exercise the primary functional requirements of each primitive family and multi-primitive combinations, including representative call patterns and exception paths. We did not compute aggregate coverage metrics across the 800 pairs. The manuscript already notes performance degradation on real repositories as a limitation of the synthetic regime. In revision we will expand the Dataset section with a more explicit description of the test cases and add a dedicated paragraph discussing how the synthetic setting may understate cross-module complexity. revision: partial
Referee: [Real-repository validation paragraph] Real-repository validation paragraph: the claim that the approach 'can produce useful migrations in localized cryptographic modules' is supported only by qualitative description; without quantitative static/dynamic scores comparable to the synthetic 0.9072/92.5% figures, it is impossible to calibrate how far the primary results generalize to production cryptographic code.

Authors: We accept that the real-repository check is qualitative and cannot be directly calibrated against the synthetic metrics. This component was included to illustrate localized applicability rather than to provide a quantitative generalization study; constructing executable test harnesses and resolving full dependency graphs for the six repositories would have required substantial additional engineering outside the paper's scope. We will revise the paragraph to explicitly characterize the validation as qualitative, remove any implication of direct comparability, and reinforce the already-stated limitations regarding complex dependencies. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical measurement on synthetic dataset

full rationale

The paper reports experimental results from fine-tuning and evaluating LLMs on a fixed synthetic dataset of 800 code pairs, using static similarity and dynamic functional tests as direct metrics. No derivations, equations, fitted parameters renamed as predictions, or self-citation chains are present; performance numbers (0.9072 similarity, 92.5% correctness) are measured outputs, not reductions of prior fitted quantities. The study is self-contained against external benchmarks and does not invoke uniqueness theorems or ansatzes from prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central performance claims rest on the assumption that functional tests on synthetic fragments are adequate proxies for real cryptographic correctness and on standard supervised fine-tuning assumptions; no new physical or mathematical entities are introduced.

axioms (1)

domain assumption Category-specific functional tests are sufficient to determine whether a migrated code fragment preserves functional correctness.
Dynamic correctness rate of 92.5% is computed directly from these tests.

pith-pipeline@v0.9.1-grok · 5842 in / 1331 out tokens · 30408 ms · 2026-06-27T21:51:35.617548+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

51 extracted references · 6 canonical work pages

[1]

A survey of post-quantum cryptog- raphy support in cryptographic libraries

Nadeem Ahmed, Lei Zhang, and Aryya Gan- gopadhyay. “A survey of post-quantum cryptog- raphy support in cryptographic libraries”. In: arXiv preprint arXiv:2508.16078(2025)

arXiv 2025
[2]

NIST Interagency/Internal Report (NISTIR) 8528

Gorjan Alagic et al.Status Report on the First Round of the Additional Digital Sig- nature Schemes for the NIST Post-Quantum Cryptography Standardization Process. NIST Interagency/Internal Report (NISTIR) 8528. Gaithersburg, MD, USA: National Institute of Standards and Technology, Oct. 2024.doi: 10 . 6028 / NIST . IR . 8528.url:https : / / doi . org/10.60...

work page doi:10.6028/nist.ir.8528 2024
[3]

NIST Cybersecurity White Paper NIST CSWP

Elaine Barker et al.Considerations for Achieving Cryptographic Agility: Strategies and Practices. NIST Cybersecurity White Paper NIST CSWP
[4]

2025.doi:10

National Institute of Standards and Technol- ogy, Dec. 2025.doi:10 . 6028 / NIST . CSWP . 39. url:https://doi.org/10.6028/NIST.CSWP. 39

work page doi:10.6028/nist.cswp 2025
[5]

Ward Beullens et al.Post-Quantum Cryptog- raphy: Current State and Quantum Mitigation (v2). Tech. rep. European Union Agency for Cybersecurity (ENISA), 2021.doi:10 . 2824 / 92307.url:https : / / www . enisa . europa . eu / sites / default / files / publications / ENISA % 20Report % 20 - %20Post - Quantum % 20Cryptography % 20Current % 20state % 20and % 20...

2021
[6]

Cybersecurity and Infrastructure Security Agency (CISA).Strategy for Migrating to Au- tomated Post-Quantum Cryptography Discov- ery and Inventory Tools. Tech. rep. Accedido el 23 de mayo de 2025. U.S. Department of Homeland Security, Sept. 2024.url:https : //www.cisa.gov/sites/default/files/2024- 09/Strategy- for- Migrating- to- Automated- PQC-Discovery-a...

2025
[7]

Qlora: Efficient finetuning of quantized llms

Tim Dettmers et al. “Qlora: Efficient finetuning of quantized llms”. In:Advances in neural infor- mation processing systems36 (2023), pp. 10088– 10115

2023
[8]

Nos CorpusNOS-GL: Galician Macrocorpus for LLM training

Iria de-Dios-Flores et al. “Nos CorpusNOS-GL: Galician Macrocorpus for LLM training”. In: Nos CorpusNOS-GL: Galician Macrocorpus for LLM training(2024)

2024
[9]

SHA-3 standard: Permutation-based hash and extendable-output functions

Morris J Dworkin et al. “SHA-3 standard: Permutation-based hash and extendable-output functions”. In: (2015)

2015
[10]

European Commission and NIS Cooperation Group.Coordinated Implementation Roadmap for the Transition to Post-Quantum Cryptog- raphy.https : / / digital - strategy . ec . europa . eu / en / library / coordinated - implementation - roadmap - transition - post - quantum-cryptography. European Union policy roadmap on coordinated migration to post- quantum cryp...

2025
[11]

Why Do Large Language Mod- els (LLMs) Struggle to Count Letters?

Tairan Fu et al. “Why Do Large Language Mod- els (LLMs) Struggle to Count Letters?” In:arXiv preprint arXiv:2412.18626(2024)

arXiv 2024
[12]

Response accuracy of GPT- 4 across languages: insights from an expert-level diagnostic radiology examination in Japan

Ayaka Harigai et al. “Response accuracy of GPT- 4 across languages: insights from an expert-level diagnostic radiology examination in Japan”. In:Japanese Journal of Radiology43.2 (2025), pp. 319–329

2025
[13]

A framework for migrating to post-quantum cryptography: Secu- rity dependency analysis and case studies

Khondokar Fida Hasan et al. “A framework for migrating to post-quantum cryptography: Secu- rity dependency analysis and case studies”. In: IEEE Access12 (2024), pp. 23427–23450

2024
[14]

A Survey on Large Language Models for Code Generation

Juyong Jiang et al. “A Survey on Large Language Models for Code Generation”. In:ACM Transac- tions on Software Engineering and Methodology 35.2 (Jan. 2026), pp. 1–72.issn: 1557-7392.doi: 10.1145/3747588.url:http://dx.doi.org/ 10.1145/3747588

work page doi:10.1145/3747588.url:http://dx.doi.org/ 2026
[15]

Breaking symmetric cryp- tosystems using quantum period finding

Marc Kaplan et al. “Breaking symmetric cryp- tosystems using quantum period finding”. In:Ad- vances in Cryptology–CRYPTO 2016: 36th An- nual International Cryptology Conference, Santa Barbara, CA, USA, August 14-18, 2016, Proceed- ings, Part II 36. Springer. 2016, pp. 207–237

2016
[16]

Unsupervised trans- lation of programming languages

Marie-Anne Lachaux et al. “Unsupervised trans- lation of programming languages”. In:arXiv preprint arXiv:2006.03511(2020)

arXiv 2006
[17]

On the post- quantum security of classical authenticated en- cryption schemes

Nathalie Lang and Stefan Lucks. “On the post- quantum security of classical authenticated en- cryption schemes”. In:International Conference on Cryptology in Africa. Springer. 2023, pp. 79– 104

2023
[18]

CryptoScope: Utilizing Large Language Models for Automated Cryptographic Logic Vulnerability Detection

Zhihao Li et al. “CryptoScope: Utilizing Large Language Models for Automated Cryptographic Logic Vulnerability Detection”. In:arXiv preprint arXiv:2508.11599(2025)

arXiv 2025
[19]

Quantifying multilingual per- formance of large language models across lan- guages

Zihao Li et al. “Quantifying multilingual per- formance of large language models across lan- guages”. In:arXiv e-prints(2024), arXiv–2404

2024
[20]

AutoP2C: An LLM-Based Agent Framework for Code Repository Generation from Multimodal Content in Academic Papers

Zijie Lin et al. “AutoP2C: An LLM-Based Agent Framework for Code Repository Generation from Multimodal Content in Academic Papers”. In: arXiv preprint arXiv:2504.20115(2025)

arXiv 2025
[21]

Understanding llms: A com- prehensive overview from training to inference

Yiheng Liu et al. “Understanding llms: A com- prehensive overview from training to inference”. In:Neurocomputing(2024), p. 129190. 27

2024
[22]

Self-refine: Iterative refine- ment with self-feedback

Aman Madaan et al. “Self-refine: Iterative refine- ment with self-feedback”. In:Advances in Neural Information Processing Systems36 (2023), pp. 46534–46594

2023
[23]

Automated Update of Android Deprecated API Usages with Large Lan- guage Models

Tarek Mahmud et al. “Automated Update of Android Deprecated API Usages with Large Lan- guage Models”. In:arXiv preprint arXiv:2411.04387 (2024)

arXiv 2024
[24]

Benchmarking large language models for cryptanalysis and mismatched-generalization

Utsav Maskey, Chencheng Zhu, and Usman Naseem. “Benchmarking large language models for cryptanalysis and mismatched-generalization”. In:arXiv preprint arXiv:2505.24621(2025)

Pith/arXiv arXiv 2025
[25]

Be- yond Static Tools: Evaluating Large Language Models for Cryptographic Misuse Detection

Zohaib Masood and Miguel Vargas Martin. “Be- yond Static Tools: Evaluating Large Language Models for Cryptographic Misuse Detection”. In: arXiv preprint arXiv:2411.09772(2024)

arXiv 2024
[26]

Federal Information Processing Standards Publication NIST FIPS 203

National Institute of Standards and Technol- ogy.FIPS 203: Module-Lattice-Based Key- Encapsulation Mechanism Standard. Federal In- formation Processing Standards Publication 203. National Institute of Standards and Technology, Aug. 2024.doi:10.6028/NIST.FIPS.203.url: https://doi.org/10.6028/NIST.FIPS.203

work page doi:10.6028/nist.fips.203.url: 2024
[27]

Federal Information Pro- cessing Standards Publication 204

National Institute of Standards and Technol- ogy.FIPS 204: Module-Lattice-Based Digital Signature Standard. Federal Information Pro- cessing Standards Publication 204. National Institute of Standards and Technology, Aug. 2024.doi:10 . 6028 / NIST . FIPS . 204.url: https://doi.org/10.6028/NIST.FIPS.204

work page doi:10.6028/nist.fips.204 2024
[28]

FIPS 205: Stateless Hash-Based Digital Signature Standard

National Institute of Standards and Technology. FIPS 205: Stateless Hash-Based Digital Signature Standard. Federal Information Processing Stan- dards Publication 205. National Institute of Stan- dards and Technology, Aug. 2024.doi:10.6028/ NIST . FIPS . 205.url:https : / / doi . org / 10 . 6028/NIST.FIPS.205

2024
[29]

NIST Announces First Four Quantum-Resistant Cryptographic Algorithms.https://www.nist

National Institute of Standards and Technology. NIST Announces First Four Quantum-Resistant Cryptographic Algorithms.https://www.nist. gov / news - events / news / 2022 / 07 / nist - announces- first- four- quantum- resistant- cryptographic- algorithms. Accedido el 30 de mayo de 2025. July 2022

2022
[30]

National Institute of Standards and Technology (NIST).Migration to Post-Quantum Cryptog- raphy: Mappings to Risk Framework. Tech. rep. Draft White Paper. Available athttps : / / www . nist . gov / news - events / news / 2025 / 09/new-draft-white-paper-pqc-migration- mappings - risk - framework - docs. National Cybersecurity Center of Excellence (NCCoE), S...

2025
[31]

Accedido el 22 de mayo de 2025

OpenAI.GPT-4.1 Overview. Accedido el 22 de mayo de 2025. 2025.url:https://platform. openai.com/docs/models/gpt-4.1

2025
[32]

Ac- cedido el 22 de mayo de 2025

OpenAI.Introducing GPT-4.1 in the API. Ac- cedido el 22 de mayo de 2025. Apr. 2025.url: https://openai.com/index/gpt-4-1/

2025
[33]

Accedido el 22 de mayo de 2025

OpenAI.Pricing - OpenAI API. Accedido el 22 de mayo de 2025. 2025.url:https://platform. openai.com/docs/pricing/

2025
[34]

Ver- sion V1

Javier Pallar´ es de Bonrostro and Ana Is- abel Gonz´ alez-Tablas.Cryptographic Migration Dataset: Pre-Quantum to Post-Quantum. Ver- sion V1. 2025.doi:10 . 21950 / 7GK4MJ.url: https://doi.org/10.21950/7GK4MJ

work page doi:10.21950/7gk4mj 2025
[35]

PQC Migration Roadmap

Post-Quantum Cryptography Coalition (PQCC). PQC Migration Roadmap. Tech. rep. Available at https : / / pqcc . org / wp - content / uploads / 2025 / 05 / PQC - Migration - Roadmap - PQCC - 2 . pdf. PQC Coalition, May 2025

2025
[36]

Robert Praas.Self-Reflection on Chain-of- Thought Reasoning in Large Language Models. 2023

2023
[37]

Accessed January

IBM Research.Cryptography Bill of Materials (CBOM).https://research.ibm.com/blog/ crypto-bill-of-materials. Accessed January
[38]

Applied Post Quantum Cryptography: A Practical Approach for Generating Certificates in Industrial Environments

Nino Ricchizzi, Christian Schwinne, and Jan Pelzl. “Applied Post Quantum Cryptography: A Practical Approach for Generating Certificates in Industrial Environments”. In:arXiv preprint arXiv:2505.04333(2025)

arXiv 2025
[39]

Code llama: Open foun- dation models for code

Baptiste Roziere et al. “Code llama: Open foun- dation models for code”. In:arXiv preprint arXiv:2308.12950(2023)

Pith/arXiv arXiv 2023
[40]

Refactoring programs using large language models with few-shot ex- amples

Atsushi Shirafuji et al. “Refactoring programs using large language models with few-shot ex- amples”. In:2023 30th Asia-Pacific Software Engineering Conference (APSEC). IEEE. 2023, pp. 151–160

2023
[41]

ELCA: Introducing enterprise-level cryptographic agility for a post- quantum era

Dimitrios Sikeridis et al. “ELCA: Introducing enterprise-level cryptographic agility for a post- quantum era”. In:Cryptology ePrint Archive (2023)

2023
[42]

Assessing and Enhancing Quantum Readiness in Mobile Apps

Joseph Strauss et al. “Assessing and Enhancing Quantum Readiness in Mobile Apps”. In:arXiv preprint arXiv:2506.00790(2025). Available at https://arxiv.org/abs/2506.00790

arXiv 2025
[43]

Large language models for software vulnerabil- ity detection: a guide for researchers on mod- els, methods, techniques, datasets, and metrics

Seyed Mohammad Taghavi Far and Farid Feyzi. “Large language models for software vulnerabil- ity detection: a guide for researchers on mod- els, methods, techniques, datasets, and metrics”. In:International Journal of Information Security 24.2 (2025), p. 78. 28

2025
[44]

Llama: Open and efficient foundation language models

Hugo Touvron et al. “Llama: Open and efficient foundation language models”. In:arXiv preprint arXiv:2302.13971(2023)

Pith/arXiv arXiv 2023
[45]

Meltem S¨ onmez Turan et al.Ascon-Based Lightweight Cryptography Standards for Con- strained Devices: Authenticated Encryption, Hash, and Extendable Output Functions. Tech. rep. NIST SP 800-232 (Initial Public Draft). Initial Public Draft. National Institute of Stan- dards and Technology, Oct. 2024.url:https : //csrc.nist.gov/pubs/sp/800/232/ipd

2024
[46]

Attention is all you need

Ashish Vaswani et al. “Attention is all you need”. In:Advances in neural information processing systems30 (2017)

2017
[47]

A survey of human-in-the- loop for machine learning

Xingjiao Wu et al. “A survey of human-in-the- loop for machine learning”. In:Future Generation Computer Systems135 (2022), pp. 364–381

2022
[48]

Jingfeng Yang et al.Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond. 2023. arXiv:2304.13712 [cs.CL].url: https://arxiv.org/abs/2304.13712

arXiv 2023
[49]

Quanjun Zhang et al.A Survey on Large Lan- guage Models for Software Engineering. 2024. arXiv:2312 . 15223 [cs.SE].url:https : //arxiv.org/abs/2312.15223

arXiv 2024
[50]

Hybrid API migration: A marriage of small API mapping models and large language models

Bingzhe Zhou et al. “Hybrid API migration: A marriage of small API mapping models and large language models”. In:Proceedings of the 14th Asia-Pacific Symposium on Internetware. 2023, pp. 12–21

2023
[51]

Migrating Code At Scale With LLMs At Google

Celal Ziftci et al. “Migrating Code At Scale With LLMs At Google”. In:arXiv preprint arXiv:2504.09691(2025). 29 Appendix A: F ull dataset distribution Table 8: Distribution of single cryptographic primitives across training and validation splits. Primitive / Alg. #Var. Train Val. Hash functions BLAKE2b 7 7 0 BLAKE2s 7 7 0 MD5 10 9 1 RIPEMD160 1 1 0 SHA-1 ...

arXiv 2025

[1] [1]

A survey of post-quantum cryptog- raphy support in cryptographic libraries

Nadeem Ahmed, Lei Zhang, and Aryya Gan- gopadhyay. “A survey of post-quantum cryptog- raphy support in cryptographic libraries”. In: arXiv preprint arXiv:2508.16078(2025)

arXiv 2025

[2] [2]

NIST Interagency/Internal Report (NISTIR) 8528

Gorjan Alagic et al.Status Report on the First Round of the Additional Digital Sig- nature Schemes for the NIST Post-Quantum Cryptography Standardization Process. NIST Interagency/Internal Report (NISTIR) 8528. Gaithersburg, MD, USA: National Institute of Standards and Technology, Oct. 2024.doi: 10 . 6028 / NIST . IR . 8528.url:https : / / doi . org/10.60...

work page doi:10.6028/nist.ir.8528 2024

[3] [3]

NIST Cybersecurity White Paper NIST CSWP

Elaine Barker et al.Considerations for Achieving Cryptographic Agility: Strategies and Practices. NIST Cybersecurity White Paper NIST CSWP

[4] [4]

2025.doi:10

National Institute of Standards and Technol- ogy, Dec. 2025.doi:10 . 6028 / NIST . CSWP . 39. url:https://doi.org/10.6028/NIST.CSWP. 39

work page doi:10.6028/nist.cswp 2025

[5] [5]

Ward Beullens et al.Post-Quantum Cryptog- raphy: Current State and Quantum Mitigation (v2). Tech. rep. European Union Agency for Cybersecurity (ENISA), 2021.doi:10 . 2824 / 92307.url:https : / / www . enisa . europa . eu / sites / default / files / publications / ENISA % 20Report % 20 - %20Post - Quantum % 20Cryptography % 20Current % 20state % 20and % 20...

2021

[6] [6]

Cybersecurity and Infrastructure Security Agency (CISA).Strategy for Migrating to Au- tomated Post-Quantum Cryptography Discov- ery and Inventory Tools. Tech. rep. Accedido el 23 de mayo de 2025. U.S. Department of Homeland Security, Sept. 2024.url:https : //www.cisa.gov/sites/default/files/2024- 09/Strategy- for- Migrating- to- Automated- PQC-Discovery-a...

2025

[7] [7]

Qlora: Efficient finetuning of quantized llms

Tim Dettmers et al. “Qlora: Efficient finetuning of quantized llms”. In:Advances in neural infor- mation processing systems36 (2023), pp. 10088– 10115

2023

[8] [8]

Nos CorpusNOS-GL: Galician Macrocorpus for LLM training

Iria de-Dios-Flores et al. “Nos CorpusNOS-GL: Galician Macrocorpus for LLM training”. In: Nos CorpusNOS-GL: Galician Macrocorpus for LLM training(2024)

2024

[9] [9]

SHA-3 standard: Permutation-based hash and extendable-output functions

Morris J Dworkin et al. “SHA-3 standard: Permutation-based hash and extendable-output functions”. In: (2015)

2015

[10] [10]

European Commission and NIS Cooperation Group.Coordinated Implementation Roadmap for the Transition to Post-Quantum Cryptog- raphy.https : / / digital - strategy . ec . europa . eu / en / library / coordinated - implementation - roadmap - transition - post - quantum-cryptography. European Union policy roadmap on coordinated migration to post- quantum cryp...

2025

[11] [11]

Why Do Large Language Mod- els (LLMs) Struggle to Count Letters?

Tairan Fu et al. “Why Do Large Language Mod- els (LLMs) Struggle to Count Letters?” In:arXiv preprint arXiv:2412.18626(2024)

arXiv 2024

[12] [12]

Response accuracy of GPT- 4 across languages: insights from an expert-level diagnostic radiology examination in Japan

Ayaka Harigai et al. “Response accuracy of GPT- 4 across languages: insights from an expert-level diagnostic radiology examination in Japan”. In:Japanese Journal of Radiology43.2 (2025), pp. 319–329

2025

[13] [13]

A framework for migrating to post-quantum cryptography: Secu- rity dependency analysis and case studies

Khondokar Fida Hasan et al. “A framework for migrating to post-quantum cryptography: Secu- rity dependency analysis and case studies”. In: IEEE Access12 (2024), pp. 23427–23450

2024

[14] [14]

A Survey on Large Language Models for Code Generation

Juyong Jiang et al. “A Survey on Large Language Models for Code Generation”. In:ACM Transac- tions on Software Engineering and Methodology 35.2 (Jan. 2026), pp. 1–72.issn: 1557-7392.doi: 10.1145/3747588.url:http://dx.doi.org/ 10.1145/3747588

work page doi:10.1145/3747588.url:http://dx.doi.org/ 2026

[15] [15]

Breaking symmetric cryp- tosystems using quantum period finding

Marc Kaplan et al. “Breaking symmetric cryp- tosystems using quantum period finding”. In:Ad- vances in Cryptology–CRYPTO 2016: 36th An- nual International Cryptology Conference, Santa Barbara, CA, USA, August 14-18, 2016, Proceed- ings, Part II 36. Springer. 2016, pp. 207–237

2016

[16] [16]

Unsupervised trans- lation of programming languages

Marie-Anne Lachaux et al. “Unsupervised trans- lation of programming languages”. In:arXiv preprint arXiv:2006.03511(2020)

arXiv 2006

[17] [17]

On the post- quantum security of classical authenticated en- cryption schemes

Nathalie Lang and Stefan Lucks. “On the post- quantum security of classical authenticated en- cryption schemes”. In:International Conference on Cryptology in Africa. Springer. 2023, pp. 79– 104

2023

[18] [18]

CryptoScope: Utilizing Large Language Models for Automated Cryptographic Logic Vulnerability Detection

Zhihao Li et al. “CryptoScope: Utilizing Large Language Models for Automated Cryptographic Logic Vulnerability Detection”. In:arXiv preprint arXiv:2508.11599(2025)

arXiv 2025

[19] [19]

Quantifying multilingual per- formance of large language models across lan- guages

Zihao Li et al. “Quantifying multilingual per- formance of large language models across lan- guages”. In:arXiv e-prints(2024), arXiv–2404

2024

[20] [20]

AutoP2C: An LLM-Based Agent Framework for Code Repository Generation from Multimodal Content in Academic Papers

Zijie Lin et al. “AutoP2C: An LLM-Based Agent Framework for Code Repository Generation from Multimodal Content in Academic Papers”. In: arXiv preprint arXiv:2504.20115(2025)

arXiv 2025

[21] [21]

Understanding llms: A com- prehensive overview from training to inference

Yiheng Liu et al. “Understanding llms: A com- prehensive overview from training to inference”. In:Neurocomputing(2024), p. 129190. 27

2024

[22] [22]

Self-refine: Iterative refine- ment with self-feedback

Aman Madaan et al. “Self-refine: Iterative refine- ment with self-feedback”. In:Advances in Neural Information Processing Systems36 (2023), pp. 46534–46594

2023

[23] [23]

Automated Update of Android Deprecated API Usages with Large Lan- guage Models

Tarek Mahmud et al. “Automated Update of Android Deprecated API Usages with Large Lan- guage Models”. In:arXiv preprint arXiv:2411.04387 (2024)

arXiv 2024

[24] [24]

Benchmarking large language models for cryptanalysis and mismatched-generalization

Utsav Maskey, Chencheng Zhu, and Usman Naseem. “Benchmarking large language models for cryptanalysis and mismatched-generalization”. In:arXiv preprint arXiv:2505.24621(2025)

Pith/arXiv arXiv 2025

[25] [25]

Be- yond Static Tools: Evaluating Large Language Models for Cryptographic Misuse Detection

Zohaib Masood and Miguel Vargas Martin. “Be- yond Static Tools: Evaluating Large Language Models for Cryptographic Misuse Detection”. In: arXiv preprint arXiv:2411.09772(2024)

arXiv 2024

[26] [26]

Federal Information Processing Standards Publication NIST FIPS 203

National Institute of Standards and Technol- ogy.FIPS 203: Module-Lattice-Based Key- Encapsulation Mechanism Standard. Federal In- formation Processing Standards Publication 203. National Institute of Standards and Technology, Aug. 2024.doi:10.6028/NIST.FIPS.203.url: https://doi.org/10.6028/NIST.FIPS.203

work page doi:10.6028/nist.fips.203.url: 2024

[27] [27]

Federal Information Pro- cessing Standards Publication 204

National Institute of Standards and Technol- ogy.FIPS 204: Module-Lattice-Based Digital Signature Standard. Federal Information Pro- cessing Standards Publication 204. National Institute of Standards and Technology, Aug. 2024.doi:10 . 6028 / NIST . FIPS . 204.url: https://doi.org/10.6028/NIST.FIPS.204

work page doi:10.6028/nist.fips.204 2024

[28] [28]

FIPS 205: Stateless Hash-Based Digital Signature Standard

National Institute of Standards and Technology. FIPS 205: Stateless Hash-Based Digital Signature Standard. Federal Information Processing Stan- dards Publication 205. National Institute of Stan- dards and Technology, Aug. 2024.doi:10.6028/ NIST . FIPS . 205.url:https : / / doi . org / 10 . 6028/NIST.FIPS.205

2024

[29] [29]

NIST Announces First Four Quantum-Resistant Cryptographic Algorithms.https://www.nist

National Institute of Standards and Technology. NIST Announces First Four Quantum-Resistant Cryptographic Algorithms.https://www.nist. gov / news - events / news / 2022 / 07 / nist - announces- first- four- quantum- resistant- cryptographic- algorithms. Accedido el 30 de mayo de 2025. July 2022

2022

[30] [30]

National Institute of Standards and Technology (NIST).Migration to Post-Quantum Cryptog- raphy: Mappings to Risk Framework. Tech. rep. Draft White Paper. Available athttps : / / www . nist . gov / news - events / news / 2025 / 09/new-draft-white-paper-pqc-migration- mappings - risk - framework - docs. National Cybersecurity Center of Excellence (NCCoE), S...

2025

[31] [31]

Accedido el 22 de mayo de 2025

OpenAI.GPT-4.1 Overview. Accedido el 22 de mayo de 2025. 2025.url:https://platform. openai.com/docs/models/gpt-4.1

2025

[32] [32]

Ac- cedido el 22 de mayo de 2025

OpenAI.Introducing GPT-4.1 in the API. Ac- cedido el 22 de mayo de 2025. Apr. 2025.url: https://openai.com/index/gpt-4-1/

2025

[33] [33]

Accedido el 22 de mayo de 2025

OpenAI.Pricing - OpenAI API. Accedido el 22 de mayo de 2025. 2025.url:https://platform. openai.com/docs/pricing/

2025

[34] [34]

Ver- sion V1

Javier Pallar´ es de Bonrostro and Ana Is- abel Gonz´ alez-Tablas.Cryptographic Migration Dataset: Pre-Quantum to Post-Quantum. Ver- sion V1. 2025.doi:10 . 21950 / 7GK4MJ.url: https://doi.org/10.21950/7GK4MJ

work page doi:10.21950/7gk4mj 2025

[35] [35]

PQC Migration Roadmap

Post-Quantum Cryptography Coalition (PQCC). PQC Migration Roadmap. Tech. rep. Available at https : / / pqcc . org / wp - content / uploads / 2025 / 05 / PQC - Migration - Roadmap - PQCC - 2 . pdf. PQC Coalition, May 2025

2025

[36] [36]

Robert Praas.Self-Reflection on Chain-of- Thought Reasoning in Large Language Models. 2023

2023

[37] [37]

Accessed January

IBM Research.Cryptography Bill of Materials (CBOM).https://research.ibm.com/blog/ crypto-bill-of-materials. Accessed January

[38] [38]

Applied Post Quantum Cryptography: A Practical Approach for Generating Certificates in Industrial Environments

Nino Ricchizzi, Christian Schwinne, and Jan Pelzl. “Applied Post Quantum Cryptography: A Practical Approach for Generating Certificates in Industrial Environments”. In:arXiv preprint arXiv:2505.04333(2025)

arXiv 2025

[39] [39]

Code llama: Open foun- dation models for code

Baptiste Roziere et al. “Code llama: Open foun- dation models for code”. In:arXiv preprint arXiv:2308.12950(2023)

Pith/arXiv arXiv 2023

[40] [40]

Refactoring programs using large language models with few-shot ex- amples

Atsushi Shirafuji et al. “Refactoring programs using large language models with few-shot ex- amples”. In:2023 30th Asia-Pacific Software Engineering Conference (APSEC). IEEE. 2023, pp. 151–160

2023

[41] [41]

ELCA: Introducing enterprise-level cryptographic agility for a post- quantum era

Dimitrios Sikeridis et al. “ELCA: Introducing enterprise-level cryptographic agility for a post- quantum era”. In:Cryptology ePrint Archive (2023)

2023

[42] [42]

Assessing and Enhancing Quantum Readiness in Mobile Apps

Joseph Strauss et al. “Assessing and Enhancing Quantum Readiness in Mobile Apps”. In:arXiv preprint arXiv:2506.00790(2025). Available at https://arxiv.org/abs/2506.00790

arXiv 2025

[43] [43]

Large language models for software vulnerabil- ity detection: a guide for researchers on mod- els, methods, techniques, datasets, and metrics

Seyed Mohammad Taghavi Far and Farid Feyzi. “Large language models for software vulnerabil- ity detection: a guide for researchers on mod- els, methods, techniques, datasets, and metrics”. In:International Journal of Information Security 24.2 (2025), p. 78. 28

2025

[44] [44]

Llama: Open and efficient foundation language models

Hugo Touvron et al. “Llama: Open and efficient foundation language models”. In:arXiv preprint arXiv:2302.13971(2023)

Pith/arXiv arXiv 2023

[45] [45]

Meltem S¨ onmez Turan et al.Ascon-Based Lightweight Cryptography Standards for Con- strained Devices: Authenticated Encryption, Hash, and Extendable Output Functions. Tech. rep. NIST SP 800-232 (Initial Public Draft). Initial Public Draft. National Institute of Stan- dards and Technology, Oct. 2024.url:https : //csrc.nist.gov/pubs/sp/800/232/ipd

2024

[46] [46]

Attention is all you need

Ashish Vaswani et al. “Attention is all you need”. In:Advances in neural information processing systems30 (2017)

2017

[47] [47]

A survey of human-in-the- loop for machine learning

Xingjiao Wu et al. “A survey of human-in-the- loop for machine learning”. In:Future Generation Computer Systems135 (2022), pp. 364–381

2022

[48] [48]

Jingfeng Yang et al.Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond. 2023. arXiv:2304.13712 [cs.CL].url: https://arxiv.org/abs/2304.13712

arXiv 2023

[49] [49]

Quanjun Zhang et al.A Survey on Large Lan- guage Models for Software Engineering. 2024. arXiv:2312 . 15223 [cs.SE].url:https : //arxiv.org/abs/2312.15223

arXiv 2024

[50] [50]

Hybrid API migration: A marriage of small API mapping models and large language models

Bingzhe Zhou et al. “Hybrid API migration: A marriage of small API mapping models and large language models”. In:Proceedings of the 14th Asia-Pacific Symposium on Internetware. 2023, pp. 12–21

2023

[51] [51]

Migrating Code At Scale With LLMs At Google

Celal Ziftci et al. “Migrating Code At Scale With LLMs At Google”. In:arXiv preprint arXiv:2504.09691(2025). 29 Appendix A: F ull dataset distribution Table 8: Distribution of single cryptographic primitives across training and validation splits. Primitive / Alg. #Var. Train Val. Hash functions BLAKE2b 7 7 0 BLAKE2s 7 7 0 MD5 10 9 1 RIPEMD160 1 1 0 SHA-1 ...

arXiv 2025