
arXiv: 2603.19340 · v5 · submitted 2026-03-19 · cs.CR · cs.AR · cs.PF

Benchmarking NIST-Standardised ML-KEM and ML-DSA on ARM Cortex-M0+: Performance, Memory, and Energy on the RP2040

Pith reviewed 2026-05-15 08:42 UTC · model grok-4.3

classification: cs.CR · cs.AR · cs.PF
keywords: post-quantum cryptography, ML-KEM, ML-DSA, Cortex-M0+, RP2040, benchmarking, key encapsulation mechanism, digital signature

The pith

ML-KEM-512 completes a full key exchange in 35.7 ms on ARM Cortex-M0+, seventeen times faster than ECDH P-256.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents the first isolated algorithm-level benchmarks of the NIST-standardized post-quantum algorithms ML-KEM and ML-DSA on the ARM Cortex-M0+ processor, using the RP2040 microcontroller. It measures execution time, memory usage, and estimated energy consumption at every security level for key generation, encapsulation or signing, and decapsulation or verification. The standout result is that a full ML-KEM-512 key exchange takes 35.7 milliseconds at an estimated energy cost of 2.83 millijoules, 17 times faster than a complete ECDH P-256 key agreement on the same hardware. ML-DSA signing shows high latency variance because of rejection sampling. These measurements establish a conservative baseline for the feasibility of post-quantum cryptography on highly constrained IoT devices.

Core claim

On the RP2040 running at 133 MHz, unmodified PQClean implementations of ML-KEM-512 require 35.7 ms for a complete key exchange (key generation, encapsulation, and decapsulation), with an estimated energy consumption of 2.83 mJ based on the datasheet power model. That is 17 times faster than a full ECDH P-256 key agreement on identical hardware. ML-DSA signing exhibits high latency variance, with coefficients of variation of 66 to 73 percent and 99th-percentile times reaching 1,125 ms for ML-DSA-87. The M0+ shows only a 1.8 to 1.9 times slowdown relative to published Cortex-M4 reference C results (compiled with -O3, versus -Os here), despite lacking 64-bit multiply, DSP, and SIMD instructions.
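The energy figure rests on a constant-power model, E = P × t. The following is a minimal sketch of that arithmetic using only the numbers quoted above. The function names are illustrative, and the implied average power and the implied ECDH P-256 latency and energy are back-calculated here from rounded figures; they are assumptions of this sketch, not values reported by the paper.

```python
def implied_power_mw(energy_mj: float, time_ms: float) -> float:
    """Average power in mW implied by spending energy_mj over time_ms."""
    # mJ / ms equals W, so scale by 1000 to express the result in mW.
    return energy_mj / time_ms * 1000.0


def energy_mj(power_mw: float, time_ms: float) -> float:
    """Constant-power energy model: E = P * t, returned in mJ."""
    # mW * ms equals microjoules, so divide by 1000 to get mJ.
    return power_mw * time_ms / 1000.0


# Figures quoted in the text for ML-KEM-512 on the RP2040.
MLKEM512_TIME_MS = 35.7
MLKEM512_ENERGY_MJ = 2.83
ECDH_SLOWDOWN = 17.0  # "17 times faster" is a rounded ratio

# Back-calculated quantities (assumptions of this sketch, not paper data).
power_mw = implied_power_mw(MLKEM512_ENERGY_MJ, MLKEM512_TIME_MS)
ecdh_time_ms = MLKEM512_TIME_MS * ECDH_SLOWDOWN
ecdh_energy_mj = energy_mj(power_mw, ecdh_time_ms)

print(f"implied average power:     {power_mw:.1f} mW")
print(f"implied ECDH P-256 time:   {ecdh_time_ms:.0f} ms")
print(f"implied ECDH P-256 energy: {ecdh_energy_mj:.1f} mJ")
```

Because the model uses a single power constant, any energy ratio between two algorithms on this board equals their time ratio, which is why the energy comparison adds no information beyond the latency comparison until power is measured directly.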

What carries the argument

Unmodified PQClean reference C implementations, compiled with -Os for the ARM Cortex-M0+ and run on the RP2040 at 133 MHz with 264 KB SRAM.

If this is right

  • Post-quantum key exchange with ML-KEM is practical for IoT devices with long service lives without exceeding typical latency budgets.
  • ML-DSA signing latency is unpredictable because of rejection sampling, so systems must budget for tail latencies above one second (the 99th percentile reaches 1,125 ms for ML-DSA-87).
  • The modest slowdown factor of 1.8-1.9x versus M4 suggests that algorithm performance is not heavily penalized by the simpler M0+ architecture.
  • Higher security parameter sets scale predictably in resource use, allowing informed selection for different device classes.
  • Releasing the benchmark suite as open source supports reproducibility and further development by others.
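The latency variance in ML-DSA signing comes from a retry loop: each attempt is accepted with some probability, so the attempt count is geometrically distributed. The sketch below simulates that behaviour and computes the two statistics quoted above, the coefficient of variation and the 99th percentile. The acceptance probability and the fixed and per-attempt costs are hypothetical placeholders chosen for illustration, not measurements from the paper.

```python
import math
import random
import statistics


def simulate_signing_ms(n_runs, accept_prob, fixed_ms, per_attempt_ms, seed=0):
    """Simulate signing latencies under a geometric retry model."""
    rng = random.Random(seed)
    times = []
    for _ in range(n_runs):
        attempts = 1
        # Retry until an attempt is accepted (probability accept_prob each).
        while rng.random() >= accept_prob:
            attempts += 1
        times.append(fixed_ms + attempts * per_attempt_ms)
    return times


def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(samples) / statistics.fmean(samples)


def percentile(samples, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical parameters, for illustration only.
times = simulate_signing_ms(n_runs=1000, accept_prob=0.25,
                            fixed_ms=50.0, per_attempt_ms=60.0, seed=1)
print(f"mean = {statistics.fmean(times):.1f} ms")
print(f"CoV  = {coefficient_of_variation(times):.2f}")
print(f"p99  = {percentile(times, 99):.1f} ms")
```

The same two helper functions apply unchanged to measured latencies, so a list of per-run timings from the board can be substituted for the simulated `times`.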

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Direct hardware energy measurements could validate the datasheet model and provide more accurate consumption figures.
  • Adding simple optimizations like assembly routines for critical loops might reduce the execution times significantly beyond these reference results.
  • The results imply that post-quantum migration for embedded systems is more feasible than previously assumed based on older, less efficient algorithms.
  • Similar benchmarks on other constrained platforms like 8-bit or 16-bit MCUs would help map the full feasibility landscape.

Load-bearing premise

The power consumption estimates derived from the microcontroller datasheet accurately represent the actual energy used during the cryptographic computations, and the PQClean reference code represents realistic performance without custom tuning.

What would settle it

Measuring the actual current draw on the RP2040 during ML-KEM-512 execution with a precision power monitor and comparing it to the 2.83 mJ estimate would test the energy model.
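A direct measurement of this kind reduces to integrating supply voltage times sampled current over the duration of the operation. The sketch below shows that calculation with trapezoidal integration and compares the result with a modelled estimate. The trace is synthetic, and the supply voltage, sample rate, and current level are assumed values for illustration, not data from the paper or the datasheet.

```python
def energy_mj_from_trace(timestamps_s, currents_a, supply_v):
    """Integrate P = V * I over time (trapezoidal rule); result in mJ."""
    if len(timestamps_s) != len(currents_a):
        raise ValueError("timestamps and currents must have the same length")
    energy_j = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        mean_current = 0.5 * (currents_a[i] + currents_a[i - 1])
        energy_j += supply_v * mean_current * dt
    return energy_j * 1000.0


def relative_deviation(measured_mj, estimated_mj):
    """Signed deviation of a measurement from a modelled estimate."""
    return (measured_mj - estimated_mj) / estimated_mj


# Synthetic trace: 1 kHz sampling, constant 20 mA at 3.3 V for 100 ms.
# All three values are assumptions made for this example.
timestamps = [i * 0.001 for i in range(101)]
currents = [0.020] * 101
measured = energy_mj_from_trace(timestamps, currents, supply_v=3.3)
print(f"measured energy: {measured:.2f} mJ")

# Compare a hypothetical measurement against the paper's 2.83 mJ estimate.
print(f"deviation: {relative_deviation(3.00, 2.83):+.1%}")
```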

Figures

Figures reproduced from arXiv: 2603.19340 by Rojin Chhetri.

Figure 1: ML-DSA-44 signing time distribution over 100 runs. The geometric ...
Figure 3: Key exchange latency comparison on ARM Cortex-M0+. ML-KEM ...
Figure 4: ML-KEM handshake time: Cortex-M0+ (this work) vs Cortex-M4 ...
Figure 5: Energy per key exchange on RP2040 (estimated from datasheet: ...
Original abstract

The migration to post-quantum cryptography is urgent for Internet of Things devices with 10–20 year lifespans, yet no systematic benchmarks exist for the finalised NIST standards on the most constrained 32-bit processor class. This paper presents the first isolated algorithm-level benchmarks of ML-KEM (FIPS 203) and ML-DSA (FIPS 204) on ARM Cortex-M0+, measured on the RP2040 (Raspberry Pi Pico) at 133 MHz with 264 KB SRAM. Using PQClean reference C implementations, we measure all three security levels of ML-KEM (512/768/1024) and ML-DSA (44/65/87) across key generation, encapsulation/signing, and decapsulation/verification. ML-KEM-512 completes a full key exchange in 35.7 ms with an estimated energy cost of 2.83 mJ (datasheet power model), 17x faster than a complete ECDH P-256 key agreement on the same hardware. ML-DSA signing exhibits high latency variance due to rejection sampling (coefficient of variation 66–73%, 99th-percentile up to 1,125 ms for ML-DSA-87). The M0+ incurs only a 1.8–1.9x slowdown relative to published Cortex-M4 reference C results (compiled with -O3 versus our -Os), despite lacking 64-bit multiply, DSP, and SIMD instructions, making this a conservative upper bound on the microarchitectural penalty. All code, data, and scripts are released as an open-source benchmark suite for reproducibility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. This paper benchmarks the NIST post-quantum cryptography standards ML-KEM and ML-DSA on the ARM Cortex-M0+ processor using the RP2040 board. It reports execution time, memory footprint, and estimated energy consumption for key generation, encapsulation or signing, and decapsulation or verification at all security levels, based on PQClean implementations. Notable findings are that ML-KEM-512 achieves a full key exchange in 35.7 ms with 2.83 mJ estimated energy (17 times faster than ECDH P-256), that ML-DSA signing shows high latency variance due to rejection sampling, and that the slowdown relative to Cortex-M4 results is a modest 1.8-1.9x despite the simpler instruction set.

Significance. The results offer practical guidance for implementing post-quantum cryptography on resource-constrained IoT devices with long lifespans. The release of all code, data, and scripts enhances reproducibility and allows the community to build upon these baselines. Direct timing measurements on actual hardware provide high confidence in the latency claims, making this a solid contribution to the field of embedded post-quantum security.

major comments (1)
  1. [Energy consumption results] The energy figures, such as 2.83 mJ for ML-KEM-512, are calculated using a power value from the RP2040 datasheet rather than direct measurements with a current probe or shunt resistor. This approach assumes constant average power draw during the cryptographic operations, which may vary with the specific instruction and memory access patterns of the PQClean code. While the timing measurements are direct and reliable, the energy estimates carry this modeling uncertainty and should be presented with stronger caveats if they are to support claims about energy efficiency.
minor comments (2)
  1. Ensure that all compiler flags and optimization levels are consistently documented across comparisons to the M4 reference.
  2. The high variance in ML-DSA signing times due to rejection sampling is well-noted; consider discussing implications for real-time applications.
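For real-time use, the question behind the second minor comment is how often signing overruns a deadline. Under a geometric retry model, the probability of needing more than k attempts is (1 − p)^k, where p is the per-attempt acceptance probability. The sketch below turns that into a deadline-miss probability and an attempt budget. The value of p and the timing costs are hypothetical, and the geometric model is an assumption of this example.

```python
import math


def max_attempts_within(deadline_ms, fixed_ms, per_attempt_ms):
    """Number of whole signing attempts that fit inside the deadline."""
    budget = deadline_ms - fixed_ms
    if budget < per_attempt_ms:
        return 0
    return math.floor(budget / per_attempt_ms)


def miss_probability(deadline_ms, accept_prob, fixed_ms, per_attempt_ms):
    """P(signing overruns the deadline) = (1 - p) ** k for k attempts."""
    if not 0.0 < accept_prob < 1.0:
        raise ValueError("accept_prob must lie strictly between 0 and 1")
    k = max_attempts_within(deadline_ms, fixed_ms, per_attempt_ms)
    return (1.0 - accept_prob) ** k


def attempts_for_target(accept_prob, target_miss):
    """Smallest k such that (1 - p) ** k <= target_miss."""
    if not 0.0 < accept_prob < 1.0 or not 0.0 < target_miss < 1.0:
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(target_miss) / math.log(1.0 - accept_prob))


# Hypothetical parameters, for illustration only.
p_miss = miss_probability(deadline_ms=500.0, accept_prob=0.25,
                          fixed_ms=50.0, per_attempt_ms=60.0)
print(f"deadline-miss probability: {p_miss:.3f}")
print(f"attempts for 1% miss rate: {attempts_for_target(0.25, 0.01)}")
```

A design that cannot tolerate the resulting miss rate has to either lengthen the deadline or move signing off the time-critical path.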

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive evaluation of the work and for the constructive comment on the energy estimates. We address the point below.

Point-by-point responses
  1. Referee: The energy figures, such as 2.83 mJ for ML-KEM-512, are calculated using a power value from the RP2040 datasheet rather than direct measurements with a current probe or shunt resistor. This approach assumes constant average power draw during the cryptographic operations, which may vary with the specific instruction and memory access patterns of the PQClean code. While the timing measurements are direct and reliable, the energy estimates carry this modeling uncertainty and should be presented with stronger caveats if they are to support claims about energy efficiency.

    Authors: We agree that the reported energy figures are estimates obtained by multiplying measured execution times by the typical power consumption value from the RP2040 datasheet, rather than direct current measurements. In the revised manuscript we will add explicit caveats in the energy-results section, the abstract, and the conclusions, stating that these are modeled estimates assuming constant average power and that actual consumption may vary with instruction mix and memory-access patterns of the PQClean implementations. We will also qualify all comparative energy-efficiency statements to reflect this modeling uncertainty. revision: yes

Circularity Check

0 steps flagged

No circularity; purely empirical measurements with no derivations or self-referential steps

full rationale

The paper consists entirely of direct hardware benchmarks on the RP2040 using unmodified PQClean reference implementations. Latency (e.g., 35.7 ms for ML-KEM-512 key exchange) is measured via cycle counts or timers. Energy (2.83 mJ) is computed by scaling measured time by a constant power value taken from the RP2040 datasheet; this is an external input, not a fitted parameter or self-defined quantity. No equations, ansatzes, uniqueness theorems, or self-citations appear as load-bearing steps in the provided text. The 17x comparison to ECDH is also a direct time ratio. All claims reduce to reproducible runs on open code rather than any internal reduction to inputs by construction.
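When latency is obtained from cycle counts, the conversion to time depends only on the core clock, which the text gives as 133 MHz. The sketch below shows that conversion in both directions. The helper names are illustrative, and the cycle figure it prints for the 35.7 ms key exchange is derived here from the quoted latency, not a number taken from the paper.

```python
RP2040_CLOCK_HZ = 133_000_000  # core clock stated in the text


def cycles_to_ms(cycles, clock_hz=RP2040_CLOCK_HZ):
    """Convert a cycle count to milliseconds at the given clock rate."""
    return cycles / clock_hz * 1000.0


def ms_to_cycles(time_ms, clock_hz=RP2040_CLOCK_HZ):
    """Convert milliseconds to the nearest whole number of cycles."""
    return round(time_ms / 1000.0 * clock_hz)


# Derived for illustration from the quoted 35.7 ms key exchange.
handshake_cycles = ms_to_cycles(35.7)
print(f"35.7 ms at 133 MHz is about {handshake_cycles:,} cycles")
print(f"round trip: {cycles_to_ms(handshake_cycles):.1f} ms")
```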

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical benchmarking study with no mathematical derivations, free parameters, or invented entities. Relies on existing PQClean reference implementations and RP2040 hardware specifications from the datasheet.



Appendix A: Benchmark source code

The complete benchmark source code, raw data, analysis scripts, and reproduction instructions are available in the companion open-source repository: https://github.com/rojinc/pqc-cort...