An Energy-Efficient Reconfigurable DTLS Cryptographic Engine for Securing Internet-of-Things Applications
Pith reviewed 2026-05-24 23:59 UTC · model grok-4.3
The pith
A reconfigurable prime field ECC accelerator enables the first full hardware DTLS 1.3 implementation for IoT, delivering a 438x energy-efficiency improvement over software.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A reconfigurable prime field elliptic curve cryptography accelerator, when used to implement the full DTLS 1.3 protocol in hardware, achieves 438 times better energy efficiency than software. The test chip consumes 44.08 µJ per handshake and 0.89 nJ per byte of encrypted data at 16 MHz and 0.8 V, and requires only 8 KB of code and 3 KB of data memory.
What carries the argument
The reconfigurable prime field elliptic curve cryptography (ECC) accelerator that performs the cryptographic operations required by the DTLS protocol.
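Here is a minimal software sketch that makes "prime field ECC" concrete. It shows the kind of point arithmetic such an accelerator offloads: affine point addition, doubling, and double-and-add scalar multiplication on a short Weierstrass curve over F_p. The curve parameters (p, a, b) are constructor arguments. That is one plausible reading of "reconfigurable", in which the same datapath serves any prime modulus, but it is an assumption for illustration and not the chip's architecture.

Some further caveats apply. The class name `PrimeCurve` is hypothetical. The scalar loop is not constant-time, so it has none of the side-channel hardening a real engine needs.

```python
from typing import Optional, Tuple

# A point is an (x, y) pair of field elements; None stands for the point at infinity O.
Point = Optional[Tuple[int, int]]


class PrimeCurve:
    """Short Weierstrass curve y^2 = x^3 + a*x + b over F_p, p an odd prime."""

    def __init__(self, p: int, a: int, b: int) -> None:
        # "Reconfiguring" the curve here just means choosing a new (p, a, b).
        self.p, self.a, self.b = p, a % p, b % p

    def on_curve(self, P: Point) -> bool:
        if P is None:
            return True
        x, y = P
        return (y * y - (x * x * x + self.a * x + self.b)) % self.p == 0

    def add(self, P: Point, Q: Point) -> Point:
        """Affine point addition; handles O, P + (-P), and doubling."""
        if P is None:
            return Q
        if Q is None:
            return P
        p = self.p
        x1, y1 = P
        x2, y2 = Q
        if x1 == x2 and (y1 + y2) % p == 0:
            return None  # P + (-P) = O, including doubling a point with y = 0
        if P == Q:
            # Tangent slope (3x^2 + a) / 2y; modular inverse via pow (Python 3.8+)
            lam = (3 * x1 * x1 + self.a) * pow(2 * y1, -1, p) % p
        else:
            # Chord slope (y2 - y1) / (x2 - x1)
            lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
        x3 = (lam * lam - x1 - x2) % p
        y3 = (lam * (x1 - x3) - y1) % p
        return (x3, y3)

    def mul(self, k: int, P: Point) -> Point:
        """k*P by right-to-left double-and-add (NOT constant-time)."""
        if k < 0:
            raise ValueError("k must be non-negative")
        R: Point = None
        while k > 0:
            if k & 1:
                R = self.add(R, P)
            P = self.add(P, P)
            k >>= 1
        return R
```

For example, `PrimeCurve(97, 2, 3).mul(5, (3, 6))` runs on a toy curve. Passing the NIST P-256 constants instead exercises the same code at 256 bits, which is the software analogue of switching curves on one engine.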
If this is right
- Hardware-accelerated DTLS sessions consume 44.08 µJ per handshake and 0.89 nJ per byte of encrypted data.
- The DTLS implementation requires only 8 KB of code size and 3 KB of data memory.
- Coupling the accelerators with a RISC-V processor yields up to two orders of magnitude energy savings on other cryptographic applications.
- The reported efficiencies are measured at a single operating point: 16 MHz and 0.8 V on a 65 nm CMOS test chip.
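The two reported figures can be combined into a first-order energy model for a DTLS session. The sketch rests on two assumptions that go beyond what the test-chip numbers establish:

- The handshake and per-byte costs are independent and simply add.
- Nothing else is counted (no radio, sensors, or idle leakage).

The function names are hypothetical.

```python
# Figures reported for the 65 nm test chip at 16 MHz / 0.8 V.
HANDSHAKE_UJ = 44.08   # µJ per DTLS 1.3 handshake
PER_BYTE_NJ = 0.89     # nJ per byte of encrypted application data


def session_energy_uj(payload_bytes: int, handshakes: int = 1) -> float:
    """First-order crypto energy (µJ): handshakes plus per-byte record protection.

    Assumes the two costs simply add; ignores radio, sensors, and idle power.
    """
    if payload_bytes < 0 or handshakes < 0:
        raise ValueError("counts must be non-negative")
    return handshakes * HANDSHAKE_UJ + payload_bytes * PER_BYTE_NJ / 1000.0


def breakeven_bytes() -> float:
    """Payload size whose encryption costs as much energy as one handshake."""
    return HANDSHAKE_UJ * 1000.0 / PER_BYTE_NJ


if __name__ == "__main__":
    print(f"1 handshake + 1 KiB payload: {session_energy_uj(1024):.2f} uJ")
    print(f"break-even payload: {breakeven_bytes():.0f} bytes")
```

Under this model, encrypting about 49,500 bytes costs as much as one handshake. For the small messages typical of IoT sensors, the handshake therefore dominates the security energy budget.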
Where Pith is reading between the lines
- The same accelerator could be reused for other elliptic-curve-based protocols beyond DTLS without major redesign.
- Lower per-byte encryption energy could allow IoT sensors to perform more frequent secure data transmissions before battery replacement.
- Reconfigurability of the prime-field engine may support multiple curve sizes or security levels on the same silicon.
- Similar hardware offload strategies might be applied to other constrained-device protocols that rely on public-key operations.
Load-bearing premise
Energy measurements taken on the fabricated 65 nm test chip at a fixed 16 MHz clock and 0.8 V supply accurately predict the overheads that would appear when the same accelerators are placed inside a complete IoT system-on-chip running real application code and network stacks.
What would settle it
Direct energy measurements of the DTLS accelerators after they are integrated into a full IoT system-on-chip executing realistic workloads and network stacks at the same voltage and frequency.
Abstract
This paper presents the first hardware implementation of the Datagram Transport Layer Security (DTLS) protocol to enable end-to-end security for the Internet of Things (IoT). A key component of this design is a reconfigurable prime field elliptic curve cryptography (ECC) accelerator, which is 238x and 9x more energy-efficient compared to software and state-of-the-art hardware respectively. Our full hardware implementation of the DTLS 1.3 protocol provides 438x improvement in energy-efficiency over software, along with code size and data memory usage as low as 8 KB and 3 KB respectively. The cryptographic accelerators are coupled with an on-chip low-power RISC-V processor to benchmark applications beyond DTLS with up to two orders of magnitude energy savings. The test chip, fabricated in 65 nm CMOS, demonstrates hardware-accelerated DTLS sessions while consuming 44.08 µJ per handshake, and 0.89 nJ per byte of encrypted data at 16 MHz and 0.8 V.
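DTLS 1.3 inherits the TLS 1.3 key schedule, which derives its secrets with HKDF (RFC 5869) built on HMAC with the cipher suite's hash. This gives a sense of the hash work a full-hardware DTLS engine must cover alongside ECC.

Below is a plain software HKDF-Extract/HKDF-Expand with SHA-256, using Python's `hmac` and `hashlib`. It is a generic reference sketch, not the paper's datapath. It also omits the TLS-specific HKDF-Expand-Label wrapper.

```python
import hashlib
import hmac

HASH = hashlib.sha256
HASH_LEN = HASH().digest_size  # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract: PRK = HMAC-Hash(salt, IKM)."""
    if not salt:
        salt = bytes(HASH_LEN)  # RFC 5869: absent salt means HashLen zero bytes
    return hmac.new(salt, ikm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand: T(i) = HMAC-Hash(PRK, T(i-1) | info | i), truncated to length."""
    if not 0 <= length <= 255 * HASH_LEN:
        raise ValueError("length out of range for HKDF-Expand")
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]
```

Calling `hkdf_expand(hkdf_extract(salt, ikm), info, 42)` with the inputs of RFC 5869 Test Case 1 should reproduce that test's 42-byte output.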
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents the first hardware implementation of DTLS 1.3 for IoT end-to-end security. It introduces a reconfigurable prime-field ECC accelerator integrated with a low-power RISC-V core on a 65 nm CMOS test chip, claiming 238× energy-efficiency gains for ECC versus software (and 9× versus prior hardware), 438× gains for full DTLS, code/data memory footprints of 8 KB/3 KB, and measured consumption of 44.08 µJ per handshake and 0.89 nJ/byte at 16 MHz / 0.8 V, with up to two orders of magnitude savings on other cryptographic workloads.
Significance. If the quantitative claims are substantiated with complete methodology and baselines, the work would represent a meaningful contribution by demonstrating a practical, fabricated DTLS accelerator that achieves substantial energy reductions while maintaining small memory footprints, directly addressing a key barrier for secure IoT deployments. The use of a real 65 nm test chip with measured results is a positive aspect.
major comments (2)
- [Abstract] Abstract: The headline claims (238× ECC efficiency, 438× DTLS efficiency, 44.08 µJ/handshake, 0.89 nJ/byte) rest on measurements taken at a single fixed operating point (16 MHz, 0.8 V) on a standalone test chip. No description is provided of the power-measurement setup, activity factor, leakage/dynamic breakdown, software baseline (processor, compiler flags, implementation), or how these numbers would change when the accelerators are integrated into a full IoT SoC containing radio, sensors, memory controllers, and a network stack. Because dynamic and leakage power scale differently with voltage, frequency, and activity, the reported speed-ups are not shown to be representative of realistic IoT workloads; this directly undermines the central “energy-efficient for IoT applications” claim.
- [Abstract] Abstract (and any results section reporting the fabricated-chip numbers): No error bars, repeated-measurement statistics, or sensitivity analysis to voltage/frequency are supplied. The absence of these details leaves the quantitative efficiency numbers only weakly supported and prevents independent assessment of whether the 9× hardware or 438× software gains are robust.
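The referee's point that dynamic and leakage energy scale differently can be made explicit with a standard first-order CMOS model:

- Dynamic energy per operation is α·C·V²·N for N clock cycles.
- Leakage energy is V·I_leak·N/f, because a slower clock leaves the circuit leaking for longer.

All parameter values in the sketch are hypothetical. Holding I_leak fixed is a simplification, since real leakage current also depends on supply voltage and temperature.

```python
from typing import Tuple


def energy_per_op(cycles: int, v: float, f_hz: float,
                  c_eff: float, alpha: float, i_leak: float) -> Tuple[float, float]:
    """First-order CMOS energy for one operation lasting `cycles` clock cycles.

    dynamic: alpha * C_eff * V^2 switched every cycle
    leakage: V * I_leak integrated over the run time cycles / f
    Returns (dynamic_J, leakage_J); I_leak is held constant (a simplification).
    """
    dynamic = alpha * c_eff * v * v * cycles
    leakage = v * i_leak * cycles / f_hz
    return dynamic, leakage
```

With `i_leak=0`, the energy per operation does not depend on f. Any nonzero leakage makes a slower clock cost more energy per operation, while dynamic energy falls quadratically with V. This is why a single (16 MHz, 0.8 V) point cannot show where the minimum-energy operating point lies.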
minor comments (1)
- [Abstract] The abstract states “first hardware implementation of DTLS” without citing prior hardware DTLS efforts; a brief related-work sentence would clarify the novelty claim.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. The comments highlight important aspects of methodology presentation that we will address in the revision to strengthen the support for our quantitative claims.
Referee: [Abstract] Abstract: The headline claims (238× ECC efficiency, 438× DTLS efficiency, 44.08 µJ/handshake, 0.89 nJ/byte) rest on measurements taken at a single fixed operating point (16 MHz, 0.8 V) on a standalone test chip. No description is provided of the power-measurement setup, activity factor, leakage/dynamic breakdown, software baseline (processor, compiler flags, implementation), or how these numbers would change when the accelerators are integrated into a full IoT SoC containing radio, sensors, memory controllers, and a network stack. Because dynamic and leakage power scale differently with voltage, frequency, and activity, the reported speed-ups are not shown to be representative of realistic IoT workloads; this directly undermines the central “energy-efficient for IoT applications” claim.
Authors: We agree that the current manuscript lacks sufficient detail on the experimental setup. In the revised version we will add a dedicated methodology subsection describing the power measurement equipment, how activity factors were obtained from representative DTLS workloads, the leakage/dynamic power breakdown at 0.8 V, and the software baseline (on-chip RISC-V core compiled with -O3). The test chip was intentionally fabricated as a standalone vehicle to isolate and characterize the cryptographic accelerators; we will explicitly state that the reported efficiency gains therefore apply to the DTLS/ECC operations themselves. These relative gains would persist in a larger SoC because the accelerators replace the same software or prior-hardware implementations of those operations, even though absolute system energy would also include radio, sensors, and other blocks. We will also note that measurements at nearby voltage/frequency points exhibited consistent scaling trends. revision: yes
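The rebuttal distinguishes relative gains, which persist, from absolute system energy. That distinction is Amdahl's law applied to energy. If a fraction f of baseline system energy goes to cryptography and that part becomes s times more efficient, the whole system improves by 1 / ((1 - f) + f/s).

The paper does not report the crypto fraction f, so any value plugged in here is hypothetical. The function name is also illustrative.

```python
def system_gain(crypto_fraction: float, crypto_speedup: float) -> float:
    """Whole-system energy-efficiency gain when only the crypto share is accelerated.

    crypto_fraction: share of baseline system energy spent on crypto, in [0, 1] (hypothetical)
    crypto_speedup:  energy-efficiency gain on that share (e.g. the reported 438x)
    """
    if not 0.0 <= crypto_fraction <= 1.0:
        raise ValueError("crypto_fraction must be in [0, 1]")
    if crypto_speedup <= 0:
        raise ValueError("crypto_speedup must be positive")
    return 1.0 / ((1.0 - crypto_fraction) + crypto_fraction / crypto_speedup)
```

For example, `system_gain(0.5, 438)` is about 1.995. A 438x crypto gain therefore yields just under 2x at system level if crypto was half the energy. The per-operation ratio is unchanged, but the system-level benefit is capped at 1/(1 - f).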
Referee: [Abstract] Abstract (and any results section reporting the fabricated-chip numbers): No error bars, repeated-measurement statistics, or sensitivity analysis to voltage/frequency are supplied. The absence of these details leaves the quantitative efficiency numbers only weakly supported and prevents independent assessment of whether the 9× hardware or 438× software gains are robust.
Authors: We acknowledge the absence of statistical support and sensitivity data. The revised manuscript will include error bars derived from repeated measurements on the test chip and a sensitivity analysis showing energy efficiency across a range of supply voltages and clock frequencies around the reported 0.8 V / 16 MHz point. This will allow readers to assess the robustness of the 9× and 438× gains. revision: yes
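One conventional way to produce the promised error bars is to repeat each measurement n times and report three numbers: the mean, the sample standard deviation, and the standard error of the mean. The helper below is a generic sketch using only Python's `statistics` module. The name `summarize` is hypothetical, and the sketch says nothing about how the authors actually process their data.

```python
import math
import statistics
from typing import Sequence, Tuple


def summarize(samples: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean, sample standard deviation, standard error of the mean)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two repeated measurements")
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)  # uses the n - 1 denominator
    return mean, sd, sd / math.sqrt(n)
```

For large n, mean ± 1.96·SEM gives an approximate 95% confidence interval. Chip measurements often have only a handful of repeats, and in that case a Student-t quantile is the more appropriate multiplier.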
Circularity Check
No circularity; claims rest on direct silicon measurements, not derivations or self-referential fits
The paper presents a hardware DTLS implementation and reports measured energy numbers (44.08 µJ/handshake, 0.89 nJ/byte) from a fabricated 65 nm test chip at fixed 16 MHz / 0.8 V. No equations, predictions, or first-principles derivations appear in the abstract or described content; efficiency ratios (238× ECC, 438× DTLS) are direct comparisons against external software baselines and prior hardware, not quantities fitted or renamed from the paper's own inputs. No self-citation chains, ansatzes, or uniqueness theorems are invoked to support the core results. This is the normal case of an empirical hardware paper whose claims are falsifiable by re-measurement.
Reference graph
Works this paper leans on
- [1] S. L. Keoh, S. S. Kumar and H. Tschofenig, “Securing the Internet of Things: A Standardization Perspective,” in IEEE Internet of Things Journal, vol. 1, no. 3, pp. 265-275, June 2014.
- [2] E. Rescorla, “The Transport Layer Security (TLS) Protocol Version 1.3,” IETF RFC, vol. 8446, August 2018.
- [3] E. Rescorla, H. Tschofenig and N. Modadugu, “The Datagram Transport Layer Security (DTLS) Protocol Version 1.3,” IETF Internet-Draft, Draft 18, November 2017. [Online]. Available: https://tools.ietf.org/html/draft-ietf-tls-dtls13-18
- [4] U. Banerjee et al., “eeDTLS: Energy-Efficient Datagram Transport Layer Security for the Internet of Things,” IEEE Global Communications Conference (GLOBECOM), pp. 1-6, December 2017.
- [5] U. Banerjee et al., “An Energy-Efficient Reconfigurable DTLS Cryptographic Engine for End-to-End Security in IoT Applications,” IEEE International Solid-State Circuits Conference (ISSCC), pp. 42-44, February 2018.
- [6] A. Menezes, P. van Oorschot and S. Vanstone, “Handbook of Applied Cryptography,” CRC Press, 1996.
- [7] D. Hankerson, A. Menezes and S. Vanstone, “Guide to Elliptic Curve Cryptography,” Springer-Verlag, 2004.
- [8] NIST, “Advanced Encryption Standard (AES),” NIST Technical Report, FIPS PUB 197, November 2001.
- [9] NIST, “Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and GMAC,” NIST Special Publication, vol. 800-38D, November 2007.
- [10] NIST, “Secure Hash Standard (SHS),” NIST Technical Report, FIPS PUB 180-4, March 2012.
- [11] A. Waterman et al., “The RISC-V Instruction Set Manual Volume I: User-Level ISA Version 2.0,” Technical Report, EECS Department, University of California, Berkeley, UCB/EECS-2014-54, May 2014.
- [12] K. Asanovic et al., “The Rocket Chip Generator,” Technical Report, EECS Department, University of California, Berkeley, UCB/EECS-2016-17, April 2016.
- [13] SD Card Association, “Physical Layer Simplified Specification,” SD Simplified Specifications, Part 1, Version 6.00, April 2017.
- [14]
- [15] C. Duran et al., “A System-on-Chip Platform for the Internet of Things featuring a 32-bit RISC-V based Microcontroller,” IEEE Latin American Symposium on Circuits and Systems (LASCAS), pp. 1-4, February 2017.
- [16] R. Uytterhoeven and W. Dehaene, “A sub 10 pJ/Cycle Over a 2 to 200 MHz Performance Range RISC-V Microprocessor in 28 nm FDSOI,” IEEE European Solid State Circuits Conference (ESSCIRC), pp. 236-239, September 2018.
- [17] U. Banerjee, “Energy-Efficient Protocols and Hardware Architectures for Transport Layer Security,” S.M. Thesis, Massachusetts Institute of Technology, June 2017.
- [18] D. Canright, “A Very Compact Rijndael S-Box,” Naval Postgraduate School Technical Report, NPS-MA-04-001, 2004.
- [19] P. Hamalainen, T. Alho, M. Hannikainen and T. D. Hamalainen, “Design and Implementation of Low-Area and Low-Power AES Encryption Hardware Core,” EUROMICRO Conference on Digital System Design (DSD), pp. 577-583, September 2006.
- [20] S. K. Mathew et al., “53 Gbps Native GF((2^4)^2) Composite-Field AES-Encrypt/Decrypt Accelerator for Content-Protection in 45 nm High-Performance Microprocessors,” in IEEE Journal of Solid-State Circuits, vol. 46, no. 4, pp. 767-776, April 2011.
- [21] S. Mathew et al., “340 mV–1.1 V, 289 Gbps/W, 2090-Gate NanoAES Hardware Accelerator With Area-Optimized Encrypt/Decrypt GF((2^4)^2) Polynomials in 22 nm Tri-Gate CMOS,” in IEEE Journal of Solid-State Circuits, vol. 50, no. 4, pp. 1048-1058, April 2015.
- [22] Y. Zhang et al., “A Compact 446 Gbps/W AES Accelerator for Mobile SoC and IoT in 40nm,” IEEE Symposium on VLSI Circuits, pp. 1-2, June 2016.
- [23] M. Hutter, M. Feldhofer and J. Wolkerstorfer, “A Cryptographic Processor for Low-Resource Devices: Canning ECDSA and AES Like Sardines,” in Information Security Theory and Practice: Security and Privacy of Mobile Devices in Wireless Communication – WISTP 2011, Lecture Notes in Computer Science, vol. 6633, pp. 144-159, June 2011.
- [24] S. S. Roy et al., “Designing Tiny ECC Processor,” Workshop on Elliptic Curve Cryptography – ECC 2013, September 2013.
- [25] P. Pessl and M. Hutter, “Curved Tags – A Low-Resource ECDSA Implementation Tailored for RFID,” in Radio Frequency Identification: Security and Privacy Issues – RFIDSec 2014, Lecture Notes in Computer Science, vol. 8651, pp. 156-172, July 2014.
- [26] M. Hutter et al., “NaCl’s crypto_box in Hardware,” in Cryptographic Hardware and Embedded Systems – CHES 2015, Lecture Notes in Computer Science, vol. 9293, pp. 81-101, September 2015.
- [27] J. Guajardo et al., “Efficient Hardware Implementation of Finite Fields with Applications to Cryptography,” Acta Applicandae Mathematica, vol. 93, no. 1, pp. 75-118, September 2006.
- [28] M. Hedabou, P. Pinel and L. Beneteau, “Countermeasures for Preventing Comb Method Against SCA Attacks,” in Information Security Practice and Experience – ISPEC 2005, Lecture Notes in Computer Science, vol. 3439, pp. 85-96, April 2005.
- [29] M. Joye and C. Tymen, “Compact Encoding of Non-adjacent Forms with Applications to Elliptic Curve Cryptography,” in Public Key Cryptography – PKC 2001, Lecture Notes in Computer Science, vol. 1992, pp. 353-364, February 2001.
- [30] J. Fan et al., “State-of-the-Art of Secure ECC Implementations: A Survey on Known Side-Channel Attacks and Countermeasures,” IEEE International Symposium on Hardware-Oriented Security and Trust – HOST 2010, pp. 76-87, June 2010.
- [31] C. Meyer and J. Schwenk, “Lessons Learned From Previous SSL/TLS Attacks – A Brief Chronology Of Attacks And Weaknesses,” Cryptology ePrint Archive, Report 2013/049, January 2013.
- [32] H. Krawczyk, M. Bellare and R. Canetti, “HMAC: Keyed-Hashing for Message Authentication,” IETF RFC, vol. 2104, February 1997.
- [33] H. Krawczyk and P. Eronen, “HMAC-based Extract-and-Expand Key Derivation Function (HKDF),” IETF RFC, vol. 5869, May 2010.
- [34] NIST, “Recommendation for Random Number Generation Using Deterministic Random Bit Generators,” NIST Special Publication, vol. 800-90A, rev. 1, June 2015.
- [35] T. Pornin, “Deterministic Usage of the Digital Signature Algorithm (DSA) and Elliptic Curve Digital Signature Algorithm (ECDSA),” IETF RFC, vol. 6979, August 2013.
- [36]
- [37] Y. Zhang et al., “Recryptor: A Reconfigurable In-Memory Cryptographic Cortex-M0 Processor for IoT,” IEEE Symposium on VLSI Circuits, pp. C264-C265, June 2017.