
arxiv: 2605.22441 · v1 · pith:INCXBZTHnew · submitted 2026-05-21 · 💻 cs.CR · cs.AI

A Constant-Time Implementation Methodology for Activation Functions on Microcontrollers

Pith reviewed 2026-05-22 05:14 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords constant-time implementation · activation functions · timing side channels · microcontrollers · embedded neural networks · side-channel resistance · ARM Cortex-M4

The pith

A combination of branchless selection, fixed-cost approximations, dummy operations, and cycle alignment produces activation functions with identical cycle counts on microcontrollers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out a method to make common activation functions run without timing variation on embedded devices. Embedded neural network inference can leak private information when activation functions take different amounts of time for different inputs. The authors combine four techniques to force every input through the same sequence of operations and the same number of processor cycles. If the method works as described, it removes one practical source of side-channel leakage while keeping the functions accurate enough for real inference tasks.

Core claim

The authors show that ReLU, sigmoid, tanh, GELU, and Swish can be rewritten so that each evaluation always consumes exactly the same number of cycles on an ARM Cortex-M4: 88 cycles when three functions are used together and 108 cycles when all five are used. The rewrite relies on branchless selection, a Padé approximation whose cost is fixed in advance, inserted dummy arithmetic to balance paths, and explicit cycle alignment. They also demonstrate that a desynchronization countermeasure does not stop a template-based timing attack, while their own versions retain high numerical accuracy.
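
To make the first ingredient concrete, here is a minimal C sketch of branchless selection applied to ReLU. It is not taken from the paper: the function names are hypothetical, and it assumes `float` is IEEE 754 binary32. Whether a given compiler keeps it branch-free on the target would still need to be confirmed in the generated assembly.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: reinterpret float bits via memcpy (no aliasing issues). */
static uint32_t f32_to_bits(float x) { uint32_t u; memcpy(&u, &x, sizeof u); return u; }
static float bits_to_f32(uint32_t u) { float x; memcpy(&x, &u, sizeof x); return x; }

/* ReLU without a data-dependent branch: build an all-ones mask when the
 * sign bit is clear and an all-zeros mask when it is set, then AND. */
float relu_branchless(float x)
{
    uint32_t u = f32_to_bits(x);
    uint32_t keep = (u >> 31) - 1u;  /* sign 0 -> 0xFFFFFFFF, sign 1 -> 0x00000000 */
    return bits_to_f32(u & keep);
}
```

Every input passes through the same shift, subtract, and AND. Inputs with the sign bit set, including -0.0, come out as +0.0, and non-negative inputs are returned unchanged.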

What carries the argument

The four-part methodology of branchless selection, fixed-cost Padé approximation, dummy arithmetic, and cycle alignment that forces every input through an identical sequence of operations.
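
A fixed-cost Padé evaluation can be sketched in C as follows. The summary does not state which approximant order the paper uses, so the [3/2] approximant of tanh, x(15 + x²)/(15 + 6x²), is assumed here purely for illustration. The names `select_f32` and `tanh_pade` and the clamp bound of 4 are likewise hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Return a when cond is 1 and b when cond is 0, using a bit mask
 * instead of a branch. cond must be exactly 0 or 1. */
static float select_f32(int cond, float a, float b)
{
    uint32_t ua, ub, mask = 0u - (uint32_t)cond;  /* 1 -> all ones, 0 -> all zeros */
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    ua = (ua & mask) | (ub & ~mask);
    memcpy(&a, &ua, sizeof a);
    return a;
}

/* Assumed [3/2] Pade approximant of tanh. Every input goes through the
 * same four selects, the same multiplies and adds, and one divide. */
float tanh_pade(float x)
{
    x = select_f32(x > 4.0f, 4.0f, x);     /* clamp input so x*x stays small */
    x = select_f32(x < -4.0f, -4.0f, x);
    float x2 = x * x;
    float y = x * (15.0f + x2) / (15.0f + 6.0f * x2);
    y = select_f32(y > 1.0f, 1.0f, y);     /* saturate output to [-1, 1] */
    y = select_f32(y < -1.0f, -1.0f, y);
    return y;
}
```

This low order is chosen for brevity: its error is roughly 3e-4 at x = 1 and about 0.02 near |x| ≈ 2.3, where the output clamp takes over. The comparisons feeding `select_f32` may or may not compile to branch-free code depending on compiler and flags, which is the kind of detail the referee report asks the authors to document.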

If this is right

  • All tested inputs produce exactly the same cycle count in both the three-function and five-function configurations.
  • Numerical error stays low enough that the approximated functions remain usable in embedded inference.
  • A desynchronization countermeasure fails against template timing attacks on the same platform.
  • The same four techniques can be applied to construct side-channel-resistant versions of other activation functions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same pattern of fixed-cost rewriting could be used for other nonlinear operations that appear in neural networks on microcontrollers.
  • Porting the method to a different processor family would require re-tuning the cycle-alignment step to the new instruction timings.
  • Widespread adoption would let developers deploy neural networks on edge devices without exposing input data through activation timing.
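
The re-tuning mentioned in the second point amounts to recomputing how much padding each function needs on the new core. A minimal offline sketch in C, assuming per-function cycle counts have already been measured; the function name and the numbers in the usage note are placeholders, not values from the paper.

```c
#include <stddef.h>
#include <stdint.h>

/* Given measured cycle counts for n functions, find the common target
 * (the slowest function) and the padding each function needs to reach it.
 * Returns the target; pad[i] receives target - cycles[i].
 * This runs offline on a host, so it does not need to be constant-time. */
uint32_t alignment_padding(const uint32_t *cycles, size_t n, uint32_t *pad)
{
    uint32_t target = 0;
    for (size_t i = 0; i < n; i++) {
        if (cycles[i] > target) {
            target = cycles[i];
        }
    }
    for (size_t i = 0; i < n; i++) {
        pad[i] = target - cycles[i];
    }
    return target;
}
```

With placeholder counts of 10, 40, and 25 cycles, the target is 40 and the padding is 30, 0, and 15 cycles. On real hardware the padding would be realized with instructions of known fixed latency, and the counts would have to be re-measured after insertion.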

Load-bearing premise

That adding dummy arithmetic and cycle alignment produces truly identical execution times for all inputs without creating fresh side channels or unacceptable approximation error.

What would settle it

Measuring actual cycle counts on the Cortex-M4 for many different input values and finding any deviation from the reported 88 or 108 cycles, or recovering input information via a timing template attack on the new implementation.
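
The first half of that test reduces to a uniformity check over recorded cycle counts. A minimal host-side sketch in C, assuming the counts were already captured on the target (for example from a hardware cycle counter); the function name is illustrative and not from the paper.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first measurement that differs from counts[0],
 * or n when every measurement is identical (including when n is 0 or 1). */
size_t first_timing_deviation(const uint32_t *counts, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (counts[i] != counts[0]) {
            return i;
        }
    }
    return n;
}
```

Fed an array in which every entry is 88, the function returns the array length; a single 89 at index 2 makes it return 2. A pass only shows that the sampled inputs agree, not that every possible input does.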

Figures

Figures reproduced from arXiv: 2605.22441 by Andreas Rechberger, Andrii Tyvodar, Bernhard Jungk, Dirmanto Jap, Jakub Breier, Shivam Bhasin, Xiaolu Hou.

Figure 1: Summary of the threat model, proposed methodology, and experimental results. Timing leakage in embedded activation …
Figure 2: Execution time as a function of input for the unpro…
Figure 3: Execution time after applying the desynchronization …
Figure 4: Evolution of the accumulated class scores under the template-based attack for the three possible true activation functions.
Figure 5: Execution time as a function of input for the protected activation-function implementations: (a) ReLU, (b) sigmoid, (c) …
Figure 7: Execution time as a function of input for the extended …
Figure 8: Absolute error of the protected sigmoid, …
Original abstract

Embedded neural-network inference can leak information through timing side channels, including leakage caused by the evaluation of activation functions. This work proposes a constant-time implementation methodology for activation functions on embedded microcontrollers and validates it on ReLU, sigmoid, tanh, GELU, and Swish on an ARM Cortex-M4 platform. The proposed methodology combines branchless selection, fixed-cost Padé-based approximation, dummy arithmetic where needed, and cycle alignment to obtain timing-regular activation-function implementations. As motivation, we also evaluate a desynchronization-based countermeasure and show that it remains vulnerable to a template-based timing attack. Experimental results show that the resulting protected implementations achieve identical cycle counts for all tested inputs, including 88 cycles in the three-function setting and 108 cycles in the five-function setting. At the same time, the numerical-error analysis indicates that the approximated nonlinear functions retain high accuracy. These results suggest that the proposed methodology provides a practical basis for constructing side-channel-resistant activation functions in embedded inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a constant-time implementation methodology for activation functions (ReLU, sigmoid, tanh, GELU, Swish) on microcontrollers to mitigate timing side-channel leakage in embedded neural-network inference. The approach combines branchless selection, fixed-cost Padé approximants, dummy arithmetic, and cycle alignment. On an ARM Cortex-M4, it reports identical measured cycle counts (88 cycles for the three-function case and 108 cycles for the five-function case) across tested inputs together with high numerical accuracy. The manuscript also shows that a desynchronization countermeasure remains vulnerable to template-based timing attacks.

Significance. If the constant-time property can be established more rigorously, the work supplies a concrete, implementable set of techniques for protecting activation functions against timing attacks on resource-constrained platforms. The experimental cycle-count results and the attack demonstration on desynchronization provide useful practical evidence; the methodology itself is a strength for the embedded-security community.

major comments (2)
  1. [experimental results section] The central claim that the implementations achieve identical cycle counts for all inputs rests on measurements reported for selected test inputs only (Abstract and experimental results section). No exhaustive enumeration over the full float32 domain (subnormals, infinities, NaN payloads) or assembly-level verification is supplied; untested inputs could still exercise different FP exception paths or cache effects on Cortex-M4. This directly affects the soundness of the side-channel resistance guarantee.
  2. [Methodology] The description of how dummy arithmetic and cycle alignment guarantee identical execution paths lacks detail on coverage of all possible floating-point behaviors and on whether the chosen operations remain constant-cost under the target compiler and optimization settings. Without this, the claim that the combination produces truly identical cycle counts for every input remains incompletely supported.
minor comments (2)
  1. Provide the exact set of test inputs used for the cycle-count measurements and, if possible, raw measurement data or a link to the implementation.
  2. The numerical-error analysis would benefit from explicit maximum absolute or relative error bounds and a direct comparison against the corresponding standard-library implementations.
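
The input classes named in the first major comment can be targeted directly from bit patterns. The following C sketch shows one way to build and label such test vectors; the type, function, and array names are hypothetical, and the list is an illustrative starting set, not the paper's test suite.

```c
#include <stdint.h>
#include <string.h>

typedef enum { F32_ZERO, F32_SUBNORMAL, F32_NORMAL, F32_INF, F32_NAN } f32_class;

/* Classify an IEEE 754 binary32 bit pattern by exponent and mantissa fields. */
f32_class classify_bits(uint32_t u)
{
    uint32_t exponent = (u >> 23) & 0xFFu;
    uint32_t mantissa = u & 0x7FFFFFu;
    if (exponent == 0u)    return mantissa ? F32_SUBNORMAL : F32_ZERO;
    if (exponent == 0xFFu) return mantissa ? F32_NAN : F32_INF;
    return F32_NORMAL;
}

/* Turn a bit pattern into the float that would be fed to the activation. */
float f32_from_bits(uint32_t u) { float x; memcpy(&x, &u, sizeof x); return x; }

/* Illustrative edge-case vectors covering each class. */
const uint32_t edge_bits[] = {
    0x00000000u, /* +0 */
    0x80000000u, /* -0 */
    0x00000001u, /* smallest positive subnormal */
    0x007FFFFFu, /* largest positive subnormal */
    0x00800000u, /* smallest positive normal */
    0x7F7FFFFFu, /* largest finite value */
    0x7F800000u, /* +infinity */
    0xFF800000u, /* -infinity */
    0x7FC00000u, /* quiet NaN, zero payload */
    0x7FA00001u, /* signalling NaN with a non-zero payload */
};
```

Each pattern would be converted with `f32_from_bits`, passed to the function under test, and timed, so that a deviation can be attributed to a specific input class.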

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed feedback. The comments correctly identify areas where the constant-time claims require stronger supporting evidence. We address each major comment below, indicating planned revisions to the manuscript.

Point-by-point responses
  1. Referee: [experimental results section] The central claim that the implementations achieve identical cycle counts for all inputs rests on measurements reported for selected test inputs only (Abstract and experimental results section). No exhaustive enumeration over the full float32 domain (subnormals, infinities, NaN payloads) or assembly-level verification is supplied; untested inputs could still exercise different FP exception paths or cache effects on Cortex-M4. This directly affects the soundness of the side-channel resistance guarantee.

    Authors: We agree that the reported measurements cover only a selected set of test inputs rather than an exhaustive enumeration of the float32 domain. An exhaustive test over all 2^32 possible values is not feasible in practice. In the revised manuscript we will expand the experimental results section to describe the test-vector selection process in detail, including explicit coverage of subnormals, infinities, and NaN payloads. We will also add assembly-level inspection of the generated code (under the exact compiler and optimization settings used) to verify that no input-dependent control flow or variable-latency FP operations remain. Potential cache effects on the Cortex-M4 will be discussed and shown to be constant under our measurement methodology. revision: partial

  2. Referee: [Methodology] The description of how dummy arithmetic and cycle alignment guarantee identical execution paths lacks detail on coverage of all possible floating-point behaviors and on whether the chosen operations remain constant-cost under the target compiler and optimization settings. Without this, the claim that the combination produces truly identical cycle counts for every input remains incompletely supported.

    Authors: We acknowledge that the current methodology description is insufficiently detailed. In the revision we will substantially expand this section to explain, step by step, how the dummy arithmetic operations and cycle-alignment padding were chosen to cover all IEEE 754 behaviors that can arise in the Padé approximations and branchless selection logic. We will specify the exact compiler (arm-none-eabi-gcc), optimization level, and flags employed, and provide reasoning or micro-benchmark evidence that each inserted operation executes in constant time on the Cortex-M4 FPU. A supplementary table listing the constant-cost operations and their measured latencies will be added. revision: yes

standing simulated objections not resolved
  • Exhaustive enumeration over the entire float32 input domain (approximately 4 billion values) cannot be performed in any practical experimental timeframe.

Circularity Check

0 steps flagged

No significant circularity; claims rest on experimental measurements

full rationale

The paper presents a practical methodology for constant-time activation functions using branchless selection, Padé approximants, dummy operations, and cycle alignment, then validates identical cycle counts via direct hardware measurements on an ARM Cortex-M4 for selected inputs. No derivation chain reduces a prediction to its own fitted inputs by construction, no load-bearing self-citation justifies a uniqueness theorem, and no ansatz is smuggled through prior work. The central result is an empirical observation of cycle counts rather than a self-referential mathematical claim, making the approach self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an applied implementation paper whose central claim rests on the correctness of standard embedded programming practices and platform-specific timing behavior rather than new mathematical axioms or postulated entities.

pith-pipeline@v0.9.0 · 5721 in / 1217 out tokens · 71040 ms · 2026-05-22T05:14:43.821669+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
