Practical Approaches Towards Deep-Learning Based Cross-Device Power Side Channel Attack
Pith reviewed 2026-05-25 02:21 UTC · model grok-4.3
The pith
PCA pre-processing and multi-device training let an MLP recover the first AES-128 key byte with 99.43 percent average accuracy across 30 AVR devices, despite hardware variations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
With Principal Component Analysis (PCA) based pre-processing and multi-device training, a Multi-Layer Perceptron (MLP) based 256-class classifier can achieve an average accuracy of 99.43 percent in recovering the first key byte across all 30 devices, even in the presence of significant inter-device variations. Adding Dynamic Time Warping (DTW) before PCA, followed by the same 256-class MLP, yields at least 10.97 percent higher accuracy than the CNN-based approach for traces with up to 50 time-sample misalignments.
What carries the argument
PCA-based pre-processing that reduces trace dimensionality while preserving key-dependent features, paired with training a 256-class MLP classifier on power traces collected from multiple devices.
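To make the pre-processing concrete, here is a minimal NumPy sketch, not the authors' code: PCA is fitted once on traces pooled from several profiling devices, and the same mean and principal axes are reused to project traces from any other device. The helper names `fit_pca` and `project`, the 20 components, the trace length, and the random stand-in data are illustrative assumptions; the 256-class MLP that consumes the projected features is not shown.

```python
import numpy as np


def fit_pca(traces: np.ndarray, n_components: int):
    """Fit PCA on a (n_traces, n_samples) matrix pooled from several devices.

    Returns the per-sample mean and the top `n_components` principal axes
    (as rows), taken from the SVD of the mean-centred data.
    """
    mean = traces.mean(axis=0)
    centred = traces - mean
    # Rows of vt are orthonormal principal directions, sorted by singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean, vt[:n_components]


def project(traces: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Map raw traces onto the principal axes learned from the profiling devices."""
    return (traces - mean) @ components.T


# Hypothetical data: 4 profiling devices, 300 traces each, 400 samples per trace.
rng = np.random.default_rng(0)
train_devices = [rng.normal(size=(300, 400)) for _ in range(4)]
pooled = np.vstack(train_devices)                 # multi-device profiling set
mean, comps = fit_pca(pooled, n_components=20)
features = project(pooled, mean, comps)           # (1200, 20) inputs for the MLP
# Traces from an unseen device reuse the training mean and axes (no refit).
unseen = project(rng.normal(size=(100, 400)), mean, comps)
```

Fitting the projection on the pooled multi-device set, rather than per device, is intended to place features from an unseen device in the same coordinate system the classifier was trained on.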
If this is right
- The MLP with PCA pre-processing outperforms a CNN trained on four devices by about 20 percent in average cross-device test accuracy on aligned traces.
- DTW alignment followed by PCA and the 256-class MLP maintains at least 10.97 percent higher accuracy than CNN methods even when traces differ by up to 50 time samples.
- Cross-device recovery of the first AES-128 key byte reaches 99.43 percent average accuracy over a set of 30 devices.
- Multi-device training plus dimensionality reduction makes profiling attacks viable despite hardware manufacturing differences.
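The misalignment experiments can be emulated on aligned captures by shifting each trace by a random number of samples. The helper below (`misalign`) is hypothetical and is one plausible construction, not the paper's documented procedure; it uses edge padding so that no samples wrap around the trace boundary.

```python
import numpy as np


def misalign(traces: np.ndarray, max_shift: int, rng: np.random.Generator) -> np.ndarray:
    """Delay or advance each trace by a random integer shift in [-max_shift, max_shift].

    Output sample j of a trace shifted by s is input sample j - s; positions that
    fall outside the capture take the nearest edge value (no wrap-around).
    """
    n_traces, n_samples = traces.shape
    shifts = rng.integers(-max_shift, max_shift + 1, size=n_traces)
    out = np.empty_like(traces)
    for k, s in enumerate(shifts):
        padded = np.pad(traces[k], max_shift, mode="edge")
        start = max_shift - s
        out[k] = padded[start:start + n_samples]
    return out


# Hypothetical usage: up to 50 time-sample misalignment on aligned captures.
rng = np.random.default_rng(1)
captures = rng.normal(size=(8, 1000))
shifted = misalign(captures, max_shift=50, rng=rng)
```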
Where Pith is reading between the lines
- The same pre-processing steps could be tested on other microcontrollers or ciphers to check whether the accuracy gain generalizes beyond 8-bit AVR AES implementations.
- Collecting traces from even more devices during training might further reduce the accuracy gap between seen and unseen hardware.
- If PCA discards too much key-dependent information, the method might be combined with other dimensionality-reduction techniques, such as autoencoders, for additional robustness.
Load-bearing premise
Device-to-device variations in power traces are large enough to break single-device models yet remain correctable by PCA and multi-device training without erasing the key-dependent information needed for classification.
What would settle it
A measured drop in first-byte recovery accuracy below usable levels when the same MLP-PCA pipeline is tested on a fresh set of AVR devices never seen during training or on traces with misalignments exceeding 50 samples.
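Such a test requires that every trace of an evaluation device is excluded from profiling. A minimal sketch, assuming NumPy arrays holding the true key-byte label, the predicted label, and a device ID for each trace; `split_by_device` and `per_device_accuracy` are hypothetical helpers, and the classifier that produces `y_pred` is left out. Reporting the spread across devices alongside the mean makes a single average such as 99.43 percent easier to interpret.

```python
import numpy as np


def split_by_device(device_ids: np.ndarray, test_devices):
    """Return (train_mask, test_mask) so no trace of a test device is used for profiling."""
    test_mask = np.isin(device_ids, list(test_devices))
    return ~test_mask, test_mask


def per_device_accuracy(y_true: np.ndarray, y_pred: np.ndarray, device_ids: np.ndarray):
    """Top-1 key-byte accuracy per device, plus mean and (population) std across devices."""
    accs = {}
    for dev in np.unique(device_ids):
        mask = device_ids == dev
        accs[int(dev)] = float(np.mean(y_true[mask] == y_pred[mask]))
    values = np.array(list(accs.values()))
    return accs, float(values.mean()), float(values.std())


# Hypothetical example: 3 devices, device 2 held out from profiling.
device_ids = np.array([0, 0, 1, 1, 2, 2])
train_mask, test_mask = split_by_device(device_ids, {2})
y_true = np.array([5, 5, 7, 7, 9, 9])   # true first-key-byte labels
y_pred = np.array([5, 4, 7, 7, 9, 1])   # stand-in classifier outputs
accs, mean_acc, std_acc = per_device_accuracy(y_true, y_pred, device_ids)
```

In a real evaluation only the held-out devices (those selected by `test_mask`) would be scored.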
Original abstract
Power side-channel analysis (SCA) has been of immense interest to most embedded designers to evaluate the physical security of the system. This work presents profiling-based cross-device power SCA attacks using deep learning techniques on 8-bit AVR microcontroller devices running AES-128. Firstly, we show the practical issues that arise in these profiling-based cross-device attacks due to significant device-to-device variations. Secondly, we show that utilizing Principal Component Analysis (PCA) based pre-processing and multi-device training, a Multi-Layer Perceptron (MLP) based 256-class classifier can achieve an average accuracy of 99.43% in recovering the first key byte from all the 30 devices in our data set, even in the presence of significant inter-device variations. Results show that the designed MLP with PCA-based pre-processing outperforms a Convolutional Neural Network (CNN) with 4-device training by ~20% in terms of the average test accuracy of cross-device attack for the aligned traces captured using the ChipWhisperer hardware. Finally, to extend the practicality of these cross-device attacks, another pre-processing step, namely, Dynamic Time Warping (DTW), has been utilized to remove any misalignment among the traces, before performing PCA. DTW along with PCA followed by the 256-class MLP classifier provides ≥10.97% higher accuracy than the CNN based approach for cross-device attack even in the presence of up to 50 time-sample misalignments between the traces.
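The DTW step described in the abstract can be sketched as follows. This is a textbook dynamic-programming DTW with an optional Sakoe-Chiba band, written in plain NumPy under stated assumptions: squared-difference local cost, a band of `window` samples (50 would match the largest misalignment studied), and many-to-one matches averaged when warping a trace onto the reference time axis. The function names and these choices are illustrative; the paper's exact DTW variant and choice of reference trace may differ.

```python
from typing import List, Optional, Tuple

import numpy as np


def dtw_path(ref: np.ndarray, trace: np.ndarray,
             window: Optional[int] = None) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW between two 1-D traces with an optional Sakoe-Chiba band.

    Returns the accumulated squared-difference cost and the optimal warping
    path as (i, j) pairs, where i indexes `ref` and j indexes `trace`.
    """
    n, m = len(ref), len(trace)
    # The band must be at least |n - m| wide for the end cell to be reachable.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (ref[i - 1] - trace[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # Backtrack from (n, m) to (0, 0) along the cheapest predecessors.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return float(cost[n, m]), path[::-1]


def align_to_reference(ref: np.ndarray, trace: np.ndarray,
                       window: Optional[int] = None) -> np.ndarray:
    """Warp `trace` onto the time axis of `ref` using the DTW path.

    Samples of `trace` matched to the same reference index are averaged,
    so the output always has len(ref) samples.
    """
    _, path = dtw_path(ref, trace, window)
    sums = np.zeros(len(ref))
    counts = np.zeros(len(ref))
    for i, j in path:
        sums[i] += trace[j]
        counts[i] += 1
    return sums / counts
```

Each trace would be warped onto a common reference (for example, one profiling trace or the mean trace) before PCA is fitted. The double loop costs O(n*w) per trace in pure Python, so an optimized DTW implementation would be preferable for large trace sets.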
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that profiling-based cross-device power side-channel attacks on AES-128 running on 8-bit AVR devices can be made practical using deep learning. It shows that PCA-based pre-processing combined with multi-device training enables an MLP 256-class classifier to recover the first key byte with 99.43% average accuracy across all 30 devices despite inter-device variations. The MLP+PCA approach outperforms a 4-device CNN baseline by approximately 20% on aligned traces, and adding DTW pre-processing maintains at least 10.97% higher accuracy than CNN even with up to 50-sample misalignments.
Significance. If the reported accuracies are reproducible, the work demonstrates a concrete, practical mitigation of device-to-device variation in SCA via standard pre-processing (PCA, DTW) and multi-device profiling, which could inform security evaluations of embedded cryptographic implementations. The empirical outperformance over a CNN baseline on real ChipWhisperer traces provides a useful data point for the community.
major comments (3)
- [Abstract] Abstract: The 99.43% average accuracy figure is reported without any accompanying information on the total number of traces per device, the train/test split (how many of the 30 devices are used for training versus testing), or statistical measures such as standard deviation across runs or keys. These details are load-bearing for assessing whether the cross-device claim is robust.
- [Abstract] Abstract: The ~20% outperformance is stated relative to a CNN trained on only 4 devices, while the MLP uses multi-device training. Without a controlled comparison holding the number of training devices fixed (e.g., CNN results with the same multi-device set or MLP results with 4 devices), it is unclear whether the gain is due to the MLP architecture, the PCA step, or simply the larger training distribution.
- [Abstract] Abstract: No information is provided on the validation methodology (e.g., whether accuracy is computed per key byte across multiple keys, whether traces are from the same or different plaintexts, or any cross-validation procedure), which is required to evaluate the reliability of the 99.43% and >=10.97% figures.
minor comments (2)
- [Abstract] Abstract contains a typographical error: '~20%in terms' is missing a space.
- [Abstract] The abstract does not define or cite the specific CNN architecture used as the baseline, making the comparison harder to interpret.
Simulated Author's Rebuttal
Thank you for your review and the valuable feedback on our manuscript. We address each of the major comments below and will update the abstract and add experiments as needed in the revised version.
- Referee: [Abstract] Abstract: The 99.43% average accuracy figure is reported without any accompanying information on the total number of traces per device, the train/test split (how many of the 30 devices are used for training versus testing), or statistical measures such as standard deviation across runs or keys. These details are load-bearing for assessing whether the cross-device claim is robust.
  Authors: We agree that these details are important for assessing the claim. The manuscript body describes the dataset from 30 devices along with the multi-device training and testing procedure as well as variability measures. We will revise the abstract to include a concise summary of the number of traces, the train/test split across devices, and statistical measures to make the abstract self-contained. (revision: yes)
- Referee: [Abstract] Abstract: The ~20% outperformance is stated relative to a CNN trained on only 4 devices, while the MLP uses multi-device training. Without a controlled comparison holding the number of training devices fixed (e.g., CNN results with the same multi-device set or MLP results with 4 devices), it is unclear whether the gain is due to the MLP architecture, the PCA step, or simply the larger training distribution.
  Authors: We acknowledge that the reported comparison uses different numbers of training devices and that a controlled experiment would better isolate the sources of improvement. The current results demonstrate the practicality of the MLP+PCA approach when data from many devices is available. We will add a controlled comparison (either CNN on the multi-device set or MLP on 4 devices) in the revised manuscript. (revision: yes)
- Referee: [Abstract] Abstract: No information is provided on the validation methodology (e.g., whether accuracy is computed per key byte across multiple keys, whether traces are from the same or different plaintexts, or any cross-validation procedure), which is required to evaluate the reliability of the 99.43% and >=10.97% figures.
  Authors: We agree that a brief description of the validation approach belongs in the abstract. The reported figures reflect per-key-byte classification accuracy on traces from random plaintexts under a fixed-key-per-device setup with cross-device evaluation. We will revise the abstract to include a short statement on the validation methodology. (revision: yes)
Circularity Check
No significant circularity identified
The paper reports empirical measurements of classification accuracy on hardware power traces from 30 AVR devices running AES-128. The central result (99.43% average accuracy via PCA pre-processing + multi-device MLP training) follows directly from the described experimental setup and data processing pipeline; no mathematical derivation, prediction step, or uniqueness theorem is claimed that could reduce to fitted parameters or self-citations. The work contains no equations defining quantities in terms of themselves, no renaming of known results as new derivations, and no load-bearing self-citations. This is a standard empirical SCA study whose claims rest on reported test accuracies rather than any closed derivation chain.
Axiom & Free-Parameter Ledger
free parameters (3)
- PCA dimensionality
- MLP hyperparameters
- Number of training devices = 4+
axioms (2)
- Domain assumption: Power traces from AES-128 on AVR devices contain extractable key information.
- Domain assumption: Inter-device variations are primarily linear and capturable by PCA.