pith. machine review for the scientific record.

arxiv: 2604.16834 · v1 · submitted 2026-04-18 · 💻 cs.CR · cs.LG

Recognition: unknown

Towards Deep Encrypted Training: Low-Latency, Memory-Efficient, and High-Throughput Inference for Privacy-Preserving Neural Networks

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 07:11 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords homomorphic encryption · privacy-preserving machine learning · neural network inference · batch processing · ResNet · encrypted computation · CIFAR dataset

The pith

Batched homomorphic encryption algorithms with a pipeline architecture achieve 1.78x faster runtime and 3.74x lower memory use for encrypted ResNet inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops optimized algorithms for batched HE-friendly neural networks and pairs them with a pipeline architecture that adapts to different batch sizes for better resource use. It tests the methods on ResNet-20 and ResNet-34 models running on encrypted CIFAR-10 and CIFAR-100 data. A reader would care because single-image encrypted inference has advanced while batch processing, essential for high-volume applications, has lagged. If the gains hold, privacy-preserving inference moves closer to practical throughput levels without exposing raw inputs.
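The batching the paper targets rests on the SIMD slot structure of CKKS-style schemes: one ciphertext packs thousands of values, so a single homomorphic operation acts on every packed image position at once. A minimal plaintext simulation of that packing idea follows; no HE library is involved, and the sizes are illustrative, not the paper's.

```python
import numpy as np

# Plaintext simulation of SIMD slot packing: a CKKS-style ciphertext holds
# `slots` values, so one elementwise operation touches all of them.
slots = 8192
batch, pixels = 4, 1024          # toy sizes; the paper batches up to 512 images

rng = np.random.default_rng(0)
images = rng.random((batch, pixels))

# Pack pixel i of every image into consecutive slots of one "ciphertext".
packed = images.T.reshape(-1)    # shape (pixels * batch,)
assert packed.size <= slots

# One "homomorphic" scalar multiply now scales every image simultaneously.
scaled = packed * 0.5

# Unpack and verify the batch-wide effect.
recovered = scaled.reshape(pixels, batch).T
assert np.allclose(recovered, images * 0.5)
```

The engineering burden the paper addresses is that real layers (convolutions, rotations for channel sums) must be re-derived for this interleaved layout, which is what "batched HE-friendly algorithms" refers to.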

Core claim

The authors claim that specialized algorithms for batched HE-friendly neural networks, together with a pipeline architecture for resource-efficient execution, enable an amortized inference time of 8.86 seconds per image on a batch of 512 encrypted images for ResNet-20, with peak memory of 98.96 GB. This delivers a 1.78x runtime improvement and a 3.74x memory reduction versus prior designs. For the deeper ResNet-34 model on a batch of 256 images, the amortized time is 28.14 seconds per image using 246.78 GB of RAM.
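Taken at face value, the headline figures imply the following batch-level arithmetic. This is a sanity check on the reported numbers, not a reproduction; the derived totals are implied by the abstract, not separately reported in it.

```python
# Sanity arithmetic on the reported amortized figures (per-image times and
# batch sizes are from the abstract; the totals below are derived).
configs = [
    ("ResNet-20", 8.86, 512),    # amortized s/image, batch size
    ("ResNet-34", 28.14, 256),
]

for name, per_img, batch in configs:
    total_h = per_img * batch / 3600   # wall-clock time for one full batch
    per_hour = 3600 / per_img          # amortized throughput
    print(f"{name}: batch wall time ≈ {total_h:.2f} h, ≈ {per_hour:.0f} images/hour")
# → ResNet-20: batch wall time ≈ 1.26 h, ≈ 406 images/hour
# → ResNet-34: batch wall time ≈ 2.00 h, ≈ 128 images/hour
```

So even at the claimed amortized rates, a full 512-image ResNet-20 batch occupies the machine for over an hour; "high-throughput" here is relative to prior encrypted-inference designs, not to plaintext serving.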

What carries the argument

Batched HE-friendly neural network algorithms combined with a pipeline architecture that maximizes resource efficiency across varying batch sizes.

Load-bearing premise

The batching optimizations and pipeline design continue to deliver gains under realistic noise growth, hardware memory limits, and deeper networks.

What would settle it

Measure amortized time per image and peak memory for the same ResNet-20 model at a batch size of 1024 encrypted images and check whether the 1.78x runtime and 3.74x memory improvements over the prior state-of-the-art still appear.
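A settling experiment of that shape could be harnessed as sketched below. `encrypted_infer` is a hypothetical stand-in for the paper's batched pipeline, and `tracemalloc` only tracks Python-heap allocations; a real HE run would instead sample process RSS, since ciphertext buffers in a native library dominate memory.

```python
import time
import tracemalloc

def encrypted_infer(batch):
    """Hypothetical stand-in for the paper's batched encrypted forward pass."""
    return [x * 2 for x in batch]    # placeholder computation

def amortized_benchmark(batch_size):
    """Return (amortized seconds per image, peak traced bytes) for one batch."""
    batch = list(range(batch_size))
    tracemalloc.start()
    t0 = time.perf_counter()
    encrypted_infer(batch)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed / batch_size, peak

per_image, peak_bytes = amortized_benchmark(1024)
print(f"{per_image:.2e} s/image, peak {peak_bytes / 1024:.1f} KiB")
```

The decisive comparison is whether both curves (amortized time and peak memory versus batch size) keep their claimed margins over the prior design at batch 1024, not just at the reported operating points.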

Figures

Figures reproduced from arXiv: 2604.16834 by Eric Jahns, Michel A. Kinsy, Nges Brian Njungle.

Figure 1. Transforming multiple inputs into multiple SIMD …
Figure 2. The standard ResNet-20 architecture used as the baseline network in this work. The model consists of an initial …
Figure 3. Our Optimized ResNet-20 Pipeline with Accumulators. The Accumulators are used to rearrange and join the …
read the original abstract

Privacy-preserving machine learning (PPML) has become increasingly important in applications where sensitive data must remain confidential. Homomorphic Encryption (HE) enables computation directly on encrypted data, allowing neural network inference without revealing raw inputs. While prior works have largely focused on inference over a single encrypted image, batch processing of encrypted inputs lags behind, despite being critical for high-throughput inference scenarios and training-oriented workloads. In this work, we address this gap by developing optimized algorithms for batched HE-friendly neural networks. We also introduce a pipeline architecture designed to maximize resource efficiency across different batch sizes. We implemented these algorithms and evaluated our work using HE-friendly ResNet-20 and ResNet-34 models on encrypted CIFAR-10 and CIFAR-100 datasets, respectively. For ResNet-20, our approach achieves an amortized inference time of 8.86 seconds per image when processing a batch of 512 encrypted images, with a peak memory usage of 98.96 GB. These results represent a 1.78x runtime improvement and a 3.74x reduction in memory usage compared to the state-of-the-art design. For the deeper ResNet-34 model, we achieve an amortized inference time of 28.14 seconds per image on a batch of 256 encrypted images using 246.78 GB of RAM.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript develops optimized algorithms for batched HE-friendly neural networks together with a pipeline architecture for resource-efficient execution. It evaluates the approach on HE-friendly ResNet-20 (CIFAR-10) and ResNet-34 (CIFAR-100) models, reporting concrete amortized inference times and memory figures that are claimed to improve upon prior state-of-the-art batched designs.

Significance. If the reported performance numbers prove reproducible and the batching/pipeline optimizations remain effective under realistic noise growth, the work would meaningfully advance high-throughput encrypted inference for deeper networks, filling a documented gap between single-image and batched HE inference.

major comments (2)
  1. [Abstract] The headline performance claims (8.86 s amortized per image for batch-512 ResNet-20, 28.14 s for batch-256 ResNet-34, together with the 1.78× runtime and 3.74× memory improvements) are presented without any HE parameters (ring dimension, modulus chain, bootstrapping schedule, or per-layer noise budget). In CKKS, each batched convolution and activation grows the noise and alters slot utilization; without these quantities it is impossible to verify that the claimed latency and memory figures remain valid once noise growth is accounted for.
  2. [Abstract] The abstract and evaluation description supply no experimental protocol, hardware specification, number of runs, or ablation of the batching algorithms. Consequently the robustness of the pipeline architecture under varying batch sizes, deeper networks, or different encryption noise budgets cannot be assessed from the given text.
minor comments (2)
  1. The title emphasizes “Deep Encrypted Training” while the manuscript and abstract address only inference; the scope mismatch should be clarified.
  2. No error bars, variance, or statistical details accompany the reported timing and memory numbers.
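The referee's point about unreported HE parameters matters because the bootstrapping schedule, and hence a large share of runtime and memory, falls straight out of them: each ciphertext multiplication consumes rescaling levels from the modulus chain, and a bootstrap is needed once the chain is exhausted. A toy level ledger illustrates the dependency; the depths and chain length here are assumptions for illustration, not values from the paper.

```python
# Toy rescaling-level ledger for a leveled CKKS evaluation. Every layer
# consumes its multiplicative depth from the modulus chain; when the chain
# cannot cover the next layer, a bootstrap refreshes it. Depths below are
# illustrative only — the paper does not report its parameters.
def schedule_bootstraps(layer_depths, levels_per_chain):
    level, bootstraps = levels_per_chain, 0
    for depth in layer_depths:
        if depth > level:
            bootstraps += 1              # refresh the modulus chain
            level = levels_per_chain
        level -= depth
    return bootstraps

# e.g. 20 blocks of a depth-2 convolution plus a depth-3 polynomial activation
depths = [2, 3] * 20
print(schedule_bootstraps(depths, levels_per_chain=12))
# → 9
```

With nine bootstraps in this hypothetical setting versus, say, five under a longer chain, both latency and peak memory shift substantially, which is why the reported figures cannot be checked without the parameter set.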

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and indicate planned revisions to enhance verifiability while preserving the core contributions of the work.

read point-by-point responses
  1. Referee: [Abstract] The headline performance claims (8.86 s amortized per image for batch-512 ResNet-20, 28.14 s for batch-256 ResNet-34, together with the 1.78× runtime and 3.74× memory improvements) are presented without any HE parameters (ring dimension, modulus chain, bootstrapping schedule, or per-layer noise budget). In CKKS, each batched convolution and activation grows the noise and alters slot utilization; without these quantities it is impossible to verify that the claimed latency and memory figures remain valid once noise growth is accounted for.

    Authors: We agree that the abstract would be strengthened by including key HE parameters to facilitate verification of noise growth under batched operations. The manuscript body already specifies the encryption parameters and bootstrapping schedule used to manage per-layer noise budgets for the reported batch sizes. We will revise the abstract to concisely summarize these parameters (ring dimension, modulus chain, and noise budget) alongside the performance claims. This change ensures the headline numbers can be assessed in context without affecting the underlying results or comparisons. revision: yes

  2. Referee: [Abstract] The abstract and evaluation description supply no experimental protocol, hardware specification, number of runs, or ablation of the batching algorithms. Consequently the robustness of the pipeline architecture under varying batch sizes, deeper networks, or different encryption noise budgets cannot be assessed from the given text.

    Authors: We acknowledge that the abstract and high-level evaluation description lack an explicit experimental protocol. The manuscript describes the HE-friendly models, datasets, and pipeline architecture, but to allow assessment of robustness we will expand the evaluation section with a dedicated experimental setup subsection. This will detail hardware specifications, number of runs, and ablations on batch sizes and noise budgets. The revision will directly address the concern while keeping the abstract concise. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical benchmarks rest on external measurements

full rationale

The paper reports concrete runtime and memory measurements (8.86 s amortized per image for batch-512 ResNet-20, 28.14 s for batch-256 ResNet-34) obtained by implementing batched HE algorithms and a pipeline architecture on CIFAR-10/100. These are direct experimental outcomes benchmarked against an external state-of-the-art baseline rather than any derived prediction, fitted parameter, or self-citation chain. No equations, uniqueness theorems, or ansatzes are invoked that reduce to the reported numbers by construction; the central claims remain falsifiable by independent re-implementation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated in the provided text.

pith-pipeline@v0.9.0 · 5556 in / 1085 out tokens · 49676 ms · 2026-05-10T07:11:59.633410+00:00 · methodology

