Kernel-Based ReLU Approximation for Homomorphic Encryption-Compatible Privacy-preserving Deep Learning Models

Dimitra Papatsaroucha; Dimitrios Sygletos; Evangelos K. Markakis; Ilias Politis; Marios Choudetsanakis

arxiv: 2605.23641 · v1 · pith:3IPOIBR6new · submitted 2026-05-22 · 💻 cs.CR


Pith reviewed 2026-05-25 04:09 UTC · model grok-4.3

classification 💻 cs.CR

keywords: homomorphic encryption · ReLU approximation · privacy-preserving deep learning · kernel-based approximation · polynomial approximation · large language models · secure inference · transformers


The pith

A kernel-based second-degree polynomial approximates ReLU to make it compatible with homomorphic encryption for privacy-preserving deep learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an approximation for the ReLU activation so that non-linear operations can run on encrypted data without decryption. Homomorphic encryption handles only addition and multiplication, which excludes standard ReLU and blocks privacy-preserving use of large language models. The method first creates a smooth kernel function that mimics ReLU behavior, then fits it with a low-degree polynomial trained directly on token embeddings from pre-trained models. Evaluation across simulated data, tokenized inputs, and full transformer setups shows the approximation maintains fidelity while keeping multiplicative depth low. A reader would care because the technique could allow secure inference on sensitive language data in settings where decryption is not permitted.

Core claim

The central claim is that a kernel-based smooth function mimicking ReLU can be approximated by a second-degree polynomial, inspired by Jackson's theorem, to achieve low multiplicative depth; when trained and tested on token embeddings from pre-trained LLMs, this yields improved approximation fidelity and supports deployment in deep learning and transformer models under HE constraints.

What carries the argument

A kernel-based smooth ReLU mimic fitted by a second-degree polynomial trained on LLM token embeddings to keep multiplicative depth low while preserving fidelity.
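A minimal sketch of this two-stage construction, under loud assumptions. The paper's actual kernel function is not given on this page, so the sketch uses a Gaussian-smoothed ReLU (ReLU convolved with a Gaussian of width `sigma`) as a hypothetical stand-in, and synthetic standard-normal samples stand in for token-embedding values. The names `gaussian_smoothed_relu` and `fit_quadratic` are illustrative only. Stage 1 produces a smooth target; stage 2 fits a degree-2 polynomial to that target (not to ReLU directly), and the result is then scored against the true ReLU on held-out samples.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # elementwise erf without requiring SciPy


def relu(x):
    return np.maximum(x, 0.0)


def gaussian_smoothed_relu(x, sigma=0.5):
    """Hypothetical kernel stand-in: ReLU convolved with a Gaussian of width sigma.

    Closed form of E[max(x + sigma*Z, 0)] with Z ~ N(0, 1):
    x * Phi(x / sigma) + sigma * phi(x / sigma).
    """
    z = x / sigma
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return x * cdf + sigma * pdf


def fit_quadratic(x_train, sigma=0.5):
    """Stage 2: least-squares degree-2 fit to the kernel outputs (not to ReLU)."""
    coeffs = np.polyfit(x_train, gaussian_smoothed_relu(x_train, sigma), deg=2)
    return np.poly1d(coeffs)


rng = np.random.default_rng(0)
# Synthetic stand-in for token-embedding values (assumption, not real embeddings).
x_train = rng.normal(0.0, 1.0, size=5000)
x_test = rng.normal(0.0, 1.0, size=2000)

p = fit_quadratic(x_train)
err = p(x_test) - relu(x_test)
mse = float(np.mean(err**2))
max_err = float(np.max(np.abs(err)))
print("coefficients (x^2, x, 1):", p.coeffs)
print(f"held-out MSE vs ReLU: {mse:.4f}, max abs error: {max_err:.4f}")
```

Fitting the smooth target rather than ReLU itself is the point of the kernel stage: a low-degree polynomial cannot follow the kink at zero, but it can track a smoothed curve more closely over the range where the data actually lies.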

If this is right

The approximation enables ReLU operations inside HE-constrained inference pipelines for NLP tasks.
It supports evaluation and use across tokenized data, deep learning models, and transformer architectures.
Low multiplicative depth is maintained, allowing the method to fit within HE limitations.
Improved fidelity over prior approximations makes secure and privacy-preserving inference more practical in various tasks.
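To make "low multiplicative depth" concrete, the following sketch tracks depth on plaintext stand-ins for ciphertexts. It is a simplified model and an assumption of this illustration: only ciphertext-by-ciphertext products add a level, while additions and multiplications by plaintext constants are treated as free. In many CKKS implementations a plaintext-constant multiplication is also followed by a rescale, so real depth can be higher. `Tracked`, `quadratic`, `min_ct_depth` and the coefficients are hypothetical names and values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tracked:
    """Plaintext stand-in for a ciphertext that records multiplicative depth."""
    value: float
    depth: int = 0

    def __add__(self, other):
        if isinstance(other, Tracked):
            return Tracked(self.value + other.value, max(self.depth, other.depth))
        return Tracked(self.value + other, self.depth)  # add plaintext constant

    __radd__ = __add__

    def __mul__(self, other):
        if isinstance(other, Tracked):
            # ciphertext x ciphertext: consumes one new level
            return Tracked(self.value * other.value, max(self.depth, other.depth) + 1)
        # plaintext scalar: treated as free in this simplified model
        return Tracked(self.value * other, self.depth)

    __rmul__ = __mul__


def quadratic(x, a, b, c):
    """a*x^2 + b*x + c using a single ciphertext-ciphertext product (x*x)."""
    return a * (x * x) + b * x + c


def min_ct_depth(degree):
    """ceil(log2(degree)): minimal ct-ct depth to form x**degree by squaring."""
    return (degree - 1).bit_length() if degree >= 1 else 0


x = Tracked(1.5)
y = quadratic(x, 0.25, 0.5, 0.2)  # hypothetical coefficients
x4 = (x * x) * (x * x)            # degree 4 via repeated squaring
print(y.value, y.depth)
print(x4.value, x4.depth)
print([min_ct_depth(d) for d in (1, 2, 3, 4, 8, 27)])
```

Under this model a second-degree approximant costs one level, while a degree-27 polynomial needs at least five, which is why the degree choice matters so much for HE parameter sizes.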

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same kernel-plus-polynomial pattern could be tested on other activation functions that currently block HE deployment.
If the embedding-trained fit generalizes across model scales, it could reduce accuracy loss when moving from small test models to production LLMs.
One could measure whether the approximation still holds when the surrounding layers are also replaced by their HE-compatible versions rather than kept in plaintext.

Load-bearing premise

The polynomial approximation trained on token embeddings will preserve enough end-to-end performance when placed inside full transformer architectures under actual HE constraints.

What would settle it

Insert the approximated ReLU into a complete transformer, run inference on encrypted data, and check whether task accuracy drops below the level achieved by the original plaintext model.
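A plaintext-only sketch of the substitution step, assuming a toy transformer-style feed-forward block with random weights (sizes and weights are hypothetical). It does not encrypt anything and ignores CKKS approximation noise, attention and layer norms, so it cannot settle the question; it only shows how one would measure the output drift caused by swapping ReLU for a quadratic. The coefficients in `quad_act` are the least-squares quadratic for ReLU when inputs are standard normal, x/2 + (1 + x²)/(2√(2π)), derived analytically for this illustration and not taken from the paper.

```python
import numpy as np

C0 = 1.0 / (2.0 * np.sqrt(2.0 * np.pi))  # ~0.1995


def relu(z):
    return np.maximum(z, 0.0)


def quad_act(z, a=C0, b=0.5, c=C0):
    """Illustrative degree-2 replacement a*z^2 + b*z + c (not the paper's fit)."""
    return a * z**2 + b * z + c


def feed_forward(x, w1, b1, w2, b2, act):
    """Position-wise FFN: act(x @ W1 + b1) @ W2 + b2."""
    return act(x @ w1 + b1) @ w2 + b2


rng = np.random.default_rng(0)
d_model, d_ff, n_tokens = 16, 64, 8  # toy sizes (assumption)
w1 = rng.normal(0.0, 1.0 / np.sqrt(d_model), (d_model, d_ff))
b1 = np.zeros(d_ff)
w2 = rng.normal(0.0, 1.0 / np.sqrt(d_ff), (d_ff, d_model))
b2 = np.zeros(d_model)
x = rng.normal(0.0, 1.0, (n_tokens, d_model))  # stand-in token representations

ref = feed_forward(x, w1, b1, w2, b2, relu)
approx = feed_forward(x, w1, b1, w2, b2, quad_act)
rel_err = float(np.linalg.norm(approx - ref) / np.linalg.norm(ref))
print(f"relative FFN output error after swapping ReLU: {rel_err:.3f}")
```

A full test would repeat this inside every layer of a trained model, run it under an HE library, and compare task accuracy rather than raw output norms.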

Figures

Figures reproduced from arXiv: 2605.23641 by Dimitra Papatsaroucha, Dimitrios Sygletos, Evangelos K. Markakis, Ilias Politis, Marios Choudetsanakis.

**Figure 2.** Actual ReLU vs Kernel Approx.

4.2 Input Data Selection, Polynomial Regression & Optimal Degree Selection. To approximate the output of the kernel function, a polynomial regression model (5) was trained on the kernel function predictions, where $n$ is the degree of the polynomial:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_n x^n + \epsilon \qquad (5)$$

Regarding input data, embeddings were extracted from RoBERTa and DistilBERT using the …
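Equation (5) and the degree-selection step can be sketched as follows, under stated assumptions: the kernel is a hypothetical Gaussian-smoothed ReLU (the paper's kernel is not reproduced here), the inputs are synthetic normal samples standing in for RoBERTa/DistilBERT embedding values, and the selection rule ("smallest degree whose validation MSE meets a budget `tol`") is one plausible rule, not necessarily the paper's. As in the text, the regression targets are the kernel predictions.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def kernel_target(x, sigma=0.5):
    """Hypothetical stand-in kernel: ReLU smoothed by a Gaussian of width sigma."""
    z = x / sigma
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return x * cdf + sigma * pdf


rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=4000)  # synthetic stand-in for embedding values
y = kernel_target(x)                 # regression targets: kernel predictions
perm = rng.permutation(x.size)
train, val = perm[:3000], perm[3000:]

val_mse = {}
for n in range(1, 7):
    coeffs = np.polyfit(x[train], y[train], deg=n)  # least-squares fit of Eq. (5)
    resid = np.polyval(coeffs, x[val]) - y[val]
    val_mse[n] = float(np.mean(resid**2))

tol = 1e-3  # hypothetical accuracy budget
# Smallest degree meeting the budget; fall back to the largest degree tried.
chosen = min((n for n, err in val_mse.items() if err <= tol), default=max(val_mse))

for n, err in val_mse.items():
    print(f"degree {n}: validation MSE {err:.2e}")
print("chosen degree:", chosen, "ct-ct depth:", (chosen - 1).bit_length())
```

Because each extra doubling of the degree costs another multiplicative level, a practical selection rule trades validation error against depth rather than minimizing error alone.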

**Figure 3.** Actual ReLU vs Approximations.

5.4.3 Experiment 3: Deep Learning Models Performance. The proposed Kernel Polynomial method was assessed alongside the established polynomials from the literature, which were also included in the previous experiments, to determine if model accuracy is maintained when the proposed solution is implemented.

Original abstract

As privacy concerns in AI technologies continue to grow, Homomorphic Encryption (HE) offers a way to perform computations on encrypted data without the need for decryption during operations. However, HE is limited to addition and multiplication, making non-linear functions incompatible in their original form. This limitation has become more critical with the widespread use of Large Language Models (LLMs), where the non-linearity of activation functions such as the Rectified Linear Unit (ReLU) poses challenges for deployment in privacy-preserving Natural Language Processing (NLP) settings. This paper proposes a kernel-based approximation of ReLU, enabling its use within HE-constrained settings and thus contributing a critical step toward supporting privacy-preserving LLMs. A smooth kernel-based function, mimicking ReLU, is approximated using a second-degree polynomial, inspired by Jackson's theorem, to achieve low multiplicative depth. The proposed method is trained and assessed directly on token embeddings from pre-trained LLMs and evaluated in various scenarios, from simulated and tokenized data to deep learning and transformer models. Results show improved approximation fidelity, supporting the method's suitability for secure and privacy-preserving inference in various tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces a kernel-based quadratic polynomial approximation for ReLU in HE settings trained on LLM embeddings, but provides no metrics or end-to-end results to support the fidelity claims.


The main point is a kernel-based second-degree polynomial fit for ReLU, trained on token embeddings from pre-trained LLMs and meant to keep multiplicative depth low enough for homomorphic encryption. The approach draws on Jackson's theorem for the approximation and targets the non-linearity barrier in privacy-preserving NLP models. That specific construction on embeddings looks like the new element compared to earlier polynomial work in the HE literature.

The paper does a reasonable job of framing the practical constraint and outlining an evaluation path from simulated data through to transformer models. The focus on low depth is the right priority for HE feasibility.

The weak spot is the missing evidence. The abstract asserts improved fidelity without any error numbers, baseline comparisons, or details on training protocols. There is also no sign that the approximation was inserted into a full forward pass, that noise budgets were checked under actual ciphertext operations, or that downstream metrics like accuracy or perplexity were measured after replacement. The stress-test concern holds: isolated embedding fidelity does not establish preserved behavior once the approximation interacts with attention and layer norms.

This work is aimed at researchers already working on HE for deep learning. A reader looking for new activation approximations might extract the kernel idea if the full paper supplies the missing experiments and comparisons. It deserves peer review to see whether the full manuscript closes those gaps, since the underlying problem is real and the method direction is worth checking.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a kernel-based second-degree polynomial approximation to ReLU, inspired by Jackson's theorem, for compatibility with homomorphic encryption. The method is trained directly on token embeddings from pre-trained LLMs and is claimed to achieve improved approximation fidelity that supports its use for privacy-preserving inference in deep learning and transformer models.

Significance. If the quantitative results confirm higher fidelity than existing low-degree polynomial approximations while maintaining HE compatibility and without degrading end-to-end transformer performance, the work would provide a practical contribution to privacy-preserving NLP. Training the approximant on domain-specific embeddings is a sound methodological choice.

major comments (2)

[Abstract] The claim that 'Results show improved approximation fidelity' is unsupported by any numerical metrics, baseline comparisons, error bounds, or evaluation protocols, preventing verification of the central improvement assertion.
[Abstract] The evaluation is described as spanning 'simulated and tokenized data to deep learning and transformer models,' yet no evidence is supplied that the approximation was substituted into a complete transformer forward pass, that multiplicative depth stayed within HE noise budgets, or that downstream task metrics (accuracy, perplexity) were measured after replacement.

minor comments (1)

The abstract would be strengthened by inclusion of at least one concrete fidelity metric or comparison result.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting issues with the abstract. We agree that the abstract requires revision to ensure claims are supported and descriptions are precise. We will update the abstract in the revised manuscript and address the points below.


Referee: [Abstract] The claim that 'Results show improved approximation fidelity' is unsupported by any numerical metrics, baseline comparisons, error bounds, or evaluation protocols, preventing verification of the central improvement assertion.

Authors: We agree the abstract claim is not supported by explicit metrics within the abstract itself. The revised abstract will incorporate quantitative results from our experiments, including approximation error metrics, baseline comparisons to other low-degree polynomials, and a brief description of the evaluation protocol on LLM token embeddings. revision: yes
Referee: [Abstract] The evaluation is described as spanning 'simulated and tokenized data to deep learning and transformer models,' yet no evidence is supplied that the approximation was substituted into a complete transformer forward pass, that multiplicative depth stayed within HE noise budgets, or that downstream task metrics (accuracy, perplexity) were measured after replacement.

Authors: We acknowledge the abstract overstates the evaluation scope. The manuscript evaluates the kernel-based approximation on simulated data, tokenized embeddings, and in standard deep learning settings, but does not demonstrate substitution into a full transformer forward pass under HE constraints or report downstream metrics like accuracy or perplexity after replacement. The revised abstract will accurately reflect the performed experiments without implying complete end-to-end transformer or HE noise budget validation. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical fitting evaluated on held-out fidelity metrics


The paper trains a second-degree polynomial to approximate ReLU on token embeddings drawn from pre-trained LLMs and reports direct fidelity measurements across simulated, tokenized, and transformer scenarios. No derivation chain, equation, or claim reduces by construction to a fitted quantity defined in terms of itself, nor does any load-bearing step rest on a self-citation whose content is unverified. The reported improvement in approximation fidelity is an independent empirical outcome of the training procedure rather than a tautological renaming or self-referential prediction.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The approach rests on fitting a quadratic polynomial to a custom smooth kernel function that mimics ReLU; the fitting process introduces free parameters whose values are not reported in the abstract. The method invokes Jackson's theorem for the existence of low-degree approximations.

free parameters (2)

Kernel function parameters
Parameters defining the smooth kernel-based ReLU mimic are fitted on LLM embeddings.
Polynomial coefficients
Coefficients of the second-degree polynomial are fitted to match the kernel function.

axioms (1)

Jackson's theorem (standard math): guarantees the existence of low-degree polynomial approximations to continuous functions with controlled error.
The paper states the method is inspired by Jackson's theorem to achieve low multiplicative depth.
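The Jackson's theorem entry can be made concrete with a short worked bound. This is an illustrative derivation added here, not one taken from the paper; $C$ denotes an absolute constant whose value depends on which textbook version of the theorem is used, and $\mathcal{P}_n$ is the set of polynomials of degree at most $n$.

```latex
% Jackson-type bound (one standard form), with modulus of continuity \omega:
E_n(f) := \min_{p \in \mathcal{P}_n} \max_{x \in [-1,1]} |f(x) - p(x)|
       \le C\, \omega\!\left(f, \tfrac{1}{n}\right),
\qquad \omega(f,\delta) := \sup_{|x-y| \le \delta} |f(x) - f(y)|.

% ReLU is 1-Lipschitz, so \omega(\mathrm{ReLU}, \delta) \le \delta, hence
E_n(\mathrm{ReLU}; [-1,1]) \le \frac{C}{n}.

% Positive homogeneity, \mathrm{ReLU}(Rt) = R\,\mathrm{ReLU}(t), moves this to [-R, R]:
E_n(\mathrm{ReLU}; [-R,R]) = R\, E_n(\mathrm{ReLU}; [-1,1]) \le \frac{C R}{n},
\qquad \text{so for } n = 2:\quad E_2(\mathrm{ReLU}; [-R,R]) \le \frac{C R}{2}.
```

For a fixed degree such as 2 the bound does not shrink; it is controlled only through the input range $R$, which is one reason a fit restricted to the values embeddings actually take can help. The $1/n$ rate is also the best possible for ReLU: since $\mathrm{ReLU}(x) = (x + |x|)/2$ and $x$ is itself a polynomial, $E_n(\mathrm{ReLU}) = E_n(|x|)/2$, and by a classical result of Bernstein $n\,E_n(|x|)$ tends to a constant of about 0.28.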

pith-pipeline@v0.9.0 · 5750 in / 1156 out tokens · 26272 ms · 2026-05-25T04:09:47.545661+00:00 · methodology


Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · 3 internal anchors

[1] Thomas D. Ahle, Michael Kapralov, Jakob B. T. Knudsen, Rasmus Pagh, Ameya Velingker, David Woodruff, and Amir Zandieh. 2020. Oblivious Sketching of High-Degree Polynomial Kernels. arXiv:1909.01410 [cs.DS]. https://arxiv.org/abs/1909.01410

[2] E. W. Cheney. 1982. Introduction to Approximation Theory. AMS Chelsea Publishing, 201 Charles Street, Providence, RI 02904-2213, USA.

[3] Jung Hee Cheon, Andrey Kim, Miran Kim, and Yongsoo Song. 2016. Homomorphic Encryption for Arithmetic of Approximate Numbers. Cryptology ePrint Archive, Paper 2016/421. https://eprint.iacr.org/2016/421

[4] Edward Chou, Josh Beal, Daniel Levy, Serena Yeung, Albert Haque, and Li Fei-Fei. 2018. Faster CryptoNets: Leveraging Sparsity for Real-World Encrypted Inference. arXiv:1811.09953 [cs.CR]. https://arxiv.org/abs/1811.09953

[5] Li Deng. 2012. The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine 29, 6 (2012), 141–142.

[6] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs.CL]. https://arxiv.org/abs/1810.04805

[7] Norman R. Draper and Harry Smith. 1998. Applied Regression Analysis.

[8] Hugging Face. 2024. Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX. https://huggingface.co/transformers. Accessed: 2025-05-23.

[9] Bengt Fornberg and Julia Zuev. 2007. The Runge phenomenon and spatially variable shape parameters in RBF interpolation. Computers & Mathematics with Applications 54, 3 (2007), 379–398. doi:10.1016/j.camwa.2007.01.028

[10] Ran Gilad-Bachrach, Nathan Dowlin, Kim Laine, Kristin Lauter, Michael Naehrig, and John Wernsing. 2016. CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy. In Proceedings of The 33rd International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 48), Maria Florina Balcan and Kilian Q. ...

[11] Alexandros Karatzoglou, Alex Smola, and Kurt Hornik. 2024. kernlab: Kernel-Based Machine Learning Lab. https://CRAN.R-project.org/package=kernlab. R package version 0.9-33.

[12] Tanveer Khan and Antonis Michalas. 2023. Learning in the Dark: Privacy-Preserving Machine Learning using Function Approximation. In 2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). IEEE Computer Society, Los Alamitos, CA, USA, 62–71. doi:10.1109/TrustCom60117.2023.00031

[13] Alex Krizhevsky. 2009. Learning Multiple Layers of Features from Tiny Images. https://api.semanticscholar.org/CorpusID:18268744

[14] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. Proc. IEEE 86, 11 (1998), 2278–2324.

[15] Vinod Nair and Geoffrey E. Hinton. 2010. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Johannes Fürnkranz and Thorsten Joachims (Eds.). Omnipress, Haifa, Israel, 807–814. http://www.icml2010.org/papers/432.pdf

[16] OpenMined and Zama. 2024. TenSEAL: Homomorphic encryption library for PyTorch tensors. https://github.com/OpenMined/TenSEAL. Accessed: 2025-05-23.

[17] Allan Pinkus. 1985. n-Widths in Approximation Theory. Springer-Verlag, Berlin Heidelberg.

[18] Bernhard Schölkopf and Alexander J. Smola. 2002. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA.

[19] Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, David Yarowsky, Timothy Baldwin, Anna Korhonen, Karen Livescu, and ...

[20] Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. arXiv:1804.07461 [cs.CL]. https://arxiv.org/abs/1804.07461
