Kernel-Based ReLU Approximation for Homomorphic Encryption-Compatible Privacy-preserving Deep Learning Models
Pith reviewed 2026-05-25 04:09 UTC · model grok-4.3
The pith
A kernel-based second-degree polynomial approximates ReLU to make it compatible with homomorphic encryption for privacy-preserving deep learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a kernel-based smooth function mimicking ReLU can be approximated by a second-degree polynomial, inspired by Jackson's theorem, to achieve low multiplicative depth; when trained and tested on token embeddings from pre-trained LLMs, this yields improved approximation fidelity and supports deployment in deep learning and transformer models under HE constraints.
What carries the argument
A kernel-based smooth ReLU mimic fitted by a second-degree polynomial trained on LLM token embeddings to keep multiplicative depth low while preserving fidelity.
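To make the mechanism concrete, the following is a minimal sketch of fitting a second-degree polynomial to a smoothed ReLU by least squares. It is not the paper's procedure: the Gaussian kernel, the bandwidth `h`, the helper names `smooth_relu` and `fit_quadratic`, and the standard-normal samples standing in for token-embedding values are all assumptions made for illustration.

```python
import math
import numpy as np

def smooth_relu(x, h=0.5):
    """Gaussian-kernel-smoothed ReLU: E[max(x + h*Z, 0)] with Z ~ N(0, 1).

    Closed form: x * Phi(x/h) + h * phi(x/h). The kernel choice is an assumption.
    """
    x = np.asarray(x, dtype=float)
    z = x / h
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return x * cdf + h * pdf

def fit_quadratic(samples, h=0.5):
    """Least-squares fit of c0 + c1*x + c2*x^2 to the smoothed ReLU."""
    target = smooth_relu(samples, h)
    # np.polyfit returns the highest degree first; reverse to (c0, c1, c2).
    return np.polyfit(samples, target, deg=2)[::-1]

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=10_000)  # hypothetical stand-in for embedding values
c0, c1, c2 = fit_quadratic(samples)
print(f"c0={c0:.3f}, c1={c1:.3f}, c2={c2:.3f}")
```

Because the smoothed ReLU equals x/2 plus an even function, the fitted linear coefficient should land near 0.5 whenever the sample distribution is roughly symmetric around zero; the other two coefficients depend on the spread of the samples and on `h`.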
If this is right
- The approximation enables ReLU operations inside HE-constrained inference pipelines for NLP tasks.
- It supports evaluation and use across tokenized data, deep learning models, and transformer architectures.
- Low multiplicative depth is maintained, allowing the method to fit within HE limitations.
- Improved fidelity over prior approximations makes secure and privacy-preserving inference more practical in various tasks.
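The low-depth point in the list above can be illustrated with a small plaintext bookkeeping sketch. The `Tracked` class and `poly2` helper are hypothetical and perform no encryption; they only count ciphertext-by-ciphertext multiplications along the evaluation path. The sketch assumes that multiplying by a plaintext constant costs no level, which does not hold in every scheme (in CKKS-style schemes, for instance, rescaling after a plaintext multiplication can also consume a level).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tracked:
    """Plaintext value tagged with the ciphertext multiplicative depth used so far."""
    value: float
    depth: int = 0

    def __add__(self, other):
        if isinstance(other, Tracked):
            return Tracked(self.value + other.value, max(self.depth, other.depth))
        return Tracked(self.value + other, self.depth)  # add a plaintext constant

    __radd__ = __add__

    def __mul__(self, other):
        if isinstance(other, Tracked):
            # A ciphertext-ciphertext product consumes one level.
            return Tracked(self.value * other.value, max(self.depth, other.depth) + 1)
        # Plaintext scaling is assumed free here (an assumption, see lead-in).
        return Tracked(self.value * other, self.depth)

    __rmul__ = __mul__

def poly2(x, c0, c1, c2):
    """Horner form c0 + x*(c1 + c2*x): one ciphertext-ciphertext product."""
    return c0 + x * (c1 + c2 * x)

out = poly2(Tracked(2.0), 0.25, 0.5, 0.2)  # placeholder coefficients
print(out.value, out.depth)
```

Under these assumptions a second-degree activation spends one level per layer, so the total budget grows with the number of activation layers rather than with the polynomial degree.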
Where Pith is reading between the lines
- The same kernel-plus-polynomial pattern could be tested on other activation functions that currently block HE deployment.
- If the embedding-trained fit generalizes across model scales, it could reduce accuracy loss when moving from small test models to production LLMs.
- One could measure whether the approximation still holds when the surrounding layers are also replaced by their HE-compatible versions rather than kept in plaintext.
Load-bearing premise
The polynomial approximation trained on token embeddings will preserve enough end-to-end performance when placed inside full transformer architectures under actual HE constraints.
What would settle it
Insert the approximated ReLU into a complete transformer, run inference on encrypted data, and check whether task accuracy drops below the level achieved by the original plaintext model.
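A cheap plaintext proxy for this experiment can be sketched as follows. It is not the test described above: nothing is encrypted, the network is a small randomly initialised MLP rather than a transformer, and it measures prediction agreement between the two activations rather than task accuracy. The layer sizes, the coefficients, and the helper names are all hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def make_poly_act(c0, c1, c2):
    """Second-degree polynomial activation with the given coefficients."""
    return lambda x: c0 + c1 * x + c2 * x**2

def mlp_predict(x, weights, act):
    """Forward pass of a small MLP; `act` is applied after every hidden layer."""
    h = x
    for w, b in weights[:-1]:
        h = act(h @ w + b)
    w, b = weights[-1]
    return np.argmax(h @ w + b, axis=1)

def agreement(x, weights, act_a, act_b):
    """Fraction of inputs on which the two activations give the same class."""
    preds_a = mlp_predict(x, weights, act_a)
    preds_b = mlp_predict(x, weights, act_b)
    return float(np.mean(preds_a == preds_b))

rng = np.random.default_rng(0)
dims = [16, 32, 32, 4]  # hypothetical layer sizes
weights = [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
           for m, n in zip(dims[:-1], dims[1:])]
x = rng.normal(size=(500, 16))
poly = make_poly_act(0.25, 0.5, 0.2)  # placeholder coefficients, not fitted values
score = agreement(x, weights, relu, poly)
print(f"prediction agreement: {score:.3f}")
```

A high agreement here would be necessary but not sufficient: the settling experiment still needs a trained transformer, labelled data, and inference under an actual HE scheme.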
Original abstract
As privacy concerns in AI technologies continue to grow, Homomorphic Encryption (HE) offers a way to perform computations on encrypted data without the need for decryption during operations. However, HE is limited to addition and multiplication, making non-linear functions incompatible in their original form. This limitation has become more critical with the widespread use of Large Language Models (LLMs), where the non-linearity of activation functions such as the Rectified Linear Unit (ReLU) poses challenges for deployment in privacy-preserving Natural Language Processing (NLP) settings. This paper proposes a kernel-based approximation of ReLU, enabling its use within HE-constrained settings and thus contributing a critical step toward supporting privacy-preserving LLMs. A smooth kernel-based function, mimicking ReLU, is approximated using a second-degree polynomial, inspired by Jackson's theorem, to achieve low multiplicative depth. The proposed method is trained and assessed directly on token embeddings from pre-trained LLMs and evaluated in various scenarios, from simulated and tokenized data to deep learning and transformer models. Results show improved approximation fidelity, supporting the method's suitability for secure and privacy-preserving inference in various tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a kernel-based second-degree polynomial approximation to ReLU, inspired by Jackson's theorem, for compatibility with homomorphic encryption. The method is trained directly on token embeddings from pre-trained LLMs and is claimed to achieve improved approximation fidelity that supports its use for privacy-preserving inference in deep learning and transformer models.
Significance. If the quantitative results confirm higher fidelity than existing low-degree polynomial approximations while maintaining HE compatibility and without degrading end-to-end transformer performance, the work would provide a practical contribution to privacy-preserving NLP. Training the approximant on domain-specific embeddings is a sound methodological choice.
major comments (2)
- [Abstract] The claim that 'Results show improved approximation fidelity' is unsupported by any numerical metrics, baseline comparisons, error bounds, or evaluation protocols, preventing verification of the central improvement assertion.
- [Abstract] The evaluation is described as spanning 'simulated and tokenized data to deep learning and transformer models,' yet no evidence is supplied that the approximation was substituted into a complete transformer forward pass, that multiplicative depth stayed within HE noise budgets, or that downstream task metrics (accuracy, perplexity) were measured after replacement.
minor comments (1)
- The abstract would be strengthened by inclusion of at least one concrete fidelity metric or comparison result.
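The kind of evidence the comments above ask for can be sketched as a held-out fidelity report. This is an illustration only: the samples are synthetic standard-normal values rather than LLM token embeddings, the quadratic is fitted to ReLU directly rather than to a kernel-smoothed target, and the baseline is the plain square activation x^2 used in CryptoNets-style networks. The `fidelity` helper and the split sizes are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def fidelity(approx, x):
    """Mean squared error and worst-case absolute error against ReLU on x."""
    err = approx(x) - relu(x)
    return {"mse": float(np.mean(err**2)), "max_abs": float(np.max(np.abs(err)))}

rng = np.random.default_rng(1)
values = rng.normal(size=20_000)             # synthetic stand-in for embedding values
train, test = values[:10_000], values[10_000:]

coeffs = np.polyfit(train, relu(train), deg=2)   # fit on the training half only
fitted = lambda x: np.polyval(coeffs, x)
square = lambda x: x**2                          # square-activation baseline

report = {"fitted": fidelity(fitted, test), "square": fidelity(square, test)}
for name, metrics in report.items():
    print(name, metrics)
```

Reporting both a mean and a worst-case error matters here, because a fit that looks good on average can still diverge on rare large-magnitude inputs, where a quadratic grows much faster than ReLU.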
Simulated Author's Rebuttal
We thank the referee for the constructive feedback highlighting issues with the abstract. We agree that the abstract requires revision to ensure claims are supported and descriptions are precise. We will update the abstract in the revised manuscript and address the points below.
Point-by-point responses
- Referee: [Abstract] The claim that 'Results show improved approximation fidelity' is unsupported by any numerical metrics, baseline comparisons, error bounds, or evaluation protocols, preventing verification of the central improvement assertion.
  Authors: We agree the abstract claim is not supported by explicit metrics within the abstract itself. The revised abstract will incorporate quantitative results from our experiments, including approximation error metrics, baseline comparisons to other low-degree polynomials, and a brief description of the evaluation protocol on LLM token embeddings.
  Revision: yes
- Referee: [Abstract] The evaluation is described as spanning 'simulated and tokenized data to deep learning and transformer models,' yet no evidence is supplied that the approximation was substituted into a complete transformer forward pass, that multiplicative depth stayed within HE noise budgets, or that downstream task metrics (accuracy, perplexity) were measured after replacement.
  Authors: We acknowledge the abstract overstates the evaluation scope. The manuscript evaluates the kernel-based approximation on simulated data, tokenized embeddings, and in standard deep learning settings, but does not demonstrate substitution into a full transformer forward pass under HE constraints or report downstream metrics like accuracy or perplexity after replacement. The revised abstract will accurately reflect the performed experiments without implying complete end-to-end transformer or HE noise budget validation.
  Revision: yes
Circularity Check
No significant circularity; empirical fitting evaluated on held-out fidelity metrics
The paper trains a second-degree polynomial to approximate ReLU on token embeddings drawn from pre-trained LLMs and reports direct fidelity measurements across simulated, tokenized, and transformer scenarios. No derivation chain, equation, or claim reduces by construction to a fitted quantity defined in terms of itself, nor does any load-bearing step rest on a self-citation whose content is unverified. The reported improvement in approximation fidelity is an independent empirical outcome of the training procedure rather than a tautological renaming or self-referential prediction.
Axiom & Free-Parameter Ledger
free parameters (2)
- Kernel function parameters
- Polynomial coefficients
axioms (1)
- [standard math] Jackson's theorem bounds the error of the best degree-n polynomial approximation to a continuous function on a closed interval in terms of the function's modulus of continuity.
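For reference, one common form of the Jackson-type bound is sketched below. The notation (E_n, ω, the constant C, and the interval half-width R) is introduced here for illustration and is not taken from the paper.

```latex
% Best uniform approximation error by polynomials of degree at most n
E_n(f) \;=\; \inf_{p \in \mathcal{P}_n} \; \max_{x \in [-1,1]} \lvert f(x) - p(x) \rvert ,
\qquad
\omega(f,\delta) \;=\; \sup_{\lvert x - y \rvert \le \delta} \lvert f(x) - f(y) \rvert .

% Jackson-type bound for continuous f on [-1,1], with an absolute constant C
E_n(f) \;\le\; C \, \omega\!\left(f, \tfrac{1}{n}\right).

% ReLU restricted to [-R, R] and rescaled to [-1, 1] is Lipschitz with constant R, so
E_n(\mathrm{ReLU}) \;\le\; \frac{C R}{n}.
```

The bound tightens as the degree n grows. At the fixed degree n = 2 used here it gives only a loose guarantee that scales with the input range R, which is one reason to fit the coefficients to the observed input distribution rather than rely on the worst-case bound.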