
arxiv: 2605.23641 · v1 · pith:3IPOIBR6new · submitted 2026-05-22 · 💻 cs.CR

Kernel-Based ReLU Approximation for Homomorphic Encryption-Compatible Privacy-preserving Deep Learning Models

Pith reviewed 2026-05-25 04:09 UTC · model grok-4.3

classification 💻 cs.CR
keywords: homomorphic encryption, ReLU approximation, privacy-preserving deep learning, kernel-based approximation, polynomial approximation, large language models, secure inference, transformers

The pith

A kernel-based second-degree polynomial approximates ReLU to make it compatible with homomorphic encryption for privacy-preserving deep learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an approximation for the ReLU activation so that non-linear operations can run on encrypted data without decryption. Homomorphic encryption handles only addition and multiplication, which excludes standard ReLU and blocks privacy-preserving use of large language models. The method first creates a smooth kernel function that mimics ReLU behavior, then fits it with a low-degree polynomial trained directly on token embeddings from pre-trained models. Evaluation across simulated data, tokenized inputs, and full transformer setups shows the approximation maintains fidelity while keeping multiplicative depth low. A reader would care because the technique could allow secure inference on sensitive language data in settings where decryption is not permitted.
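The two-stage idea described above (smooth ReLU mimic first, low-degree polynomial fit second) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the kernel function is not specified in this summary, so a softplus curve stands in for it, and Gaussian samples stand in for token embedding values. The names `softplus`, `fit_quadratic`, and `poly_relu` are hypothetical.

```python
import numpy as np

def softplus(x, beta=4.0):
    """Smooth ReLU stand-in. Assumption: the paper's kernel function is not given here."""
    # log(1 + exp(beta * x)) / beta, computed in a numerically stable way
    return np.logaddexp(0.0, beta * x) / beta

def fit_quadratic(x, target):
    """Least-squares fit of c0 + c1*x + c2*x^2 to target; returns (c0, c1, c2)."""
    c2, c1, c0 = np.polyfit(x, target, deg=2)  # polyfit returns highest degree first
    return c0, c1, c2

def poly_relu(x, coeffs):
    """Evaluate the fitted quadratic using only additions and multiplications."""
    c0, c1, c2 = coeffs
    return c0 + x * (c1 + c2 * x)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.5, size=10_000)  # synthetic stand-in for embedding values
coeffs = fit_quadratic(x, softplus(x))  # fit the polynomial to the smooth mimic
mse = float(np.mean((poly_relu(x, coeffs) - np.maximum(x, 0.0)) ** 2))
print("coefficients (c0, c1, c2):", coeffs)
print("MSE against true ReLU:", mse)
```

Fitting on samples drawn from the expected input distribution concentrates the fit where inputs actually fall, which is the motivation the summary gives for training on embeddings.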

Core claim

The central claim is that a kernel-based smooth function mimicking ReLU can be approximated by a second-degree polynomial, inspired by Jackson's theorem, to achieve low multiplicative depth; when trained and tested on token embeddings from pre-trained LLMs, this yields improved approximation fidelity and supports deployment in deep learning and transformer models under HE constraints.

What carries the argument

A kernel-based smooth ReLU mimic fitted by a second-degree polynomial trained on LLM token embeddings to keep multiplicative depth low while preserving fidelity.

If this is right

  • The approximation enables ReLU operations inside HE-constrained inference pipelines for NLP tasks.
  • It supports evaluation and use across tokenized data, deep learning models, and transformer architectures.
  • Low multiplicative depth is maintained, allowing the method to fit within HE limitations.
  • Improved fidelity over prior approximations makes secure and privacy-preserving inference more practical in various tasks.
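To make the multiplicative-depth point concrete, the sketch below tracks depth on a plaintext stand-in for a ciphertext. It is an illustration only: `MockCiphertext` is a hypothetical class, no real encryption takes place, and only ciphertext-by-ciphertext multiplications are counted. In CKKS-style schemes a multiplication by a plaintext constant followed by rescaling may also consume a level, so real depth budgets can be larger than this count.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MockCiphertext:
    """Plaintext stand-in for a ciphertext that records multiplicative depth."""
    value: float
    depth: int = 0

    def __add__(self, other):
        if isinstance(other, MockCiphertext):
            return MockCiphertext(self.value + other.value, max(self.depth, other.depth))
        return MockCiphertext(self.value + other, self.depth)  # plaintext addition

    def __mul__(self, other):
        if isinstance(other, MockCiphertext):
            # ciphertext-ciphertext multiplication consumes one level
            return MockCiphertext(self.value * other.value, max(self.depth, other.depth) + 1)
        return MockCiphertext(self.value * other, self.depth)  # plaintext scalar, not counted here

def quadratic(x, c0, c1, c2):
    """c0 + c1*x + c2*x^2 with a single ciphertext-ciphertext product."""
    return (x * x) * c2 + x * c1 + c0

def quartic_term(x):
    """x^4 by repeated squaring, needing two sequential products."""
    x2 = x * x
    return x2 * x2

x = MockCiphertext(0.7)
print(quadratic(x, 0.1, 0.5, 0.4).depth)  # 1
print(quartic_term(x).depth)              # 2
```

Under this accounting a second-degree polynomial needs one sequential multiplication per activation, while each doubling of the degree adds at least one more.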

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same kernel-plus-polynomial pattern could be tested on other activation functions that currently block HE deployment.
  • If the embedding-trained fit generalizes across model scales, it could reduce accuracy loss when moving from small test models to production LLMs.
  • One could measure whether the approximation still holds when the surrounding layers are also replaced by their HE-compatible versions rather than kept in plaintext.

Load-bearing premise

The polynomial approximation trained on token embeddings will preserve enough end-to-end performance when placed inside full transformer architectures under actual HE constraints.

What would settle it

Insert the approximated ReLU into a complete transformer, run inference on encrypted data, and check whether task accuracy drops below the level achieved by the original plaintext model.
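A plaintext proxy for this check can be sketched as below. It is far weaker than the test described above: it uses a tiny randomly initialised dense network rather than a trained transformer, runs no encryption at all, and measures only how often predictions agree after the activation is swapped. The polynomial coefficients and all function names are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def make_poly_activation(c0, c1, c2):
    """Quadratic activation built from additions and multiplications only."""
    return lambda x: c0 + x * (c1 + c2 * x)

def mlp_forward(x, weights, activation):
    """Dense forward pass; the activation is applied on every layer except the last."""
    h = x
    for i, (w, b) in enumerate(weights):
        h = h @ w + b
        if i < len(weights) - 1:
            h = activation(h)
    return h

def prediction_agreement(x, weights, act_a, act_b):
    """Fraction of inputs whose argmax prediction is the same under both activations."""
    pred_a = mlp_forward(x, weights, act_a).argmax(axis=1)
    pred_b = mlp_forward(x, weights, act_b).argmax(axis=1)
    return float(np.mean(pred_a == pred_b))

rng = np.random.default_rng(0)
dims = [16, 32, 4]  # input, hidden, output sizes (arbitrary)
weights = [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
           for m, n in zip(dims[:-1], dims[1:])]
x = rng.normal(size=(256, 16))
poly = make_poly_activation(0.2, 0.5, 0.2)  # hypothetical coefficients, not from the paper
agreement = prediction_agreement(x, weights, relu, poly)
print("prediction agreement:", agreement)
```

A faithful version of the experiment would replace the random network with a trained model, the agreement score with the task metric, and the plaintext arithmetic with an HE library.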

Figures

Figures reproduced from arXiv: 2605.23641 by Dimitra Papatsaroucha, Dimitrios Sygletos, Evangelos K. Markakis, Ilias Politis, Marios Choudetsanakis.

Figure 1: Actual ReLU vs X Squared.
Figure 2: Actual ReLU vs Kernel Approx. Surrounding text from Section 4.2 (Input Data Selection, Polynomial Regression & Optimal Degree Selection): To approximate the output of the kernel function, a polynomial regression model (5) was trained on the kernel function predictions, where $n$ is the degree of the polynomial:
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_n x^n + \epsilon$ (5)
Regarding input data, embeddings were extracted from RoBERTa and DistilBERT using the …
Figure 3: Actual ReLU vs Approximations. Surrounding text from Section 5.4.3 (Experiment 3: Deep Learning Models Performance): The proposed Kernel Polynomial method was assessed alongside the established polynomials from the literature, which were also included in the previous experiments, to determine if model accuracy is maintained when the proposed solution is implemented.
Original abstract

As privacy concerns in AI technologies continue to grow, Homomorphic Encryption (HE) offers a way to perform computations on encrypted data without the need of decryption during operations. However, HE is limited to addition and multiplication, making non-linear functions incompatible in their original form. This limitation has become more critical with the widespread use of Large Language Models (LLMs), where the non-linearity of activation functions such as the Rectified Linear Unit (ReLU) poses challenges for deployment in privacy-preserving Natural Language Processing (NLP) settings. This paper proposes a kernel-based approximation of ReLU, enabling its use within HE-constrained settings and thus contributing a critical step toward supporting privacy-preserving LLMs. A smooth kernel-based function, mimicking ReLU, is approximated using a second-degree polynomial, inspired by Jackson's theorem, to achieve low multiplicative depth. The proposed method is trained and assessed directly on token embeddings from pre-trained LLMs and evaluated in various scenarios, from simulated and tokenized data to deep learning and transformer models. Results show improved approximation fidelity, supporting the method's suitability for secure and privacy-preserving inference in various tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a kernel-based second-degree polynomial approximation to ReLU, inspired by Jackson's theorem, for compatibility with homomorphic encryption. The method is trained directly on token embeddings from pre-trained LLMs and is claimed to achieve improved approximation fidelity that supports its use for privacy-preserving inference in deep learning and transformer models.

Significance. If the quantitative results confirm higher fidelity than existing low-degree polynomial approximations while maintaining HE compatibility and without degrading end-to-end transformer performance, the work would provide a practical contribution to privacy-preserving NLP. Training the approximant on domain-specific embeddings is a sound methodological choice.

major comments (2)
  1. [Abstract] The claim that 'Results show improved approximation fidelity' is unsupported by any numerical metrics, baseline comparisons, error bounds, or evaluation protocols, preventing verification of the central improvement assertion.
  2. [Abstract] The evaluation is described as spanning 'simulated and tokenized data to deep learning and transformer models,' yet no evidence is supplied that the approximation was substituted into a complete transformer forward pass, that multiplicative depth stayed within HE noise budgets, or that downstream task metrics (accuracy, perplexity) were measured after replacement.
minor comments (1)
  1. The abstract would be strengthened by inclusion of at least one concrete fidelity metric or comparison result.
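The kind of fidelity metric the referee asks for could be computed along these lines. This is a sketch only: the uniform input range, the choice of metrics, and the helper name `fidelity_report` are assumptions, and the numbers it prints are not results from the paper. The x-squared baseline is taken from the Figure 1 caption; the second candidate is an ordinary least-squares quadratic fitted directly to ReLU.

```python
import numpy as np

def fidelity_report(x, approx):
    """Error of an approximant against true ReLU on the sample x."""
    err = approx(x) - np.maximum(x, 0.0)
    return {
        "mse": float(np.mean(err ** 2)),
        "mae": float(np.mean(np.abs(err))),
        "max_abs": float(np.max(np.abs(err))),
    }

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=5_000)  # assumed evaluation range

# Least-squares quadratic fitted directly to ReLU on the same sample
c2, c1, c0 = np.polyfit(x, np.maximum(x, 0.0), deg=2)

candidates = {
    "x_squared": lambda v: v ** 2,
    "ls_quadratic": lambda v: c0 + c1 * v + c2 * v ** 2,
}
reports = {name: fidelity_report(x, f) for name, f in candidates.items()}
for name, report in reports.items():
    print(name, report)
```

Reporting such metrics on held-out inputs, rather than on the fitting sample used here, would be needed to support a claim of improved fidelity.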

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting issues with the abstract. We agree that the abstract requires revision to ensure claims are supported and descriptions are precise. We will update the abstract in the revised manuscript and address the points below.

Point-by-point responses
  1. Referee: [Abstract] The claim that 'Results show improved approximation fidelity' is unsupported by any numerical metrics, baseline comparisons, error bounds, or evaluation protocols, preventing verification of the central improvement assertion.

    Authors: We agree the abstract claim is not supported by explicit metrics within the abstract itself. The revised abstract will incorporate quantitative results from our experiments, including approximation error metrics, baseline comparisons to other low-degree polynomials, and a brief description of the evaluation protocol on LLM token embeddings. revision: yes

  2. Referee: [Abstract] The evaluation is described as spanning 'simulated and tokenized data to deep learning and transformer models,' yet no evidence is supplied that the approximation was substituted into a complete transformer forward pass, that multiplicative depth stayed within HE noise budgets, or that downstream task metrics (accuracy, perplexity) were measured after replacement.

    Authors: We acknowledge the abstract overstates the evaluation scope. The manuscript evaluates the kernel-based approximation on simulated data, tokenized embeddings, and in standard deep learning settings, but does not demonstrate substitution into a full transformer forward pass under HE constraints or report downstream metrics like accuracy or perplexity after replacement. The revised abstract will accurately reflect the performed experiments without implying complete end-to-end transformer or HE noise budget validation. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical fitting evaluated on held-out fidelity metrics

Full rationale

The paper trains a second-degree polynomial to approximate ReLU on token embeddings drawn from pre-trained LLMs and reports direct fidelity measurements across simulated, tokenized, and transformer scenarios. No derivation chain, equation, or claim reduces by construction to a fitted quantity defined in terms of itself, nor does any load-bearing step rest on a self-citation whose content is unverified. The reported improvement in approximation fidelity is an independent empirical outcome of the training procedure rather than a tautological renaming or self-referential prediction.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The approach rests on fitting a quadratic polynomial to a custom smooth kernel function that mimics ReLU; the fitting process introduces free parameters whose values are not reported in the abstract. The method invokes Jackson's theorem for the existence of low-degree approximations.

free parameters (2)
  • Kernel function parameters
    Parameters defining the smooth kernel-based ReLU mimic are fitted on LLM embeddings.
  • Polynomial coefficients
    Coefficients of the second-degree polynomial are fitted to match the kernel function.
axioms (1)
  • [standard math] Jackson's theorem bounds the error of the best uniform polynomial approximation to a continuous function in terms of the polynomial degree and the function's modulus of continuity
    The paper states the method is inspired by Jackson's theorem to achieve low multiplicative depth.
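
For reference, one standard form of Jackson's theorem on the interval [-1, 1] is written out below, together with what it implies for ReLU. The exact variant the paper relies on is not given in this summary, so the interval and the notation ($E_n$, $\omega_f$, the absolute constant $C$) are assumptions made for illustration.

```latex
% E_n(f): error of the best uniform approximation of f by polynomials of degree <= n
E_n(f) \;=\; \min_{p \in \mathcal{P}_n} \, \max_{x \in [-1,1]} \lvert f(x) - p(x) \rvert
\;\le\; C \, \omega_f\!\left(\tfrac{1}{n}\right),
\qquad
\omega_f(\delta) \;=\; \sup_{\lvert x - y \rvert \le \delta} \lvert f(x) - f(y) \rvert .

% ReLU is Lipschitz with constant 1, so \omega_{\mathrm{ReLU}}(\delta) \le \delta and
E_n(\mathrm{ReLU}) \;\le\; \frac{C}{n}, \qquad n \ge 1 .
```

The bound shrinks only linearly in the degree for a Lipschitz function such as ReLU, which is one reason a smoother target function can be easier to fit at low degree.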

pith-pipeline@v0.9.0 · 5750 in / 1156 out tokens · 26272 ms · 2026-05-25T04:09:47.545661+00:00 · methodology

