Training Machine Learning Models on Encrypted Data: A Privacy-Preserving Framework using Homomorphic Encryption
Pith reviewed 2026-05-08 08:00 UTC · model grok-4.3
The pith
Homomorphic encryption trains KNN and linear regression models on encrypted data with accuracy comparable to plaintext versions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By applying the CKKS scheme for approximate real-number arithmetic, it is possible to train K-nearest neighbors and linear regression models entirely on encrypted data while obtaining performance metrics comparable to plaintext-trained models, and to carry out encrypted inference on a basic multilayer perceptron architecture.
What carries the argument
The CKKS homomorphic encryption scheme, which enables approximate arithmetic operations on encrypted real-valued data so that model training steps can execute without any decryption.
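To make "approximate arithmetic" concrete, the following is a plaintext toy model of the scaled fixed-point encoding that CKKS builds on. Nothing is encrypted here, and `SCALE`, `encode`, `decode`, `multiply`, and `rescale` are illustrative names assumed for this sketch, not the paper's code or any library's API. It only shows why a product carries a squared scale and why rescaling introduces a small rounding error.

```python
# Plaintext toy model of CKKS-style scaled fixed-point arithmetic.
# No encryption happens here; all names and values are illustrative assumptions.

SCALE = 2 ** 20  # hypothetical scaling factor, standing in for the CKKS scale


def encode(x: float, scale: int = SCALE) -> int:
    """Map a real number to an integer by scaling and rounding."""
    return round(x * scale)


def decode(m: int, scale: int = SCALE) -> float:
    """Map a scaled integer back to a real number."""
    return m / scale


def multiply(a: int, b: int) -> int:
    """Multiply two encoded values; the result carries scale ** 2."""
    return a * b


def rescale(m: int, scale: int = SCALE) -> int:
    """Divide by the scale, rounding to nearest, to return to a single scale."""
    return (m + scale // 2) // scale


x, y = 3.14159, -2.71828
product = rescale(multiply(encode(x), encode(y)))
approx = decode(product)
error = abs(approx - x * y)
print(f"exact={x * y:.8f} approx={approx:.8f} error={error:.2e}")
```

In the real scheme the rounding error is joined by encryption noise, and each rescale consumes part of the modulus chain, which is why the number of multiplications matters for the accuracy claim.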
If this is right
- KNN and linear regression models can be trained while the underlying data remains confidential throughout.
- Encrypted inference works for simple neural network structures under the same encryption scheme.
- The approach demonstrates a workable balance between privacy guarantees and usable model quality for selected algorithms.
Where Pith is reading between the lines
- The same encryption technique could support joint model training across organizations that cannot share raw records.
- Scalability to larger datasets or deeper models will depend on further reductions in computational overhead.
- Healthcare or financial applications could adopt this pattern once noise-management and speed constraints are relaxed.
Load-bearing premise
The noise generated by repeated homomorphic operations stays low enough that it does not materially lower final model accuracy relative to plaintext training.
What would settle it
Running the same training task on identical data in both encrypted and plaintext modes and finding that the encrypted model's accuracy falls substantially below the plaintext model's accuracy would disprove the claim of comparable performance.
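The paired experiment described above can be sketched in plain Python. This is a minimal sketch under loud assumptions: the data are synthetic, and the "encrypted" run is imitated by injecting small bounded noise after each multiplication, which is a crude stand-in for CKKS error rather than a model of it. The function names, the noise level `1e-4`, and the hyperparameters are all hypothetical.

```python
import random

# Paired plaintext vs. noise-perturbed training of a one-feature linear model.
# The noise injection is an assumed stand-in for homomorphic error, not real CKKS.


def make_data(n, seed=0):
    """Synthetic data from y = 2x + 0.5 plus Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + 0.5 + rng.gauss(0.0, 0.05) for x in xs]
    return xs, ys


def train(xs, ys, iterations=200, lr=0.2, noise=0.0, seed=1):
    """Batch gradient descent on MSE; `noise` perturbs each product."""
    rng = random.Random(seed)
    w = b = 0.0
    n = len(xs)
    for _ in range(iterations):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            residual = (w * x + rng.uniform(-noise, noise)) + b - y
            grad_w += residual * x + rng.uniform(-noise, noise)
            grad_b += residual
        w -= lr * 2.0 * grad_w / n
        b -= lr * 2.0 * grad_b / n
    return w, b


def mse(w, b, xs, ys):
    """Mean squared error of the model (w, b) on the given points."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs, ys = make_data(100)
train_x, train_y, test_x, test_y = xs[:80], ys[:80], xs[80:], ys[80:]

w_plain, b_plain = train(train_x, train_y, noise=0.0)
w_noisy, b_noisy = train(train_x, train_y, noise=1e-4)

mse_plain = mse(w_plain, b_plain, test_x, test_y)
mse_noisy = mse(w_noisy, b_noisy, test_x, test_y)
print(f"plain MSE={mse_plain:.6f} noisy MSE={mse_noisy:.6f}")
```

A real test would replace the noisy branch with an actual CKKS run on the same split and report the gap between the two test metrics.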
Original abstract
The use of Machine Learning (ML) for data-driven decision-making often relies on access to sensitive datasets, which introduces privacy challenges. Traditional encryption methods protect data at rest or in transit but fail to secure it during processing, exposing it to unauthorized access. Homomorphic encryption emerges as a transformative solution, enabling computations on encrypted data without decryption, thus preserving confidentiality throughout the ML pipeline. This paper addresses the challenge of training ML models on encrypted data while maintaining accuracy and efficiency by proposing a proof-of-concept for a privacy-preserving framework that leverages Cheon-Kim-Kim-Song (CKKS) for approximate real-number arithmetic. Also, it demonstrates the feasibility of training K-Nearest Neighbors (KNN) and linear regression models on encrypted data, and evaluates encrypted inference for a basic Multilayer Perceptron (MLP) architecture. Experimental results show that models trained under Homomorphic encryption achieve performance metrics comparable to plaintext-trained models, validating the approach. However, challenges such as computational overhead, noise management, and limited support for non-polynomial operations persist. This work lays the groundwork for broader adoption of privacy-preserving ML in real-world applications, balancing security with computational feasibility.
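The "limited support for non-polynomial operations" noted in the abstract can be illustrated with a common workaround: replacing an activation function with a low-degree polynomial, which needs only additions and multiplications. The sketch below uses the degree-3 Taylor expansion of the sigmoid around zero. It is an assumption chosen for illustration; the abstract does not say which activation or approximation the MLP experiments used.

```python
import math

# Degree-3 Taylor approximation of the sigmoid around 0.
# Chosen for illustration only; the paper's activation handling is not stated.


def sigmoid(x: float) -> float:
    """Exact logistic function (not directly computable under CKKS)."""
    return 1.0 / (1.0 + math.exp(-x))


def poly_sigmoid(x: float) -> float:
    """Polynomial stand-in: 1/2 + x/4 - x^3/48, using only adds and multiplies."""
    return 0.5 + x / 4.0 - x ** 3 / 48.0


# Worst-case error on [-1, 1], where the expansion is accurate.
grid = [i / 100.0 for i in range(-100, 101)]
max_error = max(abs(sigmoid(x) - poly_sigmoid(x)) for x in grid)

# Error at x = 4, well outside the range where the expansion holds.
wide_error = abs(sigmoid(4.0) - poly_sigmoid(4.0))

print(f"max error on [-1, 1]: {max_error:.4f}; error at x=4: {wide_error:.4f}")
```

The approximation is tight only near zero, so inputs would need to be normalized into a bounded range, and higher-degree polynomials cost additional multiplicative depth.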
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a proof-of-concept privacy-preserving framework that uses the CKKS homomorphic encryption scheme to enable training of K-Nearest Neighbors and linear regression models directly on encrypted data, along with encrypted inference for a basic multilayer perceptron. It reports that the resulting models achieve performance metrics comparable to their plaintext-trained counterparts while acknowledging persistent challenges with computational overhead, noise growth, and non-polynomial operations.
Significance. If the comparable-accuracy claim is substantiated with explicit noise-budget accounting and reproducible experimental parameters, the work would provide a useful baseline demonstration for simple models in the privacy-preserving ML literature. Its current significance is limited by the absence of such accounting, leaving open whether the results generalize beyond toy settings or shallow iteration counts.
Major comments (2)
- [Experimental Results] The experimental evaluation section provides no quantitative details on CKKS parameters (polynomial degree, modulus chain, initial scale) or the number of rescaling/bootstrapping operations performed during iterative training of linear regression. Without this, it is impossible to verify that noise accumulation did not cause the encrypted gradients to deviate from plaintext ones, directly undermining the central claim of comparable test metrics.
- [Linear Regression Training] For the linear regression experiments, the manuscript does not report the number of gradient-descent iterations, the loss-convergence criterion, or a direct comparison of final coefficient vectors (encrypted vs. plaintext). This information is load-bearing for assessing whether CKKS approximate arithmetic preserves the training dynamics.
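The link between the modulus chain and the iteration count raised in these comments can be sketched as simple level accounting. The sketch assumes a leveled scheme in which each rescale consumes one level, and it treats the multiplicative depth per gradient-descent iteration as a free input, since the manuscript does not report it. The function names and the example figures (8 levels, depth 3, 10 iterations) are hypothetical.

```python
import math

# Back-of-the-envelope level accounting for iterative training under a
# leveled scheme. All inputs are assumptions; the paper reports none of them.


def iterations_per_budget(levels: int, depth_per_iteration: int) -> int:
    """How many iterations fit in one modulus chain before a refresh."""
    if depth_per_iteration <= 0:
        raise ValueError("depth_per_iteration must be positive")
    return levels // depth_per_iteration


def refreshes_needed(iterations: int, levels: int, depth_per_iteration: int) -> int:
    """Number of refreshes (bootstrapping or re-encryption) required."""
    per_budget = iterations_per_budget(levels, depth_per_iteration)
    if per_budget == 0:
        raise ValueError("a single iteration exceeds the modulus chain")
    return max(0, math.ceil(iterations / per_budget) - 1)


# Hypothetical example: 8 levels, depth 3 per iteration, 10 iterations.
print(iterations_per_budget(8, 3))   # iterations that fit in one chain
print(refreshes_needed(10, 8, 3))    # refreshes over the whole run
```

Reporting the actual values for these three inputs would let a reader check whether the described training run is feasible without bootstrapping.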
Minor comments (2)
- [Abstract] The abstract states 'comparable performance metrics' without any numerical values or tables; adding at least one quantitative comparison (e.g., accuracy or MSE on a public dataset) would improve clarity.
- [Framework Description] Notation for the CKKS encoding and rescaling steps is introduced without a dedicated preliminaries subsection, making the description of the framework harder to follow for readers unfamiliar with the scheme.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. The feedback correctly identifies gaps in the experimental reporting that limit verification of our claims. We have prepared a major revision that incorporates the requested details on CKKS parameters, training dynamics, and noise accounting to strengthen the presentation of this proof-of-concept work.
Point-by-point responses
- Referee: [Experimental Results] The experimental evaluation section provides no quantitative details on CKKS parameters (polynomial degree, modulus chain, initial scale) or the number of rescaling/bootstrapping operations performed during iterative training of linear regression. Without this, it is impossible to verify that noise accumulation did not cause the encrypted gradients to deviate from plaintext ones, directly undermining the central claim of comparable test metrics.
Authors: We agree that these implementation details were omitted from the original manuscript and that their absence hinders independent verification of noise behavior. In the revised version we will add a new subsection that specifies the exact CKKS parameters (polynomial degree, modulus chain, and initial scale) together with the number of rescaling and bootstrapping operations executed during linear-regression training. We will also supply a concise noise-budget analysis showing that the accumulated noise remained below the decryption threshold throughout the reported iterations, thereby confirming that the observed comparable test metrics are not an artifact of excessive noise growth. revision: yes
- Referee: [Linear Regression Training] For the linear regression experiments, the manuscript does not report the number of gradient-descent iterations, the loss-convergence criterion, or a direct comparison of final coefficient vectors (encrypted vs. plaintext). This information is load-bearing for assessing whether CKKS approximate arithmetic preserves the training dynamics.
Authors: We acknowledge the omission of these training hyperparameters and the missing coefficient-vector comparison. The revised manuscript will explicitly state the number of gradient-descent iterations performed, the loss-convergence criterion employed, and will include a direct numerical comparison of the final coefficient vectors obtained from the encrypted and plaintext runs. This comparison will demonstrate that the vectors differ only within the approximation tolerance of CKKS, thereby supporting the claim that the training dynamics were preserved. revision: yes
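The coefficient-vector comparison promised in this response could take a form like the sketch below. The helper name `compare_coefficients`, the tolerance, and the example vectors are all made up for illustration; they are not results from the paper.

```python
import math

# Compare coefficients from a plaintext run with those decrypted from an
# encrypted run. The vectors below are invented placeholders, not paper data.


def compare_coefficients(plain, decrypted, tolerance):
    """Return max absolute and relative L2 differences, plus a pass flag."""
    if len(plain) != len(decrypted):
        raise ValueError("coefficient vectors must have the same length")
    diffs = [abs(p - d) for p, d in zip(plain, decrypted)]
    max_abs = max(diffs)
    norm = math.sqrt(sum(p * p for p in plain))
    rel_l2 = math.sqrt(sum(d * d for d in diffs)) / norm if norm else float("inf")
    return {
        "max_abs": max_abs,
        "rel_l2": rel_l2,
        "within_tolerance": max_abs <= tolerance,
    }


plain_coeffs = [2.0, -1.0, 0.5]                  # placeholder plaintext result
decrypted_coeffs = [2.00001, -0.99999, 0.50002]  # placeholder decrypted result
report = compare_coefficients(plain_coeffs, decrypted_coeffs, tolerance=1e-4)
print(report)
```

The tolerance would need to be justified from the chosen CKKS scale and the number of operations, rather than picked after seeing the differences.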
Circularity Check
No circularity: empirical comparison of encrypted vs plaintext training is self-contained
Full rationale
The paper's core contribution is a proof-of-concept implementation and experimental evaluation of training KNN and linear regression (plus MLP inference) under CKKS homomorphic encryption, with direct side-by-side metrics against plaintext baselines. The provided text contains no mathematical derivation chain, no parameter fit presented as a prediction, and no load-bearing self-citation. The claim of 'comparable performance metrics' rests on reported experimental runs rather than on any reduction to fitted inputs or definitional equivalence. Noise management and computational overhead are acknowledged as open challenges, not resolved by construction. This is the standard non-finding for an empirical systems paper.
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] The CKKS homomorphic encryption scheme can perform approximate arithmetic operations on encrypted real numbers with manageable noise growth.