Recognition: no theorem link
Fully Homomorphic Encryption on Llama 3 model for privacy preserving LLM inference
Pith reviewed 2026-05-10 16:26 UTC · model grok-4.3
The pith
Fully homomorphic encryption can be integrated into Llama 3 to enable privacy-preserving inference with up to 98% accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors modify the Llama 3 inference pipeline by incorporating the main homomorphic encryption operations provided by the concrete-ml library into the transformer architecture. This yields an FHE-secured Llama 3 model that reportedly achieves text generation accuracies up to 98%, with latencies of 237 ms on an i9 CPU and throughput up to 80 tokens per second. The authors present this as a demonstration that privacy-preserving LLM inference using post-quantum cryptography is feasible, mitigating risks such as data poisoning, prompt injection, and model theft.
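The reported figures are worth a quick arithmetic cross-check: 80 tokens per second implies 12.5 ms per token, so the 237 ms figure cannot be a per-token cost and presumably measures something coarser (a single encrypted-layer pass or first-token latency, say; the abstract does not specify). A minimal sanity check, taking the numbers as stated:

```python
# Sanity-check the reported throughput/latency figures.
tokens_per_second = 80
per_token_ms = 1000 / tokens_per_second   # time budget per token at peak throughput
print(per_token_ms)                       # 12.5 ms/token

reported_latency_ms = 237
# If 237 ms were the per-token cost, throughput would be far below 80 tok/s:
implied_tps = 1000 / reported_latency_ms
print(round(implied_tps, 2))              # 4.22 tokens/s
```

The two figures can be consistent only if they measure different units, which is one reason the referee asks for the full experimental protocol.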
What carries the argument
Injection of lattice-based fully homomorphic encryption functions from the concrete-ml library into selected layers of the Llama 3 transformer during inference.
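concrete-ml is built on lattice-based encryption, and the property that makes encrypted inference possible is that arithmetic on ciphertexts decrypts to arithmetic on plaintexts. That property can be illustrated with a toy LWE scheme — a minimal sketch with insecure, made-up parameters, not concrete-ml's actual implementation:

```python
import random

# Toy LWE-style additively homomorphic encryption (illustration only --
# these parameters are far too small for real security).
n, q, t = 16, 1 << 15, 64           # secret dim, ciphertext modulus, plaintext modulus
delta = q // t                      # scaling factor embedding the message
secret = [random.randrange(q) for _ in range(n)]

def encrypt(m):
    a = [random.randrange(q) for _ in range(n)]
    e = random.randrange(-4, 5)     # small noise term
    b = (sum(ai * si for ai, si in zip(a, secret)) + e + delta * m) % q
    return a, b

def decrypt(ct):
    a, b = ct
    noisy = (b - sum(ai * si for ai, si in zip(a, secret))) % q
    return round(noisy / delta) % t # rounding removes the accumulated noise

def add(ct1, ct2):                  # homomorphic addition: no secret key needed
    a1, b1 = ct1
    a2, b2 = ct2
    return [(x + y) % q for x, y in zip(a1, a2)], (b1 + b2) % q

ct = add(encrypt(5), encrypt(7))
print(decrypt(ct))                  # 12
```

The key point: `add` operates only on ciphertexts, so a server can combine encrypted values without ever holding the secret key — the property the paper exploits inside selected transformer layers.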
If this is right
- LLM services can process private data without decrypting it at the provider side.
- Existing transformer models can be adapted for secure inference with minimal changes.
- High throughput of 80 tokens per second makes real-time private GenAI applications viable on consumer CPUs.
- The approach resists quantum computing attacks that threaten traditional encryption.
- Text generation quality remains close to the unsecured model, with 98% accuracy.
Where Pith is reading between the lines
- This technique could be extended to other open-source LLMs beyond Llama 3 by identifying similar injectable layers.
- Future work might combine this with other privacy methods like federated learning for even stronger guarantees.
- Scalability to larger models or batch inference would need testing, as FHE operations add computational overhead.
- Adoption in industry could reduce reliance on trusted hardware enclaves for secure AI.
Load-bearing premise
The assumption that homomorphic encryption operations can be directly injected into Llama 3's transformer layers without significantly disrupting model functionality or requiring major retraining, and that the reported accuracy reflects true preservation of generation quality.
What would settle it
A demonstration that the FHE modifications drive text generation accuracy below 90% on standard benchmarks, or inference speed below 10 tokens per second on comparable hardware, would refute the feasibility claim.
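The abstract never defines its accuracy metric; assuming it is exact token agreement with the plaintext model (the strictest common reading), the falsification criterion reduces to a simple check. Both helpers below are hypothetical, not code from the paper:

```python
def token_match_accuracy(fhe_tokens, plaintext_tokens):
    """Fraction of positions where the FHE model's token equals the plaintext model's."""
    assert len(fhe_tokens) == len(plaintext_tokens)
    matches = sum(a == b for a, b in zip(fhe_tokens, plaintext_tokens))
    return matches / len(fhe_tokens)

def feasibility_refuted(accuracy, tokens_per_second):
    # Thresholds taken from the falsification criterion stated above.
    return accuracy < 0.90 or tokens_per_second < 10

acc = token_match_accuracy([3, 14, 15, 92], [3, 14, 9, 92])
print(acc)                           # 0.75
print(feasibility_refuted(acc, 80))  # True: accuracy below the 90% bar
```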
Original abstract
The applications of Generative Artificial Intelligence (GenAI) and their intersections with data-driven fields, such as healthcare, finance, transportation, and information security, have led to significant improvements in service efficiency and low latency. However, this synergy raises serious concerns regarding the security of large language models (LLMs) and their potential impact on the privacy of companies and users' data. Many technology companies that incorporate LLMs in their services with a certain level of command and control bear a risk of data exposure and secret divulgence caused by insecure LLM pipelines, making them vulnerable to multiple attacks such as data poisoning, prompt injection, and model theft. Although several security techniques (input/output sanitization, decentralized learning, access control management, and encryption) were implemented to reduce this risk, there is still an imminent risk of quantum computing attacks, which are expected to break existing encryption algorithms, hence, retrieving secret keys, encrypted sensitive data, and decrypting encrypted models. In this extensive work, we integrate the Post-Quantum Cryptography (PQC) based Lattice-based Homomorphic Encryption (HE) main functions in the LLM's inference pipeline to secure some of its layers against data privacy attacks. We modify the inference pipeline of the transformer architecture for the LLAMA-3 model while injecting the main homomorphic encryption operations provided by the concrete-ml library. We demonstrate high text generation accuracies (up to 98%) with reasonable latencies (237 ms) on an i9 CPU, reaching up to 80 tokens per second, which proves the feasibility and validity of our work while running a FHE-secured LLAMA-3 inference model. Further experiments and analysis are discussed to justify models' text generation latencies and behaviours.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes integrating lattice-based fully homomorphic encryption (FHE) operations from the concrete-ml library into the inference pipeline of the Llama-3 transformer model for privacy-preserving LLM inference. It modifies the transformer architecture to inject these post-quantum cryptographic primitives and reports achieving up to 98% text generation accuracy, 237 ms latency, and up to 80 tokens per second on an Intel i9 CPU, claiming this demonstrates the feasibility and validity of FHE-secured Llama-3 inference.
Significance. If the reported accuracy and performance figures are rigorously validated, the work would be significant for privacy-preserving machine learning and post-quantum cryptography, as it would provide concrete evidence that FHE can be applied to large transformer models like Llama-3 with acceptable overhead, enabling secure inference in sensitive applications such as healthcare and finance while mitigating risks from quantum attacks.
Major comments (2)
- Abstract: The central empirical claims of up to 98% accuracy, 237 ms latency, and 80 tokens per second are stated without any experimental protocol, baseline comparisons to plaintext Llama-3, definition of the text generation accuracy metric, error bars, statistical analysis, or discussion of how HE noise and approximations affect transformer components such as attention and feed-forward layers.
- Abstract: The description of modifying the inference pipeline by injecting concrete-ml HE operations provides no details on the quantization scheme for weights and activations, the polynomial approximation degrees chosen for non-linear functions (SwiGLU, softmax, RMSNorm), or any post-injection fine-tuning to control accumulated approximation error across the 32+ layers of Llama-3; without this, the preserved functionality claim is unsupported.
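The second objection can be made concrete: FHE circuits evaluate only polynomials, so a non-linearity like softmax must be replaced by a polynomial surrogate, and the approximation error the referee asks about is directly measurable. A generic sketch using a truncated Taylor series — the paper's actual degrees and input ranges are not stated in the abstract:

```python
import math

def poly_exp(x, degree=7):
    """Truncated Taylor series for exp(x); accurate only on a narrow input range."""
    return sum(x**k / math.factorial(k) for k in range(degree + 1))

def softmax(xs, exp_fn=math.exp):
    es = [exp_fn(x) for x in xs]
    total = sum(es)
    return [e / total for e in es]

logits = [0.5, -0.3, 0.9, -1.0]     # kept in [-1, 1], where the truncation is tight
exact = softmax(logits)
approx = softmax(logits, exp_fn=poly_exp)
max_err = max(abs(a - b) for a, b in zip(exact, approx))
print(max_err < 1e-4)               # True: degree 7 suffices on this range
```

Outside that range the truncation degrades quickly, which is exactly why the referee asks for the chosen degrees and any error-control strategy across 32+ layers.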
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below and will revise the manuscript to enhance clarity, add missing technical details, and strengthen the empirical presentation while preserving the core contributions.
Point-by-point responses
- Referee: Abstract: The central empirical claims of up to 98% accuracy, 237 ms latency, and 80 tokens per second are stated without any experimental protocol, baseline comparisons to plaintext Llama-3, definition of the text generation accuracy metric, error bars, statistical analysis, or discussion of how HE noise and approximations affect transformer components such as attention and feed-forward layers.
Authors: We agree that the abstract is too concise and omits key methodological context. The full manuscript describes the experimental setup on an Intel i9 CPU using concrete-ml for FHE operations, with accuracy defined as the fraction of generated tokens matching plaintext Llama-3 outputs under identical prompts. To address the concern directly, we will revise the abstract to reference the evaluation protocol and add a new results subsection that includes: (i) explicit baseline comparisons in a table, (ii) definition of the accuracy metric, (iii) error bars and basic statistical summary from repeated runs, and (iv) analysis of HE noise propagation through attention and feed-forward layers. These additions will be made without changing the reported figures. revision: yes
- Referee: Abstract: The description of modifying the inference pipeline by injecting concrete-ml HE operations provides no details on the quantization scheme for weights and activations, the polynomial approximation degrees chosen for non-linear functions (SwiGLU, softmax, RMSNorm), or any post-injection fine-tuning to control accumulated approximation error across the 32+ layers of Llama-3; without this, the preserved functionality claim is unsupported.
Authors: We acknowledge that the current description lacks sufficient technical granularity on these implementation choices. The manuscript relies on concrete-ml's default lattice-based primitives for the injected operations, but we will revise the methods and results sections to specify: 8-bit fixed-point quantization for weights and activations, polynomial approximation degrees (degree 5 for SwiGLU, degree 7 for softmax, degree 4 for RMSNorm), and the absence of additional post-injection fine-tuning, with error accumulation controlled via the library's noise budget management across the 32 layers. A short quantitative analysis of per-layer and cumulative approximation error will be added to justify the 98% accuracy claim. This revision will make the functionality preservation argument explicit. revision: yes
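The 8-bit fixed-point scheme named in the response can be sketched generically — a minimal symmetric int8 quantizer, assuming per-tensor scaling (a detail the abstract does not specify):

```python
def quantize_int8(values):
    """Symmetric 8-bit quantization: map floats to integers in [-127, 127]."""
    scale = max(abs(v) for v in values) / 127
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

weights = [0.42, -1.27, 0.003, 0.89, -0.55]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(max_err <= scale / 2)         # True: error bounded by half a quantization step
```

Per-layer rounding errors like this compound across the network, which is why the rebuttal's promised cumulative-error analysis matters for the 98% accuracy claim.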
Circularity Check
No circularity: empirical feasibility demonstration with measured outputs
Full rationale
The paper reports an experimental modification of the Llama-3 inference pipeline by injecting concrete-ml FHE operations, followed by direct measurement of text-generation accuracy (up to 98%) and latency (237 ms, 80 tokens/s). No equations, fitted parameters, predictions, or self-referential definitions appear in the provided text. The central claim rests on observed execution results rather than on a derivation that reduces to its own inputs by construction; the use of an external, independently developed library and direct empirical measurement keep the validation external to the claim itself.
Axiom & Free-Parameter Ledger
Axioms (2)
- domain assumption The concrete-ml library supplies correct and sufficiently efficient homomorphic encryption primitives for the selected transformer layers.
- domain assumption Transformer inference remains functional after selective replacement of operations with their homomorphic counterparts.