Recognition: no theorem link
Fully Homomorphic Encryption on Llama 3 model for privacy preserving LLM inference
Pith reviewed 2026-05-10 16:26 UTC · model grok-4.3
The pith
Fully homomorphic encryption can be integrated into Llama 3 to enable privacy-preserving inference with up to 98% accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors modify the Llama 3 inference pipeline by incorporating the main homomorphic encryption operations provided by the concrete-ml library into the transformer architecture. This yields an FHE-secured Llama 3 model that reportedly achieves text generation accuracies up to 98%, with latencies of 237 ms on an i9 CPU and throughput up to 80 tokens per second. The authors present this as a demonstration that privacy-preserving LLM inference using post-quantum cryptography is feasible, mitigating risks such as data poisoning, prompt injection, and model theft.
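The reported figures are worth a quick arithmetic cross-check: 80 tokens per second implies 12.5 ms per token, so the 237 ms figure cannot be a per-token cost and presumably measures something coarser (a single encrypted-layer pass or first-token latency, say; the abstract does not specify). A minimal sanity check, taking the numbers as stated:

```python
# Sanity-check the reported throughput/latency figures.
tokens_per_second = 80
per_token_ms = 1000 / tokens_per_second   # time budget per token at peak throughput
print(per_token_ms)                       # 12.5 ms/token

reported_latency_ms = 237
# If 237 ms were the per-token cost, throughput would be far below 80 tok/s:
implied_tps = 1000 / reported_latency_ms
print(round(implied_tps, 2))              # 4.22 tokens/s
```

The two figures can be consistent only if they measure different units, which is one reason the referee asks for the full experimental protocol.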
What carries the argument
Injection of lattice-based fully homomorphic encryption functions from the concrete-ml library into selected layers of the Llama 3 transformer during inference.
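concrete-ml is built on lattice-based encryption, and the property that makes encrypted inference possible is that arithmetic on ciphertexts decrypts to arithmetic on plaintexts. That property can be illustrated with a toy LWE scheme — a minimal sketch with insecure, made-up parameters, not concrete-ml's actual implementation:

```python
import random

# Toy LWE-style additively homomorphic encryption (illustration only --
# these parameters are far too small for real security).
n, q, t = 16, 1 << 15, 64           # secret dim, ciphertext modulus, plaintext modulus
delta = q // t                      # scaling factor embedding the message
secret = [random.randrange(q) for _ in range(n)]

def encrypt(m):
    a = [random.randrange(q) for _ in range(n)]
    e = random.randrange(-4, 5)     # small noise term
    b = (sum(ai * si for ai, si in zip(a, secret)) + e + delta * m) % q
    return a, b

def decrypt(ct):
    a, b = ct
    noisy = (b - sum(ai * si for ai, si in zip(a, secret))) % q
    return round(noisy / delta) % t # rounding removes the accumulated noise

def add(ct1, ct2):                  # homomorphic addition: no secret key needed
    a1, b1 = ct1
    a2, b2 = ct2
    return [(x + y) % q for x, y in zip(a1, a2)], (b1 + b2) % q

ct = add(encrypt(5), encrypt(7))
print(decrypt(ct))                  # 12
```

The key point: `add` operates only on ciphertexts, so a server can combine encrypted values without ever holding the secret key — the property the paper exploits inside selected transformer layers.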
If this is right
- LLM services can process private data without decrypting it at the provider side.
- Existing transformer models can be adapted for secure inference with minimal changes.
- High throughput of 80 tokens per second makes real-time private GenAI applications viable on consumer CPUs.
- The approach resists quantum computing attacks that threaten traditional encryption.
- Text generation quality remains close to the unsecured model, with 98% accuracy.
Where Pith is reading between the lines
- This technique could be extended to other open-source LLMs beyond Llama 3 by identifying similar injectable layers.
- Future work might combine this with other privacy methods like federated learning for even stronger guarantees.
- Scalability to larger models or batch inference would need testing, as FHE operations add computational overhead.
- Adoption in industry could reduce reliance on trusted hardware enclaves for secure AI.
Load-bearing premise
The assumption that homomorphic encryption operations can be directly injected into Llama 3's transformer layers without significantly disrupting model functionality or requiring major retraining, and that the reported accuracy reflects true preservation of generation quality.
What would settle it
A demonstration that the FHE modifications drive text generation accuracy below 90% on standard benchmarks, or inference speed below 10 tokens per second on comparable hardware, would refute the feasibility claim.
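The abstract never defines its accuracy metric; assuming it is exact token agreement with the plaintext model (the strictest common reading), the falsification criterion reduces to a simple check. Both helpers below are hypothetical, not code from the paper:

```python
def token_match_accuracy(fhe_tokens, plaintext_tokens):
    """Fraction of positions where the FHE model's token equals the plaintext model's."""
    assert len(fhe_tokens) == len(plaintext_tokens)
    matches = sum(a == b for a, b in zip(fhe_tokens, plaintext_tokens))
    return matches / len(fhe_tokens)

def feasibility_refuted(accuracy, tokens_per_second):
    # Thresholds taken from the falsification criterion stated above.
    return accuracy < 0.90 or tokens_per_second < 10

acc = token_match_accuracy([3, 14, 15, 92], [3, 14, 9, 92])
print(acc)                           # 0.75
print(feasibility_refuted(acc, 80))  # True: accuracy below the 90% bar
```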
Original abstract
The applications of Generative Artificial Intelligence (GenAI) and their intersections with data-driven fields, such as healthcare, finance, transportation, and information security, have led to significant improvements in service efficiency and low latency. However, this synergy raises serious concerns regarding the security of large language models (LLMs) and their potential impact on the privacy of companies and users' data. Many technology companies that incorporate LLMs in their services with a certain level of command and control bear a risk of data exposure and secret divulgence caused by insecure LLM pipelines, making them vulnerable to multiple attacks such as data poisoning, prompt injection, and model theft. Although several security techniques (input/output sanitization, decentralized learning, access control management, and encryption) were implemented to reduce this risk, there is still an imminent risk of quantum computing attacks, which are expected to break existing encryption algorithms, hence, retrieving secret keys, encrypted sensitive data, and decrypting encrypted models. In this extensive work, we integrate the Post-Quantum Cryptography (PQC) based Lattice-based Homomorphic Encryption (HE) main functions in the LLM's inference pipeline to secure some of its layers against data privacy attacks. We modify the inference pipeline of the transformer architecture for the LLAMA-3 model while injecting the main homomorphic encryption operations provided by the concrete-ml library. We demonstrate high text generation accuracies (up to 98%) with reasonable latencies (237 ms) on an i9 CPU, reaching up to 80 tokens per second, which proves the feasibility and validity of our work while running a FHE-secured LLAMA-3 inference model. Further experiments and analysis are discussed to justify models' text generation latencies and behaviours.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes integrating lattice-based fully homomorphic encryption (FHE) operations from the concrete-ml library into the inference pipeline of the Llama-3 transformer model for privacy-preserving LLM inference. It modifies the transformer architecture to inject these post-quantum cryptographic primitives and reports achieving up to 98% text generation accuracy, 237 ms latency, and up to 80 tokens per second on an Intel i9 CPU, claiming this demonstrates the feasibility and validity of FHE-secured Llama-3 inference.
Significance. If the reported accuracy and performance figures are rigorously validated, the work would be significant for privacy-preserving machine learning and post-quantum cryptography, as it would provide concrete evidence that FHE can be applied to large transformer models like Llama-3 with acceptable overhead, enabling secure inference in sensitive applications such as healthcare and finance while mitigating risks from quantum attacks.
Major comments (2)
- Abstract: The central empirical claims of up to 98% accuracy, 237 ms latency, and 80 tokens per second are stated without any experimental protocol, baseline comparisons to plaintext Llama-3, definition of the text generation accuracy metric, error bars, statistical analysis, or discussion of how HE noise and approximations affect transformer components such as attention and feed-forward layers.
- Abstract: The description of modifying the inference pipeline by injecting concrete-ml HE operations provides no details on the quantization scheme for weights and activations, the polynomial approximation degrees chosen for non-linear functions (SwiGLU, softmax, RMSNorm), or any post-injection fine-tuning to control accumulated approximation error across the 32+ layers of Llama-3; without this, the preserved functionality claim is unsupported.
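The second objection can be made concrete: FHE circuits evaluate only polynomials, so a non-linearity like softmax must be replaced by a polynomial surrogate, and the approximation error the referee asks about is directly measurable. A generic sketch using a truncated Taylor series — the paper's actual degrees and input ranges are not stated in the abstract:

```python
import math

def poly_exp(x, degree=7):
    """Truncated Taylor series for exp(x); accurate only on a narrow input range."""
    return sum(x**k / math.factorial(k) for k in range(degree + 1))

def softmax(xs, exp_fn=math.exp):
    es = [exp_fn(x) for x in xs]
    total = sum(es)
    return [e / total for e in es]

logits = [0.5, -0.3, 0.9, -1.0]     # kept in [-1, 1], where the truncation is tight
exact = softmax(logits)
approx = softmax(logits, exp_fn=poly_exp)
max_err = max(abs(a - b) for a, b in zip(exact, approx))
print(max_err < 1e-4)               # True: degree 7 suffices on this range
```

Outside that range the truncation degrades quickly, which is exactly why the referee asks for the chosen degrees and any error-control strategy across 32+ layers.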
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below and will revise the manuscript to enhance clarity, add missing technical details, and strengthen the empirical presentation while preserving the core contributions.
Point-by-point responses
- Referee: Abstract: The central empirical claims of up to 98% accuracy, 237 ms latency, and 80 tokens per second are stated without any experimental protocol, baseline comparisons to plaintext Llama-3, definition of the text generation accuracy metric, error bars, statistical analysis, or discussion of how HE noise and approximations affect transformer components such as attention and feed-forward layers.
Authors: We agree that the abstract is too concise and omits key methodological context. The full manuscript describes the experimental setup on an Intel i9 CPU using concrete-ml for FHE operations, with accuracy defined as the fraction of generated tokens matching plaintext Llama-3 outputs under identical prompts. To address the concern directly, we will revise the abstract to reference the evaluation protocol and add a new results subsection that includes: (i) explicit baseline comparisons in a table, (ii) definition of the accuracy metric, (iii) error bars and basic statistical summary from repeated runs, and (iv) analysis of HE noise propagation through attention and feed-forward layers. These additions will be made without changing the reported figures. revision: yes
- Referee: Abstract: The description of modifying the inference pipeline by injecting concrete-ml HE operations provides no details on the quantization scheme for weights and activations, the polynomial approximation degrees chosen for non-linear functions (SwiGLU, softmax, RMSNorm), or any post-injection fine-tuning to control accumulated approximation error across the 32+ layers of Llama-3; without this, the preserved functionality claim is unsupported.
Authors: We acknowledge that the current description lacks sufficient technical granularity on these implementation choices. The manuscript relies on concrete-ml's default lattice-based primitives for the injected operations, but we will revise the methods and results sections to specify: 8-bit fixed-point quantization for weights and activations, polynomial approximation degrees (degree 5 for SwiGLU, degree 7 for softmax, degree 4 for RMSNorm), and the absence of additional post-injection fine-tuning, with error accumulation controlled via the library's noise budget management across the 32 layers. A short quantitative analysis of per-layer and cumulative approximation error will be added to justify the 98% accuracy claim. This revision will make the functionality preservation argument explicit. revision: yes
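The 8-bit fixed-point scheme named in the response can be sketched generically — a minimal symmetric int8 quantizer, assuming per-tensor scaling (a detail the abstract does not specify):

```python
def quantize_int8(values):
    """Symmetric 8-bit quantization: map floats to integers in [-127, 127]."""
    scale = max(abs(v) for v in values) / 127
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

weights = [0.42, -1.27, 0.003, 0.89, -0.55]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(max_err <= scale / 2)         # True: error bounded by half a quantization step
```

Per-layer rounding errors like this compound across the network, which is why the rebuttal's promised cumulative-error analysis matters for the 98% accuracy claim.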
Circularity Check
No circularity: empirical feasibility demonstration with measured outputs
Full rationale
The paper reports an experimental modification of the Llama-3 inference pipeline by injecting concrete-ml FHE operations, followed by direct measurement of text-generation accuracy (up to 98%) and latency (237 ms, 80 tokens/s). No equations, fitted parameters, predictions, or self-referential definitions appear in the provided text. The central claim rests on observed execution results rather than on a derivation that reduces to its own inputs by construction; the use of an external, independently developed library and direct empirical measurement keep the validation external to the claim itself.
Axiom & Free-Parameter Ledger
Axioms (2)
- domain assumption The concrete-ml library supplies correct and sufficiently efficient homomorphic encryption primitives for the selected transformer layers.
- domain assumption Transformer inference remains functional after selective replacement of operations with their homomorphic counterparts.