Leveraging ASIC AI Chips for Homomorphic Encryption
Pith reviewed 2026-05-23 06:01 UTC · model grok-4.3
The pith
A compiler framework turns AI ASICs into efficient platforms for homomorphic encryption by mapping modular arithmetic to low-precision matrix multiplications.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CROSS is a compiler that applies two transformations. Basis-Aligned Transformation (BAT) converts high-precision modular arithmetic into dense INT8 matrix multiplications that run on the TPU's matrix engine (MXU). Memory-Aligned Transformation (MAT) folds data reordering into the compute kernels through offline parameter transformation. On TPU v6e, the paper reports higher throughput per watt for NTT and HE operators than WarpDrive, FIDESlib, FAB, HEAP, and Cheddar.
What carries the argument
Basis-Aligned Transformation (BAT), which rewrites high-precision modular arithmetic as low-precision (INT8) matrix multiplications that preserve exact correctness and security.
If this is right
- Existing AI ASIC hardware becomes usable for privacy-preserving cloud workloads without custom HE silicon.
- HE operators can reach ASIC-level energy efficiency on commodity AI accelerators.
- Compiler transformations can eliminate runtime data-reordering overhead for coarse-grained memory systems.
- NTT and other HE primitives become practical at higher scale under power constraints.
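The point about eliminating runtime data reordering can be illustrated with a toy case. The sketch below is not the paper's MAT; it only shows, under simplifying assumptions, how a fixed input permutation followed by a matrix-vector product can be replaced by a product with a matrix whose columns were permuted once, offline. The names `permute` and `fold_permutation` are hypothetical.

```python
# Illustrative only: folding a fixed permutation into precomputed parameters.
# This is a generic linear-algebra identity, not the paper's MAT.

def matvec(m, v):
    """Plain matrix-vector product on lists of integers."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def permute(v, perm):
    """Runtime reordering: output[i] = v[perm[i]]."""
    return [v[p] for p in perm]


def fold_permutation(m, perm):
    """Offline step: return m2 such that m2 @ v == m @ permute(v, perm).

    Assumes perm is a bijection on range(len(perm)).
    """
    n = len(perm)
    folded = [[0] * n for _ in m]
    for r, row in enumerate(m):
        for i, p in enumerate(perm):
            folded[r][p] = row[i]  # column i of m moves to column perm[i]
    return folded


m = [[1, 2, 3], [4, 5, 6]]
perm = [2, 0, 1]
v = [10, 20, 30]

with_runtime_reorder = matvec(m, permute(v, perm))
with_offline_fold = matvec(fold_permutation(m, perm), v)
assert with_runtime_reorder == with_offline_fold
```

The folded matrix is computed once per parameter set, so the per-input cost of the permutation disappears. Whether this carries over to the permutations inside real HE kernels depends on details this review does not state.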
Where Pith is reading between the lines
- The same alignment technique may apply to other matrix-oriented AI accelerators beyond TPUs.
- Production HE services could shift from GPU clusters to lower-power AI ASIC fleets.
- Algorithm designers may begin co-optimizing new HE schemes for low-precision matrix engines from the start.
Load-bearing premise
The basis-aligned transformation converts high-precision modular arithmetic into low-precision matrix multiplications while preserving exact correctness and security properties required by the HE schemes.
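To make the premise concrete, here is a minimal sketch of one way a wide modular multiplication can be phrased as a small matrix-vector product over 8-bit limbs. It is not the paper's BAT. The function names (`to_limbs`, `limb_matrix`, `mulmod_via_limbs`), the 32-bit operand width, the use of unsigned bytes, and the example modulus are all assumptions chosen for illustration. Python integers stand in for the 8-bit inputs and wider accumulators of a matrix engine.

```python
# Illustrative only: byte-limb decomposition of a modular multiplication.
# Not the paper's BAT; operand width and modulus are assumed.
LIMB_BITS = 8
NUM_LIMBS = 4  # assumed 32-bit operands


def to_limbs(x, n=NUM_LIMBS):
    """Split a non-negative integer into n little-endian 8-bit limbs."""
    return [(x >> (LIMB_BITS * i)) & 0xFF for i in range(n)]


def limb_matrix(a_limbs):
    """Build the (2n-1) x n banded matrix whose product with b's limbs
    is the limb-wise convolution of a and b."""
    n = len(a_limbs)
    return [[a_limbs[r - c] if 0 <= r - c < n else 0 for c in range(n)]
            for r in range(2 * n - 1)]


def matvec(m, v):
    """Matrix-vector product; every input entry here fits in 8 bits."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def mulmod_via_limbs(a, b, q):
    """Compute (a * b) mod q from 8-bit limbs and one matrix-vector product."""
    partial = matvec(limb_matrix(to_limbs(a)), to_limbs(b))
    # Recombine: weight partial sum i by 2^(8*i), then reduce modulo q.
    product = sum(p << (LIMB_BITS * i) for i, p in enumerate(partial))
    return product % q


q = 998244353  # example modulus (119 * 2^23 + 1), an assumption for this sketch
a, b = 123456789, 234567890
assert mulmod_via_limbs(a, b, q) == (a * b) % q
```

The sketch leaves the final recombination and reduction in wide arithmetic. A real mapping would also have to handle those steps, and the signed 8-bit ranges of actual hardware, which is where an exactness argument becomes nontrivial.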
What would settle it
A side-by-side measurement of throughput per watt for NTT and HE operators, comparing CROSS on TPU v6e against the five listed baselines (WarpDrive, FIDESlib, FAB, HEAP, and Cheddar). If CROSS failed to exceed all five, the efficiency claim would be falsified.
Original abstract
Homomorphic Encryption (HE) provides strong data privacy for cloud services but at the cost of prohibitive computational overhead. While GPUs have emerged as a practical platform for accelerating HE, there remains an order-of-magnitude energy-efficiency gap compared to specialized (but expensive) HE ASICs. This paper explores an alternate direction: leveraging existing AI accelerators, like Google's TPUs with coarse-grained compute and memory architectures, to offer a path toward ASIC-level energy efficiency for HE. However, this architectural paradigm creates a fundamental mismatch with state-of-the-art (SotA) HE algorithms designed for GPUs. These algorithms rely heavily on: (1) high-precision (32-bit) integer arithmetic, which must now run on a TPU's low-throughput vector unit, leaving its high-throughput low-precision (8-bit) matrix engine (MXU) idle, and (2) fine-grained data permutations that are inefficient on the TPU's coarse-grained memory subsystem. Consequently, porting GPU-optimized HE libraries to TPUs results in severe resource under-utilization and performance degradation. To tackle these challenges, we introduce CROSS, a compiler framework that systematically transforms HE workloads to align with the TPU's architecture. CROSS makes two key contributions: (1) Basis-Aligned Transformation (BAT), a novel technique that converts high-precision modular arithmetic into dense, low-precision (INT8) matrix multiplications, unlocking and improving the utilization of the TPU's MXU for HE, and (2) Memory-Aligned Transformation (MAT), which eliminates costly runtime data reordering by embedding reordering into compute kernels through offline parameter transformation. CROSS (TPU v6e) achieves higher throughput per watt on NTT and HE operators than WarpDrive, FIDESlib, FAB, HEAP, and Cheddar, establishing AI ASICs as the SotA efficient platform for HE operators. Code: https://github.com/EfficientPPML/CROSS
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces CROSS, a compiler framework for mapping Homomorphic Encryption (HE) workloads onto TPUs. It proposes two transformations: Basis-Aligned Transformation (BAT), which rewrites high-precision modular arithmetic (including NTT) as dense INT8 matrix multiplications to utilize the TPU MXU, and Memory-Aligned Transformation (MAT), which folds data permutations into offline kernel parameters. The central empirical claim is that CROSS on TPU v6e delivers higher throughput per watt on NTT and HE operators than WarpDrive, FIDESlib, FAB, HEAP, and Cheddar. Open-source code is provided at https://github.com/EfficientPPML/CROSS.
Significance. If BAT and MAT are shown to preserve exact modular semantics and HE security parameters, the result would demonstrate that commodity AI ASICs can close much of the energy-efficiency gap to purpose-built HE accelerators. The direct hardware measurements and public code release are concrete strengths that support reproducibility.
major comments (2)
- [Abstract, BAT paragraph] Abstract (paragraph describing BAT): the claim that BAT converts high-precision modular arithmetic into exact INT8 matrix multiplications without precision loss or overflow is asserted but not accompanied by a derivation, invariant proof, or explicit verification that every intermediate value and final modular reduction matches the original arithmetic for the primes and polynomial degrees used in the evaluated schemes. This equivalence is load-bearing for both correctness and the reported performance numbers.
- [Experimental evaluation] Experimental evaluation (throughput-per-watt results): the comparisons against the five named libraries are presented on real TPU v6e hardware, yet the manuscript does not supply sufficient detail on precision handling during BAT, measurement methodology (e.g., power metering, batch sizes), or workload selection to allow independent assessment of whether these factors affect the central claim of superiority.
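One ingredient of the verification requested in the first major comment is an accumulator bound. As a hedged illustration, assuming 8-bit operands and a signed 32-bit accumulator (this review does not state the paper's actual operand ranges or accumulator width), the longest dot product that cannot overflow follows from simple arithmetic. The function name `max_safe_dot_length` is hypothetical.

```python
# Illustrative only: worst-case accumulator bound for low-precision dot products.
# Operand range and accumulator width are assumptions, not taken from the paper.

def max_safe_dot_length(operand_max=255, acc_bits=32):
    """Largest K such that K products of magnitude <= operand_max**2
    still fit in a signed accumulator of acc_bits bits."""
    acc_max = (1 << (acc_bits - 1)) - 1
    return acc_max // (operand_max * operand_max)


# Unsigned 8-bit operands (0..255): each product is at most 65025.
print(max_safe_dot_length(255))  # 33025
# Signed 8-bit operands (-128..127): each product has magnitude at most 16384.
print(max_safe_dot_length(128))  # 131071
```

A bound of this kind only covers the accumulation step. The equivalence the referee asks for also has to cover limb recombination and the final modular reduction.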
minor comments (2)
- [BAT description] Notation for the basis alignment in BAT could be introduced with a small worked example (one prime, small degree) to make the transformation concrete for readers unfamiliar with the technique.
- [Introduction] The abstract lists five baseline libraries; a short table in the introduction summarizing their target platforms and key optimizations would improve readability.
Simulated Author's Rebuttal
We appreciate the referee's feedback highlighting the need for more explicit support of the BAT equivalence claim and additional experimental details. We will revise the manuscript to include these elements, strengthening the presentation of our results on TPU-based HE acceleration.
- Referee: [Abstract, BAT paragraph] Abstract (paragraph describing BAT): the claim that BAT converts high-precision modular arithmetic into exact INT8 matrix multiplications without precision loss or overflow is asserted but not accompanied by a derivation, invariant proof, or explicit verification that every intermediate value and final modular reduction matches the original arithmetic for the primes and polynomial degrees used in the evaluated schemes. This equivalence is load-bearing for both correctness and the reported performance numbers.
  Authors: The manuscript asserts the no-precision-loss property based on the BAT design, but we acknowledge that a full derivation and verification are not present in the current version. We will add this in the revision, including the mathematical invariant and checks for the specific parameters used in our experiments.
  Revision: yes
- Referee: [Experimental evaluation] Experimental evaluation (throughput-per-watt results): the comparisons against the five named libraries are presented on real TPU v6e hardware, yet the manuscript does not supply sufficient detail on precision handling during BAT, measurement methodology (e.g., power metering, batch sizes), or workload selection to allow independent assessment of whether these factors affect the central claim of superiority.
  Authors: We agree that more details are needed for reproducibility. In the revised version, we will include a detailed description of precision handling (referencing the new BAT proof), power metering methodology using TPU hardware counters, specific batch sizes employed, and the criteria for selecting the HE workloads and parameters.
  Revision: yes
Circularity Check
No circularity; claims rest on implementation and hardware measurement
The paper introduces CROSS, a compiler with BAT (converting high-precision modular ops to INT8 matmuls) and MAT (embedding reordering into kernels). Its strongest claim—higher throughput/watt on TPU v6e vs. listed baselines—is presented as the outcome of direct hardware execution and measurement, not any derivation, fitted parameter, or equation that reduces to its own inputs by construction. No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify the core mapping or performance numbers. The BAT correctness assertion is an implementation claim (with code released) rather than a self-referential mathematical step. This is the normal case of an engineering paper whose results are externally falsifiable via the provided implementation.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Modular arithmetic operations remain correct when rewritten as low-precision matrix multiplications under the basis-aligned mapping.
Appendix A. Abstract
We provide scripts to reproduce latency of BAT (Tab. V) and BConv (Tab. VI), throughput of NTT (Tab. VII, ...
Radix-2 Cooley-Tukey NTT algorithm (Butterfly NTT)
The NTT converts polynomial representations from the coefficient domain to the evaluation domain, where polynomial multiplication simplifies to element-wise (vectorized) coefficient multiplication. The NTT and INTT are computationally intensive, accounting for approximately 45.1% to 86.3% of the over...
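The domain change described above can be shown on a toy instance. The following sketch assumes a tiny prime (q = 17), degree n = 4, and the primitive 4th root of unity omega = 4; none of these come from the paper. It uses the quadratic-time matrix form of the NTT rather than the radix-2 butterfly, and it computes a cyclic product (modulo x^n - 1), whereas HE schemes typically use a negacyclic variant. The function names are hypothetical.

```python
# Illustrative only: NTT in matrix form over a toy prime field.
# Cyclic (mod x^n - 1) product; q, n, and omega are assumed toy values.

def ntt_matrix(n, omega, q):
    """n x n matrix with entry (i, j) = omega^(i*j) mod q."""
    return [[pow(omega, i * j, q) for j in range(n)] for i in range(n)]


def matvec_mod(m, v, q):
    """Matrix-vector product reduced modulo q."""
    return [sum(x * y for x, y in zip(row, v)) % q for row in m]


def cyclic_mul(a, b, omega, q):
    """Multiply polynomials a and b modulo (x^n - 1, q) via the NTT."""
    n = len(a)
    fwd = ntt_matrix(n, omega, q)
    inv = ntt_matrix(n, pow(omega, -1, q), q)  # inverse root (Python 3.8+)
    fa, fb = matvec_mod(fwd, a, q), matvec_mod(fwd, b, q)
    pointwise = [(x * y) % q for x, y in zip(fa, fb)]  # element-wise product
    n_inv = pow(n, -1, q)
    return [(n_inv * c) % q for c in matvec_mod(inv, pointwise, q)]


def cyclic_mul_naive(a, b, q):
    """Schoolbook reference used to check the NTT-based result."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            out[(i + j) % n] = (out[(i + j) % n] + a[i] * b[j]) % q
    return out


q, omega = 17, 4  # 4 has multiplicative order 4 modulo 17
a, b = [1, 2, 3, 4], [5, 6, 7, 8]
assert cyclic_mul(a, b, omega, q) == cyclic_mul_naive(a, b, q)
```

The matrix form is chosen here because it makes the link to matrix multiplication visible. A butterfly implementation computes the same transform with fewer operations but a less regular data-access pattern.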