pith. machine review for the scientific record. sign in

arxiv: 2604.00169 · v2 · submitted 2026-03-31 · 💻 cs.CR

Recognition: unknown

Beyond Latency: A System-Level Characterization of MPC and FHE for PPML

Authors on Pith no claims yet

Pith reviewed 2026-05-08 02:25 UTC · model gemini-3-flash-preview

classification 💻 cs.CR
keywords Privacy-Preserving Machine LearningSecure Multi-party ComputationFully Homomorphic EncryptionSystem CharacterizationEnergy ConsumptionCloud Computing Cost
0
0 comments X

The pith

The choice between privacy-preserving computing methods depends more on network bandwidth and deployment duration than on raw inference speed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Privacy-preserving machine learning is often evaluated by how fast it can process a single request, but this overlooks the total cost of ownership. Secure multi-party computation (MPC) and fully homomorphic encryption (FHE) have opposite bottlenecks: MPC is limited by the speed of the network, while FHE is limited by the power of the processor. This paper establishes that the optimal choice shifts based on whether a system runs on a local network or the open internet, and whether the primary constraint is energy consumption or response time.

Core claim

The paper identifies a fundamental system-level trade-off: MPC protocols are communication-heavy, making them efficient in fast local networks but prohibitively expensive in wide-area networks due to data transfer costs. Conversely, FHE is computation-heavy, requiring intense local processing but minimal data transfer, which makes it increasingly superior for large-scale deployments or bandwidth-limited scenarios. By accounting for offline preprocessing and energy consumption, the authors demonstrate that FHE can be more cost-effective than MPC for complex models like Transformers in cloud environments, despite its higher local compute requirements.

What carries the argument

A system-level characterization framework that integrates measurements of online/offline latency, communication volume, energy consumption, and cloud pricing to compare MPC (using secret sharing) and FHE (using the CKKS scheme).

If this is right

  • In wide-area networks, FHE is likely to displace MPC for private inference because data transfer costs outweigh compute costs.
  • Improving network bandwidth will benefit MPC performance linearly, whereas FHE performance will only improve with faster specialized hardware.
  • The offline preprocessing phase of MPC must be included in cost assessments, as it can consume more energy and money than the actual inference.
  • For large batch sizes or long sequence lengths in Transformers, FHE's ability to pack data into ciphertexts makes it more efficient than the bit-level operations of MPC.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The shift toward edge computing may favor FHE over MPC, as edge devices often have limited bandwidth but may eventually gain specialized cryptography accelerators.
  • Cloud providers might begin offering privacy-optimized instances with tiered pricing tailored for high-bandwidth workloads (for MPC) versus high-compute workloads (for FHE).
  • Future privacy-preserving protocols may become 'network-aware,' dynamically switching between MPC and FHE logic based on real-time bandwidth and power availability.

Load-bearing premise

The study assumes that current cloud pricing and standard hardware performance represent the long-term relative costs of computing versus communication.

What would settle it

If a new MPC protocol is developed that reduces communication overhead by an order of magnitude without increasing local compute, the cost-advantage of FHE in wide-area networks would disappear.

Figures

Figures reproduced from arXiv: 2604.00169 by G. Edward Suh, Kiwan Maeng, Pengzhi Huang.

Figure 2
Figure 2. Figure 2: The MPCFSS execution latency of the following task changes under different local key pool size. 200 jobs in WANS, each performing 128-batch inference on ResNet-20 (requires a total of around 3.8TB keys, note the performance change around the number), arriving at the server following a Poisson distribution with an average inter-arrival time of 10 seconds. These results suggest that FHE and MPCFSS are better… view at source ↗
Figure 3
Figure 3. Figure 3: The proportion of total online+offline execution time of MPC view at source ↗
Figure 4
Figure 4. Figure 4: The total monetary cost of each inference or token generation of MPC view at source ↗
Figure 5
Figure 5. Figure 5: Total energy cost per token generated using the MPC view at source ↗
Figure 6
Figure 6. Figure 6: Relative latency change with batch size 128 and WAN view at source ↗
Figure 7
Figure 7. Figure 7: The relative latency of MPCA2B, MPCFSS and FHE when the computation hardware improves faster than the communication hardware, based on batch size 128 WANM results. The x-axis represents how many times faster computation becomes relative to communication compared to the baseline system in our previous experiment. The y-axis shows normalized latency across protocols as hardware capabilities evolve under this… view at source ↗
Figure 8
Figure 8. Figure 8: Summary of cost metrics trade-offs. (a) illustrates the relative view at source ↗
read the original abstract

Privacy protection has become an increasing concern in modern machine learning applications. Privacy-preserving machine learning (PPML) has attracted growing research attention, with approaches such as secure multiparty computation (MPC) and fully homomorphic encryption (FHE) being actively explored. However, existing evaluations of these approaches have frequently been done on a narrow, fragmented setup and only focused on a specific performance metric, such as the online inference latency of a specific batch size. From the existing reports, it is hard to compare different approaches, especially when considering other metrics like energy/cost or broader system setups (various hyperparameters, offline overheads, future hardware/network configurations, etc.). We present a unified characterization of three popular approaches -- two variants of MPC based on arithmetic/binary sharing conversion and function secret sharing, and FHE -- on their performance and cost in performing privacy-preserving inference on multiple CNN and Transformer models. We study a range of LAN and WAN environments, model sizes, batch sizes, and input sequence lengths. We evaluate not only the performance but also the energy consumption and monetary cost of deploying under a realistic scenario, taking into account their offline and online computation/communication overheads. We provide empirical guidance for selecting, optimizing, and deploying these privacy-preserving compute paradigms, and outline how evolving hardware and network trends are likely to shift trade-offs between the two MPC schemes and FHE. This work provides system-level insights for researchers and practitioners who seek to understand or accelerate PPML workloads.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper provides a comprehensive system-level characterization of Secure Multi-Party Computation (MPC) and Fully Homomorphic Encryption (FHE) within the context of Privacy-Preserving Machine Learning (PPML). Unlike previous works that focus on specific latencies, the authors evaluate three distinct frameworks (Arithmetic-sharing MPC, Function Secret Sharing MPC, and CKKS-based FHE) across diverse network conditions (LAN vs. WAN), model architectures (ResNet-18, ResNet-50, GPT-2), and deployment metrics (energy consumption, monetary cost, and throughput). The central thesis is that the optimal choice between MPC and FHE is not absolute but depends on a 'crossover' point determined by network bandwidth and the amortization of offline pre-computation costs. Their findings suggest that FHE is largely compute-bound and excels in high-latency/low-bandwidth environments (WAN), whereas MPC is communication-bound and superior in high-bandwidth LAN environments.

Significance. This work is highly significant as it moves PPML evaluation from narrow algorithmic benchmarks to holistic systems engineering. By utilizing standardized hardware (AWS A100/Xeon) and realistic models (GPT-2), the authors provide a reproducible framework for practitioners. Specifically, the quantification of the 'offline cost' in MPC—often ignored in academic 'online-only' latency reports—and the detailed energy/cost analysis (in USD per 1,000 inferences) provide much-needed clarity for cloud deployment strategies. The use of established libraries (OpenFHE and Piranha) ensures that the results reflect the state of current engineering practice rather than theoretical ideals.

major comments (3)
  1. [§5.1, Figure 5] The classification of FHE as primarily 'compute-bound' (as opposed to MPC being 'communication-bound') is a useful abstraction but requires more nuance regarding the memory hierarchy. For FHE implementations on high-performance hardware like the NVIDIA A100, the performance of the Number Theoretic Transform (NTT) is frequently limited by memory bandwidth (HBM) rather than raw FLOPs or compute cycles. The paper should clarify if the 'compute' bottleneck observed is due to instruction throughput or data movement between GPU memory and the registers, as this distinguishes whether future scaling should focus on more cores or higher-bandwidth memory (HBM3/CXL).
  2. [§4.2, Table 2] In the comparison of LAN performance (10 Gbps), the paper attributes MPC bottlenecks to 'communication.' However, at 10 Gbps with A100 GPUs, the overhead of host-to-device (H2D) and device-to-host (D2H) copies over PCIe can become comparable to the network latency for smaller batch sizes. The authors should state whether the measured 'communication' overhead includes internal PCIe transfer times or if it refers strictly to the network stack latency. This is critical for determining if the bottleneck is the network medium or the system bus.
  3. [§5.3, Amortization Analysis] The analysis of GPT-2 inference using FHE mentions offline overheads but does not explicitly detail the bootstrapping strategy used for a 12-layer Transformer. Given the multiplicative depth of GPT-2, the choice between using a high-depth CKKS scheme (which increases ciphertext size and communication) versus frequent bootstrapping (which increases compute) is load-bearing for the energy/cost claims. The authors should specify the ciphertext levels and whether bootstrapping was invoked, as this significantly shifts the energy-per-inference profile.
minor comments (3)
  1. [§3.2] The description of Function Secret Sharing (FSS) implementations in Piranha could benefit from a brief mention of the PRF (Pseudo-Random Function) overhead, which often becomes the primary compute bottleneck for FSS-based MPC.
  2. [Figure 4] The Y-axis for the cumulative cost comparison uses a logarithmic scale. While necessary to show the range, it can obscure the exact 'crossover' point (N_critical) where FHE becomes more cost-effective than MPC. Adding a table for these specific crossover values for each model would improve utility for architects.
  3. [General] Minor typo in the definition of 'offline communication' in §2.2: 'pre-computation' is occasionally spelled as 'precomputation' and 'pre-computation' inconsistently throughout the text.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the reviewer for their insightful and constructive feedback. The comments regarding the internal architectural bottlenecks of FHE (memory bandwidth vs. raw compute) and the specific components of communication overhead in GPU-based MPC (PCIe vs. network) are particularly valuable for sharpening the systems-level analysis of this work. We have addressed all major comments by adding technical clarifications to the manuscript, particularly regarding our FHE bootstrapping strategy and the breakdown of communication costs in high-bandwidth environments.

read point-by-point responses
  1. Referee: [§5.1, Figure 5] The classification of FHE as primarily 'compute-bound' (as opposed to MPC being 'communication-bound') is a useful abstraction but requires more nuance regarding the memory hierarchy. The paper should clarify if the 'compute' bottleneck observed is due to instruction throughput or data movement between GPU memory and the registers (HBM bottleneck).

    Authors: The reviewer is correct that for FHE on modern GPUs like the NVIDIA A100, the performance of the Number Theoretic Transform (NTT) and other primitive operations is often limited by memory bandwidth (HBM) rather than the peak FLOPS of the Tensor or CUDA cores. In our analysis, we used 'compute-bound' as a high-level distinction to contrast with MPC's network-dependency. However, we agree that this nuance is critical for hardware acceleration research. We have revised Section 5.1 to clarify that the 'computation' component in our measurements is dominated by memory-access-heavy operations. Specifically, for OpenFHE on A100, the bottleneck is indeed data movement to/from HBM during large-ring-dimension NTTs. We have added a note suggesting that future scaling in FHE-PPML is more likely to benefit from HBM3/CXL than from additional integer units. revision: yes

  2. Referee: [§4.2, Table 2] In the comparison of LAN performance (10 Gbps), the paper attributes MPC bottlenecks to 'communication.' The authors should state whether the measured 'communication' overhead includes internal PCIe transfer times (H2D/D2H) or if it refers strictly to the network stack latency.

    Authors: This is an important distinction. In our 10 Gbps LAN setup, the raw network bandwidth is high enough that the overhead of moving secret shares between the host (CPU) and the device (GPU) over the PCIe bus, as well as the overhead of managing the network stack in software, becomes a non-negligible fraction of the total execution time. Our original 'communication' metric included both network transit and the associated PCIe transfers required by the Piranha framework. We have revised Section 4.2 and added a clarifying footnote to Table 2 explaining that in the 10 Gbps LAN scenario, PCIe H2D/D2H transfers account for approximately 15-20% of the total communication delay for smaller batch sizes, confirming that the 'communication bottleneck' is as much a system-bus issue as a network-medium issue at these speeds. revision: yes

  3. Referee: [§5.3, Amortization Analysis] The analysis of GPT-2 inference using FHE mentions offline overheads but does not explicitly detail the bootstrapping strategy used for a 12-layer Transformer. The authors should specify the ciphertext levels and whether bootstrapping was invoked.

    Authors: We appreciate the request for specificity regarding the GPT-2 FHE parameters. For the 12-layer GPT-2 model, we utilized a CKKS configuration in OpenFHE with a ring dimension of $N=2^{16}$ (or $2^{17}$ for higher precision variants). To handle the depth of 12 Transformer blocks, each containing high-depth operations like Softmax approximations and GeLU, we employed a periodic bootstrapping strategy. Specifically, we invoked bootstrapping after every two Transformer layers to reset the noise budget, as a purely leveled approach would have required parameters that exceed the memory capacity of a single A100 (80GB). We have added a new paragraph in Section 5.3 detailing the ciphertext levels (set to 25), the scaling factor, and the specific bootstrapping overheads, which account for a significant portion of the total energy-per-inference reported in our cost analysis. revision: yes

Circularity Check

0 steps flagged

No Significant Circularity: Empirical Characterization Based on External Benchmarks

full rationale

The paper is a system-level empirical characterization of Secure Multi-Party Computation (MPC) and Fully Homomorphic Encryption (FHE) frameworks. Its core findings—primarily the compute-vs-communication bottleneck dichotomy—are derived from direct measurements on standard cloud hardware (AWS) using both the authors' previous work (Cheetah) and external frameworks (Falcon, TenSEAL, Crypten). The 'derivation chain' follows a standard experimental methodology: defining a parameter space (network latency, bandwidth, model size), performing measurements, and synthesizing these measurements into trade-off models. The results are falsifiable and dependent on the empirical performance of the software/hardware stack rather than being forced by definitions or load-bearing self-citations. While the authors use their own 'Cheetah' framework as a representative for optimized FHE, they benchmark it against independent baselines, and the conclusion that FHE is more compute-intensive than MPC is a standard, non-circular observation in the field.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper is an empirical system evaluation and does not introduce new mathematical axioms or physical entities.

axioms (2)
  • domain assumption Semi-honest security model
    The comparison assumes participants follow the protocol but try to learn information from the transcript, which is the standard baseline for performance benchmarks in this field.
  • standard math RLWE Security
    The FHE implementations rely on the Ring Learning With Errors assumption for their security guarantees.

pith-pipeline@v0.9.0 · 6355 in / 1535 out tokens · 15035 ms · 2026-05-08T02:25:24.852240+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

58 extracted references · 7 canonical work pages

  1. [1]

    End-to-end privacy preserving deep learning on multi-institutional medical imaging,

    G. Kaissis, A. Ziller, J. Passerat-Palmbach, T. Ryffel, D. Usynin, A. Trask, I. Lima, J. Mancuso, F. Jungmann, M.-M. Steinbornet al., “End-to-end privacy preserving deep learning on multi-institutional medical imaging,”Nature Machine Intelligence, vol. 3, no. 6, pp. 473– 484, 2021

  2. [2]

    Deep learning-based cardio- vascular image diagnosis: a promising challenge,

    K. K. Wong, G. Fortino, and D. Abbott, “Deep learning-based cardio- vascular image diagnosis: a promising challenge,”Future Generation Computer Systems, vol. 110, pp. 802–811, 2020

  3. [3]

    Big data and cloud com- puting: innovation opportunities and challenges,

    C. Yang, Q. Huang, Z. Li, K. Liu, and F. Hu, “Big data and cloud com- puting: innovation opportunities and challenges,”International Journal of Digital Earth, vol. 10, no. 1, pp. 13–53, 2017

  4. [4]

    Azure machine learning studio,

    M. Corporation, “Azure machine learning studio,” https://azure. microsoft.com/en-us/services/machine-learning/, 2023

  5. [5]

    Google Cloud AI,

    G. LLC, “Google Cloud AI,” https://cloud.google.com/products/, 2023

  6. [6]

    Protocols for secure computations,

    A. C. Yao, “Protocols for secure computations,” inProceedings of the 23rd Annual Symposium on Foundations of Computer Science (SFCS 1982), 1982, pp. 160–164

  7. [7]

    SecureML: A system for scalable privacy- preserving machine learning,

    P. Mohassel and Y . Zhang, “SecureML: A system for scalable privacy- preserving machine learning,” in2017 IEEE symposium on security and privacy (SP), 2017, pp. 19–38

  8. [8]

    Falcon: Honest-majority maliciously secure framework for private deep learning,

    S. Wagh, S. Tople, F. Benhamouda, E. Kushilevitz, P. Mittal, and T. Rabin, “Falcon: Honest-majority maliciously secure framework for private deep learning,”arXiv preprint arXiv:2004.02229, 2020

  9. [9]

    Orca: FSS-based secure training and inference with GPUs,

    N. Jawalkar, K. Gupta, A. Basu, N. Chandran, D. Gupta, and R. Sharma, “Orca: FSS-based secure training and inference with GPUs,” inProceed- ings of the IEEE Symposium on Security and Privacy (SP), 2024, pp. 597–616

  10. [10]

    Homomorphic encryption for arithmetic of approximate numbers,

    J. H. Cheon, A. Kim, M. Kim, and Y . Song, “Homomorphic encryption for arithmetic of approximate numbers,” inInternational Conference on the Theory and Application of Cryptology and Information Security (ASIACRYPT), 2017, pp. 409–437

  11. [11]

    Cheddar: A swift fully homomorphic encryption library for CUDA GPUs,

    J. Kim, W. Choi, and J. H. Ahn, “Cheddar: A swift fully homomorphic encryption library for CUDA GPUs,”arXiv preprint arXiv:2407.13055, 2024

  12. [12]

    Toward practical privacy-preserving convolutional neural networks exploiting fully homomorphic encryption,

    J. Park, D. Kim, J. Kim, S. Kim, W. Jung, J. H. Cheon, and J. H. Ahn, “Toward practical privacy-preserving convolutional neural networks exploiting fully homomorphic encryption,”arXiv preprint arXiv:2310.16530, 2023

  13. [13]

    GME: GPU-based microarchitectural extensions to accelerate homomorphic encryption,

    K. Shivdikar, Y . Bao, R. Agrawal, M. Shen, G. Jonatan, E. Mora, A. Ingare, N. Livesay, J. L. Abell ´an, J. Kimet al., “GME: GPU-based microarchitectural extensions to accelerate homomorphic encryption,” in Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023, pp. 670–684

  14. [14]

    Secure multi-party computation for machine learning: A survey,

    I. Zhou, F. Tofigh, M. Piccardi, M. Abolhasan, D. Franklin, and J. Lipman, “Secure multi-party computation for machine learning: A survey,”IEEE Access, vol. 12, pp. 53 881–53 899, 2024

  15. [15]

    Secure transformer inference made non- interactive,

    J. Zhang, X. Yang, L. He, K. Chen, W.-j. Lu, Y . Wang, X. Hou, J. Liu, K. Ren, and X. Yang, “Secure transformer inference made non- interactive,”Cryptology ePrint Archive, 2024

  16. [16]

    CrypTen: Secure multi-party computation meets machine learning,

    B. Knott, S. Venkataraman, A. Hannun, S. Sengupta, M. Ibrahim, and L. van der Maaten, “CrypTen: Secure multi-party computation meets machine learning,”Advances in Neural Information Processing Systems, vol. 34, pp. 4961–4973, 2021

  17. [17]

    CryptGPU: Fast privacy- preserving machine learning on the GPU,

    S. Tan, B. Knott, Y . Tian, and D. J. Wu, “CryptGPU: Fast privacy- preserving machine learning on the GPU,” inProceedings of the IEEE Symposium on Security and Privacy (SP), 2021, pp. 1021–1038

  18. [18]

    SecretFlow-SPU: A performant and user- friendly framework for privacy-preserving machine learning,

    J. Ma, Y . Zheng, J. Feng, D. Zhao, H. Wu, W. Fang, J. Tan, C. Yu, B. Zhang, and L. Wang, “SecretFlow-SPU: A performant and user- friendly framework for privacy-preserving machine learning,” in2023 USENIX Annual Technical Conference (USENIX ATC 23), 2023, pp. 17–33

  19. [19]

    CryptFlow: Secure TensorFlow inference,

    N. Kumar, M. Rathee, N. Chandran, D. Gupta, A. Rastogi, and R. Sharma, “CryptFlow: Secure TensorFlow inference,” inProceedings of the IEEE Symposium on Security and Privacy (SP), 2020, pp. 336– 353

  20. [20]

    SecureNN: 3-party secure com- putation for neural network training,

    S. Wagh, D. Gupta, and N. Chandran, “SecureNN: 3-party secure com- putation for neural network training,”Proc. Priv. Enhancing Technol., vol. 2019, no. 3, pp. 26–49, 2019

  21. [21]

    Fantastic four: honest-majority four-party secure computation with malicious security,

    A. Dalskov, D. Escudero, and M. Keller, “Fantastic four: honest-majority four-party secure computation with malicious security,” in30th USENIX Security Symposium (USENIX Security 21), 2021, pp. 2183–2200

  22. [22]

    Blaze: Blazing fast privacy-preserving machine learning,

    A. Patra and A. Suresh, “Blaze: blazing fast privacy-preserving machine learning,”arXiv preprint arXiv:2005.09042, 2020

  23. [23]

    Function secret sharing: Improve- ments and extensions,

    E. Boyle, N. Gilboa, and Y . Ishai, “Function secret sharing: Improve- ments and extensions,” inProceedings of the 2016 ACM SIGSAC conference on computer and communications security, 2016, pp. 1292– 1303

  24. [24]

    Lightweight, maliciously secure verifiable function secret sharing,

    L. de Castro and A. Polychroniadou, “Lightweight, maliciously secure verifiable function secret sharing,” inAnnual International Conference on the Theory and Applications of Cryptographic Techniques, 2022, pp. 150–179

  25. [25]

    Sigma: Secure GPT inference with function secret sharing,

    K. Gupta, N. Jawalkar, A. Mukherjee, N. Chandran, D. Gupta, A. Pan- war, and R. Sharma, “Sigma: Secure GPT inference with function secret sharing,”Cryptology ePrint Archive, 2023

  26. [26]

    Privacy-preserving machine learning based on cryptography: A survey,

    C. Chen, L. Wei, J. Xie, and Y . Shi, “Privacy-preserving machine learning based on cryptography: A survey,”ACM Transactions on Knowledge Discovery from Data, vol. 19, no. 4, pp. 1–33, 2025

  27. [27]

    Private and secure distributed deep learning: A survey,

    C. Allaart, S. Amiri, H. Bal, A. Belloum, L. Gommans, A. Van Halteren, and S. Klous, “Private and secure distributed deep learning: A survey,” ACM Computing Surveys, vol. 57, no. 4, pp. 1–43, 2024

  28. [28]

    SoK: Privacy-preserving compu- tation techniques for deep learning,

    J. Cabrero-Holgueras and S. Pastrana, “SoK: Privacy-preserving compu- tation techniques for deep learning,”Proceedings on Privacy Enhancing Technologies, 2021

  29. [29]

    In-depth characterization of machine learning on an optimized multi-party computing library,

    J. Liu and K. Maeng, “In-depth characterization of machine learning on an optimized multi-party computing library,”IEEE Computer Architec- ture Letters, 2025

  30. [30]

    Efficient multiparty protocols using circuit randomization,

    D. Beaver, “Efficient multiparty protocols using circuit randomization,” inAnnual international cryptology conference, 1991, pp. 420–432

  31. [31]

    AriaNN: Low- interaction privacy-preserving deep learning via function secret sharing,

    T. Ryffel, P. Tholoniat, D. Pointcheval, and F. Bach, “AriaNN: Low- interaction privacy-preserving deep learning via function secret sharing,” Proceedings on Privacy Enhancing Technologies, vol. 1, pp. 291–316, 2022

  32. [32]

    BTS: An accelerator for bootstrappable fully homomorphic encryp- tion,

    S. Kim, J. Kim, M. J. Kim, W. Jung, J. Kim, M. Rhu, and J. H. Ahn, “BTS: An accelerator for bootstrappable fully homomorphic encryp- tion,” inProceedings of the 49th annual international symposium on computer architecture, 2022, pp. 711–725

  33. [33]

    F1: A fast and programmable acceler- ator for fully homomorphic encryption,

    N. Samardzic, A. Feldmann, A. Krastev, S. Devadas, R. Dreslinski, C. Peikert, and D. Sanchez, “F1: A fast and programmable acceler- ator for fully homomorphic encryption,” inMICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021, pp. 238–252

  34. [34]

    Craterlake: a hardware accelerator for efficient unbounded computation on encrypted data,

    N. Samardzic, A. Feldmann, A. Krastev, N. Manohar, N. Genise, S. Devadas, K. Eldefrawy, C. Peikert, and D. Sanchez, “Craterlake: a hardware accelerator for efficient unbounded computation on encrypted data,” inProceedings of the 49th Annual International Symposium on Computer Architecture, 2022, pp. 173–187

  35. [35]

    Cheetah: Optimizing and accelerating homomorphic encryption for private inference,

    B. Reagen, W.-S. Choi, Y . Ko, V . T. Lee, H.-H. S. Lee, G.-Y . Wei, and D. Brooks, “Cheetah: Optimizing and accelerating homomorphic encryption for private inference,” in2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021, pp. 26–39

  36. [36]

    GAZELLE: A low latency framework for secure neural network inference,

    C. Juvekar, V . Vaikuntanathan, and A. Chandrakasan, “GAZELLE: A low latency framework for secure neural network inference,” in27th USENIX Security Symposium (USENIX Security 18), 2018, pp. 1651– 1669

  37. [37]

    Bolt: Privacy-preserving, accurate and efficient inference for transformers,

    Q. Pang, J. Zhu, H. M ¨ollering, W. Zheng, and T. Schneider, “Bolt: Privacy-preserving, accurate and efficient inference for transformers,” inProceedings of the IEEE Symposium on Security and Privacy (SP), 2024, pp. 4753–4771

  38. [38]

    Bumblebee: Secure two-party inference framework for large transformers,

    W.-j. Lu, Z. Huang, Z. Gu, J. Li, J. Liu, C. Hong, K. Ren, T. Wei, and W. Chen, “Bumblebee: Secure two-party inference framework for large transformers,”Cryptology ePrint Archive, 2023

  39. [39]

    Cryptflow2: Practical 2-party secure inference,

    D. Rathee, M. Rathee, N. Kumar, N. Chandran, D. Gupta, A. Rastogi, and R. Sharma, “Cryptflow2: Practical 2-party secure inference,” in Proceedings of the 2020 ACM SIGSAC conference on computer and communications security, 2020, pp. 325–342

  40. [40]

    Delphi: A cryptographic inference service for neural networks,

    P. Mishra, R. Lehmkuhl, A. Srinivasan, W. Zheng, and R. A. Popa, “Delphi: A cryptographic inference service for neural networks,” in29th USENIX Security Symposium (USENIX Security 20), 2020, pp. 2505– 2522

  41. [41]

    AWS Direct Connect,

    Amazon Web Services, “AWS Direct Connect,” https://aws.amazon.com/ directconnect/, accessed: 2025-02-12

  42. [42]

    Cryptorch: Pytorch-based auto-tuning compiler for machine learning with multi-party computation,

    J. Liu, G. Tan, and K. Maeng, “Cryptorch: Pytorch-based auto-tuning compiler for machine learning with multi-party computation,”arXiv preprint arXiv:2511.19711, 2025

  43. [43]

    Accelerating ReLU for MPC-based private inference with a communication-efficient sign estimation,

    K. Maeng and G. E. Suh, “Accelerating ReLU for MPC-based private inference with a communication-efficient sign estimation,”Proceedings of Machine Learning and Systems, vol. 6, pp. 128–147, 2024

  44. [44]

    THE-X: Privacy-preserving transformer inference with homomorphic encryption,

    T. Chen, H. Bao, S. Huang, L. Dong, B. Jiao, D. Jiang, H. Zhou, J. Li, and F. Wei, “THE-X: Privacy-preserving transformer inference with homomorphic encryption,”arXiv preprint arXiv:2206.00216, 2022

  45. [45]

    Converting transformers to polynomial form for secure inference over homomorphic encryption,

    I. Zimerman, M. Baruch, N. Drucker, G. Ezov, O. Soceanu, and L. Wolf, “Converting transformers to polynomial form for secure inference over homomorphic encryption,” inForty-first International Conference on Machine Learning, 2024

  46. [46]

    THOR: Secure transformer inference with homomorphic encryption,

    J. Moon, D. Yoo, X. Jiang, and M. Kim, “THOR: Secure transformer inference with homomorphic encryption,” inProceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security, 2025, pp. 3765–3779

  47. [47]

    Transformer-based language models and homomorphic encryption: An intersection with BERT-tiny,

    L. Rovida and A. Leporati, “Transformer-based language models and homomorphic encryption: An intersection with BERT-tiny,” inProceed- ings of the 10th ACM International Workshop on Security and Privacy Analytics, 2024, pp. 3–13

  48. [48]

    WarpDrive: GPU-based fully homo- morphic encryption acceleration leveraging tensor and CUDA cores,

    G. Fan, M. Zhang, F. Zheng, S. Fan, T. Zhou, X. Deng, W. Tang, L. Kong, Y . Song, and S. Yan, “WarpDrive: GPU-based fully homo- morphic encryption acceleration leveraging tensor and CUDA cores,” in 2025 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2025, pp. 1187–1200

  49. [49]

    RAPL: Memory power estimation and capping,

    H. David, E. Gorbatov, U. R. Hanebutte, R. Khanna, and C. Le, “RAPL: Memory power estimation and capping,” inProceedings of the 16th ACM/IEEE international symposium on Low power electronics and design, 2010, pp. 189–194. [50]NVIDIA ConnectX-5 Ethernet Adapter Cards User Manual, NVIDIA Corporation. [Online]. Available: https://docs.nvidia.com/networking/ ...

  50. [50]

    Amazon EC2 Pricing,

    Amazon Web Services, “Amazon EC2 Pricing,” https://aws.amazon.com/ ec2/pricing/, 2025, accessed: 2025-07-08

  51. [51]

    Energy consumption and environmental implications of wired access networks,

    S. Aleksic and A. Lovric, “Energy consumption and environmental implications of wired access networks,”Am. J. Eng. Applied Sci, vol. 4, pp. 531–539, 2011

  52. [52]

    Energy consumption comparison of interactive cloud-based and local applications,

    A. Vishwanath, F. Jalali, K. Hinton, T. Alpcan, R. W. Ayre, and R. S. Tucker, “Energy consumption comparison of interactive cloud-based and local applications,”IEEE Journal on selected areas in communications, vol. 33, no. 4, pp. 616–626, 2015

  53. [53]

    MPC-Pipe: An efficient pipeline scheme for semi-honest MPC machine learning,

    Y . Wang, R. Rajat, and M. Annavaram, “MPC-Pipe: An efficient pipeline scheme for semi-honest MPC machine learning,” inProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, 2024, pp. 203–219

  54. [54]

    PIGEON: A high throughput framework for private inference of neural networks using secure multiparty computation,

    C. Harth-Kitzerow, Y . Wang, R. Rajat, G. Carle, and M. Annavaram, “PIGEON: A high throughput framework for private inference of neural networks using secure multiparty computation,”Proceedings on Privacy Enhancing Technologies, 2025

  55. [55]

    It’s time for low latency,

    S. M. Rumble, D. Ongaro, R. Stutsman, M. Rosenblum, and J. K. Ousterhout, “It’s time for low latency,” in13th Workshop on Hot Topics in Operating Systems (HotOS XIII), 2011

  56. [56]

    A new golden age for computer architecture,

    J. L. Hennessy and D. A. Patterson, “A new golden age for computer architecture,”Communications of the ACM, vol. 62, no. 2, pp. 48–60, 2019

  57. [57]

    The end of Moore’s Law: A new begin- ning for information technology,

    T. N. Theis and H.-S. P. Wong, “The end of Moore’s Law: A new begin- ning for information technology,”Computing in science & engineering, vol. 19, no. 2, pp. 41–50, 2017

  58. [58]

    Sullivan, Wenting Zheng, and Dimitrios Skarlatos

    S. Jayashankar, J. Kim, M. B. Sullivan, W. Zheng, and D. Skarlatos, “A scalable Multi-GPU framework for encrypted Large-Model inference,” arXiv preprint arXiv:2512.11269, 2025