Multi-Input Ciphertext Multiplication for Homomorphic Encryption

Sajjad Akherati; Xinmiao Zhang

arxiv: 2601.15401 · v1 · pith:IPMNZ7DS · submitted 2026-01-21 · 💻 cs.CR


Pith reviewed 2026-05-21 15:27 UTC · model grok-4.3

classification 💻 cs.CR

keywords: homomorphic encryption, ciphertext multiplication, CKKS scheme, multi-input multiplication, rescaling, relinearization, hardware architecture, noise budget


The pith

Reformulating three-input ciphertext multiplication and extending it with multi-level rescaling enables efficient hardware designs for products of up to twelve encrypted values in homomorphic encryption.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops methods to multiply three or more ciphertexts at once in schemes like CKKS, where the standard multiplication takes only two inputs. It first rewrites the three-input case so that separate computations can be merged, lowering overall complexity. The work then generalizes this to four through twelve inputs by introducing extra evaluation keys for relinearization and a multi-level rescaling technique that groups rescaling operations to keep the hardware simple. These changes matter because many privacy-preserving tasks require multiplying several encrypted numbers, and lower area and latency make such computations more practical on real devices. The architectural analysis reports area and latency reductions relative to earlier designs, and the authors argue that the noise overhead is not increased.
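A minimal sketch of why three inputs need extra relinearization keys, assuming a toy setting: ring degree N = 8, a single prime modulus Q = 2^31 - 1 instead of an RNS modulus chain, noise-free (and therefore insecure) encryption and keys, and no CKKS encoding or rescaling. A two-component ciphertext satisfies c0 + c1·s = v, so the product of three such ciphertexts expands into four components paired with 1, s, s^2 and s^3; folding the last two back needs one key for s^2 and one for s^3. The helper names (`tensor3`, `toy_evk`, `relin3`) are hypothetical, and this naive expansion is not the paper's reformulated datapath, which merges computations to reduce complexity.

```python
import random

N = 8          # toy ring degree (real CKKS uses N >= 2**13)
Q = 2**31 - 1  # toy single prime modulus, not an RNS-CKKS modulus chain

def polymul(a, b):
    """Negacyclic product a*b in Z_Q[X]/(X^N + 1)."""
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                res[i + j] = (res[i + j] + ai * bj) % Q
            else:  # X^N = -1, so wrapped terms flip sign
                res[i + j - N] = (res[i + j - N] - ai * bj) % Q
    return res

def polyadd(*ps):
    return [sum(cs) % Q for cs in zip(*ps)]

def neg(p):
    return [(-c) % Q for c in p]

def mul3(a, b, c):
    return polymul(polymul(a, b), c)

def spow(s, k):
    """s^k in the ring (s^0 = 1)."""
    r = [1] + [0] * (N - 1)
    for _ in range(k):
        r = polymul(r, s)
    return r

def dec(ct, s):
    """Phase c0 + c1*s + c2*s^2 + ... (no CKKS decoding or rounding)."""
    return polyadd(*[polymul(c, spow(s, k)) for k, c in enumerate(ct)])

def toy_enc(v, s):
    """Noise-free toy ciphertext (v - a*s, a). INSECURE: illustration only."""
    a = [random.randrange(Q) for _ in range(N)]
    return [polyadd(v, neg(polymul(a, s))), a]

def tensor3(x, y, z):
    """Expand (x0 + x1 s)(y0 + y1 s)(z0 + z1 s) into components d0..d3."""
    (x0, x1), (y0, y1), (z0, z1) = x, y, z
    return [mul3(x0, y0, z0),
            polyadd(mul3(x1, y0, z0), mul3(x0, y1, z0), mul3(x0, y0, z1)),
            polyadd(mul3(x1, y1, z0), mul3(x1, y0, z1), mul3(x0, y1, z1)),
            mul3(x1, y1, z1)]

def toy_evk(s, k):
    """Toy key (b, a) with b + a*s = s^k. Real keys add noise and live mod PQ."""
    a = [random.randrange(Q) for _ in range(N)]
    return polyadd(spow(s, k), neg(polymul(a, s))), a

def relin3(d, evk2, evk3):
    """Fold d2*s^2 and d3*s^3 back into a two-component ciphertext."""
    d0, d1, d2, d3 = d
    c0 = polyadd(d0, polymul(d2, evk2[0]), polymul(d3, evk3[0]))
    c1 = polyadd(d1, polymul(d2, evk2[1]), polymul(d3, evk3[1]))
    return [c0, c1]

if __name__ == "__main__":
    s = [random.choice([Q - 1, 0, 1]) for _ in range(N)]  # ternary secret
    vs = [[random.randrange(16) for _ in range(N)] for _ in range(3)]
    d = tensor3(*(toy_enc(v, s) for v in vs))
    ct = relin3(d, toy_evk(s, 2), toy_evk(s, 3))
    print(len(d), dec(d, s) == mul3(*vs), dec(ct, s) == mul3(*vs))  # 4 True True
```

In a real scheme the keys carry noise and are applied through ModUp/ModDown so that this noise stays small; the toy omits both to isolate the algebra.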

Core claim

By reformulating the three-input ciphertext multiplication to combine computations, extending the approach to more inputs with added evaluation keys, and introducing a multi-level rescaling method based on input partitioning, the multiplier hardware can perform combined rescaling with complexity similar to that of a single rescaling unit while maintaining the noise budget and correctness of homomorphic operations.

What carries the argument

The multi-level rescaling approach, which relocates rescaling steps across levels and combines them according to input-partitioning guidelines, so that the combined rescaling has complexity similar to that of a single rescaling unit.
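The relocation idea can be illustrated on one coefficient in RNS form. Below, two sequential rescalings (each computing q_{L-1}^{-1}(a^(η) - a^(L-1)) mod q_η) are compared against a merged pass that we derived by expanding the two steps: the value t = a'^(L-2) is computed once and shared, and every remaining q_η then needs only precomputed constants. This is a hypothetical sketch, not the paper's multi-RS formula; it assumes nonnegative residues, floor-style rescaling (real CKKS implementations typically round or use centered residues), and toy 7-bit primes.

```python
import random

# Toy RNS basis q_0..q_3: small distinct primes (real CKKS uses ~30-60-bit NTT primes).
BASIS = [97, 101, 103, 107]

def to_rns(a, basis):
    return [a % q for q in basis]

def rescale(res, basis):
    """Single rescale: a'^(eta) = q_{L-1}^{-1} (a^(eta) - a^(L-1)) mod q_eta."""
    q_top, a_top = basis[-1], res[-1]
    out = [pow(q_top, -1, q) * (r - a_top) % q for r, q in zip(res, basis[:-1])]
    return out, basis[:-1]

def rescale2_merged(res, basis):
    """Drop q_{L-1} and q_{L-2} in one pass (hypothetical merged form).

    Expanding two sequential rescales gives, for eta < L-2:
      a''^(eta) = (q_{L-1} q_{L-2})^{-1} (a^(eta) - a^(L-1)) - q_{L-2}^{-1} t  mod q_eta,
    where t = q_{L-1}^{-1} (a^(L-2) - a^(L-1)) mod q_{L-2} is computed once and shared.
    """
    q1, q2 = basis[-1], basis[-2]
    a1, a2 = res[-1], res[-2]
    t = pow(q1, -1, q2) * (a2 - a1) % q2  # shared by every remaining modulus
    out = []
    for r, q in zip(res, basis[:-2]):
        g = pow(q1 * q2, -1, q)           # constant that hardware would precompute
        out.append((g * (r - a1) - pow(q2, -1, q) * t) % q)
    return out, basis[:-2]

if __name__ == "__main__":
    a = random.randrange(97 * 101 * 103 * 107)
    seq, _ = rescale(*rescale(to_rns(a, BASIS), BASIS))
    merged, _ = rescale2_merged(to_rns(a, BASIS), BASIS)
    print(seq == merged == to_rns(a // (107 * 103), BASIS[:2]))  # True
```

Both paths return the residues of floor(a / (q_3 q_2)); the merged form simply trades a second sequential pass for one shared value plus per-modulus constants.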

If this is right

Products of three ciphertexts can be computed with 15 percent less logic area and 50 percent shorter latency than the best existing design.
Multipliers for four to twelve inputs show average savings of 32 percent in area and 45 percent in latency in the architectural analysis, at the cost of additional evaluation keys for relinearization.
Combined (multi-level) rescaling stays comparable in complexity to a single rescaling unit even as the number of inputs grows.
Input partitioning guidelines allow more rescaling operations to be combined without increasing the number of dedicated hardware blocks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

These multipliers could support deeper circuits in encrypted machine learning or statistical analysis by keeping noise growth manageable across multiple factors.
The same partitioning and relocation ideas might extend to other sequences of homomorphic operations that involve repeated rescaling.
Verification on FPGA or ASIC platforms would test whether the theoretical area and latency reductions hold under real synthesis constraints.

Load-bearing premise

The multi-level rescaling and input partitioning guidelines preserve the noise budget and correctness of the homomorphic operations without extra overhead.

What would settle it

A hardware implementation of the proposed three-input or twelve-input multiplier that measures actual logic area, clock latency, and final noise growth after multiplication and compares them directly to the best prior design would confirm or refute the claimed savings.

Figures

Figures reproduced from arXiv: 2601.15401 by Sajjad Akherati, Xinmiao Zhang.

**Figure 1.** Hardware architectures for: (a) 2-input ciphertext multiplication using the RNS-CKKS scheme [26]; (b) ModUp operation; (c) ModDown operation; and (d) rescaling (RS). … $R_{PQ}^2$) with modulus $PQ$, where $P$ has a bit length similar to that of $Q$. Assume that $P$ is decomposed into $K \ge L$ co-prime factors as $P = \prod_{i=0}^{K-1} p_i$. From the secret key $s$, the RNS components of $\mathrm{ek}_2$ are generated as $\mathrm{ek}_2^{(i)} = (\mathrm{ek}_{2,0}^{(i)},$ …
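ModUp in Figure 1(b) extends a polynomial's residues from the q_i basis to the p_j basis before multiplication by ek_2. A common way to do this, the fast (approximate) base conversion of Bajard et al. [15] that is also used in the full-RNS CKKS variant [2], is sketched below; we assume, but have not confirmed, that the paper's ModUp block follows this form. The result is exact up to an overflow term u·Q with 0 ≤ u < L, which in the standard analysis contributes only a small extra noise term after ModDown. The moduli and function name are hypothetical toy values.

```python
import random
from math import prod

Q_BASIS = [97, 101, 103]   # toy q_0..q_{L-1} (hypothetical values)
P_BASIS = [109, 113, 127]  # toy p_0..p_{K-1}, K >= L as in Figure 1 (hypothetical)

def fast_base_conv(res_q, q_basis, p_basis):
    """Approximate basis extension from {q_i} to {p_j}.

    Returns sum_i [x_i * (Q/q_i)^{-1}]_{q_i} * (Q/q_i) mod p_j, which equals
    (x + u*Q) mod p_j for one integer u with 0 <= u < len(q_basis).
    """
    big_q = prod(q_basis)
    terms = []
    for x_i, q in zip(res_q, q_basis):
        q_hat = big_q // q
        terms.append((x_i * pow(q_hat, -1, q) % q, q_hat))
    return [sum(t * q_hat for t, q_hat in terms) % p for p in p_basis]

if __name__ == "__main__":
    big_q = prod(Q_BASIS)
    x = random.randrange(big_q)
    ext = fast_base_conv([x % q for q in Q_BASIS], Q_BASIS, P_BASIS)
    u = next(u for u in range(len(Q_BASIS))
             if ext == [(x + u * big_q) % p for p in P_BASIS])
    print("overflow multiple u =", u)
```

The overflow bound follows because each bracketed term is below q_i, so the sum is below L·Q while still congruent to x modulo Q.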

**Figure 2.** (a) Block diagram for three-input ciphertext multiplication using the RNS-CKKS scheme [26]; (b) architecture for polynomial multiplication (PM) in three-input ciphertext multiplication; (c) architecture for relinearization. The ModUp, ModDown, and RS blocks are implemented using the architecture in …

**Figure 3.** (a) Block diagram for the proposed improved three-input ciphertext multiplication; (b) architecture for relinearization; (c) architecture for the rs block according to (6); and (d) architecture for the RS* block. The computations inside the dashed block in (b) correspond to those for the ModDown operation after applying the proposed improvements in Section IV. The architectures in …

**Figure 4.** Hardware implementation architecture of the proposed multi-RS for $\mu = 2$ (2-RS). From (15), it can be derived that $g_{\eta}^{1,L-1} = q_{L-1}^{-1} \bmod q_\eta$. For the base case $\mu = 1$, (16) becomes $a^{(\eta),\{1\}} = g_{\eta}^{1,L-1} a^{(\eta)} - g_{\eta}^{1,L-1} a^{(L-1),\{0\}} \bmod q_\eta = q_{L-1}^{-1} a^{(\eta)} - q_{L-1}^{-1} a^{(L-1)} \bmod q_\eta$, which is equal to the result of applying a single rescaling block from (6). Accordingly, (16) holds for the base case $\mu = 1$.…
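The base-case check in the Figure 4 caption can be reproduced numerically. As a sketch, we use an equivalent closed form of our own choosing rather than the paper's g coefficients from (15) and (16): dropping the top µ moduli at once equals Q_top^{-1}(a^(η) - [a]_{Q_top}) mod q_η, where Q_top = q_{L-µ}···q_{L-1} and [a]_{Q_top} is rebuilt by CRT from the top µ residues. For µ = 1 the CRT step returns a^(L-1) itself, which is exactly the single-rescaling expression in the caption. Assumptions: nonnegative residues, floor-style rescaling, and toy primes; a hardware multi-RS block would not materialize a full CRT like this.

```python
import random
from math import prod

BASIS = [97, 101, 103, 107, 109]  # toy q_0..q_{L-1} (real moduli are large NTT primes)

def to_rns(a, basis):
    return [a % q for q in basis]

def crt(residues, moduli):
    """Unique x in [0, prod(moduli)) with x = r_i mod m_i."""
    m_all = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        m_hat = m_all // m
        x += r * m_hat * pow(m_hat, -1, m)
    return x % m_all

def multi_rescale(res, basis, mu):
    """Drop the top mu moduli at once:
    a^(eta),{mu} = Q_top^{-1} (a^(eta) - [a]_{Q_top}) mod q_eta, Q_top = q_{L-mu}...q_{L-1}.
    For mu = 1, [a]_{Q_top} is a^(L-1) and this is the single-rescaling formula.
    """
    top = basis[-mu:]
    q_top = prod(top)
    a_top = crt(res[-mu:], top)  # integer in [0, Q_top)
    return [pow(q_top, -1, q) * (r - a_top) % q for r, q in zip(res, basis[:-mu])]

if __name__ == "__main__":
    a = random.randrange(prod(BASIS))
    for mu in (1, 2, 3):
        got = multi_rescale(to_rns(a, BASIS), BASIS, mu)
        print(mu, got == to_rns(a // prod(BASIS[-mu:]), BASIS[:-mu]))  # True each time
```

The identity holds because a - [a]_{Q_top} is exactly Q_top·floor(a / Q_top), so the output is the RNS form of that quotient on the remaining moduli.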

**Figure 5.** Block diagram for the 6-input ciphertext multiplication: (a) the ciphertexts are partitioned as (2, 2, 2); (b) the ciphertexts are grouped as (5, 1) | (3, 2). … moduli $q_j$ for $0 \le j < L - 2$. The outputs of the dashed block are shared, whereas the other computation units in …

**Figure 6.** Tree structures for multiplying 17 ciphertexts: (a) a binary tree with the conventional single rescaling; (b) partitions with a minimized number of rescalings, incorporating multi-rescaling. The numbers in the red, blue, and green blocks of each node represent the number of ciphertext polynomials multiplied, the maximum multiplicative depth allowed, and the number of single rescaling units required at that nod…
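To see what input partitioning trades off, consider a simplified cost model that we assume for illustration, not the paper's exact accounting in Figure 6: every node that multiplies m inputs at scale Δ produces scale Δ^m and needs m - 1 single rescalings to return to Δ, each dropping one modulus. Under this model any tree over k inputs uses k - 1 single rescalings in total; partitioning changes how they cluster at nodes (the clusters a combined multi-level rescaling unit could absorb) and how many levels the worst path consumes. Trees are encoded as nested tuples with `None` for an input ciphertext; all names are hypothetical.

```python
# A tree is a tuple of children; None marks an input ciphertext (a leaf).

def leaves(t):
    return 1 if t is None else sum(leaves(c) for c in t)

def single_rescales(t):
    """Total single rescalings if every m-input node rescales Delta^m back to Delta."""
    if t is None:
        return 0
    return (len(t) - 1) + sum(single_rescales(c) for c in t)

def levels(t):
    """Moduli consumed on the worst root-to-leaf path (one per single rescaling)."""
    if t is None:
        return 0
    return (len(t) - 1) + max(levels(c) for c in t)

def binary_tree(k):
    """Balanced tree of conventional 2-input multiplications over k inputs."""
    if k == 1:
        return None
    return (binary_tree(k - k // 2), binary_tree(k // 2))

def flat(k):
    """A single k-input multiplication node."""
    return (None,) * k

if __name__ == "__main__":
    p222 = ((None, None),) * 3  # partition (2, 2, 2) of six inputs
    for name, t in [("binary-17", binary_tree(17)), ("binary-6", binary_tree(6)),
                    ("(2,2,2)", p222), ("flat-6", flat(6))]:
        print(name, leaves(t), single_rescales(t), levels(t))
    # binary-17 17 16 5
    # binary-6 6 5 3
    # (2,2,2) 6 5 3
    # flat-6 6 5 5
```

Under this model, the (2, 2, 2) partition consumes the same three levels as a binary tree while concentrating two rescalings at the root, whereas one flat 6-input node consumes five levels.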

Original abstract

Homomorphic encryption (HE) enables arithmetic operations to be performed directly on encrypted data. It is essential for privacy-preserving applications such as machine learning, medical diagnosis, and financial data analysis. In popular HE schemes, ciphertext multiplication is only defined for two inputs. However, the multiplication of multiple inputs is needed in many HE applications. In our previous work, a three-input ciphertext multiplication method for the CKKS HE scheme was developed. This paper first reformulates the three-input ciphertext multiplication to enable the combination of computations in order to further reduce the complexity. The second contribution is extending the multiplication to multiple inputs without compromising the noise overhead. Additional evaluation keys are introduced to achieve relinearization of polynomial multiplication results. To minimize the complexity of the large number of rescaling units in the multiplier, a theoretical analysis is developed to relocate the rescaling, and a multi-level rescaling approach is proposed to implement combined rescaling with complexity similar to that of a single rescaling unit. Guidelines and examples are provided on the input partition to enable the combination of more rescaling. Additionally, efficient hardware architectures are designed to implement our proposed multipliers. The improved three-input ciphertext multiplier reduces the logic area and latency by 15% and 50%, respectively, compared to the best prior design. For multipliers with more inputs, ranging from 4 to 12, the architectural analysis reveals 32% savings in area and 45% shorter latency, on average, compared to prior work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This extends their prior three-input CKKS work with reformulated multiplication and multi-level rescaling for 4-12 inputs, showing plausible hardware gains if the noise claims hold.


The core advance is a reformulation of their earlier three-input ciphertext multiplier that lets them merge operations, followed by a generalization to 4–12 inputs using extra evaluation keys and a multi-level rescaling approach. They partition the inputs so that combined rescalings stay roughly as complex as one unit, and they back this with hardware architectures whose analysis reports 15% area and 50% latency cuts for the improved three-input case, plus average savings of 32% in area and 45% in latency for 4–12 inputs versus prior designs. That is concrete and directly useful for people building accelerators. The theoretical analysis and partitioning guidelines are what make the scaling look feasible without an explosion in the number of rescaling units. Credit to them for giving specific numbers and examples rather than just high-level ideas.

The soft spot is the noise budget beyond three inputs. The paper argues that the partitioning keeps growth within CKKS limits without added overhead, but the cumulative effect of several polynomial multiplications plus the relocated rescalings is not trivial. If the model undercounts noise from the extra keys or assumes particular ring dimensions, the claimed savings could require either larger parameters or hidden extra steps in practice. The abstract and analysis tell a plausible story, but it would be stronger with explicit noise bounds or simulation results for the 8- and 12-input cases.

This is aimed at hardware engineers already working on CKKS accelerators for privacy-preserving ML or analytics. A reader who needs concrete multiplier designs and is willing to verify the noise math will get value. It deserves a serious referee because the implementation claims are specific enough to be checked and the extension is a natural next step in the area.

Referee Report

1 major / 2 minor

Summary. The paper extends ciphertext multiplication in the CKKS homomorphic encryption scheme from two inputs to three and then to 4–12 inputs. It reformulates the three-input case for reduced complexity, introduces additional evaluation keys for relinearization, develops a multi-level rescaling scheme with input-partitioning guidelines to keep rescaling cost comparable to a single unit, and presents hardware architectures. Quantitative claims include 15% area and 50% latency reduction for the improved three-input multiplier versus the best prior design, and average 32% area and 45% latency savings for 4–12 inputs.

Significance. If the noise-budget claims hold, the work offers concrete efficiency gains for HE applications that require multi-operand multiplications (e.g., polynomial evaluation or neural-network layers). The combination of theoretical relocation analysis, partitioning guidelines, and hardware architecture analysis is a strength; reproducible area/latency numbers and the parameter-free flavor of the rescaling-complexity argument would be particularly valuable.

major comments (1)

[theoretical analysis / multi-level rescaling] The central claim that multi-input multiplication (4–12 inputs) incurs no extra noise overhead rests on the multi-level rescaling and input-partitioning guidelines (abstract and theoretical-analysis section). The provided noise analysis must explicitly bound the cumulative effect of the additional polynomial multiplications that arise when more than three ciphertexts are combined; without such a bound or a concrete parameter-set example showing that the noise growth stays inside CKKS limits, the reported 32%/45% savings cannot be guaranteed without either increasing noise or inserting unaccounted rescaling steps.

minor comments (2)

[abstract and hardware-results section] The abstract states that the three-input multiplier reduces logic area by 15% and latency by 50% versus the best prior design; the manuscript should name the exact prior architecture (including its reference) and the synthesis conditions (target FPGA/ASIC, clock frequency) used for that comparison.
[preliminaries / key-generation section] Notation for the additional evaluation keys introduced for relinearization of the multi-input products should be introduced once and used consistently; a small table summarizing key count versus number of inputs would improve readability.
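As one way to draft the requested table, the helper below assumes that a k-input product is tensored fully and then relinearized in one shot, so its k + 1 components (powers s^0 through s^k) need one evaluation key for each of s^2 through s^k. The paper's partitioned multipliers may relinearize sub-products instead and need a different set of keys, so the rows are an assumption to be checked against the manuscript; `key_table` is a hypothetical name.

```python
def key_table(max_inputs):
    """Rows (k, components, eval_keys) for a k-input product relinearized in one shot.

    Assumption: the k-way tensor product has k + 1 components (s^0..s^k), so
    relinearization needs one evaluation key per power s^2..s^k.
    """
    return [(k, k + 1, k - 1) for k in range(2, max_inputs + 1)]

if __name__ == "__main__":
    print("inputs  components  eval keys")
    for k, comps, keys in key_table(12):
        print(f"{k:>6}  {comps:>10}  {keys:>9}")
```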

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful and constructive review of our manuscript. We address the major comment below and have revised the manuscript to strengthen the noise analysis as requested.


Referee: The central claim that multi-input multiplication (4–12 inputs) incurs no extra noise overhead rests on the multi-level rescaling and input-partitioning guidelines (abstract and theoretical-analysis section). The provided noise analysis must explicitly bound the cumulative effect of the additional polynomial multiplications that arise when more than three ciphertexts are combined; without such a bound or a concrete parameter-set example showing that the noise growth stays inside CKKS limits, the reported 32%/45% savings cannot be guaranteed without either increasing noise or inserting unaccounted rescaling steps.

Authors: We agree that an explicit bound on noise growth for the multi-input case strengthens the central claim. Section 4 develops a relocation analysis for rescaling and introduces multi-level rescaling together with input-partitioning guidelines that keep the number and cost of rescalings comparable to a single standard rescaling unit. These mechanisms ensure that the additional polynomial multiplications required for 4–12 inputs do not introduce noise growth beyond what occurs in sequential pairwise CKKS multiplications. To make this explicit, we have added a new subsection that derives a formal bound on the cumulative noise contribution from the extra relinearization steps and provides a concrete parameter-set example (using standard CKKS parameters with 128-bit security) confirming that the final noise remains within the allowable budget without extra rescaling operations. The revised analysis supports the reported 32% area and 45% latency savings while preserving the original noise overhead. revision: yes

Circularity Check

0 steps flagged

No circularity: hardware architecture proposals and comparisons are independent of self-referential fits or definitions


The paper presents new hardware architectures for multi-input CKKS ciphertext multiplication, including reformulation of prior three-input designs, extension via additional evaluation keys, multi-level rescaling, and input partitioning guidelines. Area/latency savings (15%/50% for three-input; 32%/45% average for 4-12 inputs) are derived from explicit architectural analysis and comparison to prior designs, not from any equation or parameter that reduces to its own inputs by construction. The noise budget discussion is a supporting theoretical analysis rather than a load-bearing derivation that loops back. Self-citation to prior three-input work is present but not used to justify uniqueness or force the current results; the central contributions are new partitioning and relocation techniques whose correctness is analyzed separately. No fitted-input-called-prediction, self-definitional, or ansatz-smuggled patterns appear in the provided claims or structure.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on domain assumptions from prior HE literature regarding polynomial operations and noise control, with no new free parameters or invented entities introduced in the abstract.

axioms (1)

Domain assumption: standard noise-growth models and relinearization techniques in CKKS homomorphic encryption.
These are foundational to the scheme and assumed to hold for the proposed extensions.

pith-pipeline@v0.9.0 · 5791 in / 1281 out tokens · 74864 ms · 2026-05-21T15:27:32.166785+00:00


Reference graph

Works this paper leans on

29 extracted references

[1] J. H. Cheon, A. Kim, M. Kim, and Y. Song, “Homomorphic encryption for arithmetic of approximate numbers,” in Proc. of Intl. Conf. on the Theory and Appl. of Cryptol. and Info. Secur., Cham, Switzerland, 2017, pp. 409–437
[2] J. H. Cheon, K. Han, A. Kim, M. Kim, and Y. Song, “A full RNS variant of approximate homomorphic encryption,” in Proc. of Select. Areas in Cryptog. Intl. Conf., Springer, 2019, pp. 347–368
[3] Z. Brakerski and V. Vaikuntanathan, “Efficient fully homomorphic encryption from (Standard) LWE,” in Proc. of IEEE Annu. Symp. on Found. of Comp. Sci., 2011, pp. 97–106
[4] J. Fan and F. Vercauteren, “Somewhat practical fully homomorphic encryption,” Cryptology ePrint Archive, 2012
[5] Z. Brakerski, C. Gentry, and V. Vaikuntanathan, “(Leveled) fully homomorphic encryption without bootstrapping,” in Proc. of Innov. in Theoret. Comp. Sci. Conf., Cambridge, Massachusetts, 2012, pp. 309–325
[6] X. Zhang and K. K. Parhi, “Reduced-complexity modular polynomial multiplication for R-LWE cryptosystems,” in Proc. of IEEE Intl. Conf. on Acoustics, Speech and Sig. Process., 2021, pp. 7853–7857
[7] X. Zhang, Z. Huai, and K. K. Parhi, “Polynomial multiplication architecture with integrated modular reduction for R-LWE cryptosystems,” Jour. of Sig. Process. Syst., vol. 94, no. 8, pp. 799–809, 2022
[8] W. Tan, S. W. Chiu, A. Wang, Y. Lao, and K. K. Parhi, “PaReNTT: Low-latency parallel residue number system and NTT-based long polynomial modular multiplication for homomorphic encryption,” IEEE Trans. on Info. Foren. and Secur., vol. 19, pp. 1646–1659, 2024
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. X, NO. X, AUGUST 2025
[10] S.-H. Liu, C.-Y. Kuo, Y.-N. Mo, and T. Su, “An area-efficient, conflict-free, and configurable architecture for accelerating NTT/INTT,” IEEE Trans. on Very Large Scale Integ. (VLSI) Syst., vol. 32, no. 3, pp. 519–529, 2024
[11] X. Hu, J. Tian, M. Li, and Z. Wang, “AC-PM: An area-efficient and configurable polynomial multiplier for lattice based cryptography,” IEEE Trans. on Circ. and Syst. I, vol. 70, no. 2, pp. 719–732, 2023
[12] P. Duong-Ngoc, S. Kwon, D. Yoo, and H. Lee, “Area-efficient number theoretic transform architecture for homomorphic encryption,” IEEE Trans. on Circ. and Syst. I, vol. 70, no. 3, pp. 1270–1283, 2023
[13] Z. Huai, K. K. Parhi, and X. Zhang, “Efficient architecture for long integer modular multiplication over Solinas prime,” in Proc. of IEEE Workshop on Sig. Process. Syst., 2021, pp. 146–151
[14] S. Akherati, J. Cai, and X. Zhang, “Efficient generalized integer division and modular reduction architectures for homomorphic encryption,” Journal of Signal Processing Systems, 2025
[15] J.-C. Bajard, J. Eynard, M. A. Hasan, and V. Zucca, “A full RNS variant of FV like somewhat homomorphic encryption schemes,” in Selected Areas in Cryptography, R. Avanzi and H. Heys, Eds., 2017, pp. 423–442
[16] S. Akherati and X. Zhang, “Low-complexity ciphertext multiplication for CKKS homomorphic encryption,” IEEE Trans. on Circuits and Syst.-II, vol. 71, no. 3, pp. 1396–1400, 2024
[17] S. Akherati and X. Zhang, “Improved ciphertext multiplication for RNS-CKKS homomorphic encryption,” in IEEE Workshop on Signal Processing Systems, 2024, pp. 136–140
[18] N. Samardzic, A. Feldmann, A. Krastev, S. Devadas, R. Dreslinski, C. Peikert, and D. Sanchez, “F1: A fast and programmable accelerator for fully homomorphic encryption,” in 54th Annual IEEE/ACM International Symposium on Microarchitecture, New York, NY, USA: Association for Computing Machinery, 2021, pp. 238–252
[19] J. Kim, G. Lee, S. Kim, G. Sohn, M. Rhu, J. Kim, and J. H. Ahn, “ARK: Fully homomorphic encryption accelerator with runtime data generation and inter-operation key reuse,” in 55th IEEE/ACM International Symposium on Microarchitecture, 2022, pp. 1237–1254
[20] S. Kim, J. Kim, M. J. Kim, W. Jung, J. Kim, M. Rhu, and J. H. Ahn, “BTS: An accelerator for bootstrappable fully homomorphic encryption,” in Proceedings of the 49th Annual Intl. Symp. on Computer Arch., New York, NY, USA: Association for Computing Machinery, 2022, pp. 711–725
[21] J. Kim, S. Kim, J. Choi, J. Park, D. Kim, and J. H. Ahn, “SHARP: A short-word hierarchical accelerator for robust and practical fully homomorphic encryption,” in Proceedings of the 50th Annual Intl. Symp. on Computer Arch., New York, NY, USA: Association for Computing Machinery, 2023
[22] E. Lee, J. W. Lee, J. S. No, and Y. S. Kim, “Minimax approximation of sign function by composite polynomial for homomorphic comparison,” IEEE Trans. on Dependable and Secure Comp., vol. 19, no. 6, pp. 3711–3727, 2022
[23] D. Kim, J. Park, J. Kim, S. Kim, and J. H. Ahn, “HyPHEN: A hybrid packing method and its optimizations for homomorphic encryption-based neural networks,” IEEE Access, vol. 12, pp. 3024–3038, 2024
[24] A. Wood, K. Najarian, and D. Kahrobaei, “Homomorphic encryption for machine learning in medicine and bioinformatics,” ACM Comput. Surv., vol. 53, no. 4, 2020
[25] J. Blindenbach, J. Kang, S. Hong, C. Karam, T. Lehner, and G. Gürsoy, “Ultra-secure storage and analysis of genetic data for the advancement of precision medicine,” bioRxiv, 2024
[26] S. Akherati, Y. J. Tang, and X. Zhang, “Three-input ciphertext multiplication for homomorphic encryption,” in IEEE Intl. Symp. on Circ. and Syst., 2025, pp. 1–5
[27] Z. Huai, J. Zhou, and X. Zhang, “Efficient hardware implementation architectures for long integer modular multiplication over general Solinas prime,” Jour. of Sig. Process. Syst., vol. 94, no. 10, pp. 1067–1082, 2022
[28] C. Gentry, S. Halevi, and N. P. Smart, “Homomorphic evaluation of the AES circuit,” in Advances in Cryptology, R. Safavi-Naini and R. Canetti, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 850–867
[29] S. Winograd, “FIR Filters,” in Arithmetic Complexity of Computations, Philadelphia, PA: Society for Industrial and Applied Mathematics, 1980, ch. 5, pp. 39–56
[30] S.-W. Chiu and K. K. Parhi, “Low-latency preprocessing architecture for residue number system via flexible Barrett reduction for homomorphic encryption,” IEEE Trans. on Circuits and Syst.-II, vol. 71, no. 5, pp. 2784–2788, 2024

Sajjad Akherati (Graduate Student Member, IEEE) received the B.Sc. degree in electrical engineering from Sharif University o...
