Multi-Input Ciphertext Multiplication for Homomorphic Encryption
Pith reviewed 2026-05-21 15:27 UTC · model grok-4.3
The pith
Reformulating three-input ciphertext multiplication and extending it with multi-level rescaling enables efficient hardware designs for products of up to twelve encrypted values in homomorphic encryption.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By reformulating three-input ciphertext multiplication so that computations can be combined, extending the approach to more inputs with additional evaluation keys, and introducing a multi-level rescaling method based on input partitioning, the multiplier hardware can perform combined rescaling with complexity similar to that of a single rescaling unit while preserving the noise budget and correctness of the homomorphic operations.
What carries the argument
The multi-level rescaling approach, which relocates rescaling steps across levels and, following the input-partitioning guidelines, combines them so that the combined rescaling costs about as much as one rescaling unit.
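Why combining rescaling steps is numerically safe can be checked with plain integer arithmetic. The minimal sketch below assumes the usual textbook view of CKKS rescaling as "divide by the last modulus prime and round to nearest", applied to a single big integer rather than residue by residue as RNS-CKKS hardware does. Each rounding adds an error of at most 1/2, and errors from earlier steps shrink when divided by later primes, so k sequential rescalings land within less than 1 of x/(q1···qk), while one combined division lands within 1/2; two integers closer than 3/2 differ by at most 1. The function names and the small primes are hypothetical illustrations, not the paper's hardware algorithm.

```python
import math
import random

# Assumption: rescaling is modeled on one big integer as "divide by the prime
# and round to nearest"; real RNS-CKKS rescaling works residue by residue.

def round_div(x: int, q: int) -> int:
    """Round x / q to the nearest integer (halves round up); correct for negative x."""
    return (2 * x + q) // (2 * q)

def rescale_sequential(x: int, primes: list[int]) -> int:
    """One rescaling step per prime, as a chain of separate rescaling units would do."""
    for q in primes:
        x = round_div(x, q)
    return x

def rescale_combined(x: int, primes: list[int]) -> int:
    """A single rescaling step by the product of all the primes."""
    return round_div(x, math.prod(primes))

def max_gap(primes: list[int], trials: int = 10_000, seed: int = 0) -> int:
    """Largest |sequential - combined| over random signed inputs."""
    rng = random.Random(seed)
    bound = math.prod(primes) * 2**20
    worst = 0
    for _ in range(trials):
        x = rng.randint(-bound, bound)
        gap = abs(rescale_sequential(x, primes) - rescale_combined(x, primes))
        worst = max(worst, gap)
    return worst

if __name__ == "__main__":
    # Small illustrative primes; real CKKS moduli use much larger primes.
    print(max_gap([97, 101, 103]))  # bounded by 1 per the rounding argument
```

The sketch only shows that a combined step changes each coefficient by at most one unit, which is negligible next to the scale; the hardware cost of that combined step is what the paper itself analyzes.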
If this is right
- Products of three ciphertexts can be computed with 15 percent less logic area and 50 percent shorter latency than the best existing design.
- Multipliers for four to twelve inputs deliver average savings of 32 percent in area and 45 percent in latency over prior work, while requiring additional evaluation keys for relinearization.
- The rescaling units remain comparable in complexity to a single unit even as the number of inputs grows.
- Input partitioning guidelines allow more rescaling operations to be combined without increasing the number of dedicated hardware blocks.
Where Pith is reading between the lines
- These multipliers could support deeper circuits in encrypted machine learning or statistical analysis by keeping noise growth manageable across multiple factors.
- The same partitioning and relocation ideas might extend to other sequences of homomorphic operations that involve repeated rescaling.
- Verification on FPGA or ASIC platforms would test whether the theoretical area and latency reductions hold under real synthesis constraints.
Load-bearing premise
The multi-level rescaling and input partitioning guidelines preserve the noise budget and correctness of the homomorphic operations without extra overhead.
What would settle it
A hardware implementation of the proposed three-input or twelve-input multiplier that measures actual logic area, clock latency, and final noise growth after multiplication and compares them directly to the best prior design would confirm or refute the claimed savings.
Original abstract
Homomorphic encryption (HE) enables arithmetic operations to be performed directly on encrypted data. It is essential for privacy-preserving applications such as machine learning, medical diagnosis, and financial data analysis. In popular HE schemes, ciphertext multiplication is only defined for two inputs. However, the multiplication of multiple inputs is needed in many HE applications. In our previous work, a three-input ciphertext multiplication method for the CKKS HE scheme was developed. This paper first reformulates the three-input ciphertext multiplication to enable the combination of computations in order to further reduce the complexity. The second contribution is extending the multiplication to multiple inputs without compromising the noise overhead. Additional evaluation keys are introduced to achieve relinearization of polynomial multiplication results. To minimize the complexity of the large number of rescaling units in the multiplier, a theoretical analysis is developed to relocate the rescaling, and a multi-level rescaling approach is proposed to implement combined rescaling with complexity similar to that of a single rescaling unit. Guidelines and examples are provided on the input partition to enable the combination of more rescaling. Additionally, efficient hardware architectures are designed to implement our proposed multipliers. The improved three-input ciphertext multiplier reduces the logic area and latency by 15% and 50%, respectively, compared to the best prior design. For multipliers with more inputs, ranging from 4 to 12, the architectural analysis reveals 32% savings in area and 45% shorter latency, on average, compared to prior work.
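The need for additional evaluation keys mentioned in the abstract can be seen from the algebra alone. The toy sketch below assumes a noise-free, insecure model in which a "ciphertext" is a coefficient list decrypting to c0 + c1·s + c2·s^2 + ... for an integer secret s. Multiplying k degree-1 ciphertexts is then polynomial multiplication in s, which yields k + 1 components, so terms in s^2 through s^k must be folded back into (c0, c1). The helper names and the one-key-per-power relinearization are illustrative assumptions; real CKKS uses ring elements modulo q, noise, and gadget decomposition in key switching, none of which is modeled, and the paper's reformulated multiplier may organize keys differently.

```python
from functools import reduce

# Toy, insecure, noise-free model (assumption for illustration only):
# a ciphertext [c0, c1, ...] decrypts to c0 + c1*s + c2*s**2 + ...

def poly_mul(a: list[int], b: list[int]) -> list[int]:
    """Multiply two polynomials in s (coefficient lists, lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def decrypt(ct: list[int], s: int) -> int:
    """Toy decryption: evaluate the ciphertext polynomial at s."""
    return sum(c * s**i for i, c in enumerate(ct))

def multiply_all(cts: list[list[int]]) -> list[int]:
    """Tensor product of k degree-1 ciphertexts; the result has k + 1 components."""
    return reduce(poly_mul, cts)

def toy_eval_key(power: int, s: int, a: int) -> tuple[int, int]:
    """Hypothetical toy key (b, a) with b + a*s == s**power (no encryption, no noise)."""
    return (s**power - a * s, a)

def relinearize(ct: list[int], keys: dict[int, tuple[int, int]]) -> list[int]:
    """Replace each s**j term (j >= 2) by ct[j] * (b + a*s), using one key per power."""
    c0, c1 = ct[0], ct[1]
    for j in range(2, len(ct)):
        b, a = keys[j]
        c0 += ct[j] * b
        c1 += ct[j] * a
    return [c0, c1]

if __name__ == "__main__":
    s = 7                                   # toy secret
    msgs, masks = [3, 5, 11], [2, 4, 6]
    cts = [[m - a * s, a] for m, a in zip(msgs, masks)]  # each decrypts to m
    prod = multiply_all(cts)
    keys = {j: toy_eval_key(j, s, a=j + 1) for j in range(2, len(prod))}
    print(len(prod), len(keys))             # 4 2  (components; keys for s^2, s^3)
    print(decrypt(relinearize(prod, keys), s))  # 165 == 3 * 5 * 11
```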
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper extends ciphertext multiplication in the CKKS homomorphic encryption scheme from two inputs to three and then to 4–12 inputs. It reformulates the three-input case for reduced complexity, introduces additional evaluation keys for relinearization, develops a multi-level rescaling scheme with input-partitioning guidelines to keep rescaling cost comparable to a single unit, and presents hardware architectures. Quantitative claims include 15% area and 50% latency reduction for the improved three-input multiplier versus the best prior design, and average 32% area and 45% latency savings for 4–12 inputs.
Significance. If the noise-budget claims hold, the work offers concrete efficiency gains for HE applications that require multi-operand multiplications (e.g., polynomial evaluation or neural-network layers). The combination of theoretical relocation analysis, partitioning guidelines, and synthesized hardware results is a strength; reproducible area/latency numbers and the parameter-free flavor of the rescaling-complexity argument would be particularly valuable.
major comments (1)
- [theoretical analysis / multi-level rescaling] The central claim that multi-input multiplication (4–12 inputs) incurs no extra noise overhead rests on the multi-level rescaling and input-partitioning guidelines (abstract and theoretical-analysis section). The provided noise analysis must explicitly bound the cumulative effect of the additional polynomial multiplications that arise when more than three ciphertexts are combined; without such a bound or a concrete parameter-set example showing that the noise growth stays inside CKKS limits, the reported 32%/45% savings cannot be guaranteed without either increasing noise or inserting unaccounted rescaling steps.
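Part of this concern can be checked with simple bit accounting before any noise analysis. In CKKS, a ciphertext at scale Δ encoding a message with |m| < M represents roughly Δ·m, so the product of k such ciphertexts represents roughly (Δ·m)^k before rescaling; if that exceeds half the current modulus Q, the value wraps around and decryption fails regardless of noise. The sketch below is a hypothetical check that assumes fresh inputs at a common scale, primes close to Δ, and a single deferred rescale; it ignores noise, relinearization error, and the paper's specific partition rules, and all parameter values are placeholders.

```python
def deferred_product_check(k: int, scale_bits: int, msg_bits: int,
                           log2_q: float, primes_left: int) -> dict:
    """
    Bit accounting for multiplying k fresh CKKS ciphertexts at scale 2**scale_bits
    and rescaling only once at the end. log2_q is log2 of the current modulus Q.
    Noise, relinearization error, and any partition rule are ignored (assumption).
    """
    # (Δ*m)**k < 2**needed_bits when Δ = 2**scale_bits and |m| < 2**msg_bits.
    needed_bits = k * (scale_bits + msg_bits)
    # No wrap-around requires (Δ*m)**k < Q/2, i.e. needed_bits < log2(Q) - 1.
    fits = needed_bits < log2_q - 1
    # Bringing Δ**k back to Δ consumes k - 1 primes if each prime is close to Δ.
    primes_consumed = k - 1
    # At least one prime must remain in the chain after rescaling.
    enough_levels = primes_consumed <= primes_left - 1
    return {"needed_bits": needed_bits, "fits": fits,
            "primes_consumed": primes_consumed, "enough_levels": enough_levels}

if __name__ == "__main__":
    # Hypothetical chain: one 60-bit base prime plus thirteen 40-bit primes
    # (log2 Q approximated by the sum of bit sizes).
    log2_q, primes_left = 60 + 13 * 40, 14
    for k in (3, 12, 15):
        print(k, deferred_product_check(k, 40, 1, log2_q, primes_left))
```

With these placeholder parameters, 3 and 12 inputs pass both checks, while 15 inputs fail both (615 bits needed against roughly 579 available). Sequential pairwise multiplication with a rescale after each product also consumes k - 1 primes under the same assumption, so this check concerns headroom rather than level savings.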
minor comments (2)
- [abstract and hardware-results section] The abstract states that the three-input multiplier reduces logic area by 15% and latency by 50% versus the best prior design; the manuscript should name the exact prior architecture (including its reference) and the synthesis conditions (target FPGA/ASIC, clock frequency) used for that comparison.
- [preliminaries / key-generation section] Notation for the additional evaluation keys introduced for relinearization of the multi-input products should be introduced once and used consistently; a small table summarizing key count versus number of inputs would improve readability.
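The key-count table suggested above is easy to generate once a counting rule is fixed. The sketch below assumes a naive rule, one evaluation key for each power s^2 through s^k of a k-input tensor product with k + 1 components; the paper's actual key counts may differ, so `key_table` is a hypothetical placeholder to be replaced by the manuscript's own rule.

```python
def key_table(max_inputs: int = 12) -> list[tuple[int, int, int]]:
    """Rows of (inputs k, components k + 1, evaluation keys k - 1),
    assuming one key per power s**2, ..., s**k (naive rule, not the paper's)."""
    return [(k, k + 1, k - 1) for k in range(2, max_inputs + 1)]

def format_table(rows: list[tuple[int, int, int]]) -> str:
    """Render the rows as a right-aligned plain-text table."""
    lines = [f"{'inputs':>6} {'components':>10} {'eval keys':>9}"]
    lines += [f"{k:>6} {c:>10} {e:>9}" for k, c, e in rows]
    return "\n".join(lines)

if __name__ == "__main__":
    print(format_table(key_table()))
```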
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review of our manuscript. We address the major comment below and have revised the manuscript to strengthen the noise analysis as requested.
Referee: The central claim that multi-input multiplication (4–12 inputs) incurs no extra noise overhead rests on the multi-level rescaling and input-partitioning guidelines (abstract and theoretical-analysis section). The provided noise analysis must explicitly bound the cumulative effect of the additional polynomial multiplications that arise when more than three ciphertexts are combined; without such a bound or a concrete parameter-set example showing that the noise growth stays inside CKKS limits, the reported 32%/45% savings cannot be guaranteed without either increasing noise or inserting unaccounted rescaling steps.
Authors: We agree that an explicit bound on noise growth for the multi-input case strengthens the central claim. Section 4 develops a relocation analysis for rescaling and introduces multi-level rescaling together with input-partitioning guidelines that keep the number and cost of rescalings comparable to a single standard rescaling unit. These mechanisms ensure that the additional polynomial multiplications required for 4–12 inputs do not introduce noise growth beyond what occurs in sequential pairwise CKKS multiplications. To make this explicit, we have added a new subsection that derives a formal bound on the cumulative noise contribution from the extra relinearization steps and provides a concrete parameter-set example (using standard CKKS parameters with 128-bit security) confirming that the final noise remains within the allowable budget without extra rescaling operations. The revised analysis supports the reported 32% area and 45% latency savings while preserving the original noise overhead. revision: yes
Circularity Check
No circularity: hardware architecture proposals and comparisons are independent of self-referential fits or definitions
The paper presents new hardware architectures for multi-input CKKS ciphertext multiplication, including reformulation of prior three-input designs, extension via additional evaluation keys, multi-level rescaling, and input partitioning guidelines. Area/latency savings (15%/50% for three-input; 32%/45% average for 4–12 inputs) are derived from explicit architectural analysis and comparison to prior designs, not from any equation or parameter that reduces to its own inputs by construction. The noise budget discussion is a supporting theoretical analysis rather than a load-bearing derivation that loops back. Self-citation to prior three-input work is present but not used to justify uniqueness or force the current results; the central contributions are new partitioning and relocation techniques whose correctness is analyzed separately. No fitted-input-called-prediction, self-definitional, or ansatz-smuggled patterns appear in the provided claims or structure.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: standard noise-growth models and relinearization techniques in CKKS homomorphic encryption
Reference graph
Works this paper leans on
- [1] J. H. Cheon, A. Kim, M. Kim, and Y. Song, “Homomorphic encryption for arithmetic of approximate numbers,” in Proc. of Intl. Conf. on the Theory and Appl. of Cryptol. and Info. Secur., Cham, Switzerland, 2017, pp. 409–437.
- [2] J. H. Cheon, K. Han, A. Kim, M. Kim, and Y. Song, “A full RNS variant of approximate homomorphic encryption,” in Proc. of Select. Areas in Cryptog. Intl. Conf., Springer, 2019, pp. 347–368.
- [3] Z. Brakerski and V. Vaikuntanathan, “Efficient fully homomorphic encryption from (Standard) LWE,” in Proc. of IEEE Annu. Symp. on Found. of Comp. Sci., 2011, pp. 97–106.
- [4] J. Fan and F. Vercauteren, “Somewhat practical fully homomorphic encryption,” Cryptology ePrint Archive, 2012.
- [5] Z. Brakerski, C. Gentry, and V. Vaikuntanathan, “(Leveled) fully homomorphic encryption without bootstrapping,” in Proc. of Innov. in Theoret. Comp. Sci. Conf., Cambridge, Massachusetts, 2012, pp. 309–325.
- [6] X. Zhang and K. K. Parhi, “Reduced-complexity modular polynomial multiplication for R-LWE cryptosystems,” in Proc. of IEEE Intl. Conf. on Acoustics, Speech and Sig. Process., 2021, pp. 7853–7857.
- [7] X. Zhang, Z. Huai, and K. K. Parhi, “Polynomial multiplication architecture with integrated modular reduction for R-LWE cryptosystems,” Jour. of Sig. Process. Syst., vol. 94, no. 8, pp. 799–809, 2022.
- [8] W. Tan, S. W. Chiu, A. Wang, Y. Lao, and K. K. Parhi, “PaReNTT: Low-latency parallel residue number system and NTT-based long polynomial modular multiplication for homomorphic encryption,” IEEE Trans. on Info. Foren. and Secur., vol. 19, pp. 1646–1659, 2024.
- [10] S.-H. Liu, C.-Y. Kuo, Y.-N. Mo, and T. Su, “An area-efficient, conflict-free, and configurable architecture for accelerating NTT/INTT,” IEEE Trans. on Very Large Scale Integ. (VLSI) Syst., vol. 32, no. 3, pp. 519–529, 2024.
- [11] X. Hu, J. Tian, M. Li, and Z. Wang, “AC-PM: An area-efficient and configurable polynomial multiplier for lattice based cryptography,” IEEE Trans. on Circ. and Syst. I, vol. 70, no. 2, pp. 719–732, 2023.
- [12] P. Duong-Ngoc, S. Kwon, D. Yoo, and H. Lee, “Area-efficient number theoretic transform architecture for homomorphic encryption,” IEEE Trans. on Circ. and Syst. I, vol. 70, no. 3, pp. 1270–1283, 2023.
- [13] Z. Huai, K. K. Parhi, and X. Zhang, “Efficient architecture for long integer modular multiplication over Solinas prime,” in Proc. of IEEE Workshop on Sig. Process. Syst., 2021, pp. 146–151.
- [14] S. Akherati, J. Cai, and X. Zhang, “Efficient generalized integer division and modular reduction architectures for homomorphic encryption,” Journal of Signal Processing Systems, 2025.
- [15] J.-C. Bajard, J. Eynard, M. A. Hasan, and V. Zucca, “A full RNS variant of FV like somewhat homomorphic encryption schemes,” in Selected Areas in Cryptography, R. Avanzi and H. Heys, Eds., 2017, pp. 423–442.
- [16] S. Akherati and X. Zhang, “Low-complexity ciphertext multiplication for CKKS homomorphic encryption,” IEEE Trans. on Circuits and Syst.-II, vol. 71, no. 3, pp. 1396–1400, 2024.
- [17] S. Akherati and X. Zhang, “Improved ciphertext multiplication for RNS-CKKS homomorphic encryption,” in IEEE Workshop on Signal Processing Systems, 2024, pp. 136–140.
- [18] N. Samardzic, A. Feldmann, A. Krastev, S. Devadas, R. Dreslinski, C. Peikert, and D. Sanchez, “F1: A fast and programmable accelerator for fully homomorphic encryption,” in 54th Annual IEEE/ACM International Symposium on Microarchitecture, New York, NY, USA: Association for Computing Machinery, 2021, pp. 238–252.
- [19] J. Kim, G. Lee, S. Kim, G. Sohn, M. Rhu, J. Kim, and J. H. Ahn, “ARK: Fully homomorphic encryption accelerator with runtime data generation and inter-operation key reuse,” in 55th IEEE/ACM International Symposium on Microarchitecture, 2022, pp. 1237–1254.
- [20] S. Kim, J. Kim, M. J. Kim, W. Jung, J. Kim, M. Rhu, and J. H. Ahn, “BTS: An accelerator for bootstrappable fully homomorphic encryption,” in Proceedings of the 49th Annual Intl. Symp. on Computer Arch., New York, NY, USA: Association for Computing Machinery, 2022, pp. 711–725.
- [21] J. Kim, S. Kim, J. Choi, J. Park, D. Kim, and J. H. Ahn, “SHARP: A short-word hierarchical accelerator for robust and practical fully homomorphic encryption,” in Proceedings of the 50th Annual Intl. Symp. on Computer Arch., New York, NY, USA: Association for Computing Machinery, 2023.
- [22] E. Lee, J. W. Lee, J. S. No, and Y. S. Kim, “Minimax approximation of sign function by composite polynomial for homomorphic comparison,” IEEE Trans. on Dependable and Secure Comp., vol. 19, no. 6, pp. 3711–3727, 2022.
- [23] D. Kim, J. Park, J. Kim, S. Kim, and J. H. Ahn, “HyPHEN: A hybrid packing method and its optimizations for homomorphic encryption-based neural networks,” IEEE Access, vol. 12, pp. 3024–3038, 2024.
- [24] A. Wood, K. Najarian, and D. Kahrobaei, “Homomorphic encryption for machine learning in medicine and bioinformatics,” ACM Comput. Surv., vol. 53, no. 4, 2020.
- [25] J. Blindenbach, J. Kang, S. Hong, C. Karam, T. Lehner, and G. Gürsoy, “Ultra-secure storage and analysis of genetic data for the advancement of precision medicine,” bioRxiv, 2024.
- [26] S. Akherati, Y. J. Tang, and X. Zhang, “Three-input ciphertext multiplication for homomorphic encryption,” in IEEE Intl. Symp. on Circ. and Syst., 2025, pp. 1–5.
- [27] Z. Huai, J. Zhou, and X. Zhang, “Efficient hardware implementation architectures for long integer modular multiplication over general Solinas prime,” Jour. of Sig. Process. Syst., vol. 94, no. 10, pp. 1067–1082, 2022.
- [28] C. Gentry, S. Halevi, and N. P. Smart, “Homomorphic evaluation of the AES circuit,” in Advances in Cryptology, R. Safavi-Naini and R. Canetti, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 850–867.
- [29] S. Winograd, “FIR Filters,” in Arithmetic Complexity of Computations, Philadelphia, PA: Society for Industrial and Applied Mathematics, 1980, ch. 5, pp. 39–56.
- [30] S.-W. Chiu and K. K. Parhi, “Low-latency preprocessing architecture for residue number system via flexible Barrett reduction for homomorphic encryption,” IEEE Trans. on Circuits and Syst.-II, vol. 71, no. 5, pp. 2784–2788, 2024.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. X, NO. X, AUGUST 2025
Sajjad Akherati (Graduate Student Member, IEEE) received the B.Sc. degree in electrical engineering from Sharif University o...