Explicit cost analysis of Toom-4 multiplication for incomplete NTT in lattice-based cryptography
Pith reviewed 2026-05-19 23:13 UTC · model grok-4.3
The pith
Explicit addition-chain counts for Toom-4 produce a cost model that identifies parameter ranges where it outperforms Karatsuba inside incomplete-NTT hybrids for lattice cryptography.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A concrete Toom-4 implementation yields explicit operation counts that separate additions and subtractions from multiplications over the coefficient field; addition-chain analysis converts these counts into a compact cost model for incomplete NTT. Substituting the model into hybrid multiplication strategies reveals the degree and modulus ranges in which Toom-4 hybrids require fewer total operations than Karatsuba hybrids. Experiments on representative parameter sets reproduce the predicted advantage regions.
What carries the argument
Addition-chain-derived explicit counts of field additions, subtractions, and multiplications for Toom-4 that feed directly into an incomplete-NTT cost model.
If this is right
- Hybrid multiplication routines can select Toom-4 or Karatsuba on the basis of degree to reduce total field operations inside incomplete NTT.
- The separated addition and multiplication counts let implementers predict cost without writing the full code first.
- Parameter selection for lattice schemes can now include Toom-4 as a first-class option alongside Karatsuba and truncated NTT.
- Experimental runs confirm that the advantage regions predicted by the model appear in real timings.
Where Pith is reading between the lines
- The same counting technique could be applied to Toom-3 or Toom-5 to enlarge the set of hybrid choices.
- Hardware or SIMD realizations might shift the advantage boundaries once cache misses and vectorized reductions are counted.
- An automated tuner could use the model to pick the multiplication method for any new ring dimension without exhaustive benchmarking.
Load-bearing premise
The addition-chain cost model captures the dominant operations that actually occur in an incomplete-NTT implementation across the tested ranges, without large hidden costs from memory traffic or modular reduction.
What would settle it
Implement the hybrid Toom-4 and Karatsuba multipliers for the exact polynomial degrees and field sizes listed in the identified advantage intervals and measure whether the observed operation totals or runtimes match the model's ordering.
read the original abstract
Polynomial multiplication is fundamental in lattice-based cryptography. While the Number Theoretic Transform (NTT) enables fast multiplication, it imposes constraints on the modulus of the coefficient field. Hafiz et al. (2025) addressed this limitation by analyzing the incomplete NTT, which combines a truncated NTT with conventional multiplication methods In this work, we revisit Toom-4 multiplication in the context of incomplete NTT. Although Toom-4 is asymptotically faster than Karatsuba, its precise cost has not been expressed in a form compatible with the incomplete NTT framework. We present a concrete Toom-4 implementation and derive explicit operation counts that separate additions/subtractions and multiplications over the coefficient field. Our analysis based on addition chains yields a simple cost model for incomplete NTT. Using this model, we analyze hybrid strategies combining Toom-4, Karatsuba, and incomplete NTT. We identify parameter ranges where Toom-4 is advantageous and validate the predicted behavior experimentally.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a concrete Toom-4 implementation for use with incomplete NTT, derives explicit operation counts that separate additions/subtractions from multiplications over the coefficient field via addition-chain analysis, constructs a simple cost model, analyzes hybrid strategies that combine Toom-4 with Karatsuba and incomplete NTT, identifies parameter ranges in which Toom-4 is advantageous, and validates the predicted behavior experimentally.
Significance. If the cost model holds, the work supplies concrete, implementable guidance for selecting multiplication algorithms in lattice-based schemes that rely on incomplete NTT, thereby helping practitioners optimize performance for concrete parameter sets in post-quantum cryptography.
major comments (1)
- Cost-model section (following the addition-chain derivation): the explicit counts separate additions/subtractions from multiplications, yet the simple cost model does not fold in the modular-reduction cost that follows each coefficient addition or subtraction in Z_q. Because reduction overhead (Barrett, Montgomery, etc.) is q-dependent and non-negligible, the identified advantage regions for Toom-4 hybrids may shift once these costs are included; this directly affects the central claim that the model reliably predicts when Toom-4 outperforms Karatsuba+NTT.
minor comments (1)
- The experimental-validation paragraph would benefit from an explicit table listing the exact (n, q, k) tuples tested together with measured versus predicted cycle counts.
Simulated Author's Rebuttal
We thank the referee for the careful review and constructive comment on our manuscript. We address the major comment below and will incorporate revisions to strengthen the cost model.
read point-by-point responses
-
Referee: Cost-model section (following the addition-chain derivation): the explicit counts separate additions/subtractions from multiplications, yet the simple cost model does not fold in the modular-reduction cost that follows each coefficient addition or subtraction in Z_q. Because reduction overhead (Barrett, Montgomery, etc.) is q-dependent and non-negligible, the identified advantage regions for Toom-4 hybrids may shift once these costs are included; this directly affects the central claim that the model reliably predicts when Toom-4 outperforms Karatsuba+NTT.
Authors: We agree that the current cost model, which is built from the separated addition/subtraction and multiplication counts derived via addition chains, does not explicitly parameterize the cost of modular reductions following each coefficient operation in Z_q. Our focus was on producing operation counts directly compatible with the incomplete NTT framework, treating field arithmetic as the core cost. However, because reduction overhead is indeed q-dependent and non-negligible in practice, we will revise the cost-model section to include an additional term (or parameterized cost) for reductions after each addition or subtraction. We will then re-evaluate the hybrid Toom-4/Karatsuba/incomplete-NTT strategies and update the identified advantage regions accordingly. This revision will be supported by the existing experimental validation, which already reflects full implementation costs including reductions. revision: yes
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper derives explicit operation counts for Toom-4 via addition-chain analysis on the algebraic structure of the algorithm, yielding a cost model that is then applied to hybrid strategies. Advantage regions are predicted from this model and separately validated experimentally. No load-bearing self-citations, no fitted parameters renamed as predictions, and no self-definitional reductions appear in the provided derivation chain. The model rests on standard minimization techniques independent of the validation data.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math Toom-4 evaluation and interpolation formulas hold over the coefficient ring used in the incomplete NTT.
- domain assumption Addition chains produce the minimal number of additions for the required linear combinations.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We present a concrete Toom-4 implementation and derive explicit operation counts that separate additions/subtractions and multiplications... Our analysis based on addition chains yields a simple cost model for incomplete NTT.
-
IndisputableMonolith/Foundation/DimensionForcing.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Cw(n; ℓ, L) = 2^ℓ Cw(n/2^ℓ; L) + (3ℓn/2)(1 + 2w) + ...
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Modul e-Lattice-Based Key- Encapsulation Mechanism Standard
National Institute of Standards and Technology: “Modul e-Lattice-Based Key- Encapsulation Mechanism Standard”, FIPS 203, 2024
work page 2024
-
[2]
Modul e-Lattice-Based Digital Signature Standard
National Institute of Standards and Technology: “Modul e-Lattice-Based Digital Signature Standard”, FIPS 204, 2024
work page 2024
-
[3]
Fast Convolution using fermat number transforms with ap- plications to digital filtering
R. Agarwal and C. Burrus: “Fast Convolution using fermat number transforms with ap- plications to digital filtering”, In: IEEE Transactions on A coustics, Speech, and Signal Processing, 22, no. 2, pp. 87–97, 1974
work page 1974
-
[4]
M. Bodrato, “Towards optimal Toom–Cook multiplication for univariate and multivariate polynomials in characteristic 2 and 0”, In: Proc. W AIFI 2007 , LNCS 4547, pp. 116–133, 2007
work page 2007
-
[5]
Integer and polynomial multip lication: towards optimal Toom– Cook matrices
M. Bodrato and A. Zanoni, “Integer and polynomial multip lication: towards optimal Toom– Cook matrices”, In: International Symposium on Symbolic an d Algebraic Computation, ISSAC 2007, pp. 17–24, 2007
work page 2007
-
[6]
CRYSTALS-Kyber: A CCA-Secure Module-Lat tice-Based KEM
J. Bos et al.: “CRYSTALS-Kyber: A CCA-Secure Module-Lat tice-Based KEM”, 2018 IEEE 10th EuroS&P, pp. 353–367, 2018
work page 2018
-
[7]
On the Minimum Computation Time of Functions
S. A. Cook: “On the Minimum Computation Time of Functions ”, PhD thesis, Harvard University, 1966
work page 1966
-
[8]
An Algorithm for the Machine Calculation of Complex Fourier Series
J. W. Cooley and J. W. Tukey: “An Algorithm for the Machine Calculation of Complex Fourier Series”, Mathematics of Computation, 19, No. 90, pp. 297–301, 1965
work page 1965
-
[9]
CRYSTALS-Dilithium: A Lattice-Based D igital Signature Scheme
L. Ducas et al.: “CRYSTALS-Dilithium: A Lattice-Based D igital Signature Scheme”, IACR Transactions on Cryptographic Hardware and Embedded Syste ms, Vol. 2018 (1), pp. 238– 268, 2018
work page 2018
-
[10]
S. M. Hafiz et al.: “Incompleteness in Number-Theoretic Transforms: New Tradeoffs and Faster Lattice-Based Cryptographic Applications”, 2025 I EEE 10th EuroS&P, pp. 565–584, 2025
work page 2025
-
[11]
Multiplication of many-dig ital numbers by automatic com- puters
A. Karatsuba and Y. Ofman: “Multiplication of many-dig ital numbers by automatic com- puters”, Proceedings of USSR Academy of Sciences, 145 (7), pp. 293–294, 1962
work page 1962
-
[12]
Number Theoretic Transform and It s Applications in Lattice-based Cryptosystems: A Survey
Z. Liang and Y. Zhao: “Number Theoretic Transform and It s Applications in Lattice-based Cryptosystems: A Survey”, Cryptology ePrint Archive, Pape r 2022/1682, 2022
work page 2022
-
[13]
Implementation of Toom-4 Hybrid Mul tiplication for Incomplete NTT
S. Oku and M. Kudo: “Implementation of Toom-4 Hybrid Mul tiplication for Incomplete NTT”, GitHub repository, https://github.com/Sakura-Oku/toom4_incomplete_ntt , 2026
work page 2026
-
[14]
The complexity of a scheme of functional eleme nts realizing the multiplication of integers
A. Toom: “The complexity of a scheme of functional eleme nts realizing the multiplication of integers”, In: Soviet Mathematics-Doklady, 7, pp. 714–716, 1963
work page 1963
-
[15]
J. von zur Gathen and J. Gerhard, Modern Computer Algebr a, 3rd ed., Cambridge Uni- versity Press, 2013. 11
work page 2013
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.