pith. sign in

arxiv: 2605.17505 · v1 · pith:EDUFIN3Ynew · submitted 2026-05-17 · 💻 cs.CR · cs.SC· math.NT

Explicit cost analysis of Toom-4 multiplication for incomplete NTT in lattice-based cryptography

Pith reviewed 2026-05-19 23:13 UTC · model grok-4.3

classification 💻 cs.CR cs.SCmath.NT
keywords Toom-4incomplete NTTlattice-based cryptographypolynomial multiplicationaddition chainsKaratsubahybrid multiplicationoperation counting
0
0 comments X

The pith

Explicit addition-chain counts for Toom-4 produce a cost model that identifies parameter ranges where it outperforms Karatsuba inside incomplete-NTT hybrids for lattice cryptography.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives separate counts of additions, subtractions, and multiplications for a concrete Toom-4 implementation by using addition chains. These counts are assembled into a simple cost model compatible with the incomplete-NTT framework that already mixes truncated transforms with schoolbook or Karatsuba steps. The model is then applied to hybrid strategies that choose among Toom-4, Karatsuba, and incomplete NTT according to polynomial degree and coefficient-field size. A reader cares because the analysis pinpoints concrete ranges in which the asymptotically faster Toom-4 becomes the cheaper choice in practice, and the authors confirm the prediction with direct experiments.

Core claim

A concrete Toom-4 implementation yields explicit operation counts that separate additions and subtractions from multiplications over the coefficient field; addition-chain analysis converts these counts into a compact cost model for incomplete NTT. Substituting the model into hybrid multiplication strategies reveals the degree and modulus ranges in which Toom-4 hybrids require fewer total operations than Karatsuba hybrids. Experiments on representative parameter sets reproduce the predicted advantage regions.

What carries the argument

Addition-chain-derived explicit counts of field additions, subtractions, and multiplications for Toom-4 that feed directly into an incomplete-NTT cost model.

If this is right

  • Hybrid multiplication routines can select Toom-4 or Karatsuba on the basis of degree to reduce total field operations inside incomplete NTT.
  • The separated addition and multiplication counts let implementers predict cost without writing the full code first.
  • Parameter selection for lattice schemes can now include Toom-4 as a first-class option alongside Karatsuba and truncated NTT.
  • Experimental runs confirm that the advantage regions predicted by the model appear in real timings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same counting technique could be applied to Toom-3 or Toom-5 to enlarge the set of hybrid choices.
  • Hardware or SIMD realizations might shift the advantage boundaries once cache misses and vectorized reductions are counted.
  • An automated tuner could use the model to pick the multiplication method for any new ring dimension without exhaustive benchmarking.

Load-bearing premise

The addition-chain cost model captures the dominant operations that actually occur in an incomplete-NTT implementation across the tested ranges, without large hidden costs from memory traffic or modular reduction.

What would settle it

Implement the hybrid Toom-4 and Karatsuba multipliers for the exact polynomial degrees and field sizes listed in the identified advantage intervals and measure whether the observed operation totals or runtimes match the model's ordering.

read the original abstract

Polynomial multiplication is fundamental in lattice-based cryptography. While the Number Theoretic Transform (NTT) enables fast multiplication, it imposes constraints on the modulus of the coefficient field. Hafiz et al. (2025) addressed this limitation by analyzing the incomplete NTT, which combines a truncated NTT with conventional multiplication methods In this work, we revisit Toom-4 multiplication in the context of incomplete NTT. Although Toom-4 is asymptotically faster than Karatsuba, its precise cost has not been expressed in a form compatible with the incomplete NTT framework. We present a concrete Toom-4 implementation and derive explicit operation counts that separate additions/subtractions and multiplications over the coefficient field. Our analysis based on addition chains yields a simple cost model for incomplete NTT. Using this model, we analyze hybrid strategies combining Toom-4, Karatsuba, and incomplete NTT. We identify parameter ranges where Toom-4 is advantageous and validate the predicted behavior experimentally.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript presents a concrete Toom-4 implementation for use with incomplete NTT, derives explicit operation counts that separate additions/subtractions from multiplications over the coefficient field via addition-chain analysis, constructs a simple cost model, analyzes hybrid strategies that combine Toom-4 with Karatsuba and incomplete NTT, identifies parameter ranges in which Toom-4 is advantageous, and validates the predicted behavior experimentally.

Significance. If the cost model holds, the work supplies concrete, implementable guidance for selecting multiplication algorithms in lattice-based schemes that rely on incomplete NTT, thereby helping practitioners optimize performance for concrete parameter sets in post-quantum cryptography.

major comments (1)
  1. Cost-model section (following the addition-chain derivation): the explicit counts separate additions/subtractions from multiplications, yet the simple cost model does not fold in the modular-reduction cost that follows each coefficient addition or subtraction in Z_q. Because reduction overhead (Barrett, Montgomery, etc.) is q-dependent and non-negligible, the identified advantage regions for Toom-4 hybrids may shift once these costs are included; this directly affects the central claim that the model reliably predicts when Toom-4 outperforms Karatsuba+NTT.
minor comments (1)
  1. The experimental-validation paragraph would benefit from an explicit table listing the exact (n, q, k) tuples tested together with measured versus predicted cycle counts.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful review and constructive comment on our manuscript. We address the major comment below and will incorporate revisions to strengthen the cost model.

read point-by-point responses
  1. Referee: Cost-model section (following the addition-chain derivation): the explicit counts separate additions/subtractions from multiplications, yet the simple cost model does not fold in the modular-reduction cost that follows each coefficient addition or subtraction in Z_q. Because reduction overhead (Barrett, Montgomery, etc.) is q-dependent and non-negligible, the identified advantage regions for Toom-4 hybrids may shift once these costs are included; this directly affects the central claim that the model reliably predicts when Toom-4 outperforms Karatsuba+NTT.

    Authors: We agree that the current cost model, which is built from the separated addition/subtraction and multiplication counts derived via addition chains, does not explicitly parameterize the cost of modular reductions following each coefficient operation in Z_q. Our focus was on producing operation counts directly compatible with the incomplete NTT framework, treating field arithmetic as the core cost. However, because reduction overhead is indeed q-dependent and non-negligible in practice, we will revise the cost-model section to include an additional term (or parameterized cost) for reductions after each addition or subtraction. We will then re-evaluate the hybrid Toom-4/Karatsuba/incomplete-NTT strategies and update the identified advantage regions accordingly. This revision will be supported by the existing experimental validation, which already reflects full implementation costs including reductions. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper derives explicit operation counts for Toom-4 via addition-chain analysis on the algebraic structure of the algorithm, yielding a cost model that is then applied to hybrid strategies. Advantage regions are predicted from this model and separately validated experimentally. No load-bearing self-citations, no fitted parameters renamed as predictions, and no self-definitional reductions appear in the provided derivation chain. The model rests on standard minimization techniques independent of the validation data.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The analysis rests on standard algebraic identities for Toom-4 evaluation points and the correctness of addition-chain minimization; no new entities or fitted constants are introduced in the abstract.

axioms (2)
  • standard math Toom-4 evaluation and interpolation formulas hold over the coefficient ring used in the incomplete NTT.
    Invoked when deriving the concrete operation counts for the Toom-4 step.
  • domain assumption Addition chains produce the minimal number of additions for the required linear combinations.
    Used to obtain the simple cost model stated in the abstract.

pith-pipeline@v0.9.0 · 5706 in / 1398 out tokens · 33199 ms · 2026-05-19T23:13:06.694407+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages

  1. [1]

    Modul e-Lattice-Based Key- Encapsulation Mechanism Standard

    National Institute of Standards and Technology: “Modul e-Lattice-Based Key- Encapsulation Mechanism Standard”, FIPS 203, 2024

  2. [2]

    Modul e-Lattice-Based Digital Signature Standard

    National Institute of Standards and Technology: “Modul e-Lattice-Based Digital Signature Standard”, FIPS 204, 2024

  3. [3]

    Fast Convolution using fermat number transforms with ap- plications to digital filtering

    R. Agarwal and C. Burrus: “Fast Convolution using fermat number transforms with ap- plications to digital filtering”, In: IEEE Transactions on A coustics, Speech, and Signal Processing, 22, no. 2, pp. 87–97, 1974

  4. [4]

    Towards optimal Toom–Cook multiplication for univariate and multivariate polynomials in characteristic 2 and 0

    M. Bodrato, “Towards optimal Toom–Cook multiplication for univariate and multivariate polynomials in characteristic 2 and 0”, In: Proc. W AIFI 2007 , LNCS 4547, pp. 116–133, 2007

  5. [5]

    Integer and polynomial multip lication: towards optimal Toom– Cook matrices

    M. Bodrato and A. Zanoni, “Integer and polynomial multip lication: towards optimal Toom– Cook matrices”, In: International Symposium on Symbolic an d Algebraic Computation, ISSAC 2007, pp. 17–24, 2007

  6. [6]

    CRYSTALS-Kyber: A CCA-Secure Module-Lat tice-Based KEM

    J. Bos et al.: “CRYSTALS-Kyber: A CCA-Secure Module-Lat tice-Based KEM”, 2018 IEEE 10th EuroS&P, pp. 353–367, 2018

  7. [7]

    On the Minimum Computation Time of Functions

    S. A. Cook: “On the Minimum Computation Time of Functions ”, PhD thesis, Harvard University, 1966

  8. [8]

    An Algorithm for the Machine Calculation of Complex Fourier Series

    J. W. Cooley and J. W. Tukey: “An Algorithm for the Machine Calculation of Complex Fourier Series”, Mathematics of Computation, 19, No. 90, pp. 297–301, 1965

  9. [9]

    CRYSTALS-Dilithium: A Lattice-Based D igital Signature Scheme

    L. Ducas et al.: “CRYSTALS-Dilithium: A Lattice-Based D igital Signature Scheme”, IACR Transactions on Cryptographic Hardware and Embedded Syste ms, Vol. 2018 (1), pp. 238– 268, 2018

  10. [10]

    Incompleteness in Number-Theoretic Transforms: New Tradeoffs and Faster Lattice-Based Cryptographic Applications

    S. M. Hafiz et al.: “Incompleteness in Number-Theoretic Transforms: New Tradeoffs and Faster Lattice-Based Cryptographic Applications”, 2025 I EEE 10th EuroS&P, pp. 565–584, 2025

  11. [11]

    Multiplication of many-dig ital numbers by automatic com- puters

    A. Karatsuba and Y. Ofman: “Multiplication of many-dig ital numbers by automatic com- puters”, Proceedings of USSR Academy of Sciences, 145 (7), pp. 293–294, 1962

  12. [12]

    Number Theoretic Transform and It s Applications in Lattice-based Cryptosystems: A Survey

    Z. Liang and Y. Zhao: “Number Theoretic Transform and It s Applications in Lattice-based Cryptosystems: A Survey”, Cryptology ePrint Archive, Pape r 2022/1682, 2022

  13. [13]

    Implementation of Toom-4 Hybrid Mul tiplication for Incomplete NTT

    S. Oku and M. Kudo: “Implementation of Toom-4 Hybrid Mul tiplication for Incomplete NTT”, GitHub repository, https://github.com/Sakura-Oku/toom4_incomplete_ntt , 2026

  14. [14]

    The complexity of a scheme of functional eleme nts realizing the multiplication of integers

    A. Toom: “The complexity of a scheme of functional eleme nts realizing the multiplication of integers”, In: Soviet Mathematics-Doklady, 7, pp. 714–716, 1963

  15. [15]

    von zur Gathen and J

    J. von zur Gathen and J. Gerhard, Modern Computer Algebr a, 3rd ed., Cambridge Uni- versity Press, 2013. 11