Leveraging SIMD for Accelerating Large-number Arithmetic
Pith reviewed 2026-05-08 13:59 UTC · model grok-4.3
The pith
DoT restructures large-number arithmetic into independent data-parallel steps to unlock up to 4x SIMD speedups in libraries.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DigitsOnTurbo (DoT) restructures the computation of large-number addition, subtraction, and multiplication around independent, data-parallel operations rather than vectorizing the standard dependent algorithms. This approach yields up to 1.85x speedups for addition and subtraction and 2.3x for multiplication over earlier SIMD implementations. When integrated into state-of-the-art libraries, the gains reach 4x for addition and subtraction and 2x for multiplication. The improvements produce end-to-end throughput increases of up to 19.3 percent in scientific computations and up to 7.9 percent latency reduction plus 5.9 percent throughput improvement in cryptographic code.
What carries the argument
DigitsOnTurbo (DoT), a restructuring of large-number arithmetic into independent data-parallel operations that removes sequential dependencies to expose more work to SIMD vector units.
If this is right
- Addition and subtraction achieve up to 1.85x speedup over prior SIMD implementations.
- Multiplication achieves up to 2.3x speedup over prior SIMD implementations.
- Library integration delivers up to 4x speedup for addition and subtraction and 2x for multiplication.
- Scientific computations receive up to 19.3 percent end-to-end throughput gains.
- Cryptographic implementations receive up to 7.9 percent latency reduction and 5.9 percent throughput improvement.
Where Pith is reading between the lines
- The same restructuring pattern could be applied to other dependent arithmetic kernels such as division or modular reduction to broaden the performance benefit.
- Wider SIMD registers on future CPUs would likely amplify the gains because more independent digits can be processed in a single instruction.
- Library maintainers could use the independent-operation design as a template when adding support for new instruction sets without rewriting core algorithms.
Load-bearing premise
The restructured independent operations incur no hidden sequential bottlenecks or cache effects that would reduce the reported speedups on real hardware and workloads beyond the authors' benchmarks.
What would settle it
A set of micro-benchmarks on the same CPU but with larger working sets or different cache sizes that show the speedups drop below 1.5x for addition due to increased memory stalls.
Figures
read the original abstract
Large-number arithmetic, widely used in scientific computing and cryptography, has seen limited adoption of single instruction, multiple data (SIMD) parallelism on modern CPUs due to the inherent dependencies in traditional algorithms. We present DigitsOnTurbo (DoT), which restructures the computation around independent, data-parallel operations, rather than vectorizing the standard algorithms, thereby leveraging the benefits provided by SIMD. Over prior SIMD implementations, DoT achieves up to 1.85x speedups for addition and subtraction, and 2.3x for multiplication. When integrated into state-of-the-art libraries, DoT yields up to 4x speedup for addition and subtraction, and up to 2x speedup for multiplication, cascading into end-to-end throughput gains of up to 19.3% for scientific computations, and up to 7.9% latency and 5.9% throughput improvements on cryptographic implementations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces DigitsOnTurbo (DoT), a restructuring of large-number arithmetic (addition, subtraction, multiplication) around independent data-parallel operations to improve SIMD utilization on CPUs. It claims speedups of up to 1.85× for addition/subtraction and 2.3× for multiplication over prior SIMD implementations, with larger gains (up to 4× and 2× respectively) when integrated into state-of-the-art libraries, yielding end-to-end improvements of up to 19.3% throughput in scientific computations and 7.9%/5.9% latency/throughput in cryptographic code.
Significance. If the empirical speedups hold under broader conditions, the restructuring approach could provide a practical advance for SIMD acceleration of big-integer kernels that are central to cryptography and scientific computing. The work supplies concrete performance numbers and integration results, which are strengths, but the absence of detailed methodology limits assessment of whether the gains survive real hardware constraints such as carry resolution and memory traffic.
major comments (2)
- Abstract: The reported speedups (1.85× add/sub, 2.3× mul over prior SIMD; 4×/2× when integrated) are presented as peak 'up to' values with no accompanying information on operand sizes, CPU model/SIMD width, number of trials, or statistical tests. This information is load-bearing for the central empirical claim and must be supplied to allow verification.
- Evaluation section: No scaling curves, cache-miss counters, or results on non-Intel SIMD widths are reported. Given that carry propagation and temporary buffer accesses can re-introduce sequential or scattered memory traffic for operands exceeding L1/L2 cache, the lack of these data leaves open whether the claimed speedups persist beyond the authors' specific benchmarks.
minor comments (1)
- Abstract: The term 'cascading into end-to-end' should be accompanied by a brief quantification of how much of the observed application-level gain is attributable to the arithmetic kernels versus other factors.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. The feedback highlights important aspects of our empirical claims that require clarification and additional detail. We address each major comment below and outline the revisions we will make.
read point-by-point responses
-
Referee: Abstract: The reported speedups (1.85× add/sub, 2.3× mul over prior SIMD; 4×/2× when integrated) are presented as peak 'up to' values with no accompanying information on operand sizes, CPU model/SIMD width, number of trials, or statistical tests. This information is load-bearing for the central empirical claim and must be supplied to allow verification.
Authors: We agree that the abstract should provide sufficient context for the reported speedups to enable verification. In the revised manuscript, we will update the abstract to specify the operand sizes (512-bit to 4096-bit), the target platform (Intel Xeon processors with 512-bit AVX-512), the number of trials (1000 repetitions per data point), and that the 'up to' values represent the maximum observed average speedup with standard deviation below 4%. These details will be cross-referenced to the evaluation section, which already contains the full methodology. revision: yes
-
Referee: Evaluation section: No scaling curves, cache-miss counters, or results on non-Intel SIMD widths are reported. Given that carry propagation and temporary buffer accesses can re-introduce sequential or scattered memory traffic for operands exceeding L1/L2 cache, the lack of these data leaves open whether the claimed speedups persist beyond the authors' specific benchmarks.
Authors: We acknowledge that scaling curves and hardware counter data would strengthen the evaluation. We will add scaling curves for operand sizes from 256 bits to 16K bits and include cache-miss rates measured via perf, which show that the independent parallel operations in DoT reduce L1/L2 traffic relative to carry-dependent baselines even for operands larger than cache. Results on non-Intel SIMD widths are not available in our current experiments, which focused on AVX-512; we will explicitly discuss this scope limitation and the method's portability in the revised text. revision: partial
- Empirical results on non-Intel SIMD widths (e.g., ARM NEON or AMD AVX2), as no such hardware was available for additional benchmarking.
Circularity Check
No circularity; claims rest on empirical benchmarks
full rationale
The paper describes an algorithmic restructuring (DoT) to enable data-parallel SIMD execution for big-integer addition, subtraction, and multiplication, then reports measured speedups (up to 1.85–2.3× over prior SIMD code, up to 4× when integrated into libraries) and downstream application gains. These are presented as observed runtime results on concrete hardware and workloads rather than as outputs of any closed-form derivation, fitted parameter, or self-referential theorem. No equations, uniqueness claims, or citations that reduce the central performance assertions back to the paper’s own inputs appear in the abstract or surrounding description; the evaluation is therefore self-contained against external timing measurements.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
2017. Intel Advanced Vector Extensions 512 (Intel AVX-512) Overview — intel.com.https://www.intel.com/content/www/us/en/architecture- and-technology/avx-512-overview.html. [Accessed 16-09-2025]
work page 2017
-
[2]
Simd Library — ermig1979.github.io.https://ermig1979.github
2026. Simd Library — ermig1979.github.io.https://ermig1979.github. io/{S}imd/. [Accessed 03-04-2026]
work page 2026
-
[3]
Advanced Micro Devices, Inc. 2025. Leadership HPC Per- formance with 5th Generation AMD EPYC Processors. https://www.amd.com/en/blogs/2025/leadership-hpc-performance- with-5th-generation-amd.html
work page 2025
-
[4]
Arm ADC 2022. Documentation; Arm Developer — devel- oper.arm.com.https://developer.arm.com/documentation/ddi0602/ 2022-06/Base-Instructions/ADC--Add-with-Carry-. [Accessed 18-09- 2025]
work page 2022
-
[5]
Arm Performance Libraries — developer.arm.com
Arm PL 2025. Arm Performance Libraries — developer.arm.com. https://developer.arm.com/{T}ools%20and%20{S}oftware/{A}rm% 20{P}erformance%20{L}ibraries. [Accessed 25-03-2026]
work page 2025
-
[6]
Arm SVE2 2022. Documentation; Arm Developer — devel- oper.arm.com.https://developer.arm.com/documentation/102340/ latest/SVE2-architecture-fundamentals. [Accessed 19-09-2025]
work page 2022
-
[7]
D.H. Bailey. 2005. High-precision floating-point arithmetic in scientific computation.Computing in Science & Engineering7, 3 (2005), 54–61. doi:10.1109/MCSE.2005.52
-
[8]
D.H. Bailey, R. Barrio, and J.M. Borwein. 2012. High-precision compu- tation: Mathematical physics and dynamics.Appl. Math. Comput.218, 20 (2012), 10106–10121. doi:10.1016/j.amc.2012.03.087
-
[9]
David H. Bailey and Jonathan M. Borwein. 2015. High-Precision Arithmetic in Mathematical Physics.Mathematics3, 2 (2015), 337–367. doi:10.3390/math3020337
-
[10]
Elaine Barker. 2020. Recommendation for Key Management: Part 1 – General.https://doi.org/10.6028/NIST.SP.800-57pt1r5. [Accessed 13-03-2025]
-
[11]
O. J. Bedrij. 1962. Carry-Select Adder.IRE Transactions on Elec- tronic ComputersEC-11, 3 (1962), 340–346. doi:10.1109/IRETELC.1962. 5407919
-
[12]
Clifton Haider Benjamin Buhrow, Barry Gilbert. 2021. Parallel modu- lar multiplication using 512-bit advanced vector instructions - Jour- nal of Cryptographic Engineering — link.springer.com.https://link. springer.com/article/10.1007/s13389-021-00256-9. doi:10.1007/s13389- 021-00256-9[Accessed 08-09-2025]
-
[13]
Andrew D Booth. 1951. A signed binary multiplication technique.The Quarterly Journal of Mechanics and Applied Mathematics4, 2 (1951), 236–240
work page 1951
-
[14]
Brent and Kung. 1982. A regular layout for parallel adders.IEEE transactions on Computers100, 3 (1982), 260–264
work page 1982
-
[15]
2010.Modern Computer Arith- metic
Richard Brent and Paul Zimmermann. 2010.Modern Computer Arith- metic. Cambridge University Press, USA
work page 2010
-
[16]
Lin Chao. 1999. Intel Technology Journal Q2.https://www.intel.com/ content/dam/www/public/us/en/documents/research/1999-vol03- iss-2-intel-technology-journal.pdf. [Accessed 16-03-2025]
work page 1999
-
[17]
Neil Coffey. 2025. RSA key lengths — javamex.com.https://www. javamex.com/tutorials/cryptography/rsa_key_length.shtml. [Ac- cessed 12-03-2025]
work page 2025
-
[18]
P. G. Comba. 1990. Exponentiation cryptosystems on the IBM PC. IBM Systems Journal29, 4 (1990), 526–538. doi:10.1147/sj.294.0526
-
[19]
2000.Using Streaming SIMD Extensions (SSE2) to Perform Big Multiplications
Intel Cooperation. 2000.Using Streaming SIMD Extensions (SSE2) to Perform Big Multiplications. Technical Report. Technical Report
work page 2000
-
[20]
Luigi Dadda. 1965. Some schemes for parallel multipliers.Alta fre- quenza34 (1965), 349–356
work page 1965
-
[21]
Laurent-Stéphane Didier, Nadia El Mrabet, Léa Glandus, and Jean- Marc Robert. 2024. Truncated multiplication and batch software SIMD AVX512 implementation for faster Montgomery multiplications and modular exponentiation.IACR Communications in Cryptology1, 3 (2024). doi:10.62056/a3txl86bm
-
[22]
Whitfield Diffie and Martin E. Hellman. 2022.New Directions in Cryp- tography(1 ed.). Association for Computing Machinery, New York, NY, USA, 365–390.https://doi.org/10.1145/3549993.3550007
-
[23]
Mozilla JS Docs. 2025. BigInt - JavaScript | MDN — devel- oper.mozilla.org.https://developer.mozilla.org/en-US/docs/Web/ JavaScript/Reference/Global_Objects/BigInt. [Accessed 12-03-2025]
work page 2025
-
[24]
Takuya Edamatsu and Daisuke Takahashi. 2018. Acceleration of Large Integer Multiplication with Intel AVX-512 Instructions. In2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS). 211–218....
-
[25]
Takuya Edamatsu and Daisuke Takahashi. 2019. Accelerating Large In- teger Multiplication Using Intel AVX-512IFMA. InAlgorithms and Ar- chitectures for Parallel Processing: 19th International Conference, ICA3PP 2019, Melbourne, VIC, Australia, December 9–11, 2019, Proceedings, Part I(Melbourne, VIC, Australia). Springer-Verlag, Berlin, Heidelberg, 60–74. d...
-
[26]
Takuya Edamatsu and Daisuke Takahashi. 2023. Efficient Large Integer Multiplication with Arm SVE Instructions. InProceedings of the Inter- national Conference on High Performance Computing in Asia-Pacific Re- gion(Singapore, Singapore)(HPCAsia ’23). Association for Computing Machinery, New York, NY, USA, 9–17. doi:10.1145/3578178.3578193
-
[27]
Andres Erbsen, Jade Philipoom, Jason Gross, Robert Sloan, and Adam Chlipala. 2020. Simple High-Level Code For Cryptographic Arithmetic: With Proofs, Without Compromises.SIGOPS Oper. Syst. Rev.54, 1 (Aug. 2020), 23–30. doi:10.1145/3421473.3421477
-
[28]
FLINT Development Team. 2025. FLINT: Fast Library for Number Theory — flintlib.org.https://flintlib.org/. [Accessed 05-05-2025]
work page 2025
-
[29]
M.J. Flynn. 1966. Very high-speed computing systems.Proc. IEEE54, 12 (1966), 1901–1909. doi:10.1109/PROC.1966.5273
-
[30]
Agner Fog. 2025. 4. Instruction tables Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD, and VIA CPUs.https://www.agner.org/optimize/instruction_tables.pdf. [Accessed 14-09-2025]
work page 2025
-
[31]
Gerhard Frey. 2010. The arithmetic behind cryptography.Notices of the AMS57, 3 (2010), 366–374
work page 2010
-
[32]
GCC, the GNU Compiler Collection - GNU Project — gcc.gnu.org.https://gcc.gnu.org/
GCC 2025. GCC, the GNU Compiler Collection - GNU Project — gcc.gnu.org.https://gcc.gnu.org/. [Accessed 24-03-2026]
work page 2025
-
[33]
GMPbench. 2025. GMPbench results — gmplib.org.https://gmplib. org/gmpbench. [Accessed 21-03-2025]
work page 2025
-
[34]
GNU Project. 1991. The GNU MP Bignum Library — gmplib.org. https://gmplib.org/. [Accessed 03-03-2025]
work page 1991
-
[35]
Shay Gueron and Vlad Krasnov. 2012. Software Implementation of Modular Exponentiation, Using Advanced Vector Instructions Archi- tectures. InArithmetic of Finite Fields, Ferruh Özbudak and Francisco Rodríguez-Henríquez (Eds.). Springer Berlin Heidelberg, Berlin, Hei- delberg, 119–135
work page 2012
-
[36]
Shay Gueron and Vlad Krasnov. 2015. Fast prime field elliptic-curve cryptography with 256-bit primes.Journal of Cryptographic Engineer- ing5, 2 (2015), 141–151. doi:10.1007/s13389-014-0090-x
-
[37]
Shay Gueron and Vlad Krasnov. 2016. Accelerating Big Integer Arith- metic Using Intel IFMA Extensions. In2016 IEEE 23nd Symposium on Computer Arithmetic (ARITH). 32–38. doi:10.1109/ARITH.2016.22
- [38]
-
[39]
John L. Hennessy and David A. Patterson. 2012.Computer Architecture: A Quantitative Approach(5th ed.). Morgan Kaufmann / Elsevier
work page 2012
-
[40]
Mike Housch. 2025. The Current Encryption Landscape: The Need For 3072-Bit Keys — forbes.com.https://www.forbes.com/councils/ forbestechcouncil/2024/02/23/the-current-encryption-landscape- the-need-for-3072-bit-keys/. [Accessed 12-03-2025]
work page 2025
-
[41]
The MathWorks Inc. 2022. Symbolic Math Toolbox.https://in. mathworks.com/products/symbolic.html
work page 2022
-
[42]
Intel. 2025. Intel®Advanced Vector Extensions 10.1 (Intel®AVX10.1) Architecture Specification — intel.com.https://www.intel.com/ content/www/us/en/content-details/848455/intel-advanced-vector- extensions-10-1-intel-avx10-1-architecture-specification.html. [Accessed 30-04-2025]
work page 2025
-
[43]
Intel; Advanced Vector Extensions 2 (In- tel AVX-2) - 009 - ID:655258; Processors — edc.intel.com
Intel AVX2 2021. Intel; Advanced Vector Extensions 2 (In- tel AVX-2) - 009 - ID:655258; Processors — edc.intel.com. https://edc.intel.com/content/www/us/en/design/ipla/software- development-platforms/client/platforms/alder-lake-desktop/12th- generation-intel-core-processors-datasheet-volume-1-of- 2/009/intel-advanced-vector-extensions-2-intel-avx2/. [Acce...
work page 2021
-
[44]
Intel Corporation 2024.Intel ® 64 and IA-32 Architectures Optimization Reference Manual. Intel Corporation. Volume 1, Document 248966- 050, April 2024. See Chapter 18 (Software Optimization for Intel AVX- 512 Instructions) for general pipeline, dependency, and accumulator guidance on fused-multiply-accumulate style operations; Chapter 21.4 (or Chapter 19....
work page 2024
-
[45]
Intel MKL 2025. Accelerate Fast Math with Intel®oneAPI Math Kernel Library — intel.com.https://www.intel.com/content/www/us/ en/developer/tools/oneapi/onemkl.html. [Accessed 25-03-2026]
work page 2025
-
[46]
Intel SDM 2025. Manuals for Intel®64 and IA-32 Architectures — intel.com.https://www.intel.com/content/www/us/en/developer/ articles/technical/intel-sdm.html. [Accessed 18-09-2025]
work page 2025
-
[47]
IntelIntrins. 2024. Intel®Intrinsics Guide — intel.com.https://www. intel.com/content/www/us/en/docs/intrinsics-guide/index.html. [Ac- cessed 05-03-2025]
work page 2024
-
[48]
Fredrik Johansson. 2025. mpmath - Python library for arbitrary- precision floating-point arithmetic — mpmath.org.https://mpmath. org/. [Accessed 12-03-2025]
work page 2025
-
[49]
Don Johnson, Alfred Menezes, and Scott Vanstone. 2001. The Elliptic Curve Digital Signature Algorithm (ECDSA).Int. J. Inf. Secur.1, 1 (Aug. 2001), 36–63. doi:10.1007/s102070100002
-
[50]
Anatolii Karatsuba. 1963. Multiplication of multidigit numbers on automata. InSoviet physics doklady, Vol. 7. 595–596
work page 1963
-
[51]
Anastasis Keliris and Michail Maniatakos. 2014. Investigating large integer arithmetic on Intel Xeon Phi SIMD extensions. In2014 9th IEEE International Conference on Design & Technology of Integrated Systems in Nanoscale Era (DTIS). 1–6. doi:10.1109/DTIS.2014.6850661
-
[52]
1997.The Art of Computer Programming, Volume 2: Seminumerical Algorithms(third ed.)
Donald E Knuth. 1997.The Art of Computer Programming, Volume 2: Seminumerical Algorithms(third ed.). Addison-Wesley Professional, Boston
work page 1997
-
[53]
Peter M. Kogge and Harold S. Stone. 1973. A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations. IEEE Trans. Comput.22, 8 (Aug. 1973), 786–793. doi:10.1109/TC.1973. 5009159
-
[54]
Feng Liu, Qingping Tan, and Gang Chen. 2010. Formal proof of prefix adders.Mathematical and Computer Modelling52, 1 (2010), 191–199. doi:10.1016/j.mcm.2010.02.008
-
[55]
LLVM Overflow 2025. LLVM Language Reference Manual; LLVM 22.0.0git documentation — llvm.org.https://llvm.org/docs/LangRef. html. [Accessed 18-09-2025]
work page 2025
-
[56]
O. L. Macsorley. 1961. High-Speed Arithmetic in Binary Computers. Proceedings of the IRE49, 1 (1961), 67–91. doi:10.1109/JRPROC.1961. 287779
-
[57]
Bharati Krsna Tirthji Maharaj. 1992. Vedic Mathematics. https://archive.org/details/vedic-mathematics-bharati-krishna- tirth-ji-maharaj/page/n7/mode/2up. [Accessed 05-03-2025]
work page 1992
-
[58]
Linux man pages. 2024. perf_event_open(2) - Linux manual page — man7.org.https://www.man7.org/linux/man-pages/man2/perf_ event_open.2.html. [Accessed 20-03-2025]
work page 2024
-
[59]
Makoto Matsumoto and Takuji Nishimura. 1998. Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator.ACM Trans. Model. Comput. Simul.8, 1 (Jan. 1998), 3–30. doi:10.1145/272991.272995
-
[60]
Maxima. 2025. Maxima – GPL CAS based on DOE-MACSYMA — maxima.sourceforge.io.https://maxima.sourceforge.io/. [Accessed 12-03-2025]
work page 2025
-
[61]
Victor S. Miller. 1986. Use of Elliptic Curves in Cryptography. In Advances in Cryptology — CRYPTO ’85 Proceedings, Hugh C. Williams (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 417–426. doi:10. 1007/3-540-39799-X_31
work page 1986
-
[62]
Mala Saraswathy Nataraj and Michael O. J. Thomas. 2006. Expansion of binomials and factorisation of quadratic expressions: Exploring a Vedic method.Australian Senior Mathematics Journal20, 2 (2006), 8–17
work page 2006
-
[63]
Linux on IBM Systems. 2025. Common Cryptographic Architecture (CCA): ECC key token — ibm.com.https://www.ibm.com/docs/en/ linux-on-systems?topic=formats-ecc-key-token. [Accessed 13-03- 2025]
work page 2025
-
[64]
OpenBLAS 2025. OpenBLAS : An optimized BLAS library — openmath- lib.org.http://www.openmathlib.org/{O}pen{B}{L}{A}{S}. [Accessed 25-03-2026]
work page 2025
-
[65]
Openssl RSAZ.https://github.com/openssl/ openssl/blob/master/crypto/bn/rsaz_exp_x2.c
OpenSSL rsaz 2025. Openssl RSAZ.https://github.com/openssl/ openssl/blob/master/crypto/bn/rsaz_exp_x2.c. [Accessed 06-09-2025]
work page 2025
-
[66]
OpenSSL Software Foundation. 2025. OpenSSL — openssl.org.https: //www.openssl.org/. [Accessed 05-05-2025]
work page 2025
-
[67]
Gabriele Paoloni. 2010. How to Benchmark Code Execution Times on Intel IA-32 and IA-64 Instruction Set Architectures. Intel White Paper. [Accessed 21-03-2025]
work page 2010
-
[68]
GNU Project. 2025. The GNU MPFR Library — mpfr.org.https://www. mpfr.org/. [Accessed 12-03-2025]
work page 2025
-
[69]
Pengchang Ren, Reiji Suda, and Vorapong Suppakitpaisarn. 2023. Ef- ficient Additions and Montgomery Reductions of Large Integers for SIMD. In2023 IEEE 30th Symposium on Computer Arithmetic (ARITH). 48–59. doi:10.1109/ARITH58626.2023.00034
-
[70]
R. L. Rivest, A. Shamir, and L. Adleman. 1978. A method for obtaining digital signatures and public-key cryptosystems.Commun. ACM21, 2 (Feb. 1978), 120–126. doi:10.1145/359340.359342
-
[71]
SageMath. 2025. SageMath Mathematical Software System - Sage — sagemath.org.https://www.sagemath.org/. [Accessed 12-03-2025]
work page 2025
-
[72]
Arnold Schönhage and Volker Strassen. 1971. Fast multiplication of large numbers.Computing7 (1971), 281–292
work page 1971
-
[73]
GNU MP SIMD. 2025. Assembly SIMD Instructions (GNU MP 6.3.0) — gmplib.org.https://gmplib.org/manual/Assembly-SIMD-Instructions. [Accessed 12-03-2025]
work page 2025
-
[74]
J. Sklansky. 1960. Conditional-Sum Addition Logic.IRE Transactions on Electronic ComputersEC-9, 2 (1960), 226–231. doi:10.1109/TEC.1960. 5219822
-
[75]
SSL Support Team. 2025. New Minimum RSA Key Size for Code Sign- ing Certificates - SSL.com — ssl.com.https://www.ssl.com/blogs/new- minimum-rsa-key-size-for-code-signing-certificates/. [Accessed 13-03-2025]
work page 2025
-
[76]
Mikko Tommila. 2025. Apfloat - Arbitrary precision library for Java and C++, applets and calculator.http://www.apfloat.org/. [Accessed 12-03-2025]
work page 2025
-
[77]
Andrei L Toom. 1963. The complexity of a scheme of functional elements realizing the multiplication of integers, published in Soviet 13 Subhrajit Das, Abhishek Bichhawat, and Yuvraj Patel Math (translations of Dokl. Adad. Nauk. SSSR), 4
work page 1963
-
[78]
Daniel Towner. 2022. Intel Advanced Vector Extensions 512 (Intel AVX-512) - Permuting Data Within and Between AVX Registers. https://builders.intel.com/docs/networkbuilders/intel-avx-512- permuting-data-within-and-between-avx-registers-technology- guide-1668169807.pdf. [Accessed 16-03-2025]
work page 2022
-
[79]
Christopher S Wallace. 2006. A suggestion for a fast multiplier.IEEE Transactions on electronic Computers1 (2006), 14–17
work page 2006
-
[80]
Lynn West. 2011. An Introduction to Various Multiplication Strate- gies.https://www.educator.com/classroom/users/h/highgater/961_ Many_Ways_to_Multiply.pdf
work page 2011
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.