Correctly Rounded Functions For Vector Applications: A Performance Study
Pith reviewed 2026-05-19 18:27 UTC · model grok-4.3
pith:4NEEKZUG Add to your LaTeX paper
What is a Pith Number?\usepackage{pith}
\pithnumber{4NEEKZUG}
Prints a linked pith:4NEEKZUG badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more
The pith
SIMD algorithms for correctly rounded single-precision functions form the core of a new vector math library planned for mid-2026.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We have designed several SIMD algorithms for one-input single precision functions and integrated them into our CPU math library; these will form the core of the first correctly rounded vector math library, to be available to users in mid-2026. We adapted and evaluated a few SIMD implementations on GPUs to take advantage of the cross-platform bitwise reproducibility afforded by correct rounding. In addition, we designed and evaluated proof-of-concept SIMD implementations of two correctly rounded double precision functions.
What carries the argument
SIMD algorithms for one-input single-precision functions that achieve correct rounding, integrated into CPU and GPU math libraries to support vector applications.
If this is right
- A correctly rounded vector math library becomes available to users in mid-2026.
- GPU-adapted implementations deliver bitwise reproducibility across different platforms.
- Proof-of-concept work shows the SIMD approach extends to at least two double-precision functions.
- Vector applications can rely on guaranteed accuracy without custom rounding adjustments.
Where Pith is reading between the lines
- Widespread use of these functions could reduce numerical discrepancies in large-scale simulations that run on mixed CPU and GPU hardware.
- The methods might be extended to additional elementary functions or to multi-argument operations as a natural next step.
- Performance data on real vector workloads would help quantify whether the added correctness changes overall application throughput.
Load-bearing premise
The SIMD algorithms achieve correct rounding according to IEEE 754 while delivering competitive performance on vector hardware.
What would settle it
Public benchmarks that test the new functions on every representable single-precision input to confirm correct rounding and compare their execution speed against existing vector math libraries.
Figures
read the original abstract
Following recent interest in correctly rounded math library functions (as currently recommended by the IEEE 754 standard), we have designed several SIMD algorithms for one-input single precision functions and integrated them into our CPU math library; these will form the core of the first correctly rounded vector math library, to be available to users in mid-2026. To take advantage of the cross-platform bitwise reproducibility afforded by correct rounding, we adapted and evaluated a few SIMD implementations on graphics processing units (GPU). In addition, we designed and evaluated proof-of-concept SIMD implementations of two correctly rounded double precision functions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims to have designed several SIMD algorithms for one-input single-precision correctly rounded mathematical functions, integrated them into a CPU math library to serve as the core of the first correctly rounded vector math library (to be available mid-2026), adapted a few implementations for GPUs to leverage bitwise reproducibility, and provided proof-of-concept SIMD implementations for two double-precision functions, accompanied by performance evaluations on CPU and GPU hardware.
Significance. If the correctness claims hold and are properly verified, the work would be significant for numerical computing by delivering high-performance, IEEE 754-compliant vectorized math functions that enable cross-platform bitwise reproducibility, addressing a key recommendation of the standard. The dual CPU/GPU evaluation and planned library release timeline suggest practical utility for scientific and high-performance applications relying on vector hardware.
major comments (2)
- [Abstract] Abstract: The central claim that the designed SIMD algorithms achieve correct rounding (per IEEE 754) for single-precision functions (and the double-precision proofs-of-concept) is unsupported by any algorithm descriptions, error-bound analysis, exhaustive test results, verification harness, or formal argument that the final rounding step is always exact. This directly undermines the assertion that these implementations can serve as the core of a correctly rounded vector math library.
- [Performance evaluation] Performance evaluation sections: The manuscript references performance measurements and evaluations on CPU and GPU but supplies no specific numbers, tables, figures, or comparisons demonstrating competitive performance while preserving correct rounding. Without these data, the practical claims cannot be assessed.
minor comments (1)
- The manuscript would benefit from a summary table listing the implemented functions, their precision, and key performance metrics to improve clarity for readers.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. We address the two major comments point by point below and will revise the manuscript to incorporate additional detail where needed.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central claim that the designed SIMD algorithms achieve correct rounding (per IEEE 754) for single-precision functions (and the double-precision proofs-of-concept) is unsupported by any algorithm descriptions, error-bound analysis, exhaustive test results, verification harness, or formal argument that the final rounding step is always exact. This directly undermines the assertion that these implementations can serve as the core of a correctly rounded vector math library.
Authors: The manuscript does contain high-level descriptions of the SIMD algorithm designs for the single-precision functions and the two double-precision proofs-of-concept, together with references to the underlying correctly-rounded scalar methods. However, we agree that the current version does not include sufficient low-level algorithm pseudocode, explicit error-bound derivations, or a dedicated verification section. In the revised manuscript we will add these elements: expanded algorithm descriptions with error analysis, a description of the exhaustive testing harness (using MPFR as the reference), and a concise argument establishing that the final rounding step is always exact under the stated conditions. These additions will directly support the claim that the implementations can serve as the core of the planned correctly rounded vector math library. revision: yes
-
Referee: [Performance evaluation] Performance evaluation sections: The manuscript references performance measurements and evaluations on CPU and GPU but supplies no specific numbers, tables, figures, or comparisons demonstrating competitive performance while preserving correct rounding. Without these data, the practical claims cannot be assessed.
Authors: The performance evaluation sections do describe the experimental setup and the hardware platforms used, but the submitted manuscript version omitted the concrete numerical results, tables, and figures (likely due to space or formatting constraints in the arXiv upload). We will restore and expand these in the revision, adding the measured cycle counts or throughput figures for both CPU and GPU, direct comparisons against existing vector math libraries, and explicit statements confirming that the reported performance figures were obtained with the correctly rounded implementations. revision: yes
Circularity Check
No circularity: implementation and performance study with no derivations or self-referential reductions
full rationale
The paper describes the design of several SIMD algorithms for one-input single-precision functions (plus two double-precision proofs-of-concept), their integration into a CPU math library, and performance measurements on CPU and GPU hardware. No equations, first-principles derivations, fitted parameters, or predictions appear in the provided text or abstract. Claims rest directly on the described implementations and empirical timing results rather than reducing to self-definitions, self-citations, or renamed known results. The noted gap in explicit verification for IEEE 754 compliance is a completeness issue, not a circularity pattern.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math IEEE 754 standard defines correct rounding for math functions
Reference graph
Works this paper leans on
-
[1]
Fast evaluation of elementary mathematical functions with correctly rounded last bit
A. Ziv, “Fast evaluation of elementary mathematical functions with correctly rounded last bit”, ACM Trans. Math. Software, vol. 17, no. 3, 1991, pp. 410–423
work page 1991
-
[2]
Handbook of Floating-Point Arithmetic
J J.-M. Muller, N. Brisebarre, F. de Dinechin, C.-P. Jeannerod, V. Lefèvre, G. Melquiond, N. Revol, D. Stehlé, and S. Torres, “Handbook of Floating-Point Arithmetic”, Birkhäuser, 2010
work page 2010
-
[3]
Correctly Rounded Math Libraries without Worrying about the Application’s Rounding Mode
S. Park, J. Kim, and S. Nagakaratte, “Correctly Rounded Math Libraries without Worrying about the Application’s Rounding Mode”, Proceedings of the ACM on Programming Languages, vol.9, 2025
work page 2025
-
[4]
Correctly rounded evaluation of a function: why, how, and at what cost?
N. Brisebarre, G. Hanrot, J.-M. Muller, P. Zimmermann, “Correctly rounded evaluation of a function: why, how, and at what cost?”, ACM Computing Surveys, 58 (1), 2026. Available: https://hal.science/hal- 04474530v4
work page 2026
-
[5]
A. Sibidanov, P. Zimmermann, S. Glondu, “The CORE-MATH Project”, 29th IEEE Symposium on Computer Arithmetic, 2022, pp.26-34
work page 2022
-
[6]
Enhanced Vector Math Support on the Intel®AVX-512 Architecture
C. Anderson, J. Zhang, M.Cornea, “Enhanced Vector Math Support on the Intel®AVX-512 Architecture”, 25th IEEE Symposium on Computer Arithmetic, 2018, pp.116-120
work page 2018
-
[7]
CR-LIBM A library of correctly rounded elementary functions in double-precision
C. Daramy-Loirat, D. Defour, F. Dinechin, M. Gallet, N. Gast, et al., “CR-LIBM A library of correctly rounded elementary functions in double-precision”, [Research Report] LIP, 2006. Available: https://ens- lyon.hal.science/ensl-01529804v1
work page 2006
-
[8]
FFT Interpolation Based on FFT Samples: A Detective Story With a Surprise Ending
R. Lyons, “FFT Interpolation Based on FFT Samples: A Detective Story With a Surprise Ending”, 2018. Available: https://www.dsprelated.com/blogimages/RickLyons/FFT_Interpolation_ Lyons.pdf
work page 2018
-
[9]
Worst cases for correct rounding of the elementary functions in double precision
V. Lefèvre, J.-M. Muller, “Worst cases for correct rounding of the elementary functions in double precision”, 15th IEEE Symposium on Computer Arithmetic, 2001
work page 2001
-
[10]
A. Ziv, M. Olshansky, E. Henis, A. Reitman, “IBM Accurate Portable Mathlib”. Available: https://github.com/dreal-deps/mathlib
-
[11]
MPFR: A multiple-precision binary floating-point library with correct rounding
L. Fousse, G. Hanrot, V. Lefèvre, P. Pélissier, P. Zimmermann, "MPFR: A multiple-precision binary floating-point library with correct rounding", ACM Transactions on Mathematical Software (TOMS), Volume 33, Issue 2, 2007
work page 2007
-
[12]
LLVM libc math library: Current status and future directions
T. Ly, "LLVM libc math library: Current status and future directions",
-
[13]
Available: https://llvm.org/devmtg/2024-10/slides/techtalk/Ly- LLVM-libc-math-library-CurrentStatus.pdf
work page 2024
-
[14]
A. Sibidanov, P. Zimmermann, S. Glondu, et al., CORE-MATH open- source repository. Available: https://gitlab.inria.fr/core-math/core-math/ TABLE III. Relative performance of correctly rounded versus 1-ulp implementations; lower ratio means correctly rounded function is slower GPU Function GPU 1 GPU 2 expf 0.93 0.24 logf 0.76 0.19 sincosf 0.81 0.20 TABLE I...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.