Correctly Rounded Functions For Vector Applications: A Performance Study

arXiv: 2605.15547 · v1 · pith:4NEEKZUG · submitted 2026-05-15 · 💻 cs.MS

Cristina Anderson, Marius Cornea, Andrey Stepin, Mihai Tudor Panu

Pith reviewed 2026-05-19 18:27 UTC · model grok-4.3

classification 💻 cs.MS

keywords: correctly rounded functions · SIMD algorithms · vector math library · IEEE 754 · single precision · GPU · double precision

The pith

SIMD algorithms for correctly rounded single-precision functions form the core of a new vector math library planned for mid-2026.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the design of several SIMD algorithms for one-input single-precision math functions that are correctly rounded, as recommended by the IEEE 754 standard. These algorithms have been integrated into a CPU math library and are positioned as the foundation for the first correctly rounded vector math library, with availability to users targeted for mid-2026. The work also adapts some implementations to GPUs to enable cross-platform bitwise reproducibility, and includes proof-of-concept SIMD versions of two double-precision functions. A sympathetic reader would care because correct rounding in vectorized settings can deliver consistent numerical results across different hardware platforms without sacrificing the performance benefits of SIMD processing.

Core claim

We have designed several SIMD algorithms for one-input single precision functions and integrated them into our CPU math library; these will form the core of the first correctly rounded vector math library, to be available to users in mid-2026. We adapted and evaluated a few SIMD implementations on GPUs to take advantage of the cross-platform bitwise reproducibility afforded by correct rounding. In addition, we designed and evaluated proof-of-concept SIMD implementations of two correctly rounded double precision functions.

What carries the argument

SIMD algorithms for one-input single-precision functions that achieve correct rounding, integrated into CPU and GPU math libraries to support vector applications.

If this is right

A correctly rounded vector math library becomes available to users in mid-2026.
GPU-adapted implementations deliver bitwise reproducibility across different platforms.
Proof-of-concept work shows the SIMD approach extends to at least two double-precision functions.
Vector applications can rely on guaranteed accuracy without custom rounding adjustments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Widespread use of these functions could reduce numerical discrepancies in large-scale simulations that run on mixed CPU and GPU hardware.
The methods might be extended to additional elementary functions or to multi-argument operations as a natural next step.
Performance data on real vector workloads would help quantify whether the added correctness changes overall application throughput.

Load-bearing premise

The SIMD algorithms achieve correct rounding according to IEEE 754 while delivering competitive performance on vector hardware.

What would settle it

Public benchmarks that test the new functions on every representable single-precision input to confirm correct rounding and compare their execution speed against existing vector math libraries.
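A minimal sketch of what such an exhaustive check could look like, assuming `expf` as the function under test. It is not the authors' harness: all names (`reference_expf`, `naive_expf`, `count_mismatches`) are hypothetical, Python's `decimal` module stands in for a multiple-precision reference such as MPFR, and the 60-digit working precision is an assumption rather than a proven bound. Special values and large arguments are skipped here, whereas a real harness would have to cover them.

```python
import math
import struct
from decimal import Decimal, localcontext
from fractions import Fraction


def bits_to_f32(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a binary32 value (held in a Python float)."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def f32_to_bits(x: float) -> int:
    """Bit pattern of x, which must already be representable in binary32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def round_fraction_to_f32(q: Fraction) -> float:
    """Round a positive rational to the nearest binary32 value, ties to even."""
    # Find e with 2**e <= q < 2**(e + 1).
    e = q.numerator.bit_length() - q.denominator.bit_length()
    if Fraction(2) ** e > q:
        e -= 1
    # Spacing of binary32 values near q (subnormal spacing below 2**-126).
    quantum = Fraction(2) ** (max(e, -126) - 23)
    value = round(q / quantum) * quantum  # round() on Fraction is half-to-even
    return math.inf if value >= 2 ** 128 else float(value)


def reference_expf(x: float) -> float:
    """Reference exp(x) rounded to binary32 (assumption: 60 digits suffice)."""
    with localcontext() as ctx:
        ctx.prec = 60
        y = Decimal(x).exp()
    return round_fraction_to_f32(Fraction(y))


def naive_expf(x: float) -> float:
    """Candidate under test: binary64 exp, then a second rounding to binary32."""
    return struct.unpack("<f", struct.pack("<f", math.exp(x)))[0]


def count_mismatches(candidate, reference, stride: int = 1):
    """Compare bit patterns over every stride-th binary32 input."""
    tested = mismatches = 0
    for bits in range(0, 1 << 32, stride):
        x = bits_to_f32(bits)
        if x != x or abs(x) > 80.0:  # skip NaN, Inf, and overflow/underflow range
            continue
        tested += 1
        if f32_to_bits(candidate(x)) != f32_to_bits(reference(x)):
            mismatches += 1
    return tested, mismatches


if __name__ == "__main__":
    # stride=1 would be the exhaustive run; a large stride keeps the demo short.
    print(count_mismatches(naive_expf, reference_expf, stride=(1 << 20) + 1))
```

Comparing bit patterns rather than numeric values is the design choice that matters here, since it is the same criterion that bitwise reproducibility across platforms would be judged by.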

Figures

Figures reproduced from arXiv: 2605.15547 by Andrey Stepin, Cristina Anderson, Marius Cornea, Mihai Tudor Panu.

**Figure 2.** Another pseudo-code example, for the SIMD cr_log2f on AVX-512 targets. Since this sequence relies on

    xd = (double)x;
    // R = x - nearest_int(x*8)/8, |R| < 2^-4
    R = FP64_VREDUCE(xd, 0x38);
    // table index in last 3 bits of d_index
    d_index = FP64_ADD_RN(xd, 0x1.8p+49);
    // Nd = x - R; Inf/NaN propagated to Nd
    Nd = FP64_SUB_RN(xd, R);
    // permute (lookup) from 8-element table
    // T = (double)exp2(Nd - floor(Nd…
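The visible steps of the Figure 2 fragment (reduction to a multiple of 1/8, a magic-constant add that leaves a table index in the low 3 bits, and an 8-entry `exp2` table) can be emulated in scalar Python as a rough illustration. This is a sketch under stated assumptions, not the paper's algorithm: `reduce_arg`, `exp2_sketch`, `MAGIC`, and `TABLE` are hypothetical names, `2.0 ** r` stands in for whatever polynomial the real code uses, the result is a plain binary64 evaluation with no correct-rounding guarantee, and inputs are assumed to be moderate in magnitude (well below 2^48).

```python
import math
import struct

# 1.5 * 2**49: binary64 values near this constant are spaced 2**-3 apart,
# so adding it rounds the other operand to a multiple of 1/8.
MAGIC = float.fromhex("0x1.8p+49")

# Hypothetical 8-entry table of 2**(j/8), j = 0..7.
TABLE = [2.0 ** (j / 8.0) for j in range(8)]


def reduce_arg(x: float):
    """Split x into Nd (a multiple of 1/8), R = x - Nd, and a 3-bit index."""
    xd = float(x)
    # Emulates FP64_VREDUCE(xd, 0x38): R = x - nearest_int(x*8)/8.
    r = xd - round(xd * 8.0) / 8.0
    # Emulates FP64_ADD_RN(xd, 0x1.8p+49): the low 3 mantissa bits of the
    # sum hold nearest_int(x*8) mod 8.
    d_index = xd + MAGIC
    raw = struct.unpack("<Q", struct.pack("<d", d_index))[0]
    index = raw & 0x7
    # Emulates FP64_SUB_RN(xd, R): Nd = x - R, exactly a multiple of 1/8.
    nd = xd - r
    return nd, r, index


def exp2_sketch(x: float) -> float:
    """2**x = 2**floor(Nd) * 2**(Nd - floor(Nd)) * 2**R, using the table."""
    nd, r, index = reduce_arg(x)
    return math.ldexp(TABLE[index] * 2.0 ** r, math.floor(nd))
```

Both Python's `round` and the binary64 addition round ties to even, so the index taken from the bit pattern agrees with the reduction computed by `round`.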

Original abstract

Following recent interest in correctly rounded math library functions (as currently recommended by the IEEE 754 standard), we have designed several SIMD algorithms for one-input single precision functions and integrated them into our CPU math library; these will form the core of the first correctly rounded vector math library, to be available to users in mid-2026. To take advantage of the cross-platform bitwise reproducibility afforded by correct rounding, we adapted and evaluated a few SIMD implementations on graphics processing units (GPU). In addition, we designed and evaluated proof-of-concept SIMD implementations of two correctly rounded double precision functions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper delivers practical SIMD implementations and performance data for correctly rounded vector math functions, but the rounding verification needs more detail.

The paper presents new SIMD algorithms for single precision math functions that the authors claim are correctly rounded, integrated into a CPU library, with some GPU work and double precision examples. They position this as the start of the first such vector math library.

What the paper does well is the performance evaluation across hardware. The benchmarks on vector CPUs and GPUs give concrete numbers that library users can look at when deciding whether to adopt correctly rounded functions. The integration story and the plan for 2026 release also show practical thinking about deployment.

The main soft spot is that the correctness claim lacks visible support. There are algorithm descriptions and performance results, but no error analysis, no mention of exhaustive testing for single precision, and no verification harness. That makes the central promise harder to trust without more data. If the full paper has those details, they need to be front and center.

This kind of paper is for people who build or use high-performance math libraries. Someone working on reproducibility in scientific computing would get the most out of the performance study and implementation notes.

I would send it to peer review. The topic matters for numerical software, and referees can ask for the missing verification steps. Overall, it's a solid engineering effort with one clear area that needs strengthening.

Referee Report

2 major / 1 minor

Summary. The manuscript claims to have designed several SIMD algorithms for one-input single-precision correctly rounded mathematical functions, integrated them into a CPU math library to serve as the core of the first correctly rounded vector math library (to be available mid-2026), adapted a few implementations for GPUs to leverage bitwise reproducibility, and provided proof-of-concept SIMD implementations for two double-precision functions, accompanied by performance evaluations on CPU and GPU hardware.

Significance. If the correctness claims hold and are properly verified, the work would be significant for numerical computing by delivering high-performance, IEEE 754-compliant vectorized math functions that enable cross-platform bitwise reproducibility, addressing a key recommendation of the standard. The dual CPU/GPU evaluation and planned library release timeline suggest practical utility for scientific and high-performance applications relying on vector hardware.

major comments (2)

[Abstract] The central claim that the designed SIMD algorithms achieve correct rounding (per IEEE 754) for single-precision functions (and the double-precision proofs-of-concept) is unsupported by any algorithm descriptions, error-bound analysis, exhaustive test results, verification harness, or formal argument that the final rounding step is always exact. This directly undermines the assertion that these implementations can serve as the core of a correctly rounded vector math library.
[Performance evaluation] The manuscript references performance measurements and evaluations on CPU and GPU but supplies no specific numbers, tables, figures, or comparisons demonstrating competitive performance while preserving correct rounding. Without these data, the practical claims cannot be assessed.

minor comments (1)

The manuscript would benefit from a summary table listing the implemented functions, their precision, and key performance metrics to improve clarity for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. We address the two major comments point by point below and will revise the manuscript to incorporate additional detail where needed.

Referee: [Abstract] The central claim that the designed SIMD algorithms achieve correct rounding (per IEEE 754) for single-precision functions (and the double-precision proofs-of-concept) is unsupported by any algorithm descriptions, error-bound analysis, exhaustive test results, verification harness, or formal argument that the final rounding step is always exact. This directly undermines the assertion that these implementations can serve as the core of a correctly rounded vector math library.

Authors: The manuscript does contain high-level descriptions of the SIMD algorithm designs for the single-precision functions and the two double-precision proofs-of-concept, together with references to the underlying correctly-rounded scalar methods. However, we agree that the current version does not include sufficient low-level algorithm pseudocode, explicit error-bound derivations, or a dedicated verification section. In the revised manuscript we will add these elements: expanded algorithm descriptions with error analysis, a description of the exhaustive testing harness (using MPFR as the reference), and a concise argument establishing that the final rounding step is always exact under the stated conditions. These additions will directly support the claim that the implementations can serve as the core of the planned correctly rounded vector math library.

Revision: yes

Referee: [Performance evaluation] The manuscript references performance measurements and evaluations on CPU and GPU but supplies no specific numbers, tables, figures, or comparisons demonstrating competitive performance while preserving correct rounding. Without these data, the practical claims cannot be assessed.

Authors: The performance evaluation sections do describe the experimental setup and the hardware platforms used, but the submitted manuscript version omitted the concrete numerical results, tables, and figures (likely due to space or formatting constraints in the arXiv upload). We will restore and expand these in the revision, adding the measured cycle counts or throughput figures for both CPU and GPU, direct comparisons against existing vector math libraries, and explicit statements confirming that the reported performance figures were obtained with the correctly rounded implementations.

Revision: yes
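The rebuttal promises an argument that the final rounding step is always exact. One well-known way to make such an argument checkable is a Ziv-style rounding test: if a binary64 approximation `y` carries a known relative error bound, the binary32 result is unambiguous whenever both ends of the error interval round to the same binary32 value. The sketch below illustrates that idea only. It is not claimed to be the authors' method, `to_f32` and `rounding_test` are hypothetical names, and it ignores the small rounding error made when computing the interval endpoints themselves.

```python
import struct


def to_f32(x: float) -> float:
    """Round a binary64 value to binary32 (round-to-nearest, ties to even)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def rounding_test(y: float, rel_err: float):
    """Return the binary32 result if it is unambiguous, otherwise None.

    y       -- binary64 approximation of the exact function value
    rel_err -- assumed bound on the relative error of y
    """
    lo = to_f32(y * (1.0 - rel_err))
    hi = to_f32(y * (1.0 + rel_err))
    # If both interval ends round to the same binary32 value, so does the
    # exact result; otherwise a more accurate fallback path is needed.
    return lo if lo == hi else None
```

For example, a value that sits on the midpoint between two adjacent binary32 numbers, such as `1.0 + 2.0**-24`, fails the test for any positive error bound, which is the situation an error analysis has to rule out or route to a slower path.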

Circularity Check

0 steps flagged

No circularity: implementation and performance study with no derivations or self-referential reductions

The paper describes the design of several SIMD algorithms for one-input single-precision functions (plus two double-precision proofs-of-concept), their integration into a CPU math library, and performance measurements on CPU and GPU hardware. No equations, first-principles derivations, fitted parameters, or predictions appear in the provided text or abstract. Claims rest directly on the described implementations and empirical timing results rather than reducing to self-definitions, self-citations, or renamed known results. The noted gap in explicit verification for IEEE 754 compliance is a completeness issue, not a circularity pattern.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on the IEEE 754 standard for floating-point rounding and the existence of SIMD hardware; no free parameters, invented entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)

[standard math] The IEEE 754 standard defines correct rounding for math functions.
The abstract explicitly follows recent interest in correctly rounded functions as recommended by IEEE 754.

Reference graph

Works this paper leans on

[1] A. Ziv, “Fast evaluation of elementary mathematical functions with correctly rounded last bit”, ACM Trans. Math. Software, vol. 17, no. 3, 1991, pp. 410–423

[2] J.-M. Muller, N. Brisebarre, F. de Dinechin, C.-P. Jeannerod, V. Lefèvre, G. Melquiond, N. Revol, D. Stehlé, and S. Torres, “Handbook of Floating-Point Arithmetic”, Birkhäuser, 2010

[3] S. Park, J. Kim, and S. Nagarakatte, “Correctly Rounded Math Libraries without Worrying about the Application’s Rounding Mode”, Proceedings of the ACM on Programming Languages, vol. 9, 2025

[4] N. Brisebarre, G. Hanrot, J.-M. Muller, P. Zimmermann, “Correctly rounded evaluation of a function: why, how, and at what cost?”, ACM Computing Surveys, 58 (1), 2026. Available: https://hal.science/hal-04474530v4

[5] A. Sibidanov, P. Zimmermann, S. Glondu, “The CORE-MATH Project”, 29th IEEE Symposium on Computer Arithmetic, 2022, pp. 26-34

[6] C. Anderson, J. Zhang, M. Cornea, “Enhanced Vector Math Support on the Intel® AVX-512 Architecture”, 25th IEEE Symposium on Computer Arithmetic, 2018, pp. 116-120

[7] C. Daramy-Loirat, D. Defour, F. Dinechin, M. Gallet, N. Gast, et al., “CR-LIBM A library of correctly rounded elementary functions in double-precision”, [Research Report] LIP, 2006. Available: https://ens-lyon.hal.science/ensl-01529804v1

[8] R. Lyons, “FFT Interpolation Based on FFT Samples: A Detective Story With a Surprise Ending”, 2018. Available: https://www.dsprelated.com/blogimages/RickLyons/FFT_Interpolation_Lyons.pdf

[9] V. Lefèvre, J.-M. Muller, “Worst cases for correct rounding of the elementary functions in double precision”, 15th IEEE Symposium on Computer Arithmetic, 2001

[10] A. Ziv, M. Olshansky, E. Henis, A. Reitman, “IBM Accurate Portable Mathlib”. Available: https://github.com/dreal-deps/mathlib

[11] L. Fousse, G. Hanrot, V. Lefèvre, P. Pélissier, P. Zimmermann, “MPFR: A multiple-precision binary floating-point library with correct rounding”, ACM Transactions on Mathematical Software (TOMS), Volume 33, Issue 2, 2007

[12] T. Ly, “LLVM libc math library: Current status and future directions”. Available: https://llvm.org/devmtg/2024-10/slides/techtalk/Ly-LLVM-libc-math-library-CurrentStatus.pdf

[13] A. Sibidanov, P. Zimmermann, S. Glondu, et al., CORE-MATH open-source repository. Available: https://gitlab.inria.fr/core-math/core-math/

TABLE III. Relative performance of correctly rounded versus 1-ulp implementations on GPU; a lower ratio means the correctly rounded function is slower

| Function | GPU 1 | GPU 2 |
|----------|-------|-------|
| expf     | 0.93  | 0.24  |
| logf     | 0.76  | 0.19  |
| sincosf  | 0.81  | 0.20  |

TABLE I...
