pith. sign in

arxiv: 2606.08339 · v2 · pith:HOEG6LRAnew · submitted 2026-06-06 · 💻 cs.MS · cs.NA· math.NA

Floating-point autotuning with customized precisions

Pith reviewed 2026-06-27 18:42 UTC · model grok-4.3

classification 💻 cs.MS cs.NAmath.NA
keywords precision tuningmixed precisionfloating-point formatsautotuningnumerical accuracycustomized precisionsperformance optimization
0
0 comments X

The pith

Automated precision tuning finds mixed-precision floating-point configurations that reduce many variables to lower precision while preserving required accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a method for automatically tuning the precision of floating-point variables in numerical programs by allowing user-defined exponent and significand sizes. It implements this in the PROMISE tool, which pairs systematic search over precision choices with numerical validation to produce variants that meet user-specified accuracy targets. The approach is tested on linear solvers and Rodinia benchmark programs, where results indicate that a substantial share of variables can use reduced precision without violating accuracy. This matters because lower precision can cut memory use, execution time, and energy consumption in applications that currently default to double. The work frames standard double as frequently over-provisioned for the accuracy actually needed.

Core claim

The central claim is that customized floating-point formats combined with numerical validation and systematic search generate valid mixed-precision program variants, and evaluation on numerical codes shows that a substantial proportion of variables can be safely lowered in precision while accuracy requirements remain satisfied.

What carries the argument

The PROMISE precision autotuning tool, which combines numerical validation with systematic search over user-defined exponent and significand sizes to produce accuracy-compliant program variants.

If this is right

  • Standard double precision is often over-provisioned for the accuracy demands of the tested linear solvers and benchmark applications.
  • Mixed-precision configurations can be derived automatically for a range of numerical programs to reduce resource use while respecting accuracy constraints.
  • Customized exponent and significand sizes enable exploration of both emerging low-precision formats and non-standard configurations in a single framework.
  • Containerized parallel benchmarking allows the search process to scale across multiple algorithms and parameter sets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If such autotuning were integrated into compilers, many existing double-precision codes could be rewritten with lower memory footprints without manual intervention.
  • The results raise the question of whether hardware designers should prioritize support for a wider set of custom precisions rather than fixed formats.
  • Similar search-plus-validation loops could be applied to other resource-accuracy trade-offs, such as choosing between different numerical algorithms.

Load-bearing premise

The combination of numerical validation and systematic search correctly identifies precision settings that meet accuracy requirements without missing errors introduced by the emulation or testing process.

What would settle it

Take a program variant produced by the tool, execute it on a fresh set of inputs or a different architecture, and observe whether accuracy falls below the user-specified threshold.

Figures

Figures reproduced from arXiv: 2606.08339 by Fabienne J\'ez\'equel, Thibault Hilaire, Xinye Chen.

Figure 2
Figure 2. Figure 2: PROMISE workflow for multiple preci￾sions (the top-to-bottom arrow indicates delta debug￾ging execution). must retain higher precision, enabling precision reduction elsewhere. Overall, PROMISE automates the derivation of mixed-precision mappings that would be difficult to obtain through manual analysis. 2.3 Customized precision with FloatX and its extension to mathematical functions Most hardware implement… view at source ↗
Figure 3
Figure 3. Figure 3: Examples of emulated and customized floating-point formats in FloatX. [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Instrumented Program and precision Instantiation [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Customized precision file and compiling file. [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 5
Figure 5. Figure 5: In the promise.yml file, compile defines the command to compile the source code with CADNA support and optional OpenMP multithreading capabilities, run specifies the executable, files lists source files (default: all .cc files), log designates the log file (optional), and output sets the directory for transformed code. The fp.json file allows users to define custom floating-point precisions by specifying t… view at source ↗
Figure 6
Figure 6. Figure 6: Overview of the benchmarking workflow. Fine-grained tasks are executed through either a [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Precision configurations for Backprop with varying requested accuracies. [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Precision configurations for Hotspot with varying requested accuracies. [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Precision configurations for Particle Filter with varying requested accuracies. [PITH_FULL_IMAGE:figures/full_fig_p012_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Precision configurations for SRAD with varying requested accuracies. [PITH_FULL_IMAGE:figures/full_fig_p013_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Precision configurations for LU factorization (dense and sparse) under varying requested [PITH_FULL_IMAGE:figures/full_fig_p014_11.png] view at source ↗
read the original abstract

Reduced-precision arithmetic offers significant opportunities to improve performance, memory usage, and energy efficiency in numerical applications, provided that numerical accuracy is preserved. This work investigates automated precision tuning through customized floating-point formats with user-defined exponent and significand sizes, enabling the emulation of emerging low-precision formats and the exploration of non-standard precision configurations within a unified mixed-precision framework. The proposed methodology, implemented in the PROMISE precision autotuning tool, combines numerical validation with a systematic search to generate program variants that satisfy user-defined accuracy requirements. To address the computational cost of this exploration, a containerized benchmarking framework supports parallel execution across multiple algorithms and parameter configurations. The approach is evaluated on a suite of numerical programs, including linear solvers and applications from the Rodinia benchmark. Results show that a substantial proportion of variables can be safely reduced to lower precision while preserving accuracy, indicating that standard double precision is often over-provisioned. These findings highlight the potential of automated precision tuning to derive efficient mixed-precision configurations tailored to application-specific accuracy requirements.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript presents the PROMISE precision autotuning tool for floating-point applications. It supports customized floating-point formats with user-defined exponent and significand bit widths to emulate low-precision formats within a mixed-precision framework. The tool combines numerical validation against user-specified accuracy requirements with a systematic search over precision configurations; a containerized parallel benchmarking framework is used to manage the computational cost of the exploration. Evaluation on linear solvers and Rodinia benchmarks indicates that a substantial proportion of variables can be reduced to lower precision while meeting accuracy targets, suggesting that double precision is frequently over-provisioned.

Significance. If the empirical validation is robust, the work demonstrates a practical automated method for deriving application-specific mixed-precision configurations that exploit custom exponent/significand formats. This directly addresses performance, memory, and energy opportunities in numerical computing while providing a unified framework that includes both standard and non-standard precisions. The containerized benchmarking component is a concrete engineering contribution that enables reproducible parallel evaluation across multiple programs and configurations.

major comments (1)
  1. [Abstract] Abstract and overall description: the central claim that 'a substantial proportion of variables can be safely reduced to lower precision while preserving accuracy' rests on the combination of numerical validation and systematic search, yet no details are supplied on the accuracy metric used, the search algorithm itself, or how emulation errors are bounded. This information is load-bearing for verifying that the reported configurations satisfy requirements without undetected violations.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback. The single major comment highlights a valid gap in the abstract's description of key methodological components. We will revise the manuscript to address this.

read point-by-point responses
  1. Referee: [Abstract] Abstract and overall description: the central claim that 'a substantial proportion of variables can be safely reduced to lower precision while preserving accuracy' rests on the combination of numerical validation and systematic search, yet no details are supplied on the accuracy metric used, the search algorithm itself, or how emulation errors are bounded. This information is load-bearing for verifying that the reported configurations satisfy requirements without undetected violations.

    Authors: We agree that the abstract does not provide sufficient detail on these load-bearing elements. The full manuscript describes the accuracy metric (user-specified relative or absolute error tolerances validated against a reference implementation), the search algorithm (a systematic enumeration combined with pruning based on numerical validation), and emulation error bounding (via interval arithmetic and cross-validation against higher-precision runs) in Sections 3 and 4. However, these details are not summarized at the abstract level. In the revised manuscript we will expand the abstract with a concise description of the accuracy metric, search procedure, and error-control approach so that the central claim can be evaluated directly from the abstract. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical search+validation is independent of inputs

full rationale

The paper describes an engineering tool (PROMISE) that performs systematic search over custom floating-point formats combined with numerical validation against user-specified accuracy requirements. No derivation chain reduces a claimed result to its own fitted parameters by construction, no self-citation is load-bearing for a uniqueness theorem, and no ansatz or renaming is presented as a first-principles prediction. The central empirical finding (many variables safely reduced) is produced by external benchmark execution and accuracy checks that are not tautological with the search procedure itself. This is the normal case of a self-contained experimental paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides insufficient detail to populate the ledger with specific free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5711 in / 1014 out tokens · 24266 ms · 2026-06-27T18:42:44.007529+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

27 extracted references · 23 canonical work pages

  1. [1]

    InIEEE International Symposium on Workload Characterization

    Rodinia: A benchmark suite for heterogeneous computing. InIEEE International Symposium on Workload Characterization. 44–54. doi:10.1109/IISWC.2009.5306797 Stefano Cherubin and Giovanni Agosta

  2. [2]

    ACM Comput

    Tools for Reduced Precision Computation: A Survey. ACM Comput. Surv.53, 2 (April 2020), 35 pages. doi:10.1145/3381039 Wei-Fan Chiang, Mark Baranowski, Ian Briggs, Alexey Solovyev, Ganesh Gopalakrishnan, and Zvonimir Rakamari´c

  3. [3]

    InProceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL ’17)

    Rigorous floating-point mixed-precision tuning. InProceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL ’17). ACM, USA, 300–315. doi:10.1145/3009837.3009846 Timothy A. Davis and Yifan Hu

  4. [4]

    The University of Florida sparse matrix collection.ACM Trans. Math. Software38, 1 (2011), 1–25. doi:10.1145/2049662.2049663 P. Eberhart, J. Brajard, P. Fortin, and F. Jézéquel

  5. [5]

    Quentin Ferro, Stef Graillat, Thibault Hilaire, and Fabienne Jézéquel

    High Performance Numerical Validation using Stochastic Arithmetic.Reliable Computing21 (2015), 35–52. Quentin Ferro, Stef Graillat, Thibault Hilaire, and Fabienne Jézéquel

  6. [6]

    In2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)

    Performance of precision auto- tuned neural networks. In2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC). 592–599. doi:10.1109/MCSoC60832.2023.00092 Goran Flegar, Florian Scheidegger, Vedran Novakovi´c, Giovani Mariani, Andrés E. Tomás, A. Cristiano I. Malossi, and Enrique S. Quintana-Ortí

  7. [7]

    FloatX: A C++ Library for Customized Floating-Point Arithmetic.ACM Trans. Math. Softw.45, 4, Article 40 (Dec. 2019), 23 pages. doi: 10.1145/ 3368086 Laurent Fousse, Guillaume Hanrot, Vincent Lefèvre, Patrick Pélissier, and Paul Zimmermann

  8. [8]

    Algorithm 866: IFISS, a Matlab toolbox for modelling incompressible flow

    MPFR: A multiple-precision binary floating-point library with correct rounding. 33, 2 (2007), 13–es. doi:10.1145/1236463.1236468 Gene H. Golub and Charles F. Van Loan. 2013.Matrix Computations(4 ed.). Johns Hopkins University Press, Baltimore. Stef Graillat, Fabienne Jézéquel, Romain Picot, François Févotte, and Bruno Lathuilière

  9. [9]

    doi:10.1016/j.jocs.2019.07.004 Hui Guo and Cindy Rubio-González

    Auto- tuning for floating-point precision with Discrete Stochastic Arithmetic.Journal of Computational Science36 (2019), 101017. doi:10.1016/j.jocs.2019.07.004 Hui Guo and Cindy Rubio-González

  10. [10]

    InProceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2018)

    Exploiting community structure for floating-point precision tuning. InProceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2018). ACM, USA, 333–343. doi:10.1145/3213846.3213862 Nicholas J. Higham and Theo Mary

  11. [11]

    doi:10.1017/S0962492922000022 IEEE Computer Society

    Mixed precision algorithms in numerical linear algebra.Acta Numerica31 (2022), 347–414. doi:10.1017/S0962492922000022 IEEE Computer Society

  12. [12]

    Kamath, A

    IEEE Standard for Floating-Point Arithmetic.IEEE Std 754-2019(2019). doi:10.1109/IEEESTD.2019.8766229 F. Jézéquel and J.-M. Chesneaux

  13. [13]

    Computer Physics Communications178, 12 (2008), 933–955

    CADNA: A Library for Estimating Round-Off Error Propagation. Computer Physics Communications178, 12 (2008), 933–955. doi: 10.1016/j.cpc.2008.02. 013 16 Fabienne Jézéquel, Sara sadat Hoseininasab, and Thibault Hilaire

  14. [14]

    arXiv:2412.19322 [cs.CE]https://arxiv.org/abs/2412.19322 Michael O Lam and Jeffrey K Hollingsworth

    Mixed-precision numerics in scientific applications: survey and perspectives. arXiv:2412.19322 [cs.CE]https://arxiv.org/abs/2412.19322 Michael O Lam and Jeffrey K Hollingsworth

  15. [15]

    Fine-grained floating-point precision analysis.Int. J. High Perform. Comput. Appl.32, 2 (2018), 231–245. doi:10.1177/1094342016652462 Michael O. Lam, Tristan Vanderbruggen, Harshitha Menon, and Markus Schordan

  16. [16]

    In2019 IEEE/ACM 3rd International Workshop on Software Correctness for HPC Applications (Correctness)

    Tool Integration for Source-Level Mixed Precision. In2019 IEEE/ACM 3rd International Workshop on Software Correctness for HPC Applications (Correctness). 27–35. doi: 10.1109/Correctness49594. 2019.00009 Harshitha Menon, Michael O. Lam, Daniel Osei-Kuffuor, Markus Schordan, Scott Lloyd, Kathryn Mohror, and Jeffrey Hittinger

  17. [17]

    Lebeck, K

    ADAPT: Algorithmic Differentiation Applied to Floating-Point Precision Tuning. InProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC). 48:1–48:13. doi:10.1109/SC.2018.00051 Konstantinos Parasyris, Ignacio Laguna, Harshitha Menon, Markus Schordan, Daniel Osei-Kuffuor, Giorgis Georgakoudis, Micha...

  18. [18]

    InIEEE International Symposium on Workload Characterization

    HPC-MixPBench: An HPC Benchmark Suite for Mixed-Precision Analysis. InIEEE International Symposium on Workload Characterization. 25–36. doi:10.1109/IISWC50251.2020.00012 Cindy Rubio-González, Cuong Nguyen, Hong Diep Nguyen, James Demmel, William Kahan, Koushik Sen, David H. Bailey, Costin Iancu, and David Hough

  19. [19]

    InProceedings of the International Conference on High Performance Computing, Net- working, Storage and Analysis (SC ’13)

    Precimonious: tuning assistant for floating- point precision. InProceedings of the International Conference on High Performance Computing, Net- working, Storage and Analysis (SC ’13). ACM, USA, 12 pages. doi:10.1145/2503210.2503296 Cindy Rubio-González, Cuong Nguyen, Benjamin Mehne, Koushik Sen, James Demmel, William Kahan, Costin Iancu, Wim Lavrijsen, Da...

  20. [20]

    InIEEE/ACM 38th International Conference on Software Engineering

    Floating-Point Precision Tuning Using Blame Analysis. InIEEE/ACM 38th International Conference on Software Engineering. 1074–1085. doi:10.1145/2884781.2884850 Garima Singh, Baidyanath Kundu, Harshitha Menon, Alexander Penev, David J. Lange, and Vassil Vassilev

  21. [21]

    InProceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS)

    Fast and Automatic Floating-Point Error Analysis with CHEF-FP. InProceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS). 1018–1028. doi:10.1109/IPDPS54959.2023.00105 Alexey Solovyev, Marek S. Baranowski, Ian Briggs, Charles Jacobsen, Zvonimir Rakamaric, and Ganesh Gopalakrishnan

  22. [22]

    doi:10.1145/3230733 Jackson Vanover, Alper Altuntas, and Cindy Rubio-González

    Rigorous Estimation of Floating-Point Round-Off Errors with Symbolic Taylor Expansions.ACM Transactions on Programming Languages and Systems41, 1 (2019), 2:1–2:39. doi:10.1145/3230733 Jackson Vanover, Alper Altuntas, and Cindy Rubio-González

  23. [23]

    1109/SCW63240.2024.00026 Jean Vignes

  24. [24]

    doi:10.1016/0378-4754(93)90003-D Jean Vignes

    A stochastic arithmetic for reliable scientific computation.Mathematics and Computers in Simulation35, 3 (1993), 233–261. doi:10.1016/0378-4754(93)90003-D Jean Vignes

  25. [25]

    doi:10.1023/B:NUMA.0000049483.75679.ce 17 Yutong Wang and Cindy Rubio-González

    Discrete Stochastic Arithmetic for Validating Results of Numerical Software.Numer- ical Algorithms37, 1 (2004), 377–390. doi:10.1023/B:NUMA.0000049483.75679.ce 17 Yutong Wang and Cindy Rubio-González

  26. [26]

    In2024 IEEE/ACM 46th International Conference on Software Engineering (ICSE)

    Predicting Performance and Accuracy of Mixed- Precision Programs for Precision Tuning. In2024 IEEE/ACM 46th International Conference on Software Engineering (ICSE). 152–164. doi:10.1145/3597503.3623338 A. Zeller and R. Hildebrandt

  27. [27]

    doi:10.1109/32.988498 18

    Simplifying and isolating failure-inducing input.IEEE Transactions on Software Engineering28, 2 (2002), 183–200. doi:10.1109/32.988498 18