FLoPS: Semantics, Operations, and Properties of P3109 Floating-Point Representations in Lean

Jay P Lim; Santosh Nagarakatte; Sehyeok Park; Tung-Che Chang

arxiv: 2602.15965 · v3 · pith:UR32IWKMnew · submitted 2026-02-17 · 💻 cs.MS

Pith reviewed 2026-05-21 11:57 UTC · model grok-4.3

keywords: P3109 · low-precision floating point · formal verification · Lean · saturation arithmetic · stochastic rounding · numerical algorithms · machine learning hardware

The pith

A Lean formalization of the P3109 standard shows FastTwoSum computes exact overflow errors under saturation for any rounding mode.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds FLoPS, a machine-checked model in Lean that encodes the full parametric definition of the upcoming P3109 low-precision floating-point standard, including bitwidth, precision, signedness, domain, stochastic rounding, and saturation arithmetic. This addresses the verification challenges created by the combinatorial variety of formats, some with only one bit of precision, that will appear in future machine learning hardware. Using the model, the authors prove a new property for the FastTwoSum algorithm and show that prior properties for ExtractScalar no longer hold in the one-bit case. A reader should care because the formal model supplies a reliable foundation for checking numerical code that will run on these nonstandard formats.

Core claim

The authors present FLoPS as a comprehensive formal model in Lean of the P3109 standard. The model captures the semantics and operations for all parametric variations and is used to verify foundational properties. It establishes that FastTwoSum computes an exact overflow error under saturation arithmetic regardless of the chosen rounding mode, while properties previously shown for ExtractScalar fail when precision drops to one bit.
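The FastTwoSum claim can be illustrated on a small scale. The sketch below is not the FLoPS model and not a P3109 encoding: it assumes a hypothetical toy format (3-bit significand, exponents from -2 to 3, no NaN or infinities), and the names `round_sat`, `fast_two_sum`, and `overflow_counterexamples` are invented for illustration. It clamps out-of-range results to the largest finite value and then checks, by brute force, whether the error term equals a + b - s whenever the sum overflows.

```python
from bisect import bisect_left, bisect_right
from fractions import Fraction

# Toy format (an assumption, not a P3109 encoding): values are +/- m * 2**e
# with 0 <= m < 2**P and EMIN <= e <= EMAX.
P, EMIN, EMAX = 3, -2, 3

MAGNITUDES = {Fraction(m) * Fraction(2) ** e
              for m in range(2 ** P) for e in range(EMIN, EMAX + 1)}
VALUES = sorted(MAGNITUDES | {-v for v in MAGNITUDES})
MAX = VALUES[-1]  # largest finite value, 7 * 2**3 = 56


def round_sat(x, mode):
    """Round exact x into VALUES; out-of-range results clamp to +/-MAX."""
    x = Fraction(x)
    lo = VALUES[max(bisect_right(VALUES, x) - 1, 0)]           # neighbour below
    hi = VALUES[min(bisect_left(VALUES, x), len(VALUES) - 1)]  # neighbour above
    if mode == "down":
        return lo
    if mode == "up":
        return hi
    # "nearest": ties go to lo, a simplification made for this sketch
    return lo if x - lo <= hi - x else hi


def fast_two_sum(a, b, mode):
    """FastTwoSum, assuming |a| >= |b|; returns (s, t)."""
    s = round_sat(a + b, mode)
    z = round_sat(s - a, mode)
    t = round_sat(b - z, mode)
    return s, t


def overflow_counterexamples():
    """All (a, b, mode) where a + b overflows but t != a + b - s."""
    found = []
    for mode in ("down", "up", "nearest"):
        for a in VALUES:
            for b in VALUES:
                if abs(a) >= abs(b) and abs(a + b) > MAX:
                    s, t = fast_two_sum(a, b, mode)
                    if t != a + b - s:
                        found.append((a, b, mode))
    return found


s, t = fast_two_sum(Fraction(48), Fraction(40), "nearest")
print(s, t)                        # 56 32: s clamps to MAX, t recovers 88 - 56
print(overflow_counterexamples())  # [] for this toy format
```

An empty list here is only evidence about one toy format under three rounding modes. The paper's result is a machine-checked proof over the P3109 definitions, which this sketch does not reproduce.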

What carries the argument

The FLoPS Lean formalization, which encodes the parametric P3109 framework together with its novel saturation and stochastic-rounding operations and serves as a machine-checked specification for algorithm analysis.
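For readers unfamiliar with stochastic rounding, a minimal sketch of one textbook form follows. It rounds to the integer grid rather than to a floating-point format, and the function names and the choice of a fixed number of random bits are assumptions made for illustration. This summary does not state how P3109 or FLoPS define the operation, so the sketch should not be read as their definition.

```python
import math
from fractions import Fraction


def stochastic_round(x, r, n_bits):
    """Round x to a neighbouring integer using a random draw r.

    r is assumed uniform on [0, 2**n_bits). The result is the upper
    neighbour when the fraction of x, truncated to n_bits, plus r
    carries past 2**n_bits; otherwise it is the lower neighbour.
    """
    x = Fraction(x)
    lo = math.floor(x)
    frac = x - lo                              # exact, in [0, 1)
    threshold = math.floor(frac * 2 ** n_bits)  # fraction truncated to n_bits
    return lo + 1 if r + threshold >= 2 ** n_bits else lo


def mean_over_draws(x, n_bits):
    """Average result over every possible draw r."""
    draws = [stochastic_round(x, r, n_bits) for r in range(2 ** n_bits)]
    return Fraction(sum(draws), len(draws))


print([stochastic_round(Fraction(9, 4), r, 2) for r in range(4)])  # [2, 2, 2, 3]
print(mean_over_draws(Fraction(9, 4), 2))                          # 9/4
```

The average equals the input only when the fraction of x is a multiple of 2**-n_bits. With fewer random bits than the fraction needs, the truncation introduces a small bias, which is one reason a formal definition of this operation has to pin down such details.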

If this is right

Machine-checked proofs become available for other numerical routines that will be deployed on P3109 hardware.
Designers must re-examine algorithms whose correctness was previously established only for IEEE-754 or higher-precision formats.
Verified implementations of low-precision arithmetic can be built directly against the formal model.
Future extensions of the model can check additional algorithms across the full space of P3109 parameters.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Hardware teams could use the same formal model to generate test suites that exercise every parametric corner case.
The breakdown of ExtractScalar properties at one-bit precision suggests similar re-verification will be needed for other classic floating-point kernels.
Connecting the model to hardware description languages would allow end-to-end verification from algorithm to circuit.

Load-bearing premise

The Lean code accurately captures every semantic detail, operation, and parametric combination in the P3109 standard, including stochastic rounding and saturation.

What would settle it

A concrete counterexample in any P3109 format with saturation where FastTwoSum fails to return the exact overflow error under a supported rounding mode would refute the claimed property.

Original abstract

The upcoming IEEE-P3109 standard for low-precision floating-point arithmetic can become the foundation of future machine learning hardware and software. Unlike IEEE-754, P3109 introduces a parametric framework defined by bitwidth, precision, signedness, and domain. This flexibility results in a vast combinatorial space of formats -- some with as little as one bit of precision -- alongside novel features such as stochastic rounding and saturation arithmetic. These deviations create a unique verification gap that this paper intends to address. This paper presents FLoPS, Formalization in Lean of the P3109 Standard, which is a comprehensive formal model of P3109 in Lean. Our work serves as a rigorous, machine-checked specification that facilitates deep analysis of the standard. We demonstrate the model's utility by verifying foundational properties and analyzing key algorithms within the P3109 context. Specifically, we reveal that FastTwoSum exhibits a novel property of computing exact "overflow error" under saturation using any rounding mode, whereas previously established properties of the ExtractScalar algorithm fail for formats with one bit of precision. This work provides a verified foundation for reasoning about P3109 and enables formal verification of future numerical software. Our Lean development is open source and publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Lean formalization of P3109 with machine-checked properties for FastTwoSum under saturation and ExtractScalar at one-bit precision, but model fidelity to the draft is the key unverified piece.

The main takeaway is that this paper builds a Lean model of the P3109 parametric floating-point standard and uses it to prove two concrete things: FastTwoSum computes an exact overflow error under saturation for any rounding mode, and the standard properties of ExtractScalar break down when precision drops to one bit. Both results are new relative to prior floating-point verification work and come with machine-checked proofs. The development is also open source, which is the right call for something meant to serve as a foundation for others. That part is solid and worth having on record.

The soft spot is exactly what the stress-test note flags. The claimed novelties depend on the Lean definitions of saturation clamping, overflow detection, stochastic rounding, and the encoding of the one-bit signed and unsigned cases. If those definitions diverge from the P3109 draft text in any material way, the positive result for FastTwoSum and the negative result for ExtractScalar become properties of the model rather than of the standard. The abstract gives no details on how they validated the model against the specification or on coverage of edge cases, so independent assessment is limited right now.

This work is aimed at people who do formal verification of numerical code or who need a trusted starting point for analyzing low-precision formats in ML hardware and libraries. A reader who wants to extend the model or check algorithms inside P3109 will get direct value. It deserves a serious referee because the formal artifact exists and the claims are in principle checkable against the code and the draft. Send it out, but ask the authors to supply a clear mapping from their Lean definitions to the P3109 text and any cross-checks they performed.

Referee Report

2 major / 2 minor

Summary. The paper presents FLoPS, a comprehensive formal model in Lean of the IEEE P3109 standard for low-precision floating-point arithmetic. The model covers parametric formats defined by bitwidth, precision, signedness, and domain, including novel features such as stochastic rounding and saturation arithmetic. It verifies foundational properties and analyzes key algorithms, showing that FastTwoSum computes exact 'overflow error' under saturation using any rounding mode, while previously established properties of the ExtractScalar algorithm fail for formats with one bit of precision. The Lean development is open source.

Significance. If the formalization is faithful to the P3109 draft, this work provides a machine-checked specification that can facilitate formal verification of numerical software for low-precision formats used in machine learning hardware. The strengths include the use of machine-checked proofs in Lean and the public availability of the code, which supports reproducibility. The analysis of algorithm behaviors in extreme cases like 1-bit precision and saturation arithmetic offers insights that could inform the design and implementation of P3109-compliant systems.

major comments (2)

[Abstract and Section 4] The novel property of FastTwoSum computing exact overflow error under saturation for any rounding mode relies on the definitions of saturation arithmetic and overflow detection in the Lean model. However, the manuscript does not include a direct comparison or validation against the textual P3109 specification for how saturation clamps results in edge cases, which is load-bearing for claiming this as a property of the standard rather than the model.
[Section 3] The claim that ExtractScalar properties fail for one-bit precision formats is interesting, but the paper should specify which exact properties from prior work are being tested and provide the counterexamples or failed proof attempts in the formalization to allow readers to assess the scope of the failure.

minor comments (2)

[Abstract] The abstract mentions 'comprehensive formal model' but provides no details on coverage of all edge cases, model completeness, or the specific Lean commit hash used for the proofs, which would aid independent verification.
[Throughout] Notation for parametric formats (bitwidth, precision, etc.) could be clarified with a table summarizing the possible combinations and how they map to Lean types.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful review and constructive comments on our paper. We address each of the major comments in detail below, and we plan to incorporate revisions to strengthen the manuscript.

Referee: [Abstract and Section 4] The novel property of FastTwoSum computing exact overflow error under saturation for any rounding mode relies on the definitions of saturation arithmetic and overflow detection in the Lean model. However, the manuscript does not include a direct comparison or validation against the textual P3109 specification for how saturation clamps results in edge cases, which is load-bearing for claiming this as a property of the standard rather than the model.

Authors: We acknowledge the importance of explicitly linking our formal definitions to the P3109 textual specification. While our model was developed based on the draft standard, the current manuscript does not provide a direct side-by-side comparison for saturation clamping in edge cases. In the revised version, we will include an additional paragraph or subsection that quotes the relevant sections from the P3109 draft on saturation arithmetic and demonstrates how our Lean definitions align with them, thereby supporting the claim that the property holds of the standard rather than only of the model.

revision: yes

Referee: [Section 3] The claim that ExtractScalar properties fail for one-bit precision formats is interesting, but the paper should specify which exact properties from prior work are being tested and provide the counterexamples or failed proof attempts in the formalization to allow readers to assess the scope of the failure.

Authors: We agree that greater specificity would benefit the reader. The properties tested are those previously proven for ExtractScalar in the context of standard floating-point formats, specifically the exact reconstruction property and related error bounds from the referenced prior work. In the revision, we will explicitly state which properties are being examined and provide the counterexamples for one-bit precision, including the specific input values that cause the failure and references to the corresponding Lean lemmas or proof attempts in the open-source repository.

revision: yes

Circularity Check

0 steps flagged

No significant circularity: direct formalization with machine-checked proofs from model definitions

The paper defines a comprehensive Lean model of the P3109 parametric floating-point framework (bitwidth, precision, signedness, domain, stochastic rounding, saturation) and then verifies properties such as FastTwoSum's exact overflow-error behavior under saturation for any rounding mode, plus the failure of ExtractScalar properties at 1-bit precision. These results are obtained by direct proof inside the formal model; no parameters are fitted to data, no predictions are statistically forced by construction, and no central claim reduces to a self-citation chain or imported uniqueness theorem. The work is self-contained as a machine-checked specification, with all derivations following from the explicitly stated Lean definitions rather than circular reductions. External fidelity to the textual P3109 draft is a separate correctness concern, not a circularity issue.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on encoding the P3109 parametric framework into Lean's dependent type theory; no free parameters are introduced and no new entities are postulated.

axioms (1)

domain assumption: The semantics of P3109 representations defined by bitwidth, precision, signedness, and domain, including stochastic rounding and saturation arithmetic.
The model is built directly on the definitions and operations specified in the P3109 standard.

pith-pipeline@v0.9.0 · 5757 in / 1188 out tokens · 45882 ms · 2026-05-21T11:57:46.575043+00:00 · methodology
