Direction-Preserving Number Representations
Pith reviewed 2026-05-11 02:16 UTC · model grok-4.3
The pith
Standard low-precision number formats are suboptimal for preserving vector directions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper introduces a geometric framework for analyzing the directional coverage of product-structured codes, in which every element of a vector is drawn from a common finite alphabet. It analytically quantifies the suboptimality gap between these codes and spherical codes for the vector as a whole, in both low and asymptotically high dimensions. Within the product-code class, it further proves that the standard two's complement, fixed-point, and floating-point formats are suboptimal, again with a quantified gap, which points to room for new scalar number formats.
What carries the argument
The geometric directional-coverage metric for product-structured codes, in which every vector element is drawn from the same finite scalar alphabet, compared against full spherical codes.
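The metric above can be made concrete with a small numerical sketch. The Python below enumerates the directions reachable by a product code over a shared alphabet and estimates, by random sampling, the largest angle between a unit vector and its nearest code direction. The function names, the Monte Carlo estimator, and the example alphabets are illustrative assumptions, not the paper's definitions or code; sampling can only give a lower estimate of the true worst case.

```python
import itertools
import math
import random


def unit(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)


def product_code_directions(alphabet, n):
    """Distinct directions of all nonzero vectors in alphabet^n.

    Every coordinate is drawn from the same alphabet (a product code).
    Coordinates are rounded so that parallel vectors collapse to one entry.
    """
    directions = set()
    for v in itertools.product(alphabet, repeat=n):
        if any(v):  # skip the zero vector, which has no direction
            directions.add(tuple(round(x, 12) for x in unit(v)))
    return sorted(directions)


def estimated_covering_angle(directions, n, samples=2000, seed=0):
    """Monte Carlo lower estimate (radians) of the worst-case angle between
    a unit vector and its nearest code direction."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        x = unit([rng.gauss(0.0, 1.0) for _ in range(n)])
        best_cos = max(sum(a * b for a, b in zip(x, d)) for d in directions)
        best_cos = max(-1.0, min(1.0, best_cos))  # guard against rounding
        worst = max(worst, math.acos(best_cos))
    return worst
```

For the hypothetical ternary alphabet (-1, 0, 1) in two dimensions the code directions sit every 45 degrees, so the estimate approaches pi/8 from below as the sample count grows.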
Load-bearing premise
That the geometric directional coverage metric accurately predicts performance gains in actual machine-learning workloads, which may also depend on magnitude preservation and the specific arithmetic operations used.
What would settle it
Train or evaluate a neural-network model on a standard benchmark while swapping a conventional low-precision format for one of the paper's optimized alphabets and measure whether accuracy rises in proportion to the reported directional-coverage improvement.
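A much cheaper proxy for that experiment, short of training a network, is to quantize random blocks with each alphabet and compare how well direction survives. The sketch below does this in Python with per-block absolute-maximum scaling and nearest-value rounding. The E2M1 value list, the block size of 16, the Gaussian inputs, and every function name are assumptions made for illustration; this page does not specify the paper's experimental setup, and a proxy like this cannot by itself settle the question about model accuracy.

```python
import math
import random

# Assumed E2M1 value set (from common FP4 descriptions, not from this page).
E2M1 = [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0,
        0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
# Hypothetical baseline alphabet used only for comparison.
TERNARY = [-1.0, 0.0, 1.0]


def quantize_block(block, alphabet):
    """Scale so the block's peak maps to the alphabet's largest magnitude,
    round every entry to the nearest alphabet value, then scale back."""
    peak = max(abs(x) for x in block)
    if peak == 0.0:
        return [0.0] * len(block)
    scale = peak / max(abs(a) for a in alphabet)
    return [scale * min(alphabet, key=lambda a: abs(a - x / scale))
            for x in block]


def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_cosine(alphabet, block_size=16, trials=500, seed=0):
    """Average cosine similarity between Gaussian blocks and their
    quantized versions; closer to 1 means direction is better preserved."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        block = [rng.gauss(0.0, 1.0) for _ in range(block_size)]
        total += cosine(block, quantize_block(block, alphabet))
    return total / trials
```

Calling `mean_cosine(E2M1)` and `mean_cosine(TERNARY)` with the same seed gives a like-for-like comparison; an optimized alphabet from the paper could be passed in the same way.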
Original abstract
Low-precision number formats are widely used in modern machine learning systems due to their efficiency. Accurate direction representation is key to the accuracy of vector operations. This work precisely explores the extent to which the direction of a vector can be represented by selecting its scalar elements from a common finite alphabet of a given size. This is standard practice in machine learning, where low-precision significands may be narrow-width floating-point or integer values. A geometric framework is introduced for analyzing the directional coverage of such product-structured codes. This work analytically quantifies the suboptimality gap between such product-structured codes and spherical codes for the vector as a whole, in both low and asymptotically high dimensions. Furthermore, within the product code class, it is proven that the standard formats of two's complement, fixed-point, and floating-point are suboptimal, again with quantified gap, pointing to the potential to develop new scalar number formats. Such scalar alphabets are numerically optimized across multiple block dimensions for directional coverage, including the dimension used in NVIDIA's NVFP4 format. Experimental results are presented comparing the performance of standard formats and the optimized alphabet. We find that for four bits, NVIDIA's choice of E2M1 closely approximates the optimized alphabet, providing a geometric explanation for its strong performance in low-precision machine learning workloads and an analytical understanding of the link between that superiority and block size. We provide open-source formal proofs in Lean for the theorems in this work, along with the experimental code and the optimized alphabets obtained.
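The abstract singles out E2M1 among four-bit formats. As a rough illustration of what that alphabet contains, the Python sketch below decodes all 16 four-bit codes, assuming the usual layout of one sign bit, two exponent bits with bias 1, and one mantissa bit, with subnormals at exponent zero. That layout comes from common descriptions of FP4 E2M1 rather than from this page, and the function names are hypothetical.

```python
def decode_e2m1(code):
    """Decode a 4-bit code laid out as sign | exponent (2 bits) | mantissa."""
    if not 0 <= code <= 0b1111:
        raise ValueError("code must fit in 4 bits")
    sign = -1.0 if code & 0b1000 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 0b1
    if exponent == 0:
        # Subnormal range: no implicit leading one, magnitudes 0 and 0.5.
        magnitude = 0.5 * mantissa
    else:
        # Normal range: implicit leading one, exponent bias of 1.
        magnitude = (1.0 + 0.5 * mantissa) * 2.0 ** (exponent - 1)
    return sign * magnitude


def e2m1_alphabet():
    """Sorted distinct real values; +0 and -0 collapse to a single zero."""
    return sorted({decode_e2m1(code) for code in range(16)})
```

Because the two zero encodings coincide as real numbers, the 16 codes yield 15 distinct values, with non-uniform spacing that widens toward the largest magnitude.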
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a geometric framework for directional coverage of product-structured codes, where each vector component is drawn from a shared finite alphabet. It analytically quantifies the gap between these codes and optimal spherical codes in low and high dimensions, proves that two's complement, fixed-point, and floating-point alphabets are strictly suboptimal within the product class (with quantified gaps), numerically optimizes alphabets for several block dimensions (including the one used in NVIDIA's NVFP4 format), and provides supporting experiments plus machine-checked Lean proofs.
Significance. If the directional-coverage metric is predictive of ML workload accuracy, the results could guide design of improved low-precision formats. The machine-checked Lean proofs, open experimental code, and optimized alphabets are clear strengths that enhance verifiability. The geometric explanation for E2M1's performance is a useful contribution to understanding existing formats.
major comments (1)
- [§4.2, Theorem 3] The suboptimality claim for floating-point alphabets is proven only inside the fixed shared-alphabet product class; the quantified gap may change if the more general class of per-dimension alphabets is considered, which is common in mixed-precision practice.
minor comments (3)
- [Figure 4] The error bars on the directional-coverage plots are difficult to distinguish; adding a table of numerical values would improve readability.
- [§5.4] The high-dimensional asymptotic result is stated clearly, but a short remark on the dimension at which the gap approaches its limit would help readers assess practical relevance.
- [§6] The experimental section compares only to a subset of standard formats; including bfloat16 or other common low-precision baselines would make the performance claims more comprehensive.
Simulated Author's Rebuttal
We thank the referee for their careful reading of the manuscript and for highlighting the scope of Theorem 3. We address the comment below and have incorporated a clarification in the revised version.
- Referee: [§4.2, Theorem 3] The suboptimality claim for floating-point alphabets is proven only inside the fixed shared-alphabet product class; the quantified gap may change if the more general class of per-dimension alphabets is considered, which is common in mixed-precision practice.
  Authors: We agree that Theorem 3 establishes the suboptimality of standard formats (including floating-point alphabets) strictly within the shared-alphabet product-structured code class defined in the paper. This class models the common hardware practice of applying a single low-precision format uniformly across vector dimensions. The quantified gaps are therefore specific to this setting. Allowing independent alphabets per dimension would indeed define a broader class, potentially altering the gaps, but such per-component optimization is not the focus of the current work and would require a substantially different analysis. We have added a clarifying sentence in §4.2 to make the scope of the theorem explicit and to note that extensions to per-dimension alphabets remain future work.
  Revision: yes
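The distinction between a shared alphabet and per-dimension alphabets is easy to see in two dimensions, where the covering angle is half the largest angular gap between neighbouring code directions. The Python sketch below computes that quantity exactly for a pair of alphabets. The function name and the hand-picked alphabets are hypothetical and are not taken from the paper; the example only shows that the two classes can differ, not how large the paper's gaps would become.

```python
import math


def covering_angle_2d(alphabet_x, alphabet_y):
    """Worst-case angle (radians) from any direction in the plane to the
    nearest direction of a nonzero codeword (x, y), with x drawn from
    alphabet_x and y drawn from alphabet_y."""
    two_pi = 2.0 * math.pi
    angles = sorted({
        round(math.atan2(y, x) % two_pi, 12)
        for x in alphabet_x
        for y in alphabet_y
        if x or y  # the zero vector has no direction
    })
    if not angles:
        raise ValueError("no nonzero codewords")
    # Gaps between consecutive directions, plus the wrap-around gap.
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + two_pi - angles[-1])
    return max(gaps) / 2.0


# Hypothetical four-value alphabets chosen by hand for illustration.
SHARED = (-3, -1, 1, 3)
PER_DIM_Y = (-5, -0.6, 0.6, 5)

shared_angle = covering_angle_2d(SHARED, SHARED)
mixed_angle = covering_angle_2d(SHARED, PER_DIM_Y)
```

In this sketch the mixed pair ends up with a smaller covering angle than the shared pair, because using one alphabet on both axes makes (1, 1) and (3, 3) collapse onto the same direction.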
Circularity Check
No significant circularity detected
The paper's core results consist of a geometric definition of directional coverage for product-structured codes, analytic quantification of the gap to spherical codes in low and high dimensions, and formal Lean proofs establishing strict suboptimality of standard formats (two's complement, fixed-point, floating-point) within the product-code class. These theorems operate internally on the defined metric and are machine-checked, with open experimental code for numerical optimization of alphabets and external benchmarking against spherical codes. The connection to ML workload performance is presented only as empirical motivation and observation, not as a premise or fitted input for the proofs. No load-bearing step reduces by construction to a self-citation, fitted parameter renamed as prediction, or ansatz smuggled via prior work; the derivation chain is self-contained against the stated geometric definitions and external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- optimized scalar alphabet values = numerically optimized per block dimension
axioms (2)
- domain assumption: Vector direction is represented via a product of scalar elements chosen independently from a finite alphabet
- domain assumption: Spherical codes provide the optimal directional coverage benchmark for comparison
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Theorem 2 (Sign-count bound). ... $F_n(A) \ge \arccos\bigl(\min\{1,\ 2\sqrt{m(A)/H_n}\}\bigr)$
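To get a feel for the bound in Theorem 2, the Python sketch below evaluates it numerically. It reads the flattened expression as arccos(min{1, 2*sqrt(m(A)/H_n)}), which is an interpretation of the extracted text, and it treats m(A) and H_n as plain numbers supplied by the caller, since their definitions are not given on this page. The function names are hypothetical.

```python
import math


def sign_count_bound(m, h_n):
    """Lower bound (radians) on F_n(A), under the assumed reading
    arccos(min(1, 2 * sqrt(m / h_n))) of the extracted formula."""
    if m < 0 or h_n <= 0:
        raise ValueError("expected m >= 0 and h_n > 0")
    return math.acos(min(1.0, 2.0 * math.sqrt(m / h_n)))


def bound_is_vacuous(m, h_n):
    """The bound collapses to zero once 2 * sqrt(m / h_n) reaches 1,
    that is, once 4 * m >= h_n."""
    return 4 * m >= h_n
```

Under this reading the bound says something nontrivial only when H_n exceeds 4 m(A); for smaller H_n the arccos argument saturates at 1 and the bound is zero.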
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Appendix A: Two-dimensional classification
Following the classification in the main paper, this appendix provides an elementary but complete characterization of the two-dimensional case, as this is the first dimension at which spherical and product codes diverge, and can be ea...
For the second factor, because $u^{(n)}$ is nonincreasing and $I_j$ has length $m_j$, we have $d_j \le \sum_{i=1}^{m_j} u^{(n)}_i \le 2\sqrt{m_j/H_n}$ by Lemma B2. Therefore $d_j^2/m_j \le 4/H_n$ for $1 \le j \le s$, and hence $\sum_{j=1}^{s} d_j^2/m_j \le 4s/H_n \le 4r/H_n$. Combining the preceding inequalities gives $\langle u^{(n)}, z \rangle^2 \le \|z\|_2^2 \cdot 4r/H_n$, which is equivalent to the stated bound. Finally, using the preceding lemmas, we prove the...
The alphabet $A_{n,m}$ from Lemma D5 has $2m + 1 = 2^b - 1$ real values. Adjoin one additional scalar value not already in $A_{n,m}$ to obtain an alphabet $\tilde{A}_{n,m}$ with exactly $2^b$ values. Since $A_{n,m} \subset \tilde{A}_{n,m}$, enlarging the alphabet cannot increase the covering radius, so $\alpha_n(\tilde{A}_{n,m}) \ge \alpha_n(A_{n,m})$. Taking the supremum over all $2^b$-element alphabets gives $\cos w_{n,2^b} \ge \alpha_n(\tilde{A}_{n,m}) \ge (2\sqrt{m} - o(1))/\sqrt{H_n}$. T...