Inner Product Aware Quantization: Provably Fast, Accurate, and Adaptive Algorithms

Krish Singal; Nathan White

arxiv: 2606.00289 · v1 · pith:FPWZZSHYnew · submitted 2026-05-29 · 💻 cs.LG · cs.DS

Inner Product Aware Quantization: Provably Fast, Accurate, and Adaptive Algorithms

Nathan White , Krish Singal This is my paper

Pith reviewed 2026-06-28 22:47 UTC · model grok-4.3

classification 💻 cs.LG cs.DS

keywords quantizationinner productsadaptive stochastic quantizationvector compressionmachine learning algorithmsapproximate algorithms

0 comments

The pith

Quantization methods targeting inner product preservation admit faster algorithms than standard approaches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces objectives for quantization that aim to approximately preserve inner products between vectors rather than just minimizing reconstruction error. It establishes a connection between these objectives and adaptive stochastic quantization, then provides provably efficient exact and approximate algorithms for solving them. These theoretical advances translate into practical algorithms that work well on various data distributions and deliver 2-10 times speedup for standard ASQ tasks without loss in quality. Readers interested in efficient compression for machine learning workloads would care because inner products are central to many similarity-based computations.

Core claim

Formulating inner product aware quantization objectives leads to a tight connection with adaptive stochastic quantization, enabling the development of provably fast exact and approximate algorithms that inspire practical methods 2-10 times faster than prior state-of-the-art while maintaining quality.

What carries the argument

Inner product aware quantization objectives that approximately preserve inner products with worst-case and average-case inputs, connected to adaptive stochastic quantization.

If this is right

Provably fast exact and approximate algorithms exist for the inner product preservation objectives.
Practical algorithms perform well across a variety of workload distributions.
Standard ASQ can be accelerated by 2-10× using the inspired practical algorithms.
Adaptive quantization techniques become more efficient and tractable in practical settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

These methods might be combined with other compression techniques like pruning for further gains in neural network deployment.
Extending the worst-case and average-case analyses to specific application domains could yield tailored variants.
The unbiased property of the quantization methods may allow for better integration into stochastic gradient descent pipelines.

Load-bearing premise

The formulated objectives capture natural desiderata for approximately preserving inner products with worst-case and average-case inputs.

What would settle it

A benchmark where the proposed practical algorithms do not achieve 2-10 times speedup over prior methods on standard inner product workloads while keeping comparable accuracy would falsify the main practical claim.

Figures

Figures reproduced from arXiv: 2606.00289 by Krish Singal, Nathan White.

**Figure 2.** Figure 2: (a) Average variance VarSSQ[wi ] of the worst k coordinates on vectors w drawn from LogNormal(0,1) over 100 trials. (b) Average absolute inner product error on vectors w drawn from a mixture of 16 Gaussians with variance 10 and means separated by 105 . In both plots, the shaded regions show the 10–90% range of average variances We now briefly discuss the merits of unbiased and adaptive quantization methods… view at source ↗

**Figure 3.** Figure 3: Runtime of exact algorithms for ASQ on vectors [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Runtime comparison of approximate QUIVER and our implementation using faster exact subroutines on vectors w drawn from LogNormal(0, 1), with m = 100s uniformly spaced discretization points. The shaded regions show the 10–90% range of runtimes across draws of w. 9Due to the special structure of the calls approximate QUIVER makes to the exact algorithm, we use a slight variant of our fast exact algorithm; s… view at source ↗

**Figure 5.** Figure 5: Runtime of algorithms for MDV compared against Improved Approx QUIVER as a baseline (although it does not provide a good approximation to MDV), on vectors drawn from LogNormal(0, 1). The shaded regions show the 10–90% range of runtimes across draws of the input vector. In [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: Illustration of distribution D′ , showing the reduction of support of D to just two elements See [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗

**Figure 7.** Figure 7: Illustration of (a) case 1 and (b) case 2 [PITH_FULL_IMAGE:figures/full_fig_p026_7.png] view at source ↗

read the original abstract

Quantization is a fundamental tool used to compress datasets, neural network weights, and memory usage in a range of computational tasks. Many downstream applications of vector quantization perform inner products with arbitrary inputs. This motivates the study of inner product aware quantization schemes that approximately preserve inner products with unseen vectors -- in contrast to simply minimizing the mean-squared error. In this work, we formulate objectives that capture natural desiderata and develop adaptive and unbiased quantization methods that approximately preserve inner products with worst-case and average-case inputs. An analysis of these objectives shows a tight connection with the well-studied notion of Adaptive Stochastic Quantization (ASQ). We develop provably fast exact and approximate algorithms for our objectives. Our theoretical results inspire efficient practical algorithms that perform well across a variety of workload distributions. They also lead to practical algorithms for standard ASQ which are 2-10$\times$ faster than prior state-of-the-art methods while maintaining quality. These theoretical and empirical results contribute towards making adaptive quantization techniques more efficient and tractable in practical settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper formulates inner-product preserving quantization objectives tied to ASQ, gives provably fast algorithms, and claims 2-10x practical speedups for standard ASQ.

read the letter

The main thing here is that they define new objectives for quantization aimed at preserving inner products with unseen vectors instead of just minimizing MSE, develop adaptive unbiased methods for worst-case and average-case inputs, establish a tight link to Adaptive Stochastic Quantization, and supply exact and approximate algorithms that are both theoretically efficient and practically faster.

What stands out as new is the specific inner-product aware objectives and the way they leverage the ASQ connection to derive faster methods. The work does well by delivering both the theoretical analysis and the resulting practical algorithms that maintain quality across different workload distributions.

The soft spots are modest but worth noting. The abstract promises proofs of speed and correctness, so the value hinges on whether those derivations are clean and the constants reasonable; without the full details it is hard to judge if the bounds are tight enough for real use. The empirical speedups of 2-10x also need checking against the exact baselines and whether the tested distributions include actual neural network weights or stay mostly synthetic. The claim that the objectives capture natural desiderata is plausible but could be sensitive to how pessimistic the worst-case setting turns out to be.

This is for researchers and practitioners working on model compression, vector quantization, and similarity search in machine learning. Anyone implementing adaptive quantization in production or looking for faster ASQ variants would get direct value if the speed claims hold up.

It deserves a serious referee because the claims are concrete, build directly on prior ASQ literature, and are testable with code and experiments.

Referee Report

0 major / 3 minor

Summary. The manuscript formulates inner-product-aware quantization objectives that approximately preserve inner products with worst-case and average-case inputs (as opposed to MSE minimization), establishes a tight connection between these objectives and Adaptive Stochastic Quantization (ASQ), develops provably fast exact and approximate algorithms for the objectives, and derives practical algorithms that achieve 2-10× speedups over prior state-of-the-art methods for standard ASQ while preserving quality across varied workload distributions.

Significance. If the claimed theoretical guarantees and empirical speedups are substantiated, the work would meaningfully advance practical quantization for inner-product-heavy tasks in machine learning. The explicit connection to ASQ together with the provision of both exact/approximate algorithms and faster practical implementations for the standard ASQ setting constitute a clear strength; the emphasis on adaptive, unbiased methods with worst-case and average-case analysis is a useful framing.

minor comments (3)

[Abstract] The abstract states that the analysis 'shows a tight connection' to ASQ; a brief sentence clarifying whether this is an equivalence, a reduction, or an approximation under specific conditions would improve readability.
[Abstract] The claim of 'provably fast exact and approximate algorithms' would benefit from a one-sentence statement of the key complexity results (e.g., time or space bounds) already in the abstract or introduction.
Figure and table captions should explicitly state the workload distributions and baseline implementations used to obtain the reported 2-10× speedups so that the empirical claims are immediately verifiable.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the thorough summary and positive evaluation of the manuscript, including the recognition of the theoretical connections to ASQ and the practical speedups. The recommendation for minor revision is appreciated. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The provided abstract and description formulate inner-product-preserving objectives, note a connection to ASQ, and claim provably fast algorithms with empirical speedups. No equations, derivations, or self-referential steps are visible in the text. No load-bearing claim reduces by construction to a fit, self-definition, or self-citation chain. The argument structure (formulation to analysis to algorithms) appears self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no information on free parameters, axioms, or invented entities used in the derivations or objectives.

pith-pipeline@v0.9.1-grok · 5708 in / 835 out tokens · 19314 ms · 2026-06-28T22:47:43.858775+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

32 extracted references · 4 canonical work pages · 3 internal anchors

[1]

Qsgd: Communication-efficient sgd via gradient quantization and encoding

Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, and Milan Vojnovic. Qsgd: Communication-efficient sgd via gradient quantization and encoding. Advances in neural information processing systems , 30, 2017

2017
[2]

Geometric applications of a matrix searching algorithm

Alok Aggarwal, Maria Klawe, Shlomo Moran, Peter Shor, and Robert Wilber. Geometric applications of a matrix searching algorithm. In Proceedings of the second annual symposium on Computational geometry , pages 285--292, 1986

1986
[3]

Finding a minimum weight k-link path in graphs with monge property and applications

Alok Aggarwal, Baruch Schieber, and Takashi Tokuyama. Finding a minimum weight k-link path in graphs with monge property and applications. In Proceedings of the ninth annual symposium on Computational geometry , pages 189--197, 1993

1993
[4]

Optimal and approximate adaptive stochastic quantization

Ran Ben-Basat, Yaniv Ben-Itzhak, Michael Mitzenmacher, and Shay Vargaftik. Optimal and approximate adaptive stochastic quantization. Advances in Neural Information Processing Systems , 37:94265--94291, 2024

2024
[5]

Better than optimal: Improving adaptive stochastic quantization using shared randomness

Ran Ben Basat, Yaniv Ben-Itzhak, Michael Mitzenmacher, and Shay Vargaftik. Better than optimal: Improving adaptive stochastic quantization using shared randomness. Proceedings of the ACM on Measurement and Analysis of Computing Systems , 9(3):1--44, 2025

2025
[6]

Post training 4-bit quantization of convolutional networks for rapid-deployment

Ron Banner, Yury Nahshan, and Daniel Soudry. Post training 4-bit quantization of convolutional networks for rapid-deployment. Advances in neural information processing systems , 32, 2019

2019
[7]

Riley Carlson, John Bauer, and Christopher D. Manning. A new pair of gloves, 2025

2025
[8]

Neural gradients are near-lognormal: improved quantized and sparse training

Brian Chmiel, Liad Ben-Uri, Moran Shkolnik, Elad Hoffer, Ron Banner, and Daniel Soudry. Neural gradients are near-lognormal: improved quantized and sparse training. arXiv preprint arXiv:2006.08173 , 2020

work page arXiv 2006
[9]

Optq: Accurate quantization for generative pre-trained transformers

E Frantar, S Ashkboos, T Hoefler, and D Alistarh. Optq: Accurate quantization for generative pre-trained transformers. 2023. In URL https://openreview. net/forum , 2023

2023
[10]

Optimal algorithms for approximate clustering

Tom\' a s Feder and Daniel Greene. Optimal algorithms for approximate clustering. In Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing , STOC '88, page 434–444, New York, NY, USA, 1988. Association for Computing Machinery

1988
[11]

Don’t waste your bits! squeeze activations and gradients for deep neural networks via tinyscript

Fangcheng Fu, Yuzheng Hu, Yihan He, Jiawei Jiang, Yingxia Shao, Ce Zhang, and Bin Cui. Don’t waste your bits! squeeze activations and gradients for deep neural networks via tinyscript. In International Conference on Machine Learning , pages 3304--3314. PMLR, 2020

2020
[12]

Generalized selection and ranking: sorted matrices

Greg N Frederickson and Donald B Johnson. Generalized selection and ranking: sorted matrices. SIAM Journal on computing , 13(1):14--30, 1984

1984
[13]

Adaptive gradient quantization for data-parallel sgd

Fartash Faghri, Iman Tabrizian, Ilia Markov, Dan Alistarh, Daniel M Roy, and Ali Ramezani-Kebrya. Adaptive gradient quantization for data-parallel sgd. Advances in neural information processing systems , 33:3174--3185, 2020

2020
[14]

Practical and asymptotically optimal quantization of high-dimensional vectors in euclidean space for approximate nearest neighbor search

Jianyang Gao, Yutong Gou, Yuexuan Xu, Yongyi Yang, Cheng Long, and Raymond Chi-Wing Wong. Practical and asymptotically optimal quantization of high-dimensional vectors in euclidean space for approximate nearest neighbor search. Proceedings of the ACM on Management of Data , 3(3):1--26, 2025

2025
[15]

Optimized product quantization

Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. Optimized product quantization. IEEE transactions on pattern analysis and machine intelligence , 36(4):744--755, 2013

2013
[16]

Some simplified np-complete problems

Michael R Garey, David S Johnson, and Larry Stockmeyer. Some simplified np-complete problems. In Proceedings of the sixth annual ACM symposium on Theory of computing , pages 47--63, 1974

1974
[17]

Fast Exact k-Means, k-Medians and Bregman Divergence Clustering in 1D

Allan Gr nlund, Kasper Green Larsen, Alexander Mathiasen, Jesper Sindahl Nielsen, Stefan Schneider, and Mingzhou Song. Fast exact k-means, k-medians and bregman divergence clustering in 1d. arXiv preprint arXiv:1701.07204 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[18]

Clustering to minimize the maximum intercluster distance

Teofilo F Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical computer science , 38:293--306, 1985

1985
[19]

Horn and Charles R

Roger A. Horn and Charles R. Johnson. Matrix Analysis . Cambridge University Press, 2nd edition, 2012

2012
[20]

A frustratingly easy post-training quantization scheme for llms

Yongkweon Jeon, Chungman Lee, Kyungphil Park, and Ho-young Kim. A frustratingly easy post-training quantization scheme for llms. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , pages 14446--14461, 2023

2023
[21]

Selection in x+ y and matrices with sorted rows and columns

Andranik Mirzaian and Eshrat Arjomandi. Selection in x+ y and matrices with sorted rows and columns. Information processing letters , 20(1):13--17, 1985

1985
[22]

Mixed Precision Training

Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, et al. Mixed precision training. arXiv preprint arXiv:1710.03740 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[23]

A survey of product quantization

Yusuke Matsui, Yusuke Uchida, Herve Jegou, and Shin'ichi Satoh. A survey of product quantization. ITE Transactions on Media Technology and Applications , 6(1):2--10, 2018

2018
[24]

Up or down? adaptive rounding for post-training quantization

Markus Nagel, Rana Ali Amjad, Mart Van Baalen, Christos Louizos, and Tijmen Blankevoort. Up or down? adaptive rounding for post-training quantization. In International conference on machine learning , pages 7197--7206. PMLR, 2020

2020
[25]

Nuqsgd: Provably communication-efficient data-parallel sgd via nonuniform quantization

Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, and Daniel M Roy. Nuqsgd: Provably communication-efficient data-parallel sgd via nonuniform quantization. Journal of Machine Learning Research , 22(114):1--43, 2021

2021
[26]

Computing a minimum weightk-link path in graphs with the concave monge property

Baruch Schieber. Computing a minimum weightk-link path in graphs with the concave monge property. Journal of Algorithms , 29(2):204--222, 1998

1998
[27]

Flexgen: High-throughput generative inference of large language models with a single gpu

Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Beidi Chen, Percy Liang, Christopher R \'e , Ion Stoica, and Ce Zhang. Flexgen: High-throughput generative inference of large language models with a single gpu. In International Conference on Machine Learning , pages 31094--31116. PMLR, 2023

2023
[28]

Bayesian neural networks become heavier-tailed with depth

Mariia Vladimirova, Julyan Arbel, and Pablo Mesejo. Bayesian neural networks become heavier-tailed with depth. In NeurIPS 2018-Thirty-second Conference on Neural Information Processing Systems , pages 1--7, 2018

2018
[29]

The concave least-weight subsequence problem revisited

Robert Wilber. The concave least-weight subsequence problem revisited. Journal of Algorithms , 9(3):418--425, 1988

1988
[30]

Optimal quantization by matrix searching

Xiaolin Wu. Optimal quantization by matrix searching. Journal of algorithms , 12(4):663--673, 1991

1991
[31]

TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate

Amir Zandieh, Majid Daliri, Majid Hadian, and Vahab Mirrokni. Turboquant: Online vector quantization with near-optimal distortion rate. arXiv preprint arXiv:2504.19874 , 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[32]

Zipml: Training linear models with end-to-end low precision, and a little bit of deep learning

Hantian Zhang, Jerry Li, Kaan Kara, Dan Alistarh, Ji Liu, and Ce Zhang. Zipml: Training linear models with end-to-end low precision, and a little bit of deep learning. In International Conference on Machine Learning , pages 4035--4043. PMLR, 2017

2017

[1] [1]

Qsgd: Communication-efficient sgd via gradient quantization and encoding

Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, and Milan Vojnovic. Qsgd: Communication-efficient sgd via gradient quantization and encoding. Advances in neural information processing systems , 30, 2017

2017

[2] [2]

Geometric applications of a matrix searching algorithm

Alok Aggarwal, Maria Klawe, Shlomo Moran, Peter Shor, and Robert Wilber. Geometric applications of a matrix searching algorithm. In Proceedings of the second annual symposium on Computational geometry , pages 285--292, 1986

1986

[3] [3]

Finding a minimum weight k-link path in graphs with monge property and applications

Alok Aggarwal, Baruch Schieber, and Takashi Tokuyama. Finding a minimum weight k-link path in graphs with monge property and applications. In Proceedings of the ninth annual symposium on Computational geometry , pages 189--197, 1993

1993

[4] [4]

Optimal and approximate adaptive stochastic quantization

Ran Ben-Basat, Yaniv Ben-Itzhak, Michael Mitzenmacher, and Shay Vargaftik. Optimal and approximate adaptive stochastic quantization. Advances in Neural Information Processing Systems , 37:94265--94291, 2024

2024

[5] [5]

Better than optimal: Improving adaptive stochastic quantization using shared randomness

Ran Ben Basat, Yaniv Ben-Itzhak, Michael Mitzenmacher, and Shay Vargaftik. Better than optimal: Improving adaptive stochastic quantization using shared randomness. Proceedings of the ACM on Measurement and Analysis of Computing Systems , 9(3):1--44, 2025

2025

[6] [6]

Post training 4-bit quantization of convolutional networks for rapid-deployment

Ron Banner, Yury Nahshan, and Daniel Soudry. Post training 4-bit quantization of convolutional networks for rapid-deployment. Advances in neural information processing systems , 32, 2019

2019

[7] [7]

Riley Carlson, John Bauer, and Christopher D. Manning. A new pair of gloves, 2025

2025

[8] [8]

Neural gradients are near-lognormal: improved quantized and sparse training

Brian Chmiel, Liad Ben-Uri, Moran Shkolnik, Elad Hoffer, Ron Banner, and Daniel Soudry. Neural gradients are near-lognormal: improved quantized and sparse training. arXiv preprint arXiv:2006.08173 , 2020

work page arXiv 2006

[9] [9]

Optq: Accurate quantization for generative pre-trained transformers

E Frantar, S Ashkboos, T Hoefler, and D Alistarh. Optq: Accurate quantization for generative pre-trained transformers. 2023. In URL https://openreview. net/forum , 2023

2023

[10] [10]

Optimal algorithms for approximate clustering

Tom\' a s Feder and Daniel Greene. Optimal algorithms for approximate clustering. In Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing , STOC '88, page 434–444, New York, NY, USA, 1988. Association for Computing Machinery

1988

[11] [11]

Don’t waste your bits! squeeze activations and gradients for deep neural networks via tinyscript

Fangcheng Fu, Yuzheng Hu, Yihan He, Jiawei Jiang, Yingxia Shao, Ce Zhang, and Bin Cui. Don’t waste your bits! squeeze activations and gradients for deep neural networks via tinyscript. In International Conference on Machine Learning , pages 3304--3314. PMLR, 2020

2020

[12] [12]

Generalized selection and ranking: sorted matrices

Greg N Frederickson and Donald B Johnson. Generalized selection and ranking: sorted matrices. SIAM Journal on computing , 13(1):14--30, 1984

1984

[13] [13]

Adaptive gradient quantization for data-parallel sgd

Fartash Faghri, Iman Tabrizian, Ilia Markov, Dan Alistarh, Daniel M Roy, and Ali Ramezani-Kebrya. Adaptive gradient quantization for data-parallel sgd. Advances in neural information processing systems , 33:3174--3185, 2020

2020

[14] [14]

Practical and asymptotically optimal quantization of high-dimensional vectors in euclidean space for approximate nearest neighbor search

Jianyang Gao, Yutong Gou, Yuexuan Xu, Yongyi Yang, Cheng Long, and Raymond Chi-Wing Wong. Practical and asymptotically optimal quantization of high-dimensional vectors in euclidean space for approximate nearest neighbor search. Proceedings of the ACM on Management of Data , 3(3):1--26, 2025

2025

[15] [15]

Optimized product quantization

Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun. Optimized product quantization. IEEE transactions on pattern analysis and machine intelligence , 36(4):744--755, 2013

2013

[16] [16]

Some simplified np-complete problems

Michael R Garey, David S Johnson, and Larry Stockmeyer. Some simplified np-complete problems. In Proceedings of the sixth annual ACM symposium on Theory of computing , pages 47--63, 1974

1974

[17] [17]

Fast Exact k-Means, k-Medians and Bregman Divergence Clustering in 1D

Allan Gr nlund, Kasper Green Larsen, Alexander Mathiasen, Jesper Sindahl Nielsen, Stefan Schneider, and Mingzhou Song. Fast exact k-means, k-medians and bregman divergence clustering in 1d. arXiv preprint arXiv:1701.07204 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[18] [18]

Clustering to minimize the maximum intercluster distance

Teofilo F Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical computer science , 38:293--306, 1985

1985

[19] [19]

Horn and Charles R

Roger A. Horn and Charles R. Johnson. Matrix Analysis . Cambridge University Press, 2nd edition, 2012

2012

[20] [20]

A frustratingly easy post-training quantization scheme for llms

Yongkweon Jeon, Chungman Lee, Kyungphil Park, and Ho-young Kim. A frustratingly easy post-training quantization scheme for llms. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , pages 14446--14461, 2023

2023

[21] [21]

Selection in x+ y and matrices with sorted rows and columns

Andranik Mirzaian and Eshrat Arjomandi. Selection in x+ y and matrices with sorted rows and columns. Information processing letters , 20(1):13--17, 1985

1985

[22] [22]

Mixed Precision Training

Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, et al. Mixed precision training. arXiv preprint arXiv:1710.03740 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[23] [23]

A survey of product quantization

Yusuke Matsui, Yusuke Uchida, Herve Jegou, and Shin'ichi Satoh. A survey of product quantization. ITE Transactions on Media Technology and Applications , 6(1):2--10, 2018

2018

[24] [24]

Up or down? adaptive rounding for post-training quantization

Markus Nagel, Rana Ali Amjad, Mart Van Baalen, Christos Louizos, and Tijmen Blankevoort. Up or down? adaptive rounding for post-training quantization. In International conference on machine learning , pages 7197--7206. PMLR, 2020

2020

[25] [25]

Nuqsgd: Provably communication-efficient data-parallel sgd via nonuniform quantization

Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, and Daniel M Roy. Nuqsgd: Provably communication-efficient data-parallel sgd via nonuniform quantization. Journal of Machine Learning Research , 22(114):1--43, 2021

2021

[26] [26]

Computing a minimum weightk-link path in graphs with the concave monge property

Baruch Schieber. Computing a minimum weightk-link path in graphs with the concave monge property. Journal of Algorithms , 29(2):204--222, 1998

1998

[27] [27]

Flexgen: High-throughput generative inference of large language models with a single gpu

Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Beidi Chen, Percy Liang, Christopher R \'e , Ion Stoica, and Ce Zhang. Flexgen: High-throughput generative inference of large language models with a single gpu. In International Conference on Machine Learning , pages 31094--31116. PMLR, 2023

2023

[28] [28]

Bayesian neural networks become heavier-tailed with depth

Mariia Vladimirova, Julyan Arbel, and Pablo Mesejo. Bayesian neural networks become heavier-tailed with depth. In NeurIPS 2018-Thirty-second Conference on Neural Information Processing Systems , pages 1--7, 2018

2018

[29] [29]

The concave least-weight subsequence problem revisited

Robert Wilber. The concave least-weight subsequence problem revisited. Journal of Algorithms , 9(3):418--425, 1988

1988

[30] [30]

Optimal quantization by matrix searching

Xiaolin Wu. Optimal quantization by matrix searching. Journal of algorithms , 12(4):663--673, 1991

1991

[31] [31]

TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate

Amir Zandieh, Majid Daliri, Majid Hadian, and Vahab Mirrokni. Turboquant: Online vector quantization with near-optimal distortion rate. arXiv preprint arXiv:2504.19874 , 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[32] [32]

Zipml: Training linear models with end-to-end low precision, and a little bit of deep learning

Hantian Zhang, Jerry Li, Kaan Kara, Dan Alistarh, Ji Liu, and Ce Zhang. Zipml: Training linear models with end-to-end low precision, and a little bit of deep learning. In International Conference on Machine Learning , pages 4035--4043. PMLR, 2017

2017