Derives non-asymptotic 2-norm and infinity-norm error bounds for deterministic and stochastic variants of OPTQ and Qronos PTQ algorithms.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3representative citing papers
Waterfilling rate allocation makes quantized matrix multiplication for LLMs near information-theoretically optimal, with WaterSIC being basis-free and within 0.25 bits per entry of the limit.
High-rate quantization theory yields accurate approximations for the distortion of absmax INT and FP schemes in generic weight-plus-activation matrix multiplication.
citing papers explorer
-
Provable Post-Training Quantization: Theoretical Analysis of OPTQ and Qronos
Derives non-asymptotic 2-norm and infinity-norm error bounds for deterministic and stochastic variants of OPTQ and Qronos PTQ algorithms.
-
High-Rate Quantized Matrix Multiplication II
Waterfilling rate allocation makes quantized matrix multiplication for LLMs near information-theoretically optimal, with WaterSIC being basis-free and within 0.25 bits per entry of the limit.
-
High-Rate Quantized Matrix Multiplication I
High-rate quantization theory yields accurate approximations for the distortion of absmax INT and FP schemes in generic weight-plus-activation matrix multiplication.