arxiv: 2603.04956 · v2 · pith:4PPORGAUnew · submitted 2026-03-05 · 💻 cs.LG · cs.IT · math.IT

WaterSIC: Information-Theoretically (Near) Optimal Linear Layer Quantization

classification 💻 cs.LG · cs.IT · math.IT
keywords watersic · quantization · algorithm · bits · different · layer · limit · linear

This paper considers the problem of converting a given dense linear layer to low precision. The tradeoff between compressed length and output discrepancy is analyzed information-theoretically (IT). It is shown that the popular GPTQ algorithm may have an arbitrarily large gap to the IT limit. To alleviate this problem, a novel algorithm, termed "WaterSIC", is proposed and is shown to be within a rate gap of 0.255 bits of the IT limit, uniformly over all possible covariance matrices of input activations. The key innovation of WaterSIC is to allocate different quantization rates to different columns (in-features) of the weight matrix, mimicking the classical IT solution known as "waterfilling". Applying WaterSIC to the Llama and Qwen families of LLMs establishes new state-of-the-art performance for all quantization rates from 1 to 4 bits. Our code is available at https://github.com/egorlifar/watersic.
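
For readers unfamiliar with the "waterfilling" idea the abstract refers to, the following is a minimal sketch of the classical textbook reverse-waterfilling rate allocation for independent Gaussian components under mean-squared error. It is not the WaterSIC algorithm and is not taken from the authors' repository; the function name `reverse_waterfill` and the example variances are hypothetical, chosen only for illustration. Each component with variance above a common "water level" theta receives 0.5 * log2(variance / theta) bits, and components at or below the level receive zero bits.

```python
import math


def reverse_waterfill(variances, total_rate_bits, iters=200):
    """Classical reverse waterfilling (illustrative only, not WaterSIC).

    Given per-component variances of independent Gaussian components and a
    total rate budget in bits, return (theta, rates), where theta is the
    water level and rates[i] = max(0, 0.5 * log2(variances[i] / theta)).
    """
    if total_rate_bits < 0:
        raise ValueError("total_rate_bits must be non-negative")
    if not variances or any(v <= 0 for v in variances):
        raise ValueError("variances must be a non-empty list of positive numbers")

    def rates_at(theta):
        return [max(0.0, 0.5 * math.log2(v / theta)) for v in variances]

    # The total rate decreases as theta grows and is zero at theta = max variance,
    # so bisection on theta finds the level that meets the budget.
    lo, hi = 0.0, max(variances)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(rates_at(mid)) > total_rate_bits:
            lo = mid  # too many bits spent: raise the water level
        else:
            hi = mid  # within budget: try a lower water level
    return hi, rates_at(hi)


# Hypothetical example: three components, 2 bits in total.
theta, rates = reverse_waterfill([4.0, 1.0, 0.25], total_rate_bits=2.0)
```

In this example the water level settles at 0.5, so the highest-variance component gets 1.5 bits, the middle one 0.5 bits, and the lowest-variance component (0.25, below the water level) gets none. This uneven split is the behavior that a uniform per-column bit width cannot reproduce.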

This paper has not been read by Pith yet.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. High-Rate Quantized Matrix Multiplication II

    cs.LG 2026-05 unverdicted novelty 6.0

    Waterfilling rate allocation makes quantized matrix multiplication for LLMs near information-theoretically optimal, with WaterSIC being basis-free and within 0.25 bits per entry of the limit.