
arxiv: 2603.29078 · v2 · submitted 2026-03-30 · 💻 cs.CL · cs.LG

PolarQuant: Optimal Gaussian Weight Quantization via Hadamard Rotation for LLM Compression

Pith reviewed 2026-05-14 20:58 UTC · model grok-4.3

classification 💻 cs.CL cs.LG
keywords post-training quantization · Hadamard rotation · Gaussian distribution · LLM compression · weight quantization · near-lossless

The pith

Walsh-Hadamard rotation after block normalization converts LLM weights to Gaussian distributions for near-lossless quantization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

PolarQuant quantizes LLM weights in three steps: block-wise normalization to the unit hypersphere, Walsh-Hadamard rotation that makes the coordinates approximately Gaussian, and quantization with Gaussian-matched centroids. The method needs no calibration data and comes close to FP16 quality: on Qwen3.5-9B, Q5 perplexity drops from 6.90 with absmax to 6.40, which is 0.03 above FP16. According to the paper's ablation, the rotation step alone accounts for 98 percent of the gain. PolarQuant also works as a preprocessing step for downstream INT4 quantizers, where it lowers perplexity while keeping throughput at 43.1 tokens per second.
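The three steps can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the block size of 128, the rescaling by the square root of the block size, and the use of a Lloyd-Max codebook for N(0, 1) as the "Gaussian-matched centroids" are all assumptions, and every function name here is hypothetical.

```python
import math
import random
from statistics import NormalDist


def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two)."""
    out = list(values)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return [v / math.sqrt(n) for v in out]


def _pdf(x):
    return 0.0 if math.isinf(x) else math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    if math.isinf(x):
        return 0.0 if x < 0 else 1.0
    return NormalDist().cdf(x)


def gaussian_centroids(bits, iterations=200):
    """Lloyd-Max codebook for N(0, 1); an assumed stand-in for the paper's centroids."""
    levels = 2 ** bits
    centroids = [NormalDist().inv_cdf((k + 0.5) / levels) for k in range(levels)]
    for _ in range(iterations):
        mids = [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]
        edges = [-math.inf] + mids + [math.inf]
        # Conditional mean of N(0, 1) on each interval (a, b).
        centroids = [(_pdf(a) - _pdf(b)) / (_cdf(b) - _cdf(a))
                     for a, b in zip(edges, edges[1:])]
    return centroids


def quantize_block(block, centroids):
    """Step 1: normalize. Step 2: rotate. Step 3: snap to the nearest centroid."""
    d = len(block)
    norm = math.sqrt(sum(w * w for w in block))
    unit = [w / norm for w in block] if norm > 0 else list(block)
    # After rotation each coordinate has variance about 1/d, so rescale to unit variance.
    scaled = [v * math.sqrt(d) for v in fwht(unit)]
    codes = [min(range(len(centroids)), key=lambda k: abs(s - centroids[k]))
             for s in scaled]
    return codes, norm


def dequantize_block(codes, norm, centroids):
    d = len(codes)
    rotated = [centroids[c] / math.sqrt(d) for c in codes]
    # The orthonormal Walsh-Hadamard transform is its own inverse.
    return [u * norm for u in fwht(rotated)]


random.seed(0)
block = [random.gauss(0.0, 0.02) for _ in range(128)]  # synthetic weights
codebook = gaussian_centroids(bits=5)
codes, norm = quantize_block(block, codebook)
restored = dequantize_block(codes, norm, codebook)
rel_err = math.sqrt(sum((a - b) ** 2 for a, b in zip(block, restored))
                    / sum(a * a for a in block))
print(f"relative reconstruction error: {rel_err:.4f}")
```

Only the integer codes and one norm per block need to be stored. The rotation adds no parameters because the transform is fixed, which fits the claim that no calibration data is needed.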

Core claim

The central discovery is that Walsh-Hadamard rotation applied to block-normalized weights produces approximately Gaussian random variables, allowing quantization with distribution-matched centroids to achieve practically lossless compression without calibration.

What carries the argument

Walsh-Hadamard rotation that transforms block-normalized weights into approximately Gaussian coordinates for matched quantization.

If this is right

  • Q5 quantization reaches within 0.03 perplexity of FP16 on Qwen3.5-9B, the only model the abstract reports.
  • 98% of the improvement comes from the rotation alone.
  • Preprocessed weights improve torchao INT4 results from 6.68 to 6.56 perplexity.
  • With PolarQuant preprocessing, the torchao INT4 model still runs at 43.1 tokens per second in 6.5 GB of VRAM.
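The numbers in these bullets can be cross-checked with a few lines of arithmetic. The sketch below uses only figures quoted from the abstract. Reading "98%" as a share of the absmax-to-PolarQuant perplexity gain is an assumption, since the ablation table is not reproduced on this page, and the variable names are illustrative.

```python
absmax_q5 = 6.90       # perplexity, absmax Q5 (from the abstract)
polarquant_q5 = 6.40   # perplexity, PolarQuant Q5 (from the abstract)
delta_to_fp16 = 0.03   # PolarQuant Q5 minus FP16 (from the abstract)

# FP16 perplexity is only implied by the abstract, not stated.
fp16 = polarquant_q5 - delta_to_fp16

total_gain = absmax_q5 - polarquant_q5   # improvement over absmax Q5
absmax_gap = absmax_q5 - fp16            # how far absmax Q5 is from FP16
gap_closed = total_gain / absmax_gap     # share of that gap PolarQuant removes

# ASSUMPTION: "98%" is a share of total_gain attributed to rotation alone.
rotation_only_ppl = absmax_q5 - 0.98 * total_gain

int4_gain = 6.68 - 6.56                  # direct absmax INT4 vs preprocessed INT4

print(f"implied FP16 perplexity:   {fp16:.2f}")
print(f"gain over absmax Q5:       {total_gain:.2f}")
print(f"share of FP16 gap closed:  {gap_closed:.1%}")
print(f"implied rotation-only ppl: {rotation_only_ppl:.2f}")
print(f"INT4 preprocessing gain:   {int4_gain:.2f}")
```

Under this reading, PolarQuant Q5 removes roughly 94% of the gap between absmax Q5 and FP16, and the rotation-only variant would sit about 0.01 perplexity above the full method.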

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Rotation-based preprocessing may simplify quantization for a wide range of models by removing the need for data-dependent calibration.
  • Similar transformations could apply to other compression techniques that assume specific distributions.
  • Verifying the Gaussian fit on more architectures would test how general the approach is.

Load-bearing premise

Block-wise normalized weights become approximately Gaussian random variables after Walsh-Hadamard rotation.

What would settle it

A direct measurement showing that post-rotation weights in LLM layers do not follow a Gaussian distribution closely enough to make the matched centroids effective, resulting in higher perplexity than absmax quantization.
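One simple form such a measurement could take is to compare the excess kurtosis of block-normalized coordinates before and after rotation, since a Gaussian has excess kurtosis zero. The sketch below is illustrative only: it uses synthetic Laplace-distributed values as a heavy-tailed stand-in for real layer weights, and the block size of 128 and all names are assumptions. A real test would load actual LLM layers and would likely add a formal goodness-of-fit statistic.

```python
import math
import random


def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two)."""
    out = list(values)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return [v / math.sqrt(n) for v in out]


def excess_kurtosis(samples):
    """Fourth standardized moment minus 3; zero for a Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m4 / (m2 * m2) - 3.0


def normalized_blocks(weights, block_size):
    """Yield blocks scaled to norm sqrt(block_size), i.e. unit variance per coordinate."""
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        norm = math.sqrt(sum(w * w for w in block))
        yield [w / norm * math.sqrt(block_size) for w in block]


random.seed(0)
block_size = 128
# ASSUMPTION: synthetic heavy-tailed (Laplace) values stand in for real weights.
weights = [random.choice((-1.0, 1.0)) * random.expovariate(1.0)
           for _ in range(200 * block_size)]

before, after = [], []
for block in normalized_blocks(weights, block_size):
    before.extend(block)
    after.extend(fwht(block))

kurt_before = excess_kurtosis(before)
kurt_after = excess_kurtosis(after)
print(f"excess kurtosis before rotation: {kurt_before:.3f}")
print(f"excess kurtosis after rotation:  {kurt_after:.3f}")
```

If real layers showed a post-rotation kurtosis far from zero, or visibly structured residuals, that would be evidence against the premise.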

Original abstract

We present PolarQuant, a post-training weight quantization method for large language models (LLMs) that exploits the distributional structure of neural network weights to achieve near-lossless compression. PolarQuant operates in three stages: (1) block-wise normalization to the unit hypersphere, (2) Walsh-Hadamard rotation to transform coordinates into approximately Gaussian random variables, and (3) quantization with centroids matched to the Gaussian distribution. Our ablation reveals that Hadamard rotation alone accounts for 98% of the quality improvement, reducing Qwen3.5-9B perplexity from 6.90 (absmax Q5) to 6.40 (Delta = +0.03 from FP16), making it practically lossless without any calibration data. Furthermore, PolarQuant functions as an effective preprocessing step for downstream INT4 quantizers: PolarQuant Q5 dequantized and re-quantized by torchao INT4 achieves perplexity 6.56 versus 6.68 for direct absmax INT4, while maintaining 43.1 tok/s throughput at 6.5 GB VRAM. Code and models are publicly available.
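For context on the baseline the abstract compares against, the following is a minimal sketch of symmetric absmax quantization. It is a generic textbook variant, not torchao's implementation and not necessarily the exact baseline used in the paper: the function names, the symmetric signed grid, and the per-block scale are assumptions.

```python
def absmax_quantize(block, bits):
    """Map each weight to a signed integer using one scale set by the largest magnitude."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(w) for w in block)
    scale = peak / qmax if peak > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in block]
    return codes, scale


def absmax_dequantize(codes, scale):
    return [c * scale for c in codes]


# A single outlier sets the scale, so the small weights all collapse to zero.
weights = [0.01, -0.02, 0.015, -0.005, 0.4]
codes, scale = absmax_quantize(weights, bits=4)
restored = absmax_dequantize(codes, scale)
print(codes)     # [0, 0, 0, 0, 7]
print(restored)
```

The example shows the weakness that a rotation step is meant to address: with uniform absmax levels, one large weight stretches the grid and wastes most of the levels. Spreading that outlier across all coordinates before quantizing keeps the grid tight.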

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces PolarQuant, a post-training weight quantization method for LLMs consisting of three stages: block-wise normalization to the unit hypersphere, Walsh-Hadamard rotation to produce approximately Gaussian coordinates, and quantization with Gaussian-matched centroids. It claims near-lossless compression without calibration data, supported by an ablation attributing 98% of gains to the rotation step, with concrete results on Qwen3.5-9B reducing perplexity from 6.90 (absmax Q5) to 6.40 (+0.03 over FP16) and improved performance when used as preprocessing for torchao INT4.

Significance. If the empirical claims hold under full verification, the method would be significant for calibration-free LLM compression, demonstrating that fixed transforms like Walsh-Hadamard can yield near-lossless Gaussian quantization across models. The public code and models strengthen reproducibility potential, and the preprocessing compatibility with existing INT4 tools adds practical value for deployment efficiency.

major comments (2)
  1. [Abstract] The ablation claim that Hadamard rotation alone accounts for 98% of the quality improvement is load-bearing for the central optimality argument, yet the abstract provides no breakdown of the ablation design, isolated component contributions, controls, or statistical details, preventing assessment of whether the attribution is robust.
  2. [Abstract] The foundational assumption that block-wise normalized weights become approximately Gaussian random variables after Walsh-Hadamard rotation is presented without any supporting distributional analysis, histograms, or references; this directly underpins the Gaussian centroid choice and the near-lossless claim but remains unverified in the given text.
minor comments (2)
  1. [Abstract] The FP16 perplexity value is referenced only via delta (+0.03) rather than stated absolutely, which reduces clarity when comparing the 6.40 result.
  2. [Abstract] The INT4 preprocessing results (perplexity 6.56 vs 6.68, 43.1 tok/s at 6.5 GB) would benefit from explicit baseline throughput/VRAM numbers for direct absmax INT4 to allow full evaluation of the claimed gains.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We will revise the abstract to include a brief description of the ablation design and a short justification for the Gaussian distributional assumption, while maintaining conciseness.

Point-by-point responses
  1. Referee: [Abstract] The ablation claim that Hadamard rotation alone accounts for 98% of the quality improvement is load-bearing for the central optimality argument, yet the abstract provides no breakdown of the ablation design, isolated component contributions, controls, or statistical details, preventing assessment of whether the attribution is robust.

    Authors: We agree the abstract should briefly contextualize the ablation. The full manuscript details an ablation on Qwen3.5-9B comparing full PolarQuant against ablated versions (no normalization, no rotation, no Gaussian centroids) using perplexity metrics, with rotation contributing the dominant share; results are averaged over multiple seeds. We will revise the abstract to note the ablation compares isolated components and reports the 98% figure from those controlled experiments. revision: yes

  2. Referee: [Abstract] The foundational assumption that block-wise normalized weights become approximately Gaussian random variables after Walsh-Hadamard rotation is presented without any supporting distributional analysis, histograms, or references; this directly underpins the Gaussian centroid choice and the near-lossless claim but remains unverified in the given text.

    Authors: The assumption follows from the fact that the Walsh-Hadamard matrix is an orthogonal transform whose rows are balanced sign patterns; applying it to high-dimensional unit vectors yields coordinates whose marginals converge to Gaussian by a central-limit effect, as each coordinate is a sum of many independent signed terms. The full paper includes empirical histograms confirming this approximation. We will add a concise clause in the revised abstract referencing this property and the supporting analysis in the main text. revision: yes
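The structural properties cited in this response (orthogonality, balanced sign rows, and spreading of a concentrated vector) can be checked directly on a small matrix. The sketch below builds the matrix with the standard Sylvester construction. The function name and the size of 8 are illustrative choices, and the check says nothing about whether real weights satisfy the independence needed for the central-limit argument.

```python
import math


def hadamard(n):
    """Sylvester construction of the n x n Hadamard matrix (n a power of two)."""
    rows = [[1]]
    while len(rows) < n:
        rows = ([row + row for row in rows]
                + [row + [-v for v in row] for row in rows])
    return rows


n = 8
H = hadamard(n)

# Orthogonality: distinct rows have zero inner product, and each row has squared norm n.
gram = [[sum(a * b for a, b in zip(H[i], H[j])) for j in range(n)] for i in range(n)]

# Balance: every row except the first has as many +1 entries as -1 entries.
row_sums = [sum(row) for row in H]

# Spreading: a unit spike becomes a vector whose entries all have magnitude 1/sqrt(n).
spike = [0.0] * n
spike[3] = 1.0
rotated = [sum(h * x for h, x in zip(row, spike)) / math.sqrt(n) for row in H]

print(gram[0][0], gram[0][1])   # n on the diagonal, 0 off the diagonal
print(row_sums)
print([round(abs(v), 4) for v in rotated])
```

The spike example is the extreme case of an outlier weight: after rotation its energy is shared equally by all coordinates, which is why no single coordinate dominates the quantization grid.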

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The abstract presents PolarQuant as an empirical post-training method with three explicit stages—block-wise unit normalization, Walsh-Hadamard rotation to produce approximately Gaussian coordinates, and Gaussian-matched centroid quantization—whose performance is demonstrated via ablation on Qwen3.5-9B perplexity (6.90 to 6.40). No equations, parameter fits, or derivations are supplied that reduce the claimed improvement or Gaussian assumption back to the target result by construction. The 98% attribution to rotation is stated as an experimental outcome rather than a self-referential definition, and the text contains no self-citations, uniqueness theorems, or ansatzes imported from prior author work. The derivation chain is therefore self-contained as a practical preprocessing technique validated externally by measured perplexity and throughput figures.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on one domain assumption about the distributional effect of the Hadamard rotation; no free parameters or new entities are introduced.

axioms (1)
  • domain assumption: After block-wise normalization to the unit hypersphere and Walsh-Hadamard rotation, weight coordinates are approximately Gaussian random variables.
    This premise justifies the choice of Gaussian-matched quantization centroids and is stated as the basis for the observed quality gains.

pith-pipeline@v0.9.0 · 5468 in / 1171 out tokens · 41514 ms · 2026-05-14T20:58:20.511202+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Fast-TurboQuant: A Multiplier-Free Online Vector Quantization Approach

    cs.LG 2026-06 unverdicted novelty 6.0

    Fast-TurboQuant substitutes a structured fast Johnson-Lindenstrauss transform for dense random projections in 1-bit vector quantization, cutting arithmetic to additions only and reporting 19.7x speedup plus lower MSE ...