WUSH: Near-Optimal Adaptive Transforms for LLM Quantization

Dan Alistarh; Jiale Chen; Roberto L. Castro; Torsten Hoefler; Vage Egiazarian

arxiv: 2512.00956 · v3 · pith:WS4AZZ2Dnew · submitted 2025-11-30 · 💻 cs.LG · cs.CL

Jiale Chen, Vage Egiazarian, Roberto L. Castro, Torsten Hoefler, Dan Alistarh

classification 💻 cs.LG cs.CL

keywords quantization · WUSH · efficient · Hadamard · near-optimal · quantizers · standard · transforms

Quantizing LLM weights and activations is a standard approach for efficient deployment, but a few extreme outliers can stretch the dynamic range and amplify low-bit quantization errors. Prior transform-based mitigations (e.g., Hadamard rotations) are fixed and data-agnostic, and their optimality for quantization has remained unclear. We derive closed-form optimal linear blockwise transforms for joint weight-activation quantization under standard RTN AbsMax-scaled block quantizers, covering both integer and floating-point formats. The resulting construction, WUSH, combines a Hadamard backbone with a data-dependent second-moment component to form a non-orthogonal transform that is provably near-optimal for FP and INT quantizers under mild assumptions while admitting an efficient fused GPU implementation. Empirically, WUSH improves W4A4 accuracy over the strongest Hadamard-based baselines (e.g., on Llama-3.1-8B-Instruct in MXFP4, it gains +2.8 average points with RTN and +0.7 with GPTQ) while delivering up to 5.8× per-layer throughput over BF16 via FP4 MatMul. Source code is available at https://github.com/IST-DASLab/WUSH.
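
To make the setting in the abstract concrete, the following is a minimal NumPy sketch of the baseline it describes: a round-to-nearest (RTN) block quantizer with one AbsMax scale per block, applied with and without a fixed Hadamard rotation. It is not the WUSH transform and is not taken from the authors' repository; the function names `hadamard` and `rtn_absmax_int`, the 4-bit symmetric integer grid, the block size of 32, and the synthetic data with a single outlier channel are all assumptions chosen for illustration.

```python
import numpy as np

def hadamard(n):
    """Orthonormal n x n Hadamard matrix via the Sylvester construction (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def rtn_absmax_int(x, bits=4, block=32):
    """Hypothetical helper: symmetric integer RTN with one AbsMax scale per block."""
    qmax = 2 ** (bits - 1) - 1                    # 7 for 4 bits
    blocks = x.reshape(-1, block)                 # each row is one quantization block
    scale = np.abs(blocks).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)      # avoid division by zero on all-zero blocks
    q = np.clip(np.round(blocks / scale), -qmax, qmax)
    return (q * scale).reshape(x.shape)           # dequantized values

# Synthetic activations (an assumption): Gaussian entries plus one outlier channel.
rng = np.random.default_rng(0)
block = 32
x = rng.normal(size=(1024, block))
x[:, 0] *= 50.0

H = hadamard(block)

# Quantize directly: the outlier sets the scale and the other entries lose resolution.
err_plain = np.mean((x - rtn_absmax_int(x)) ** 2)

# Rotate, quantize, rotate back: the outlier's energy is spread across the block.
x_rot = x @ H
err_rot = np.mean((x - rtn_absmax_int(x_rot) @ H.T) ** 2)

print(f"MSE without rotation: {err_plain:.4f}")
print(f"MSE with Hadamard rotation: {err_rot:.4f}")
```

On data of this kind the rotated variant should show the lower error, which is the effect that fixed Hadamard baselines rely on. According to the abstract, WUSH replaces this fixed, data-agnostic rotation with a non-orthogonal transform that adds a data-dependent second-moment component; that construction is not reproduced here.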

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

High-Rate Quantized Matrix Multiplication I
cs.IT 2026-01 unverdicted novelty 5.0

High-rate quantization theory yields accurate approximations for the distortion of absmax INT and FP schemes in generic weight-plus-activation matrix multiplication.