Grid Games: The Power of Multiple Grids for Quantizing Large Language Models

2026 · cs.LG · arXiv 2605.12327

1 Pith paper cites this work.

abstract

A major recent advance in quantization is given by microscaled 4-bit formats such as NVFP4 and MXFP4, which quantize values in small groups sharing a scale, assuming a fixed floating-point grid. In this paper, we study the following natural extension: assume that, for each group of values, we are free to select the "better" among two or more 4-bit grids, with the choice marked by one or more bits in the scale value. We formalize the power-of-two-grids (PO2) problem and provide theoretical results showing that practical small-group formats such as MXFP or NVFP can benefit significantly from PO2 grids, while the advantage vanishes for very large groups. On the practical side, we instantiate several grid families, including 1) PO2(NF4), which pairs the standard NF4 normal grid with a learned grid; 2) MPO2, a grid pair that is fully learned over real weights and activations; 3) PO2(Split87), an explicit-zero asymmetric grid; and 4) SFP4, a TensorCore-implementable triple which pairs NVFP4 with two shifted variants. Results for post-training quantization of standard open models and pre-training of Llama-like models show that adaptive grids consistently improve accuracy vs. single-grid FP4 under both weight-only and weight+activation quantization. Source code is available at https://github.com/IST-DASLab/GridGames.
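
The per-group grid selection described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the group size of 16, the absmax scaling rule, the squared-error selection criterion and the second grid (`GRID_ALT`, a rescaled uniform grid) are all assumptions made for illustration. Only the FP4 (E2M1) value set is a standard fact.

```python
import numpy as np

# Positive half of the FP4 (E2M1) value set; mirrored below for negatives.
_E2M1_POS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
GRID_FP4 = np.concatenate([-_E2M1_POS[:0:-1], _E2M1_POS])  # 15 distinct values

# Hypothetical second grid, for illustration only: uniform levels rescaled to max 6.
GRID_ALT = np.arange(-7, 8) * (6.0 / 7.0)


def quantize_to_grid(vals, grid):
    """Absmax-scale one group onto `grid` and round to the nearest grid point."""
    amax = np.max(np.abs(vals))
    scale = amax / np.max(np.abs(grid)) if amax > 0 else 1.0
    idx = np.argmin(np.abs(vals[:, None] / scale - grid[None, :]), axis=1)
    return grid[idx] * scale


def quantize_multi_grid(x, grids, group_size=16):
    """For each group, keep the candidate grid with the lowest squared error.

    Returns the dequantized values and the index of the grid chosen per group
    (the information that would be stored in spare bits of the scale).
    """
    x = np.asarray(x, dtype=np.float64)
    assert x.size % group_size == 0, "size must be a multiple of group_size"
    groups = x.reshape(-1, group_size)
    out = np.empty_like(groups)
    choice = np.empty(len(groups), dtype=np.int64)
    for g, vals in enumerate(groups):
        candidates = [quantize_to_grid(vals, grid) for grid in grids]
        errors = [np.sum((vals - c) ** 2) for c in candidates]
        choice[g] = int(np.argmin(errors))
        out[g] = candidates[choice[g]]
    return out.reshape(x.shape), choice
```

Because each group keeps the best of its candidates, the total squared error with several grids can never exceed that of any single grid in the list under the same scaling rule. How large the gain is depends on the grids and the data, which is what the paper studies.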

representative citing papers

A Hardware-Aware, Per-Layer Methodology for Post-Training Quantization of Large Language Models

cs.LG · 2026-05-14 · unverdicted · novelty 4.0

SOP, a post-training quantization method for LLMs, reports lower weight reconstruction error than per-layer FP8 at 1.5 fewer bits per weight (bpw), using per-layer codebook search and hardware-aware formats.

citing papers explorer

Showing 1 of 1 citing paper.

A Hardware-Aware, Per-Layer Methodology for Post-Training Quantization of Large Language Models
cs.LG · 2026-05-14 · unverdicted · none · ref 10 · internal anchor
