Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization

Vage Egiazarian, Roberto L · 2026 · arXiv 2509.23202

9 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background: 1 · other: 1

citation-polarity summary

background: 1 · unclear: 1

representative citing papers

Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling

cs.CL · 2025-12-01 · conditional · novelty 7.0

Four Over Six adaptively scales blocks in NVFP4 quantization to smaller FP4 values, making representable value distributions more uniform and reducing quantization error especially for near-maximal values.

Pretraining large language models with MXFP4 on Native FP4 Hardware

cs.LG · 2026-05-11 · unverdicted · novelty 6.0 · 3 refs

Weight gradient FP4 quantization drives LLM pretraining divergence, which deterministic Hadamard rotations can stabilize on native MXFP4 hardware.

Finer is Better (with the Right Scaling)

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Finer block sizes strictly improve theoretical MSE in microscaling for LLMs when scaling is adjusted to handle heavy-tailed distributions and FP4 binning, allowing standard formats to match custom wider-exponent ones.

Robust Ultra Low-Bit Post-Training Quantization via Stable Diagonal Curvature Estimate

cs.LG · 2026-04-15 · unverdicted · novelty 6.0

DASH-Q uses a stable diagonal curvature estimate and weighted least squares to achieve robust ultra-low-bit post-training quantization of LLMs, improving zero-shot accuracy by 7% on average over baselines.

AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation

cs.LG · 2026-04-02 · unverdicted · novelty 6.0

AdaHOP applies pattern-aware Hadamard transforms and selective outlier extraction to enable from-scratch MXFP4 training of LLMs at BF16 quality with up to 3.6X memory compression and 1.46X speedup.

Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs

cs.CL · 2026-05-19 · unverdicted · novelty 5.0

Mix-Quant quantizes prefilling to NVFP4 and keeps BF16 for decoding in agentic LLMs, achieving up to 3x prefilling speedup while largely preserving task performance on long-context and agentic benchmarks.

TACO: Efficient Communication Compression of Intermediate Tensors for Scalable Tensor-Parallel LLM Training

cs.DC · 2026-04-27 · unverdicted · novelty 5.0

TACO compresses tensor-parallel intermediate tensors with an adaptive FP8 scheme and fused kernels, yielding up to 1.87X throughput gains on GPT and Qwen models with near-lossless accuracy.

DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization

cs.CV · 2026-04-20 · unverdicted · novelty 4.0

DuQuant++ adapts outlier-aware fine-grained rotation to MXFP4 by matching block size to the 32-element microscaling group, enabling a single rotation that smooths distributions and achieves SOTA performance on LLaMA-3 with lower cost.

Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor

cs.LG · 2026-05-19 · 2 refs

citing papers explorer

Showing 9 of 9 citing papers.

Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling

cs.CL · 2025-12-01 · conditional · none · ref 26

Pretraining large language models with MXFP4 on Native FP4 Hardware

cs.LG · 2026-05-11 · unverdicted · none · ref 9 · 3 links

Finer is Better (with the Right Scaling)

cs.LG · 2026-05-08 · unverdicted · none · ref 11

Robust Ultra Low-Bit Post-Training Quantization via Stable Diagonal Curvature Estimate

cs.LG · 2026-04-15 · unverdicted · none · ref 15

AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation

cs.LG · 2026-04-02 · unverdicted · none · ref 14

Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs

cs.CL · 2026-05-19 · unverdicted · none · ref 6

TACO: Efficient Communication Compression of Intermediate Tensors for Scalable Tensor-Parallel LLM Training

cs.DC · 2026-04-27 · unverdicted · none · ref 16

DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization

cs.CV · 2026-04-20 · unverdicted · none · ref 5

Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor

cs.LG · 2026-05-19 · unreviewed · ref 14 · 2 links
