DuQuant++ adapts outlier-aware fine-grained rotation to MXFP4 by matching the rotation block size to the 32-element microscaling group, so that a single rotation smooths the distributions and achieves state-of-the-art performance on LLaMA-3 at lower cost.
Sherry: Hardware-efficient 1.25-bit ternary quantization via fine-grained sparsification
2 Pith papers cite this work. Polarity classification is still indexing.