Mixed-Precision Quantization of Large Language Models

Utkarsh Saxena, Sayeh Sharify, Kaushik Roy, Xin Wang · 2024 · arXiv 2412.14363

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Variance Is Not Importance: Structural Analysis of Transformer Compressibility Across Model Scales

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

High-variance activation directions are uncorrelated with predictions, transformer blocks grow more linear with depth, and single-block linear replacement yields 34x compression on Mistral's final block at a 1.71 perplexity cost.

dMX: Differentiable Mixed-Precision Assignment for Low-Precision Floating-Point Formats

cs.LG · 2026-06-02 · unverdicted · novelty 6.0

dMX is a differentiable mixed-precision framework that learns per-layer MXFP bit-width assignments for LLMs and outperforms KL-based heuristics on perplexity and zero-shot accuracy under bit-width budgets.

citing papers explorer

Variance Is Not Importance: Structural Analysis of Transformer Compressibility Across Model Scales · cs.LG · 2026-04-22 · unverdicted · none · ref 6

dMX: Differentiable Mixed-Precision Assignment for Low-Precision Floating-Point Formats · cs.LG · 2026-06-02 · unverdicted · none · ref 35
