LLM 2-bit quantization fails via either cumulative signal degradation or early computation collapse in key components.
Exploring
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2representative citing papers
SnapMLA achieves up to 1.91x higher throughput in long-output MLA decoding using FP8 quantization and specialized kernels while keeping benchmark quality near the BF16 baseline.
citing papers explorer
-
From Signal Degradation to Computation Collapse: Uncovering the Two Failure Modes of LLM Quantization
LLM 2-bit quantization fails via either cumulative signal degradation or early computation collapse in key components.
-
SnapMLA: Efficient Long-Context MLA Decoding via Hardware-Aware FP8 Quantized Pipelining
SnapMLA achieves up to 1.91x higher throughput in long-output MLA decoding using FP8 quantization and specialized kernels while keeping benchmark quality near the BF16 baseline.