Residual quantization with implicit neural codebooks
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
EVA is a vector-quantization hardware architecture that transforms LLM decoding from GEMV to GEMM via direct codebook dot products and conflict-free output buffering, claiming up to 11.17x speedup over prior lookup designs.
citing papers explorer
- Multi-Bitwidth Quantization for LLMs Using Additive Codebooks
  Drop-by-Drop uses additive codebooks and Matryoshka-style training to produce a single LLM whose ordered codebook subsets give accurate reconstructions at successively higher bitwidths under a weighted MSE distortion.
- EVA: Accelerating LLM Decoding via an Efficient Vector Quantization Architecture
  EVA is a vector-quantization hardware architecture that transforms LLM decoding from GEMV to GEMM via direct codebook dot products and conflict-free output buffering, claiming up to 11.17x speedup over prior lookup designs.