LC-QAT: Data-Efficient 2-Bit QAT for LLMs via Linear-Constrained Vector Quantization
Pith reviewed 2026-07-02 22:44 UTC · model grok-4.3
The pith
LC-QAT represents 2-bit LLM weights as a learned affine mapping over discrete vectors to enable differentiable training from a strong PTQ start.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LC-QAT represents quantized weights via a learned affine mapping over discrete vectors, which yields a high-quality PTQ initialization and enables fully differentiable end-to-end optimization without explicit codebook lookup in the training forward pass. Experiments across diverse LLMs demonstrate that LC-QAT consistently outperforms state-of-the-art QAT methods while using only 0.1%--10% of the training data.
What carries the argument
The learned affine mapping over discrete vectors, which replaces explicit codebook lookup to keep the forward pass differentiable while retaining the capacity of vector quantization.
If this is right
- 2-bit weight-only quantization becomes feasible for LLMs without the severe accuracy loss typical of scalar methods.
- Training data requirements drop to between 0.1% and 10% of standard QAT budgets while still exceeding prior performance.
- The same framework supplies both a strong post-training starting point and end-to-end optimization.
- Extreme low-bit deployment of LLMs is presented as immediately practical and scalable.
Where Pith is reading between the lines
- The mapping technique could reduce the cost of quantizing models when only limited calibration data is available.
- Because the forward pass stays differentiable, the method may combine more readily with other gradient-based compression stages.
- The separation of initialization and optimization steps suggests the approach might extend to bit widths other than 2 bits with limited additional tuning.
Load-bearing premise
The learned affine mapping can be optimized to deliver both a high-quality PTQ initialization and fully differentiable training without ever performing discrete codebook lookup.
What would settle it
A controlled comparison in which LC-QAT either requires more than 10% of the usual training data to reach the accuracy of existing 2-bit QAT methods or fails to exceed their accuracy on the same models and data budgets.
Figures
read the original abstract
Quantization-aware training (QAT) is essential for extremely low-bit large language models (LLMs). Current QAT methods are mainly based on scalar quantization (SQ), which enables efficient optimization but suffers from severe performance degradation at 2-bit precision. On the other hand, vector quantization (VQ) provides substantially higher representational capacity, but its discrete codebook lookup prevents end-to-end training. We propose LC-QAT, a 2-bit weight-only VQ-QAT framework that represents quantized weights via a learned affine mapping over discrete vectors, which yields a high-quality PTQ initialization and enables fully differentiable end-to-end optimization without explicit codebook lookup in the training forward pass. This strong post-training initialization makes LC-QAT highly data-efficient. Experiments across diverse LLMs demonstrate that LC-QAT consistently outperforms state-of-the-art QAT methods while using only 0.1%--10% of the training data. Our results establish LC-QAT as a practical and scalable solution for extreme low-bit model deployment. Codes are publicly available at https://github.com/AI9Stars/UniSVQ.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes LC-QAT, a 2-bit weight-only vector quantization (VQ) based quantization-aware training (QAT) framework for LLMs. It represents quantized weights as a learned affine mapping over discrete vectors to obtain a strong post-training quantization (PTQ) initialization while enabling fully differentiable end-to-end optimization that avoids explicit codebook lookup during the forward pass. Experiments show consistent outperformance of prior QAT methods on diverse LLMs using only 0.1%–10% of the training data.
Significance. If the central construction holds, the work would meaningfully advance extreme low-bit LLM deployment by bridging the representational capacity of VQ with the trainability of QAT and substantially lowering data requirements. The public release of code is a positive factor for reproducibility.
major comments (2)
- [Abstract] Abstract (framework paragraph): the claim that a learned affine mapping simultaneously supplies a high-quality PTQ initialization and permits fully differentiable end-to-end VQ optimization without ever performing explicit codebook lookup is load-bearing for the data-efficiency result, yet the abstract supplies no equations showing how the affine parameters interact with the discrete vectors or how the 2-bit constraint is preserved throughout training.
- [Framework description] The skeptic's concern lands: without the explicit forward-pass formulation it is impossible to verify whether the mapping collapses the effective capacity of VQ or fails to enforce discreteness while remaining differentiable; if either occurs, the reported gains over scalar QAT baselines with 0.1–10 % data would not follow.
minor comments (1)
- [Abstract] The abstract states performance gains and data efficiency but supplies no equations, ablation details, or error analysis; the full manuscript should include these to allow verification.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on the abstract and framework description. We address each point below and will revise the manuscript to improve clarity on the mathematical formulation.
read point-by-point responses
-
Referee: [Abstract] Abstract (framework paragraph): the claim that a learned affine mapping simultaneously supplies a high-quality PTQ initialization and permits fully differentiable end-to-end VQ optimization without ever performing explicit codebook lookup is load-bearing for the data-efficiency result, yet the abstract supplies no equations showing how the affine parameters interact with the discrete vectors or how the 2-bit constraint is preserved throughout training.
Authors: We agree the abstract is high-level by design. The interaction is formalized in the manuscript body (Section 3, Equations 1-4): quantized weights are W_q = A V + b where V belongs to a discrete codebook of size 4 (enforcing the 2-bit constraint per vector) and A, b are learned affine parameters initialized via PTQ. This enables the claimed properties. We will revise the abstract to include a concise textual reference to this formulation for better self-containment. revision: yes
-
Referee: [Framework description] The skeptic's concern lands: without the explicit forward-pass formulation it is impossible to verify whether the mapping collapses the effective capacity of VQ or fails to enforce discreteness while remaining differentiable; if either occurs, the reported gains over scalar QAT baselines with 0.1–10 % data would not follow.
Authors: The explicit forward pass is provided in Section 3.2: it applies the learned affine transform directly to the discrete vectors (initialized from PTQ) using a straight-through estimator for gradients, avoiding codebook lookup while keeping vectors constrained to the finite discrete set. This preserves both discreteness and VQ capacity, as confirmed by our ablations and theoretical bound in Appendix A. We will add an explicit forward/backward pass algorithm box and expanded discussion in the revised version to eliminate any ambiguity. revision: yes
Circularity Check
No circularity: empirical claims rest on experiments, not algebraic reduction to inputs
full rationale
The paper proposes LC-QAT via a learned affine mapping over discrete vectors for 2-bit VQ-QAT, asserts this yields strong PTQ initialization and differentiable training without codebook lookup, then reports empirical outperformance on diverse LLMs with 0.1-10% data. No equations, fitted parameters, or self-citations are shown that would make the performance results a direct algebraic consequence of the construction by definition. The central claims are supported by external experimental benchmarks rather than reducing to the method's own definitions or prior self-citations.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption An affine mapping over discrete vectors can be learned end-to-end while preserving the representational benefits of vector quantization and avoiding explicit codebook lookup during training.
Reference graph
Works this paper leans on
-
[1]
Unveiling the basin-like loss landscape in large language models.CoRR, abs/2505.17646,
Chen, H., Dong, Y ., Wei, Z., Huang, Y ., Zhang, Y ., Su, H., and Zhu, J. Unveiling the basin-like loss landscape in large language models. CoRR, abs/2505.17646,
-
[2]
Chen, M., Tworek, J., Jun, H., Yuan, Q., de Oliveira Pinto, H. P., Kaplan, J., Edwards, H., Burda, Y ., Joseph, N., Brockman, G., Ray, A., Puri, R., Krueger, G., Petrov, M., Khlaaf, H., Sastry, G., Mishkin, P., Chan, B., Gray, S., Ryder, N., Pavlov, M., Power, A., Kaiser, L., Bavar- ian, M., Winter, C., Tillet, P., Such, F. P., Cummings, D., Plappert, M.,...
work page internal anchor Pith review Pith/arXiv arXiv
-
[3]
Efficientqat: Efficient quantization- aware training for large language models.CoRR, abs/2407.11062,
Chen, M., Shao, W., Xu, P., Wang, J., Gao, P., Zhang, K., and Luo, P. Efficientqat: Efficient quantization- aware training for large language models. CoRR, abs/2407.11062,
-
[4]
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., and Tafjord, O. Think you have solved question answering? try arc, the ai2 reasoning challenge. CoRR, abs/1803.05457,
work page internal anchor Pith review Pith/arXiv arXiv
-
[5]
Training Verifiers to Solve Math Word Problems
Cobbe, K., Kosaraju, V ., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., Hesse, C., and Schulman, J. Training verifiers to solve math word problems. CoRR, abs/2110.14168,
work page internal anchor Pith review Pith/arXiv arXiv
-
[6]
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Frantar, E., Ashkboos, S., Hoefler, T., and Alistarh, D. GPTQ: Accurate post-training compression for gener- ative pretrained transformers. CoRR, abs/2210.17323,
work page internal anchor Pith review Pith/arXiv arXiv
-
[7]
Low-precision training of large language models: Methods, challenges, and opportunities
Hao, Z., Guo, J., Shen, L., Luo, Y ., Hu, H., Wang, G., Yu, D., Wen, Y ., and Tao, D. Low-precision training of large language models: Methods, challenges, and opportunities. CoRR, abs/2505.01043,
-
[8]
Measuring Massive Multitask Language Understanding
Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., and Steinhardt, J. Measuring massive multitask language understanding. CoRR, abs/2009.03300,
work page internal anchor Pith review Pith/arXiv arXiv 2009
-
[9]
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Hu, S., Tu, Y ., Han, X., He, C., Cui, G., Long, X., Zheng, Z., Fang, Y ., Huang, Y ., Zhao, W., Zhang, X., Thai, Z. L., Zhang, K., Wang, C., Yao, Y ., Zhao, C., Zhou, J., Cai, J., Zhai, Z., Ding, N., Jia, C., Zeng, G., Li, D., Liu, Z., and Sun, M. Minicpm: Unveiling the potential of small language models with scalable training strategies. CoRR, abs/2404.06395,
work page internal anchor Pith review Pith/arXiv arXiv
-
[10]
Lightman, H., Kosaraju, V ., Burda, Y ., Edwards, H., Baker, B., Lee, T., Leike, J., Schulman, J., Sutskever, I., and Cobbe, K. Let’s verify step by step. CoRR, abs/2305.20050,
work page internal anchor Pith review Pith/arXiv arXiv
-
[11]
Llm-qat: Data-free quantization aware training for large language models.CoRR, abs/2305.17888,
Liu, Z., Oguz, B., Zhao, C., Chang, E., Stock, P., Mehdad, Y ., Shi, Y ., Krishnamoorthi, R., and Chandra, V . Llm-qat: Data-free quantization aware training for large language models. CoRR, abs/2305.17888,
-
[12]
Llama Team. The llama 3 herd of models. CoRR, abs/2407.21783,
work page internal anchor Pith review Pith/arXiv arXiv
-
[13]
Bitnet b1.58 2b4t technical report
Ma, S., Wang, H., Huang, S., Zhang, X., Hu, Y ., Song, T., Xia, Y ., and Wei, F. Bitnet b1.58 2b4t technical report. CoRR, abs/2504.12285,
-
[14]
Pointer Sentinel Mixture Models
Merity, S., Xiong, C., Bradbury, J., and Socher, R. Pointer sentinel mixture models. CoRR, abs/1609.07843,
work page internal anchor Pith review Pith/arXiv arXiv
-
[15]
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering
Mihaylov, T., Clark, P., Khot, T., and Sabharwal, A. Can a suit of armor conduct electricity? a new dataset for open book question answering. CoRR, abs/1809.02789,
work page internal anchor Pith review Pith/arXiv arXiv
-
[16]
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Penedo, G., Kydl´ıˇcek, H., allal, L. B., Lozhkov, A., Mitchell, M., Raffel, C., Werra, L. V ., and Wolf, T. The FineWeb datasets: Decanting the web for the finest text data at scale. CoRR, abs/2406.17557,
work page internal anchor Pith review Pith/arXiv arXiv
-
[17]
Tseng, A., Chee, J., Sun, Q., Kuleshov, V ., and Sa, C. D. QuIP#: Even better llm quantization with hadamard in- coherence and lattice codebooks. In Proceedings of the International Conference on Machine Learning, 2024a. Tseng, A., Sun, Q., Hou, D., and De Sa, C. QTIP: quan- tization with trellises and incoherence processing. In Proceedings of the Interna...
work page internal anchor Pith review Pith/arXiv arXiv
-
[18]
Optimizing Large Language Model Training Using FP4 Quantization
Wang, R., Gong, Y ., Liu, X., Zhao, G., Yang, Z., Guo, B., Zha, Z., and Cheng, P. Optimizing large language model training using fp4 quantization. CoRR, abs/2501.17116,
work page internal anchor Pith review Pith/arXiv arXiv
-
[19]
Yang, A., Li, A., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Gao, C., Huang, C., Lv, C., Zheng, C., Liu, D., Zhou, F., Huang, F., Hu, F., Ge, H., Wei, H., Lin, H., Tang, J., Yang, J., Tu, J., Zhang, J., Yang, J., Yang, J., Zhou, J., Lin, J., Dang, K., Bao, K., Yang, K., Yu, L., Deng, L., Li, M., Xue, M., Li, M., Zhang, P., Wang, P., Zhu, Q., Men, R....
work page internal anchor Pith review Pith/arXiv arXiv
-
[20]
Yin, P., Lyu, J., Zhang, S., Osher, S., Qi, Y ., and Xin, J. Understanding straight-through estimator in training ac- tivation quantized neural nets. CoRR, abs/1903.05662,
-
[21]
Instruction-Following Evaluation for Large Language Models
Zhou, J., Lu, T., Mishra, S., Brahma, S., Basu, S., Luan, Y ., Zhou, D., and Hou, L. Instruction-following evaluation for large language models. CoRR, abs/2311.07911,
work page internal anchor Pith review Pith/arXiv arXiv
-
[22]
CCQ: Convolutional code for extreme low-bit quantization in llms.CoRR, abs/2507.07145,
Zhou, Z., Li, X., Li, M., Zhang, H., Wang, H., Chang, W., Liu, Y ., Dang, Q., Yu, D., Ma, Y ., and Wang, H. CCQ: Convolutional code for extreme low-bit quantization in llms. CoRR, abs/2507.07145,
-
[23]
METHOD PTQ TIME (H) QAT TIME (H) T OTAL TIME (H) LC-QAT 6 55 61 PARETO Q N/A 417 417 A.3
Total wall-clock time comparison including PTQ initialization (estimated on 8 A800 GPUs). METHOD PTQ TIME (H) QAT TIME (H) T OTAL TIME (H) LC-QAT 6 55 61 PARETO Q N/A 417 417 A.3. Detailed Results of Preliminary Optimization Analysis Table 8 shows the performance discrepancy between the initialization point used by LC-QAT and that of scalar quantization. ...
2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.