CBQ: Cross-Block Quantization for Large Language Models

Baoqun Yin; Hanting Chen; Jie Hu; Wei Li; Xiaoyu Liu; Xin Ding; Yehui Tang; Yunhe Wang; Yun Zhang; Zhijun Tu

arxiv: 2312.07950 · v5 · pith:JFD27HN3new · submitted 2023-12-13 · 💻 cs.LG · cs.CL

Xin Ding, Xiaoyu Liu, Zhijun Tu, Yun Zhang, Wei Li, Jie Hu, Hanting Chen, Yehui Tang, Zhiwei Xiong, Baoqun Yin, Yunhe Wang


classification 💻 cs.LG cs.CL

keywords quantization · cross-block · llms · only · outliers · across · blocks · dependency


Post-training quantization (PTQ) has played a key role in compressing large language models (LLMs) with ultra-low costs. However, existing PTQ methods only focus on handling the outliers within one layer or one block, which ignores the dependency of blocks and leads to severe performance degradation in low-bit settings. In this paper, we propose CBQ, a cross-block reconstruction-based PTQ method for LLMs. CBQ employs a cross-block dependency using a homologous reconstruction scheme, establishing long-range dependencies across multiple blocks to minimize error accumulation. Furthermore, CBQ incorporates a coarse-to-fine preprocessing (CFP) strategy for suppressing weight and activation outliers, coupled with an adaptive LoRA-Rounding technique for precise weight quantization. These innovations enable CBQ to not only handle extreme outliers effectively but also improve overall quantization accuracy. Extensive experiments show that CBQ achieves superior low-bit quantization (W4A4, W4A8, W2A16) and outperforms existing state-of-the-art methods across various LLMs and datasets. Notably, CBQ quantizes the 4-bit LLAMA1-65B model within only 4.3 hours on a single GPU, achieving a commendable tradeoff between performance and quantization efficiency.
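For readers unfamiliar with the notation, a setting such as W4A4 denotes 4-bit weights and 4-bit activations. The sketch below is not CBQ's method: it shows only a generic symmetric round-to-nearest baseline on synthetic weights, to illustrate why a single outlier makes low-bit quantization lossy. The function names `quantize_rtn` and `mse`, the synthetic weights, and the per-tensor scaling are all assumptions made for illustration.

```python
import random

def quantize_rtn(weights, bits):
    """Symmetric per-tensor round-to-nearest quantization (generic baseline, not CBQ)."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit signed levels
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level, clamp to the range, map back to floats.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]

def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1024)]
weights[0] = 12.0  # one synthetic outlier stretches the quantization range

# Reconstruction error of the dequantized weights at 8, 4, and 2 bits.
errors = {bits: mse(weights, quantize_rtn(weights, bits)) for bits in (8, 4, 2)}
for bits, err in errors.items():
    print(f"W{bits}: mse = {err:.5f}")
```

Because the scale is set by the largest magnitude, the outlier coarsens every quantization level, so the error grows as the bit width shrinks. This is the kind of degradation that outlier suppression and reconstruction-based PTQ methods aim to reduce.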


Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

DPQuant: Efficient and Differentially-Private Model Training via Dynamic Quantization Scheduling
cs.LG 2025-09 unverdicted novelty 7.0

DPQuant uses epoch-wise probabilistic layer rotation and DP loss sensitivity to quantize only a changing subset of layers, reducing accuracy degradation from quantization noise in DP-SGD and delivering up to 2.21x thr...

Breaking Modality Heterogeneity in Low-Bit Quantization for Large Vision-Language Models
cs.CV 2026-05 unverdicted novelty 6.0

SplitQ improves low-bit PTQ for VLMs by isolating modality-specific outlier channels via MOCD and applying dual-branch adaptive calibration via ACC, outperforming prior methods on six datasets across W4A8 to W3A2 settings.

CoreQ: Learning-Free Mismatch Correction and Successive Rounding for Quantization
cs.LG 2026-02 unverdicted novelty 6.0

CoreQ delivers adaptive mismatch correction via a closed-form geometric coefficient and successive rounding to improve PTQ accuracy for large language models.

Rethinking 1-bit Optimization Leveraging Pre-trained Large Language Models
cs.CL 2025-08 conditional novelty 6.0

A progressive training scheme with binary-aware initialization and dual-scaling allows pre-trained LLMs to be converted to high-performance 1-bit models without training from scratch.

SEPTQ: A Simple and Effective Post-Training Quantization Paradigm for Large Language Models
cs.CL 2026-04 unverdicted novelty 5.0

SEPTQ simplifies LLM post-training quantization to two steps via static global importance scoring and mask-guided column-wise weight updates, claiming superior results over baselines in low-bit settings.