CoreQ: Learning-Free Mismatch Correction and Successive Rounding for Quantization
Recognition: 1 Lean theorem link
Pith reviewed 2026-05-16 06:49 UTC · model grok-4.3
The pith
A closed-form geometric coefficient corrects quantization mismatch across layers without retraining or tuning
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CoreQ derives a closed-form coefficient from a geometric decomposition of the activation mismatch and uses it to adaptively correct each layer's calibration target; the induced triangular least-squares objective is then minimized by an efficient greedy successive-rounding solver.
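The page does not reproduce the coefficient itself. As a hedged illustration of the kind of quantity described, here is a minimal sketch assuming the coefficient is the clipped scalar projection of the observed mismatch onto a reference correction direction; the names `projection_coefficient` and `corrected_target`, and the choice of vectors, are ours, not the paper's:

```python
def projection_coefficient(mismatch, direction):
    """Clipped scalar projection <m, d> / ||d||^2 of the observed mismatch m
    onto a reference correction direction d (hypothetical stand-in for the
    paper's closed-form coefficient)."""
    denom = sum(d * d for d in direction)
    if denom == 0.0:
        return 0.0  # no usable mismatch estimate: apply no correction
    alpha = sum(m * d for m, d in zip(mismatch, direction)) / denom
    return min(1.0, max(0.0, alpha))  # clip so the target interpolates

def corrected_target(target_plain, target_mismatch, alpha):
    """Per-layer calibration target as a convex combination: alpha = 0 keeps
    the uncorrected target, alpha = 1 applies the full mismatch correction."""
    return [alpha * tm + (1.0 - alpha) * tp
            for tp, tm in zip(target_plain, target_mismatch)]
```

Because such a coefficient is computed per layer from observed quantities, no hyperparameter is exposed, which is consistent with the page's "no tuning" claim.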
What carries the argument
Closed-form mismatch correction coefficient obtained from geometric decomposition of the activation error
If this is right
- The method improves perplexity and downstream accuracy over strong PTQ baselines across LLM families, scales, bit-widths, and quantization settings.
- No hyperparameter tuning is required because the coefficient adapts automatically to each layer.
- The K-CoreQ beam-search extension trades modest extra compute for further gains while remaining learning-free.
- The triangular least-squares formulation is solved exactly by the greedy successive-rounding procedure.
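The greedy successive-rounding step admits a compact sketch. Assuming (our notation, not the paper's) a lower-triangular system minimizing ||R w − t||² over integer w, the solver commits one coordinate at a time, conditioning on earlier choices, in the spirit of Babai's nearest-plane step cited in the reference graph:

```python
def successive_rounding(R, t):
    """Greedy successive rounding for min_w ||R w - t||^2 with integer w and
    lower-triangular R (illustrative stand-in for the paper's objective).
    Coordinate i is rounded after subtracting the contribution of the
    already-committed coordinates j < i."""
    n = len(t)
    w = [0] * n
    for i in range(n):
        committed = sum(R[i][j] * w[j] for j in range(i))
        w[i] = round((t[i] - committed) / R[i][i])
    return w
```

A bounded beam search in the style of K-CoreQ would keep the K best partial assignments at each step rather than a single greedy choice, trading extra compute for a better discrete solution.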
Where Pith is reading between the lines
- Similar closed-form geometric corrections could be tested on other sequential compression operations such as pruning or low-rank adaptation.
- If the geometric assumption holds for activation mismatches, the same coefficient derivation might apply to calibration objectives in other layer-wise optimization settings.
- The absence of learned parameters suggests the approach may maintain performance when calibration data is extremely limited or drawn from a different distribution than the test set.
Load-bearing premise
The geometric decomposition of the mismatch produces a coefficient that compensates for error propagation without introducing bias or overfitting to the finite calibration set.
What would settle it
If applying the derived coefficient to adjust calibration targets produces no improvement or degrades perplexity and accuracy relative to uncorrected PTQ baselines on the same models and calibration data, the central claim is false.
Original abstract
Post-training quantization (PTQ) enables efficient deployment of large language models by mapping pretrained weights to low-bit formats without retraining, typically using a small calibration set to minimize a layer-wise calibration objective. However, this sequential procedure induces a mismatch: errors from earlier quantized layers alter the inputs received by later layers, causing the activations to deviate from those of the full-precision model. Recent approaches introduce mismatch-aware calibration objectives to compensate for this effect, but leave open how much of the observed mismatch should shift each layer's calibration target. Fully applying this correction can overfit limited calibration data, while scaling the mismatch correction with a fixed coefficient ignores varying reliability of mismatch estimates across layers. To address these limitations, we propose CoreQ, a learning-free PTQ framework that applies a closed-form coefficient for mismatch correction derived from a geometric decomposition of the mismatch. The resulting coefficient adapts the correction across layers, reduces overfitting to finite calibration data, and requires no hyperparameter tuning. Given the corrected target, CoreQ minimizes the induced triangular least-squares objective with an efficient greedy successive-rounding solver and a bounded beam-search extension, K-CoreQ, that trades modest additional compute for improved performance. Across multiple LLM families, scales, bit-widths, and quantization settings, CoreQ improves perplexity and downstream accuracy over strong PTQ baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes CoreQ, a learning-free post-training quantization (PTQ) framework for large language models. It derives a closed-form coefficient for mismatch correction from a geometric decomposition of the quantization-induced input deviation, uses this to adaptively adjust layer-wise calibration targets, and minimizes the resulting triangular least-squares objective via a greedy successive-rounding solver (with an optional bounded beam-search extension K-CoreQ). Experiments report improved perplexity and downstream accuracy over strong PTQ baselines across LLM families, scales, bit-widths, and settings.
Significance. If the geometric decomposition produces an unbiased closed-form coefficient that reliably compensates mismatch propagation without overfitting the calibration set, CoreQ would represent a meaningful advance in parameter-free PTQ by eliminating hyperparameter tuning and providing an efficient solver for the induced objective. The approach directly targets a core limitation of sequential layer-wise PTQ.
major comments (1)
- [§3 (geometric decomposition and coefficient derivation)] The central derivation assumes that geometric decomposition of the mismatch yields a closed-form coefficient that compensates error propagation. However, quantization mismatch propagates through non-linear activations (ReLU, softmax, LayerNorm) whose Jacobians are state-dependent and non-constant, violating the linear projection assumption used to obtain the coefficient. This risks under-correction in deeper blocks and residual input deviation that successive rounding cannot fully recover. The manuscript must explicitly state the linearity assumption and provide either a proof of robustness or empirical validation that the coefficient remains effective under realistic non-linear propagation.
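The referee's state-dependence point can be made concrete with a toy example (entirely ours, not from the paper): the same additive perturbation passes through ReLU unchanged when units are active and is absorbed when they are inactive, so no single linear coefficient models the propagation exactly:

```python
def relu(xs):
    return [max(0.0, x) for x in xs]

def propagated_error(x, e):
    """Output error induced through ReLU by an additive input perturbation e."""
    perturbed = relu([xi + ei for xi, ei in zip(x, e)])
    clean = relu(x)
    return [p - c for p, c in zip(perturbed, clean)]

# Identical perturbation, different activation states:
err_active = propagated_error([1.0, 2.0], [0.1, 0.1])    # passes through
err_dead = propagated_error([-1.0, -2.0], [0.1, 0.1])    # absorbed entirely
```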
minor comments (2)
- [Abstract and §4] Clarify the precise definition of the 'triangular least-squares objective' and how the successive-rounding solver exploits its structure; the abstract introduces the term without sufficient context for readers unfamiliar with the formulation.
- [§3.3] The abstract states that the coefficient 'adapts the correction across layers' and 'requires no hyperparameter tuning'; confirm that no implicit scaling or layer-specific thresholds are introduced in the implementation details.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the major comment regarding the linearity assumption in the geometric decomposition below, and we are prepared to revise the manuscript accordingly.
Point-by-point responses
-
Referee: [§3 (geometric decomposition and coefficient derivation)] The central derivation assumes that geometric decomposition of the mismatch yields a closed-form coefficient that compensates error propagation. However, quantization mismatch propagates through non-linear activations (ReLU, softmax, LayerNorm) whose Jacobians are state-dependent and non-constant, violating the linear projection assumption used to obtain the coefficient. This risks under-correction in deeper blocks and residual input deviation that successive rounding cannot fully recover. The manuscript must explicitly state the linearity assumption and provide either a proof of robustness or empirical validation that the coefficient remains effective under realistic non-linear propagation.
Authors: We agree that the derivation in §3 employs a first-order linear approximation by modeling the effective mismatch propagation as a geometric projection onto the calibration direction. This yields the closed-form coefficient without requiring learned parameters. While non-linear activations introduce state-dependent effects that are not captured exactly, the coefficient is intended to compensate the dominant linear component of the input deviation. Our experiments across diverse LLM families, scales, and bit-widths show consistent gains in perplexity and downstream accuracy, indicating that the approximation remains effective in practice and that residual deviations are further mitigated by the successive-rounding solver. In the revised manuscript we will explicitly state the linearity assumption in §3, add a short discussion of its scope, and include additional empirical validation (e.g., layer-wise mismatch reduction plots) demonstrating robustness under realistic non-linear propagation. A full analytic proof is intractable given the composition of non-linearities, but the provided empirical evidence supports the coefficient's utility.
Revision: partial
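The rebuttal's "first-order linear approximation" can be written out explicitly; the notation below is ours, not the paper's, with layer map f, weights W, and quantized input x_q = x + δ:

```latex
% First-order (linearized) model of mismatch propagation through one layer:
\[
  f\!\left(W(x+\delta)\right) - f(Wx) \;\approx\; J_f(Wx)\, W\, \delta .
\]
% The Jacobian J_f of a non-linearity such as ReLU or softmax depends on the
% operating point Wx, so a closed-form coefficient can compensate only the
% component of the deviation captured by this linear model; the
% state-dependent remainder is what the referee asks the authors to bound
% or validate empirically.
```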
Circularity Check
No significant circularity: closed-form geometric coefficient and independent solver
full rationale
The paper's central derivation applies a closed-form coefficient obtained from geometric decomposition of the mismatch, presented as learning-free and without reference to fitted parameters or self-citation chains. The successive-rounding solver then minimizes the resulting triangular least-squares objective independently of the coefficient derivation. No equation or claim reduces by construction to its own inputs; the coefficient adapts across layers via the decomposition rather than being defined in terms of the final output or calibrated to the target result. This satisfies the self-contained benchmark with external falsifiability via perplexity and accuracy metrics.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Mismatch between quantized and full-precision activations admits a geometric decomposition that produces an adaptive closed-form correction coefficient.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation · washburn_uniqueness_aczel · tagged unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
Linked passage: the interpolated proxy L(Ŵ; α) admits the decomposition L(Ŵ; α) ≡ α L_asym(Ŵ) + (1 − α) L_sym(Ŵ).
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
-
Saliency-Aware Regularized Quantization Calibration for Large Language Models
SARQC augments standard PTQ calibration with a saliency-aware regularizer to keep quantized weights closer to original floating-point values, yielding improved perplexity and zero-shot accuracy on dense and MoE LLMs.
-
Saliency-Aware Regularized Quantization Calibration for Large Language Models
SARQC augments standard PTQ calibration with a saliency-aware regularization term that reduces generalization risk and yields better perplexity and zero-shot accuracy on dense and MoE LLMs.
Reference graph
Works this paper leans on
-
[1]
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
URL https://arxiv.org/abs/2404.14219.
Behdin, K., Acharya, A., Gupta, A., Selvaraj, S. K., and Mazumder, R. QuantEase: Optimization-based quantization for language models, an efficient and intuitive algorithm. CoRR.
-
[2]
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm
Chen, J., Shabanzadeh, Y., Crnčević, E., Hoefler, T., and Alistarh, D. The geometry of LLM quantization: GPTQ as Babai's nearest plane algorithm. arXiv preprint arXiv:2507.18553.
-
[3]
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Clark, C., Lee, K., Chang, M.-W., Kwiatkowski, T., Collins, M., and Toutanova, K. BoolQ: Exploring the surprising difficulty of natural yes/no questions. arXiv preprint arXiv:1905.10044.
-
[4]
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., and Tafjord, O. Think you have solved question answering? Try ARC, the AI2 Reasoning Challenge. arXiv preprint arXiv:1803.05457.
-
[5]
CBQ: Cross-block quantization for large language models
Ding, X., Liu, X., Tu, Z., Zhang, Y., Li, W., Hu, J., Chen, H., Tang, Y., Xiong, Z., Yin, B., et al. CBQ: Cross-block quantization for large language models. arXiv preprint arXiv:2312.07950.
-
[6]
Egiazarian, V., Panferov, A., Kuznedelev, D., Frantar, E., Babenko, A., and Alistarh, D. Extreme compression of large language models via additive quantization. arXiv preprint arXiv:2401.06118.
-
[7]
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Frantar, E., Ashkboos, S., Hoefler, T., and Alistarh, D. GPTQ: Accurate post-training quantization for generative pre-trained transformers. arXiv preprint arXiv:2210.17323.
-
[8]
Grattafiori, A., Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Vaughan, A., et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783.
-
[9]
Kim, J., Kim, H.-y., Cho, E., Lee, C., Kim, J., and Jeon, Y. BoA: Attention-aware post-training quantization without backpropagation. arXiv preprint arXiv:2406.13474.
-
[10]
Kim, S., Hooper, C., Gholami, A., Dong, Z., Li, X., Shen, S., Mahoney, M. W., and Keutzer, K. SqueezeLLM: Dense-and-sparse quantization. arXiv preprint arXiv:2306.07629.
-
[11]
Li, Y., Gong, R., Tan, X., Yang, Y., Hu, P., Zhang, Q., Yu, F., Wang, W., and Gu, S. BRECQ: Pushing the limit of post-training quantization by block reconstruction. arXiv preprint arXiv:2102.05426.
-
[12]
ApiQ: Finetuning of 2-bit quantized large language model
Liao, B., Herold, C., Khadivi, S., and Monz, C. ApiQ: Finetuning of 2-bit quantized large language model. arXiv preprint arXiv:2402.05147.
-
[13]
SpinQuant: LLM quantization with learned rotations
Liu, Z., Zhao, C., Fedorov, I., Soran, B., Choudhary, D., Krishnamoorthi, R., Chandra, V., Tian, Y., and Blankevoort, T. SpinQuant: LLM quantization with learned rotations. arXiv preprint arXiv:2405.16406.
- [14]
-
[15]
LLaMA: Open and Efficient Foundation Language Models
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., et al. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
-
[16]
Model-preserving adaptive rounding
Tseng, A., Sun, Z., and De Sa, C. Model-preserving adaptive rounding. arXiv preprint arXiv:2505.22988.
-
[17]
GPTVQ: The blessing of dimensionality for LLM quantization
Van Baalen, M., Kuzmin, A., Koryakovskiy, I., Nagel, M., Couperus, P., Bastoul, C., Mahurin, E., Blankevoort, T., and Whatmough, P. GPTVQ: The blessing of dimensionality for LLM quantization. arXiv preprint arXiv:2402.15319.
-
[18]
On-device language models: A comprehensive review
Xu, J., Li, Z., Chen, W., Wang, Q., Gao, X., Cai, Q., and Ling, Z. On-device language models: A comprehensive review. arXiv preprint arXiv:2409.00088.
-
[19]
Yang, A., Li, A., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Gao, C., Huang, C., Lv, C., et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388.
-
[20]
HellaSwag: Can a Machine Really Finish Your Sentence?
Zellers, R., Holtzman, A., Bisk, Y., Farhadi, A., and Choi, Y. HellaSwag: Can a machine really finish your sentence? arXiv preprint arXiv:1905.07830.
-
[21]
First-order error matters: Accurate compensation for quantized large language models
Zheng, X., Qin, H., Li, Y., Chu, H., Wang, J., Guo, J., Magno, M., and Liu, X. First-order error matters: Accurate compensation for quantized large language models. arXiv preprint arXiv:2507.11017.