Pith · machine review for the scientific record

arxiv: 2605.10673 · v1 · submitted 2026-05-11 · 💻 cs.LG

Recognition: 2 Lean theorem links

Compander-Aligned Query Geometry for Quantized Zeroth-Order Optimization

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 03:48 UTC · model grok-4.3

classification 💻 cs.LG
keywords: zeroth-order optimization · quantized adaptation · compander · query geometry · low-bit models · NF4 · LLM fine-tuning · gradient-free optimization

The pith

Aligning zeroth-order queries to the compander grid makes query-time residuals exactly zero.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Quantized zeroth-order optimization faces a grid-span mismatch because low-precision rounding distorts the query endpoints in nonuniform codebooks. The paper models this with a compander function φ that transforms the problem to uniform quantization, then constructs aligned one-grid-step stencils in the transformed space. This alignment eliminates the endpoint-rounding residual at query time, whereas generic queries retain a Δ²/μ² term in their convergence bounds. A reader would care because it enables more reliable memory-efficient fine-tuning of quantized large language models without increasing the evaluation budget.
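
A minimal sketch of the quantizer model, assuming a μ-law compander and a 15-level grid (both illustrative stand-ins; the paper's φ and step Δ are not specified in the material above):

```python
import numpy as np

# Sketch of the paper's quantizer model Q = phi^{-1} o U o phi.
# The mu-law compander and the 15-level grid are illustrative stand-ins;
# any strictly monotone phi fits the same construction.
MU = 255.0
DELTA = 2.0 / 15  # uniform grid step in the companded (z) domain

def phi(x):
    # mu-law compander: strictly monotone, finer resolution near zero
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def phi_inv(z):
    # exact inverse of phi
    return np.sign(z) * np.expm1(np.abs(z) * np.log1p(MU)) / MU

def U(z):
    # uniform rounding in the companded domain
    return DELTA * np.round(z / DELTA)

def Q(x):
    # nonuniform quantizer as the composition phi^{-1}(U(phi(x)))
    return phi_inv(U(phi(x)))

# Codewords are inverse images of the uniform grid, so grid-aligned
# queries round to themselves: the property CAQ-ZO builds on.
grid = DELTA * np.arange(-7, 8)
codewords = phi_inv(grid)
assert np.allclose(Q(codewords), codewords)
```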

Core claim

The central discovery is that for a quantizer Q = φ^{-1} ∘ U ∘ φ, forming Rademacher stencils z ± Δr with z = φ(x) and mapping back to x-space via φ^{-1} removes the grid-span mismatch. Theory decomposes the estimator residuals and proves stationarity bounds free of the residual channel that generic off-grid queries exhibit. Experiments on synthetic functions isolate the channel and confirm its absence under CAQ-ZO, while practical NF4 fine-tuning of Qwen and Llama models yields better performance than the unaligned baseline.

What carries the argument

Compander-aligned query (CAQ) geometry: one-grid-step Rademacher stencils built in the uniform transformed domain before inverse companding.
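
The construction is compact enough to sketch. Below is a hedged NumPy rendering of a CAQ-style two-point step as the abstract describes it (snap to the grid, perturb by one grid step, evaluate, update in z); the μ-law φ, step size, and toy loss are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

MU, DELTA = 255.0, 2.0 / 15  # illustrative mu-law compander and grid step

def phi(x):
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def phi_inv(z):
    return np.sign(z) * np.expm1(np.abs(z) * np.log1p(MU)) / MU

def caq_zo_step(f, x, eta, rng):
    # One-grid-step Rademacher stencil in the companded domain.
    z = DELTA * np.round(phi(x) / DELTA)           # snap to the uniform grid
    r = rng.choice([-1.0, 1.0], size=x.shape)      # Rademacher direction
    x_plus = phi_inv(z + DELTA * r)                # both endpoints are exact
    x_minus = phi_inv(z - DELTA * r)               # codewords of Q by construction
    g = (f(x_plus) - f(x_minus)) / (2.0 * DELTA)   # two-point SPSA-style estimate
    return phi_inv(z - eta * g * r)                # update in z, map back to x

rng = np.random.default_rng(0)
f = lambda v: float(np.sum(v ** 2))                # toy smooth loss (assumed)
x = np.full(4, 0.3)
for _ in range(200):
    x = caq_zo_step(f, x, eta=0.05, rng=rng)
print(f"final loss: {f(x):.4f}")
```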

If this is right

  • Generic off-grid queries retain a Δ²/μ² residual channel in stationarity bounds.
  • CAQ-ZO achieves exactly zero query-time residual for the same nonuniform quantizer.
  • The approach improves fine-tuning results for NF4-quantized Qwen and Llama under fixed budget.
  • Query geometry is the key to predicting and controlling ZO behavior in quantized settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • This alignment technique may apply to other low-precision derivative-free methods.
  • It underscores the need to match query design to quantization geometry in hardware-constrained optimization.
  • Scalability to larger models and different quantizers remains to be explored in follow-up work.

Load-bearing premise

Nonuniform quantization can be exactly represented as the composition Q = φ^{-1} ∘ U ∘ φ, with the stationarity bounds holding for the NF4 quantizer in the experiments.
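
One hedged way to see why this premise is mild for scalar codebooks: a piecewise-linear φ routed through the codewords reproduces any nearest-codeword quantizer as φ^{-1} ∘ U ∘ φ, since z-space midpoint boundaries map back to x-space midpoints. The NF4-like codebook below is illustrative, not the actual NF4 table:

```python
import numpy as np

# Constructive check of the premise for scalar codebooks: route a
# piecewise-linear phi through (codeword_k, k*Delta); then
# phi^{-1} o U o phi reproduces nearest-codeword quantization exactly.
# The NF4-like codebook below is illustrative, not the actual NF4 table.
codebook = np.tanh(np.linspace(-2.0, 2.0, 16))  # nonuniform, denser near 0
DELTA = 1.0
grid = DELTA * np.arange(len(codebook))

phi = lambda x: np.interp(x, codebook, grid)     # piecewise-linear compander
phi_inv = lambda z: np.interp(z, grid, codebook)
Q = lambda x: phi_inv(DELTA * np.round(phi(x) / DELTA))

def nearest_codeword(x):
    # round-to-nearest against the codebook, midpoint decision boundaries
    return codebook[np.argmin(np.abs(codebook - x[:, None]), axis=1)]

x = np.random.default_rng(1).uniform(codebook[0], codebook[-1], 1000)
assert np.allclose(Q(x), nearest_codeword(x))
```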

What would settle it

A direct measurement of the estimator residual or stationarity gap on a controlled quantized problem, expecting the predicted nonzero channel for off-grid queries and zero for CAQ-ZO.
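
A hedged sketch of that settling experiment under the same illustrative μ-law model: measure the endpoint-rounding residual |Q(y) − y| for generic fixed-radius queries and for compander-aligned ones (the compander, grid step, and radius μ are assumptions):

```python
import numpy as np

# Sketch of the settling experiment: endpoint-rounding residual |Q(y) - y|
# for generic fixed-radius queries versus compander-aligned ones.
# The mu-law phi, grid step, and radius mu are illustrative assumptions.
MU, DELTA, mu = 255.0, 2.0 / 15, 0.02

phi = lambda x: np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
phi_inv = lambda z: np.sign(z) * np.expm1(np.abs(z) * np.log1p(MU)) / MU
Q = lambda x: phi_inv(DELTA * np.round(phi(x) / DELTA))

rng = np.random.default_rng(0)
x = rng.uniform(-0.9, 0.9, 10_000)
r = rng.choice([-1.0, 1.0], size=x.shape)

y_generic = x + mu * r                            # fixed weight-space radius
z = DELTA * np.round(phi(x) / DELTA)
y_caq = phi_inv(z + DELTA * r)                    # one grid step in z-space

print(f"generic mean residual: {np.abs(Q(y_generic) - y_generic).mean():.3e}")
print(f"CAQ mean residual:     {np.abs(Q(y_caq) - y_caq).mean():.3e}")
# Expected: the generic residual is on the order of the local cell width,
# while the CAQ residual is zero up to floating-point error.
```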

Figures

Figures reproduced from arXiv: 2605.10673 by Yao Shu, Zilin Zhu.

Figure 1. How grid-span mismatch becomes a ZO measurement distortion.

Figure 2. Start-matched synthetic convergence under matched nonuniform low-bit forward evaluation.

Figure 3. Query-time estimator residual under the shared synthetic setting.
Original abstract

Low-bit forward evaluation is an attractive route to memory-efficient zeroth-order (ZO) adaptation: the optimizer needs only scalar losses, and the model can be queried near deployment precision. The obstacle is that a quantized ZO query is not a continuous finite difference followed by harmless storage rounding. The query chooses endpoints, the low-precision engine rounds them, and the loss difference is measured along the rounded chord. For nonuniform companding quantizers, this makes the codebook insufficient to predict ZO behavior: a fixed weight-space radius can collapse in dense cells, over-span sparse cells, or assign a rounded chord to an unrounded update direction. We identify the missing object as query geometry and model scalar nonuniform quantization as $Q = \phi^{-1} \circ U \circ \phi$. CAQ-ZO (Compander-Aligned Queries for Zeroth-Order Optimization) forms one-grid-step Rademacher stencils $z \pm \Delta r$ in $z = \phi(x)$, maps endpoints back through $\phi^{-1}$, and updates in $z$. Our theory proves the grid-span mismatch, decomposes endpoint-rounding estimator residuals, and gives stationarity bounds in which generic off-grid queries retain a $\Delta^2/\mu^2$ residual channel while CAQ-ZO makes the query-time residual exactly zero. Synthetic experiments isolate this channel, and matched NF4 Qwen/Llama fine-tuning shows that CAQ-ZO improves the trained NF4 baseline under the same quantizer and evaluation budget.
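
The abstract's grid-span mismatch is easy to exhibit numerically. Under an illustrative μ-law compander (an assumption, not the paper's φ), the number of companded grid cells spanned by a fixed weight-space radius μ varies sharply with the base point:

```python
import numpy as np

# The abstract's grid-span mismatch, numerically: how many companded grid
# cells a fixed weight-space radius mu covers at different base points.
# The mu-law phi and step DELTA are illustrative stand-ins for the paper's.
MU, DELTA, mu = 255.0, 2.0 / 15, 0.02
phi = lambda x: np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

for x0 in (0.01, 0.1, 0.5):
    span = abs(phi(x0 + mu) - phi(x0 - mu)) / DELTA
    print(f"x0 = {x0:>4}: interval [x0-mu, x0+mu] spans {span:4.1f} grid cells")
# The same mu covers several cells at one base point and under one cell at
# another, so rounding can erase or redirect the perturbation entirely.
```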

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes CAQ-ZO for quantized zeroth-order optimization. It models scalar nonuniform quantization exactly as the composition Q = φ^{-1} ∘ U ∘ φ, identifies a grid-span mismatch in query geometry, decomposes endpoint-rounding residuals in the finite-difference estimator, and derives stationarity bounds showing that generic off-grid queries retain a Δ²/μ² residual channel while CAQ-ZO (one-grid-step Rademacher stencils in the companded domain) makes the query-time residual exactly zero. Synthetic experiments isolate the channel and matched NF4 fine-tuning on Qwen/Llama models reports gains over the quantized baseline under fixed quantizer and evaluation budget.

Significance. If the central claims hold, the work supplies a principled, low-overhead correction for quantization-induced bias in ZO gradient estimates that is directly applicable to memory-efficient adaptation of large models. The explicit decomposition of residuals and the parameter-free zero-residual guarantee under the stated model are technically clean contributions; the real-model NF4 experiments add practical weight. The approach could inform the design of future quantized ZO and related low-precision optimizers.

major comments (2)
  1. [Theory (stationarity bounds derivation)] The stationarity bounds and the claim that CAQ-ZO achieves exactly zero query-time residual are derived under the exact representation Q = φ^{-1} ∘ U ∘ φ. Practical NF4 (with block scaling, clipping, and non-ideal rounding) may introduce additional unmodeled terms in the finite-difference estimator; the manuscript must either prove that these terms remain negligible or bound their effect on the residual channel, as this assumption is load-bearing for the 'exactly zero' result.
  2. [Experiments] The experimental section reports that synthetic runs isolate the residual channel and that NF4 Qwen/Llama fine-tuning shows gains, yet the provided description lacks explicit error-bar statistics, number of independent runs, and data-exclusion criteria. Without these, it is impossible to confirm that the observed improvements are statistically robust and not sensitive to post-hoc choices.
minor comments (1)
  1. [Abstract and §2] Notation for the compander φ and the grid step Δ should be introduced with a single forward reference to the model equation to avoid repeated re-definition.
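
To make the first major comment concrete, here is a minimal sketch of the block-scaling step that sits outside the scalar model; the codebook and block size are illustrative, not the exact NF4 specification:

```python
import numpy as np

# Minimal sketch of the block-scaling step that practical NF4-style
# quantizers apply outside the scalar model Q = phi^{-1} o U o phi.
# Codebook and block size are illustrative, not the NF4 specification.
codebook = np.tanh(np.linspace(-2.0, 2.0, 16))

def block_quantize(w, block_size=64):
    out = np.empty_like(w)
    for i in range(0, len(w), block_size):
        block = w[i:i + block_size]
        scale = np.abs(block).max() or 1.0  # per-block absmax scale
        idx = np.argmin(np.abs(codebook - (block / scale)[:, None]), axis=1)
        out[i:i + block_size] = scale * codebook[idx]
    return out

# Perturbing one coordinate can change its block's absmax and thereby move
# every quantized value in the block: a coupling the scalar model omits.
w = np.random.default_rng(2).normal(0.0, 0.1, 256)
w_pert = w.copy()
w_pert[0] += 0.5
changed = np.sum(block_quantize(w) != block_quantize(w_pert))
print(f"{changed} coordinates changed after perturbing one")  # typically > 1
```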

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments highlight important aspects of the theoretical assumptions and experimental reporting. We address each major comment below, indicating the revisions we plan to incorporate.

point-by-point responses
  1. Referee: [Theory (stationarity bounds derivation)] The stationarity bounds and the claim that CAQ-ZO achieves exactly zero query-time residual are derived under the exact representation Q = φ^{-1} ∘ U ∘ φ. Practical NF4 (with block scaling, clipping, and non-ideal rounding) may introduce additional unmodeled terms in the finite-difference estimator; the manuscript must either prove that these terms remain negligible or bound their effect on the residual channel, as this assumption is load-bearing for the 'exactly zero' result.

    Authors: We agree that the stationarity bounds and the exact-zero residual guarantee are derived under the idealized model Q = φ^{-1} ∘ U ∘ φ, which captures the core nonuniform companding behavior. The manuscript already notes that this is an exact representation for the scalar quantizer without block scaling. For practical NF4, block-wise scaling, clipping, and non-ideal rounding introduce secondary perturbations. Our synthetic experiments isolate the grid-span mismatch residual under the model, while the NF4 fine-tuning results on Qwen and Llama demonstrate that CAQ-ZO still yields measurable gains over the quantized baseline under identical quantizer and budget. In the revision we will add a new subsection that (i) explicitly states the scope of the idealized model, (ii) derives a first-order bound showing that the additional residual terms from block scaling and clipping contribute at most O(Δ) to the estimator (rather than inflating the Δ²/μ² channel), and (iii) reports an empirical ablation on a small model confirming that these terms remain small relative to the compander-induced residual for typical NF4 block sizes. This addresses the load-bearing nature of the assumption without overstating the guarantee. revision: partial

  2. Referee: [Experiments] The experimental section reports that synthetic runs isolate the residual channel and that NF4 Qwen/Llama fine-tuning shows gains, yet the provided description lacks explicit error-bar statistics, number of independent runs, and data-exclusion criteria. Without these, it is impossible to confirm that the observed improvements are statistically robust and not sensitive to post-hoc choices.

    Authors: We acknowledge that the current manuscript does not report error bars, the number of independent runs, or data-exclusion criteria, which limits assessment of statistical robustness. In the revised version we will expand the experimental section to include: results averaged over 5 independent runs with standard-error bars for both synthetic and NF4 fine-tuning experiments; explicit statement that no data points or runs were excluded; and the random seeds used for reproducibility. The synthetic isolation experiments will additionally report variance across multiple random quantization grids. These additions will make the statistical claims verifiable. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation follows directly from explicit model and standard assumptions

full rationale

The paper adopts the quantization representation Q = φ^{-1} ∘ U ∘ φ as an explicit modeling assumption and derives the grid-span mismatch, residual decomposition, and stationarity bounds (including the Δ²/μ² channel for off-grid queries and exact zero for CAQ-ZO) from this model combined with standard ZO finite-difference analysis. CAQ-ZO is defined to place stencils on the uniform grid in the companded space z = φ(x), so the zero-residual property holds by direct substitution into the model rather than by fitting or self-referential closure. No load-bearing self-citations, no parameters fitted to data then relabeled as predictions, and no uniqueness theorems imported from prior author work. Synthetic experiments isolate the modeled channel while NF4 runs use the same quantizer family as the assumption; the chain is self-contained against the stated model.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only the abstract is available; the central modeling choice is the compander representation of quantization, treated here as a domain assumption rather than a fitted parameter.

axioms (1)
  • domain assumption Nonuniform quantization is exactly representable as Q = φ^{-1} ∘ U ∘ φ for some compander φ
    Invoked to derive the grid-span mismatch and residual decomposition for off-grid versus aligned queries.

pith-pipeline@v0.9.0 · 5571 in / 1322 out tokens · 44603 ms · 2026-05-12T03:48:05.307728+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 2 internal anchors

  1. [1]

    ZOQO: Zero-order quantized optimization

    Noga Bar and Raja Giryes. ZOQO: Zero-order quantized optimization. In Proc. ICASSP, 2025

  2. [2]

    Low-rank quantization-aware training for LLMs

    Yelysei Bondarenko, Riccardo Del Chiaro, and Markus Nagel. Low-rank quantization-aware training for LLMs. arXiv:2406.06385, arXiv, 2024

  3. [3]

    EfficientQAT: Efficient quantization-aware training for large language models

    Mengzhao Chen, Wenqi Shao, Peng Xu, Jiahao Wang, Peng Gao, Kaipeng Zhang, and Ping Luo. EfficientQAT: Efficient quantization-aware training for large language models. In Proc. ACL, 2025

  4. [4]

    Test-time model adaptation for quantized neural networks

    Zeshuai Deng, Guohao Chen, Shuaicheng Niu, Hui Luo, Shuhai Zhang, Yifan Yang, Renjie Chen, Wei Luo, and Mingkui Tan. Test-time model adaptation for quantized neural networks. In Proc. ACM MM, 2025

  5. [5]

    QLoRA: Efficient finetuning of quantized LLMs

    Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. QLoRA: Efficient finetuning of quantized LLMs. In Proc. NeurIPS, 2023

  6. [6]

    Optimal rates for zero-order convex optimization: The power of two function evaluations

    John C. Duchi, Michael I. Jordan, Martin J. Wainwright, and Andre Wibisono. Optimal rates for zero-order convex optimization: The power of two function evaluations. IEEE Trans. Inf. Theory, 61(5):2788–2806, 2015

  7. [7]

    Stepping forward on the last mile

    Chen Feng, Shaojie Zhuo, Xiaopeng Zhang, Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, and Andrew Zou Li. Stepping forward on the last mile. In Proc. NeurIPS, 2024

  8. [8]

    Stochastic zeroth-order gradient and Hessian estimators: Variance reduction and refined bias bounds

    Yasong Feng and Tianyu Wang. Stochastic zeroth-order gradient and Hessian estimators: Variance reduction and refined bias bounds. Inf. Inference, 12(3):1514–1545, 2023

  9. [9]

    OPTQ: Accurate quantization for generative pre-trained transformers

    Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. OPTQ: Accurate quantization for generative pre-trained transformers. In Proc. ICLR, 2023

  10. [10]

    Stochastic first- and zeroth-order methods for nonconvex stochastic programming

    Saeed Ghadimi and Guanghui Lan. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim., 23(4):2341–2368, 2013

  11. [11]

    Quantization

    Robert M. Gray and David L. Neuhoff. Quantization. IEEE Trans. Inf. Theory, 44(6):2325–2383, 1998

  12. [12]

    LoRA: Low-rank adaptation of large language models

    Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In Proc. ICLR, 2022

  13. [13]

    G.711: Pulse code modulation (PCM) of voice frequencies

    ITU-T. G.711: Pulse code modulation (PCM) of voice frequencies. Recommendation ITU-T G.711, International Telecommunication Union, 1988

  14. [14]

    Quantization and training of neural networks for efficient integer-arithmetic-only inference

    Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proc. CVPR, 2018

  15. [15]

    Digital Coding of Waveforms: Principles and Applications to Speech and Video

    N. S. Jayant and Peter Noll. Digital Coding of Waveforms: Principles and Applications to Speech and Video. Prentice-Hall, 1984

  16. [16]

    LoftQ: LoRA-fine-tuning-aware quantization for large language models

    Yixiao Li, Yifan Yu, Chen Liang, Nikos Karampatziakis, Pengcheng He, Weizhu Chen, and Tuo Zhao. LoftQ: LoRA-fine-tuning-aware quantization for large language models. In Proc. ICLR, 2024

  17. [17]

    AWQ: Activation-aware weight quantization for on-device LLM compression and acceleration

    Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han. AWQ: Activation-aware weight quantization for on-device LLM compression and acceleration. In Proc. MLSys, 2024

  18. [18]

    Least squares quantization in PCM

    Stuart P. Lloyd. Least squares quantization in PCM. IEEE Trans. Inf. Theory, 28(2):129–137, 1982

  19. [19]

    Fine-tuning language models with just forward passes

    Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D. Lee, Danqi Chen, and Sanjeev Arora. Fine-tuning language models with just forward passes. In Proc. NeurIPS, 2023

  20. [20]

    Quantizing for minimum distortion

    Joel Max. Quantizing for minimum distortion. IRE Trans. Inf. Theory, 6(1):7–12, 1960

  21. [21]

    Random gradient-free minimization of convex functions

    Yurii Nesterov and Vladimir Spokoiny. Random gradient-free minimization of convex functions. Found. Comput. Math., 17(2):527–566, 2017

  22. [22]

    Qwen2.5 Technical Report

    Qwen Team. Qwen2.5 technical report. arXiv:2412.15115, arXiv, 2025

  23. [23]

    An optimal algorithm for bandit and zero-order convex optimization with two-point feedback

    Ohad Shamir. An optimal algorithm for bandit and zero-order convex optimization with two-point feedback. JMLR, 18(52):1–11, 2017

  24. [24]

    Fine-tuning quantized neural networks with zeroth-order optimization

    Sifeng Shang, Jiayi Zhou, Chenyu Lin, Minxian Li, and Kaiyang Zhou. Fine-tuning quantized neural networks with zeroth-order optimization. In Proc. ICLR, 2026

  25. [25]

    Multivariate stochastic approximation using a simultaneous perturbation gradient approximation

    James C. Spall. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Trans. Automat. Control, 37(3):332–341, 1992

  26. [26]

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Harts...

  27. [27]

    Quantized evolution strategies: High-precision fine-tuning of quantized LLMs at low-precision cost

    Yinggan Xu, Risto Miikkulainen, and Xin Qiu. Quantized evolution strategies: High-precision fine-tuning of quantized LLMs at low-precision cost. arXiv:2602.03120, arXiv, 2026

  28. [28]

    Poor man’s training on MCUs: A memory-efficient quantized back-propagation-free approach

    Yequan Zhao, Hai Li, Ian Young, and Zheng Zhang. Poor man’s training on MCUs: A memory-efficient quantized back-propagation-free approach. arXiv:2411.05873, arXiv, 2024

  29. [29]

    QuZO: Quantized zeroth-order fine-tuning for large language models

    Jiajun Zhou, Yifan Yang, Kai Zhen, Ziyue Liu, Yequan Zhao, Ershad Banijamali, Athanasios Mouchtaris, Ngai Wong, and Zheng Zhang. QuZO: Quantized zeroth-order fine-tuning for large language models. In Proc. EMNLP, 2025