pith. sign in

arxiv: 2605.18035 · v1 · pith:DL6WGVJEnew · submitted 2026-05-18 · 💻 cs.AI · cs.LG

New Insight of Variance reduce in Zero-Order Hard-Thresholding: Mitigating Gradient Error and Expansivity Contradictions

Pith reviewed 2026-05-20 11:15 UTC · model grok-4.3

classification 💻 cs.AI cs.LG
keywords zero-order optimizationhard-thresholdingvariance reductionl0 sparsitysparse optimizationconvergence analysisblack-box optimizationgradient-free methods
0
0 comments X

The pith

Variance reduction in zero-order hard-thresholding removes restrictions on the number of random directions for gradient estimation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Hard-thresholding solves l0-constrained optimization problems but often requires zero-order gradient approximations when true gradients are unavailable. The prior SZOHT algorithm was limited by a conflict between the deviation in those approximations and the expansivity of the hard-thresholding operator. This paper reframes variance reduction as a tool to ease that specific conflict. It introduces a generalized variance-reduced zero-order hard-thresholding algorithm whose analysis shows arbitrary numbers of random directions are now feasible. The result is faster convergence and applicability to problems such as ridge regression and black-box adversarial attacks.

Core claim

The paper establishes that variance reduction mitigates the contradictions between zero-order gradient error and the expansivity of the hard-thresholding operator, enabling a generalized algorithm that eliminates the prior restriction on the number of random directions and delivers improved convergence rates under standard assumptions for zero-order methods.

What carries the argument

Generalized variance-reduced zero-order hard-thresholding algorithm that treats variance reduction as a resolver of gradient deviation versus operator expansivity conflicts.

If this is right

  • Improved convergence rates compared with SZOHT
  • No restrictions on the number of random directions in gradient estimation
  • Broader applicability to sparse optimization problems
  • Demonstrated utility on ridge regression and black-box adversarial attacks

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same variance-reduction perspective might address analogous conflicts in other zero-order methods that use non-smooth projection operators.
  • Applying the algorithm to higher-dimensional or non-convex sparse problems could expose further scalability advantages.
  • The insight suggests variance control as a general lever for making gradient-free methods viable in additional constrained black-box settings.

Load-bearing premise

The convergence analysis depends on standard assumptions for zero-order gradient estimates and hard-thresholding operators holding without further restrictions.

What would settle it

Experiments that increase the number of random directions past the SZOHT limit and measure whether the new algorithm still exhibits the claimed faster convergence on the ridge regression task would settle the central claim.

Figures

Figures reproduced from arXiv: 2605.18035 by 2) ((1) Harbin Institute of Technology, (2) Mohamed bin Zayed University of Artificial Intelligence, 3), (3) Jilin University), Bin Gu (2, Huan Xiong (1, William de Vazelhes (2), Xinzhe Yuan (1).

Figure 1
Figure 1. Figure 1: Motivation of our algorithm. enhances the algorithm’s convergence but also expands its utility across a wider range of scenarios. 3. General Analysis: The introduction of a general analysis framework is another contribution to the paper. This framework systematically evaluates the performance and behavior of varying variance reduced algorithms under ℓ0-constraint and ZO gradient. 2 PRELIMINARIES Throughout… view at source ↗
Figure 2
Figure 2. Figure 2: #IZO and #NHT on the ridge regression task. [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: #IZO and #NHT on the few pixels adv. attacks (CIFAR-10), for the original class ’airplane’. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: #IZO (up) and #NHT (down) on the ridge regression task, synthetic example (k=2). [PITH_FULL_IMAGE:figures/full_fig_p028_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: #IZO (up) and #NHT (down) on the ridge regression task, synthetic example (k=3). [PITH_FULL_IMAGE:figures/full_fig_p028_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: #IZO (up) and #NHT (down) on the ridge regression task, synthetic example (k=4). [PITH_FULL_IMAGE:figures/full_fig_p029_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: #IZO (up) and #NHT (down) on the ridge regression task, [PITH_FULL_IMAGE:figures/full_fig_p030_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: #IZO (up) and #NHT (down) on the ridge regression task, [PITH_FULL_IMAGE:figures/full_fig_p030_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: #IZO and #NHT on the few pixels adversarial attacks task (CIFAR-10), for the original [PITH_FULL_IMAGE:figures/full_fig_p031_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: #IZO and #NHT on the few pixels adversarial attacks task (CIFAR-10), for the original [PITH_FULL_IMAGE:figures/full_fig_p032_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: #IZO and #NHT on the few pixels adversarial attacks task (CIFAR-10), for the original [PITH_FULL_IMAGE:figures/full_fig_p033_11.png] view at source ↗
read the original abstract

Hard-thresholding is an important type of algorithm in machine learning that is used to solve $\ell_0$ constrained optimization problems. However, the true gradient of the objective function can be difficult to access in certain scenarios, which normally can be approximated by zeroth-order (ZO) methods. The SZOHT algorithm is the only algorithm tackling $\ell_0$ sparsity constraints with ZO gradients so far. Unfortunately, SZOHT has a notable limitation on the number of random directions % in ZO gradients due to the inherent conflict between the deviation of ZO gradients and the expansivity of the hard-thresholding operator. This paper approaches this problem by considering the role of variance and provides a new insight into variance reduction: mitigating the unique conflicts between ZO gradients and hard-thresholding. Under this perspective, we propose a generalized variance reduced ZO hard-thresholding algorithm as well as the generalized convergence analysis under standard assumptions. The theoretical results demonstrate the new algorithm eliminates the restrictions on the number of random directions, leading to improved convergence rates and broader applicability compared with SZOHT. Finally, we illustrate the utility of our method on a ridge regression problem as well as black-box adversarial attacks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes a variance-reduced zero-order hard-thresholding (VR-ZOHT) algorithm for solving ℓ₀-constrained optimization problems using only zeroth-order gradient estimates. It identifies the core limitation in the prior SZOHT method as a conflict between the deviation of the ZO estimator and the expansivity of the hard-thresholding operator, which imposed a lower bound on the number of random directions q. By incorporating a variance-reduction term, the new method damps the effective gradient error inside the non-expansive analysis of the composite iteration, yielding a generalized convergence analysis under standard ZO assumptions (smoothness, bounded estimator variance, and hard-thresholding properties) that removes the restriction on q and improves the convergence rate. Experiments on ridge regression and black-box adversarial attacks are reported to support the theoretical gains.

Significance. If the derivations hold, the work meaningfully extends the practical reach of zeroth-order methods to sparse optimization by eliminating a key computational restriction on q, which directly reduces per-iteration cost while preserving or improving rates. The insight that variance reduction can resolve the specific ZO-HT expansivity tension is a targeted contribution, and the provision of a generalized analysis together with consistent empirical results on ridge regression and adversarial attacks adds value. The paper correctly builds on cited prior work without introducing hidden parameter dependence in the final rate.

minor comments (3)
  1. Title: 'Variance reduce' is grammatically incorrect and should read 'Variance Reduction'.
  2. Abstract and §1: The claim that the new algorithm 'eliminates the restrictions on the number of random directions' would be strengthened by an explicit side-by-side statement of the old and new lower bounds on q (or the absence thereof) immediately after the main theorem.
  3. Experiments section: The black-box attack results would benefit from reporting attack success rate and query complexity in a single table with SZOHT and at least one additional ZO baseline for direct comparison.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, recognition of the significance of our variance-reduction insight for resolving the ZO-HT expansivity tension, and the recommendation of minor revision. The referee's description of the VR-ZOHT algorithm and its generalized analysis accurately reflects the manuscript.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper introduces a variance-reduced zero-order hard-thresholding algorithm and derives a generalized convergence analysis under standard ZO smoothness, bounded variance, and hard-thresholding properties. The key step—using the variance-reduction term to dampen gradient error so that the expansivity of the hard-thresholding operator no longer forces a lower bound on the number of random directions q—follows directly from the composite iteration analysis without reducing to a fitted parameter, self-definition, or load-bearing self-citation. The final rate expression contains no hidden dependence on q that is presupposed by the inputs, and the result is externally falsifiable via the stated assumptions and experiments. No steps match the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review based solely on abstract; no explicit free parameters or invented entities are described. The analysis invokes standard assumptions for convergence.

axioms (1)
  • domain assumption standard assumptions for convergence of variance-reduced zero-order methods and hard-thresholding
    Explicitly referenced in the abstract for the generalized convergence analysis.

pith-pipeline@v0.9.0 · 5800 in / 1152 out tokens · 45982 ms · 2026-05-20T11:15:25.623887+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 2 internal anchors

  1. [1]

    International Conference on Artificial Intelligence and Statistics , pages=

    Stability and risk bounds of iterative hard thresholding , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2021 , organization=

  2. [2]

    2009 , publisher=

    Learning multiple layers of features from tiny images , author=. 2009 , publisher=

  3. [3]

    2011 , publisher=

    Statistics for high-dimensional data: methods, theory and applications , author=. 2011 , publisher=

  4. [4]

    IEEE transactions on information theory , volume=

    Minimax rates of estimation for high-dimensional linear regression over _q -balls , author=. IEEE transactions on information theory , volume=. 2011 , publisher=

  5. [5]

    Proceedings of the AAAI Conference on Artificial Intelligence , volume=

    Faster gradient-free proximal stochastic methods for nonconvex nonsmooth optimization , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

  6. [6]

    A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers , author=

  7. [7]

    Applied and computational harmonic analysis , volume=

    Iterative hard thresholding for compressed sensing , author=. Applied and computational harmonic analysis , volume=. 2009 , publisher=

  8. [8]

    Advances in neural information processing systems , volume=

    On iterative hard thresholding methods for high-dimensional m-estimation , author=. Advances in neural information processing systems , volume=

  9. [9]

    IEEE Transactions on Information Theory , volume=

    Linear convergence of stochastic iterative greedy algorithms with sparse constraints , author=. IEEE Transactions on Information Theory , volume=. 2017 , publisher=

  10. [10]

    , author=

    Gradient Hard Thresholding Pursuit. , author=. J. Mach. Learn. Res. , volume=

  11. [11]

    Advances in Neural Information Processing Systems , volume=

    Efficient stochastic gradient hard thresholding , author=. Advances in Neural Information Processing Systems , volume=

  12. [12]

    The Journal of Machine Learning Research , volume=

    A tight bound of hard thresholding , author=. The Journal of Machine Learning Research , volume=. 2017 , publisher=

  13. [13]

    Nonconvex Sparse Learning via Stochastic Optimization with Progressive Variance Reduction

    Nonconvex sparse learning via stochastic optimization with progressive variance reduction , author=. arXiv preprint arXiv:1605.02711 , year=

  14. [14]

    Foundations and Trends

    Graphical models, exponential families, and variational inference , author=. Foundations and Trends. 2008 , publisher=

  15. [15]

    Mathematics , year=

    Variance Reduced Stochastic Gradient Descent with Neighbors , author=. Mathematics , year=

  16. [16]

    The Journal of Machine Learning Research , volume=

    An optimal algorithm for bandit and zero-order convex optimization with two-point feedback , author=. The Journal of Machine Learning Research , volume=. 2017 , publisher=

  17. [17]

    Proceedings of the AAAI Conference on Artificial Intelligence , number=

    Autozoom: Autoencoder-based zeroth order optimization method for attacking black-box neural networks , author=. Proceedings of the AAAI Conference on Artificial Intelligence , number=

  18. [18]

    Proceedings of the 10th ACM workshop on artificial intelligence and security , pages=

    Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models , author=. Proceedings of the 10th ACM workshop on artificial intelligence and security , pages=

  19. [19]

    Advances in neural information processing systems , volume=

    Zo-adamm: Zeroth-order adaptive momentum method for black-box optimization , author=. Advances in neural information processing systems , volume=

  20. [20]

    Evolution Strategies as a Scalable Alternative to Reinforcement Learning

    Evolution strategies as a scalable alternative to reinforcement learning , author=. arXiv preprint arXiv:1703.03864 , year=

  21. [21]

    Advances in Neural Information Processing Systems , volume=

    Simple random search of static linear policies is competitive for reinforcement learning , author=. Advances in Neural Information Processing Systems , volume=

  22. [22]

    Conference on Robot Learning , pages=

    Provably robust blackbox optimization for reinforcement learning , author=. Conference on Robot Learning , pages=. 2020 , organization=

  23. [23]

    Foundations of Computational Mathematics , volume=

    Random gradient-free minimization of convex functions , author=. Foundations of Computational Mathematics , volume=. 2017 , publisher=

  24. [24]

    Advances in Neural Information Processing Systems , volume=

    Zeroth-order stochastic variance reduction for nonconvex optimization , author=. Advances in Neural Information Processing Systems , volume=

  25. [25]

    Advances in Neural Information Processing Systems , volume=

    Zeroth-order (non)-convex stochastic optimization via conditional gradient and gradient updates , author=. Advances in Neural Information Processing Systems , volume=

  26. [26]

    Journal of the American statistical Association , volume=

    Variable selection via nonconcave penalized likelihood and its oracle properties , author=. Journal of the American statistical Association , volume=. 2001 , publisher=

  27. [27]

    Nearly unbiased variable selection under minimax concave penalty , author=

  28. [28]

    The Journal of Machine Learning Research , volume=

    A unified q-memorization framework for asynchronous stochastic optimization , author=. The Journal of Machine Learning Research , volume=. 2020 , publisher=

  29. [29]

    Expansivity , author=

    Zeroth-Order Hard-Thresholding: Gradient Error vs. Expansivity , author=. Advances in Neural Information Processing Systems , volume=

  30. [30]

    International Conference on Machine Learning , pages=

    Gradient hard thresholding pursuit for sparsity-constrained optimization , author=. International Conference on Machine Learning , pages=. 2014 , organization=

  31. [31]

    Mathematical Programming , volume=

    Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization , author=. Mathematical Programming , volume=

  32. [32]

    Zeroth-order (Non)-Convex Stochastic Optimization via Conditional Gradient and Gradient Updates , author=

  33. [33]

    Journal of Scientific Computing , volume=

    On the Information-Adaptive Variants of the ADMM: An Iteration Complexity Perspective , author=. Journal of Scientific Computing , volume=

  34. [34]

    arXiv preprint arXiv:1705.09554 , year=

    Analysis of universal adversarial perturbations , author=. arXiv preprint arXiv:1705.09554 , year=

  35. [35]

    2017 ieee symposium on security and privacy (sp) , pages=

    Towards evaluating the robustness of neural networks , author=. 2017 ieee symposium on security and privacy (sp) , pages=. 2017 , organization=

  36. [36]

    UCI Machine Learning Repository

    Dua, Dheeru and Graff, Casey. UCI Machine Learning Repository. 2017

  37. [37]

    the Journal of machine Learning research , volume=

    Scikit-learn: Machine learning in Python , author=. the Journal of machine Learning research , volume=. 2011 , publisher=

  38. [38]

    ACM SIGKDD Explorations Newsletter , volume=

    OpenML: networked science in machine learning , author=. ACM SIGKDD Explorations Newsletter , volume=. 2014 , publisher=

  39. [39]

    Breheny, Patrick , year=

  40. [40]

    Proceedings of the National Academy of Sciences , volume=

    Genotypic predictors of human immunodeficiency virus type 1 drug resistance , author=. Proceedings of the National Academy of Sciences , volume=. 2006 , publisher=

  41. [41]

    Advances in neural information processing systems , volume=

    SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives , author=. Advances in neural information processing systems , volume=

  42. [42]

    International conference on machine learning , pages=

    SARAH: A novel method for machine learning problems using stochastic recursive gradient , author=. International conference on machine learning , pages=. 2017 , organization=

  43. [43]

    Advances in neural information processing systems , volume=

    Accelerating stochastic gradient descent using predictive variance reduction , author=. Advances in neural information processing systems , volume=