New Insight of Variance reduce in Zero-Order Hard-Thresholding: Mitigating Gradient Error and Expansivity Contradictions

2) ((1) Harbin Institute of Technology; (2) Mohamed bin Zayed University of Artificial Intelligence; 3); (3) Jilin University); Bin Gu (2; Huan Xiong (1; William de Vazelhes (2); Xinzhe Yuan (1)

arxiv: 2605.18035 · v1 · pith:DL6WGVJEnew · submitted 2026-05-18 · 💻 cs.AI · cs.LG

New Insight of Variance reduce in Zero-Order Hard-Thresholding: Mitigating Gradient Error and Expansivity Contradictions

Xinzhe Yuan (1) , William de Vazelhes (2) , Bin Gu (2 , 3) , Huan Xiong (1 , 2) ((1) Harbin Institute of Technology , (2) Mohamed bin Zayed University of Artificial Intelligence , (3) Jilin University) This is my paper

Pith reviewed 2026-05-20 11:15 UTC · model grok-4.3

classification 💻 cs.AI cs.LG

keywords zero-order optimizationhard-thresholdingvariance reductionl0 sparsitysparse optimizationconvergence analysisblack-box optimizationgradient-free methods

0 comments

The pith

Variance reduction in zero-order hard-thresholding removes restrictions on the number of random directions for gradient estimation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Hard-thresholding solves l0-constrained optimization problems but often requires zero-order gradient approximations when true gradients are unavailable. The prior SZOHT algorithm was limited by a conflict between the deviation in those approximations and the expansivity of the hard-thresholding operator. This paper reframes variance reduction as a tool to ease that specific conflict. It introduces a generalized variance-reduced zero-order hard-thresholding algorithm whose analysis shows arbitrary numbers of random directions are now feasible. The result is faster convergence and applicability to problems such as ridge regression and black-box adversarial attacks.

Core claim

The paper establishes that variance reduction mitigates the contradictions between zero-order gradient error and the expansivity of the hard-thresholding operator, enabling a generalized algorithm that eliminates the prior restriction on the number of random directions and delivers improved convergence rates under standard assumptions for zero-order methods.

What carries the argument

Generalized variance-reduced zero-order hard-thresholding algorithm that treats variance reduction as a resolver of gradient deviation versus operator expansivity conflicts.

If this is right

Improved convergence rates compared with SZOHT
No restrictions on the number of random directions in gradient estimation
Broader applicability to sparse optimization problems
Demonstrated utility on ridge regression and black-box adversarial attacks

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same variance-reduction perspective might address analogous conflicts in other zero-order methods that use non-smooth projection operators.
Applying the algorithm to higher-dimensional or non-convex sparse problems could expose further scalability advantages.
The insight suggests variance control as a general lever for making gradient-free methods viable in additional constrained black-box settings.

Load-bearing premise

The convergence analysis depends on standard assumptions for zero-order gradient estimates and hard-thresholding operators holding without further restrictions.

What would settle it

Experiments that increase the number of random directions past the SZOHT limit and measure whether the new algorithm still exhibits the claimed faster convergence on the ridge regression task would settle the central claim.

Figures

Figures reproduced from arXiv: 2605.18035 by 2) ((1) Harbin Institute of Technology, (2) Mohamed bin Zayed University of Artificial Intelligence, 3), (3) Jilin University), Bin Gu (2, Huan Xiong (1, William de Vazelhes (2), Xinzhe Yuan (1).

**Figure 1.** Figure 1: Motivation of our algorithm. enhances the algorithm’s convergence but also expands its utility across a wider range of scenarios. 3. General Analysis: The introduction of a general analysis framework is another contribution to the paper. This framework systematically evaluates the performance and behavior of varying variance reduced algorithms under ℓ0-constraint and ZO gradient. 2 PRELIMINARIES Throughout… view at source ↗

**Figure 2.** Figure 2: #IZO and #NHT on the ridge regression task. [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

**Figure 3.** Figure 3: #IZO and #NHT on the few pixels adv. attacks (CIFAR-10), for the original class ’airplane’. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: #IZO (up) and #NHT (down) on the ridge regression task, synthetic example (k=2). [PITH_FULL_IMAGE:figures/full_fig_p028_4.png] view at source ↗

**Figure 5.** Figure 5: #IZO (up) and #NHT (down) on the ridge regression task, synthetic example (k=3). [PITH_FULL_IMAGE:figures/full_fig_p028_5.png] view at source ↗

**Figure 6.** Figure 6: #IZO (up) and #NHT (down) on the ridge regression task, synthetic example (k=4). [PITH_FULL_IMAGE:figures/full_fig_p029_6.png] view at source ↗

**Figure 7.** Figure 7: #IZO (up) and #NHT (down) on the ridge regression task, [PITH_FULL_IMAGE:figures/full_fig_p030_7.png] view at source ↗

**Figure 8.** Figure 8: #IZO (up) and #NHT (down) on the ridge regression task, [PITH_FULL_IMAGE:figures/full_fig_p030_8.png] view at source ↗

**Figure 9.** Figure 9: #IZO and #NHT on the few pixels adversarial attacks task (CIFAR-10), for the original [PITH_FULL_IMAGE:figures/full_fig_p031_9.png] view at source ↗

**Figure 10.** Figure 10: #IZO and #NHT on the few pixels adversarial attacks task (CIFAR-10), for the original [PITH_FULL_IMAGE:figures/full_fig_p032_10.png] view at source ↗

**Figure 11.** Figure 11: #IZO and #NHT on the few pixels adversarial attacks task (CIFAR-10), for the original [PITH_FULL_IMAGE:figures/full_fig_p033_11.png] view at source ↗

read the original abstract

Hard-thresholding is an important type of algorithm in machine learning that is used to solve $\ell_0$ constrained optimization problems. However, the true gradient of the objective function can be difficult to access in certain scenarios, which normally can be approximated by zeroth-order (ZO) methods. The SZOHT algorithm is the only algorithm tackling $\ell_0$ sparsity constraints with ZO gradients so far. Unfortunately, SZOHT has a notable limitation on the number of random directions % in ZO gradients due to the inherent conflict between the deviation of ZO gradients and the expansivity of the hard-thresholding operator. This paper approaches this problem by considering the role of variance and provides a new insight into variance reduction: mitigating the unique conflicts between ZO gradients and hard-thresholding. Under this perspective, we propose a generalized variance reduced ZO hard-thresholding algorithm as well as the generalized convergence analysis under standard assumptions. The theoretical results demonstrate the new algorithm eliminates the restrictions on the number of random directions, leading to improved convergence rates and broader applicability compared with SZOHT. Finally, we illustrate the utility of our method on a ridge regression problem as well as black-box adversarial attacks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Variance reduction removes the lower bound on random directions in zero-order hard-thresholding with a generalized analysis that improves rates over SZOHT.

read the letter

Hi, The one thing to know is that this paper uses variance reduction to fix the conflict in zero-order hard-thresholding that forced a minimum number of random directions in the earlier SZOHT method. The new algorithm and its analysis remove that restriction on q. They add a variance-reduced estimator to the ZO gradient and show in the analysis that this damps the error term enough for the hard-thresholding operator's expansivity not to require a large q anymore. The convergence rates improve and the method applies more broadly under the standard smoothness and variance bounds. That part reads cleanly and builds directly on the prior work without circular steps or hidden dependencies. The experiments back it with results on ridge regression and black-box attacks, showing faster convergence as expected from the theory. Where it could be stronger is in the practical side. The gains assume you can actually reduce the variance effectively, which might be trickier in very high-dimensional or noisy black-box settings than the theory lets on. The experiments are solid for what they are but could include more varied baselines or scaling tests to make the advantage clearer. Still, nothing major is off here. This paper is for optimization researchers focused on zero-order methods with sparsity constraints. A reader who knows the SZOHT paper will get the most out of the generalization and the new proof strategy. It deserves serious referee time because the central claim about removing the restriction holds up and the work is a useful incremental advance in this area. I would recommend sending it for peer review. Best,

Referee Report

0 major / 3 minor

Summary. The manuscript proposes a variance-reduced zero-order hard-thresholding (VR-ZOHT) algorithm for solving ℓ₀-constrained optimization problems using only zeroth-order gradient estimates. It identifies the core limitation in the prior SZOHT method as a conflict between the deviation of the ZO estimator and the expansivity of the hard-thresholding operator, which imposed a lower bound on the number of random directions q. By incorporating a variance-reduction term, the new method damps the effective gradient error inside the non-expansive analysis of the composite iteration, yielding a generalized convergence analysis under standard ZO assumptions (smoothness, bounded estimator variance, and hard-thresholding properties) that removes the restriction on q and improves the convergence rate. Experiments on ridge regression and black-box adversarial attacks are reported to support the theoretical gains.

Significance. If the derivations hold, the work meaningfully extends the practical reach of zeroth-order methods to sparse optimization by eliminating a key computational restriction on q, which directly reduces per-iteration cost while preserving or improving rates. The insight that variance reduction can resolve the specific ZO-HT expansivity tension is a targeted contribution, and the provision of a generalized analysis together with consistent empirical results on ridge regression and adversarial attacks adds value. The paper correctly builds on cited prior work without introducing hidden parameter dependence in the final rate.

minor comments (3)

Title: 'Variance reduce' is grammatically incorrect and should read 'Variance Reduction'.
Abstract and §1: The claim that the new algorithm 'eliminates the restrictions on the number of random directions' would be strengthened by an explicit side-by-side statement of the old and new lower bounds on q (or the absence thereof) immediately after the main theorem.
Experiments section: The black-box attack results would benefit from reporting attack success rate and query complexity in a single table with SZOHT and at least one additional ZO baseline for direct comparison.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, recognition of the significance of our variance-reduction insight for resolving the ZO-HT expansivity tension, and the recommendation of minor revision. The referee's description of the VR-ZOHT algorithm and its generalized analysis accurately reflects the manuscript.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper introduces a variance-reduced zero-order hard-thresholding algorithm and derives a generalized convergence analysis under standard ZO smoothness, bounded variance, and hard-thresholding properties. The key step—using the variance-reduction term to dampen gradient error so that the expansivity of the hard-thresholding operator no longer forces a lower bound on the number of random directions q—follows directly from the composite iteration analysis without reducing to a fitted parameter, self-definition, or load-bearing self-citation. The final rate expression contains no hidden dependence on q that is presupposed by the inputs, and the result is externally falsifiable via the stated assumptions and experiments. No steps match the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review based solely on abstract; no explicit free parameters or invented entities are described. The analysis invokes standard assumptions for convergence.

axioms (1)

domain assumption standard assumptions for convergence of variance-reduced zero-order methods and hard-thresholding
Explicitly referenced in the abstract for the generalized convergence analysis.

pith-pipeline@v0.9.0 · 5800 in / 1152 out tokens · 45982 ms · 2026-05-20T11:15:25.623887+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 2 internal anchors

[1]

International Conference on Artificial Intelligence and Statistics , pages=

Stability and risk bounds of iterative hard thresholding , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2021 , organization=

work page 2021
[2]

2009 , publisher=

Learning multiple layers of features from tiny images , author=. 2009 , publisher=

work page 2009
[3]

2011 , publisher=

Statistics for high-dimensional data: methods, theory and applications , author=. 2011 , publisher=

work page 2011
[4]

IEEE transactions on information theory , volume=

Minimax rates of estimation for high-dimensional linear regression over _q -balls , author=. IEEE transactions on information theory , volume=. 2011 , publisher=

work page 2011
[5]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Faster gradient-free proximal stochastic methods for nonconvex nonsmooth optimization , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page
[6]

A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers , author=

work page
[7]

Applied and computational harmonic analysis , volume=

Iterative hard thresholding for compressed sensing , author=. Applied and computational harmonic analysis , volume=. 2009 , publisher=

work page 2009
[8]

Advances in neural information processing systems , volume=

On iterative hard thresholding methods for high-dimensional m-estimation , author=. Advances in neural information processing systems , volume=

work page
[9]

IEEE Transactions on Information Theory , volume=

Linear convergence of stochastic iterative greedy algorithms with sparse constraints , author=. IEEE Transactions on Information Theory , volume=. 2017 , publisher=

work page 2017
[10]

, author=

Gradient Hard Thresholding Pursuit. , author=. J. Mach. Learn. Res. , volume=

work page
[11]

Advances in Neural Information Processing Systems , volume=

Efficient stochastic gradient hard thresholding , author=. Advances in Neural Information Processing Systems , volume=

work page
[12]

The Journal of Machine Learning Research , volume=

A tight bound of hard thresholding , author=. The Journal of Machine Learning Research , volume=. 2017 , publisher=

work page 2017
[13]

Nonconvex Sparse Learning via Stochastic Optimization with Progressive Variance Reduction

Nonconvex sparse learning via stochastic optimization with progressive variance reduction , author=. arXiv preprint arXiv:1605.02711 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[14]

Foundations and Trends

Graphical models, exponential families, and variational inference , author=. Foundations and Trends. 2008 , publisher=

work page 2008
[15]

Mathematics , year=

Variance Reduced Stochastic Gradient Descent with Neighbors , author=. Mathematics , year=

work page
[16]

The Journal of Machine Learning Research , volume=

An optimal algorithm for bandit and zero-order convex optimization with two-point feedback , author=. The Journal of Machine Learning Research , volume=. 2017 , publisher=

work page 2017
[17]

Proceedings of the AAAI Conference on Artificial Intelligence , number=

Autozoom: Autoencoder-based zeroth order optimization method for attacking black-box neural networks , author=. Proceedings of the AAAI Conference on Artificial Intelligence , number=

work page
[18]

Proceedings of the 10th ACM workshop on artificial intelligence and security , pages=

Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models , author=. Proceedings of the 10th ACM workshop on artificial intelligence and security , pages=

work page
[19]

Advances in neural information processing systems , volume=

Zo-adamm: Zeroth-order adaptive momentum method for black-box optimization , author=. Advances in neural information processing systems , volume=

work page
[20]

Evolution Strategies as a Scalable Alternative to Reinforcement Learning

Evolution strategies as a scalable alternative to reinforcement learning , author=. arXiv preprint arXiv:1703.03864 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[21]

Advances in Neural Information Processing Systems , volume=

Simple random search of static linear policies is competitive for reinforcement learning , author=. Advances in Neural Information Processing Systems , volume=

work page
[22]

Conference on Robot Learning , pages=

Provably robust blackbox optimization for reinforcement learning , author=. Conference on Robot Learning , pages=. 2020 , organization=

work page 2020
[23]

Foundations of Computational Mathematics , volume=

Random gradient-free minimization of convex functions , author=. Foundations of Computational Mathematics , volume=. 2017 , publisher=

work page 2017
[24]

Advances in Neural Information Processing Systems , volume=

Zeroth-order stochastic variance reduction for nonconvex optimization , author=. Advances in Neural Information Processing Systems , volume=

work page
[25]

Advances in Neural Information Processing Systems , volume=

Zeroth-order (non)-convex stochastic optimization via conditional gradient and gradient updates , author=. Advances in Neural Information Processing Systems , volume=

work page
[26]

Journal of the American statistical Association , volume=

Variable selection via nonconcave penalized likelihood and its oracle properties , author=. Journal of the American statistical Association , volume=. 2001 , publisher=

work page 2001
[27]

Nearly unbiased variable selection under minimax concave penalty , author=

work page
[28]

The Journal of Machine Learning Research , volume=

A unified q-memorization framework for asynchronous stochastic optimization , author=. The Journal of Machine Learning Research , volume=. 2020 , publisher=

work page 2020
[29]

Expansivity , author=

Zeroth-Order Hard-Thresholding: Gradient Error vs. Expansivity , author=. Advances in Neural Information Processing Systems , volume=

work page
[30]

International Conference on Machine Learning , pages=

Gradient hard thresholding pursuit for sparsity-constrained optimization , author=. International Conference on Machine Learning , pages=. 2014 , organization=

work page 2014
[31]

Mathematical Programming , volume=

Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization , author=. Mathematical Programming , volume=

work page
[32]

Zeroth-order (Non)-Convex Stochastic Optimization via Conditional Gradient and Gradient Updates , author=

work page
[33]

Journal of Scientific Computing , volume=

On the Information-Adaptive Variants of the ADMM: An Iteration Complexity Perspective , author=. Journal of Scientific Computing , volume=

work page
[34]

arXiv preprint arXiv:1705.09554 , year=

Analysis of universal adversarial perturbations , author=. arXiv preprint arXiv:1705.09554 , year=

work page arXiv
[35]

2017 ieee symposium on security and privacy (sp) , pages=

Towards evaluating the robustness of neural networks , author=. 2017 ieee symposium on security and privacy (sp) , pages=. 2017 , organization=

work page 2017
[36]

UCI Machine Learning Repository

Dua, Dheeru and Graff, Casey. UCI Machine Learning Repository. 2017

work page 2017
[37]

the Journal of machine Learning research , volume=

Scikit-learn: Machine learning in Python , author=. the Journal of machine Learning research , volume=. 2011 , publisher=

work page 2011
[38]

ACM SIGKDD Explorations Newsletter , volume=

OpenML: networked science in machine learning , author=. ACM SIGKDD Explorations Newsletter , volume=. 2014 , publisher=

work page 2014
[39]

Breheny, Patrick , year=

work page
[40]

Proceedings of the National Academy of Sciences , volume=

Genotypic predictors of human immunodeficiency virus type 1 drug resistance , author=. Proceedings of the National Academy of Sciences , volume=. 2006 , publisher=

work page 2006
[41]

Advances in neural information processing systems , volume=

SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives , author=. Advances in neural information processing systems , volume=

work page
[42]

International conference on machine learning , pages=

SARAH: A novel method for machine learning problems using stochastic recursive gradient , author=. International conference on machine learning , pages=. 2017 , organization=

work page 2017
[43]

Advances in neural information processing systems , volume=

Accelerating stochastic gradient descent using predictive variance reduction , author=. Advances in neural information processing systems , volume=

work page

[1] [1]

International Conference on Artificial Intelligence and Statistics , pages=

Stability and risk bounds of iterative hard thresholding , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2021 , organization=

work page 2021

[2] [2]

2009 , publisher=

Learning multiple layers of features from tiny images , author=. 2009 , publisher=

work page 2009

[3] [3]

2011 , publisher=

Statistics for high-dimensional data: methods, theory and applications , author=. 2011 , publisher=

work page 2011

[4] [4]

IEEE transactions on information theory , volume=

Minimax rates of estimation for high-dimensional linear regression over _q -balls , author=. IEEE transactions on information theory , volume=. 2011 , publisher=

work page 2011

[5] [5]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Faster gradient-free proximal stochastic methods for nonconvex nonsmooth optimization , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page

[6] [6]

A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers , author=

work page

[7] [7]

Applied and computational harmonic analysis , volume=

Iterative hard thresholding for compressed sensing , author=. Applied and computational harmonic analysis , volume=. 2009 , publisher=

work page 2009

[8] [8]

Advances in neural information processing systems , volume=

On iterative hard thresholding methods for high-dimensional m-estimation , author=. Advances in neural information processing systems , volume=

work page

[9] [9]

IEEE Transactions on Information Theory , volume=

Linear convergence of stochastic iterative greedy algorithms with sparse constraints , author=. IEEE Transactions on Information Theory , volume=. 2017 , publisher=

work page 2017

[10] [10]

, author=

Gradient Hard Thresholding Pursuit. , author=. J. Mach. Learn. Res. , volume=

work page

[11] [11]

Advances in Neural Information Processing Systems , volume=

Efficient stochastic gradient hard thresholding , author=. Advances in Neural Information Processing Systems , volume=

work page

[12] [12]

The Journal of Machine Learning Research , volume=

A tight bound of hard thresholding , author=. The Journal of Machine Learning Research , volume=. 2017 , publisher=

work page 2017

[13] [13]

Nonconvex Sparse Learning via Stochastic Optimization with Progressive Variance Reduction

Nonconvex sparse learning via stochastic optimization with progressive variance reduction , author=. arXiv preprint arXiv:1605.02711 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[14] [14]

Foundations and Trends

Graphical models, exponential families, and variational inference , author=. Foundations and Trends. 2008 , publisher=

work page 2008

[15] [15]

Mathematics , year=

Variance Reduced Stochastic Gradient Descent with Neighbors , author=. Mathematics , year=

work page

[16] [16]

The Journal of Machine Learning Research , volume=

An optimal algorithm for bandit and zero-order convex optimization with two-point feedback , author=. The Journal of Machine Learning Research , volume=. 2017 , publisher=

work page 2017

[17] [17]

Proceedings of the AAAI Conference on Artificial Intelligence , number=

Autozoom: Autoencoder-based zeroth order optimization method for attacking black-box neural networks , author=. Proceedings of the AAAI Conference on Artificial Intelligence , number=

work page

[18] [18]

Proceedings of the 10th ACM workshop on artificial intelligence and security , pages=

Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models , author=. Proceedings of the 10th ACM workshop on artificial intelligence and security , pages=

work page

[19] [19]

Advances in neural information processing systems , volume=

Zo-adamm: Zeroth-order adaptive momentum method for black-box optimization , author=. Advances in neural information processing systems , volume=

work page

[20] [20]

Evolution Strategies as a Scalable Alternative to Reinforcement Learning

Evolution strategies as a scalable alternative to reinforcement learning , author=. arXiv preprint arXiv:1703.03864 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[21] [21]

Advances in Neural Information Processing Systems , volume=

Simple random search of static linear policies is competitive for reinforcement learning , author=. Advances in Neural Information Processing Systems , volume=

work page

[22] [22]

Conference on Robot Learning , pages=

Provably robust blackbox optimization for reinforcement learning , author=. Conference on Robot Learning , pages=. 2020 , organization=

work page 2020

[23] [23]

Foundations of Computational Mathematics , volume=

Random gradient-free minimization of convex functions , author=. Foundations of Computational Mathematics , volume=. 2017 , publisher=

work page 2017

[24] [24]

Advances in Neural Information Processing Systems , volume=

Zeroth-order stochastic variance reduction for nonconvex optimization , author=. Advances in Neural Information Processing Systems , volume=

work page

[25] [25]

Advances in Neural Information Processing Systems , volume=

Zeroth-order (non)-convex stochastic optimization via conditional gradient and gradient updates , author=. Advances in Neural Information Processing Systems , volume=

work page

[26] [26]

Journal of the American statistical Association , volume=

Variable selection via nonconcave penalized likelihood and its oracle properties , author=. Journal of the American statistical Association , volume=. 2001 , publisher=

work page 2001

[27] [27]

Nearly unbiased variable selection under minimax concave penalty , author=

work page

[28] [28]

The Journal of Machine Learning Research , volume=

A unified q-memorization framework for asynchronous stochastic optimization , author=. The Journal of Machine Learning Research , volume=. 2020 , publisher=

work page 2020

[29] [29]

Expansivity , author=

Zeroth-Order Hard-Thresholding: Gradient Error vs. Expansivity , author=. Advances in Neural Information Processing Systems , volume=

work page

[30] [30]

International Conference on Machine Learning , pages=

Gradient hard thresholding pursuit for sparsity-constrained optimization , author=. International Conference on Machine Learning , pages=. 2014 , organization=

work page 2014

[31] [31]

Mathematical Programming , volume=

Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization , author=. Mathematical Programming , volume=

work page

[32] [32]

Zeroth-order (Non)-Convex Stochastic Optimization via Conditional Gradient and Gradient Updates , author=

work page

[33] [33]

Journal of Scientific Computing , volume=

On the Information-Adaptive Variants of the ADMM: An Iteration Complexity Perspective , author=. Journal of Scientific Computing , volume=

work page

[34] [34]

arXiv preprint arXiv:1705.09554 , year=

Analysis of universal adversarial perturbations , author=. arXiv preprint arXiv:1705.09554 , year=

work page arXiv

[35] [35]

2017 ieee symposium on security and privacy (sp) , pages=

Towards evaluating the robustness of neural networks , author=. 2017 ieee symposium on security and privacy (sp) , pages=. 2017 , organization=

work page 2017

[36] [36]

UCI Machine Learning Repository

Dua, Dheeru and Graff, Casey. UCI Machine Learning Repository. 2017

work page 2017

[37] [37]

the Journal of machine Learning research , volume=

Scikit-learn: Machine learning in Python , author=. the Journal of machine Learning research , volume=. 2011 , publisher=

work page 2011

[38] [38]

ACM SIGKDD Explorations Newsletter , volume=

OpenML: networked science in machine learning , author=. ACM SIGKDD Explorations Newsletter , volume=. 2014 , publisher=

work page 2014

[39] [39]

Breheny, Patrick , year=

work page

[40] [40]

Proceedings of the National Academy of Sciences , volume=

Genotypic predictors of human immunodeficiency virus type 1 drug resistance , author=. Proceedings of the National Academy of Sciences , volume=. 2006 , publisher=

work page 2006

[41] [41]

Advances in neural information processing systems , volume=

SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives , author=. Advances in neural information processing systems , volume=

work page

[42] [42]

International conference on machine learning , pages=

SARAH: A novel method for machine learning problems using stochastic recursive gradient , author=. International conference on machine learning , pages=. 2017 , organization=

work page 2017

[43] [43]

Advances in neural information processing systems , volume=

Accelerating stochastic gradient descent using predictive variance reduction , author=. Advances in neural information processing systems , volume=

work page