pith. sign in

arxiv: 1805.10367 · v2 · pith:POMF6K33new · submitted 2018-05-25 · 💻 cs.LG · stat.ML

Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization

classification 💻 cs.LG stat.ML
keywords varianceoptimizationreducedzo-svrganalysisgradientzeroth-orderapproaches
0
0 comments X
read the original abstract

As application demands for zeroth-order (gradient-free) optimization accelerate, the need for variance reduced and faster converging approaches is also intensifying. This paper addresses these challenges by presenting: a) a comprehensive theoretical analysis of variance reduced zeroth-order (ZO) optimization, b) a novel variance reduced ZO algorithm, called ZO-SVRG, and c) an experimental evaluation of our approach in the context of two compelling applications, black-box chemical material classification and generation of adversarial examples from black-box deep neural network models. Our theoretical analysis uncovers an essential difficulty in the analysis of ZO-SVRG: the unbiased assumption on gradient estimates no longer holds. We prove that compared to its first-order counterpart, ZO-SVRG with a two-point random gradient estimator could suffer an additional error of order $O(1/b)$, where $b$ is the mini-batch size. To mitigate this error, we propose two accelerated versions of ZO-SVRG utilizing variance reduced gradient estimators, which achieve the best rate known for ZO stochastic optimization (in terms of iterations). Our extensive experimental results show that our approaches outperform other state-of-the-art ZO algorithms, and strike a balance between the convergence rate and the function query complexity.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing

    cs.LG 2026-05 unverdicted novelty 6.0

    Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.