A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method

2012 · cs.LG · arXiv 1212.2002

6 Pith papers cite this work. Polarity classification is still indexing.

abstract

In this note, we present a new averaging technique for the projected stochastic subgradient method. By using a weighted average with a weight of t+1 for each iterate w_t at iteration t, we obtain the convergence rate of O(1/t) with both an easy proof and an easy implementation. The new scheme is compared empirically to existing techniques, with similar performance behavior.
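The averaging scheme described in the abstract can be sketched in a few lines. Giving weight t+1 to iterate w_t means the average after iterate s is (2 / ((s+1)(s+2))) * sum over k ≤ s of (k+1) * w_k, which can be maintained online with the mixing coefficient rho_s = 2/(s+2). The following is a minimal sketch, not the authors' implementation: the step size 2/(mu*(t+1)) for a mu-strongly convex objective, the box constraint, the noisy quadratic test problem, and all function names are assumptions made for illustration.

```python
import random


def project_box(w, lo, hi):
    """Euclidean projection onto the box [lo, hi]^d (coordinate-wise clipping)."""
    return [min(max(wi, lo), hi) for wi in w]


def weighted_average_sgd(subgrad, project, w0, mu, steps):
    """Projected stochastic subgradient method with (t+1)-weighted averaging.

    subgrad(w) returns a (possibly noisy) subgradient at w,
    project(w) returns the projection of w onto the feasible set,
    mu is the assumed strong-convexity constant.
    Returns (last_iterate, weighted_average).
    """
    w = list(w0)
    w_bar = list(w0)  # the average over {w_0} alone is w_0
    for t in range(steps):
        g = subgrad(w)
        gamma = 2.0 / (mu * (t + 1))  # assumed step size, not stated in the abstract
        w = project([wi - gamma * gi for wi, gi in zip(w, g)])
        # w is now iterate t+1 with weight t+2; the total weight so far is
        # (t+2)(t+3)/2, so the online mixing coefficient is 2/(t+3).
        rho = 2.0 / (t + 3)
        w_bar = [(1.0 - rho) * bi + rho * wi for bi, wi in zip(w_bar, w)]
    return w, w_bar


def noisy_quadratic_subgrad(center, mu, sigma, rng):
    """Hypothetical test oracle: gradient of (mu/2)*||w - center||^2 plus Gaussian noise."""
    def subgrad(w):
        return [mu * (wi - ci) + rng.gauss(0.0, sigma) for wi, ci in zip(w, center)]
    return subgrad


if __name__ == "__main__":
    rng = random.Random(0)
    oracle = noisy_quadratic_subgrad([0.5, -0.3], mu=1.0, sigma=1.0, rng=rng)
    last, avg = weighted_average_sgd(
        oracle, lambda w: project_box(w, -1.0, 1.0), [1.0, 1.0], mu=1.0, steps=5000
    )
    print("last iterate:", last)
    print("weighted average:", avg)
```

The online update avoids storing past iterates: only the current iterate and the running average are kept, which is what makes the scheme as cheap to implement as uniform averaging.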

representative citing papers

Gradient Descent's Last Iterate is Often (slightly) Suboptimal

math.OC · 2026-04-15 · unverdicted · novelty 8.0

Proves it is impossible to achieve optimal last-iterate rates for GD and SGD without knowing the horizon T in advance, incurring an unavoidable poly-log factor penalty even in the deterministic case.

Stochastic Gradient Variational Inference with Price's Gradient Estimator from Bures-Wasserstein to Parameter Space

stat.ML · 2026-02-21 · unverdicted · novelty 7.0

Price's gradient estimator enables black-box VI to achieve the same state-of-the-art iteration complexity as Wasserstein VI, with experiments confirming it as the main performance driver.

Mini-batch Estimation for Deep Cox Models: Statistical Foundations and Practical Guidance

stat.ML · 2024-08-05 · unverdicted · novelty 7.0

Mini-batch SGD optimizes a different objective than full partial likelihood in Cox models, but the resulting mb-MPLE is still consistent with optimal rates for neural nets and asymptotic normality for linear models.

Factor Augmented High-Dimensional SGD

stat.ML · 2026-05-19 · unverdicted · novelty 6.0

Proposes Factor-Augmented SGD that runs on streaming high-dimensional data and supplies the first convergence analysis explicitly accounting for latent-factor estimation error.

Adaptive Federated Optimization

cs.LG · 2020-02-29 · unverdicted · novelty 6.0

Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.

Robust Learning Meets Quasar-Convex Optimization: Inexact High-Order Proximal-Point Methods

math.OC · 2026-05-08 · unverdicted · novelty 5.0

Robust learning problems are formulated as quasar-convex optimization, and HiPPA is proposed as an inexact high-order proximal method with global and superlinear convergence guarantees.

citing papers explorer

Gradient Descent's Last Iterate is Often (slightly) Suboptimal · math.OC · 2026-04-15 · unverdicted · none · ref 15
Stochastic Gradient Variational Inference with Price's Gradient Estimator from Bures-Wasserstein to Parameter Space · stat.ML · 2026-02-21 · unverdicted · none · ref 8 · internal anchor
Mini-batch Estimation for Deep Cox Models: Statistical Foundations and Practical Guidance · stat.ML · 2024-08-05 · unverdicted · none · ref 25 · internal anchor
Factor Augmented High-Dimensional SGD · stat.ML · 2026-05-19 · unverdicted · none · ref 22 · internal anchor
Adaptive Federated Optimization · cs.LG · 2020-02-29 · unverdicted · none · ref 144 · internal anchor
Robust Learning Meets Quasar-Convex Optimization: Inexact High-Order Proximal-Point Methods · math.OC · 2026-05-08 · unverdicted · none · ref 109
