pith. sign in

Towards stability and optimality in stochastic gradient descent

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it
abstract

Iterative procedures for parameter estimation based on stochastic gradient descent allow the estimation to scale to massive data sets. However, in both theory and practice, they suffer from numerical instability. Moreover, they are statistically inefficient as estimators of the true parameter value. To address these two issues, we propose a new iterative procedure termed averaged implicit SGD (AI-SGD). For statistical efficiency, AI-SGD employs averaging of the iterates, which achieves the optimal Cram\'{e}r-Rao bound under strong convexity, i.e., it is an optimal unbiased estimator of the true parameter value. For numerical stability, AI-SGD employs an implicit update at each iteration, which is related to proximal operators in optimization. In practice, AI-SGD achieves competitive performance with other state-of-the-art procedures. Furthermore, it is more stable than averaging procedures that do not employ proximal updates, and is simple to implement as it requires fewer tunable hyperparameters than procedures that do employ proximal updates.

citation-role summary

background 1

citation-polarity summary

fields

cs.LG 1

years

2026 1

verdicts

UNVERDICTED 1

roles

background 1

polarities

background 1

representative citing papers

citing papers explorer

Showing 1 of 1 citing paper.