arXiv preprint arXiv:2506.02336

Large Stepsizes Accelerate Gradient Descent for Regularized Logistic Regression · 2025 · arXiv 2506.02336

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Large-Step Training Dynamics of a Two-Factor Linear Transformer Model

stat.ML · 2026-05-20 · unverdicted · novelty 7.0

Large constant learning rates in a two-factor linear transformer model can induce cycles, bounded chaos, or divergence rather than convergence to a single in-context linear-regression solution.

SGD at the Edge of Stability: Stochastic Stabilization with Large Learning Rates

stat.ML · 2026-06-29 · unverdicted · novelty 6.0

SGD on multiclass cross-entropy loss alternates between curvature-driven oscillations and stable regimes but self-stabilizes to enable best-iterate convergence with large learning rates for linear and two-layer models.

citing papers explorer

Large-Step Training Dynamics of a Two-Factor Linear Transformer Model · stat.ML · 2026-05-20 · unverdicted · none · ref 23

SGD at the Edge of Stability: Stochastic Stabilization with Large Learning Rates · stat.ML · 2026-06-29 · unverdicted · none · ref 4
