Handbook of convergence theorems for (stochastic) gradient methods

Guillaume Garrigos, Robert M Gower · 2023 · arXiv 2301.11235

16 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Stochastic Krasnoselskii-Mann Iterations: Convergence without Uniformly Bounded Variance

math.OC · 2026-04-24 · unverdicted · novelty 7.0 · 2 refs

Stochastic Krasnoselskii-Mann iterations converge almost surely and with rates under finite variance at a single fixed point rather than uniform variance bounds, recovering optimal complexity and providing first such results for some splitting methods.

Stochastic Gradient Variational Inference with Price's Gradient Estimator from Bures-Wasserstein to Parameter Space

stat.ML · 2026-02-21 · unverdicted · novelty 7.0

Price's gradient estimator enables black-box VI to achieve the same state-of-the-art iteration complexity as Wasserstein VI, with experiments confirming it as the main performance driver.

On the Convergence Rate of LoRA Gradient Descent

cs.LG · 2025-12-20 · unverdicted · novelty 7.0

LoRA gradient descent converges to a stationary point at rate O(1/log T).

SketchGuard: Scaling Byzantine-Robust Decentralized Federated Learning via Sketch-Based Screening

cs.LG · 2025-10-09 · accept · novelty 7.0

SketchGuard decouples Byzantine filtering from aggregation in decentralized federated learning by exchanging k-dimensional Count Sketches for screening and full models only from accepted neighbors, achieving up to 50-70% communication savings while proving convergence and matching SOTA robustness.

How does the optimizer implicitly bias the model merging loss landscape?

cs.LG · 2025-10-06 · unverdicted · novelty 7.0

Effective noise scale non-monotonically governs model merging success with an optimum, unifying effects of learning rate, weight decay, batch size, and augmentation on the loss landscape.

Factor Augmented High-Dimensional SGD

stat.ML · 2026-05-19 · unverdicted · novelty 6.0

Proposes Factor-Augmented SGD that runs on streaming high-dimensional data and supplies the first convergence analysis explicitly accounting for latent-factor estimation error.

Distributed Learning with Adversarial Gradient Perturbations

cs.LG · 2026-05-05 · unverdicted · novelty 6.0

Tight feasibility thresholds are derived for the minimal sub-optimality gap in convex L-smooth distributed optimization under bounded adversarial gradient perturbations, together with algorithms attaining them at matching query complexity.

One Coordinate at a Time: Convergence Guarantees for Rotosolve in Variational Quantum Algorithms

quant-ph · 2026-04-28 · unverdicted · novelty 6.0

Rotosolve converges to ε-stationary points for smooth non-convex objectives and ε-suboptimal points under PL, with explicit worst-case rates in the finite-shot regime, outperforming or matching RCD in nuanced ways.

Mini-Batch Stochastic Krasnosel'skiĭ-Mann Algorithm for Nonexpansive Fixed Point Problems

math.OC · 2026-04-08 · unverdicted · novelty 6.0

A mini-batch stochastic Krasnosel'skiĭ-Mann algorithm converges almost surely to fixed points of nonexpansive mappings when batch sizes increase appropriately.

Constrained free energy minimization for the design of thermal states and stabilizer thermodynamic systems

quant-ph · 2025-08-12 · unverdicted · novelty 6.0

Benchmarks gradient-ascent algorithms for constrained free energy minimization on quantum Heisenberg models and stabilizer codes, with applications to thermal state design and fixed-temperature quantum encoding.

COOPO: Cyclic Offline-Online Policy Optimization Algorithm

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

COOPO is a cyclic offline-online RL algorithm that repeatedly anchors the policy to a dataset via KL-regularized updates then fine-tunes online, claiming better sample efficiency and monotonic improvement under coverage assumptions.

HTMuon: Improving Muon via Heavy-Tailed Spectral Correction

cs.LG · 2026-03-10 · unverdicted · novelty 5.0

HTMuon modifies Muon to produce heavier-tailed updates and weight spectra via HT-SR theory, yielding up to 0.98 lower perplexity on LLaMA pretraining and serving as a plug-in for other Muon variants.

Stochastic versus Deterministic in Stochastic Gradient Descent

math.OC · 2025-09-03 · unverdicted · novelty 5.0

Treating stochastic and deterministic gradients separately in mini-batch SGD yields faster convergence and smaller error radius than uniform treatment, with further gains under strong convexity.

On the Convergence Analysis of Muon

stat.ML · 2025-05-29 · unverdicted · novelty 5.0

Convergence analysis shows Muon outperforms gradient descent by exploiting low-rank structure in neural network Hessians.

Accelerated Gradient Descent for Faster Convergence with Minimal Overhead

cs.LG · 2026-05-15 · unverdicted · novelty 4.0

CT-AGD accelerates first-order optimization in deep learning by using finite-difference curvature estimates and noise-mitigation heuristics, achieving equivalent accuracy with 33% fewer training epochs and overhead comparable to Adam.

Complex Stochastic Gradient Descent and Directional Bias in Reproducing Kernel Hilbert Spaces

cs.LG · 2026-04-24

citing papers explorer

Showing 16 of 16 citing papers.

Stochastic Krasnoselskii-Mann Iterations: Convergence without Uniformly Bounded Variance · math.OC · 2026-04-24 · unverdicted · none · ref 15 · 2 links
Stochastic Krasnoselskii-Mann iterations converge almost surely and with rates under finite variance at a single fixed point rather than uniform variance bounds, recovering optimal complexity and providing first such results for some splitting methods.
Stochastic Gradient Variational Inference with Price's Gradient Estimator from Bures-Wasserstein to Parameter Space · stat.ML · 2026-02-21 · unverdicted · none · ref 4
Price's gradient estimator enables black-box VI to achieve the same state-of-the-art iteration complexity as Wasserstein VI, with experiments confirming it as the main performance driver.
On the Convergence Rate of LoRA Gradient Descent · cs.LG · 2025-12-20 · unverdicted · none · ref 3
LoRA gradient descent converges to a stationary point at rate O(1/log T).
SketchGuard: Scaling Byzantine-Robust Decentralized Federated Learning via Sketch-Based Screening · cs.LG · 2025-10-09 · accept · none · ref 16
SketchGuard decouples Byzantine filtering from aggregation in decentralized federated learning by exchanging k-dimensional Count Sketches for screening and full models only from accepted neighbors, achieving up to 50-70% communication savings while proving convergence and matching SOTA robustness.
How does the optimizer implicitly bias the model merging loss landscape? · cs.LG · 2025-10-06 · unverdicted · none · ref 3
Effective noise scale non-monotonically governs model merging success with an optimum, unifying effects of learning rate, weight decay, batch size, and augmentation on the loss landscape.
Factor Augmented High-Dimensional SGD · stat.ML · 2026-05-19 · unverdicted · none · ref 55
Proposes Factor-Augmented SGD that runs on streaming high-dimensional data and supplies the first convergence analysis explicitly accounting for latent-factor estimation error.
Distributed Learning with Adversarial Gradient Perturbations · cs.LG · 2026-05-05 · unverdicted · none · ref 11
Tight feasibility thresholds are derived for the minimal sub-optimality gap in convex L-smooth distributed optimization under bounded adversarial gradient perturbations, together with algorithms attaining them at matching query complexity.
One Coordinate at a Time: Convergence Guarantees for Rotosolve in Variational Quantum Algorithms · quant-ph · 2026-04-28 · unverdicted · none · ref 3
Rotosolve converges to ε-stationary points for smooth non-convex objectives and ε-suboptimal points under PL, with explicit worst-case rates in the finite-shot regime, outperforming or matching RCD in nuanced ways.
Mini-Batch Stochastic Krasnosel'skiĭ-Mann Algorithm for Nonexpansive Fixed Point Problems · math.OC · 2026-04-08 · unverdicted · none · ref 12
A mini-batch stochastic Krasnosel'skiĭ-Mann algorithm converges almost surely to fixed points of nonexpansive mappings when batch sizes increase appropriately.
Constrained free energy minimization for the design of thermal states and stabilizer thermodynamic systems · quant-ph · 2025-08-12 · unverdicted · none · ref 77
Benchmarks gradient-ascent algorithms for constrained free energy minimization on quantum Heisenberg models and stabilizer codes, with applications to thermal state design and fixed-temperature quantum encoding.
COOPO: Cyclic Offline-Online Policy Optimization Algorithm · cs.LG · 2026-05-18 · unverdicted · none · ref 45
COOPO is a cyclic offline-online RL algorithm that repeatedly anchors the policy to a dataset via KL-regularized updates then fine-tunes online, claiming better sample efficiency and monotonic improvement under coverage assumptions.
HTMuon: Improving Muon via Heavy-Tailed Spectral Correction · cs.LG · 2026-03-10 · unverdicted · none · ref 10
HTMuon modifies Muon to produce heavier-tailed updates and weight spectra via HT-SR theory, yielding up to 0.98 lower perplexity on LLaMA pretraining and serving as a plug-in for other Muon variants.
Stochastic versus Deterministic in Stochastic Gradient Descent · math.OC · 2025-09-03 · unverdicted · none · ref 36
Treating stochastic and deterministic gradients separately in mini-batch SGD yields faster convergence and smaller error radius than uniform treatment, with further gains under strong convexity.
On the Convergence Analysis of Muon · stat.ML · 2025-05-29 · unverdicted · none · ref 8
Convergence analysis shows Muon outperforms gradient descent by exploiting low-rank structure in neural network Hessians.
Accelerated Gradient Descent for Faster Convergence with Minimal Overhead · cs.LG · 2026-05-15 · unverdicted · none · ref 9
CT-AGD accelerates first-order optimization in deep learning by using finite-difference curvature estimates and noise-mitigation heuristics, achieving equivalent accuracy with 33% fewer training epochs and overhead comparable to Adam.
Complex Stochastic Gradient Descent and Directional Bias in Reproducing Kernel Hilbert Spaces · cs.LG · 2026-04-24 · unreviewed · ref 21
