Handbook of convergence theorems for (stochastic) gradient methods. arXiv preprint arXiv:2301.11235

Handbook of convergence theorems for (stochastic) gradient methods · 2023 · arXiv 2301.11235

22 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Sharp $O(1/k)$ convergence rate for the Sinkhorn algorithm via a local analysis

math.OC · 2026-06-27 · unverdicted · novelty 7.0

Proves sharp O(1/k) rate for Sinkhorn via local bipartite graph analysis of positive-mass edges, bootstrapped from prior almost-sharp global bound.

Stochastic Krasnoselskii-Mann Iterations: Convergence without Uniformly Bounded Variance

math.OC · 2026-04-24 · unverdicted · novelty 7.0 · 2 refs

Stochastic Krasnoselskii-Mann iterations converge almost surely and with rates under finite variance at a single fixed point rather than uniform variance bounds, recovering optimal complexity and providing first such results for some splitting methods.

Stochastic Gradient Variational Inference with Price's Gradient Estimator from Bures-Wasserstein to Parameter Space

stat.ML · 2026-02-21 · unverdicted · novelty 7.0

Price's gradient estimator enables black-box VI to achieve the same state-of-the-art iteration complexity as Wasserstein VI, with experiments confirming it as the main performance driver.

On the Convergence Rate of LoRA Gradient Descent

cs.LG · 2025-12-20 · unverdicted · novelty 7.0

LoRA gradient descent converges to a stationary point at rate O(1/log T).

SketchGuard: Scaling Byzantine-Robust Decentralized Federated Learning via Sketch-Based Screening

cs.LG · 2025-10-09 · accept · novelty 7.0

SketchGuard decouples Byzantine filtering from aggregation in decentralized federated learning by exchanging k-dimensional Count Sketches for screening and full models only from accepted neighbors, achieving up to 50-70% communication savings while proving convergence and matching SOTA robustness.

How does the optimizer implicitly bias the model merging loss landscape?

cs.LG · 2025-10-06 · unverdicted · novelty 7.0

Effective noise scale non-monotonically governs model merging success with an optimum, unifying effects of learning rate, weight decay, batch size, and augmentation on the loss landscape.

Highly Data Parallelizable Estimation of the Sliced-Wasserstein Distance Using Cumulative Distribution Functions

stat.ML · 2026-06-29 · unverdicted · novelty 6.0

New class of CDF-based estimators for sliced Wasserstein distance avoids sorting, enables massive parallelism, and suits federated learning and Gaussian mixture models.

Randomized conjugate gradient least squares

math.NA · 2026-05-24 · unverdicted · novelty 6.0

RCGLS replaces the gradient in CGLS with a randomized coordinate version via a constraint correction view, proving linear convergence in expectation better than randomized coordinate descent, plus sparse implementation and ridge regression extension.

Factor Augmented High-Dimensional SGD

stat.ML · 2026-05-19 · unverdicted · novelty 6.0

Proposes Factor-Augmented SGD that runs on streaming high-dimensional data and supplies the first convergence analysis explicitly accounting for latent-factor estimation error.

Distributed Learning with Adversarial Gradient Perturbations

cs.LG · 2026-05-05 · unverdicted · novelty 6.0

Tight feasibility thresholds are derived for the minimal sub-optimality gap in convex L-smooth distributed optimization under bounded adversarial gradient perturbations, together with algorithms attaining them at matching query complexity.

One Coordinate at a Time: Convergence Guarantees for Rotosolve in Variational Quantum Algorithms

quant-ph · 2026-04-28 · unverdicted · novelty 6.0

Rotosolve converges to ε-stationary points for smooth non-convex objectives and ε-suboptimal points under PL, with explicit worst-case rates in the finite-shot regime, outperforming or matching RCD in nuanced ways.

Mini-Batch Stochastic Krasnosel'skiĭ-Mann Algorithm for Nonexpansive Fixed Point Problems

math.OC · 2026-04-08 · unverdicted · novelty 6.0

A mini-batch stochastic Krasnosel'skiĭ-Mann algorithm converges almost surely to fixed points of nonexpansive mappings when batch sizes increase appropriately.

Constrained free energy minimization for the design of thermal states and stabilizer thermodynamic systems

quant-ph · 2025-08-12 · unverdicted · novelty 6.0

Benchmarks gradient-ascent algorithms for constrained free energy minimization on quantum Heisenberg models and stabilizer codes, with applications to thermal state design and fixed-temperature quantum encoding.

On subspace-constrained preconditioning for randomized iterative methods

math.NA · 2026-05-28 · unverdicted · novelty 5.0

Refines subspace preconditioning for randomized linear solvers via QR-like factorization, enabling implicit use and proving expected linear convergence while reducing to a smaller system with good singular values.

Accelerated Dynamic Importance Weighting with Versatile Divergence-Minimizing Estimators

cs.LG · 2026-05-25 · unverdicted · novelty 5.0

ADIW accelerates dynamic importance weighting for joint distribution shift by using a few lightweight projected gradient descent updates with warm-starting from prior weights and generalizes it to support multiple divergence-based estimators in a plug-and-play manner.

COOPO: Cyclic Offline-Online Policy Optimization Algorithm

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

COOPO is a cyclic offline-online RL algorithm that repeatedly anchors the policy to a dataset via KL-regularized updates then fine-tunes online, claiming better sample efficiency and monotonic improvement under coverage assumptions.

HTMuon: Improving Muon via Heavy-Tailed Spectral Correction

cs.LG · 2026-03-10 · unverdicted · novelty 5.0

HTMuon modifies Muon to produce heavier-tailed updates and weight spectra via HT-SR theory, yielding up to 0.98 lower perplexity on LLaMA pretraining and serving as a plug-in for other Muon variants.

Stochastic versus Deterministic in Stochastic Gradient Descent

math.OC · 2025-09-03 · unverdicted · novelty 5.0

Treating stochastic and deterministic gradients separately in mini-batch SGD yields faster convergence and smaller error radius than uniform treatment, with further gains under strong convexity.

On the Convergence Analysis of Muon

stat.ML · 2025-05-29 · unverdicted · novelty 5.0

Convergence analysis shows Muon outperforms gradient descent by exploiting low-rank structure in neural network Hessians.

Accelerated Gradient Descent for Faster Convergence with Minimal Overhead

cs.LG · 2026-05-15 · unverdicted · novelty 4.0

CT-AGD accelerates first-order optimization in deep learning by using finite-difference curvature estimates and noise-mitigation heuristics, achieving equivalent accuracy with 33% fewer training epochs and overhead comparable to Adam.

How AI settled the complexity of the oldest SGD algorithm

cs.LG · 2026-06-28 · unverdicted · novelty 3.0

AI models discovered the worst-case complexity of the Kaczmarz algorithm for solving linear systems.

Complex Stochastic Gradient Descent and Directional Bias in Reproducing Kernel Hilbert Spaces

cs.LG · 2026-04-24

citing papers explorer

Showing 20 of 20 citing papers after filters.

Sharp $O(1/k)$ convergence rate for the Sinkhorn algorithm via a local analysis
math.OC · 2026-06-27 · unverdicted · none · ref 35
Proves sharp O(1/k) rate for Sinkhorn via local bipartite graph analysis of positive-mass edges, bootstrapped from prior almost-sharp global bound.

Stochastic Krasnoselskii-Mann Iterations: Convergence without Uniformly Bounded Variance
math.OC · 2026-04-24 · unverdicted · none · ref 15 · 2 links
Stochastic Krasnoselskii-Mann iterations converge almost surely and with rates under finite variance at a single fixed point rather than uniform variance bounds, recovering optimal complexity and providing first such results for some splitting methods.

Stochastic Gradient Variational Inference with Price's Gradient Estimator from Bures-Wasserstein to Parameter Space
stat.ML · 2026-02-21 · unverdicted · none · ref 4
Price's gradient estimator enables black-box VI to achieve the same state-of-the-art iteration complexity as Wasserstein VI, with experiments confirming it as the main performance driver.

On the Convergence Rate of LoRA Gradient Descent
cs.LG · 2025-12-20 · unverdicted · none · ref 3
LoRA gradient descent converges to a stationary point at rate O(1/log T).

How does the optimizer implicitly bias the model merging loss landscape?
cs.LG · 2025-10-06 · unverdicted · none · ref 3
Effective noise scale non-monotonically governs model merging success with an optimum, unifying effects of learning rate, weight decay, batch size, and augmentation on the loss landscape.

Highly Data Parallelizable Estimation of the Sliced-Wasserstein Distance Using Cumulative Distribution Functions
stat.ML · 2026-06-29 · unverdicted · none · ref 2
New class of CDF-based estimators for sliced Wasserstein distance avoids sorting, enables massive parallelism, and suits federated learning and Gaussian mixture models.

Randomized conjugate gradient least squares
math.NA · 2026-05-24 · unverdicted · none · ref 10
RCGLS replaces the gradient in CGLS with a randomized coordinate version via a constraint correction view, proving linear convergence in expectation better than randomized coordinate descent, plus sparse implementation and ridge regression extension.

Factor Augmented High-Dimensional SGD
stat.ML · 2026-05-19 · unverdicted · none · ref 55
Proposes Factor-Augmented SGD that runs on streaming high-dimensional data and supplies the first convergence analysis explicitly accounting for latent-factor estimation error.

Distributed Learning with Adversarial Gradient Perturbations
cs.LG · 2026-05-05 · unverdicted · none · ref 11
Tight feasibility thresholds are derived for the minimal sub-optimality gap in convex L-smooth distributed optimization under bounded adversarial gradient perturbations, together with algorithms attaining them at matching query complexity.

One Coordinate at a Time: Convergence Guarantees for Rotosolve in Variational Quantum Algorithms
quant-ph · 2026-04-28 · unverdicted · none · ref 3
Rotosolve converges to ε-stationary points for smooth non-convex objectives and ε-suboptimal points under PL, with explicit worst-case rates in the finite-shot regime, outperforming or matching RCD in nuanced ways.

Mini-Batch Stochastic Krasnosel'skiĭ-Mann Algorithm for Nonexpansive Fixed Point Problems
math.OC · 2026-04-08 · unverdicted · none · ref 12
A mini-batch stochastic Krasnosel'skiĭ-Mann algorithm converges almost surely to fixed points of nonexpansive mappings when batch sizes increase appropriately.

Constrained free energy minimization for the design of thermal states and stabilizer thermodynamic systems
quant-ph · 2025-08-12 · unverdicted · none · ref 77
Benchmarks gradient-ascent algorithms for constrained free energy minimization on quantum Heisenberg models and stabilizer codes, with applications to thermal state design and fixed-temperature quantum encoding.

On subspace-constrained preconditioning for randomized iterative methods
math.NA · 2026-05-28 · unverdicted · none · ref 24
Refines subspace preconditioning for randomized linear solvers via QR-like factorization, enabling implicit use and proving expected linear convergence while reducing to a smaller system with good singular values.

Accelerated Dynamic Importance Weighting with Versatile Divergence-Minimizing Estimators
cs.LG · 2026-05-25 · unverdicted · none · ref 52
ADIW accelerates dynamic importance weighting for joint distribution shift by using a few lightweight projected gradient descent updates with warm-starting from prior weights and generalizes it to support multiple divergence-based estimators in a plug-and-play manner.

COOPO: Cyclic Offline-Online Policy Optimization Algorithm
cs.LG · 2026-05-18 · unverdicted · none · ref 45
COOPO is a cyclic offline-online RL algorithm that repeatedly anchors the policy to a dataset via KL-regularized updates then fine-tunes online, claiming better sample efficiency and monotonic improvement under coverage assumptions.

HTMuon: Improving Muon via Heavy-Tailed Spectral Correction
cs.LG · 2026-03-10 · unverdicted · none · ref 10
HTMuon modifies Muon to produce heavier-tailed updates and weight spectra via HT-SR theory, yielding up to 0.98 lower perplexity on LLaMA pretraining and serving as a plug-in for other Muon variants.

Stochastic versus Deterministic in Stochastic Gradient Descent
math.OC · 2025-09-03 · unverdicted · none · ref 36
Treating stochastic and deterministic gradients separately in mini-batch SGD yields faster convergence and smaller error radius than uniform treatment, with further gains under strong convexity.

On the Convergence Analysis of Muon
stat.ML · 2025-05-29 · unverdicted · none · ref 8
Convergence analysis shows Muon outperforms gradient descent by exploiting low-rank structure in neural network Hessians.

Accelerated Gradient Descent for Faster Convergence with Minimal Overhead
cs.LG · 2026-05-15 · unverdicted · none · ref 9
CT-AGD accelerates first-order optimization in deep learning by using finite-difference curvature estimates and noise-mitigation heuristics, achieving equivalent accuracy with 33% fewer training epochs and overhead comparable to Adam.

How AI settled the complexity of the oldest SGD algorithm
cs.LG · 2026-06-28 · unverdicted · none · ref 17
AI models discovered the worst-case complexity of the Kaczmarz algorithm for solving linear systems.
