Handbook of convergence theorems for (stochastic) gradient methods. arXiv preprint arXiv:2301.11235

Handbook of convergence theorems for (stochastic) gradient methods · 2023 · arXiv 2301.11235

26 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Random Reshuffling Dominates Stochastic Gradient Descent

math.OC · 2026-06-30 · unverdicted · novelty 8.0

RR dominates SGD in smooth convex optimization under any reasonable stepsize after any finite number of epochs.

Effective dynamics of the Sinkhorn algorithm in the regime of low entropy regularization

math.OC · 2026-07-01 · unverdicted · novelty 7.0

Derives the cold Sinkhorn limiting dynamics as tau approaches zero, proving finite-time convergence to unregularized OT and improved O(tau^{-1}) iteration complexity for dual suboptimality.

Sharp $O(1/k)$ convergence rate for the Sinkhorn algorithm via a local analysis

math.OC · 2026-06-27 · unverdicted · novelty 7.0

Proves sharp O(1/k) rate for Sinkhorn via local bipartite graph analysis of positive-mass edges, bootstrapped from prior almost-sharp global bound.

Stochastic Krasnoselskii-Mann Iterations: Convergence without Uniformly Bounded Variance

math.OC · 2026-04-24 · unverdicted · novelty 7.0 · 2 refs

Stochastic Krasnoselskii-Mann iterations converge almost surely and with rates under finite variance at a single fixed point rather than uniform variance bounds, recovering optimal complexity and providing first such results for some splitting methods.

Stochastic Gradient Variational Inference with Price's Gradient Estimator from Bures-Wasserstein to Parameter Space

stat.ML · 2026-02-21 · unverdicted · novelty 7.0

Price's gradient estimator enables black-box VI to achieve the same state-of-the-art iteration complexity as Wasserstein VI, with experiments confirming it as the main performance driver.

On the Convergence Rate of LoRA Gradient Descent

cs.LG · 2025-12-20 · unverdicted · novelty 7.0

LoRA gradient descent converges to a stationary point at rate O(1/log T).

SketchGuard: Scaling Byzantine-Robust Decentralized Federated Learning via Sketch-Based Screening

cs.LG · 2025-10-09 · accept · novelty 7.0

SketchGuard decouples Byzantine filtering from aggregation in decentralized federated learning by exchanging k-dimensional Count Sketches for screening and full models only from accepted neighbors, achieving up to 50-70% communication savings while proving convergence and matching SOTA robustness.

How does the optimizer implicitly bias the model merging loss landscape?

cs.LG · 2025-10-06 · unverdicted · novelty 7.0

Effective noise scale non-monotonically governs model merging success with an optimum, unifying effects of learning rate, weight decay, batch size, and augmentation on the loss landscape.

Highly Data Parallelizable Estimation of the Sliced-Wasserstein Distance Using Cumulative Distribution Functions

stat.ML · 2026-06-29 · unverdicted · novelty 6.0

New class of CDF-based estimators for sliced Wasserstein distance avoids sorting, enables massive parallelism, and suits federated learning and Gaussian mixture models.

Randomized conjugate gradient least squares

math.NA · 2026-05-24 · unverdicted · novelty 6.0

RCGLS replaces the gradient in CGLS with a randomized coordinate version via a constraint correction view, proving linear convergence in expectation better than randomized coordinate descent, plus sparse implementation and ridge regression extension.

Factor Augmented High-Dimensional SGD

stat.ML · 2026-05-19 · unverdicted · novelty 6.0

Proposes Factor-Augmented SGD that runs on streaming high-dimensional data and supplies the first convergence analysis explicitly accounting for latent-factor estimation error.

Distributed Learning with Adversarial Gradient Perturbations

cs.LG · 2026-05-05 · unverdicted · novelty 6.0

Tight feasibility thresholds are derived for the minimal sub-optimality gap in convex L-smooth distributed optimization under bounded adversarial gradient perturbations, together with algorithms attaining them at matching query complexity.

One Coordinate at a Time: Convergence Guarantees for Rotosolve in Variational Quantum Algorithms

quant-ph · 2026-04-28 · unverdicted · novelty 6.0

Rotosolve converges to ε-stationary points for smooth non-convex objectives and ε-suboptimal points under PL, with explicit worst-case rates in the finite-shot regime, outperforming or matching RCD in nuanced ways.

Mini-Batch Stochastic Krasnosel'skiĭ-Mann Algorithm for Nonexpansive Fixed Point Problems

math.OC · 2026-04-08 · unverdicted · novelty 6.0

A mini-batch stochastic Krasnosel'skiĭ-Mann algorithm converges almost surely to fixed points of nonexpansive mappings when batch sizes increase appropriately.

Constrained free energy minimization for the design of thermal states and stabilizer thermodynamic systems

quant-ph · 2025-08-12 · unverdicted · novelty 6.0

Benchmarks gradient-ascent algorithms for constrained free energy minimization on quantum Heisenberg models and stabilizer codes, with applications to thermal state design and fixed-temperature quantum encoding.

On subspace-constrained preconditioning for randomized iterative methods

math.NA · 2026-05-28 · unverdicted · novelty 5.0

Refines subspace preconditioning for randomized linear solvers via QR-like factorization, enabling implicit use and proving expected linear convergence while reducing to a smaller system with good singular values.

Accelerated Dynamic Importance Weighting with Versatile Divergence-Minimizing Estimators

cs.LG · 2026-05-25 · unverdicted · novelty 5.0

ADIW accelerates dynamic importance weighting for joint distribution shift by using a few lightweight projected gradient descent updates with warm-starting from prior weights and generalizes it to support multiple divergence-based estimators in a plug-and-play manner.

COOPO: Cyclic Offline-Online Policy Optimization Algorithm

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

COOPO is a cyclic offline-online RL algorithm that repeatedly anchors the policy to a dataset via KL-regularized updates then fine-tunes online, claiming better sample efficiency and monotonic improvement under coverage assumptions.

Optimal Asymptotic Rates for (Stochastic) Gradient Descent under the Local PL-Condition: A Geometric Approach

math.OC · 2026-05-14 · unverdicted · novelty 5.0

Under the local PL condition with multiplicative noise for C² functions, (S)GD asymptotic rates match those of strongly convex quadratics via a geometric argument.

HTMuon: Improving Muon via Heavy-Tailed Spectral Correction

cs.LG · 2026-03-10 · unverdicted · novelty 5.0

HTMuon modifies Muon to produce heavier-tailed updates and weight spectra via HT-SR theory, yielding up to 0.98 lower perplexity on LLaMA pretraining and serving as a plug-in for other Muon variants.

Stochastic versus Deterministic in Stochastic Gradient Descent

math.OC · 2025-09-03 · unverdicted · novelty 5.0

Treating stochastic and deterministic gradients separately in mini-batch SGD yields faster convergence and smaller error radius than uniform treatment, with further gains under strong convexity.

On the Convergence Analysis of Muon

stat.ML · 2025-05-29 · unverdicted · novelty 5.0

Convergence analysis shows Muon outperforms gradient descent by exploiting low-rank structure in neural network Hessians.

FOAM: Frequency and Operator Error-Based Adaptive Damping Method for Reducing Staleness-Oriented Error for Shampoo

cs.LG · 2026-06-01 · unverdicted · novelty 4.0

FOAM adaptively controls damping and update frequency in Shampoo based on staleness-oriented error approximation to cut wall-clock time while preserving convergence.

Accelerated Gradient Descent for Faster Convergence with Minimal Overhead

cs.LG · 2026-05-15 · unverdicted · novelty 4.0

CT-AGD accelerates first-order optimization in deep learning by using finite-difference curvature estimates and noise-mitigation heuristics, achieving equivalent accuracy with 33% fewer training epochs and overhead comparable to Adam.
