Stochastic Krasnoselskii-Mann iterations converge almost surely and with rates under finite variance at a single fixed point rather than uniform variance bounds, recovering optimal complexity and providing first such results for some splitting methods.
hub
Handbook of convergence theorems for (stochastic) gradient methods
16 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
Price's gradient estimator enables black-box VI to achieve the same state-of-the-art iteration complexity as Wasserstein VI, with experiments confirming it as the main performance driver.
LoRA gradient descent converges to a stationary point at rate O(1/log T).
SketchGuard decouples Byzantine filtering from aggregation in decentralized federated learning by exchanging k-dimensional Count Sketches for screening and full models only from accepted neighbors, achieving up to 50-70% communication savings while proving convergence and matching SOTA robustness.
Effective noise scale non-monotonically governs model merging success with an optimum, unifying effects of learning rate, weight decay, batch size, and augmentation on the loss landscape.
Proposes Factor-Augmented SGD that runs on streaming high-dimensional data and supplies the first convergence analysis explicitly accounting for latent-factor estimation error.
Tight feasibility thresholds are derived for the minimal sub-optimality gap in convex L-smooth distributed optimization under bounded adversarial gradient perturbations, together with algorithms attaining them at matching query complexity.
Rotosolve converges to ε-stationary points for smooth non-convex objectives and ε-suboptimal points under PL, with explicit worst-case rates in the finite-shot regime, outperforming or matching RCD in nuanced ways.
A mini-batch stochastic Krasnosel'skiĭ-Mann algorithm converges almost surely to fixed points of nonexpansive mappings when batch sizes increase appropriately.
Benchmarks gradient-ascent algorithms for constrained free energy minimization on quantum Heisenberg models and stabilizer codes, with applications to thermal state design and fixed-temperature quantum encoding.
COOPO is a cyclic offline-online RL algorithm that repeatedly anchors the policy to a dataset via KL-regularized updates then fine-tunes online, claiming better sample efficiency and monotonic improvement under coverage assumptions.
HTMuon modifies Muon to produce heavier-tailed updates and weight spectra via HT-SR theory, yielding up to 0.98 lower perplexity on LLaMA pretraining and serving as a plug-in for other Muon variants.
Treating stochastic and deterministic gradients separately in mini-batch SGD yields faster convergence and smaller error radius than uniform treatment, with further gains under strong convexity.
Convergence analysis shows Muon outperforms gradient descent by exploiting low-rank structure in neural network Hessians.
CT-AGD accelerates first-order optimization in deep learning by using finite-difference curvature estimates and noise-mitigation heuristics, achieving equivalent accuracy with 33% fewer training epochs and overhead comparable to Adam.
citing papers explorer
-
Stochastic Krasnoselskii-Mann Iterations: Convergence without Uniformly Bounded Variance
Stochastic Krasnoselskii-Mann iterations converge almost surely and with rates under finite variance at a single fixed point rather than uniform variance bounds, recovering optimal complexity and providing first such results for some splitting methods.
-
Stochastic Gradient Variational Inference with Price's Gradient Estimator from Bures-Wasserstein to Parameter Space
Price's gradient estimator enables black-box VI to achieve the same state-of-the-art iteration complexity as Wasserstein VI, with experiments confirming it as the main performance driver.
-
On the Convergence Rate of LoRA Gradient Descent
LoRA gradient descent converges to a stationary point at rate O(1/log T).
-
SketchGuard: Scaling Byzantine-Robust Decentralized Federated Learning via Sketch-Based Screening
SketchGuard decouples Byzantine filtering from aggregation in decentralized federated learning by exchanging k-dimensional Count Sketches for screening and full models only from accepted neighbors, achieving up to 50-70% communication savings while proving convergence and matching SOTA robustness.
-
How does the optimizer implicitly bias the model merging loss landscape?
Effective noise scale non-monotonically governs model merging success with an optimum, unifying effects of learning rate, weight decay, batch size, and augmentation on the loss landscape.
-
Factor Augmented High-Dimensional SGD
Proposes Factor-Augmented SGD that runs on streaming high-dimensional data and supplies the first convergence analysis explicitly accounting for latent-factor estimation error.
-
Distributed Learning with Adversarial Gradient Perturbations
Tight feasibility thresholds are derived for the minimal sub-optimality gap in convex L-smooth distributed optimization under bounded adversarial gradient perturbations, together with algorithms attaining them at matching query complexity.
-
One Coordinate at a Time: Convergence Guarantees for Rotosolve in Variational Quantum Algorithms
Rotosolve converges to ε-stationary points for smooth non-convex objectives and ε-suboptimal points under PL, with explicit worst-case rates in the finite-shot regime, outperforming or matching RCD in nuanced ways.
-
Mini-Batch Stochastic Krasnosel'ski\u\i-Mann Algorithm for Nonexpansive Fixed Point Problems
A mini-batch stochastic Krasnosel'skiĭ-Mann algorithm converges almost surely to fixed points of nonexpansive mappings when batch sizes increase appropriately.
-
Constrained free energy minimization for the design of thermal states and stabilizer thermodynamic systems
Benchmarks gradient-ascent algorithms for constrained free energy minimization on quantum Heisenberg models and stabilizer codes, with applications to thermal state design and fixed-temperature quantum encoding.
-
COOPO: Cyclic Offline-Online Policy Optimization Algorithm
COOPO is a cyclic offline-online RL algorithm that repeatedly anchors the policy to a dataset via KL-regularized updates then fine-tunes online, claiming better sample efficiency and monotonic improvement under coverage assumptions.
-
HTMuon: Improving Muon via Heavy-Tailed Spectral Correction
HTMuon modifies Muon to produce heavier-tailed updates and weight spectra via HT-SR theory, yielding up to 0.98 lower perplexity on LLaMA pretraining and serving as a plug-in for other Muon variants.
-
Stochastic versus Deterministic in Stochastic Gradient Descent
Treating stochastic and deterministic gradients separately in mini-batch SGD yields faster convergence and smaller error radius than uniform treatment, with further gains under strong convexity.
-
On the Convergence Analysis of Muon
Convergence analysis shows Muon outperforms gradient descent by exploiting low-rank structure in neural network Hessians.
-
Accelerated Gradient Descent for Faster Convergence with Minimal Overhead
CT-AGD accelerates first-order optimization in deep learning by using finite-difference curvature estimates and noise-mitigation heuristics, achieving equivalent accuracy with 33% fewer training epochs and overhead comparable to Adam.
- Complex Stochastic Gradient Descent and Directional Bias in Reproducing Kernel Hilbert Spaces