SIAM Journal on Control and Optimization

Acceleration of stochastic approximation by averaging · 1992

8 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

AMUSE: Anytime Muon with Stable Gradient Evaluation

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

AMUSE is a new optimizer that combines Muon orthogonalization with Schedule-Free averaging via adaptive interpolation, enabling schedule-free anytime training and improving Pareto frontiers on vision and LLM tasks.

Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian Noise

math.PR · 2026-05-20 · unverdicted · novelty 7.0

Establishes maximal concentration bounds for stochastic approximation under heavy-tailed Markovian noise, with tails ranging from sub-Gaussian to heavier than Weibull depending on step sizes and contractivity properties, plus a truncation argument for unbounded noise.

Gaussian Approximation and Multiplier Bootstrap for Federated Linear Stochastic Approximation

stat.ML · 2026-05-19 · unverdicted · novelty 7.0

Establishes non-asymptotic Gaussian approximation bounds for federated LSA with explicit communication-heterogeneity trade-offs and introduces an online multiplier bootstrap for last-iterate inference with validity guarantees.

Berry-Esseen bounds for multivariate martingale difference sequences in the Kolmogorov distance

math.PR · 2026-05-04 · unverdicted · novelty 6.0

New Berry-Esseen bounds for multivariate martingale difference sequences achieve an n^{-1/4} rate with polylog(d) dimension dependence in the Kolmogorov distance.

Vision Transformers Need Registers

cs.CV · 2023-09-28 · unverdicted · novelty 6.0

Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.

Efficient Training of Language Models to Fill in the Middle

cs.CL · 2022-07-28 · unverdicted · novelty 6.0

Autoregressive language models trained on data with middle spans relocated to the end learn infilling without degrading left-to-right perplexity or sampling quality.

The Platonic Representation Hypothesis

cs.LG · 2024-05-13 · unverdicted · novelty 5.0

Representations learned by large AI models are converging toward a shared statistical model of reality.

Stochastic Optimization and Data Science

math.OC · 2026-05-16 · unverdicted · novelty 2.0

The paper motivates stochastic optimization problems from statistical perspectives and describes offline and online approaches to solve expectation minimization problems.

citing papers explorer

Showing 8 of 8 citing papers.

AMUSE: Anytime Muon with Stable Gradient Evaluation

cs.LG · 2026-05-21 · unverdicted · none · ref 28

Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian Noise

math.PR · 2026-05-20 · unverdicted · none · ref 209

Gaussian Approximation and Multiplier Bootstrap for Federated Linear Stochastic Approximation

stat.ML · 2026-05-19 · unverdicted · none · ref 55

Berry-Esseen bounds for multivariate martingale difference sequences in the Kolmogorov distance

math.PR · 2026-05-04 · unverdicted · none · ref 61

Vision Transformers Need Registers

cs.CV · 2023-09-28 · unverdicted · none · ref 46

Efficient Training of Language Models to Fill in the Middle

cs.CL · 2022-07-28 · unverdicted · none · ref 41

The Platonic Representation Hypothesis

cs.LG · 2024-05-13 · unverdicted · none · ref 102

Stochastic Optimization and Data Science

math.OC · 2026-05-16 · unverdicted · none · ref 123
