Why gradient clipping accelerates training: A theoretical justification for adaptivity

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 2

citation-polarity summary

verdicts: UNVERDICTED 9

roles: background 2

polarities: background 1, unclear 1

representative citing papers

The Multi-Block DC Function Class: Theory, Algorithms, and Applications

math.OC · 2026-04-19 · unverdicted · novelty 7.0

The Multi-Block DC class admits polynomial-size DC decompositions for problems that require exponential size under standard DC programming and supplies explicit constructive formulations for deep ReLU networks together with convergent batch and stochastic algorithms.

Distributionally Robust Multi-Objective Optimization

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

DR-MOO adds distributional robustness to multi-objective optimization and gives single-loop MGDA algorithms reaching ε-Pareto-stationary points in O(ε^{-4}) samples for nonconvex problems.

Cost-Aware Learning

cs.LG · 2026-04-30 · unverdicted · novelty 6.0

Cost-aware SGD achieves target error with lower total sampling cost than standard methods, and Cost-Aware GRPO reduces token usage by up to 30% in LLM reinforcement learning while matching baseline performance.

Adaptive Federated Optimization

cs.LG · 2020-02-29 · unverdicted · novelty 6.0

Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.

Frank-Wolfe Algorithms for (L0, L1)-smooth functions

math.OC · 2025-10-18 · unverdicted · novelty 5.0 · 2 refs

Proposes (L0, L1)-Frank-Wolfe and an adaptive variant, claiming superior convergence rates for (L0, L1)-smooth objectives over classical Frank-Wolfe.

citing papers explorer

Showing 9 of 9 citing papers.

  • Stochastic Non-Smooth Convex Optimization with Unbounded Gradients · math.OC · 2026-05-15 · unverdicted · none · ref 3

    Introduces a generalized Lipschitz class and shows that clipped AdamW outperforms SGD and AdaGrad for stochastic convex optimization under this and related assumptions.

  • Beyond Bounded Variance: Variance-Reduced Normalized Methods for Nonconvex Optimization under Blum-Gladyshev Noise · cs.LG · 2026-05-14 · unverdicted · none · ref 44

    Normalized momentum SGD and variance-reduced STORM achieve O(ε^{-6}) and O(ε^{-4}) oracle complexities respectively under quadratic distance-dependent noise in nonconvex stochastic optimization.

  • Newton methods beyond Hessian Lipschitz continuity: A nonlinear preconditioning approach · math.OC · 2026-05-12 · unverdicted · none · ref 42

    Nonlinear preconditioning extends Newton methods to objectives lacking Hessian Lipschitz continuity by analyzing a transformed mapping under a relaxed smoothness condition, with superlinear convergence and O(ε^{-3/2}) iteration complexity.

  • The Multi-Block DC Function Class: Theory, Algorithms, and Applications · math.OC · 2026-04-19 · unverdicted · none · ref 14

    The Multi-Block DC class admits polynomial-size DC decompositions for problems that require exponential size under standard DC programming and supplies explicit constructive formulations for deep ReLU networks together with convergent batch and stochastic algorithms.

  • Distributionally Robust Multi-Objective Optimization · cs.LG · 2026-05-07 · unverdicted · none · ref 27

    DR-MOO adds distributional robustness to multi-objective optimization and gives single-loop MGDA algorithms reaching ε-Pareto-stationary points in O(ε^{-4}) samples for nonconvex problems.

  • Cost-Aware Learning · cs.LG · 2026-04-30 · unverdicted · none · ref 22

    Cost-aware SGD achieves target error with lower total sampling cost than standard methods, and Cost-Aware GRPO reduces token usage by up to 30% in LLM reinforcement learning while matching baseline performance.

  • Adaptive Federated Optimization · cs.LG · 2020-02-29 · unverdicted · none · ref 44

    Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.

  • Constrained Stochastic Spectral Preconditioning Converges for Nonconvex Objectives · math.OC · 2026-05-12 · unverdicted · none · ref 71

    Proximal stochastic spectral preconditioning converges for nonconvex constrained objectives under heavy-tailed noise, with a variance-reduced version achieving faster rates and a refined analysis of Muon iterations.

  • Frank-Wolfe Algorithms for (L0, L1)-smooth functions · math.OC · 2025-10-18 · unverdicted · none · ref 21 · 2 links

    Proposes (L0, L1)-Frank-Wolfe and an adaptive variant, claiming superior convergence rates for (L0, L1)-smooth objectives over classical Frank-Wolfe.