A unified analysis of extra-gradient, optimistic gradient methods for saddle point problems: Proximal point approach

International Conference on Artificial Intelligence and Statistics · 2020

3 Pith papers cite this work. Polarity classification is still being indexed.


representative citing papers

Understanding Dynamics of Adam in Zero-Sum Games: An ODE Approach

cs.LG · 2026-05-19 · unverdicted · novelty 7.0 · ref 129

Derives ODE limits of Adam-DA showing that the first- and second-order momentum parameters reverse their convergence roles in zero-sum games relative to minimization; the analysis is validated with GAN experiments.

Training Deep Learning Models with Norm-Constrained LMOs

cs.LG · 2025-02-11 · unverdicted · novelty 7.0 · ref 79

Introduces Scion, a new family of stochastic optimizers built on norm-constrained linear minimization oracles (LMOs) that unifies existing methods, also applies to unconstrained problems, and delivers hyperparameter transferability plus speedups on nanoGPT training.

Why SGD is not Brownian Motion: A New Perspective on Stochastic Dynamics

cs.LG · 2026-05-21 · unverdicted · novelty 6.0 · ref 35

Reformulates SGD through a master equation derived from its discrete updates, yielding a discrete Fokker-Planck equation that predicts non-stationary variance growth, proportional to the learning rate, along flat Hessian directions.

