Adaptive subgradient methods for online learning and stochastic optimization

2011

3 Pith papers cite this work.

representative citing papers

An Improved Adaptive PID Optimizer with Enhanced Convergence and Stability for Deep Learning

cs.LG · 2026-05-21 · unverdicted · novelty 5.0 · ref 20

IAdaPID-ADG integrates non-increasing effective learning rates from AMSGrad and gradient-difference modulation from DiffGrad into AdaPID, yielding better convergence and stability than prior optimizers on MNIST, CIFAR10, IARC, and AnnoCerv.

FG²-GDN: Enhancing Long-Context Gated Delta Networks with Doubly Fine-Grained Control

cs.LG · 2026-04-21 · unverdicted · novelty 5.0 · ref 8

FG²-GDN replaces the scalar beta in the delta update with a channel-wise vector and decouples key/value scaling to improve recall over prior GDN and KDA models.

Training Neural Networks with Optimal Double-Bayesian Learning

cs.LG · 2026-05-19 · unverdicted · novelty 4.0 · ref 8

A double-Bayesian framework derives an optimal learning rate for neural network training via two antagonistic Bayesian processes.
