Adaptive subgradient methods for online learning and stochastic optimization
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
citing papers explorer
- An Improved Adaptive PID Optimizer with Enhanced Convergence and Stability for Deep Learning
  IAdaPID-ADG integrates non-increasing effective learning rates from AMSGrad and gradient-difference modulation from DiffGrad into AdaPID, yielding better convergence and stability than prior optimizers on MNIST, CIFAR10, IARC, and AnnoCerv.
- FG²-GDN: Enhancing Long-Context Gated Delta Networks with Doubly Fine-Grained Control
  FG²-GDN replaces the scalar beta in the delta update with a channel-wise vector and decouples key/value scaling to improve recall over prior GDN and KDA models.
- Training Neural Networks with Optimal Double-Bayesian Learning
  A double-Bayesian framework derives an optimal learning rate for neural network training via two antagonistic Bayesian processes.