AdaBelief optimizer: Adapting stepsizes by the belief in observed gradients

AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients · 2020 · arXiv 2010.07468

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Gaussian processes on ray-guided transformed uniform grids for fast, flexible, and auto-differentiable adaptive source reconstruction in lens modelling

astro-ph.IM · 2026-06-29 · unverdicted · novelty 7.0

A new RTU grid method models the lensing source as a Gaussian process on a ray-transformed uniform grid, achieving comparable fits with roughly half the pixels per dimension and higher ELBOs on mock data.

On the Convergence of Muon and Beyond

cs.LG · 2025-09-19 · unverdicted · novelty 7.0

Muon-MVR2 attains the optimal anytime convergence rate of ~O(T^{-1/3}) in stochastic non-convex settings under horizon-free schedules.

Ligandformer: A Graph Neural Network for Predicting Compound Property with Robust Interpretation

q-bio.BM · 2022-02-21 · unverdicted · novelty 4.0

Ligandformer is a self-attention graph neural network framework that predicts compound properties, outputs attention maps for local structural interpretation, and claims improved robustness and generalization over prior methods.

citing papers explorer

Showing 1 of 1 citing paper after filters.

On the Convergence of Muon and Beyond · cs.LG · 2025-09-19 · unverdicted · none · ref 56
Muon-MVR2 attains the optimal anytime convergence rate of ~O(T^{-1/3}) in stochastic non-convex settings under horizon-free schedules.
