Flat minima · Neural Computation · 1997

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Understanding Dynamics of Adam in Zero-Sum Games: An ODE Approach

cs.LG · 2026-05-19 · unverdicted · novelty 7.0

Derives ODE limits of Adam-DA showing that first- and second-order momentum parameters reverse their convergence roles in zero-sum games compared to minimization, validated on GAN experiments.

Understanding Generalization through Decision Pattern Shift

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

DPS quantifies deviation of per-sample decision patterns from class averages and shows linear correlation with generalization gaps while unifying degradation scenarios into a continuous trajectory.

Optimizer-Induced Mode Connectivity: From AdamW to Muon

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

Optimizer choice induces distinct connected regions in the loss landscape of two-layer ReLU networks, with AdamW and Muon sometimes separated by provable barriers.

Don't Stop Me Yet: Sampling Loss Minima via Dissipative Riemannian Mechanics

cs.LG · 2026-05-14 · unverdicted · novelty 5.0

DiMS is a physics-inspired dynamical sampler guaranteed to exactly sample reparameterization-invariant minimum level sets in neural network loss landscapes.

From Flat Facts to Sharp Hallucinations: Detecting Stubborn Errors via Gradient Sensitivity

cs.LG · 2026-05-01 · unverdicted · novelty 5.0

EPGS detects high-confidence factual errors in LLMs by using embedding perturbations to measure gradient sensitivity as a proxy for sharp versus flat minima.

citing papers explorer

Showing 5 of 5 citing papers.

Understanding Dynamics of Adam in Zero-Sum Games: An ODE Approach · cs.LG · 2026-05-19 · unverdicted · none · ref 181
Understanding Generalization through Decision Pattern Shift · cs.LG · 2026-05-13 · unverdicted · none · ref 47
Optimizer-Induced Mode Connectivity: From AdamW to Muon · cs.AI · 2026-05-11 · unverdicted · none · ref 53
Don't Stop Me Yet: Sampling Loss Minima via Dissipative Riemannian Mechanics · cs.LG · 2026-05-14 · unverdicted · none · ref 37
From Flat Facts to Sharp Hallucinations: Detecting Stubborn Errors via Gradient Sensitivity · cs.LG · 2026-05-01 · unverdicted · none · ref 2
