arXiv preprint arXiv:2211.05729 , year=

How does sharpness-aware minimization minimize sharpness? , author= · 2022 · arXiv 2211.05729

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Flatness and Gradient Alignment Are Both Necessary: Spectral-Aware Gradient-Aligned Exploration for Multi-Distribution Learning

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Excess risk decomposes into independent alignment (trace of inverse average Hessian times gradient covariance) and curvature terms, so both flatness and gradient alignment are required; SAGE achieves this and sets new SOTA on DomainBed.

Nexus: Same Pretraining Loss, Better Downstream Generalization via Common Minima

cs.LG · 2026-04-10 · unverdicted · novelty 6.0

Nexus optimizer improves LLM downstream performance by converging to common minima across data sources despite identical pretraining loss.

There Will Be a Scientific Theory of Deep Learning

stat.ML · 2026-04-23 · unverdicted · novelty 2.0

A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.

citing papers explorer

Showing 3 of 3 citing papers.

Flatness and Gradient Alignment Are Both Necessary: Spectral-Aware Gradient-Aligned Exploration for Multi-Distribution Learning cs.LG · 2026-05-08 · unverdicted · none · ref 33
Excess risk decomposes into independent alignment (trace of inverse average Hessian times gradient covariance) and curvature terms, so both flatness and gradient alignment are required; SAGE achieves this and sets new SOTA on DomainBed.
Nexus: Same Pretraining Loss, Better Downstream Generalization via Common Minima cs.LG · 2026-04-10 · unverdicted · none · ref 44
Nexus optimizer improves LLM downstream performance by converging to common minima across data sources despite identical pretraining loss.
There Will Be a Scientific Theory of Deep Learning stat.ML · 2026-04-23 · unverdicted · none · ref 118
A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.

arXiv preprint arXiv:2211.05729 , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer