International Conference on Learning Representations , year=

Sharpness-aware Minimization for Efficiently Improving Generalization , author=

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

browse 8 citing papers

citation-role summary

background 1 method 1

citation-polarity summary

unclear 1 use method 1

representative citing papers

Pointwise Generalization in Deep Neural Networks

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Proposes pointwise Riemannian Dimension from feature eigenvalues to derive tighter, representation-aware generalization bounds for deep networks in the nonlinear regime.

Fix the Loss, Not the Radius: Rethinking the Adversarial Perturbation of Sharpness-Aware Minimization

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

LE-SAM inverts SAM by fixing the loss budget instead of the parameter-space radius, yielding better generalization across benchmarks.

Towards Understanding Self-Pretraining for Sequence Classification

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

Self-pretraining improves Transformer sequence classification by enabling learning of proximity-biased attention from positional encodings that label supervision alone cannot easily acquire from random starts.

Adapt or Forget: Provable Tradeoffs Between Adam and SGD in Nonstationary Optimization

stat.ML · 2026-05-05 · unverdicted · novelty 6.0

Adam's adaptive preconditioning and first-moment averaging improve high-probability tracking error in noise-dominated nonstationary regimes but can increase it under strong drift, where SGD achieves a smaller floor, with explicit beta-dependent bounds.

Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting

cs.LG · 2026-05-04 · unverdicted · novelty 6.0

Sharpness-aware pretraining and related flat-minima interventions reduce catastrophic forgetting by up to 80% after post-training across 20M-150M models and by 31-40% at 1B scale.

Towards E-Value Based Stopping Rules for Bayesian Deep Ensembles

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

E-value sequential tests enable early stopping of MCMC sampling in Bayesian deep ensembles, often needing only a fraction of the full budget while improving over standard deep ensembles.

Position: Zeroth-Order Optimization in Deep Learning Is Underexplored, Not Underpowered

cs.LG · 2026-05-15 · unverdicted · novelty 5.0

Zeroth-order optimization is underexplored rather than underpowered in deep learning, with limitations stemming from full-space designs that can be addressed via subspace, spectral, and systems-aware approaches.

LIVEditor-14B: Lightning Unified Video Editing via In-Context Sparse Attention

cs.CV · 2026-05-06

citing papers explorer

Showing 1 of 1 citing paper after filters.

Fix the Loss, Not the Radius: Rethinking the Adversarial Perturbation of Sharpness-Aware Minimization cs.LG · 2026-05-11 · unverdicted · none · ref 15
LE-SAM inverts SAM by fixing the loss budget instead of the parameter-space radius, yielding better generalization across benchmarks.

International Conference on Learning Representations , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer