
arxiv: 2405.03676 · v1 · pith:TOXE5HFRnew · submitted 2024-05-06 · 💻 cs.LG

Why is SAM Robust to Label Noise?

classification 💻 cs.LG
keywords effect · label · networks · noise · induced · jacobian · robustness · changes

Sharpness-Aware Minimization (SAM) is best known for achieving state-of-the-art performance on natural image and language tasks. However, its most pronounced improvements (of tens of percent) are rather in the presence of label noise. Understanding SAM's label noise robustness requires a departure from characterizing the robustness of minima lying in "flatter" regions of the loss landscape. In particular, the peak performance under label noise occurs with early stopping, far before the loss converges. We decompose SAM's robustness into two effects: one induced by changes to the logit term and the other induced by changes to the network Jacobian. The first can be observed in linear logistic regression, where SAM provably up-weights the gradient contribution from clean examples. Although this explicit up-weighting is also observable in neural networks, when we intervene and modify SAM to remove this effect, surprisingly, we see no visible degradation in performance. We infer that SAM's effect in deeper networks is instead explained entirely by the effect SAM has on the network Jacobian. We theoretically derive the implicit regularization induced by this Jacobian effect in two-layer linear networks. Motivated by our analysis, we see that cheaper alternatives to SAM that explicitly induce these regularization effects largely recover the benefits in deep networks trained on real-world datasets.
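
For readers unfamiliar with the update the abstract refers to, the following is a minimal sketch of the commonly used two-step SAM update applied to linear logistic regression, the setting the abstract mentions. It is not the authors' code: the function names, the full-batch perturbation, and the values of `lr` and `rho` are assumptions chosen for illustration, and the paper's analysis may use a different variant (for example, per-example perturbations).

```python
import numpy as np

def logistic_loss(w, X, y):
    """Mean logistic loss for labels y in {0, 1}."""
    z = X @ w
    return np.mean(np.logaddexp(0.0, z) - y * z)

def logistic_grad(w, X, y):
    """Gradient of the mean logistic loss with respect to w."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y) / len(y)

def sam_step(w, X, y, lr=0.1, rho=0.05):
    """One SAM update (illustrative hyperparameters).

    Step 1: move to the perturbed weights w + eps, where eps has norm rho
            and points along the current gradient.
    Step 2: take a gradient descent step from w using the gradient
            evaluated at the perturbed weights.
    """
    g = logistic_grad(w, X, y)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # small constant avoids division by zero
    g_perturbed = logistic_grad(w + eps, X, y)
    return w - lr * g_perturbed

# Toy usage on synthetic, linearly separable data (hypothetical example).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
w = np.zeros(3)
for _ in range(100):
    w = sam_step(w, X, y)
```

Setting `rho=0` in this sketch recovers plain gradient descent, which makes it easy to compare the two updates on the same data.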



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration

    cs.LG 2026-06 unverdicted novelty 5.0

    SHAPO adds a sharpness-aware adjustment to policy optimization that reweights gradients to favor conservative behavior in uncertain areas, yielding better safety-performance tradeoffs on continuous control tasks.