{"paper":{"title":"Sharpness-Aware Minimization for Efficiently Improving Generalization","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Sharpness-Aware Minimization finds parameters in flat loss neighborhoods to improve generalization over standard training.","cross_cats":["stat.ML"],"primary_cat":"cs.LG","authors_text":"Ariel Kleiner, Behnam Neyshabur, Hossein Mobahi, Pierre Foret","submitted_at":"2020-10-03T19:02:10Z","abstract_excerpt":"In today's heavily overparameterized models, the value of the training loss provides few guarantees on model generalization ability. Indeed, optimizing only the training loss value, as is commonly done, can easily lead to suboptimal model quality. Motivated by prior work connecting the geometry of the loss landscape and generalization, we introduce a novel, effective procedure for instead simultaneously minimizing loss value and loss sharpness. In particular, our procedure, Sharpness-Aware Minimization (SAM), seeks parameters that lie in neighborhoods having uniformly low loss; this formulatio"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"SAM improves model generalization across a variety of benchmark datasets (e.g., CIFAR-10, CIFAR-100, ImageNet, finetuning tasks) and models, yielding novel state-of-the-art performance for several.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That seeking parameters whose neighborhoods have uniformly low loss will reliably produce better generalization than standard training; this is motivated by prior geometry work but is not derived from first principles in the given text.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"SAM solves a min-max problem to locate flat low-loss regions, improving generalization on CIFAR, ImageNet and label-noise 
tasks.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Sharpness-Aware Minimization finds parameters in flat loss neighborhoods to improve generalization over standard training.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"3aea69fd61dbdf0c9eb9c2c9c472d173016bf12aa211ffc2fd239afce0e58cd0"},"source":{"id":"2010.01412","kind":"arxiv","version":3},"verdict":{"id":"618582a9-e956-461e-bc54-1a2bad3f18b8","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T20:10:54.430679Z","strongest_claim":"SAM improves model generalization across a variety of benchmark datasets (e.g., CIFAR-10, CIFAR-100, ImageNet, finetuning tasks) and models, yielding novel state-of-the-art performance for several.","one_line_summary":"SAM solves a min-max problem to locate flat low-loss regions, improving generalization on CIFAR, ImageNet and label-noise tasks.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That seeking parameters whose neighborhoods have uniformly low loss will reliably produce better generalization than standard training; this is motivated by prior geometry work but is not derived from first principles in the given text.","pith_extraction_headline":"Sharpness-Aware Minimization finds parameters in flat loss neighborhoods to improve generalization over standard training."},"references":{"count":47,"sample":[{"doi":"","year":2021,"title":"URL https://openreview.net/forum? id=BJl6t64tvr. 
8https://github.com/google/spectral-density 9https://github.com/davda54/sam 9 Published as a conference paper at ICLR 2021 James Bradbury, Roy Frostig,","work_id":"e58840cd-cb2a-403b-b355-2eb2b6befce5","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Entropy-sgd: Biasing gradient descent into wide valleys","work_id":"bdc7ed98-348d-4ac0-acaa-b46ab29dcc3b","ref_index":2,"cited_arxiv_id":"1611.01838","is_internal_anchor":true},{"doi":"","year":2019,"title":"Understanding and Utilizing Deep Neural Networks Trained with Noisy Labels","work_id":"f56304ba-0708-4b42-8999-91ede99f534b","ref_index":4,"cited_arxiv_id":"1905.05040","is_internal_anchor":true},{"doi":"","year":null,"title":"AutoAugment: Learning Augmentation Policies from Data","work_id":"9cfcaaf4-6f01-4522-b146-cf16d4be7b90","ref_index":5,"cited_arxiv_id":"1805.09501","is_internal_anchor":true},{"doi":"","year":null,"title":"Improved Regularization of Convolutional Neural Networks with Cutout","work_id":"a3bf8477-f913-4f6a-8e36-125767300d1f","ref_index":7,"cited_arxiv_id":"1708.04552","is_internal_anchor":true}],"resolved_work":47,"snapshot_sha256":"aa05e7421b9f928449d8c9d3f56ecb17dc53890d613051329a80c90f82a37f64","internal_anchors":27},"formal_canon":{"evidence_count":2,"snapshot_sha256":"db4b72609f52c55c47d5694542eadf78c2cd17797598f6f0963f3d5bf4d93aa0"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}