ASAM: adaptive sharpness-aware minimization for scale-invariant learning of deep neural networks
4 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 4
representative citing papers
Atlas reaches over 42% accuracy on Natural Questions with only 64 examples, outperforming a 540B-parameter model by 3% with 50x fewer parameters.
Contrastive learning trains unsupervised dense retrievers that beat BM25 on most BEIR datasets and support cross-lingual retrieval across scripts.
Z-Score Filtered SAM retains only high absolute Z-score gradient components per layer during the ascent step and reports higher test accuracy than standard SAM on CIFAR and Tiny-ImageNet benchmarks.
citing papers explorer
- Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting
  Sharpness-aware pretraining and related flat-minima interventions reduce catastrophic forgetting by up to 80% after post-training across 20M-150M models and by 31-40% at 1B scale.
- Atlas: Few-shot Learning with Retrieval Augmented Language Models
  Atlas reaches over 42% accuracy on Natural Questions with only 64 examples, outperforming a 540B-parameter model by 3% with 50x fewer parameters.
- Unsupervised Dense Information Retrieval with Contrastive Learning
  Contrastive learning trains unsupervised dense retrievers that beat BM25 on most BEIR datasets and support cross-lingual retrieval across scripts.
- Sharpness-Aware Minimization with Z-Score Gradient Filtering
  Z-Score Filtered SAM retains only high absolute Z-score gradient components per layer during the ascent step and reports higher test accuracy than standard SAM on CIFAR and Tiny-ImageNet benchmarks.