Sharpness-aware pretraining and related flat-minima interventions reduce catastrophic forgetting by up to 80% after post-training across 20M-150M models and by 31-40% at 1B scale.
Asam: Adaptive sharpness-aware minimization for scale-invariant learning of deep neural networks
4 Pith papers cite this work. Polarity classification is still indexing.
4
Pith papers citing it
verdicts
UNVERDICTED 4representative citing papers
Atlas reaches over 42% accuracy on Natural Questions with only 64 examples, outperforming a 540B-parameter model by 3% with 50x fewer parameters.
Contrastive learning trains unsupervised dense retrievers that beat BM25 on most BEIR datasets and support cross-lingual retrieval across scripts.
Z-Score Filtered SAM retains only high absolute Z-score gradient components per layer during the ascent step and reports higher test accuracy than standard SAM on CIFAR and Tiny-ImageNet benchmarks.
citing papers explorer
-
Atlas: Few-shot Learning with Retrieval Augmented Language Models
Atlas reaches over 42% accuracy on Natural Questions with only 64 examples, outperforming a 540B-parameter model by 3% with 50x fewer parameters.