Muon outperforms Adam by reducing curvature penalty via lower Normalized Directional Sharpness, as shown via Taylor approximation on LLM training and proven on stylized quadratic problems with heterogeneous curvature.
Efficient sharpness-aware minimization for improved training of neural networks
4 Pith papers cite this work.
representative citing papers
Introduces MM-Privacy dataset and evaluations showing MLLMs leak sensitive data from images in various tasks, highlighting task inconsistency effects.
SAP locates safety-correlated directions via contrastive signals and perturbs hidden-state propagation with a lightweight probe to preserve safety while fine-tuning LLMs for task performance.
C-Flat Turbo accelerates continual learning by skipping redundant flatness gradients via direction-invariance observations and linear adaptive scheduling, delivering 1-1.25x speedup with comparable accuracy.
citing papers explorer
Secure LLM Fine-Tuning via Safety-Aware Probing