NIPS Workshop on Deep Learning and Unsupervised Feature Learning
5 Pith papers cite this work.
fields
cs.LG 5
years
2026 5
representative citing papers
Avoiding CenterLoss improves OOD detection via multi-scale Mahalanobis on L2-normalized features, yielding 0.9483 AUROC on CIFAR-10 while preserving competitive in-distribution accuracy.
citing papers explorer
- AMUSE: Anytime Muon with Stable Gradient Evaluation
AMUSE is a new optimizer that integrates Muon orthogonalization with Schedule-Free averaging via adaptive interpolation, enabling schedule-free anytime training and improving Pareto frontiers on vision and LLM tasks.
- Spectral Gradient Surgery for Domain-Generalizable Dataset Distillation
Spectral Gradient Surgery disentangles class-discriminative and domain-specific signals in distribution-matching distilled datasets by analyzing gradient agreement in the spectral domain, yielding better out-of-distribution performance.
- Possibilistic Predictive Uncertainty for Deep Learning
DAPPr projects a possibilistic posterior over network parameters to predictions using supremum operators and approximates it with learnable Dirichlet functions to yield an efficient training objective for epistemic uncertainty.
- Tippett-minimum Fusion of Representation-space Diffusion Models for Multi-Encoder Out-of-Distribution Detection
A multi-encoder fusion of representation-space diffusion models via EncMin2L and Tippett minimum p-value combination detects OOD across global, semantic, texture, and corruption shifts with >=0.94 AUROC at reduced parameter cost.