NIPS Workshop on Deep Learning and Unsupervised Feature Learning
5 Pith papers cite this work.
fields: cs.LG (5)
years: 2026 (5)
citing papers explorer
- AMUSE: Anytime Muon with Stable Gradient Evaluation
  AMUSE is a new optimizer integrating Muon orthogonalization with Schedule-Free averaging via adaptive interpolation for schedule-free anytime training that improves Pareto frontiers on vision and LLM tasks.
- Spectral Gradient Surgery for Domain-Generalizable Dataset Distillation
  Spectral Gradient Surgery disentangles class-discriminative and domain-specific signals in distribution-matching distilled datasets by analyzing gradient agreement in the spectral domain, yielding better out-of-distribution performance.
- Possibilistic Predictive Uncertainty for Deep Learning
  DAPPr introduces a possibilistic framework that projects parameter posteriors to predictions via supremum and approximates them with Dirichlet possibility functions to yield efficient, closed-form epistemic uncertainty estimates.
- Don't Collapse Your Features: Why CenterLoss Hurts OOD Detection and Multi-Scale Mahalanobis Wins
  Avoiding CenterLoss improves OOD detection via multi-scale Mahalanobis on L2-normalized features, yielding 0.9483 AUROC on CIFAR-10 while preserving competitive in-distribution accuracy.
- Tippett-minimum Fusion of Representation-space Diffusion Models for Multi-Encoder Out-of-Distribution Detection
  A multi-encoder fusion of representation-space diffusion models via EncMin2L and Tippett minimum p-value combination detects OOD across global, semantic, texture, and corruption shifts with >=0.94 AUROC at reduced parameter cost.