High-probability generalization bounds for D-SGD are derived at the optimal rate O(1/sqrt(mn) log(1/δ)) via pointwise uniform stability across convex and non-convex settings.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.
citing papers explorer
-
Unveiling High-Probability Generalization in Decentralized SGD
High-probability generalization bounds for D-SGD are derived at the optimal rate O(1/sqrt(mn) log(1/δ)) via pointwise uniform stability across convex and non-convex settings.
-
Vision Transformers Need Registers
Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.