Vision Transformer (ViT) applies a standard transformer directly to image patches and matches or exceeds state-of-the-art CNN performance on classification benchmarks after large-scale pre-training.
Title resolution pending
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
A CNN-based discrete diffusion method refines sparse contours from segmentation masks using simplified denoising steps and minimal post-processing, outperforming baselines on small medical and environmental datasets while running 3.5 times faster.
Low-resolution data improves high-resolution model performance when high-resolution samples are limited, via KL-divergence bounds and experiments on vision transformers and CNNs.
New cycle-consistent optimization, task vector theory, singular vector decompositions, adaptive routing, and efficient evolutionary search provide foundations for merging neural network weights across tasks.
Self-supervised pre-training on multimodal neutrino detector simulations produces reusable representations that improve downstream classification, regression, and data efficiency over training from scratch.
VaRDASS improves unsupervised domain adaptation by using stratified sampling to reduce variance in discrepancy estimation for measures like correlation alignment and MMD, with derived error bounds, an optimality proof for MMD under assumptions, and a k-means style algorithm.
LGD reaches Bayes optimality at optimal hyperparameters and admits an O(dh) pseudo-dimension bound for meta-learning hyperparameters on convex regression tasks.
citing papers explorer
-
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Vision Transformer (ViT) applies a standard transformer directly to image patches and matches or exceeds state-of-the-art CNN performance on classification benchmarks after large-scale pre-training.
-
Contour Refinement using Discrete Diffusion in Low Data Regime
A CNN-based discrete diffusion method refines sparse contours from segmentation masks using simplified denoising steps and minimal post-processing, outperforming baselines on small medical and environmental datasets while running 3.5 times faster.
-
On What We Can Learn from Low-Resolution Data
Low-resolution data improves high-resolution model performance when high-resolution samples are limited, via KL-divergence bounds and experiments on vision transformers and CNNs.
-
Model Merging: Foundations and Algorithms
New cycle-consistent optimization, task vector theory, singular vector decompositions, adaptive routing, and efficient evolutionary search provide foundations for merging neural network weights across tasks.
-
Towards foundation-style models for energy-frontier heterogeneous neutrino detectors via self-supervised pre-training
Self-supervised pre-training on multimodal neutrino detector simulations produces reusable representations that improve downstream classification, regression, and data efficiency over training from scratch.
-
Variance Matters: Improving Domain Adaptation via Stratified Sampling
VaRDASS improves unsupervised domain adaptation by using stratified sampling to reduce variance in discrepancy estimation for measures like correlation alignment and MMD, with derived error bounds, an optimality proof for MMD under assumptions, and a k-means style algorithm.
-
Generalization Guarantees on Data-Driven Tuning of Gradient Descent with Langevin Updates
LGD reaches Bayes optimality at optimal hyperparameters and admits an O(dh) pseudo-dimension bound for meta-learning hyperparameters on convex regression tasks.