Vision Transformer (ViT) applies a standard transformer directly to image patches and matches or exceeds state-of-the-art CNN performance on classification benchmarks after large-scale pre-training.
Polyak and Anatoli B
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles: background 1
citation-polarity summary
polarities: background 1
representative citing papers
VaRDASS improves unsupervised domain adaptation by using stratified sampling to reduce variance in discrepancy estimation for measures like correlation alignment and MMD, with derived error bounds, an optimality proof for MMD under assumptions, and a k-means style algorithm.
citing papers explorer
- Large-scale Uncertainty Quantification for Latent Variable Models Using Subsampling Markov Chain Monte Carlo
  Derives joint asymptotic jump-diffusion limit for global parameters and latent variables in SGLD-Gibbs under space-time rescaling, yielding explicit hyperparameter tuning guidance for calibrated uncertainty quantification.
- Contour Refinement using Discrete Diffusion in Low Data Regime
  A CNN-based discrete diffusion method refines sparse contours from segmentation masks using simplified denoising steps and minimal post-processing, outperforming baselines on small medical and environmental datasets while running 3.5 times faster.
- Geometrically Averaged Hard Target Updates for Linear Q-Learning
  Introduces and analyzes the λ-target update for linear Q-learning via geometric averaging of periodic target maps, studied with a switching-system model in the deterministic case.
- On What We Can Learn from Low-Resolution Data
  Low-resolution data improves high-resolution model performance when high-resolution samples are limited, via KL-divergence bounds and experiments on vision transformers and CNNs.
- Model Merging: Foundations and Algorithms
  New cycle-consistent optimization, task vector theory, singular vector decompositions, adaptive routing, and efficient evolutionary search provide foundations for merging neural network weights across tasks.
- Towards foundation-style models for energy-frontier heterogeneous neutrino detectors via self-supervised pre-training
  Self-supervised pre-training on multimodal neutrino detector simulations produces reusable representations that improve downstream classification, regression, and data efficiency over training from scratch.
- Multiscale reconstruction of protein conformations from cryo-EM images
  A multiscale optimization method using explicit protein backbone geometry reconstructs atomic models from cryo-EM data, showing improved RMSD and TM scores on three simulated datasets.
- Generalization Guarantees on Data-Driven Tuning of Gradient Descent with Langevin Updates
  LGD reaches Bayes optimality at optimal hyperparameters and admits an O(dh) pseudo-dimension bound for meta-learning hyperparameters on convex regression tasks.