An image is worth 16x16 words: Transformers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby · 2021

9 Pith papers cite this work.

citation-role summary

background 1 · method 1

citation-polarity summary

background 1 · use method 1

representative citing papers

Disentangled Sparse Representations for Concept-Separated Diffusion Unlearning

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

SAEParate disentangles sparse representations in diffusion models via contrastive clustering and nonlinear encoding to enable more precise concept unlearning with reduced side effects.

Flatness and Gradient Alignment Are Both Necessary: Spectral-Aware Gradient-Aligned Exploration for Multi-Distribution Learning

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Excess risk decomposes into independent alignment (trace of the inverse average Hessian times the gradient covariance) and curvature terms, so both flatness and gradient alignment are required; SAGE achieves both and sets a new state of the art on DomainBed.

Accelerating Inference for Multilayer Neural Networks with Quantum Computers

quant-ph · 2025-10-08 · unverdicted · novelty 7.0

Quantum circuits for coherent multilayer neural network inference achieve quadratic to polylogarithmic speedups over classical methods depending on quantum data access models for inputs and weights.

Empirical Bayes Conformal Prediction for Vision and Language Models

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

Empirical Bayes conformal prediction converts score variability into r-value nonconformity scores that preserve target coverage while reducing inclusion of high-variance false candidates in image classification, CLIP VLMs, and LLMs.

VoxCor: Training-Free Volumetric Features for Multimodal Voxel Correspondence

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

VoxCor creates reusable volumetric features from frozen 2D ViT models by combining triplanar inference with a closed-form weighted partial least squares projection, enabling direct voxel correspondence across modalities without training or registration.

Beyond Activation Alignment: The Geometry of Neural Sensitivity

cs.LG · 2026-05-04 · unverdicted · novelty 6.0

A new Spectral Riemannian Alignment Score (S-RAS) based on expected projected Fisher metrics quantifies local sensitivity in neural representations and supports layer matching, training dissociations, and brain data analysis.

Extending Pretrained 10-Second ECG Foundation Models to Longer Horizons

cs.LG · 2026-05-16 · unverdicted · novelty 5.0

A parameter-efficient plug-in framework adds structurally compatible long-sequence processing and semantically informed temporal modeling to extend pretrained 10-second ECG foundation models to longer, variable-length inputs.

MahaVar: OOD Detection via Class-wise Mahalanobis Distance Variance under Neural Collapse

cs.LG · 2026-05-14 · conditional · novelty 5.0

MahaVar augments the Mahalanobis OOD score with class-wise distance variance, which is theoretically higher for in-distribution samples under relaxed Neural Collapse geometry.

Architecture-agnostic Lipschitz-constant Bayesian header and its application to resolve semantically proximal classification errors with vision transformers

cs.CV · 2026-05-07 · unverdicted · novelty 5.0

LipB-ViT adds bi-Lipschitz Bayesian layers to vision transformers and uses uncertainty-aware fusion to identify corrupted labels with over 93% recall at 15% noise, beating kNN baselines.

citing papers explorer

Showing 9 of 9 citing papers.

Disentangled Sparse Representations for Concept-Separated Diffusion Unlearning cs.LG · 2026-05-12 · unverdicted · none · ref 10
Flatness and Gradient Alignment Are Both Necessary: Spectral-Aware Gradient-Aligned Exploration for Multi-Distribution Learning cs.LG · 2026-05-08 · unverdicted · none · ref 54
Accelerating Inference for Multilayer Neural Networks with Quantum Computers quant-ph · 2025-10-08 · unverdicted · none · ref 6
Empirical Bayes Conformal Prediction for Vision and Language Models cs.LG · 2026-05-22 · unverdicted · none · ref 8
VoxCor: Training-Free Volumetric Features for Multimodal Voxel Correspondence cs.CV · 2026-05-13 · unverdicted · none · ref 12
Beyond Activation Alignment: The Geometry of Neural Sensitivity cs.LG · 2026-05-04 · unverdicted · none · ref 10
Extending Pretrained 10-Second ECG Foundation Models to Longer Horizons cs.LG · 2026-05-16 · unverdicted · none · ref 22
MahaVar: OOD Detection via Class-wise Mahalanobis Distance Variance under Neural Collapse cs.LG · 2026-05-14 · conditional · none · ref 7
Architecture-agnostic Lipschitz-constant Bayesian header and its application to resolve semantically proximal classification errors with vision transformers cs.CV · 2026-05-07 · unverdicted · none · ref 32
