How do vision transformers work?

Namuk Park, Songkuk Kim · 2022 · arXiv 2202.06709

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

CineMatte: Background Matting for Virtual Production and Beyond

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

CineMatte uses a cross-attention design on a Siamese DINOv3 ViT plus a pretrained upsampler to produce robust mattes for virtual production, backed by a new non-synthetic 4K VP dataset that supports camera motion.

Elastic Attention Cores for Scalable Vision Transformers

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

VECA learns effective visual representations using core-periphery attention where patches interact exclusively via a resolution-invariant set of learned core embeddings, achieving linear O(N) complexity while maintaining competitive performance.

PnP-Corrector: A Universal Correction Framework for Coupled Spatiotemporal Forecasting

cs.AI · 2026-05-09 · unverdicted · novelty 6.0 · 2 refs

PnP-Corrector decouples physics simulation from error correction via a plug-and-play agent, cutting error by 29% in 300-day global ocean-atmosphere forecasts.

Scale-Aware Adversarial Analysis: A Diagnostic for Generative AI in Multiscale Complex Systems

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

A new scale-aware diagnostic framework shows that unconstrained diffusion generative models exhibit structural freezing and instability instead of smooth physical responses under multiscale perturbations.

FreqTrack: Frequency Learning based Vision Transformer for RGB-Event Object Tracking

cs.CV · 2026-04-16 · unverdicted · novelty 6.0

FreqTrack is a frequency-aware RGB-event tracking model using spectral enhancement transformers and wavelet edge refinement that reaches 76.6% precision on the COESOT benchmark.

Rethinking Random Transformers as Adaptive Sequence Smoothers for Sleep Staging

cs.LG · 2026-05-11 · unverdicted · novelty 5.0

Randomly initialized Transformers act as adaptive sequence smoothers for sleep staging via a Random Attention Prior Kernel, with gains mainly from inductive bias rather than training.

Insights from Visual Cognition: Understanding Human Action Dynamics with Overall Glance and Refined Gaze Transformer

cs.CV · 2026-04-08 · unverdicted · novelty 5.0

The OG-ReG Transformer achieves state-of-the-art results on Kinetics-400, Something-Something v2, and Diving-48 by combining global glance and local gaze processing paths.

Cross Paradigm Representation and Alignment Transformer for Image Deraining

cs.CV · 2025-04-23 · conditional · novelty 5.0

CPRAformer fuses spatial-channel and global-local attention paradigms via SPC-SA, SPR-SA, and AAFM to achieve state-of-the-art image deraining on eight benchmarks.

citing papers explorer

Showing 8 of 8 citing papers.

CineMatte: Background Matting for Virtual Production and Beyond
cs.CV · 2026-05-18 · unverdicted · none · ref 33
Elastic Attention Cores for Scalable Vision Transformers
cs.CV · 2026-05-12 · unverdicted · none · ref 10
PnP-Corrector: A Universal Correction Framework for Coupled Spatiotemporal Forecasting
cs.AI · 2026-05-09 · unverdicted · none · ref 11 · 2 links
Scale-Aware Adversarial Analysis: A Diagnostic for Generative AI in Multiscale Complex Systems
cs.LG · 2026-05-01 · unverdicted · none · ref 112
FreqTrack: Frequency Learning based Vision Transformer for RGB-Event Object Tracking
cs.CV · 2026-04-16 · unverdicted · none · ref 5
Rethinking Random Transformers as Adaptive Sequence Smoothers for Sleep Staging
cs.LG · 2026-05-11 · unverdicted · none · ref 83
Insights from Visual Cognition: Understanding Human Action Dynamics with Overall Glance and Refined Gaze Transformer
cs.CV · 2026-04-08 · unverdicted · none · ref 61
Cross Paradigm Representation and Alignment Transformer for Image Deraining
cs.CV · 2025-04-23 · conditional · none · ref 39
