How do vision transformers work?

Park, N · 2022 · arXiv 2202.06709

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Identifying Latent Concepts and Structures for Generalized Category Discovery

cs.CV · 2026-07-01 · unverdicted · novelty 7.0

CPF-GCD enforces low-rank compositional structure on vision backbone features via spatial primitive fields so that novel categories emerge as new activation patterns over a shared vocabulary of reusable visual primitives.

CineMatte: Background Matting for Virtual Production and Beyond

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

CineMatte uses a cross-attention design on a Siamese DINOv3 ViT plus a pretrained upsampler to produce robust mattes for virtual production, backed by a new non-synthetic 4K VP dataset that supports camera motion.

Does Your ViT Still Need U-Net for Segmentation?

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

EoSeg shows that modern ViT backbones support accurate medical image segmentation without U-Net-style decoders via multi-level query modeling and learnable block fusion, with strong results on seven benchmarks.

Elastic Attention Cores for Scalable Vision Transformers

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

VECA learns effective visual representations using core-periphery attention where patches interact exclusively via a resolution-invariant set of learned core embeddings, achieving linear O(N) complexity while maintaining competitive performance.

Scale-Aware Adversarial Analysis: A Diagnostic for Generative AI in Multiscale Complex Systems

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

A new scale-aware diagnostic framework shows that unconstrained diffusion generative models exhibit structural freezing and instability instead of smooth physical responses under multiscale perturbations.

FreqTrack: Frequency Learning based Vision Transformer for RGB-Event Object Tracking

cs.CV · 2026-04-16 · unverdicted · novelty 6.0

FreqTrack is a frequency-aware RGB-event tracking model using spectral enhancement transformers and wavelet edge refinement that reaches 76.6% precision on the COESOT benchmark.

SFKD: Spatial–Frequency Joint-Aware Heterogeneous Knowledge Distillation via Multi-Level Wavelet Spectral Interaction

cs.CV · 2026-07-02 · unverdicted · novelty 5.0

SFKD uses multi-level discrete wavelet transform plus dual-stream refinement and Gaussian-filtered frequency loss to transfer spatial and global information across heterogeneous models.

Rethinking Random Transformers as Adaptive Sequence Smoothers for Sleep Staging

cs.LG · 2026-05-11 · unverdicted · novelty 5.0

Randomly initialized Transformers act as adaptive sequence smoothers for sleep staging via a Random Attention Prior Kernel, with gains mainly from inductive bias rather than training.

Insights from Visual Cognition: Understanding Human Action Dynamics with Overall Glance and Refined Gaze Transformer

cs.CV · 2026-04-08 · unverdicted · novelty 5.0

The OG-ReG Transformer achieves state-of-the-art results on Kinetics-400, Something-Something v2, and Diving-48 by combining global glance and local gaze processing paths.

Cross Paradigm Representation and Alignment Transformer for Image Deraining

cs.CV · 2025-04-23 · conditional · novelty 5.0

CPRAformer fuses spatial-channel and global-local attention paradigms via SPC-SA, SPR-SA, and AAFM to achieve state-of-the-art image deraining on eight benchmarks.

PnP-Corrector: A Universal Correction Framework for Coupled Spatiotemporal Forecasting

cs.AI · 2026-05-09 · unverdicted · novelty 4.0 · 2 refs

PnP-Corrector decouples pre-trained physics engines from a correction agent to mitigate reciprocal error amplification in coupled spatiotemporal forecasting, cutting error by 28% on a 300-day ocean-atmosphere task.
