CPF-GCD enforces low-rank compositional structure on vision backbone features via spatial primitive fields so that novel categories emerge as new activation patterns over a shared vocabulary of reusable visual primitives.
How do vision transformers work?
11 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles: background 2
citation-polarity summary
polarities: background 2
representative citing papers
CineMatte uses a cross-attention design on a Siamese DINOv3 ViT plus a pretrained upsampler to produce robust mattes for virtual production, backed by a new non-synthetic 4K VP dataset that supports camera motion.
EoSeg shows that modern ViT backbones support accurate medical image segmentation without U-Net-style decoders via multi-level query modeling and learnable block fusion, with strong results on seven benchmarks.
VECA learns effective visual representations using core-periphery attention where patches interact exclusively via a resolution-invariant set of learned core embeddings, achieving linear O(N) complexity while maintaining competitive performance.
A new scale-aware diagnostic framework shows that unconstrained diffusion generative models exhibit structural freezing and instability instead of smooth physical responses under multiscale perturbations.
FreqTrack is a frequency-aware RGB-event tracking model using spectral enhancement transformers and wavelet edge refinement that reaches 76.6% precision on the COESOT benchmark.
SFKD uses multi-level discrete wavelet transform plus dual-stream refinement and Gaussian-filtered frequency loss to transfer spatial and global information across heterogeneous models.
Randomly initialized Transformers act as adaptive sequence smoothers for sleep staging via a Random Attention Prior Kernel, with gains mainly from inductive bias rather than training.
The OG-ReG Transformer achieves state-of-the-art results on Kinetics-400, Something-Something v2, and Diving-48 by combining global glance and local gaze processing paths.
CPRAformer fuses spatial-channel and global-local attention paradigms via SPC-SA, SPR-SA, and AAFM to achieve state-of-the-art image deraining on eight benchmarks.
PnP-Corrector decouples pre-trained physics engines from a correction agent to mitigate reciprocal error amplification in coupled spatiotemporal forecasting, cutting error by 28% on a 300-day ocean-atmosphere task.
citing papers explorer
- PnP-Corrector: A Universal Correction Framework for Coupled Spatiotemporal Forecasting