CineMatte uses a cross-attention design on a Siamese DINOv3 ViT plus a pretrained upsampler to produce robust mattes for virtual production, backed by a new non-synthetic 4K VP dataset that supports camera motion.
How do vision transformers work?
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
VECA learns effective visual representations using core-periphery attention where patches interact exclusively via a resolution-invariant set of learned core embeddings, achieving linear O(N) complexity while maintaining competitive performance.
PnP-Corrector decouples physics simulation from error correction via a plug-and-play agent, cutting error by 29% in 300-day global ocean-atmosphere forecasts.
A new scale-aware diagnostic framework shows that unconstrained diffusion generative models exhibit structural freezing and instability instead of smooth physical responses under multiscale perturbations.
FreqTrack is a frequency-aware RGB-event tracking model using spectral enhancement transformers and wavelet edge refinement that reaches 76.6% precision on the COESOT benchmark.
Randomly initialized Transformers act as adaptive sequence smoothers for sleep staging via a Random Attention Prior Kernel, with gains mainly from inductive bias rather than training.
The OG-ReG Transformer achieves state-of-the-art results on Kinetics-400, Something-Something v2, and Diving-48 by combining global glance and local gaze processing paths.
CPRAformer fuses spatial-channel and global-local attention paradigms via SPC-SA, SPR-SA, and AAFM to achieve state-of-the-art image deraining on eight benchmarks.
citing papers explorer
-
CineMatte: Background Matting for Virtual Production and Beyond
CineMatte uses a cross-attention design on a Siamese DINOv3 ViT plus a pretrained upsampler to produce robust mattes for virtual production, backed by a new non-synthetic 4K VP dataset that supports camera motion.
-
Elastic Attention Cores for Scalable Vision Transformers
VECA learns effective visual representations using core-periphery attention where patches interact exclusively via a resolution-invariant set of learned core embeddings, achieving linear O(N) complexity while maintaining competitive performance.
-
PnP-Corrector: A Universal Correction Framework for Coupled Spatiotemporal Forecasting
PnP-Corrector decouples physics simulation from error correction via a plug-and-play agent, cutting error by 29% in 300-day global ocean-atmosphere forecasts.
-
Scale-Aware Adversarial Analysis: A Diagnostic for Generative AI in Multiscale Complex Systems
A new scale-aware diagnostic framework shows that unconstrained diffusion generative models exhibit structural freezing and instability instead of smooth physical responses under multiscale perturbations.
-
FreqTrack: Frequency Learning based Vision Transformer for RGB-Event Object Tracking
FreqTrack is a frequency-aware RGB-event tracking model using spectral enhancement transformers and wavelet edge refinement that reaches 76.6% precision on the COESOT benchmark.
-
Rethinking Random Transformers as Adaptive Sequence Smoothers for Sleep Staging
Randomly initialized Transformers act as adaptive sequence smoothers for sleep staging via a Random Attention Prior Kernel, with gains mainly from inductive bias rather than training.
-
Insights from Visual Cognition: Understanding Human Action Dynamics with Overall Glance and Refined Gaze Transformer
The OG-ReG Transformer achieves state-of-the-art results on Kinetics-400, Something-Something v2, and Diving-48 by combining global glance and local gaze processing paths.
-
Cross Paradigm Representation and Alignment Transformer for Image Deraining
CPRAformer fuses spatial-channel and global-local attention paradigms via SPC-SA, SPR-SA, and AAFM to achieve state-of-the-art image deraining on eight benchmarks.