Exploring plain vision transformer backbones for object detection

Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He · 2022

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

CineMatte: Background Matting for Virtual Production and Beyond

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

CineMatte uses a cross-attention design on a Siamese DINOv3 ViT plus a pretrained upsampler to produce robust mattes for virtual production, backed by a new non-synthetic 4K VP dataset that supports camera motion.

SToRe3D: Sparse Token Relevance in ViTs for Efficient Multi-View 3D Object Detection

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

SToRe3D delivers up to 3x faster inference for multi-view 3D object detection in ViTs by selecting relevant 2D tokens and 3D queries via mutual relevance heads with only marginal accuracy loss.

citing papers explorer

Showing 2 of 2 citing papers.

CineMatte: Background Matting for Virtual Production and Beyond · cs.CV · 2026-05-18 · unverdicted · none · ref 25
SToRe3D: Sparse Token Relevance in ViTs for Efficient Multi-View 3D Object Detection · cs.CV · 2026-05-13 · unverdicted · none · ref 26
