Emerging properties in self-supervised vision transformers
4 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (4)
years: 2026 (4)
verdicts: UNVERDICTED (4)
roles: method (1)
polarities: use method (1)
citing papers explorer
- CineMatte: Background Matting for Virtual Production and Beyond
  CineMatte uses a cross-attention design on a Siamese DINOv3 ViT plus a pretrained upsampler to produce robust mattes for virtual production (VP), backed by a new non-synthetic 4K VP dataset that supports camera motion.
- Scene-Agnostic Object-Centric Representation Learning for 3D Gaussian Splatting
  A scene-agnostic object codebook learned via unsupervised object-centric learning provides consistent identity-anchored representations for 3D Gaussians across multiple scenes.
- DetRefiner: Model-Agnostic Detection Refinement with Feature Fusion Transformer
  DetRefiner fuses global and local features with a Transformer to refine open-vocabulary object detection (OVOD) confidence scores, delivering gains of up to +10.1 AP on novel categories across multiple datasets.
- How to Embed Matters: Evaluation of EO Embedding Design Choices
  Transformer backbones with mean pooling and combined self-supervised embeddings yield robust, compact representations for Earth observation (EO) tasks that are over 500x smaller than the raw data.