Tracking Anything with Decoupled Video Segmentation

Alexander Schwing; Brian Price; Ho Kei Cheng; Joon-Young Lee; Seoung Wug Oh

arxiv: 2309.03903 · v1 · pith:MFIHPM24new · submitted 2023-09-07 · 💻 cs.CV

Ho Kei Cheng, Seoung Wug Oh, Brian Price, Alexander Schwing, Joon-Young Lee

classification 💻 cs.CV

keywords segmentation · video · decoupled · propagation · tasks · anything · bi-directional · data

Training data for video segmentation are expensive to annotate. This impedes extensions of end-to-end algorithms to new video segmentation tasks, especially in large-vocabulary settings. To 'track anything' without training on video data for every individual task, we develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation. Due to this design, we only need an image-level model for the target task (which is cheaper to train) and a universal temporal propagation model which is trained once and generalizes across tasks. To effectively combine these two modules, we use bi-directional propagation for (semi-)online fusion of segmentation hypotheses from different frames to generate a coherent segmentation. We show that this decoupled formulation compares favorably to end-to-end approaches in several data-scarce tasks including large-vocabulary video panoptic segmentation, open-world video segmentation, referring video segmentation, and unsupervised video object segmentation. Code is available at: https://hkchengrex.github.io/Tracking-Anything-with-DEVA
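The decoupled design described in the abstract can be illustrated with a toy sketch: a task-specific image-level segmenter proposes masks per frame, a task-agnostic propagation model carries existing masks forward, and the two sets of hypotheses are fused. Everything below is an assumption made for illustration and is not taken from the DEVA codebase: the names `iou`, `fuse`, and `track`, the `segment_image` and `propagate` callbacks, the representation of a mask as a set of pixel indices, and the greedy IoU matching with a 0.5 threshold. It is also forward-only, so it does not reproduce the bi-directional, (semi-)online fusion the abstract refers to.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Mask = FrozenSet[int]  # toy mask: a set of flattened pixel indices


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two masks; 0.0 when both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def fuse(propagated: Dict[int, Mask], detections: List[Mask],
         next_id: int, thresh: float = 0.5) -> Tuple[Dict[int, Mask], int]:
    """Greedily merge propagated tracks with new image-level detections.

    A detection that overlaps an unmatched track (IoU >= thresh) refreshes
    that track's mask; otherwise it starts a new track. Tracks with no
    matching detection keep their propagated mask.
    """
    fused = dict(propagated)
    unmatched = set(propagated)
    for det in detections:
        best = max(unmatched, key=lambda t: iou(propagated[t], det), default=None)
        if best is not None and iou(propagated[best], det) >= thresh:
            fused[best] = det
            unmatched.discard(best)
        else:
            fused[next_id] = det
            next_id += 1
    return fused, next_id


def track(frames: Sequence,
          segment_image: Callable[[object], List[Mask]],
          propagate: Callable[[Mask, object], Mask],
          detect_every: int = 1) -> List[Dict[int, Mask]]:
    """Run the toy decoupled loop and return one {track_id: mask} dict per frame."""
    tracks: Dict[int, Mask] = {}
    next_id = 0
    results = []
    for t, frame in enumerate(frames):
        # Task-agnostic step: carry every existing mask into the current frame.
        tracks = {i: propagate(m, frame) for i, m in tracks.items()}
        # Task-specific step: query the image-level model and fuse its output.
        if t % detect_every == 0:
            tracks, next_id = fuse(tracks, segment_image(frame), next_id)
        results.append(dict(tracks))
    return results
```

In this sketch only `segment_image` would need to change when moving to a new task, while `propagate` stays fixed, which mirrors the division of labour the abstract describes. Raising the hypothetical `detect_every` parameter makes the loop rely on propagation alone between detection frames.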

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

4D Vessel Reconstruction for Benchtop Thrombectomy Analysis
eess.IV · 2026-04 · conditional · novelty 5.0

A nine-camera multi-view workflow with 4D Gaussian Splatting reconstructs dynamic vessel surfaces in thrombectomy phantoms to enable standardized comparative displacement and stress-proxy tracking.