Mask2Former for Video Instance Segmentation
5 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 5
verdicts
UNVERDICTED 5
roles
method 1
polarities
use method 1
representative citing papers
SA-VIS trains video instance segmentation models on sparse frame annotations via a Past-frames Feature Propagation module and frame-specific instance queries, showing only a 0.4% AP drop versus dense training on YouTube-VIS and OVIS benchmarks.
GOLD-BEV learns dense BEV semantic maps including dynamic agents from ego-centric sensors by using synchronized aerial imagery for training supervision and pseudo-label generation.
Primus and PrimusV2 are Transformer-centric models that match or exceed nnU-Net and top CNNs on nine 3D medical segmentation datasets by enforcing attention usage.
PAT-VCM adds lightweight auxiliary tokens to a shared baseline video stream to support multiple downstream machine tasks without task-specific codecs.
citing papers explorer
- Mind the Gap: Disentangling Performance Bottlenecks in Video Instance Segmentation
An ILP-based oracle applied to seven VIS methods on YouTube-VIS and OVIS shows tracking instability as the dominant bottleneck, producing gaps exceeding 20 AP under occlusion while classification impact is secondary.
- SA-VIS: Sparse frame Annotations for training Video Instance Segmentation
SA-VIS trains video instance segmentation models on sparse frame annotations via a Past-frames Feature Propagation module and frame-specific instance queries, showing only a 0.4% AP drop versus dense training on YouTube-VIS and OVIS benchmarks.
- GOLD-BEV: GrOund and aeriaL Data for Dense Semantic BEV Mapping of Dynamic Scenes
GOLD-BEV learns dense BEV semantic maps including dynamic agents from ego-centric sensors by using synchronized aerial imagery for training supervision and pseudo-label generation.
- PAT-VCM: Plug-and-Play Auxiliary Tokens for Video Coding for Machines
PAT-VCM adds lightweight auxiliary tokens to a shared baseline video stream to support multiple downstream machine tasks without task-specific codecs.