Putting the Object Back into Video Object Segmentation
We present Cutie, a video object segmentation (VOS) network with object-level memory reading, which puts the object representation from memory back into the video object segmentation result. Recent works on VOS employ bottom-up pixel-level memory reading which struggles due to matching noise, especially in the presence of distractors, resulting in lower performance in more challenging data. In contrast, Cutie performs top-down object-level memory reading by adapting a small set of object queries. Via those, it interacts with the bottom-up pixel features iteratively with a query-based object transformer (qt, hence Cutie). The object queries act as a high-level summary of the target object, while high-resolution feature maps are retained for accurate segmentation. Together with foreground-background masked attention, Cutie cleanly separates the semantics of the foreground object from the background. On the challenging MOSE dataset, Cutie improves by 8.7 J&F over XMem with a similar running time and improves by 4.2 J&F over DeAOT while being three times faster. Code is available at: https://hkchengrex.github.io/Cutie
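The foreground-background masked attention mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes, for illustration only, that the first half of the object queries is restricted to foreground pixels and the second half to background pixels, and that a binary foreground mask is already available. The function name `masked_cross_attention` and all shapes are hypothetical.

```python
import numpy as np

def masked_cross_attention(queries, pixel_feats, fg_mask):
    """Sketch of foreground/background masked cross-attention.

    queries:     (N, C) object queries (assumed layout: first half
                 foreground, second half background)
    pixel_feats: (P, C) flattened pixel features
    fg_mask:     (P,) boolean, True where a pixel is foreground
    """
    n, c = queries.shape
    logits = queries @ pixel_feats.T / np.sqrt(c)  # (N, P) scaled dot products

    # Build the per-query mask of pixels each query may attend to.
    half = n // 2
    allowed = np.empty((n, fg_mask.shape[0]), dtype=bool)
    allowed[:half] = fg_mask      # foreground queries see foreground pixels
    allowed[half:] = ~fg_mask     # background queries see background pixels

    # Guard against an empty region: fall back to attending everywhere.
    empty = ~allowed.any(axis=1)
    allowed[empty] = True

    # Masked softmax over pixels.
    logits = np.where(allowed, logits, -np.inf)
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    # Each query becomes a weighted summary of the pixels it may see.
    return weights @ pixel_feats, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
feats = rng.normal(size=(6, 8))
mask = np.array([True, True, False, False, False, True])
summary, attn = masked_cross_attention(q, feats, mask)
print(summary.shape, attn.shape)
```

In this sketch the attention weights of the foreground queries are exactly zero on background pixels and vice versa, which is one simple way to keep the two sets of semantics separate.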
Forward citations
Cited by 3 Pith papers
- Enabling Extensible Embodied Capabilities with Tools
  Introduces Embodied Tool Protocol and tool externalization to improve embodied AI performance on perception and cognition tasks, with measured gains but limits on execution capabilities.
- SigLoMa: Learning Open-World Quadrupedal Loco-Manipulation from Ego-Centric Vision
  SigLoMa enables dynamic loco-manipulation on quadrupeds from ego-centric 5 Hz vision alone by using Sigma Points for scalable exteroception, an ego-centric Kalman Filter for high-rate state estimation, and an active s...
- 4D Vessel Reconstruction for Benchtop Thrombectomy Analysis
  A nine-camera multi-view workflow with 4D Gaussian Splatting reconstructs dynamic vessel surfaces in thrombectomy phantoms to enable standardized comparative displacement and stress-proxy tracking.