Sa2VA unifies SAM-2 segmentation with MLLM reasoning into a single model for referring segmentation and conversation on images and videos, supported by a new 72k-expression Ref-SAV dataset.
Gary Chan
2 Pith papers cite this work.
Citation role: background (1). Field: cs.CV (2). Year: 2025 (2).
Representative citing paper:
LLaVA-OneVision-1.5 provides open datasets, code, and models that match or exceed closed competitors on 27 benchmarks at low cost through curated data and efficient training.
Citing papers:
- Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
- LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training