org/abs/2411.14347
5 Pith papers cite this work.
citing papers explorer
- Vision Harnessing Agent for Open Ad-hoc Segmentation
  VASA is a vision-guided agent for open ad-hoc segmentation that creates and validates masks through planning, tool use, and error recovery, outperforming baselines on the new PARS benchmark and RefCOCOm.
- SAM 3: Segment Anything with Concepts
  SAM 3 introduces promptable concept segmentation that doubles the accuracy of prior systems on images and videos while improving standard SAM segmentation performance.
- DeFacto: Counterfactual Thinking with Images for Enforcing Evidence-Grounded and Faithful Reasoning
  DeFacto trains multimodal models with counterfactual image variants and GRPO reinforcement learning to enforce that correct answers are supported by correct visual evidence.
- See&Say: Vision Language Guided Safe Zone Detection for Autonomous Package Delivery Drones
  See&Say combines depth gradients, semantic masks, and VLM-guided refinement to generate safety maps and alternative drop zones for autonomous drone deliveries, outperforming baselines in accuracy and IoU.
- Image Generators are Generalist Vision Learners