VL-SAM-v2: Open-world object detection with general and specific query fusion

Zhiwei Lin, Yongtao Wang · 2025 · arXiv 2505.18986

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

FoodCHA: Multi-Modal LLM Agent for Fine-Grained Food Analysis

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

FoodCHA reformulates food recognition as hierarchical decision-making with the Moondream-2B model, achieving 13.8%, 38.2%, and 153.2% precision gains in category, subcategory, and cooking style recognition over Food-Llama-3.2-11B on FoodNExTDB.

VL-SAM-v3: Memory-Guided Visual Priors for Open-World Object Detection

cs.CV · 2026-05-05 · unverdicted · novelty 5.0 · 3 refs

VL-SAM-v3 retrieves visual prototypes from memory to generate sparse spatial and dense contextual priors that refine detection prompts, yielding gains on rare categories in LVIS for both open-vocabulary and open-ended settings.

Visual Prompt Based Reasoning for Offroad Mapping using Multimodal LLMs

cs.RO · 2026-04-06 · unverdicted · novelty 5.0

A zero-shot pipeline uses SAM2 segmentation plus numeric-label prompting of a VLM to identify drivable off-road areas and enable navigation without task-specific training or datasets.

citing papers explorer

Showing 3 of 3 citing papers.

FoodCHA: Multi-Modal LLM Agent for Fine-Grained Food Analysis

cs.AI · 2026-05-06 · unverdicted · none · ref 23

VL-SAM-v3: Memory-Guided Visual Priors for Open-World Object Detection

cs.CV · 2026-05-05 · unverdicted · none · ref 28 · 3 links

Visual Prompt Based Reasoning for Offroad Mapping using Multimodal LLMs

cs.RO · 2026-04-06 · unverdicted · none · ref 20
