VL-SAM-v2: Open-world object detection with general and specific query fusion

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: baseline 1

citation-polarity summary

years: 2026 (3)
verdicts: unverdicted (3)
roles: baseline 1
polarities: baseline 1

representative citing papers

FoodCHA: Multi-Modal LLM Agent for Fine-Grained Food Analysis

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

FoodCHA reformulates food recognition as hierarchical decision-making with the Moondream-2B model, achieving precision gains of 13.8%, 38.2%, and 153.2% in category, subcategory, and cooking-style recognition, respectively, over Food-Llama-3.2-11B on FoodNExTDB.

VL-SAM-v3: Memory-Guided Visual Priors for Open-World Object Detection

cs.CV · 2026-05-05 · unverdicted · novelty 5.0 · 3 refs

VL-SAM-v3 retrieves visual prototypes from memory to generate sparse spatial and dense contextual priors that refine detection prompts, yielding gains on rare categories in LVIS for both open-vocabulary and open-ended settings.

citing papers explorer

Showing 3 of 3 citing papers.

  • FoodCHA: Multi-Modal LLM Agent for Fine-Grained Food Analysis · cs.AI · 2026-05-06 · unverdicted · none · ref 23

    FoodCHA reformulates food recognition as hierarchical decision-making with the Moondream-2B model, achieving precision gains of 13.8%, 38.2%, and 153.2% in category, subcategory, and cooking-style recognition, respectively, over Food-Llama-3.2-11B on FoodNExTDB.

  • VL-SAM-v3: Memory-Guided Visual Priors for Open-World Object Detection · cs.CV · 2026-05-05 · unverdicted · none · ref 28 · 3 links

    VL-SAM-v3 retrieves visual prototypes from memory to generate sparse spatial and dense contextual priors that refine detection prompts, yielding gains on rare categories in LVIS for both open-vocabulary and open-ended settings.

  • Visual Prompt Based Reasoning for Offroad Mapping using Multimodal LLMs · cs.RO · 2026-04-06 · unverdicted · none · ref 20

    A zero-shot pipeline combines SAM2 segmentation with numeric-label prompting of a VLM to identify drivable off-road areas, enabling navigation without task-specific training or datasets.