VL-SAM-v2: Open-world object detection with general and specific query fusion
3 Pith papers cite this work. Polarity classification is still being indexed.
Citation summary
Years: 2026 (3)
Verdicts: unverdicted (3)
Roles: baseline (1)
Polarities: baseline (1)
Citing papers
-
FoodCHA: Multi-Modal LLM Agent for Fine-Grained Food Analysis
FoodCHA reformulates food recognition as hierarchical decision-making with the Moondream-2B model, achieving 13.8%, 38.2%, and 153.2% precision gains in category, subcategory, and cooking style recognition over Food-Llama-3.2-11B on FoodNExTDB.
-
VL-SAM-v3: Memory-Guided Visual Priors for Open-World Object Detection
VL-SAM-v3 retrieves visual prototypes from memory to generate sparse spatial and dense contextual priors that refine detection prompts, yielding gains on rare categories in LVIS for both open-vocabulary and open-ended settings.
-
Visual Prompt Based Reasoning for Offroad Mapping using Multimodal LLMs
A zero-shot pipeline uses SAM2 segmentation plus numeric-label prompting of a VLM to identify drivable off-road areas and enable navigation without task-specific training or datasets.