pith. sign in

hub Mixed citations

Ferret-v2: An improved baseline for referring and grounding with large language models

Mixed citation behavior. Most common role is background (67%).

12 Pith papers citing it
Background 67% of classified citations

hub tools

citation-role summary

background 4 baseline 1 dataset 1

citation-polarity summary

fields

cs.CV 11 cs.SD 1

representative citing papers

Qwen2.5-VL Technical Report

cs.CV · 2025-02-19 · unverdicted · novelty 5.0

Qwen2.5-VL reports a vision-language model family using native dynamic-resolution ViT and absolute time encoding that matches GPT-4o on document and diagram tasks while supporting hour-long videos with second-level localization.

PaliGemma: A versatile 3B VLM for transfer

cs.CV · 2024-07-10 · unverdicted · novelty 4.0

PaliGemma is an open 3B VLM based on SigLIP and Gemma that achieves strong performance on nearly 40 diverse open-world tasks including benchmarks, remote-sensing, and segmentation.

APRVOS: 1st Place Winner of 5th PVUW MeViS-Audio Track

cs.SD · 2026-04-20 · unverdicted · novelty 3.0

A staged pipeline using ASR transcription, visual existence verification, Sa2VA coarse segmentation, and agent-guided SAM3 refinement won first place in the PVUW MeViS-Audio track by decomposing audio-conditioned Ref-VOS into sequential verification and refinement steps.

citing papers explorer

Showing 12 of 12 citing papers.