Vision-language grounding shows high prompt sensitivity, with different wordings for the same object leading to distinct instance selections and text embeddings explaining only 34% of the disagreement.
Title resolution pending
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
- Prompt Sensitivity in Vision-Language Grounding: How Small Changes in Wording Affect Object Detection