Prototypical networks for few-shot learning,
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.LG (1)
years: 2025 (1)
verdicts: UNVERDICTED (1)
representative citing papers
Benchmarking Attribute Discrimination in Infant-Scale Vision-Language Models
Infant-scale VLMs discriminate size and texture visually but perform poorly on color and struggle to ground attributes in text, while web-scale models excel at color grounding.