From recognition to cognition: Visual commonsense reasoning
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1
polarities
background 1
representative citing papers
The survey identifies a key tension in multilingual vision-language models between language neutrality via contrastive learning and cultural awareness via diverse data, with most benchmarks relying on translation-based evaluation.
citing papers explorer
- Visual Commonsense Driven Knowledge Refinements for Scene Graph Generation
A framework mines spatial, functional, and qualitative commonsense constraints from SGG training data and uses them to correct ranked predictions at inference, yielding consistent gains on three benchmarks.
- Multilingual Vision-Language Models, A Survey
The survey identifies a key tension in multilingual vision-language models between language neutrality via contrastive learning and cultural awareness via diverse data, with most benchmarks relying on translation-based evaluation.