Qwen2.5 Technical Report
2 Pith papers cite this work. Polarity classification is still being indexed.
Fields: cs.CV (2)
Verdicts: UNVERDICTED (2)
Citing papers explorer
- The ART of Composition: Attention-Regularized Training for Compositional Visual Grounding
  CompART adds a composition loss on decomposed captions to regularize attention sums, improving multi-object grounding and VQA across four VLM types and six benchmarks.
- Sign Language Recognition in the Age of LLMs
  Zero-shot VLM evaluation on WLASL300 reveals that open-source models lag far behind supervised ISLR baselines, while proprietary models improve with scale and exhibit some visual-semantic alignment.