Mixed citations

TinyLLaVA: A Framework of Small-scale Large Multimodal Models

Mixed citation behavior. Most common role is background (60%).

10 Pith papers cite it.
Background: 60% of classified citations

citation-role summary

background: 3 · baseline: 2

representative citing papers

Are We on the Right Way for Evaluating Large Vision-Language Models?

cs.CV · 2024-03-29 · conditional · novelty 6.0

Current LVLM benchmarks overestimate model capabilities because many questions can be answered without the image, owing to design flaws or data leakage. MMStar is a human-curated benchmark of 1,500 vision-indispensable samples spanning 6 core capabilities and 18 axes, with new metrics for data leakage and true multi-modal gain.

citing papers explorer

Showing 10 of 10 citing papers.