hub Canonical reference

arXiv preprint arXiv:2402.05935 (2024) 3

Canonical reference. 80% of citing Pith papers cite this work as background.

13 Pith papers citing it
Background 80% of classified citations

citation-role summary

background: 4 · baseline: 1

fields

cs.CV: 13

representative citing papers

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

cs.CV · 2024-06-24 · unverdicted · novelty 7.0

Cambrian-1 is a vision-centric multimodal LLM family that evaluates over 20 vision encoders, introduces CV-Bench and the Spatial Vision Aggregator, and releases open models, code, and data achieving strong performance on visual grounding tasks.

Are We on the Right Way for Evaluating Large Vision-Language Models?

cs.CV · 2024-03-29 · conditional · novelty 6.0

Current LVLM benchmarks overestimate model capabilities because many questions can be answered without the image, owing to design flaws or data leakage. MMStar is a human-curated set of 1,500 vision-indispensable samples spanning 6 capabilities and 18 axes, with new metrics for data leakage and true multi-modal gain.

TempCompass: Do Video LLMs Really Understand Videos?

cs.CV · 2024-03-01 · unverdicted · novelty 6.0

The TempCompass benchmark reveals that state-of-the-art Video LLMs perceive temporal aspects of videos, such as speed, direction, and ordering, poorly.
