Canonical reference

Cheap and quick: Efficient vision-language instruction tuning for large language models

80% of citing Pith papers cite this work as background.

5 Pith papers citing it
Background 80% of classified citations

citation-role summary

background 3 · baseline 1 · method 1

fields

cs.CV 4 · cs.CL 1

years

2024 1 · 2023 4

representative citing papers

Are We on the Right Way for Evaluating Large Vision-Language Models?

cs.CV · 2024-03-29 · conditional · novelty 6.0

Current LVLM benchmarks overestimate model capabilities because many questions can be answered without the image, owing to design flaws or data leakage. MMStar is a human-curated set of 1,500 vision-indispensable samples covering 6 capabilities and 18 axes, with new metrics for leakage and true multi-modal gain.

A Survey on Multimodal Large Language Models

cs.CV · 2023-06-23 · accept · novelty 3.0

This survey organizes the architectures, training strategies, data, evaluation methods, extensions, and challenges of Multimodal Large Language Models.
