Pith

Canonical reference

Multimodal foundation models: From specialists to general-purpose assistants

Canonical reference. 100% of citing Pith papers cite this work as background.

7 Pith papers citing it
Background: 100% of classified citations

Citation-role summary: background (7)
Citation-polarity summary: background (7)
Fields: cs.CV (6), cs.LG (1)
Years: 2024 (2), 2023 (5)

Representative citing papers

Visual Instruction Tuning

cs.CV · 2023-04-17 · unverdicted · novelty 7.0

LLaVA, trained on GPT-4-generated visual instruction data, reaches 85.1% of GPT-4's performance (relative score) on synthetic multimodal tasks and 92.53% accuracy on Science QA.

Improved Baselines with Visual Instruction Tuning

cs.CV · 2023-10-05 · conditional · novelty 4.0

Simple modifications to LLaVA (a CLIP-ViT-L-336px vision encoder, an MLP connector, and academic VQA data) yield state-of-the-art results on 11 benchmarks, using only 1.2M public examples and one day of training on 8 A100 GPUs.
