Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence
2 Pith papers cite this work. Polarity classification is still being indexed.
2 Pith papers citing it

citation-role summary: baseline (1)
citation-polarity summary: baseline (1)

fields: cs.CV (2)
verdicts: UNVERDICTED (2)
roles: baseline (1)
polarities: baseline (1)

representative citing papers:
- InternVL scales a vision model to 6B parameters and aligns it with LLMs using web data to achieve state-of-the-art results on 32 visual-linguistic benchmarks.
citing papers explorer

- LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training
  LLaVA-OneVision-1.5 provides open datasets, code, and models that match or exceed closed competitors on 27 benchmarks at low cost through curated data and efficient training.

- InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
  InternVL scales a vision model to 6B parameters and aligns it with LLMs using web data to achieve state-of-the-art results on 32 visual-linguistic benchmarks.