TinyLLaVA Factory: A Modularized Codebase for Small-scale Large Multimodal Models
2 Pith papers cite this work.
Citing papers
- ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models
  ALLaVA creates 1.3M GPT4V-synthesized samples, enabling 4B VLMs to achieve competitive results on 17 benchmarks and to match 7B/13B models on some tasks.
- AERIS: Aerial-Edge Role-Driven Intelligence at Runtime via Orchestrated Language-Model Swarm
  AERIS organizes small language models into dynamic roles for edge UAVs, using attention-subgoal alignment to enable long-horizon vision-language navigation while preserving real-time closed-loop operation.