Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Years: 2024 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
InternLM-XComposer2 introduces Partial LoRA on InternLM2-7B to enable high-quality free-form text-image composition while matching or exceeding GPT-4V on select vision-language benchmarks.
citing papers explorer
- RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition
RAR combines CLIP retrieval with MLLM ranking to improve few-shot and zero-shot fine-grained visual recognition on 5 benchmarks, 11 few-shot datasets, and 2 detection tasks.
- InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
InternLM-XComposer2 introduces Partial LoRA on InternLM2-7B to enable high-quality free-form text-image composition while matching or exceeding GPT-4V on select vision-language benchmarks.