Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
1 Pith paper cites this work.
Fields: cs.CV (1). Years: 2026 (1). Verdicts: UNVERDICTED (1).
Representative citing paper:
AeroRAG: Structured Multimodal Retrieval-Augmented LLM for Fine-Grained Aerial Visual Reasoning
AeroRAG improves fine-grained aerial visual question answering by converting images to scene graphs and using retrieval-augmented generation to create compact LLM prompts.