MiniGPT-4: Enhancing vision-language understanding with advanced large language models
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
DUALVISION: RGB-Infrared Multimodal Large Language Models for Robust Visual Reasoning
DUALVISION is a lightweight fusion module that uses localized cross-attention to integrate infrared and RGB inputs in MLLMs, improving robustness to visual degradations. It is supported by a new training dataset, DV-204K, and a new benchmark, DV-500.