MiniGPT-4: Enhancing vision-language understanding with advanced large language models

Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny · 2024 · arXiv 5887.5885

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

DUALVISION: RGB-Infrared Multimodal Large Language Models for Robust Visual Reasoning

cs.CV · 2026-04-20 · unverdicted · novelty 7.0

DUALVISION is a new lightweight fusion module that uses localized cross-attention to integrate infrared with RGB data in MLLMs, improving robustness to degradations. It is supported by the new DV-204K training dataset and DV-500 benchmark.

citing papers explorer

Showing 1 of 1 citing paper.

DUALVISION: RGB-Infrared Multimodal Large Language Models for Robust Visual Reasoning

cs.CV · 2026-04-20 · unverdicted · none · ref 48
