MVI-Bench supplies the first taxonomy and dataset focused on misleading visual inputs to measure LVLM robustness, with tests on 18 models revealing clear weaknesses.
Lvlm-interpret: an interpretability tool for large vision-language models.arXiv preprint arXiv:2404.03118, 2024
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
Enhanced EWC for LVLMs cuts forgetting rates by 78% versus naive training and keeps visual-textual alignment with 15% extra compute.
citing papers explorer
-
MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs
MVI-Bench supplies the first taxonomy and dataset focused on misleading visual inputs to measure LVLM robustness, with tests on 18 models revealing clear weaknesses.
-
Lifelong Learning in Vision-Language Models: Enhanced EWC with Cross-Modal Knowledge Retention
Enhanced EWC for LVLMs cuts forgetting rates by 78% versus naive training and keeps visual-textual alignment with 15% extra compute.