Analysing the Robustness of Vision-Language-Models to Common Corruptions

Analysing the robustness of vision-language-models to common corruptions · 2025 · arXiv 2504.13690

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Are Reasoning Vision-Language Models Robust to Semantic Visual Distractions?

cs.CV · 2026-06-08 · unverdicted · novelty 7.0

Reasoning VLMs show lower robustness to semantic visual distractions than to perceptual corruptions, with distractions entering their reasoning chains and causing errors.

DUALVISION: RGB-Infrared Multimodal Large Language Models for Robust Visual Reasoning

cs.CV · 2026-04-20 · unverdicted · novelty 7.0

DUALVISION is a new lightweight fusion module using localized cross-attention to integrate infrared with RGB data in MLLMs, improving robustness to degradations and supported by the new DV-204K training dataset and DV-500 benchmark.

RemoteShield: Enable Robust Multimodal Large Language Models for Earth Observation

cs.CV · 2026-04-19 · unverdicted · novelty 6.0

RemoteShield improves robustness of Earth observation MLLMs by training on semantic equivalence clusters of clean and perturbed inputs via preference learning to maintain consistent reasoning under noise.

citing papers explorer

Are Reasoning Vision-Language Models Robust to Semantic Visual Distractions?

cs.CV · 2026-06-08 · unverdicted · none · ref 32

DUALVISION: RGB-Infrared Multimodal Large Language Models for Robust Visual Reasoning

cs.CV · 2026-04-20 · unverdicted · none · ref 38

RemoteShield: Enable Robust Multimodal Large Language Models for Earth Observation

cs.CV · 2026-04-19 · unverdicted · none · ref 50
