In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Cai, M · 2024

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

representative citing papers

Multimodal LLMs under Pairwise Modalities

cs.CV · 2026-05-20 · unverdicted · novelty 6.0

A two-stage framework enables multimodal LLMs to learn shared latent representations from pairwise modality data and achieve cross-modal generation when incorporating new modalities.

SurgCheck: Do Vision-Language Models Really Look at Images in Surgical VQA?

cs.CV · 2026-05-03 · unverdicted · novelty 6.0

SurgCheck benchmark reveals that vision-language models for surgical VQA often depend on linguistic shortcuts rather than visual reasoning, shown by consistent performance drops on less-biased questions.

citing papers explorer

Showing 2 of 2 citing papers.

Multimodal LLMs under Pairwise Modalities cs.CV · 2026-05-20 · unverdicted · none · ref 6
A two-stage framework enables multimodal LLMs to learn shared latent representations from pairwise modality data and achieve cross-modal generation when incorporating new modalities.
SurgCheck: Do Vision-Language Models Really Look at Images in Surgical VQA? cs.CV · 2026-05-03 · unverdicted · none · ref 6
SurgCheck benchmark reveals that vision-language models for surgical VQA often depend on linguistic shortcuts rather than visual reasoning, shown by consistent performance drops on less-biased questions.

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

fields

years

verdicts

representative citing papers

citing papers explorer