ITIScore evaluates MLLM image captions via image-to-text-to-image reconstruction consistency and aligns with human judgments on a new 40K-caption benchmark.
Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction, Feb
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
fields
cs.CV 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
VC-Inspector introduces a lightweight open-source LMM and a controllable factual-error generation framework that achieves state-of-the-art correlation with human judgments on reference-free video caption evaluation.
citing papers explorer
-
ITIScore: An Image-to-Text-to-Image Rating Framework for the Image Captioning Ability of MLLMs
ITIScore evaluates MLLM image captions via image-to-text-to-image reconstruction consistency and aligns with human judgments on a new 40K-caption benchmark.
-
VC-Inspector: Advancing Reference-free Evaluation of Video Captions with Factual Analysis
VC-Inspector introduces a lightweight open-source LMM and a controllable factual-error generation framework that achieves state-of-the-art correlation with human judgments on reference-free video caption evaluation.