CLAIR: Evaluating Image Captions with Large Language Models

David Chan, Suzanne Petryk, Joseph Gonzalez, Trevor Darrell, John F · 2023 · arXiv 2310.12971

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method: 1

citation-polarity summary

use method: 1

representative citing papers

Do Audio-Visual Large Language Models Really See and Hear?

cs.AI · 2026-04-03 · unverdicted · novelty 8.0

AVLLMs encode audio semantics in middle layers but suppress them in final text outputs when audio conflicts with vision, due to training that largely inherits from vision-language base models.

ClaimDiff-RL: Fine-Grained Caption Reinforcement Learning through Visual Claim Comparison

cs.LG · 2026-05-19 · unverdicted · novelty 6.0 · 2 refs

ClaimDiff-RL introduces reference-conditioned atomic claim differences verified by a multimodal judge as the reward signal for fine-grained RL in long-form image captioning.

VC-Inspector: Advancing Reference-free Evaluation of Video Captions with Factual Analysis

cs.CV · 2025-09-20 · unverdicted · novelty 6.0

VC-Inspector introduces a lightweight open-source LMM and a controllable factual-error generation framework that achieves state-of-the-art correlation with human judgments on reference-free video caption evaluation.

citing papers explorer

Showing 3 of 3 citing papers.

Do Audio-Visual Large Language Models Really See and Hear?

cs.AI · 2026-04-03 · unverdicted · none · ref 10

AVLLMs encode audio semantics in middle layers but suppress them in final text outputs when audio conflicts with vision, due to training that largely inherits from vision-language base models.

ClaimDiff-RL: Fine-Grained Caption Reinforcement Learning through Visual Claim Comparison

cs.LG · 2026-05-19 · unverdicted · none · ref 4 · 2 links

ClaimDiff-RL introduces reference-conditioned atomic claim differences verified by a multimodal judge as the reward signal for fine-grained RL in long-form image captioning.

VC-Inspector: Advancing Reference-free Evaluation of Video Captions with Factual Analysis

cs.CV · 2025-09-20 · unverdicted · none · ref 6

VC-Inspector introduces a lightweight open-source LMM and a controllable factual-error generation framework that achieves state-of-the-art correlation with human judgments on reference-free video caption evaluation.
