What is a good caption? a comprehensive visual caption benchmark for evaluating both correctness and thoroughness

· 2025 · arXiv 2502.14914

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

CaptionQA: Is Your Caption as Useful as the Image Itself?

cs.CV · 2025-11-26 · conditional · novelty 7.0

CaptionQA is a new benchmark with 33,027 questions across natural, document, e-commerce, and embodied AI domains that measures how much utility model-generated captions retain compared to original images when used by LLMs for downstream tasks.

ChartFI: Benchmarking Faithfulness and Insightfulness of Chart Descriptions from Multimodal Large Language Models

cs.CL · 2026-05-22 · unverdicted · novelty 6.0

ChartFI-Bench supplies 896 chart-description pairs and four metrics (Faithfulness, Coverage, Informativeness, Acuity) to evaluate MLLM-generated chart descriptions on faithfulness and insightfulness.

Bridging Brain and Semantics: A Hierarchical Framework for Semantically Enhanced fMRI-to-Video Reconstruction

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

CineNeuron improves fMRI-to-video reconstruction by combining bottom-up semantic enrichment with top-down Mixture-of-Memories integration and outperforms prior methods on benchmarks.

Building a Precise Video Language with Human-AI Oversight

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

CHAI framework pairs AI pre-captions with expert human critiques to produce precise video descriptions, enabling open models to outperform closed ones like Gemini-3.1-Pro and improve fine-grained control in video generation models.

From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision-Language Models

cs.CV · 2026-04-20 · unverdicted · novelty 5.0

HONES ranks feed-forward neurons by their causal contributions from task-relevant attention heads and uses lightweight scaling to steer performance on multiple vision-language tasks.

citing papers explorer

Showing 5 of 5 citing papers.

CaptionQA: Is Your Caption as Useful as the Image Itself? cs.CV · 2025-11-26 · conditional · none · ref 26
CaptionQA is a new benchmark with 33,027 questions across natural, document, e-commerce, and embodied AI domains that measures how much utility model-generated captions retain compared to original images when used by LLMs for downstream tasks.
ChartFI: Benchmarking Faithfulness and Insightfulness of Chart Descriptions from Multimodal Large Language Models cs.CL · 2026-05-22 · unverdicted · none · ref 38
ChartFI-Bench supplies 896 chart-description pairs and four metrics (Faithfulness, Coverage, Informativeness, Acuity) to evaluate MLLM-generated chart descriptions on faithfulness and insightfulness.
Bridging Brain and Semantics: A Hierarchical Framework for Semantically Enhanced fMRI-to-Video Reconstruction cs.CV · 2026-05-14 · unverdicted · none · ref 63
CineNeuron improves fMRI-to-video reconstruction by combining bottom-up semantic enrichment with top-down Mixture-of-Memories integration and outperforms prior methods on benchmarks.
Building a Precise Video Language with Human-AI Oversight cs.CV · 2026-04-22 · unverdicted · none · ref 39
CHAI framework pairs AI pre-captions with expert human critiques to produce precise video descriptions, enabling open models to outperform closed ones like Gemini-3.1-Pro and improve fine-grained control in video generation models.
From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision-Language Models cs.CV · 2026-04-20 · unverdicted · none · ref 90
HONES ranks feed-forward neurons by their causal contributions from task-relevant attention heads and uses lightweight scaling to steer performance on multiple vision-language tasks.

What is a good caption? a comprehensive visual caption benchmark for evaluating both correctness and thoroughness

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer