pith. machine review for the scientific record.

arXiv: 1411.5726 · v2 · submitted 2014-11-20 · 💻 cs.CV · cs.CL · cs.IR

Recognition: unknown

CIDEr: Consensus-based Image Description Evaluation

classification 💻 cs.CV · cs.CL · cs.IR
keywords image · consensus · cider · evaluation · human · captures · describing · description

Automatically describing an image with a sentence is a long-standing challenge in computer vision and natural language processing. Due to recent progress in object detection, attribute classification, action recognition, etc., there is renewed interest in this area. However, evaluating the quality of descriptions has proven to be challenging. We propose a novel paradigm for evaluating image descriptions that uses human consensus. This paradigm consists of three main parts: a new triplet-based method of collecting human annotations to measure consensus, a new automated metric (CIDEr) that captures consensus, and two new datasets: PASCAL-50S and ABSTRACT-50S that contain 50 sentences describing each image. Our simple metric captures human judgment of consensus better than existing metrics across sentences generated by various sources. We also evaluate five state-of-the-art image description approaches using this new protocol and provide a benchmark for future comparisons. A version of CIDEr named CIDEr-D is available as a part of MS COCO evaluation server to enable systematic evaluation and benchmarking.
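The abstract does not spell out how the metric is computed. As a rough illustration only, the sketch below follows the commonly described CIDEr formulation: TF-IDF-weighted n-gram vectors, cosine similarity between a candidate and each reference sentence, averaged over references and over n-gram lengths 1 to 4. The function names (`ngrams`, `cider_sketch`), the whitespace tokenization, and the toy corpus are assumptions made for this example; it omits stemming and the CIDEr-D modifications, so its scores are not comparable to those of the MS COCO evaluation server.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cider_sketch(candidate, references, corpus_refs, max_n=4):
    """Illustrative consensus score (hypothetical helper, not the official code).

    candidate:   candidate sentence for one image (str)
    references:  reference sentences for that image (list of str)
    corpus_refs: reference sets for every image in the dataset
                 (list of list of str), used for document frequencies
    """
    num_images = len(corpus_refs)
    score = 0.0
    for n in range(1, max_n + 1):
        # Document frequency: number of images whose references contain the n-gram.
        df = Counter()
        for refs in corpus_refs:
            seen = set()
            for ref in refs:
                seen.update(ngrams(ref.lower().split(), n))
            df.update(seen)

        def tfidf(sentence):
            counts = ngrams(sentence.lower().split(), n)
            total = sum(counts.values())
            if total == 0:
                return {}
            # Term frequency times inverse document frequency over images.
            return {
                g: (c / total) * math.log(num_images / max(1, df[g]))
                for g, c in counts.items()
            }

        def cosine(a, b):
            dot = sum(v * b.get(g, 0.0) for g, v in a.items())
            norm_a = math.sqrt(sum(v * v for v in a.values()))
            norm_b = math.sqrt(sum(v * v for v in b.values()))
            return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

        cand_vec = tfidf(candidate)
        sims = [cosine(cand_vec, tfidf(ref)) for ref in references]
        # Average over references, then weight each n-gram length uniformly.
        score += (sum(sims) / len(sims)) / max_n
    return score


# Toy data (invented for illustration): two images with two references each.
toy_corpus = [
    ["a dog runs on grass", "a dog is running"],
    ["a cat sits on a mat", "a cat on a mat"],
]
matching = cider_sketch("a dog runs on grass", toy_corpus[0], toy_corpus)
unrelated = cider_sketch("a cat sits on a mat", toy_corpus[0], toy_corpus)
print(matching, unrelated)
```

In this toy setup, words that appear in the references of every image (such as "a" and "on") receive zero IDF weight, so a candidate that shares only those words with the references scores zero, while a candidate that agrees with the references on rarer n-grams scores higher.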

This paper has not been read by Pith yet.


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning

    cs.CV 2026-05 unverdicted novelty 6.0

    BalCapRL applies balanced multi-objective RL with GDPO-style normalization and length-conditional masking to improve MLLM image captioning, reporting gains of up to +13.6 DCScore, +9.0 CaptionQA, and +29.0 CapArena on...

  2. Microsoft COCO Captions: Data Collection and Evaluation Server

    cs.CV 2015-04 accept novelty 6.0

    Microsoft COCO Captions provides 1.5 million human captions across 330,000 images and a public server to evaluate captioning models with BLEU, METEOR, ROUGE, and CIDEr.