
arXiv: 1612.07600 · v1 · pith:2PKIBQGA · submitted 2016-12-22 · cs.CL · cs.CV

Re-evaluating Automatic Metrics for Image Captioning

Keywords: captioning, image, metrics, automatic, accuracy, advantages, approaches, attention
Abstract

The task of generating natural language descriptions from images has received a lot of attention in recent years. Consequently, it is becoming increasingly important to evaluate such image captioning approaches in an automatic manner. In this paper, we provide an in-depth evaluation of the existing image captioning metrics through a series of carefully designed experiments. Moreover, we explore the utilization of the recently proposed Word Mover's Distance (WMD) document metric for the purpose of image captioning. Our findings outline the differences and/or similarities between metrics and their relative robustness by means of extensive correlation, accuracy and distraction based evaluations. Our results also demonstrate that WMD provides strong advantages over other metrics.
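The abstract proposes using Word Mover's Distance (WMD) to score captions. WMD treats each text as a normalized bag of words and measures the minimum total distance that word embeddings must "travel" to turn one text into the other. The sketch below is illustrative only and is not the paper's implementation. It rests on several assumptions: the embeddings are hypothetical toy 2-D vectors, the captions are already tokenized with stop words removed, and both captions have the same number of distinct words. Under these conditions every word carries weight 1/n, the optimal transport plan reduces to a one-to-one matching, and brute force over permutations gives the exact result. General WMD instead requires solving a linear program.

```python
import itertools
import math

# Hypothetical toy 2-D word embeddings, for illustration only; WMD as
# usually applied relies on pretrained vectors (e.g. word2vec).
EMBEDDINGS = {
    "dog": (1.0, 0.0),
    "puppy": (0.9, 0.1),
    "runs": (0.0, 1.0),
    "sprints": (0.1, 0.9),
    "cat": (1.0, 0.5),
    "sleeps": (0.5, -1.0),
}


def travel_cost(u, v):
    """Euclidean distance between two word vectors."""
    return math.dist(u, v)


def wmd_uniform(doc_a, doc_b, emb=EMBEDDINGS):
    """Word Mover's Distance for the special case of two captions with the
    same number n of distinct words, each weighted 1/n.

    In this case the optimal transport plan is a permutation matrix
    (Birkhoff-von Neumann theorem), so minimizing over all one-to-one
    matchings is exact. Brute force is only feasible for small n.
    """
    if len(set(doc_a)) != len(doc_a) or len(set(doc_b)) != len(doc_b):
        raise ValueError("this sketch assumes no repeated words")
    if len(doc_a) != len(doc_b) or not doc_a:
        raise ValueError("this sketch assumes equal, non-zero lengths")
    n = len(doc_a)
    best = min(
        sum(travel_cost(emb[doc_a[i]], emb[doc_b[p[i]]]) for i in range(n))
        for p in itertools.permutations(range(n))
    )
    return best / n  # each unit of flow carries weight 1/n


candidate = ["dog", "runs"]
print(wmd_uniform(candidate, ["puppy", "sprints"]))  # paraphrase: small distance
print(wmd_uniform(candidate, ["cat", "sleeps"]))     # different scene: larger
```

In this sketch, a paraphrase built from near-synonyms ends up close to the candidate even though the two share no words. That is the property an n-gram overlap metric would miss.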



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. CaptionQA: Is Your Caption as Useful as the Image Itself?

    cs.CV · 2025-11 · conditional novelty 7.0

    CaptionQA is a new benchmark with 33,027 questions across natural, document, e-commerce, and embodied AI domains that measures how much utility model-generated captions retain compared to original images when used by ...