ROUGE 2.0: Updated and Improved Measures for Evaluation of Summarization Tasks

2018 · cs.IR · arXiv 1803.01937

2 Pith papers cite this work. Polarity classification is still indexing.


abstract

Evaluation of summarization tasks is crucial to determining the quality of machine-generated summaries. Over the last decade, ROUGE has become the standard automatic evaluation measure for summarization tasks. While ROUGE has been shown to be effective in capturing n-gram overlap between system and human-composed summaries, the existing ROUGE measures have several limitations in capturing synonymous concepts and coverage of topics. As a result, ROUGE scores often do not reflect the true quality of summaries, and they prevent multi-faceted evaluation of summaries (e.g., by topic or by overall content coverage). In this paper, we introduce ROUGE 2.0, which has several updated measures of ROUGE: ROUGE-N+Synonyms, ROUGE-Topic, ROUGE-Topic+Synonyms, ROUGE-TopicUniq and ROUGE-TopicUniq+Synonyms, all of which are improvements over the core ROUGE measures.
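
To make the abstract's point about n-gram overlap and synonyms concrete, here is a minimal sketch of clipped n-gram recall with an optional synonym table. It illustrates the general idea only and is not the paper's implementation: the function names, the whitespace tokenization, and the hand-written `synonyms` dictionary are all assumptions made for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def normalize(tokens, synonyms=None):
    """Map each token to a canonical form via a hypothetical synonym table."""
    synonyms = synonyms or {}
    return [synonyms.get(t, t) for t in tokens]


def rouge_n_recall(reference, candidate, n=1, synonyms=None):
    """Clipped n-gram recall of `candidate` against a single `reference`."""
    ref = ngrams(normalize(reference.lower().split(), synonyms), n)
    cand = ngrams(normalize(candidate.lower().split(), synonyms), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is credited at most as often as it occurs
    # in the candidate (Counter returns 0 for missing n-grams).
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "the movie was great"
candidate = "the film was great"

# Exact matching misses "film" vs. "movie".
plain = rouge_n_recall(reference, candidate, n=1)

# Assumed synonym table: treat "film" as "movie" before counting overlap.
with_syn = rouge_n_recall(reference, candidate, n=1, synonyms={"film": "movie"})

print(plain, with_syn)
```

With exact matching the candidate recovers three of the four reference unigrams. Once "film" is mapped to "movie", all four match, which is the kind of gap a synonym-aware measure is meant to close.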

representative citing papers

Matter to Mechanism: A Benchmark for AI Co-Scientists in Materials and Battery Research

cs.CE · 2026-06-01 · unverdicted · novelty 7.0

Introduces the Matter to Mechanism benchmark of 2,645 structured instances and a composite metric suite for evaluating AI co-scientists on problem-to-hypothesis reasoning in battery materials research.

Distinguishing Right from Wrong in Debates: Attribution Analysis of Chinese Harmful Memes

cs.CL · 2026-05-23 · unverdicted · novelty 6.0

Introduces Ex-ToxiCN-MM dataset and RIKE framework (with AKE and RIR modules) that outperforms baselines on attributing harm in ambiguous Chinese memes using C-HarmKB.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Distinguishing Right from Wrong in Debates: Attribution Analysis of Chinese Harmful Memes

cs.CL · 2026-05-23 · unverdicted · none · ref 34 · internal anchor

Introduces Ex-ToxiCN-MM dataset and RIKE framework (with AKE and RIR modules) that outperforms baselines on attributing harm in ambiguous Chinese memes using C-HarmKB.
