The NarrativeQA Reading Comprehension Challenge

Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, Edward Grefenstette

classification: cs.CL, cs.AI, cs.NE

keywords: tasks, comprehension, information, questions, reading, answering, dataset, ability

Reading comprehension (RC), in contrast to information retrieval, requires integrating information and reasoning about events, entities, and their relations across a full document. Question answering is conventionally used to assess RC ability, in both artificial agents and children learning to read. However, existing RC datasets and tasks are dominated by questions that can be solved by selecting answers using superficial information (e.g., local context similarity or global term frequency); they thus fail to test for the essential integrative aspect of RC. To encourage progress on deeper comprehension of language, we present a new dataset and set of tasks in which the reader must answer questions about stories by reading entire books or movie scripts. These tasks are designed so that successfully answering their questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience. We show that although humans solve the tasks easily, standard RC models struggle on the tasks presented here. We provide an analysis of the dataset and the challenges it presents.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering
cs.CL 2018-09 accept novelty 8.0

OpenBookQA tests AI by requiring it to apply provided science facts plus common knowledge to new questions, where advanced models perform worse than simple baselines while humans score near 92%.

XGRAG: A Graph-Native Framework for Explaining KG-based Retrieval-Augmented Generation
cs.AI 2026-04 unverdicted novelty 7.0

XGRAG uses graph perturbations to quantify component contributions in GraphRAG and achieves 14.81% better explanation quality than text-based baselines on QA datasets, with correlations to graph centrality.

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
cs.CL 2016-11 accept novelty 7.0

MS MARCO is a new large-scale machine reading comprehension dataset built from real Bing search queries, human-generated answers, and web passages, supporting three tasks including answer synthesis and passage ranking.

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
cs.CL 2024-01 unverdicted novelty 6.0

RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.