{"work":{"id":"73ded6d0-6905-404f-b7d5-b66e36e1b4f8","openalex_id":null,"doi":null,"arxiv_id":"2011.01060","raw_key":null,"title":"Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps","authors":null,"authors_text":"Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, Akiko Aizawa","year":2020,"venue":"cs.CL","abstract":"A multi-hop question answering (QA) dataset aims to test reasoning and inference skills by requiring a model to read multiple paragraphs to answer a given question. However, current datasets do not provide a complete explanation for the reasoning process from the question to the answer. Further, previous studies revealed that many examples in existing multi-hop datasets do not require multi-hop reasoning to answer a question. In this study, we present a new multi-hop QA dataset, called 2WikiMultiHopQA, which uses structured and unstructured data. In our dataset, we introduce the evidence information containing a reasoning path for multi-hop questions. The evidence information has two benefits: (i) providing a comprehensive explanation for predictions and (ii) evaluating the reasoning skills of a model. We carefully design a pipeline and a set of templates when generating a question-answer pair that guarantees the multi-hop steps and the quality of the questions. We also exploit the structured format in Wikidata and use logical rules to create questions that are natural but still require multi-hop reasoning. Through experiments, we demonstrate that our dataset is challenging for multi-hop models and it ensures that multi-hop reasoning is required.","external_url":"https://arxiv.org/abs/2011.01060","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-06-28T20:42:37.995091+00:00","pith_arxiv_id":"2011.01060","created_at":"2026-05-10T05:56:11.459064+00:00","updated_at":"2026-06-28T20:42:37.995091+00:00","title_quality_ok":true,"display_title":"Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps","render_title":"Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps"},"hub":{"state":{"work_id":"73ded6d0-6905-404f-b7d5-b66e36e1b4f8","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":22,"external_cited_by_count":null,"distinct_field_count":5,"first_pith_cited_at":"2023-12-18T07:47:33+00:00","last_pith_cited_at":"2026-05-30T08:18:53+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-29T16:29:04.670836+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":4},{"context_role":"dataset","n":3},{"context_role":"method","n":1}],"polarity_counts":[{"context_polarity":"background","n":4},{"context_polarity":"use_dataset","n":3},{"context_polarity":"unclear","n":1}],"runs":{},"summary":{},"graph":{},"authors":[]}}