Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing , pages=

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering , author= · 2018

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

representative citing papers

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

cs.CL · 2023-08-28 · unverdicted · novelty 8.0

LongBench is the first bilingual multi-task benchmark for long context understanding in LLMs, containing 21 datasets in 6 categories with average lengths of 6711 words (English) and 13386 characters (Chinese).

A Multi-Agent Framework for Feature-Constrained Difficulty Control in Reading Comprehension Item Generation

cs.CL · 2026-05-19 · unverdicted · novelty 6.0

MAFIG is a multi-agent framework that uses LLM agents and evaluators to generate reading comprehension items with significantly higher adherence to specified feature constraints than single-agent baselines.

A Benchmark Construction and Evaluation Framework for Specialist Domains: Case Study on Defense-related Documents

cs.CL · 2026-04-20

citing papers explorer

Showing 1 of 1 citing paper after filters.

A Benchmark Construction and Evaluation Framework for Specialist Domains: Case Study on Defense-related Documents cs.CL · 2026-04-20 · unreviewed · ref 21

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing , pages=

fields

years

verdicts

representative citing papers

citing papers explorer