CoQA: A Conversational Question Answering Challenge

Siva Reddy, Danqi Chen, Christopher D. Manning

classification cs.CL cs.AI cs.LG

keywords conversational, coqa, questions, answers, answering, challenge, comprehension, conversations

Humans gather information by engaging in conversations involving a series of interconnected questions and answers. For machines to assist in information gathering, it is therefore essential to enable them to answer conversational questions. We introduce CoQA, a novel dataset for building Conversational Question Answering systems. Our dataset contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains. The questions are conversational, and the answers are free-form text with their corresponding evidence highlighted in the passage. We analyze CoQA in depth and show that conversational questions have challenging phenomena not present in existing reading comprehension datasets, e.g., coreference and pragmatic reasoning. We evaluate strong conversational and reading comprehension models on CoQA. The best system obtains an F1 score of 65.4%, which is 23.4 points behind human performance (88.8%), indicating there is ample room for improvement. We launch CoQA as a challenge to the community at http://stanfordnlp.github.io/coqa/

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents
cs.CL 2026-05 unverdicted novelty 6.0

A dual hierarchical RL framework lets agents learn when and how to ask probing questions in U.S. Supreme Court arguments, outperforming baselines on a court dataset.

Hallucination as an Anomaly: Dynamic Intervention via Probabilistic Circuits
cs.CL 2026-05 unverdicted novelty 6.0

Probabilistic circuits detect LLM hallucinations as residual-stream anomalies with up to 99% AUROC and enable dynamic correction that raises truthfulness scores while cutting unnecessary output corruption.

Parcae: Scaling Laws For Stable Looped Language Models
cs.LG 2026-04 unverdicted novelty 6.0

Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling for training FLOPs and saturating exponential scaling at test time that improves quality over fixed-depth...

PaLM: Scaling Language Modeling with Pathways
cs.CL 2022-04 accept novelty 6.0

PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.