CoQA: A Conversational Question Answering Challenge
read the original abstract
Humans gather information by engaging in conversations involving a series of interconnected questions and answers. For machines to assist in information gathering, it is therefore essential to enable them to answer conversational questions. We introduce CoQA, a novel dataset for building Conversational Question Answering systems. Our dataset contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains. The questions are conversational, and the answers are free-form text with their corresponding evidence highlighted in the passage. We analyze CoQA in depth and show that conversational questions have challenging phenomena not present in existing reading comprehension datasets, e.g., coreference and pragmatic reasoning. We evaluate strong conversational and reading comprehension models on CoQA. The best system obtains an F1 score of 65.4%, which is 23.4 points behind human performance (88.8%), indicating there is ample room for improvement. We launch CoQA as a challenge to the community at http://stanfordnlp.github.io/coqa/
This paper has not been read by Pith yet.
Forward citations
Cited by 10 Pith papers
-
PRIMETIME : Limits of LLMs in Temporal Primitives
PRIMETIME generator reveals that LLM datetime parsing and arithmetic primitives are individually unreliable but fully learnable via fine-tuning, enabling frontier-level accuracy on event planning with small LoRA models.
-
Language Models as Knowledge Bases?
BERT stores relational knowledge extractable via cloze queries without fine-tuning and matches supervised baselines on open-domain QA tasks.
-
Asking Clarifying Questions in Open-Domain Information-Seeking Conversations
The authors introduce the task of asking clarifying questions for open-domain information-seeking conversations, collect the Qulac dataset from TREC topics, and propose a retrieval framework that outperforms baselines...
-
Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents
A dual hierarchical RL framework with two agents coordinates high-level dialogue strategy and low-level question generation to emulate judicial questioning and extract key information from Supreme Court arguments, out...
-
Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents
A dual hierarchical RL framework lets agents learn when and how to ask probing questions in U.S. Supreme Court arguments, outperforming baselines on a court dataset.
-
Hallucination as an Anomaly: Dynamic Intervention via Probabilistic Circuits
Probabilistic circuits detect LLM hallucinations as residual-stream anomalies with up to 99% AUROC and enable dynamic correction that raises truthfulness scores while cutting unnecessary output corruption.
-
Parcae: Scaling Laws For Stable Looped Language Models
Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling for training FLOPs and saturating exponential scaling at test time that improves quality over fixed-depth...
-
PaLM: Scaling Language Modeling with Pathways
PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.
-
To Tune or Not To Tune? How About the Best of Both Worlds?
A sequential fine-tuning strategy for pre-trained language models reports modest accuracy gains of 4.7%, 0.99%, and 0.72% on semantic similarity, sequence labeling, and text classification tasks.
-
Machine Reading Comprehension: a Literature Review
A 2019 survey of machine reading comprehension corpora and methods.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.