pith. machine review for the scientific record.

arxiv: 1809.02922 · v2 · submitted 2018-09-09 · 💻 cs.CL

Recognition: unknown

Transforming Question Answering Datasets Into Natural Language Inference Datasets

Authors on Pith: no claims yet
classification: 💻 cs.CL
keywords: datasets, inference, language, answering, automatically, dataset, natural, question
Original abstract

Existing datasets for natural language inference (NLI) have propelled research on language understanding. We propose a new method for automatically deriving NLI datasets from the growing abundance of large-scale question answering datasets. Our approach hinges on learning a sentence transformation model which converts question-answer pairs into their declarative forms. Despite being primarily trained on a single QA dataset, we show that it can be successfully applied to a variety of other QA resources. Using this system, we automatically derive a new freely available dataset of over 500k NLI examples (QA-NLI), and show that it exhibits a wide range of inference phenomena rarely seen in previous NLI datasets.

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions

    cs.CL 2019-05 accept novelty 7.0

    BoolQ introduces naturally occurring yes/no questions as a challenging benchmark where BERT fine-tuned on MultiNLI reaches 80.4% accuracy against 90% human performance.

  2. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

    cs.CL 2018-04 unverdicted novelty 7.0

    GLUE is a multi-task benchmark for general natural language understanding that includes a diagnostic test suite and finds limited gains from current multi-task learning methods over single-task training.

  3. Consistency-Guided Decoding with Proof-Driven Disambiguation for Three-Way Logical Question Answering

    cs.CL 2026-03 unverdicted novelty 6.0

    CGD-PD improves three-way logical QA accuracy by up to 16% relative on FOLIO through negation-consistent projection and proof-driven disambiguation that reduces Unknown predictions across frontier LLMs.