A large annotated corpus for learning natural language inference

Christopher D. Manning; Christopher Potts; Gabor Angeli; Samuel R. Bowman

arxiv: 1508.05326 · v1 · pith:W4BYOHMNnew · submitted 2015-08-21 · 💻 cs.CL

Samuel R. Bowman, Gabor Angeli, Christopher Potts, Christopher D. Manning

classification 💻 cs.CL

keywords: inference, language, natural, entailment, allows, contradiction, corpus, learning

Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations. However, machine learning research in this area has been dramatically limited by the lack of large-scale resources. To address this, we introduce the Stanford Natural Language Inference corpus, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning. At 570K pairs, it is two orders of magnitude larger than all other resources of its type. This increase in scale allows lexicalized classifiers to outperform some sophisticated existing entailment models, and it allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.

Forward citations

Cited by 11 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget
cs.LG 2024-02 unverdicted novelty 8.0

For comparing two binary classifiers using a budget of noisy labels, collecting one label per sample across more samples outperforms aggregating multiple labels per sample.

C-Pack: Packed Resources For General Chinese Embeddings
cs.CL 2023-09 accept novelty 7.0

C-Pack releases a new Chinese embedding benchmark, large training dataset, and optimized models that outperform priors by up to 10% on C-MTEB while also delivering English SOTA results.

From Articles to Premises: Building PrimeFacts, an Extraction Methodology and Resource for Fact-Checking Evidence
cs.CL 2026-05 unverdicted novelty 6.0

PrimeFacts extracts decontextualized premises from fact-check articles, raising evidence retrieval MRR by up to 30% and verdict prediction Macro-F1 by 10-20 points over baselines.

Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus
cs.CL 2026-05 unverdicted novelty 6.0

Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.

LIMO: Less is More for Reasoning
cs.CL 2025-02 unverdicted novelty 6.0

LIMO achieves 63.3% on AIME24 and 95.6% on MATH500 via supervised fine-tuning on roughly 1% of the data used by prior models, supporting the claim that minimal strategic examples suffice when pre-training has already ...

DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks
cs.CL 2019-07 unverdicted novelty 6.0

DropAttention regularizes attention weights in fully-connected self-attention networks to reduce overfitting and improve performance.

A Deep Generative Model for Code-Switched Text
cs.CL 2019-06 unverdicted novelty 6.0

VACS is a two-level hierarchical VAE that generates diverse code-switched sentences, and augmenting monolingual data with its output reduces language model perplexity by 33.06%.

Convex Dataset Valuation for Post-Training
cs.LG 2026-05 unverdicted novelty 5.0

A convex KMM-based valuation method that accounts for both target-task alignment and inter-dataset redundancy in gradient space outperforms standard gradient-alignment baselines for LLM post-training data selection.

Fake News Detection as Natural Language Inference
cs.CL 2019-07 unverdicted novelty 4.0

Framing fake news classification as natural language inference and ensembling NLI models with BERT, plus transitivity rules, achieves 88.063% test accuracy in the WSDM 2019 challenge.

A Survey of Hallucination in Large Foundation Models
cs.AI 2023-09 accept novelty 3.0

A survey classifying hallucination phenomena specific to large foundation models, establishing evaluation criteria, examining mitigation strategies, and discussing future directions.

Bias in Large Language Models: Origin, Evaluation, and Mitigation
cs.CL 2024-11 unverdicted novelty 2.0

A literature review that categorizes bias in LLMs, surveys evaluation and mitigation techniques, and discusses ethical implications.