A large annotated corpus for learning natural language inference
read the original abstract
Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations. However, machine learning research in this area has been dramatically limited by the lack of large-scale resources. To address this, we introduce the Stanford Natural Language Inference corpus, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning. At 570K pairs, it is two orders of magnitude larger than all other resources of its type. This increase in scale allows lexicalized classifiers to outperform some sophisticated existing entailment models, and it allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.
This paper has not been read by Pith yet.
Forward citations
Cited by 11 Pith papers
-
Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget
For comparing two binary classifiers using a budget of noisy labels, collecting one label per sample across more samples outperforms aggregating multiple labels per sample.
-
C-Pack: Packed Resources For General Chinese Embeddings
C-Pack releases a new Chinese embedding benchmark, large training dataset, and optimized models that outperform priors by up to 10% on C-MTEB while also delivering English SOTA results.
-
From Articles to Premises: Building PrimeFacts, an Extraction Methodology and Resource for Fact-Checking Evidence
PrimeFacts extracts decontextualized premises from fact-check articles, raising evidence retrieval MRR by up to 30% and verdict prediction Macro-F1 by 10-20 points over baselines.
-
Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus
Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.
-
LIMO: Less is More for Reasoning
LIMO achieves 63.3% on AIME24 and 95.6% on MATH500 via supervised fine-tuning on roughly 1% of the data used by prior models, supporting the claim that minimal strategic examples suffice when pre-training has already ...
-
DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks
DropAttention regularizes attention weights in fully-connected self-attention networks to reduce overfitting and improve performance.
-
A Deep Generative Model for Code-Switched Text
VACS is a two-level hierarchical VAE that generates diverse code-switched sentences, and augmenting monolingual data with its output reduces language model perplexity by 33.06%.
-
Convex Dataset Valuation for Post-Training
A convex KMM-based valuation method that accounts for both target-task alignment and inter-dataset redundancy in gradient space outperforms standard gradient-alignment baselines for LLM post-training data selection.
-
Fake News Detection as Natural Language Inference
Framing fake news classification as natural language inference and ensembling NLI models with BERT, plus transitivity rules, achieves 88.063% test accuracy in the WSDM 2019 challenge.
-
A Survey of Hallucination in Large Foundation Models
A survey classifying hallucination phenomena specific to large foundation models, establishing evaluation criteria, examining mitigation strategies, and discussing future directions.
-
Bias in Large Language Models: Origin, Evaluation, and Mitigation
A literature review that categorizes bias in LLMs, surveys evaluation and mitigation techniques, and discusses ethical implications.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.