A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

Adina Williams, Nikita Nangia, Samuel R. Bowman

classification cs.CL

keywords corpus, available, evaluation, inference, offers, sentence, understanding, adaptation

This paper introduces the Multi-Genre Natural Language Inference (MultiNLI) corpus, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding. In addition to being one of the largest corpora available for the task of NLI, at 433k examples, this corpus improves upon available resources in its coverage: it offers data from ten distinct genres of written and spoken English--making it possible to evaluate systems on nearly the full complexity of the language--and it offers an explicit setting for the evaluation of cross-genre domain adaptation.

Forward citations

Cited by 22 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

RoFormer: Enhanced Transformer with Rotary Position Embedding
cs.CL 2021-04 accept novelty 8.0

RoFormer introduces rotary position embeddings that encode absolute positions via rotation matrices and relative dependencies in attention, outperforming prior position methods on long text classification tasks.
SimCSE: Simple Contrastive Learning of Sentence Embeddings
cs.CL 2021-04 conditional novelty 8.0

SimCSE achieves 76.3% unsupervised and 81.6% supervised Spearman's correlation on STS tasks with BERT-base, improving on the prior best results by 4.2% and 2.2% via simple contrastive learning.
A Multi-View Media Profiling Suite: Resources, Evaluation, and Analysis
cs.CL 2026-05 unverdicted novelty 7.0

Presents the MBFC-2025 dataset and multi-view embeddings with fusion methods for media bias and factuality, reporting SOTA results on ACL-2020 and new benchmarks on MBFC-2025.
Learning Posterior Predictive Distributions for Node Classification from Synthetic Graph Priors
cs.LG 2026-04 unverdicted novelty 7.0

NodePFN pre-trains on synthetic graphs with controllable homophily and causal feature-label models to achieve 71.27 average accuracy on 23 node classification benchmarks without graph-specific training.
Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs
cs.AI 2026-04 accept novelty 7.0

The paper delivers the first survey of abductive reasoning in LLMs, a unified two-stage taxonomy, a compact benchmark, and an analysis of gaps relative to deductive and inductive reasoning.
C-Pack: Packed Resources For General Chinese Embeddings
cs.CL 2023-09 accept novelty 7.0

C-Pack releases a new Chinese embedding benchmark, a large training dataset, and optimized models that outperform prior models by up to 10% on C-MTEB while also delivering English SOTA results.
LoRA: Low-Rank Adaptation of Large Language Models
cs.CL 2021-06 accept novelty 7.0

Adapting large language models by training only a low-rank decomposition BA added to frozen weight matrices matches full fine-tuning while cutting trainable parameters by orders of magnitude and adding no inference latency.
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
cs.CL 2019-10 accept novelty 7.0

BART introduces a denoising pretraining method for seq2seq models that matches RoBERTa on GLUE and SQuAD while setting new state-of-the-art results on abstractive summarization, dialogue, and QA, with gains of up to 6 ROUGE.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
cs.LG 2019-10 unverdicted novelty 7.0

T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colo...
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
cs.CL 2019-09 accept novelty 7.0

ALBERT reduces BERT parameters via embedding factorization and layer sharing, adds inter-sentence coherence pretraining, and reaches SOTA on GLUE, RACE, and SQuAD with fewer parameters than BERT-large.
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
cs.CL 2019-05 accept novelty 7.0

BoolQ introduces naturally occurring yes/no questions as a challenging benchmark where BERT fine-tuned on MultiNLI reaches 80.4% accuracy against 90% human performance.
On the Burden of Achieving Fairness in Conformal Prediction
stat.ML 2026-05 unverdicted novelty 6.0

Pooled conformal calibration incurs irreducible group-wise coverage distortion set by cross-group quantile heterogeneity, and Equalized Coverage and Equalized Set Size are in fundamental tension.
When Reviews Disagree: Fine-Grained Contradiction Analysis in Scientific Peer Reviews
cs.CL 2026-05 unverdicted novelty 6.0

Introduces RevCI benchmark and IMPACT multi-agent framework for evidence-level contradiction detection and graded intensity scoring in peer reviews, distilled into efficient TIDE model.
From Articles to Premises: Building PrimeFacts, an Extraction Methodology and Resource for Fact-Checking Evidence
cs.CL 2026-05 unverdicted novelty 6.0

PrimeFacts extracts decontextualized premises from fact-check articles, raising evidence retrieval MRR by up to 30% and verdict prediction Macro-F1 by 10-20 points over baselines.
On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference
cs.CR 2026-05 conditional novelty 6.0

An attack aligns differently shuffled intermediate activations from secure Transformer inference queries to recover model weights with low error using roughly one dollar of queries.
Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus
cs.CL 2026-05 unverdicted novelty 6.0

Machine translation preserves embedding similarity structure for ten languages but distorts it for four in the Manifesto Corpus, via a new non-inferiority testing framework.
Memory-Efficient Transfer Learning with Fading Side Networks via Masked Dual Path Distillation
cs.CV 2026-04 unverdicted novelty 6.0

MDPD mutually distills knowledge between a frozen backbone and a learnable side network during fine-tuning, then discards the side network at inference, speeding up inference by at least 25% while preserving accuracy.
MP-ISMoE: Mixed-Precision Interactive Side Mixture-of-Experts for Efficient Transfer Learning
cs.LG 2026-04 unverdicted novelty 6.0

MP-ISMoE uses Gaussian noise perturbed iterative quantization and interactive side mixture-of-experts to deliver higher accuracy than prior memory-efficient transfer learning methods while keeping similar parameter an...
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
cs.CL 2023-05 conditional novelty 6.0

UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
cs.CL 2023-03 unverdicted novelty 6.0

SelfCheckGPT detects hallucinations by checking consistency across multiple sampled responses from black-box LLMs on WikiBio biography generation tasks.
Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation
cs.CL 2023-02 unverdicted novelty 6.0

Semantic entropy improves uncertainty estimation in natural language generation by incorporating semantic equivalences, outperforming standard entropy baselines on predicting model accuracy for question answering.
Commonsense Knowledge with Negation: A Resource to Enhance Negation Understanding
cs.CL 2026-04 unverdicted novelty 5.0

Augmenting commonsense knowledge corpora with negation produces over 2M new triples that benefit LLM negation understanding when used for pre-training.