FEVER: a large-scale dataset for Fact Extraction and VERification

Canonical reference. 71% of citing Pith papers cite this work as background.

55 Pith papers citing it
514 external citations · Pith
abstract

In this paper we introduce a new publicly available dataset for verification against textual sources, FEVER: Fact Extraction and VERification. It consists of 185,445 claims generated by altering sentences extracted from Wikipedia and subsequently verified without knowledge of the sentence they were derived from. The claims are classified as Supported, Refuted or NotEnoughInfo by annotators achieving 0.6841 in Fleiss $\kappa$. For the first two classes, the annotators also recorded the sentence(s) forming the necessary evidence for their judgment. To characterize the challenge of the dataset presented, we develop a pipeline approach and compare it to suitably designed oracles. The best accuracy we achieve on labeling a claim accompanied by the correct evidence is 31.87%, while if we ignore the evidence we achieve 50.91%. Thus we believe that FEVER is a challenging testbed that will help stimulate progress on claim verification against textual sources.

citation-role summary

background: 6 · dataset: 1

representative citing papers

Discovering Latent Knowledge in Language Models Without Supervision

cs.CL · 2022-12-07 · conditional · novelty 8.0

An unsupervised technique extracts latent yes-no knowledge from language model activations by locating a direction that satisfies logical consistency properties, outperforming zero-shot accuracy by 4% on average across models and datasets.

RWGBench: Evaluating Scholarly Positioning in Related Work Generation

cs.DL · 2026-05-30 · unverdicted · novelty 7.0

RWGBench is a citation-centric benchmark for related work generation built from 40k CS papers and a 100-paper test set, with multi-dimensional metrics that better match human expert judgment than standard similarity scores.

Vector Linking via Cross-Model Local Isometric Consistency

cs.AI · 2026-05-29 · unverdicted · novelty 7.0

A reference-based geometric hashing method recovers cross-model vector correspondences by exploiting local isometric consistency in contrastive embeddings and iteratively bootstrapping from a seed of paired anchors.

TSVer: A Benchmark for Fact Verification Against Time-Series Evidence

cs.CL · 2025-11-02 · unverdicted · novelty 7.0

TSVer is a new benchmark dataset for fact verification against time-series evidence, with 304 annotated real-world claims, 400 time series, verdicts, and justifications, plus baseline results showing current models struggle.

The Warrant Gap: Claim-Conditioned Re-scoring for Fact-Checking

cs.CL · 2026-06-23 · unverdicted · novelty 6.0

Introduces claim-conditioned re-scoring (SIFT) and the warranted supports proportion (WSP) metric, reporting accuracy recovery of up to 27.6 points and WSP calibration at an AUC of 0.92 on FEVER, SciFact, and other benchmarks.

RSRank: Learning Relevance from Representational Shifts

cs.IR · 2026-06-16 · unverdicted · novelty 6.0

RSRank learns calibrated relevance scores from alignment between representational shifts induced by candidate documents and those from oracle document sets, enabling zero-threshold filtering.
