"Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection

William Yang Wang

arxiv: 1705.00648 · v1 · pith:TJRZKSXZnew · submitted 2017-05-01 · 💻 cs.CL · cs.CY

keywords fake news detection dataset liar automatic benchmark datasets

Automatic fake news detection is a challenging problem in deception detection, and it has tremendous real-world political and social impacts. However, statistical approaches to combating fake news have been dramatically limited by the lack of labeled benchmark datasets. In this paper, we present LIAR: a new, publicly available dataset for fake news detection. We collected 12.8K manually labeled short statements, spanning a decade and various contexts, from PolitiFact.com, which provides a detailed analysis report and links to source documents for each case. This dataset can be used for fact-checking research as well. Notably, this new dataset is an order of magnitude larger than the previously largest public fake news datasets of similar type. Empirically, we investigate automatic fake news detection based on surface-level linguistic patterns. We have designed a novel, hybrid convolutional neural network to integrate meta-data with text. We show that this hybrid approach can improve on a text-only deep learning model.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AI Feedback Enhances Community-Based Content Moderation through Engagement with Counterarguments
cs.CY 2025-07 unverdicted novelty 5.0

AI argumentative feedback on community notes produces larger quality improvements than supportive or neutral feedback in a hybrid moderation experiment.