From Articles to Premises: Building PrimeFacts, an Extraction Methodology and Resource for Fact-Checking Evidence
Pith reviewed 2026-05-08 10:42 UTC · model grok-4.3
The pith
Rewriting anchor sentences from fact-checking articles into standalone premises boosts retrieval by up to a 30 percent relative gain in Mean Reciprocal Rank and verdict prediction by 10-20 Macro-F1 points.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Decontextualized evidence yields higher retrievability, achieving up to a 30 percent relative gain in Mean Reciprocal Rank over verbatim sentences, and using the evidence for verdict prediction raises Macro-F1 by 10-20 points over the baseline, with consistent results across 2-class and 5-class verdicts and different model architectures.
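The two headline metrics can be made concrete with a short sketch. `mean_reciprocal_rank` and `macro_f1` below follow the standard definitions; the numbers in the usage comments are illustrative only and are not taken from the paper:

```python
def mean_reciprocal_rank(ranks):
    """MRR over queries; `ranks` holds the 1-based rank of the first
    relevant document for each query (None if it was never retrieved)."""
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Illustrative ranks only (not from the paper):
verbatim = mean_reciprocal_rank([2, 5, None, 1])  # 0.425
premises = mean_reciprocal_rank([1, 3, 4, 1])     # ~0.646
relative_gain = (premises - verbatim) / verbatim  # ~0.52, i.e. a 52% relative gain
```

A "relative" MRR gain, as claimed in the paper, is the difference divided by the baseline, which is why a modest absolute improvement can read as a large percentage.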
What carries the argument
The PrimeFacts extraction pipeline that locates in-article hyperlinks as anchors and uses LLMs to rewrite the anchor sentences into stand-alone, context-independent premises.
If this is right
- The performance gains hold across both binary and five-class verdict settings as well as multiple model architectures.
- The extracted premises remain faithful to the source articles according to qualitative review.
- The resource of 49,718 premises can be reused for training or evaluating automated verification systems.
- Additional implicit evidence can be extracted beyond the anchor-based premises to further enrich the dataset.
Where Pith is reading between the lines
- The premises could serve as building blocks for a shared database that reduces repeated fact-checking effort on overlapping claims.
- Similar decontextualization steps might improve evidence handling in other unstructured document domains such as legal or scientific literature.
- Comparing LLM-rewritten premises against human-rewritten versions would test whether the gains depend on the specific rewriting model.
- Integrating the pipeline into live fact-checking workflows could make existing human research more directly usable by automated tools.
Load-bearing premise
That LLM rewriting of anchor sentences produces premises faithful to the original sources and that the observed gains result from decontextualization rather than other modeling choices.
What would settle it
Re-running the retrieval and verdict-prediction experiments with the original verbatim anchor sentences substituted for the LLM-rewritten versions and checking whether the MRR and Macro-F1 improvements disappear.
Original abstract
Fact-checking articles encode rich supporting evidence and reasoning, yet this evidence remains largely inaccessible to automated verification systems due to unstructured presentation. We introduce PrimeFacts, a methodology and resource for extracting fine-grained evidence from full fact-checking articles. We compile 13,106 PolitiFact articles with claims, verdicts, and all referenced sources, and we identify 49,718 in-article hyperlinks as natural anchors to pinpoint key evidence. Our framework leverages large language models (LLMs) to rewrite these anchor sentences into stand-alone, context-independent premises and investigates the extraction of additional implicit evidence. In evaluations on cross-article evidence retrieval and claim verification, the extracted premises substantially improve performance. Decontextualized evidence yields higher retrievability, achieving up to a 30 percent relative gain in Mean Reciprocal Rank over verbatim sentences, and using the evidence for verdict prediction raises Macro-F1 by 10-20 points over the baseline. These gains are consistent across different verdict granularities (2-class vs. 5-class) and model architectures. A qualitative analysis indicates that the decontextualized premises remain faithful to the original sources. Our work highlights the promise of reusing fact-checkers' evidence for automation and provides a large-scale resource of structured evidence from real-world fact-checks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PrimeFacts, a methodology and large-scale resource for extracting fine-grained, decontextualized evidence from 13,106 PolitiFact fact-checking articles. It identifies 49,718 in-article hyperlinks as anchors, uses LLMs to rewrite anchor sentences into standalone premises (and extract implicit evidence), and reports that these premises yield up to 30% relative MRR gains in cross-article retrieval and 10-20 point Macro-F1 gains in verdict prediction over baselines, with gains consistent across 2-class/5-class settings and model architectures. A qualitative analysis is cited to support faithfulness to original sources.
Significance. If the gains are causally attributable to decontextualization and the premises are verifiably faithful, the work supplies a valuable public resource of structured human fact-checker evidence that could advance automated retrieval and verification pipelines. The empirical focus on reusing real-world fact-checks rather than synthetic data is a concrete strength.
major comments (3)
- [Abstract / evaluation] Abstract and evaluation sections: the claim that decontextualized premises 'remain faithful' rests solely on an unspecified qualitative analysis with no reported sample size, inter-annotator agreement, or quantitative metrics (e.g., NLI entailment/contradiction rates). This is load-bearing for the central thesis that performance lifts stem from faithful decontextualization.
- [Evaluation] Evaluation on retrieval and verification: the 30% MRR and 10-20 F1 gains are presented without baseline details, statistical significance tests, error bars, or exact experimental configurations (e.g., retrieval corpus construction, model hyperparameters). This prevents ruling out that gains arise from incidental LLM rephrasing effects rather than removal of context.
- [Methodology / experiments] No ablation is described that holds length, lexical diversity, or embedding statistics constant while varying only contextual anchoring. Without such controls, the decontextualization hypothesis cannot be isolated from simpler summarization or length-reduction artifacts.
minor comments (2)
- [Abstract] The abstract states gains are 'consistent across different verdict granularities and model architectures' but does not reference specific tables or figures showing per-setting breakdowns.
- [Data collection] Dataset statistics (13,106 articles, 49,718 anchors) are given without discussion of coverage gaps or selection biases in the PolitiFact corpus.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments identify key areas where additional transparency and controls will strengthen the manuscript. We address each major comment below and will incorporate revisions accordingly.
Point-by-point responses
Referee: [Abstract / evaluation] Abstract and evaluation sections: the claim that decontextualized premises 'remain faithful' rests solely on an unspecified qualitative analysis with no reported sample size, inter-annotator agreement, or quantitative metrics (e.g., NLI entailment/contradiction rates). This is load-bearing for the central thesis that performance lifts stem from faithful decontextualization.
Authors: We agree that the qualitative analysis section requires substantially more detail to support the faithfulness claim. In the revised manuscript we will expand the relevant section to specify the sample size analyzed, report inter-annotator agreement, and add quantitative faithfulness metrics (e.g., NLI entailment/contradiction rates on the sampled premises). These additions will provide stronger evidence that the observed performance gains are linked to faithful decontextualization rather than other factors.
Revision: yes
Referee: [Evaluation] Evaluation on retrieval and verification: the 30% MRR and 10-20 F1 gains are presented without baseline details, statistical significance tests, error bars, or exact experimental configurations (e.g., retrieval corpus construction, model hyperparameters). This prevents ruling out that gains arise from incidental LLM rephrasing effects rather than removal of context.
Authors: We acknowledge that the current presentation of results lacks sufficient experimental detail. The revised version will include complete baseline descriptions, statistical significance tests with p-values, error bars on all reported metrics, and precise specifications of the retrieval corpus construction and model hyperparameters. We will also add a direct comparison against LLM-rephrased (but non-decontextualized) sentences to help isolate the contribution of context removal from general rephrasing effects.
Revision: yes
Referee: [Methodology / experiments] No ablation is described that holds length, lexical diversity, or embedding statistics constant while varying only contextual anchoring. Without such controls, the decontextualization hypothesis cannot be isolated from simpler summarization or length-reduction artifacts.
Authors: This is a fair criticism of the current experimental design. While our existing baselines include verbatim sentences, we did not perform a controlled ablation that matches length and lexical statistics. In the revision we will add such an ablation: we will generate length- and diversity-matched control premises (via LLM paraphrasing that preserves contextual anchors) and compare retrieval and verification performance against the decontextualized premises. This will provide a clearer isolation of the decontextualization effect.
Revision: yes
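The significance testing promised above could take the form of a paired bootstrap over per-query scores (e.g., reciprocal ranks). This is a generic sketch, not the authors' procedure; the function name and defaults are assumptions:

```python
import random

def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Paired bootstrap test for the difference in mean per-query scores
    between baseline system A and candidate system B. Returns the observed
    mean difference and the fraction of resamples in which B fails to beat
    A, an approximate one-sided p-value."""
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n = len(scores_a)
    observed = sum(b - a for a, b in zip(scores_a, scores_b)) / n
    worse = 0
    for _ in range(n_resamples):
        # Resample query indices with replacement, keeping pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        diff = sum(scores_b[i] - scores_a[i] for i in idx) / n
        if diff <= 0:
            worse += 1
    return observed, worse / n_resamples
```

Reporting the observed difference alongside such a p-value (and a bootstrap confidence interval) would address the referee's concern about unqualified point estimates.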
Circularity Check
No circularity: empirical methodology with baseline comparisons
full rationale
The paper introduces a data collection and LLM-rewriting pipeline for premise extraction, then reports empirical results on retrieval (MRR) and verification (Macro-F1) tasks against explicitly stated baselines. No mathematical derivations, equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the described chain. The central claims rest on experimental measurements rather than any reduction to inputs by construction, satisfying the self-contained criterion.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: LLMs can rewrite sentences into context-independent premises while preserving the original meaning.
- Domain assumption: hyperlinks in fact-checking articles point to key supporting evidence.