arxiv: 2511.01188 · v3 · submitted 2025-11-03 · 💻 cs.CL · cs.AI

ZoFia: Zero-Shot Fake News Detection with Entity-Guided Retrieval and Multi-LLM Interaction

Pith reviewed 2026-05-18 01:44 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords zero-shot fake news detection · entity-guided retrieval · multi-LLM interaction · adversarial debate · large language models · fact verification · misinformation detection

The pith

ZoFia detects fake news in zero-shot settings by retrieving evidence with core entities and verifying through multi-LLM adversarial debate.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper aims to create a fake news detector that requires no labeled training examples or model fine-tuning. Current large language models often fail on recent news because their knowledge stops at a certain date and because one model can get stuck in a biased line of thinking. ZoFia fixes this in two steps: first it pulls out the main entities from the story to search for supporting or contradicting facts from two different sources, then it has several language models reason from different angles and debate until they agree on a judgment. A reader would care if this works because it could help spot misleading information about unfolding events quickly and at low cost. Experiments on two public datasets demonstrate that the full system exceeds other zero-shot techniques and surpasses most methods that do use a small number of examples.

Core claim

The authors propose ZoFia, a two-stage zero-shot fake news detection framework. The first stage uses a novel Hierarchical Salience and Salience-Calibrated Minimum Marginal Relevance (SC-MMR) algorithm to extract core entities that drive dual-source retrieval to overcome knowledge and evidence gaps. The second stage employs a multi-agent system for multi-perspective reasoning and verification in parallel, achieving an explainable and robust result via adversarial debate. Comprehensive experiments on two public datasets show that ZoFia outperforms existing zero-shot baselines and even most few-shot methods.
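The text above does not give the SC-MMR scoring function, so the following is only a minimal Python sketch of the general idea. It assumes the classic greedy MMR form (Carbonell and Goldstein, 1998), with a per-entity salience score standing in for query relevance. The names `select_entities`, `salience`, `embeddings`, and the weight `lam` are hypothetical, and the toy vectors are made up for illustration; the paper's actual formulation may differ.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_entities(
    salience: Dict[str, float],
    embeddings: Dict[str, Sequence[float]],
    k: int,
    lam: float = 0.6,
) -> List[str]:
    """Greedy MMR-style selection (assumed form, not the paper's definition).

    Each step picks the entity maximising
        lam * salience - (1 - lam) * max similarity to already-selected entities.
    """
    selected: List[str] = []
    candidates = set(salience)
    while candidates and len(selected) < k:

        def score(entity: str) -> float:
            redundancy = max(
                (cosine(embeddings[entity], embeddings[s]) for s in selected),
                default=0.0,
            )
            return lam * salience[entity] - (1.0 - lam) * redundancy

        # Sort first so ties are broken deterministically.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy inputs (illustrative only): two near-duplicate mentions and two distinct ones.
salience = {"Pfizer": 0.9, "Pfizer Inc.": 0.85, "FDA": 0.6, "vaccine": 0.5}
embeddings = {
    "Pfizer": [1.0, 0.0, 0.0],
    "Pfizer Inc.": [0.98, 0.05, 0.0],
    "FDA": [0.0, 1.0, 0.0],
    "vaccine": [0.1, 0.2, 1.0],
}
core = select_entities(salience, embeddings, k=3)
print(core)  # ['Pfizer', 'FDA', 'vaccine']
```

With these toy inputs the near-duplicate "Pfizer Inc." is skipped in favour of less salient but more diverse entities, which is the redundancy-avoidance behaviour an MMR-style selection step is meant to provide.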

What carries the argument

Entity-guided dual-source retrieval using the SC-MMR algorithm for core entity extraction, paired with a multi-agent adversarial debate system.
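The debate protocol (number of agents, turn structure, aggregation rule) is not specified in the text above, so the sketch below is purely an assumption about how such a loop could be organised. Agents are rule-based stubs in place of LLM calls; the `linguist` role is mentioned in the paper excerpt, while `fact_checker`, `skeptic`, the `debate` function, and the majority-vote aggregation are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

# An agent maps (claim, evidence, peers' verdicts from the previous round) to a verdict.
Agent = Callable[[str, List[str], Dict[str, str]], str]


def debate(
    claim: str,
    evidence: List[str],
    agents: Dict[str, Agent],
    max_rounds: int = 3,
) -> str:
    """Run rounds until all agents agree or max_rounds is reached.

    Aggregation is a simple majority vote; ties yield "uncertain".
    """
    verdicts: Dict[str, str] = {}
    for _ in range(max_rounds):
        # Every agent sees only the verdicts from the previous round.
        verdicts = {
            name: agent(claim, evidence, dict(verdicts))
            for name, agent in agents.items()
        }
        if len(set(verdicts.values())) == 1:
            break
    ranked = Counter(verdicts.values()).most_common()
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return "uncertain"
    return ranked[0][0]


def fact_checker(claim: str, evidence: List[str], peers: Dict[str, str]) -> str:
    # Stub: treats any evidence line tagged "REFUTES" as a contradiction.
    return "fake" if any(e.startswith("REFUTES") for e in evidence) else "real"


def linguist(claim: str, evidence: List[str], peers: Dict[str, str]) -> str:
    # Stub: a crude stylistic cue, revised if the fact checker has spoken.
    own = "fake" if "!" in claim else "real"
    return peers.get("fact_checker", own)


def skeptic(claim: str, evidence: List[str], peers: Dict[str, str]) -> str:
    # Stub: always argues the claim is fake, acting as the adversarial voice.
    return "fake"


verdict = debate(
    "Scientists confirm chocolate cures flu",
    ["REFUTES: no clinical trial supports this"],
    {"fact_checker": fact_checker, "linguist": linguist, "skeptic": skeptic},
)
print(verdict)  # fake
```

In this toy run the linguist stub initially votes "real" and switches after seeing the fact checker's verdict in the second round, so the loop ends with a unanimous "fake".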

If this is right

  • Detection of time-sensitive fake news becomes possible without task-specific training or labeled data.
  • The adversarial debate provides built-in explanations for the detection decision.
  • Knowledge cutoffs and hallucinations are mitigated through external evidence retrieval.
  • Confirmation bias is reduced by requiring multiple models to reconcile differing perspectives.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The entity extraction technique might apply to improving retrieval in other LLM applications like question answering.
  • Multi-agent debate could be tested for enhancing decision making in fields with high uncertainty such as climate science reporting.
  • Integrating real-time web search into the retrieval stage would likely strengthen performance on breaking stories.

Load-bearing premise

The approach relies on the premise that automatically identified core entities will lead to useful dual-source evidence and that multi-LLM adversarial debate will correct for individual model biases and knowledge limitations.

What would settle it

If evaluations on additional recent news datasets reveal that ZoFia performs no better than a simple single-LLM classification prompt, this would indicate that the retrieval and debate components do not deliver the expected improvements.
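A comparison of this kind could be scored as in the minimal sketch below, which computes macro-F1 for two sets of predictions. The labels and predictions are toy values invented for illustration, not results from the paper, and the function names are hypothetical.

```python
from typing import Sequence


def f1_for_label(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """F1 score treating `label` as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 over the labels present in y_true."""
    labels = sorted(set(y_true))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy data (illustrative only, not from any experiment).
gold = ["fake", "fake", "real", "real", "fake", "real"]
single_llm = ["fake", "real", "real", "real", "real", "real"]
full_system = ["fake", "fake", "real", "fake", "fake", "real"]

baseline_score = macro_f1(gold, single_llm)
system_score = macro_f1(gold, full_system)
print(f"single-LLM macro-F1: {baseline_score:.3f}")  # 0.625
print(f"full-system macro-F1: {system_score:.3f}")   # 0.829
```

If the two scores were statistically indistinguishable on a recent dataset, that would be the outcome described above as undermining the retrieval and debate components.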

Figures

Figures reproduced from arXiv: 2511.01188 by Lvhua Wu, Min Liu, Sheng Sun, Tian Wen, Xuefeng Jiang, Yan Lei, Yuwei Wang.

Figure 1
Figure 1: Overall architecture of our proposed ZoFia framework.
… to perform named entity recognition (NER) on the news text. This process can be expressed as $\{(t_i, e_i, c_i)\}_{i=1}^{N} = M_{\text{BERT-NER}}(T)$ (1), where $M$ is the pre-trained model and $T$ is the input news text. $(t_i, e_i, c_i)$ denotes the recognized entity triplet: $t_i$ is the entity token, $e_i$ is the entity label, and $c_i$ is the confidence score for the corresponding lab…
Figure 2
Figure 2: The diagram of Multi-Source Information Matrix and the quadrants used by LLM agents.
4.1.1 Linguist
Following prior studies (Shahid et al., 2022; Zhou and Zafarani, 2018), the linguist agent is designed to systematically divide the text into 5 linguistic dimensions that are strongly associated with misinformation:
  • Sentence: Lexical complexity, sentence length, and formality of tone.
  • Word: Frequency of…
Figure 3
Figure 3: Performance comparison of ZoFia’s keyword extraction with other extraction methods.
Figure 4
Figure 4: The effect of the number of keywords k on performance (F1-Score).
… to adapt the size of the keyword set while avoiding performance loss due to redundancy.
6 Conclusion
We propose ZoFia, a training-free zero-shot framework for fake news detection, to address the fundamental conflict between the static prior knowledge of large language models (LLMs) and the uncertainty of dynamic news streams. ZoFia uses a…
Original abstract

The rapid spread of fake news threatens social stability and public trust, highlighting the urgent need for its effective detection. Although large language models (LLMs) show potential in fake news detection, they are limited by knowledge cutoff and easily generate factual hallucinations when handling time-sensitive news. Furthermore, the thinking of a single LLM easily falls into early stance locking and confirmation bias, making it hard to handle both content reasoning and fact checking simultaneously. To address these challenges, we propose ZoFia, a two-stage zero-shot fake news detection framework. In the first retrieval stage, we propose novel Hierarchical Salience and Salience-Calibrated Minimum Marginal Relevance (SC-MMR) algorithm to extract core entities accurately, which drive dual-source retrieval to overcome knowledge and evidence gaps. In the subsequent stage, a multi-agent system conducts multi-perspective reasoning and verification in parallel and achieves an explainable and robust result via adversarial debate. Comprehensive experiments on two public datasets show that ZoFia outperforms existing zero-shot baselines and even most few-shot methods. Our code has been open-sourced to facilitate the research community at https://github.com/SakiRinn/ZoFia.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript presents ZoFia, a two-stage zero-shot fake news detection framework. Stage one extracts core entities via a Hierarchical Salience method and the Salience-Calibrated Minimum Marginal Relevance (SC-MMR) algorithm to drive dual-source retrieval, addressing knowledge cutoffs and evidence gaps. Stage two deploys a multi-agent LLM system for parallel multi-perspective reasoning and adversarial debate to yield explainable, robust verdicts. Comprehensive experiments on two public datasets are reported to show that ZoFia outperforms existing zero-shot baselines and most few-shot methods. The code is open-sourced.

Significance. If the results hold under rigorous evaluation, the work supplies a practical zero-shot pipeline that combines entity-guided retrieval with multi-LLM adversarial interaction to mitigate hallucinations and confirmation bias. The open-sourced implementation is a clear strength that aids reproducibility and community follow-up.

major comments (1)
  1. [Experimental Evaluation] Experimental Evaluation section: The central claim is that entity-guided dual-source retrieval plus multi-LLM debate overcomes knowledge cutoffs and hallucinations specifically for time-sensitive news. Standard public fake-news corpora (e.g., LIAR, FakeNewsNet) predominantly contain items published well before the training cutoffs of the LLMs used (2021–2023). In this regime the base models already encode the relevant facts, so measured gains cannot be attributed to the retrieval stage’s ability to supply post-cutoff evidence. A controlled evaluation on recent, post-cutoff news items is required to substantiate the motivating failure mode.
minor comments (2)
  1. [Abstract] Abstract: The claim of outperformance is stated without any quantitative metrics, baseline names, or significance tests. A concise summary of key F1 or accuracy deltas would make the abstract self-contained.
  2. [Methods] Methods: The precise formulation of the SC-MMR scoring function and the protocol for the adversarial debate (number of agents, turn structure, aggregation rule) would benefit from pseudocode or a worked example to improve clarity and reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thoughtful and constructive review of our manuscript. We address the major comment regarding the experimental evaluation below, providing a point-by-point response and indicating planned revisions to strengthen the work.

read point-by-point responses
  1. Referee: [Experimental Evaluation] Experimental Evaluation section: The central claim is that entity-guided dual-source retrieval plus multi-LLM debate overcomes knowledge cutoffs and hallucinations specifically for time-sensitive news. Standard public fake-news corpora (e.g., LIAR, FakeNewsNet) predominantly contain items published well before the training cutoffs of the LLMs used (2021–2023). In this regime the base models already encode the relevant facts, so measured gains cannot be attributed to the retrieval stage’s ability to supply post-cutoff evidence. A controlled evaluation on recent, post-cutoff news items is required to substantiate the motivating failure mode.

    Authors: We appreciate the referee's careful analysis of the temporal characteristics of our evaluation datasets. We acknowledge that the LIAR and FakeNewsNet corpora primarily consist of news items predating the knowledge cutoffs of the LLMs employed in our experiments. While the current results demonstrate the benefits of hierarchical entity salience retrieval and multi-LLM adversarial debate for robust zero-shot detection, we agree that these benchmarks do not fully isolate the framework's ability to address post-cutoff knowledge gaps in time-sensitive scenarios. To directly substantiate this aspect of our motivating claims, we will revise the Experimental Evaluation section to include a controlled evaluation on a set of recent news items published after 2023. This addition will feature a new dataset or curated collection of contemporary articles, with ablation studies isolating the contribution of the dual-source retrieval stage in supplying up-to-date evidence. We believe this revision will better align the empirical evaluation with the paper's focus on overcoming knowledge cutoffs.

    revision: yes
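The controlled evaluation discussed in this exchange depends on separating items by publication date relative to a model's knowledge cutoff. A minimal Python sketch of that split follows; the `NewsItem` type, the `split_by_cutoff` helper, the cutoff date, and the sample items are all hypothetical placeholders, not data or code from the paper.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence, Tuple


@dataclass
class NewsItem:
    text: str
    published: date
    label: str  # "real" or "fake"


def split_by_cutoff(
    items: Sequence[NewsItem], cutoff: date
) -> Tuple[List[NewsItem], List[NewsItem]]:
    """Partition items into (pre-cutoff, post-cutoff) by publication date.

    Items published on the cutoff date itself are counted as pre-cutoff.
    """
    pre = [it for it in items if it.published <= cutoff]
    post = [it for it in items if it.published > cutoff]
    return pre, post


# Assumed cutoff for illustration; a real study would use each model's documented cutoff.
CUTOFF = date(2023, 12, 31)

items = [
    NewsItem("placeholder claim A", date(2017, 5, 1), "fake"),
    NewsItem("placeholder claim B", date(2023, 12, 31), "real"),
    NewsItem("placeholder claim C", date(2025, 2, 14), "fake"),
]
pre, post = split_by_cutoff(items, CUTOFF)
print(len(pre), len(post))  # 2 1
```

Reporting metrics separately on the two partitions, with and without the retrieval stage, would show whether any gain is concentrated on post-cutoff items.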

Circularity Check

0 steps flagged

No circularity: empirical pipeline with no derivations or self-referential reductions

full rationale

The paper proposes ZoFia as a two-stage zero-shot framework: entity extraction via Hierarchical Salience and SC-MMR for dual-source retrieval, followed by multi-agent adversarial debate. No equations, fitted parameters, predictions derived from inputs, or load-bearing self-citations appear in the abstract or described method. The central claims rest on experimental outperformance on public datasets rather than any mathematical derivation that reduces to its own definitions or prior author results. This is a standard engineering contribution validated empirically and is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The framework rests on the untested premise that entity salience extraction plus multi-LLM debate can close knowledge gaps in LLMs for current events; no free parameters, axioms, or invented entities are explicitly listed in the abstract.

pith-pipeline@v0.9.0 · 5753 in / 1081 out tokens · 23072 ms · 2026-05-18T01:44:04.400719+00:00 · methodology

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 7 internal anchors

  1. [1]

    Vian Bakir and Andrew McStay. 2018. Fake news and the economy of emotions: Problems, causes, solutions. Digital journalism, 6(2):154--175

  2. [2]

    Benjamin Bullough, Harrison Lundberg, Chen Hu, and Weihang Xiao. 2024. Predicting entity salience in extremely short documents. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 50--64

  3. [3]

    Jaime Carbonell and Jade Goldstein. 1998. The use of mmr, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, pages 335--336

  4. [4]

    Claudio Carpineto and Giovanni Romano. 2012. A survey of automatic query expansion in information retrieval. Acm Computing Surveys (CSUR), 44(1):1--50

  5. [5]

    Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, and Zhiyuan Liu. 2023. Chateval: Towards better llm-based evaluators through multi-agent debate. arXiv preprint arXiv:2308.07201

  6. [6]

    Canyu Chen and Kai Shu. 2023. Can llm-generated misinformation be detected? arXiv preprint arXiv:2309.13788

  7. [7]

    Jeffrey Cheng, Marc Marone, Orion Weller, Dawn Lawrie, Daniel Khashabi, and Benjamin Van Durme. 2024. Dated data: Tracing knowledge cutoffs in large language models. arXiv preprint arXiv:2403.12958

  8. [8]

    DeepSeek-AI. 2024. Deepseek-v3 technical report. Preprint, arXiv:2412.19437. https://arxiv.org/abs/2412.19437

  9. [9]

    Chunyuan Deng, Yilun Zhao, Xiangru Tang, Mark Gerstein, and Arman Cohan. 2023. Investigating data contamination in modern benchmarks for large language models. arXiv preprint arXiv:2311.09783

  10. [10]

    Jesse Dunietz and Dan Gillick. 2014. A new entity salience task with millions of training examples. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers, pages 205--209

  11. [11]

    Shangbin Feng, Weijia Shi, Yike Wang, Wenxuan Ding, Vidhisha Balachandran, and Yulia Tsvetkov. 2024. Don't hallucinate, abstain: Identifying llm knowledge gaps via multi-llm collaboration. arXiv preprint arXiv:2402.00367

  12. [12]

    Marc Fisher, John Woodrow Cox, and Peter Hermann. 2016. Pizzagate: From rumor, to hashtag, to gunfire in dc. Washington Post, 6:8410--8415

  13. [13]

    Hao Guo, Zihan Ma, Zhi Zeng, Minnan Luo, Weixin Zeng, Jiuyang Tang, and Xiang Zhao. 2025. Each fake news is fake in its own way: An attribution multi-granularity benchmark for multimodal fake news detection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 228--236

  14. [14]

    Feng Hou, Ruili Wang, Jun He, and Yi Zhou. 2021. Improving entity linking through semantic reinforced entity embeddings. arXiv preprint arXiv:2106.08495

  15. [15]

    Nathaniel Hoy and Theodora Koulouri. 2022. Exploring the generalisability of fake news detection models. In 2022 IEEE International Conference on Big Data (Big Data), pages 5731--5740. IEEE

  16. [16]

    Beizhe Hu, Qiang Sheng, Juan Cao, Yuhui Shi, Yang Li, Danding Wang, and Peng Qi. 2024a. Bad actor, good advisor: Exploring the role of large language models in fake news detection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 22105--22113

  17. [17]

    Beizhe Hu, Qiang Sheng, Juan Cao, Yuhui Shi, Yang Li, Danding Wang, and Peng Qi. 2024b. Bad actor, good advisor: Exploring the role of large language models in fake news detection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 22105--22113

  18. [18]

    Weiqi Hu, Ye Wang, Yan Jia, Qing Liao, and Bin Zhou. 2024c. A multi-modal prompt learning framework for early detection of fake news. In Proceedings of the International AAAI Conference on Web and Social Media, volume 18, pages 651--662

  19. [19]

    Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of hallucination in natural language generation. ACM computing surveys, 55(12):1--38

  20. [20]

    Gongyao Jiang, Shuang Liu, Yu Zhao, Yueheng Sun, and Meishan Zhang. 2022. Fake news detection via knowledgeable prompt learning. Information Processing & Management, 59(5):103029

  21. [21]

    Xuefeng Jiang, Lvhua Wu, Sheng Sun, Jia Li, Jingjing Xue, Yuwei Wang, Tingting Wu, and Min Liu. 2024. Investigating large language models for code vulnerability detection: An experimental study. arXiv preprint arXiv:2412.18260

  22. [22]

    Yiqiao Jin, Minje Choi, Gaurav Verma, Jindong Wang, and Srijan Kumar. 2024. Mm-soc: Benchmarking multimodal large language models in social media platforms. arXiv preprint arXiv:2402.14154

  23. [23]

    Rohit Kumar Kaliyar, Anurag Goswami, and Pratik Narang. 2021. Fakebert: Fake news detection in social media with a bert-based deep learning approach. Multimedia tools and applications, 80(8):11765--11788

  24. [24]

    Soveatin Kuntur, Anna Wróblewska, Marcin Paprzycki, and Maria Ganzha. 2024. Fake news detection: It's all in the data! arXiv preprint arXiv:2407.02122

  25. [25]

    Xiaochong Lan, Chen Gao, Depeng Jin, and Yong Li. 2024. Stance detection with collaborative role-infused llm-based agents. In Proceedings of the international AAAI conference on web and social media, volume 18, pages 891--903

  26. [26]

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, and others. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems, 33:9459--9474

  27. [27]

    Jia Li, Lijie Hu, Zhixian He, Jingfeng Zhang, Tianhang Zheng, and Di Wang. 2024. Text guided image editing with automatic concept locating and forgetting. arXiv preprint arXiv:2405.19708

  28. [28]

    Jia Li, Lijie Hu, Jingfeng Zhang, Tianhang Zheng, Hua Zhang, and Di Wang. 2025. Fair text-to-image diffusion via fair mapping. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 26256--26264

  29. [29]

    Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Shuming Shi, and Zhaopeng Tu. 2023. Encouraging divergent thinking in large language models through multi-agent debate. arXiv preprint arXiv:2305.19118

  30. [30]

    Ye Liu, Jiajun Zhu, Kai Zhang, Haoyu Tang, Yanghai Zhang, Xukai Liu, Qi Liu, and Enhong Chen. 2024. Detect, investigate, judge and determine: A novel llm-based framework for few-shot fake news detection. arXiv preprint arXiv:2407.08952

  31. [31]

    Yuhan Liu, Yuxuan Liu, Xiaoqing Zhang, Xiuying Chen, and Rui Yan. 2025. The truth becomes clearer through debate! multi-agent systems with large language models unmask fake news. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 504--514

  32. [32]

    Federico Monti, Fabrizio Frasca, Davide Eynard, Damon Mannion, and Michael M Bronstein. 2019. Fake news detection on social media using geometric deep learning. arXiv preprint arXiv:1902.06673

  33. [33]

    Qiong Nan, Juan Cao, Yongchun Zhu, Yanyan Wang, and Jintao Li. 2021. Mdfend: Multi-domain fake news detection. In Proceedings of the 30th ACM international conference on information & knowledge management, pages 3343--3347

  34. [34]

    Qiong Nan, Qiang Sheng, Juan Cao, Beizhe Hu, Danding Wang, and Jintao Li. 2024. Let silence speak: Enhancing fake news detection with generated comments from large language models. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pages 1732--1742

  35. [35]

    Bo Ni, Zhichun Guo, Jianing Li, and Meng Jiang. 2020. Improving generalizability of fake news detection methods using propensity score matching. arXiv preprint arXiv:2002.00838

  36. [36]

    Cheng Niu, Yang Guan, Yuanhao Wu, Juno Zhu, Juntong Song, Randy Zhong, Kaihua Zhu, Siliang Xu, Shizhe Diao, and Tong Zhang. 2024. Veract scan: Retrieval-augmented fake news detection with justifiable reasoning. arXiv preprint arXiv:2406.10289

  37. [37]

    Bohdan M Pavlyshenko. 2023. Analysis of disinformation and fake news detection using fine-tuned large language model. arXiv preprint arXiv:2309.04704

  38. [38]

    Anupam Purwar and 1 others. 2024. Evaluating the efficacy of open-source llms in enterprise-specific rag systems: A comparative study of performance and scalability. arXiv preprint arXiv:2406.11424

  39. [39]

    Nils Reimers and Iryna Gurevych. 2019. Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv preprint arXiv:1908.10084

  40. [40]

    Wajiha Shahid, Bahman Jamshidi, Saqib Hakak, Haruna Isah, Wazir Zada Khan, Muhammad Khurram Khan, and Kim-Kwang Raymond Choo. 2022. Detecting and mitigating the dissemination of fake news: Challenges and future research opportunities. IEEE Transactions on Computational Social Systems, 11(4):4649--4662

  41. [41]

    Kai Shu, Deepak Mahudeswaran, Suhang Wang, Dongwon Lee, and Huan Liu. 2020. Fakenewsnet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media. Big data, 8(3):171--188

  42. [42]

    Karen Sparck Jones. 1972. A statistical interpretation of term specificity and its application in retrieval. Journal of documentation, 28(1):11--21

  43. [43]

    Jinyan Su, Claire Cardie, and Preslav Nakov. 2023. Adapting fake news detection to the era of large language models. arXiv preprint arXiv:2311.04917

  44. [44]

    Erik F. Tjong Kim Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, pages 142--147. https://www.aclweb.org/anthology/W03-0419

  45. [45]

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2023. React: Synergizing reasoning and acting in language models. Preprint, arXiv:2210.03629. https://arxiv.org/abs/2210.03629

  46. [46]

    Jingyuan Yi, Zeqiu Xu, Tianyi Huang, and Peiyang Yu. 2025. Challenges and innovations in llm-powered fake news detection: A synthesis of approaches and future directions. Preprint, arXiv:2502.00339. https://arxiv.org/abs/2502.00339

  47. [47]

    Zhuosheng Zhang, Aston Zhang, Mu Li, and Alex Smola. 2022. Automatic chain of thought prompting in large language models. Preprint, arXiv:2210.03493. https://arxiv.org/abs/2210.03493

  48. [48]

    Xinyi Zhou and Reza Zafarani. 2018. Fake news: A survey of research, detection methods, and opportunities. arXiv preprint arXiv:1812.00315, 2:13

  49. [49]

    Xinyi Zhou and Reza Zafarani. 2020. A survey of fake news: Fundamental theories, detection methods, and opportunities. ACM Computing Surveys (CSUR), 53(5):1--40
