pith. sign in

arxiv: 2504.10166 · v2 · submitted 2025-04-14 · 💻 cs.MM

Fact-Checking with Contextual Narratives: Leveraging Retrieval-Augmented LLMs for Social Media Analysis

Pith reviewed 2026-05-22 21:03 UTC · model grok-4.3

classification 💻 cs.MM
keywords fact-checkingretrieval-augmented generationclusteringlarge language modelsmultimodal evidencesocial medianarrative synthesis
0
0 comments X

The pith

CRAVE retrieves multimodal evidence from diverse sources, clusters it into coherent narratives, and uses an LLM judge to deliver explained fact-checking verdicts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CRAVE, a framework that pulls evidence from text and image sources across social media, groups that evidence into consistent narratives despite contradictions, and lets an LLM reach a verdict backed by evidence summaries. This setup aims to give fact-checkers a structured way to synthesize conflicting multimodal inputs rather than relying on unorganized retrieval or single-pass model judgments. If the approach holds, it would allow more reliable and transparent decisions on rapidly spreading claims by turning raw data into grouped stories that support clear conclusions.

Core claim

CRAVE automatically retrieves multimodal evidence from diverse, often contradictory, sources, clusters evidence into coherent narratives, and uses an LLM-based judge to deliver fact-checking verdicts explained by evidence summaries, with experiments demonstrating efficacy in retrieval precision, clustering quality, and judgment accuracy.

What carries the argument

The CRAVE framework, which combines retrieval-augmented LLMs with clustering to form coherent narratives from multimodal evidence for verification.

If this is right

  • Evidence clustering improves consistency when sources contradict one another.
  • LLM judgments gain explainability through attached narrative summaries.
  • Multimodal retrieval and refinement steps increase overall precision and quality.
  • The system serves as a practical decision-support aid for human fact-checkers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The narrative-clustering step might help reduce unsupported outputs from LLMs in other verification tasks.
  • The framework could scale to monitor live social media streams for emerging claims.
  • Extending the same retrieval-plus-clustering pattern to video or audio evidence would test its limits on additional modalities.

Load-bearing premise

Clustering retrieved evidence into coherent narratives combined with an LLM-based judge will produce accurate, consistent fact-checking verdicts even when sources are contradictory and multimodal.

What would settle it

Run CRAVE on a labeled dataset of social media claims that include deliberately contradictory text and image evidence, then measure whether the LLM judge's verdicts match independent human fact-checker ground truth at high accuracy.

Figures

Figures reproduced from arXiv: 2504.10166 by Arka Ujjal Dey, Christian Schroeder de Witt, Georgia Channing, John Collomosse, Muhammad Junaid Awan.

Figure 1
Figure 1. Figure 1: CRAVE is a fact-checking method that analyses multi [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of CRAVE: Given a claim (image + text), contained in a social media post, we retrieve evidence from reverse [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Reasoning protocol used by the LLM (GPT 4o) to assess [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Dynamic Clustering performance varies with dataset complexity, as higher similarity thresholds fragment evidence into [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Misclassified samples are color-coded—orange for those labeled ‘True’, blue for those labeled ‘Fake’. A sample with a [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗
read the original abstract

We propose CRAVE (Cluster-based Retrieval Augmented Verification with Explanation); a novel framework that integrates retrieval-augmented Large Language Models (LLMs) with clustering techniques to address fact-checking challenges on social media. CRAVE automatically retrieves multimodal evidence from diverse, often contradictory, sources. Evidence is clustered into coherent narratives, and evaluated via an LLM-based judge to deliver fact-checking verdicts explained by evidence summaries. By synthesizing evidence from both text and image modalities and incorporating agent-based refinement, CRAVE ensures consistency and diversity in evidence representation. Comprehensive experiments demonstrate CRAVE's efficacy in retrieval precision, clustering quality, and judgment accuracy, showcasing its potential as a robust decision-support tool for fact-checkers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes CRAVE (Cluster-based Retrieval Augmented Verification with Explanation), a framework integrating retrieval-augmented LLMs with clustering for fact-checking social media claims. It retrieves multimodal evidence from diverse and often contradictory sources, clusters the evidence into coherent narratives, and uses an LLM-based judge to produce fact-checking verdicts accompanied by evidence summaries. The approach includes agent-based refinement to promote consistency and diversity across text and image modalities. The central claim is that this pipeline yields accurate, consistent verdicts, with comprehensive experiments demonstrating efficacy in retrieval precision, clustering quality, and judgment accuracy.

Significance. If the empirical results hold, the work could offer a practical decision-support system for fact-checkers by synthesizing contradictory multimodal evidence into narrative clusters rather than isolated snippets. The combination of RAG, clustering, and an LLM judge is a natural extension of existing retrieval and reasoning pipelines, with potential value in the emphasis on explanations. However, the significance is difficult to evaluate because the abstract asserts experimental support without specifying datasets, baselines, metrics, or controls, leaving the strength of the evidence unclear. No machine-checked proofs or parameter-free derivations are present.

major comments (1)
  1. [Abstract] Abstract: The claim that 'comprehensive experiments demonstrate CRAVE's efficacy in retrieval precision, clustering quality, and judgment accuracy' is load-bearing for the central contribution, yet the abstract (and the provided manuscript description) supplies no information on the datasets, baselines, evaluation metrics, number of trials, or controls used. Without these details, the data support for the efficacy claims cannot be assessed.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We address the single major comment below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The claim that 'comprehensive experiments demonstrate CRAVE's efficacy in retrieval precision, clustering quality, and judgment accuracy' is load-bearing for the central contribution, yet the abstract (and the provided manuscript description) supplies no information on the datasets, baselines, evaluation metrics, number of trials, or controls used. Without these details, the data support for the efficacy claims cannot be assessed.

    Authors: We agree that the abstract would be strengthened by briefly indicating the experimental details that support the efficacy claims. The full manuscript already contains dedicated sections on the evaluation (including the specific social-media fact-checking datasets, baselines such as standard RAG pipelines and LLM-only judges, metrics for retrieval precision, clustering quality, and verdict accuracy, as well as the number of runs and controls). In the revised version we will condense the key elements of this setup into the abstract while preserving its length and readability. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper proposes CRAVE, a framework combining retrieval-augmented LLMs with clustering and an LLM judge for multimodal fact-checking. No derivation chain, equations, or self-definitional steps are present that reduce predictions or results to inputs by construction. Claims of efficacy rest on separate experiments for retrieval precision, clustering quality, and judgment accuracy rather than any fitted parameter renamed as prediction or self-citation load-bearing premise. The framework is presented as an empirical proposal without internal reductions to its own definitions or prior author work invoked as uniqueness theorems.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no specific free parameters, axioms, or invented entities are identifiable or detailed in the provided text.

pith-pipeline@v0.9.0 · 5661 in / 1054 out tokens · 34718 ms · 2026-05-22T21:03:52.298789+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 6 internal anchors

  1. [1]

    Detecting and mitigating the dissemination of fake news: Challenges and future research opportunities,

    W. Shahid, B. Jamshidi, S. Hakak, H. Isah, W. Z. Khan, M. K. Khan, and K.-K. R. Choo, “Detecting and mitigating the dissemination of fake news: Challenges and future research opportunities,” IEEE Transactions on Computational Social Systems , vol. 11, no. 4, pp. 4649– 4662, 2022

  2. [2]

    The spread of true and false news online,

    S. V osoughi, D. Roy, and S. Aral, “The spread of true and false news online,” science, vol. 359, no. 6380, pp. 1146–1151, 2018

  3. [3]

    Fake news, disinformation and misinformation in social media: a review,

    E. A ¨ımeur, S. Amri, and G. Brassard, “Fake news, disinformation and misinformation in social media: a review,” Social Network Analysis and Mining , vol. 13, no. 1, p. 30, 2023

  4. [4]

    The economics of “fake news

    N. Kshetri and J. V oas, “The economics of “fake news”,” IT Professional, vol. 19, no. 6, pp. 8–12, 2017

  5. [5]

    The impact of misinformation on the covid-19 pandemic,

    M. M. F. Caceres, J. P. Sosa, J. A. Lawrence, C. Sestacov- schi, A. Tidd-Johnson, M. H. U. Rasool, V . K. Gadamidi, S. Ozair, K. Pandav, C. Cuevas-Lou et al., “The impact of misinformation on the covid-19 pandemic,” AIMS public health, vol. 9, no. 2, p. 262, 2022

  6. [6]

    Social media and fake news in the 2016 election,

    H. Allcott and M. Gentzkow, “Social media and fake news in the 2016 election,” Journal of economic perspectives , vol. 31, no. 2, pp. 211–236, 2017

  7. [7]

    Red-dot: Multimodal fact-checking via relevant evidence detection,

    S.-I. Papadopoulos, C. Koutlis, S. Papadopoulos, and P. C. Petrantonakis, “Red-dot: Multimodal fact-checking via relevant evidence detection,” 2024. [Online]. Available: https://arxiv.org/abs/2311.09939

  8. [8]

    Similarity over factuality: Are we making progress on multimodal out-of-context misinformation detection?

    ——, “Similarity over factuality: Are we making progress on multimodal out-of-context misinformation detection?” arXiv preprint arXiv:2407.13488 , 2024

  9. [9]

    Verite: a robust benchmark for multimodal mis- information detection accounting for unimodal bias,

    ——, “Verite: a robust benchmark for multimodal mis- information detection accounting for unimodal bias,” International Journal of Multimedia Information Retrieval, vol. 13, no. 1, p. 4, 2024

  10. [10]

    Newsclippings: Automatic generation of out-of-context multimodal media,

    G. Luo, T. Darrell, and A. Rohrbach, “Newsclippings: Automatic generation of out-of-context multimodal media,” arXiv preprint arXiv:2104.05893 , 2021

  11. [11]

    Open-domain, content-based, multi-modal fact-checking of out-of- context images via online resources,

    S. Abdelnabi, R. Hasan, and M. Fritz, “Open-domain, content-based, multi-modal fact-checking of out-of- context images via online resources,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 14 940–14 949

  12. [12]

    Cove: Context and veracity prediction for out-of-context images,

    J. Tonglet, G. Thiem, and I. Gurevych, “Cove: Context and veracity prediction for out-of-context images,” arXiv preprint arXiv:2502.01194, 2025

  13. [13]

    Integrating social explanations into explainable artificial intelligence (xai) for combating misinformation: Vision and challenges,

    Y . Gong, L. Shang, and D. Wang, “Integrating social explanations into explainable artificial intelligence (xai) for combating misinformation: Vision and challenges,” IEEE Transactions on Computational Social Systems , vol. 11, no. 5, pp. 6705–6726, 2024

  14. [14]

    Retrieval-augmented generation for knowledge- intensive nlp tasks,

    P. Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. K ¨uttler, M. Lewis, W.-t. Yih, T. Rockt¨aschel et al. , “Retrieval-augmented generation for knowledge- intensive nlp tasks,” Advances in neural information processing systems, vol. 33, pp. 9459–9474, 2020

  15. [15]

    Rag-fusion based information retrieval for fact-checking,

    Y . Momii, T. Takiguchi, and Y . Ariki, “Rag-fusion based information retrieval for fact-checking,” in Proceedings of the Seventh Fact Extraction and VERification Workshop (FEVER), 2024, pp. 47–54

  16. [16]

    Believe in artificial intelligence? a user study on the chatgpt’s fake information impact,

    I. Amaro, P. Barra, A. Della Greca, R. Francese, and C. Tucci, “Believe in artificial intelligence? a user study on the chatgpt’s fake information impact,”IEEE Transactions on Computational Social Systems , 2023

  17. [17]

    Understanding the promise and limits of automated fact-checking,

    D. Graves, “Understanding the promise and limits of automated fact-checking,” Reuters Institute for the Study of Journalism, 2018

  18. [18]

    Mmfakebench: A mixed-source mul- timodal misinformation detection benchmark for lvlms,

    X. Liu, Z. Li, P. Li, S. Xia, X. Cui, L. Huang, H. Huang, W. Deng, and Z. He, “Mmfakebench: A mixed-source mul- timodal misinformation detection benchmark for lvlms,” arXiv preprint arXiv:2406.08772 , 2024

  19. [19]

    Detecting and grounding multi-modal media manipulation,

    R. Shao, T. Wu, and Z. Liu, “Detecting and grounding multi-modal media manipulation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 6904–6913

  20. [20]

    Cosmos: Catch- ing out-of-context misinformation with self-supervised learning,

    S. Aneja, C. Bregler, and M. Nießner, “Cosmos: Catch- ing out-of-context misinformation with self-supervised learning,” arXiv preprint arXiv:2101.06278 , 2021

  21. [21]

    Amir: An automated misinformation rebuttal system–a covid-19 vaccination datasets-based exposition,

    S. Sharma, A. Datta, and R. Sharma, “Amir: An automated misinformation rebuttal system–a covid-19 vaccination datasets-based exposition,” IEEE Transactions on Com- putational Social Systems , 2024

  22. [22]

    Defame: Dynamic evidence-based fact-checking with multimodal experts,

    T. Braun, M. Rothermel, M. Rohrbach, and A. Rohrbach, “Defame: Dynamic evidence-based fact-checking with multimodal experts,” 2025. [Online]. Available: https: //arxiv.org/abs/2412.10510

  23. [23]

    Retrieval augmented verification for zero-shot detec- tion of multimodal disinformation,

    A. U. Dey, A. Llabr ´es, E. Valveny, and D. Karatzas, “Retrieval augmented verification for zero-shot detec- tion of multimodal disinformation,” arXiv preprint arXiv:2404.10702, 2024

  24. [24]

    Sniffer: Mul- timodal large language model for explainable out-of- context misinformation detection,

    P. Qi, Z. Yan, W. Hsu, and M. L. Lee, “Sniffer: Mul- timodal large language model for explainable out-of- context misinformation detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 13 052–13 062

  25. [25]

    Wikidata: a free col- laborative knowledgebase,

    D. Vrande ˇci´c and M. Kr ¨otzsch, “Wikidata: a free col- laborative knowledgebase,” Communications of the ACM , vol. 57, no. 10, pp. 78–85, 2014

  26. [26]

    Wikichat: Stopping the hallucination of large language model chatbots by few-shot grounding on wikipedia

    S. J. Semnani, V . Z. Yao, H. C. Zhang, and M. S. Lam, “Wikichat: Stopping the hallucination of large language model chatbots by few-shot grounding on wikipedia,” arXiv preprint arXiv:2305.14292 , 2023

  27. [27]

    InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

    W. Dai, J. Li, D. Li, A. M. H. Tiong, J. Zhao, W. Wang, B. Li, P. Fung, and S. Hoi, “Instructblip: Towards general-purpose vision-language models with instruction tuning,” 2023. [Online]. Available: https: //arxiv.org/abs/2305.06500

  28. [28]

    Judging llm-as-a-judge with mt-bench and chatbot arena,

    L. Zheng, W.-L. Chiang, Y . Sheng, S. Zhuang, Z. Wu, Y . Zhuang, Z. Lin, Z. Li, D. Li, E. P. Xing, H. Zhang, J. E. Gonzalez, and I. Stoica, “Judging llm-as-a-judge with mt-bench and chatbot arena,” 2023

  29. [29]

    Llm-consensus: Multi- agent debate for visual misinformation detection,

    K. Lakara, G. Channing, J. Sock, C. Rupprecht, P. Torr, J. Collomosse, and C. S. de Witt, “Llm-consensus: Multi- agent debate for visual misinformation detection,” 2025. [Online]. Available: https://arxiv.org/abs/2410.20140

  30. [30]

    The Llama 3 Herd of Models

    A. Grattafiori, A. Dubey, A. Jauhri, A. Pandey, A. Ka- dian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Vaughan et al., “The llama 3 herd of models,” arXiv preprint arXiv:2407.21783, 2024

  31. [31]

    DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing

    P. He, J. Gao, and W. Chen, “Debertav3: Im- proving deberta using electra-style pre-training with gradient-disentangled embedding sharing,” arXiv preprint arXiv:2111.09543, 2021

  32. [32]

    R., Rocktäschel, T., and Perez, E

    A. Khan, J. Hughes, D. Valentine, L. Ruis, K. Sachan, A. Radhakrishnan, E. Grefenstette, S. R. Bowman, T. Rockt ¨aschel, and E. Perez, “Debating with more persuasive llms leads to more truthful answers,” arXiv preprint arXiv:2402.06782, 2024

  33. [33]

    Super- notes: Driving consensus in crowd-sourced fact-checking,

    S. De, M. A. Bakker, J. Baxter, and M. Saveski, “Super- notes: Driving consensus in crowd-sourced fact-checking,” arXiv preprint arXiv:2411.06116 , 2024

  34. [34]

    Unchecked vs. uncheck- able: How opinion-based claims can impede corrections of misinformation,

    N. Walter and N. A. Salovich, “Unchecked vs. uncheck- able: How opinion-based claims can impede corrections of misinformation,” Mass communication and society , vol. 24, no. 4, pp. 500–526, 2021

  35. [35]

    Google vision api: Detect web entities and pages,

    Google, “Google vision api: Detect web entities and pages,” https://cloud.google.com/vision/docs/ detecting-web

  36. [36]

    Programmable search engine,

    ——, “Programmable search engine,” https://developers. google.com/custom-search/v1/overview

  37. [37]

    Verifying online information,

    S. Urbani, “Verifying online information,” Es- sential Guides , 2020, published in 2022. [Online]. Available: https://firstdraftnews.org/articles/ verifying-online-information-the-absolute-essentials/

  38. [38]

    ” image, tell me your story!

    J. Tonglet, M.-F. Moens, and I. Gurevych, “” image, tell me your story!” predicting the original meta-context of visual misinformation,” arXiv preprint arXiv:2408.09939 , 2024

  39. [39]

    Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

    N. Reimers and I. Gurevych, “Sentence-bert: Sentence embeddings using siamese bert-networks,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing . Association for Computational Linguistics, 11 2019. [Online]. Available: https://arxiv.org/abs/1908.10084

  40. [40]

    Joint face detec- tion and alignment using multitask cascaded convolutional networks,

    K. Zhang, Z. Zhang, Z. Li, and Y . Qiao, “Joint face detec- tion and alignment using multitask cascaded convolutional networks,” IEEE signal processing letters, vol. 23, no. 10, pp. 1499–1503, 2016

  41. [41]

    Facenet: A unified embedding for face recognition and clustering,

    F. Schroff, D. Kalenichenko, and J. Philbin, “Facenet: A unified embedding for face recognition and clustering,” in CVPR, 2015

  42. [42]

    Places: A 10 million image database for scene recognition,

    B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Tor- ralba, “Places: A 10 million image database for scene recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017

  43. [43]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,”arXiv preprint arXiv:2010.11929, 2020

  44. [44]

    Improved Baselines with Visual Instruction Tuning

    H. Liu, C. Li, Y . Li, and Y . J. Lee, “Improved baselines with visual instruction tuning,” 2024. [Online]. Available: https://arxiv.org/abs/2310.03744

  45. [45]

    Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,

    P. J. Rousseeuw, “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,” Journal of computational and applied mathematics , vol. 20, pp. 53–65, 1987

  46. [46]

    A cluster separation measure,

    D. L. Davies and D. W. Bouldin, “A cluster separation measure,” IEEE transactions on pattern analysis and machine intelligence, no. 2, pp. 224–227, 1979