Fact-Checking with Contextual Narratives: Leveraging Retrieval-Augmented LLMs for Social Media Analysis
Pith reviewed 2026-05-22 21:03 UTC · model grok-4.3
The pith
CRAVE retrieves multimodal evidence from diverse sources, clusters it into coherent narratives, and uses an LLM judge to deliver explained fact-checking verdicts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CRAVE automatically retrieves multimodal evidence from diverse, often contradictory, sources, clusters evidence into coherent narratives, and uses an LLM-based judge to deliver fact-checking verdicts explained by evidence summaries, with experiments demonstrating efficacy in retrieval precision, clustering quality, and judgment accuracy.
What carries the argument
The CRAVE framework, which combines retrieval-augmented LLMs with clustering to form coherent narratives from multimodal evidence for verification.
If this is right
- Evidence clustering improves consistency when sources contradict one another.
- LLM judgments gain explainability through attached narrative summaries.
- Multimodal retrieval and refinement steps increase overall precision and quality.
- The system serves as a practical decision-support aid for human fact-checkers.
Where Pith is reading between the lines
- The narrative-clustering step might help reduce unsupported outputs from LLMs in other verification tasks.
- The framework could scale to monitor live social media streams for emerging claims.
- Extending the same retrieval-plus-clustering pattern to video or audio evidence would test its limits on additional modalities.
Load-bearing premise
Clustering retrieved evidence into coherent narratives combined with an LLM-based judge will produce accurate, consistent fact-checking verdicts even when sources are contradictory and multimodal.
What would settle it
Run CRAVE on a labeled dataset of social media claims that include deliberately contradictory text and image evidence, then measure whether the LLM judge's verdicts match independent human fact-checker ground truth at high accuracy.
Figures
read the original abstract
We propose CRAVE (Cluster-based Retrieval Augmented Verification with Explanation); a novel framework that integrates retrieval-augmented Large Language Models (LLMs) with clustering techniques to address fact-checking challenges on social media. CRAVE automatically retrieves multimodal evidence from diverse, often contradictory, sources. Evidence is clustered into coherent narratives, and evaluated via an LLM-based judge to deliver fact-checking verdicts explained by evidence summaries. By synthesizing evidence from both text and image modalities and incorporating agent-based refinement, CRAVE ensures consistency and diversity in evidence representation. Comprehensive experiments demonstrate CRAVE's efficacy in retrieval precision, clustering quality, and judgment accuracy, showcasing its potential as a robust decision-support tool for fact-checkers.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes CRAVE (Cluster-based Retrieval Augmented Verification with Explanation), a framework integrating retrieval-augmented LLMs with clustering for fact-checking social media claims. It retrieves multimodal evidence from diverse and often contradictory sources, clusters the evidence into coherent narratives, and uses an LLM-based judge to produce fact-checking verdicts accompanied by evidence summaries. The approach includes agent-based refinement to promote consistency and diversity across text and image modalities. The central claim is that this pipeline yields accurate, consistent verdicts, with comprehensive experiments demonstrating efficacy in retrieval precision, clustering quality, and judgment accuracy.
Significance. If the empirical results hold, the work could offer a practical decision-support system for fact-checkers by synthesizing contradictory multimodal evidence into narrative clusters rather than isolated snippets. The combination of RAG, clustering, and an LLM judge is a natural extension of existing retrieval and reasoning pipelines, with potential value in the emphasis on explanations. However, the significance is difficult to evaluate because the abstract asserts experimental support without specifying datasets, baselines, metrics, or controls, leaving the strength of the evidence unclear. No machine-checked proofs or parameter-free derivations are present.
major comments (1)
- [Abstract] Abstract: The claim that 'comprehensive experiments demonstrate CRAVE's efficacy in retrieval precision, clustering quality, and judgment accuracy' is load-bearing for the central contribution, yet the abstract (and the provided manuscript description) supplies no information on the datasets, baselines, evaluation metrics, number of trials, or controls used. Without these details, the data support for the efficacy claims cannot be assessed.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the abstract. We address the single major comment below.
read point-by-point responses
-
Referee: [Abstract] Abstract: The claim that 'comprehensive experiments demonstrate CRAVE's efficacy in retrieval precision, clustering quality, and judgment accuracy' is load-bearing for the central contribution, yet the abstract (and the provided manuscript description) supplies no information on the datasets, baselines, evaluation metrics, number of trials, or controls used. Without these details, the data support for the efficacy claims cannot be assessed.
Authors: We agree that the abstract would be strengthened by briefly indicating the experimental details that support the efficacy claims. The full manuscript already contains dedicated sections on the evaluation (including the specific social-media fact-checking datasets, baselines such as standard RAG pipelines and LLM-only judges, metrics for retrieval precision, clustering quality, and verdict accuracy, as well as the number of runs and controls). In the revised version we will condense the key elements of this setup into the abstract while preserving its length and readability. revision: yes
Circularity Check
No significant circularity
full rationale
The paper proposes CRAVE, a framework combining retrieval-augmented LLMs with clustering and an LLM judge for multimodal fact-checking. No derivation chain, equations, or self-definitional steps are present that reduce predictions or results to inputs by construction. Claims of efficacy rest on separate experiments for retrieval precision, clustering quality, and judgment accuracy rather than any fitted parameter renamed as prediction or self-citation load-bearing premise. The framework is presented as an empirical proposal without internal reductions to its own definitions or prior author work invoked as uniqueness theorems.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
CRAVE automatically retrieves multimodal evidence... clusters evidence into coherent narratives, and uses an LLM-based judge to deliver fact-checking verdicts
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
K-means clustering... K=4... narrative assessment with 5W1H
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
W. Shahid, B. Jamshidi, S. Hakak, H. Isah, W. Z. Khan, M. K. Khan, and K.-K. R. Choo, “Detecting and mitigating the dissemination of fake news: Challenges and future research opportunities,” IEEE Transactions on Computational Social Systems , vol. 11, no. 4, pp. 4649– 4662, 2022
work page 2022
-
[2]
The spread of true and false news online,
S. V osoughi, D. Roy, and S. Aral, “The spread of true and false news online,” science, vol. 359, no. 6380, pp. 1146–1151, 2018
work page 2018
-
[3]
Fake news, disinformation and misinformation in social media: a review,
E. A ¨ımeur, S. Amri, and G. Brassard, “Fake news, disinformation and misinformation in social media: a review,” Social Network Analysis and Mining , vol. 13, no. 1, p. 30, 2023
work page 2023
-
[4]
N. Kshetri and J. V oas, “The economics of “fake news”,” IT Professional, vol. 19, no. 6, pp. 8–12, 2017
work page 2017
-
[5]
The impact of misinformation on the covid-19 pandemic,
M. M. F. Caceres, J. P. Sosa, J. A. Lawrence, C. Sestacov- schi, A. Tidd-Johnson, M. H. U. Rasool, V . K. Gadamidi, S. Ozair, K. Pandav, C. Cuevas-Lou et al., “The impact of misinformation on the covid-19 pandemic,” AIMS public health, vol. 9, no. 2, p. 262, 2022
work page 2022
-
[6]
Social media and fake news in the 2016 election,
H. Allcott and M. Gentzkow, “Social media and fake news in the 2016 election,” Journal of economic perspectives , vol. 31, no. 2, pp. 211–236, 2017
work page 2016
-
[7]
Red-dot: Multimodal fact-checking via relevant evidence detection,
S.-I. Papadopoulos, C. Koutlis, S. Papadopoulos, and P. C. Petrantonakis, “Red-dot: Multimodal fact-checking via relevant evidence detection,” 2024. [Online]. Available: https://arxiv.org/abs/2311.09939
-
[8]
——, “Similarity over factuality: Are we making progress on multimodal out-of-context misinformation detection?” arXiv preprint arXiv:2407.13488 , 2024
-
[9]
Verite: a robust benchmark for multimodal mis- information detection accounting for unimodal bias,
——, “Verite: a robust benchmark for multimodal mis- information detection accounting for unimodal bias,” International Journal of Multimedia Information Retrieval, vol. 13, no. 1, p. 4, 2024
work page 2024
-
[10]
Newsclippings: Automatic generation of out-of-context multimodal media,
G. Luo, T. Darrell, and A. Rohrbach, “Newsclippings: Automatic generation of out-of-context multimodal media,” arXiv preprint arXiv:2104.05893 , 2021
-
[11]
S. Abdelnabi, R. Hasan, and M. Fritz, “Open-domain, content-based, multi-modal fact-checking of out-of- context images via online resources,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 14 940–14 949
work page 2022
-
[12]
Cove: Context and veracity prediction for out-of-context images,
J. Tonglet, G. Thiem, and I. Gurevych, “Cove: Context and veracity prediction for out-of-context images,” arXiv preprint arXiv:2502.01194, 2025
-
[13]
Y . Gong, L. Shang, and D. Wang, “Integrating social explanations into explainable artificial intelligence (xai) for combating misinformation: Vision and challenges,” IEEE Transactions on Computational Social Systems , vol. 11, no. 5, pp. 6705–6726, 2024
work page 2024
-
[14]
Retrieval-augmented generation for knowledge- intensive nlp tasks,
P. Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. K ¨uttler, M. Lewis, W.-t. Yih, T. Rockt¨aschel et al. , “Retrieval-augmented generation for knowledge- intensive nlp tasks,” Advances in neural information processing systems, vol. 33, pp. 9459–9474, 2020
work page 2020
-
[15]
Rag-fusion based information retrieval for fact-checking,
Y . Momii, T. Takiguchi, and Y . Ariki, “Rag-fusion based information retrieval for fact-checking,” in Proceedings of the Seventh Fact Extraction and VERification Workshop (FEVER), 2024, pp. 47–54
work page 2024
-
[16]
Believe in artificial intelligence? a user study on the chatgpt’s fake information impact,
I. Amaro, P. Barra, A. Della Greca, R. Francese, and C. Tucci, “Believe in artificial intelligence? a user study on the chatgpt’s fake information impact,”IEEE Transactions on Computational Social Systems , 2023
work page 2023
-
[17]
Understanding the promise and limits of automated fact-checking,
D. Graves, “Understanding the promise and limits of automated fact-checking,” Reuters Institute for the Study of Journalism, 2018
work page 2018
-
[18]
Mmfakebench: A mixed-source mul- timodal misinformation detection benchmark for lvlms,
X. Liu, Z. Li, P. Li, S. Xia, X. Cui, L. Huang, H. Huang, W. Deng, and Z. He, “Mmfakebench: A mixed-source mul- timodal misinformation detection benchmark for lvlms,” arXiv preprint arXiv:2406.08772 , 2024
-
[19]
Detecting and grounding multi-modal media manipulation,
R. Shao, T. Wu, and Z. Liu, “Detecting and grounding multi-modal media manipulation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 6904–6913
work page 2023
-
[20]
Cosmos: Catch- ing out-of-context misinformation with self-supervised learning,
S. Aneja, C. Bregler, and M. Nießner, “Cosmos: Catch- ing out-of-context misinformation with self-supervised learning,” arXiv preprint arXiv:2101.06278 , 2021
-
[21]
Amir: An automated misinformation rebuttal system–a covid-19 vaccination datasets-based exposition,
S. Sharma, A. Datta, and R. Sharma, “Amir: An automated misinformation rebuttal system–a covid-19 vaccination datasets-based exposition,” IEEE Transactions on Com- putational Social Systems , 2024
work page 2024
-
[22]
Defame: Dynamic evidence-based fact-checking with multimodal experts,
T. Braun, M. Rothermel, M. Rohrbach, and A. Rohrbach, “Defame: Dynamic evidence-based fact-checking with multimodal experts,” 2025. [Online]. Available: https: //arxiv.org/abs/2412.10510
-
[23]
Retrieval augmented verification for zero-shot detec- tion of multimodal disinformation,
A. U. Dey, A. Llabr ´es, E. Valveny, and D. Karatzas, “Retrieval augmented verification for zero-shot detec- tion of multimodal disinformation,” arXiv preprint arXiv:2404.10702, 2024
-
[24]
Sniffer: Mul- timodal large language model for explainable out-of- context misinformation detection,
P. Qi, Z. Yan, W. Hsu, and M. L. Lee, “Sniffer: Mul- timodal large language model for explainable out-of- context misinformation detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 13 052–13 062
work page 2024
-
[25]
Wikidata: a free col- laborative knowledgebase,
D. Vrande ˇci´c and M. Kr ¨otzsch, “Wikidata: a free col- laborative knowledgebase,” Communications of the ACM , vol. 57, no. 10, pp. 78–85, 2014
work page 2014
-
[26]
S. J. Semnani, V . Z. Yao, H. C. Zhang, and M. S. Lam, “Wikichat: Stopping the hallucination of large language model chatbots by few-shot grounding on wikipedia,” arXiv preprint arXiv:2305.14292 , 2023
-
[27]
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
W. Dai, J. Li, D. Li, A. M. H. Tiong, J. Zhao, W. Wang, B. Li, P. Fung, and S. Hoi, “Instructblip: Towards general-purpose vision-language models with instruction tuning,” 2023. [Online]. Available: https: //arxiv.org/abs/2305.06500
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[28]
Judging llm-as-a-judge with mt-bench and chatbot arena,
L. Zheng, W.-L. Chiang, Y . Sheng, S. Zhuang, Z. Wu, Y . Zhuang, Z. Lin, Z. Li, D. Li, E. P. Xing, H. Zhang, J. E. Gonzalez, and I. Stoica, “Judging llm-as-a-judge with mt-bench and chatbot arena,” 2023
work page 2023
-
[29]
Llm-consensus: Multi- agent debate for visual misinformation detection,
K. Lakara, G. Channing, J. Sock, C. Rupprecht, P. Torr, J. Collomosse, and C. S. de Witt, “Llm-consensus: Multi- agent debate for visual misinformation detection,” 2025. [Online]. Available: https://arxiv.org/abs/2410.20140
-
[30]
A. Grattafiori, A. Dubey, A. Jauhri, A. Pandey, A. Ka- dian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Vaughan et al., “The llama 3 herd of models,” arXiv preprint arXiv:2407.21783, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[31]
P. He, J. Gao, and W. Chen, “Debertav3: Im- proving deberta using electra-style pre-training with gradient-disentangled embedding sharing,” arXiv preprint arXiv:2111.09543, 2021
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[32]
R., Rocktäschel, T., and Perez, E
A. Khan, J. Hughes, D. Valentine, L. Ruis, K. Sachan, A. Radhakrishnan, E. Grefenstette, S. R. Bowman, T. Rockt ¨aschel, and E. Perez, “Debating with more persuasive llms leads to more truthful answers,” arXiv preprint arXiv:2402.06782, 2024
-
[33]
Super- notes: Driving consensus in crowd-sourced fact-checking,
S. De, M. A. Bakker, J. Baxter, and M. Saveski, “Super- notes: Driving consensus in crowd-sourced fact-checking,” arXiv preprint arXiv:2411.06116 , 2024
-
[34]
Unchecked vs. uncheck- able: How opinion-based claims can impede corrections of misinformation,
N. Walter and N. A. Salovich, “Unchecked vs. uncheck- able: How opinion-based claims can impede corrections of misinformation,” Mass communication and society , vol. 24, no. 4, pp. 500–526, 2021
work page 2021
-
[35]
Google vision api: Detect web entities and pages,
Google, “Google vision api: Detect web entities and pages,” https://cloud.google.com/vision/docs/ detecting-web
-
[36]
——, “Programmable search engine,” https://developers. google.com/custom-search/v1/overview
-
[37]
S. Urbani, “Verifying online information,” Es- sential Guides , 2020, published in 2022. [Online]. Available: https://firstdraftnews.org/articles/ verifying-online-information-the-absolute-essentials/
work page 2020
-
[38]
J. Tonglet, M.-F. Moens, and I. Gurevych, “” image, tell me your story!” predicting the original meta-context of visual misinformation,” arXiv preprint arXiv:2408.09939 , 2024
-
[39]
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
N. Reimers and I. Gurevych, “Sentence-bert: Sentence embeddings using siamese bert-networks,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing . Association for Computational Linguistics, 11 2019. [Online]. Available: https://arxiv.org/abs/1908.10084
work page internal anchor Pith review Pith/arXiv arXiv 2019
-
[40]
Joint face detec- tion and alignment using multitask cascaded convolutional networks,
K. Zhang, Z. Zhang, Z. Li, and Y . Qiao, “Joint face detec- tion and alignment using multitask cascaded convolutional networks,” IEEE signal processing letters, vol. 23, no. 10, pp. 1499–1503, 2016
work page 2016
-
[41]
Facenet: A unified embedding for face recognition and clustering,
F. Schroff, D. Kalenichenko, and J. Philbin, “Facenet: A unified embedding for face recognition and clustering,” in CVPR, 2015
work page 2015
-
[42]
Places: A 10 million image database for scene recognition,
B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Tor- ralba, “Places: A 10 million image database for scene recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017
work page 2017
-
[43]
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,”arXiv preprint arXiv:2010.11929, 2020
work page internal anchor Pith review Pith/arXiv arXiv 2010
-
[44]
Improved Baselines with Visual Instruction Tuning
H. Liu, C. Li, Y . Li, and Y . J. Lee, “Improved baselines with visual instruction tuning,” 2024. [Online]. Available: https://arxiv.org/abs/2310.03744
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[45]
Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,
P. J. Rousseeuw, “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,” Journal of computational and applied mathematics , vol. 20, pp. 53–65, 1987
work page 1987
-
[46]
D. L. Davies and D. W. Bouldin, “A cluster separation measure,” IEEE transactions on pattern analysis and machine intelligence, no. 2, pp. 224–227, 1979
work page 1979
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.