Fact-Checking with Contextual Narratives: Leveraging Retrieval-Augmented LLMs for Social Media Analysis

Arka Ujjal Dey; Christian Schroeder de Witt; Georgia Channing; John Collomosse; Muhammad Junaid Awan

arXiv: 2504.10166 · v2 · submitted 2025-04-14 · cs.MM


Pith reviewed 2026-05-22 21:03 UTC · model grok-4.3

classification cs.MM

keywords fact-checking · retrieval-augmented generation · clustering · large language models · multimodal evidence · social media · narrative synthesis


The pith

CRAVE retrieves multimodal evidence from diverse sources, clusters it into coherent narratives, and uses an LLM judge to deliver explained fact-checking verdicts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CRAVE, a framework that pulls evidence from text and image sources across social media, groups that evidence into consistent narratives despite contradictions, and lets an LLM reach a verdict backed by evidence summaries. This setup aims to give fact-checkers a structured way to synthesize conflicting multimodal inputs rather than relying on unorganized retrieval or single-pass model judgments. If the approach holds, it would allow more reliable and transparent decisions on rapidly spreading claims by turning raw data into grouped stories that support clear conclusions.

Core claim

CRAVE automatically retrieves multimodal evidence from diverse, often contradictory, sources, clusters evidence into coherent narratives, and uses an LLM-based judge to deliver fact-checking verdicts explained by evidence summaries, with experiments demonstrating efficacy in retrieval precision, clustering quality, and judgment accuracy.

What carries the argument

The CRAVE framework, which combines retrieval-augmented LLMs with clustering to form coherent narratives from multimodal evidence for verification.
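
As a rough illustration of the retrieve, cluster, judge flow described above, the sketch below wires the three stages together on a few text snippets. It is not the authors' implementation: the `Evidence` type, the Jaccard word-overlap similarity, the greedy threshold clustering, the 0.3 threshold, and the overlap-based `judge` are all hypothetical stand-ins for the retrieval, clustering, and LLM-based judge named in the paper, and the sample snippets are invented.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    source: str  # hypothetical field: where the snippet came from
    text: str    # textual evidence only; image evidence is omitted in this toy


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]; a stand-in for embedding similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def cluster(evidence: list[Evidence], threshold: float = 0.3) -> list[list[Evidence]]:
    """Greedy clustering: an item joins the first cluster whose seed is similar enough."""
    clusters: list[list[Evidence]] = []
    for item in evidence:
        for group in clusters:
            if jaccard(item.text, group[0].text) >= threshold:
                group.append(item)
                break
        else:
            clusters.append([item])  # no similar cluster found: start a new narrative
    return clusters


def judge(claim: str, narratives: list[list[Evidence]]) -> dict:
    """Placeholder for an LLM judge: takes the largest narrative and checks
    whether any of its snippets overlaps with the claim."""
    best = max(narratives, key=len)
    support = max(jaccard(claim, e.text) for e in best)
    return {
        "verdict": "supported" if support >= 0.3 else "unverified",
        "summary": [e.text for e in best],  # evidence shown alongside the verdict
    }


# Invented toy evidence, standing in for retrieved search results.
evidence = [
    Evidence("news-a", "the bridge collapsed in 2019 after heavy floods"),
    Evidence("news-b", "heavy floods in 2019 collapsed the bridge"),
    Evidence("blog-c", "photo shows a film set built last year"),
]
narratives = cluster(evidence)
result = judge("the bridge collapsed after floods in 2019", narratives)
print(len(narratives), result["verdict"])
```

The point of the shape, rather than of any particular function, is that the judge sees grouped narratives instead of a flat list of snippets, and returns the supporting evidence together with the verdict.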

If this is right

Evidence clustering improves consistency when sources contradict one another.
LLM judgments gain explainability through attached narrative summaries.
Multimodal retrieval and refinement steps increase overall precision and quality.
The system serves as a practical decision-support aid for human fact-checkers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The narrative-clustering step might help reduce unsupported outputs from LLMs in other verification tasks.
The framework could scale to monitor live social media streams for emerging claims.
Extending the same retrieval-plus-clustering pattern to video or audio evidence would test its limits on additional modalities.

Load-bearing premise

Clustering retrieved evidence into coherent narratives combined with an LLM-based judge will produce accurate, consistent fact-checking verdicts even when sources are contradictory and multimodal.

What would settle it

Run CRAVE on a labeled dataset of social media claims that include deliberately contradictory text and image evidence, then measure whether the LLM judge's verdicts match independent human fact-checker ground truth at high accuracy.
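
The comparison described above reduces to scoring predicted verdicts against human labels. A minimal sketch, assuming a hypothetical list of (system verdict, human label) pairs and the 'True'/'Fake' label names that appear in the Figure 5 caption; the sample pairs are invented for illustration and are not results from the paper.

```python
from collections import Counter


def evaluate(pairs: list[tuple[str, str]]) -> dict:
    """Return overall accuracy and confusion counts keyed by (truth, predicted)."""
    if not pairs:
        raise ValueError("need at least one (prediction, truth) pair")
    confusion = Counter((truth, pred) for pred, truth in pairs)
    correct = sum(n for (truth, pred), n in confusion.items() if truth == pred)
    return {"accuracy": correct / len(pairs), "confusion": dict(confusion)}


# Invented toy data: (system verdict, human fact-checker label).
pairs = [("True", "True"), ("Fake", "Fake"), ("Fake", "True"), ("Fake", "Fake")]
report = evaluate(pairs)
print(report["accuracy"])   # 0.75
print(report["confusion"])  # includes one ("True", "Fake") error
```

Keeping the confusion counts next to the accuracy figure matters here because the two error types (a true claim judged fake, a fake claim judged true) carry different costs for a fact-checker.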

Figures

Figures reproduced from arXiv: 2504.10166 by Arka Ujjal Dey, Christian Schroeder de Witt, Georgia Channing, John Collomosse, Muhammad Junaid Awan.

**Figure 2.** Overview of CRAVE: Given a claim (image + text), contained in a social media post, we retrieve evidence from reverse …

**Figure 3.** Reasoning protocol used by the LLM (GPT 4o) to assess …

**Figure 4.** Dynamic Clustering performance varies with dataset complexity, as higher similarity thresholds fragment evidence into …

**Figure 5.** Misclassified samples are color-coded: orange for those labeled ‘True’, blue for those labeled ‘Fake’. A sample with a …
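
The Figure 4 caption notes that higher similarity thresholds fragment evidence. A toy sketch of that effect, assuming hypothetical 2-D vectors in place of real evidence embeddings and a simple greedy threshold clustering (not necessarily the paper's dynamic clustering procedure):

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def count_clusters(vectors, threshold):
    """Greedy pass: a vector starts a new cluster unless it is at least
    `threshold`-similar to an existing cluster seed."""
    seeds = []
    for v in vectors:
        if not any(cosine(v, s) >= threshold for s in seeds):
            seeds.append(v)
    return len(seeds)


# Invented vectors: three lean toward one axis, two toward the other.
vectors = [(1.0, 0.0), (0.9, 0.1), (0.7, 0.3), (0.0, 1.0), (0.1, 0.9)]
for t in (0.5, 0.95, 0.999):
    print(t, count_clusters(vectors, t))  # cluster count grows: 2, 3, 5
```

With a loose threshold the five vectors collapse into two groups; at the strictest setting every vector becomes its own cluster, which is the fragmentation the caption describes.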

Original abstract

We propose CRAVE (Cluster-based Retrieval Augmented Verification with Explanation), a novel framework that integrates retrieval-augmented Large Language Models (LLMs) with clustering techniques to address fact-checking challenges on social media. CRAVE automatically retrieves multimodal evidence from diverse, often contradictory, sources. Evidence is clustered into coherent narratives, and evaluated via an LLM-based judge to deliver fact-checking verdicts explained by evidence summaries. By synthesizing evidence from both text and image modalities and incorporating agent-based refinement, CRAVE ensures consistency and diversity in evidence representation. Comprehensive experiments demonstrate CRAVE's efficacy in retrieval precision, clustering quality, and judgment accuracy, showcasing its potential as a robust decision-support tool for fact-checkers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CRAVE bundles standard RAG, clustering, and an LLM judge into a fact-checking pipeline, but the abstract gives no experimental details so the claims cannot be checked.


The paper's core idea is to retrieve multimodal evidence from social media, cluster it into narratives to handle contradictions, then let an LLM produce a verdict plus explanation. That specific combination for this use case looks new on the surface, though each piece is familiar from prior RAG and clustering work. The motivation is practical: fact-checkers need help organizing messy, conflicting sources, and the narrative step is a reasonable way to try to impose some order before judgment. The agent-based refinement for consistency and diversity is a small but sensible addition to the basic pipeline. Credit is due for framing the output as evidence summaries rather than just a binary label.

The main weakness is that the abstract asserts comprehensive experiments on retrieval precision, clustering quality, and judgment accuracy without naming datasets, baselines, metrics, or any numbers. That leaves the central claim (that clustering plus the LLM judge produces accurate verdicts on contradictory multimodal input) unsupported in what is visible. The assumption that coherent narratives will emerge reliably from noisy sources is plausible but remains an open empirical question rather than a demonstrated result.

This is the kind of paper that could interest people building decision-support tools for misinformation work, especially if the full version includes reproducible experiments and comparisons. A reader already working on RAG pipelines might pick up the clustering-for-narratives trick. It is coherent enough on its own terms to deserve a serious referee who can check the actual methods and results sections, though the current description is too thin to judge soundness.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes CRAVE (Cluster-based Retrieval Augmented Verification with Explanation), a framework integrating retrieval-augmented LLMs with clustering for fact-checking social media claims. It retrieves multimodal evidence from diverse and often contradictory sources, clusters the evidence into coherent narratives, and uses an LLM-based judge to produce fact-checking verdicts accompanied by evidence summaries. The approach includes agent-based refinement to promote consistency and diversity across text and image modalities. The central claim is that this pipeline yields accurate, consistent verdicts, with comprehensive experiments demonstrating efficacy in retrieval precision, clustering quality, and judgment accuracy.

Significance. If the empirical results hold, the work could offer a practical decision-support system for fact-checkers by synthesizing contradictory multimodal evidence into narrative clusters rather than isolated snippets. The combination of RAG, clustering, and an LLM judge is a natural extension of existing retrieval and reasoning pipelines, with potential value in the emphasis on explanations. However, the significance is difficult to evaluate because the abstract asserts experimental support without specifying datasets, baselines, metrics, or controls, leaving the strength of the evidence unclear. No machine-checked proofs or parameter-free derivations are present.

major comments (1)

[Abstract] The claim that 'comprehensive experiments demonstrate CRAVE's efficacy in retrieval precision, clustering quality, and judgment accuracy' is load-bearing for the central contribution, yet the abstract (and the provided manuscript description) supplies no information on the datasets, baselines, evaluation metrics, number of trials, or controls used. Without these details, the data support for the efficacy claims cannot be assessed.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We address the single major comment below.


Referee: [Abstract] The claim that 'comprehensive experiments demonstrate CRAVE's efficacy in retrieval precision, clustering quality, and judgment accuracy' is load-bearing for the central contribution, yet the abstract (and the provided manuscript description) supplies no information on the datasets, baselines, evaluation metrics, number of trials, or controls used. Without these details, the data support for the efficacy claims cannot be assessed.

Authors: We agree that the abstract would be strengthened by briefly indicating the experimental details that support the efficacy claims. The full manuscript already contains dedicated sections on the evaluation (including the specific social-media fact-checking datasets, baselines such as standard RAG pipelines and LLM-only judges, metrics for retrieval precision, clustering quality, and verdict accuracy, as well as the number of runs and controls). In the revised version we will condense the key elements of this setup into the abstract while preserving its length and readability.

revision: yes

Circularity Check

0 steps flagged

No significant circularity


The paper proposes CRAVE, a framework combining retrieval-augmented LLMs with clustering and an LLM judge for multimodal fact-checking. No derivation chain, equations, or self-definitional steps are present that reduce predictions or results to inputs by construction. Claims of efficacy rest on separate experiments for retrieval precision, clustering quality, and judgment accuracy, not on a fitted parameter renamed as a prediction or on a load-bearing premise supported only by self-citation. The framework is presented as an empirical proposal, with no internal reduction to its own definitions and no prior work by the authors invoked as a uniqueness theorem.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no specific free parameters, axioms, or invented entities are identifiable or detailed in the provided text.

pith-pipeline@v0.9.0 · 5661 in / 1054 out tokens · 34718 ms · 2026-05-22T21:03:52.298789+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: CRAVE automatically retrieves multimodal evidence... clusters evidence into coherent narratives, and uses an LLM-based judge to deliver fact-checking verdicts

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: K-means clustering... K=4... narrative assessment with 5W1H

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 6 internal anchors

[1]

Detecting and mitigating the dissemination of fake news: Challenges and future research opportunities,

W. Shahid, B. Jamshidi, S. Hakak, H. Isah, W. Z. Khan, M. K. Khan, and K.-K. R. Choo, “Detecting and mitigating the dissemination of fake news: Challenges and future research opportunities,” IEEE Transactions on Computational Social Systems, vol. 11, no. 4, pp. 4649–4662, 2022

work page 2022
[2]

The spread of true and false news online,

S. Vosoughi, D. Roy, and S. Aral, “The spread of true and false news online,” Science, vol. 359, no. 6380, pp. 1146–1151, 2018

work page 2018
[3]

Fake news, disinformation and misinformation in social media: a review,

E. Aïmeur, S. Amri, and G. Brassard, “Fake news, disinformation and misinformation in social media: a review,” Social Network Analysis and Mining, vol. 13, no. 1, p. 30, 2023

work page 2023
[4]

The economics of “fake news”

N. Kshetri and J. Voas, “The economics of “fake news”,” IT Professional, vol. 19, no. 6, pp. 8–12, 2017

work page 2017
[5]

The impact of misinformation on the covid-19 pandemic,

M. M. F. Caceres, J. P. Sosa, J. A. Lawrence, C. Sestacovschi, A. Tidd-Johnson, M. H. U. Rasool, V. K. Gadamidi, S. Ozair, K. Pandav, C. Cuevas-Lou et al., “The impact of misinformation on the covid-19 pandemic,” AIMS public health, vol. 9, no. 2, p. 262, 2022

work page 2022
[6]

Social media and fake news in the 2016 election,

H. Allcott and M. Gentzkow, “Social media and fake news in the 2016 election,” Journal of economic perspectives, vol. 31, no. 2, pp. 211–236, 2017

work page 2017
[7]

Red-dot: Multimodal fact-checking via relevant evidence detection,

S.-I. Papadopoulos, C. Koutlis, S. Papadopoulos, and P. C. Petrantonakis, “Red-dot: Multimodal fact-checking via relevant evidence detection,” 2024. [Online]. Available: https://arxiv.org/abs/2311.09939

work page arXiv 2024
[8]

Similarity over factuality: Are we making progress on multimodal out-of-context misinformation detection?

S.-I. Papadopoulos, C. Koutlis, S. Papadopoulos, and P. C. Petrantonakis, “Similarity over factuality: Are we making progress on multimodal out-of-context misinformation detection?” arXiv preprint arXiv:2407.13488, 2024

work page arXiv 2024
[9]

Verite: a robust benchmark for multimodal misinformation detection accounting for unimodal bias,

S.-I. Papadopoulos, C. Koutlis, S. Papadopoulos, and P. C. Petrantonakis, “Verite: a robust benchmark for multimodal misinformation detection accounting for unimodal bias,” International Journal of Multimedia Information Retrieval, vol. 13, no. 1, p. 4, 2024

work page 2024
[10]

Newsclippings: Automatic generation of out-of-context multimodal media,

G. Luo, T. Darrell, and A. Rohrbach, “Newsclippings: Automatic generation of out-of-context multimodal media,” arXiv preprint arXiv:2104.05893, 2021

work page arXiv 2021
[11]

Open-domain, content-based, multi-modal fact-checking of out-of-context images via online resources,

S. Abdelnabi, R. Hasan, and M. Fritz, “Open-domain, content-based, multi-modal fact-checking of out-of-context images via online resources,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 14940–14949

work page 2022
[12]

Cove: Context and veracity prediction for out-of-context images,

J. Tonglet, G. Thiem, and I. Gurevych, “Cove: Context and veracity prediction for out-of-context images,” arXiv preprint arXiv:2502.01194, 2025

work page arXiv 2025
[13]

Integrating social explanations into explainable artificial intelligence (xai) for combating misinformation: Vision and challenges,

Y. Gong, L. Shang, and D. Wang, “Integrating social explanations into explainable artificial intelligence (xai) for combating misinformation: Vision and challenges,” IEEE Transactions on Computational Social Systems, vol. 11, no. 5, pp. 6705–6726, 2024

work page 2024
[14]

Retrieval-augmented generation for knowledge-intensive nlp tasks,

P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel et al., “Retrieval-augmented generation for knowledge-intensive nlp tasks,” Advances in neural information processing systems, vol. 33, pp. 9459–9474, 2020

work page 2020
[15]

Rag-fusion based information retrieval for fact-checking,

Y. Momii, T. Takiguchi, and Y. Ariki, “Rag-fusion based information retrieval for fact-checking,” in Proceedings of the Seventh Fact Extraction and VERification Workshop (FEVER), 2024, pp. 47–54

work page 2024
[16]

Believe in artificial intelligence? a user study on the chatgpt’s fake information impact,

I. Amaro, P. Barra, A. Della Greca, R. Francese, and C. Tucci, “Believe in artificial intelligence? a user study on the chatgpt’s fake information impact,” IEEE Transactions on Computational Social Systems, 2023

work page 2023
[17]

Understanding the promise and limits of automated fact-checking,

D. Graves, “Understanding the promise and limits of automated fact-checking,” Reuters Institute for the Study of Journalism, 2018

work page 2018
[18]

Mmfakebench: A mixed-source multimodal misinformation detection benchmark for lvlms,

X. Liu, Z. Li, P. Li, S. Xia, X. Cui, L. Huang, H. Huang, W. Deng, and Z. He, “Mmfakebench: A mixed-source multimodal misinformation detection benchmark for lvlms,” arXiv preprint arXiv:2406.08772, 2024

work page arXiv 2024
[19]

Detecting and grounding multi-modal media manipulation,

R. Shao, T. Wu, and Z. Liu, “Detecting and grounding multi-modal media manipulation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 6904–6913

work page 2023
[20]

Cosmos: Catching out-of-context misinformation with self-supervised learning,

S. Aneja, C. Bregler, and M. Nießner, “Cosmos: Catching out-of-context misinformation with self-supervised learning,” arXiv preprint arXiv:2101.06278, 2021

work page arXiv 2021
[21]

Amir: An automated misinformation rebuttal system–a covid-19 vaccination datasets-based exposition,

S. Sharma, A. Datta, and R. Sharma, “Amir: An automated misinformation rebuttal system–a covid-19 vaccination datasets-based exposition,” IEEE Transactions on Computational Social Systems, 2024

work page 2024
[22]

Defame: Dynamic evidence-based fact-checking with multimodal experts,

T. Braun, M. Rothermel, M. Rohrbach, and A. Rohrbach, “Defame: Dynamic evidence-based fact-checking with multimodal experts,” 2025. [Online]. Available: https://arxiv.org/abs/2412.10510

work page arXiv 2025
[23]

Retrieval augmented verification for zero-shot detection of multimodal disinformation,

A. U. Dey, A. Llabrés, E. Valveny, and D. Karatzas, “Retrieval augmented verification for zero-shot detection of multimodal disinformation,” arXiv preprint arXiv:2404.10702, 2024

work page arXiv 2024
[24]

Sniffer: Multimodal large language model for explainable out-of-context misinformation detection,

P. Qi, Z. Yan, W. Hsu, and M. L. Lee, “Sniffer: Multimodal large language model for explainable out-of-context misinformation detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 13052–13062

work page 2024
[25]

Wikidata: a free collaborative knowledgebase,

D. Vrandečić and M. Krötzsch, “Wikidata: a free collaborative knowledgebase,” Communications of the ACM, vol. 57, no. 10, pp. 78–85, 2014

work page 2014
[26]

Wikichat: Stopping the hallucination of large language model chatbots by few-shot grounding on wikipedia

S. J. Semnani, V. Z. Yao, H. C. Zhang, and M. S. Lam, “Wikichat: Stopping the hallucination of large language model chatbots by few-shot grounding on wikipedia,” arXiv preprint arXiv:2305.14292, 2023

work page arXiv 2023
[27]

InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

W. Dai, J. Li, D. Li, A. M. H. Tiong, J. Zhao, W. Wang, B. Li, P. Fung, and S. Hoi, “Instructblip: Towards general-purpose vision-language models with instruction tuning,” 2023. [Online]. Available: https://arxiv.org/abs/2305.06500

work page internal anchor Pith review Pith/arXiv arXiv 2023
[28]

Judging llm-as-a-judge with mt-bench and chatbot arena,

L. Zheng, W.-L. Chiang, Y. Sheng, S. Zhuang, Z. Wu, Y. Zhuang, Z. Lin, Z. Li, D. Li, E. P. Xing, H. Zhang, J. E. Gonzalez, and I. Stoica, “Judging llm-as-a-judge with mt-bench and chatbot arena,” 2023

work page 2023
[29]

Llm-consensus: Multi-agent debate for visual misinformation detection,

K. Lakara, G. Channing, J. Sock, C. Rupprecht, P. Torr, J. Collomosse, and C. S. de Witt, “Llm-consensus: Multi-agent debate for visual misinformation detection,” 2025. [Online]. Available: https://arxiv.org/abs/2410.20140

work page arXiv 2025
[30]

The Llama 3 Herd of Models

A. Grattafiori, A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Vaughan et al., “The llama 3 herd of models,” arXiv preprint arXiv:2407.21783, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[31]

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing

P. He, J. Gao, and W. Chen, “Debertav3: Improving deberta using electra-style pre-training with gradient-disentangled embedding sharing,” arXiv preprint arXiv:2111.09543, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[32]

Debating with more persuasive llms leads to more truthful answers

A. Khan, J. Hughes, D. Valentine, L. Ruis, K. Sachan, A. Radhakrishnan, E. Grefenstette, S. R. Bowman, T. Rocktäschel, and E. Perez, “Debating with more persuasive llms leads to more truthful answers,” arXiv preprint arXiv:2402.06782, 2024

work page arXiv 2024
[33]

Supernotes: Driving consensus in crowd-sourced fact-checking,

S. De, M. A. Bakker, J. Baxter, and M. Saveski, “Supernotes: Driving consensus in crowd-sourced fact-checking,” arXiv preprint arXiv:2411.06116, 2024

work page arXiv 2024
[34]

Unchecked vs. uncheckable: How opinion-based claims can impede corrections of misinformation,

N. Walter and N. A. Salovich, “Unchecked vs. uncheckable: How opinion-based claims can impede corrections of misinformation,” Mass communication and society, vol. 24, no. 4, pp. 500–526, 2021

work page 2021
[35]

Google vision api: Detect web entities and pages,

Google, “Google vision api: Detect web entities and pages,” https://cloud.google.com/vision/docs/detecting-web

work page
[36]

Programmable search engine,

Google, “Programmable search engine,” https://developers.google.com/custom-search/v1/overview

work page
[37]

Verifying online information,

S. Urbani, “Verifying online information,” Essential Guides, 2020, published in 2022. [Online]. Available: https://firstdraftnews.org/articles/verifying-online-information-the-absolute-essentials/

work page 2020
[38]

“Image, tell me your story!” Predicting the original meta-context of visual misinformation

J. Tonglet, M.-F. Moens, and I. Gurevych, “‘Image, tell me your story!’ Predicting the original meta-context of visual misinformation,” arXiv preprint arXiv:2408.09939, 2024

work page arXiv 2024
[39]

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

N. Reimers and I. Gurevych, “Sentence-bert: Sentence embeddings using siamese bert-networks,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 11 2019. [Online]. Available: https://arxiv.org/abs/1908.10084

work page internal anchor Pith review Pith/arXiv arXiv 2019
[40]

Joint face detection and alignment using multitask cascaded convolutional networks,

K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, “Joint face detection and alignment using multitask cascaded convolutional networks,” IEEE signal processing letters, vol. 23, no. 10, pp. 1499–1503, 2016

work page 2016
[41]

Facenet: A unified embedding for face recognition and clustering,

F. Schroff, D. Kalenichenko, and J. Philbin, “Facenet: A unified embedding for face recognition and clustering,” in CVPR, 2015

work page 2015
[42]

Places: A 10 million image database for scene recognition,

B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba, “Places: A 10 million image database for scene recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017

work page 2017
[43]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2020
[44]

Improved Baselines with Visual Instruction Tuning

H. Liu, C. Li, Y. Li, and Y. J. Lee, “Improved baselines with visual instruction tuning,” 2024. [Online]. Available: https://arxiv.org/abs/2310.03744

work page internal anchor Pith review Pith/arXiv arXiv 2024
[45]

Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,

P. J. Rousseeuw, “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,” Journal of computational and applied mathematics, vol. 20, pp. 53–65, 1987

work page 1987
[46]

A cluster separation measure,

D. L. Davies and D. W. Bouldin, “A cluster separation measure,” IEEE transactions on pattern analysis and machine intelligence, no. 2, pp. 224–227, 1979

work page 1979

[1] [1]

Detecting and mitigating the dissemination of fake news: Challenges and future research opportunities,

W. Shahid, B. Jamshidi, S. Hakak, H. Isah, W. Z. Khan, M. K. Khan, and K.-K. R. Choo, “Detecting and mitigating the dissemination of fake news: Challenges and future research opportunities,” IEEE Transactions on Computational Social Systems , vol. 11, no. 4, pp. 4649– 4662, 2022

work page 2022

[2] [2]

The spread of true and false news online,

S. V osoughi, D. Roy, and S. Aral, “The spread of true and false news online,” science, vol. 359, no. 6380, pp. 1146–1151, 2018

work page 2018

[3] [3]

Fake news, disinformation and misinformation in social media: a review,

E. A ¨ımeur, S. Amri, and G. Brassard, “Fake news, disinformation and misinformation in social media: a review,” Social Network Analysis and Mining , vol. 13, no. 1, p. 30, 2023

work page 2023

[4] [4]

The economics of “fake news

N. Kshetri and J. V oas, “The economics of “fake news”,” IT Professional, vol. 19, no. 6, pp. 8–12, 2017

work page 2017

[5] [5]

The impact of misinformation on the covid-19 pandemic,

M. M. F. Caceres, J. P. Sosa, J. A. Lawrence, C. Sestacov- schi, A. Tidd-Johnson, M. H. U. Rasool, V . K. Gadamidi, S. Ozair, K. Pandav, C. Cuevas-Lou et al., “The impact of misinformation on the covid-19 pandemic,” AIMS public health, vol. 9, no. 2, p. 262, 2022

work page 2022

[6] [6]

Social media and fake news in the 2016 election,

H. Allcott and M. Gentzkow, “Social media and fake news in the 2016 election,” Journal of economic perspectives , vol. 31, no. 2, pp. 211–236, 2017

work page 2016

[7] [7]

Red-dot: Multimodal fact-checking via relevant evidence detection,

S.-I. Papadopoulos, C. Koutlis, S. Papadopoulos, and P. C. Petrantonakis, “Red-dot: Multimodal fact-checking via relevant evidence detection,” 2024. [Online]. Available: https://arxiv.org/abs/2311.09939

work page arXiv 2024

[8] [8]

Similarity over factuality: Are we making progress on multimodal out-of-context misinformation detection?

——, “Similarity over factuality: Are we making progress on multimodal out-of-context misinformation detection?” arXiv preprint arXiv:2407.13488 , 2024

work page arXiv 2024

[9] [9]

Verite: a robust benchmark for multimodal mis- information detection accounting for unimodal bias,

——, “Verite: a robust benchmark for multimodal mis- information detection accounting for unimodal bias,” International Journal of Multimedia Information Retrieval, vol. 13, no. 1, p. 4, 2024

work page 2024

[10] [10]

Newsclippings: Automatic generation of out-of-context multimodal media,

G. Luo, T. Darrell, and A. Rohrbach, “Newsclippings: Automatic generation of out-of-context multimodal media,” arXiv preprint arXiv:2104.05893 , 2021

work page arXiv 2021

[11] [11]

Open-domain, content-based, multi-modal fact-checking of out-of- context images via online resources,

S. Abdelnabi, R. Hasan, and M. Fritz, “Open-domain, content-based, multi-modal fact-checking of out-of- context images via online resources,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 14 940–14 949

work page 2022

[12] [12]

Cove: Context and veracity prediction for out-of-context images,

J. Tonglet, G. Thiem, and I. Gurevych, “Cove: Context and veracity prediction for out-of-context images,” arXiv preprint arXiv:2502.01194, 2025

work page arXiv 2025

[13] [13]

Integrating social explanations into explainable artificial intelligence (xai) for combating misinformation: Vision and challenges,

Y . Gong, L. Shang, and D. Wang, “Integrating social explanations into explainable artificial intelligence (xai) for combating misinformation: Vision and challenges,” IEEE Transactions on Computational Social Systems , vol. 11, no. 5, pp. 6705–6726, 2024

work page 2024

[14] [14]

Retrieval-augmented generation for knowledge- intensive nlp tasks,

P. Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. K ¨uttler, M. Lewis, W.-t. Yih, T. Rockt¨aschel et al. , “Retrieval-augmented generation for knowledge- intensive nlp tasks,” Advances in neural information processing systems, vol. 33, pp. 9459–9474, 2020

work page 2020

[15] [15]

Rag-fusion based information retrieval for fact-checking,

Y . Momii, T. Takiguchi, and Y . Ariki, “Rag-fusion based information retrieval for fact-checking,” in Proceedings of the Seventh Fact Extraction and VERification Workshop (FEVER), 2024, pp. 47–54

work page 2024

[16] [16]

Believe in artificial intelligence? a user study on the chatgpt’s fake information impact,

I. Amaro, P. Barra, A. Della Greca, R. Francese, and C. Tucci, “Believe in artificial intelligence? a user study on the chatgpt’s fake information impact,”IEEE Transactions on Computational Social Systems , 2023

work page 2023

[17] [17]

Understanding the promise and limits of automated fact-checking,

D. Graves, “Understanding the promise and limits of automated fact-checking,” Reuters Institute for the Study of Journalism, 2018

work page 2018

[18] [18]

Mmfakebench: A mixed-source mul- timodal misinformation detection benchmark for lvlms,

X. Liu, Z. Li, P. Li, S. Xia, X. Cui, L. Huang, H. Huang, W. Deng, and Z. He, “Mmfakebench: A mixed-source mul- timodal misinformation detection benchmark for lvlms,” arXiv preprint arXiv:2406.08772 , 2024

work page arXiv 2024

[19] R. Shao, T. Wu, and Z. Liu, “Detecting and grounding multi-modal media manipulation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 6904–6913

[20] S. Aneja, C. Bregler, and M. Nießner, “COSMOS: Catching out-of-context misinformation with self-supervised learning,” arXiv preprint arXiv:2101.06278, 2021

[21] S. Sharma, A. Datta, and R. Sharma, “AMIR: An automated misinformation rebuttal system–a COVID-19 vaccination datasets-based exposition,” IEEE Transactions on Computational Social Systems, 2024

[22] T. Braun, M. Rothermel, M. Rohrbach, and A. Rohrbach, “DEFAME: Dynamic evidence-based fact-checking with multimodal experts,” 2025. [Online]. Available: https://arxiv.org/abs/2412.10510

[23] A. U. Dey, A. Llabrés, E. Valveny, and D. Karatzas, “Retrieval augmented verification for zero-shot detection of multimodal disinformation,” arXiv preprint arXiv:2404.10702, 2024

[24] P. Qi, Z. Yan, W. Hsu, and M. L. Lee, “Sniffer: Multimodal large language model for explainable out-of-context misinformation detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 13 052–13 062

[25] D. Vrandečić and M. Krötzsch, “Wikidata: A free collaborative knowledgebase,” Communications of the ACM, vol. 57, no. 10, pp. 78–85, 2014

[26] S. J. Semnani, V. Z. Yao, H. C. Zhang, and M. S. Lam, “WikiChat: Stopping the hallucination of large language model chatbots by few-shot grounding on Wikipedia,” arXiv preprint arXiv:2305.14292, 2023

[27] W. Dai, J. Li, D. Li, A. M. H. Tiong, J. Zhao, W. Wang, B. Li, P. Fung, and S. Hoi, “InstructBLIP: Towards general-purpose vision-language models with instruction tuning,” 2023. [Online]. Available: https://arxiv.org/abs/2305.06500

[28] L. Zheng, W.-L. Chiang, Y. Sheng, S. Zhuang, Z. Wu, Y. Zhuang, Z. Lin, Z. Li, D. Li, E. P. Xing, H. Zhang, J. E. Gonzalez, and I. Stoica, “Judging LLM-as-a-judge with MT-Bench and Chatbot Arena,” 2023

[29] K. Lakara, G. Channing, J. Sock, C. Rupprecht, P. Torr, J. Collomosse, and C. S. de Witt, “LLM-Consensus: Multi-agent debate for visual misinformation detection,” 2025. [Online]. Available: https://arxiv.org/abs/2410.20140

[30] A. Grattafiori, A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Vaughan et al., “The Llama 3 herd of models,” arXiv preprint arXiv:2407.21783, 2024

[31] P. He, J. Gao, and W. Chen, “DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing,” arXiv preprint arXiv:2111.09543, 2021

[32] A. Khan, J. Hughes, D. Valentine, L. Ruis, K. Sachan, A. Radhakrishnan, E. Grefenstette, S. R. Bowman, T. Rocktäschel, and E. Perez, “Debating with more persuasive LLMs leads to more truthful answers,” arXiv preprint arXiv:2402.06782, 2024

[33] S. De, M. A. Bakker, J. Baxter, and M. Saveski, “Supernotes: Driving consensus in crowd-sourced fact-checking,” arXiv preprint arXiv:2411.06116, 2024

[34] N. Walter and N. A. Salovich, “Unchecked vs. uncheckable: How opinion-based claims can impede corrections of misinformation,” Mass Communication and Society, vol. 24, no. 4, pp. 500–526, 2021

[35] Google, “Google Vision API: Detect web entities and pages,” https://cloud.google.com/vision/docs/detecting-web

[36] Google, “Programmable search engine,” https://developers.google.com/custom-search/v1/overview

[37] S. Urbani, “Verifying online information,” Essential Guides, 2020, published in 2022. [Online]. Available: https://firstdraftnews.org/articles/verifying-online-information-the-absolute-essentials/

[38] J. Tonglet, M.-F. Moens, and I. Gurevych, “‘Image, tell me your story!’ Predicting the original meta-context of visual misinformation,” arXiv preprint arXiv:2408.09939, 2024

[39] N. Reimers and I. Gurevych, “Sentence-BERT: Sentence embeddings using Siamese BERT-networks,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Nov. 2019. [Online]. Available: https://arxiv.org/abs/1908.10084

[40] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, “Joint face detection and alignment using multitask cascaded convolutional networks,” IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499–1503, 2016

[41] F. Schroff, D. Kalenichenko, and J. Philbin, “FaceNet: A unified embedding for face recognition and clustering,” in CVPR, 2015

[42] B. Zhou, A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba, “Places: A 10 million image database for scene recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017

[43] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020

[44] H. Liu, C. Li, Y. Li, and Y. J. Lee, “Improved baselines with visual instruction tuning,” 2024. [Online]. Available: https://arxiv.org/abs/2310.03744

[45] P. J. Rousseeuw, “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,” Journal of Computational and Applied Mathematics, vol. 20, pp. 53–65, 1987

[46] D. L. Davies and D. W. Bouldin, “A cluster separation measure,” IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 2, pp. 224–227, 1979