RW-Post: Auditable Evidence-Grounded Multimodal Fact-Checking in the Wild

Danni Xu; Harry Cheng; Mohan Kankanhalli; Shaojing Fan

arxiv: 2512.22933 · v4 · submitted 2025-12-28 · 💻 cs.AI · cs.CL


Pith reviewed 2026-05-16 19:25 UTC · model grok-4.3


keywords: multimodal fact-checking, evidence grounding, benchmark dataset, misinformation detection, vision-language models, auditable annotations, social media verification


The pith

RW-Post benchmark shows evidence-bounded evaluation improves accuracy and faithfulness in multimodal fact-checking.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents RW-Post, a dataset that pairs real social-media posts with reasoning traces and explicitly linked evidence extracted from human fact-check articles. An LLM-assisted pipeline creates auditable annotations that support three evaluation regimes: closed-book, evidence-bounded, and open-web. Experiments on this benchmark reveal that current large vision-language models frequently fail to ground their outputs faithfully in the supplied evidence. When evaluation is restricted to the provided evidence, both accuracy and faithfulness rise. The work supplies AgentFact as a baseline and demonstrates measurable headroom for future systems.

Core claim

RW-Post supplies post-aligned instances with auditable evidence links drawn from human fact-check articles; under unified protocols, strong open-source LVLMs exhibit low faithfulness in evidence grounding, yet evidence-bounded evaluation measurably raises both accuracy and adherence to the supplied facts.

What carries the argument

The argument rests on the RW-Post benchmark itself: post-aligned text-image instances whose annotations are produced by an LLM-assisted extraction-and-auditing pipeline that converts human fact-check articles into explicit reasoning traces and evidence items.
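To make "explicitly linked evidence" concrete, here is a minimal sketch of how one instance could be represented and checked for broken links. The class names, field names, and the `dangling_links` helper are all hypothetical; the paper's actual schema is not given on this page.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceItem:
    evidence_id: str  # hypothetical identifier such as "E1"
    text: str         # passage taken from the human fact-check article
    source_url: str   # placeholder for wherever the passage came from


@dataclass
class ReasoningStep:
    statement: str
    evidence_ids: list[str] = field(default_factory=list)  # links into the evidence list


@dataclass
class PostInstance:
    post_text: str
    image_path: str
    label: str
    evidence: list[EvidenceItem]
    trace: list[ReasoningStep]


def dangling_links(instance: PostInstance) -> list[str]:
    """Return evidence IDs cited in the trace but missing from the evidence list."""
    known = {item.evidence_id for item in instance.evidence}
    cited = {eid for step in instance.trace for eid in step.evidence_ids}
    return sorted(cited - known)


# Toy instance (invented for illustration, not taken from the dataset).
example = PostInstance(
    post_text="Photo shows flooding in the city this week.",
    image_path="post_001.jpg",
    label="false",
    evidence=[EvidenceItem("E1", "The photo was first published in 2017.", "https://example.org/a")],
    trace=[
        ReasoningStep("The image predates the claimed event.", ["E1"]),
        ReasoningStep("Local officials reported no flooding.", ["E2"]),
    ],
)
print(dangling_links(example))  # prints ['E2']
```

A check of this kind is one way an annotation could be called auditable: every reasoning step either resolves to a stored evidence item or is flagged.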

If this is right

Models can be diagnosed separately for visual grounding failures versus reasoning failures.
Evidence-bounded protocols become a practical way to measure and improve faithfulness.
AgentFact and similar agent baselines can be compared directly against LVLMs under the same three regimes.
Development of new multimodal systems can target the identified gaps in evidence utilization.
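As a rough illustration of how accuracy and faithfulness could be reported side by side, the sketch below scores faithfulness as the share of a model's cited evidence IDs that belong to the supplied evidence set. This definition is an assumption made for the example; the page does not state how the paper computes faithfulness, and the record keys are hypothetical.

```python
def faithfulness(cited_ids, supplied_ids):
    """Share of cited evidence IDs that point at supplied evidence.

    Assumption: an answer that cites nothing is scored 0.0 (ungrounded).
    """
    cited = set(cited_ids)
    if not cited:
        return 0.0
    return len(cited & set(supplied_ids)) / len(cited)


def summarize(records):
    """Mean accuracy and mean faithfulness over a list of prediction records."""
    n = len(records)
    accuracy = sum(r["predicted"] == r["gold"] for r in records) / n
    faith = sum(faithfulness(r["cited"], r["supplied"]) for r in records) / n
    return {"accuracy": accuracy, "faithfulness": faith}


# Toy records (invented): the second answer is wrong and cites an unknown item.
records = [
    {"predicted": "false", "gold": "false", "cited": ["E1", "E2"], "supplied": ["E1", "E2", "E3"]},
    {"predicted": "true", "gold": "false", "cited": ["E1", "E9"], "supplied": ["E1"]},
]
print(summarize(records))  # prints {'accuracy': 0.5, 'faithfulness': 0.75}
```

Keeping the two numbers separate matters because a model can reach the right verdict while citing evidence it was never given.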

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same extraction pipeline could be reused to create comparable benchmarks for text-only or video-based misinformation.
Auditable traces may support downstream training of models that learn to cite evidence explicitly.
The observed headroom implies that hybrid systems combining retrieval and generation could close much of the gap.

Load-bearing premise

The LLM pipeline produces annotations that faithfully match the original human fact-check articles without introducing systematic errors or biases.

What would settle it

A manual audit of a random sample of RW-Post instances that finds frequent mismatches between the extracted evidence links and the content of the original fact-check articles would undermine the benchmark.
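An audit like this can be sketched in a few lines: draw a reproducible random sample of instances, have humans mark each one as matching or mismatching the source article, then report the mismatch rate with an interval. The function names, sample size, and counts below are illustrative assumptions, not figures from the paper; the interval is the standard Wilson score interval.

```python
import math
import random


def draw_audit_sample(instance_ids, k, seed=0):
    """Pick k distinct instance IDs with a fixed seed so the audit is repeatable."""
    rng = random.Random(seed)
    return rng.sample(list(instance_ids), k)


def wilson_interval(mismatches, n, z=1.96):
    """Wilson score interval for a mismatch rate (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = mismatches / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical audit: 50 sampled instances, 5 judged to mismatch the article.
sample = draw_audit_sample(range(1000), 50)
low, high = wilson_interval(5, 50)
print(len(sample), round(low, 3), round(high, 3))
```

With small samples the interval is wide, so "frequent mismatches" is better judged against the upper bound than against the raw rate.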

Figures

Figures reproduced from arXiv: 2512.22933 by Danni Xu, Harry Cheng, Mohan Kankanhalli, Shaojing Fan.

**Figure 1.** RW-Post Dataset: Use Context (purple highlight) helps LLM determine whether the link (pink highlight) is post or…

**Figure 2.** Examples of image annotations in the fact-checking…

**Figure 3.** Statistics of RW-Post Dataset. Adjacent text from Section IV (Method), a) Overview: We decompose the fact-checking problem into five independent sub-tasks and design a dedicated agent for each of them. Building on these components, we develop a fact-checking pipeline, termed the AgentFact framework, which integrates the five agents into a multi-round evidence retrieval, filtering and reasoning process to achieve high-quality fact check…

**Figure 4.** Proposed Multimodal Fact-checking Agents.

**Figure 5.** Proposed Multimodal Fact-checking workflow with…

**Figure 7.** From these two examples, we observe that AgentFact is capable of producing coherent reasoning and structured key points supported by multimodal evidence from generally reliable sources. However, a closer comparison with the ground truth reasoning reveals several notable shortcomings. A common failure pattern highlighted by these cases is the model’s inability to retrieve accurate image contextual evidenc…

**Figure 7.** Case study of a correctly classified claim.
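The paper excerpts on this page describe AgentFact as five agents (Strategy Planning, Visual Retrieval and Analysis, Text Evidence Retrieval, Plan Guided Reasoning, Explanation Generation) combined into a multi-round retrieval, filtering, and reasoning process. The sketch below shows one way such a loop could be wired together. It is purely illustrative: the function signatures, the de-duplication filter, the stopping rule, and the stub agents are assumptions, not the authors' implementation.

```python
def run_verification(post, plan, retrieve_visual, retrieve_text, reason, explain, max_rounds=3):
    """Run a multi-round loop over five pluggable agent callables (all hypothetical)."""
    evidence, verdict, rounds = [], None, 0
    for rounds in range(1, max_rounds + 1):
        strategy = plan(post, evidence)                  # Strategy Planning
        candidates = (retrieve_visual(post, strategy)    # Visual Retrieval and Analysis
                      + retrieve_text(post, strategy))   # Text Evidence Retrieval
        for item in candidates:                          # filtering: skip duplicates
            if item not in evidence:
                evidence.append(item)
        verdict = reason(post, strategy, evidence)       # Plan Guided Reasoning
        if verdict is not None:                          # assumed stopping rule
            break
    return {
        "verdict": verdict,
        "rounds": rounds,
        "evidence": evidence,
        "explanation": explain(post, verdict, evidence),  # Explanation Generation
    }


# Stub agents, invented so the loop can be run end to end.
def plan(post, evidence):
    return {"need_text": len(evidence) > 0}

def retrieve_visual(post, strategy):
    return ["image first published in 2017"]

def retrieve_text(post, strategy):
    return ["officials reported no flooding"] if strategy["need_text"] else []

def reason(post, strategy, evidence):
    return "false" if len(evidence) >= 2 else None

def explain(post, verdict, evidence):
    return f"{verdict}: " + "; ".join(evidence)


result = run_verification("Photo shows flooding this week.", plan,
                          retrieve_visual, retrieve_text, reason, explain)
print(result["verdict"], result["rounds"])  # prints: false 2
```

Passing the agents in as callables keeps the loop testable with stubs, which is also how a real system could swap an LVLM or a search tool into any one slot.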

Original abstract

Multimodal misinformation increasingly leverages visual persuasion, where repurposed or manipulated images strengthen misleading text. We introduce RW-Post, a post-aligned text-image benchmark for real-world multimodal fact-checking with auditable annotations: each instance links the original social-media post with reasoning traces and explicitly linked evidence items derived from human fact-check articles via an LLM-assisted extraction-and-auditing pipeline. RW-Post supports controlled evaluation across closed-book, evidence-bounded, and open-web regimes, enabling systematic diagnosis of visual grounding and evidence utilization. We provide AgentFact as a reference verification baseline and benchmark strong open-source LVLMs under unified protocols. Experiments show substantial headroom: current models struggle with faithful evidence grounding, while evidence-bounded evaluation improves both accuracy and faithfulness.
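The three regimes differ mainly in what a model is allowed to see. A minimal sketch, assuming a simple dictionary interface (the `Regime` enum, the `build_inputs` helper, and the key names are hypothetical, not the benchmark's actual API):

```python
from enum import Enum


class Regime(Enum):
    CLOSED_BOOK = "closed-book"            # post only, no external evidence
    EVIDENCE_BOUNDED = "evidence-bounded"  # post plus the benchmark's evidence items
    OPEN_WEB = "open-web"                  # post plus permission to search


def build_inputs(post_text, image_path, evidence, regime):
    """Assemble what the model may see under a given regime."""
    inputs = {
        "post_text": post_text,
        "image_path": image_path,
        "evidence": [],
        "web_search": False,
    }
    if regime is Regime.EVIDENCE_BOUNDED:
        inputs["evidence"] = list(evidence)  # copy so callers cannot mutate the source
    elif regime is Regime.OPEN_WEB:
        inputs["web_search"] = True
    return inputs


evidence = ["The photo was first published in 2017."]
for regime in Regime:
    view = build_inputs("Photo shows flooding this week.", "post_001.jpg", evidence, regime)
    print(regime.value, len(view["evidence"]), view["web_search"])
```

Running the same instance through all three views is what lets a failure be attributed to missing knowledge, poor use of supplied evidence, or poor retrieval.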

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

RW-Post offers a new auditable benchmark built from real fact-checks, but the claims about model headroom rest on an unvalidated LLM extraction pipeline and lack any numbers.


The paper's core contribution is RW-Post, a dataset of social media posts paired with text and images, each linked to evidence and reasoning traces pulled from human fact-check articles. They use an LLM-assisted extraction and auditing process to make those links explicit and auditable. Along with that, they introduce AgentFact as a reference baseline and test several open-source large vision-language models.

What stands out is the evaluation setup. It allows running the same instances in closed-book mode, with bounded evidence, and with open web access. This makes it possible to measure how well models use evidence and ground their answers visually. The abstract says current models struggle with faithful evidence grounding and that the bounded regime improves both accuracy and faithfulness.

The soft spots are around the pipeline and the missing details. The LLM-assisted extraction is central to the benchmark's value, yet there's no mention of how they validated it against human judgments or checked for biases and hallucinations. If that step isn't reliable, the diagnosis of model headroom could be off. The abstract also gives no numbers on dataset size, model performances, or error analysis, which leaves the experimental findings without concrete support.

On the positive side, the construction from real human fact-checks avoids some circularity issues, and the focus on auditability is a step in the right direction for this area.

This work is aimed at people developing AI systems for misinformation detection, especially multimodal cases. A reader building or evaluating fact-checking benchmarks would get practical ideas from the pipeline and the regime comparisons. I would recommend sending it for peer review. The benchmark idea has enough substance that referees can help strengthen the validation parts and clarify the results.

Referee Report

2 major / 1 minor

Summary. The paper introduces RW-Post, a post-aligned text-image benchmark for real-world multimodal fact-checking. Each instance links original social-media posts to reasoning traces and explicitly linked evidence items extracted from human fact-check articles via an LLM-assisted extraction-and-auditing pipeline. The benchmark supports controlled evaluation in closed-book, evidence-bounded, and open-web regimes. It provides AgentFact as a reference baseline and evaluates strong open-source LVLMs, claiming that current models struggle with faithful evidence grounding while evidence-bounded evaluation improves both accuracy and faithfulness.

Significance. If the pipeline produces faithful annotations, RW-Post would offer a valuable, auditable resource for diagnosing visual grounding failures and evidence utilization in multimodal models. The controlled regimes and post-alignment are strengths that could enable reproducible progress on a timely problem. However, the absence of reported validation for the core annotation process limits the immediate impact of the experimental claims.

major comments (2)

[Benchmark Construction] Benchmark construction section: the LLM-assisted extraction-and-auditing pipeline is presented as producing accurate, auditable annotations, yet no quantitative validation (human agreement rates, error analysis, or bias checks) is reported. This is load-bearing for the central claim that evidence-bounded evaluation improves faithfulness, because any systematic extraction errors would make measured improvements reflect pipeline artifacts rather than model capability.
[Experiments] Experiments section: the abstract and high-level findings assert substantial headroom and regime-specific improvements, but the provided text supplies no quantitative results, dataset statistics, or error breakdowns. Without these, the diagnosis of model struggles with evidence grounding cannot be directly assessed or reproduced.

minor comments (1)

[Abstract] The abstract states high-level experimental findings without any numerical values or dataset sizes; adding a brief summary table of key metrics would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for recognizing the potential value of RW-Post for diagnosing visual grounding and evidence utilization issues. We address each major comment below and will incorporate the suggested additions to strengthen the manuscript.


Referee: [Benchmark Construction] Benchmark construction section: the LLM-assisted extraction-and-auditing pipeline is presented as producing accurate, auditable annotations, yet no quantitative validation (human agreement rates, error analysis, or bias checks) is reported. This is load-bearing for the central claim that evidence-bounded evaluation improves faithfulness, because any systematic extraction errors would make measured improvements reflect pipeline artifacts rather than model capability.

Authors: We agree that quantitative validation of the annotation pipeline is essential to substantiate the claims. The current manuscript describes the LLM-assisted extraction-and-auditing pipeline and its auditable design but does not report human agreement rates, error analysis, or bias checks. In the revised version we will add a dedicated subsection reporting results from a human audit of a sampled subset of annotations, including inter-annotator agreement rates, categorized error types, and bias checks. This will directly address the concern that measured improvements could reflect pipeline artifacts. revision: yes
Referee: [Experiments] Experiments section: the abstract and high-level findings assert substantial headroom and regime-specific improvements, but the provided text supplies no quantitative results, dataset statistics, or error breakdowns. Without these, the diagnosis of model struggles with evidence grounding cannot be directly assessed or reproduced.

Authors: We agree that the experiments section must supply explicit quantitative results, dataset statistics, and error breakdowns to support the claims and enable reproduction. The manuscript currently presents high-level findings without sufficient detail in the main text. In the revision we will expand the experiments section to include dataset statistics (instance counts, regime distributions), full quantitative accuracy and faithfulness metrics across models and regimes, and error breakdowns. These additions will make the diagnosis of evidence-grounding struggles directly assessable and reproducible. revision: yes
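The inter-annotator agreement promised in the first response is commonly reported as Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is a plain-Python version for two annotators; the label values and the toy judgments are invented for illustration and say nothing about what the authors' audit will find.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of each annotator's marginal rate, summed per label.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy audit judgments on six annotations ("ok" = matches the fact-check article).
annotator_1 = ["ok", "ok", "error", "ok", "error", "ok"]
annotator_2 = ["ok", "ok", "error", "error", "error", "ok"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # prints 0.667
```

Here raw agreement is 5 of 6, but because chance agreement is 0.5 the kappa is lower, which is why agreement rates alone can overstate annotation reliability.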

Circularity Check

0 steps flagged

No circularity detected in derivation chain


The paper constructs the RW-Post benchmark directly from external human fact-check articles via an LLM-assisted extraction pipeline, then reports model performance across evaluation regimes. No equations, parameters, or central claims reduce by construction to fitted inputs, self-definitions, or self-citation chains; the annotations are presented as derived from independent sources, and the experimental diagnosis of headroom follows from direct measurement on this externally sourced dataset rather than any internal renaming or forced prediction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central contribution rests on the assumption that the extraction pipeline yields faithful annotations; no free parameters or invented entities are described.

axioms (1)

domain assumption: LLM-assisted extraction from human fact-check articles produces accurate and auditable reasoning traces and evidence links.
This assumption underpins the validity of the RW-Post annotations and the evaluation regimes.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation between the paper passage and the cited Recognition theorem: unclear

RW-Post ... LLM-assisted extraction-and-auditing pipeline ... AgentFact ... five specialized agents ... Strategy Planning, Visual Retrieval and Analysis, Text Evidence Retrieval, Plan Guided Reasoning, Explanation Generation

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages

[1]

In- fodemics and health misinformation: a systematic review of reviews,

I. J. Borges do Nascimento, A. B. Pizarro, J. M. Almeida, N. Azzopardi- Muscat, M. A. Gonc ¸alves, M. Bj ¨orklund, and D. Novillo-Ortiz, “In- fodemics and health misinformation: a systematic review of reviews,” Bulletin of the World Health Organization, vol. 100, no. 9, pp. 544–561, Sep. 2022, epub 2022 Jun 30

work page 2022
[2]

The false tariff headline that sent stocks on a $2 trillion ride,

“The false tariff headline that sent stocks on a $2 trillion ride,” The Wall Street Journal, Apr. 2025, accessed: 2025-04-11. [Online]. Available: https://www.wsj.com/finance/stocks/the-false-tariff-headlin e-that-sent-stocks-on-a-2-trillion-ride-2224ef75

work page 2025
[3]

Does fake news impact stock returns? evidence from us and eu stock markets,

M. C. Arcuri, G. Gandolfi, and I. Russo, “Does fake news impact stock returns? evidence from us and eu stock markets,”Journal of Economics and Business, vol. 125-126, p. 106130, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0148619523000231

work page 2023
[4]

Tech companies are taking action on ai election misinformation. will it matter?

W. Henshall, “Tech companies are taking action on ai election misinformation. will it matter?”Time, 2023, accessed: 2025-04-11. [Online]. Available: https://time.com/6333288/tech-companies-ai-misin formation/

work page arXiv 2023
[5]

Deepfake detection: A comprehensive survey from the reliability perspective,

T. Wang, X. Liao, K. P. Chow, X. Lin, and Y . Wang, “Deepfake detection: A comprehensive survey from the reliability perspective,” ACM Comput. Surv., vol. 57, no. 3, Nov. 2024

work page 2024
[6]

Fake accounts drove praise of duterte and now target philippine election,

“Fake accounts drove praise of duterte and now target philippine election,”Reuters, Apr. 2025, accessed: 2025-04-11. [Online]. Available: https://www.reuters.com/world/asia-pacific/fake-accounts-drove-prais e-duterte-now-target-philippine-election-2025-04-11/

work page 2025
[7]

Semantics-oriented multitask learning for deepfake detection: A joint embedding approach,

M. Zou, B. Yu, Y . Zhan, S. Lyu, and K. Ma, “Semantics-oriented multitask learning for deepfake detection: A joint embedding approach,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 10, pp. 9950–9963, 2025

work page 2025
[8]

Trump, twitter, and truth judgments: The effects of “disputed

J. C. Blanchar and C. J. Norris, “Trump, twitter, and truth judgments: The effects of “disputed” tags and political knowledge on the judged truthfulness of election misinformation,”HKS Misinformation Review, September 2024. [Online]. Available: https://misinforeview.hks.harvard. edu/article/trump-twitter-and-truth-judgments-the-effects-of-disputed-t ags-a...

work page 2024
[9]

The global effectiveness of fact-checking: Evidence from simultaneous experiments in argentina, nigeria, south africa, and the united kingdom,

E. Porter and T. J. Wood, “The global effectiveness of fact-checking: Evidence from simultaneous experiments in argentina, nigeria, south africa, and the united kingdom,”Proceedings of the National Academy of Sciences, vol. 118, no. 37, p. e2104235118, 2021. [Online]. Available: https://www.pnas.org/doi/abs/10.1073/pnas.2104235118

work page doi:10.1073/pnas.2104235118 2021
[10]

Sniffer: Multimodal large lan- guage model for explainable out-of-context misinformation detection,

P. Qi, Z. Yan, W. Hsu, and M. L. Lee, “Sniffer: Multimodal large lan- guage model for explainable out-of-context misinformation detection,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 13 052–13 062

work page 2024
[11]

Noise based deepfake detection via multi-head relative-interaction,

T. Wang and K. P. Chow, “Noise based deepfake detection via multi-head relative-interaction,”Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 12, pp. 14 548–14 556, Jun. 2023

work page 2023
[12]

Unsupervised generative fake image detector,

T. Qiao, H. Shao, S. Xie, and R. Shi, “Unsupervised generative fake image detector,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 9, pp. 8442–8455, 2024

work page 2024
[13]

Audio-visual temporal forgery de- tection using embedding-level fusion and multi-dimensional contrastive loss,

M. Liu, J. Wang, X. Qian, and H. Li, “Audio-visual temporal forgery de- tection using embedding-level fusion and multi-dimensional contrastive loss,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 8, pp. 6937–6948, 2024

work page 2024
[14]

Detecting compressed deepfake videos in social networks using frame-temporality two-stream convolu- tional network,

J. Hu, X. Liao, W. Wang, and Z. Qin, “Detecting compressed deepfake videos in social networks using frame-temporality two-stream convolu- tional network,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 3, pp. 1089–1102, 2022

work page 2022
[15]

Qacheck: A demonstration system for question-guided multi-hop fact-checking,

L. Pan, X. Lu, M.-Y . Kan, and P. Nakov, “Qacheck: A demonstration system for question-guided multi-hop fact-checking,” inProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing System Demonstrations Track (EMNLP 2023 Demo Track), Singapore, Dec 2023

work page 2023
[16]

De- tecting misinformation with llm-predicted credibility signals and weak supervision,

J. A. Leite, O. Razuvayevskaya, K. Bontcheva, and C. Scarton, “De- tecting misinformation with llm-predicted credibility signals and weak supervision,”arXiv preprint arXiv:2309.07601, 2023

work page arXiv 2023
[17]

Towards llm-based fact verification on news claims with a hierarchical step-by-step prompting method,

X. Zhang and W. Gao, “Towards llm-based fact verification on news claims with a hierarchical step-by-step prompting method,”AACL, 2023

work page 2023
[18]

Fighting lies with intelligence: Using large language models and chain of thoughts technique to combat fake news,

W. Kareem and N. Abbas, “Fighting lies with intelligence: Using large language models and chain of thoughts technique to combat fake news,” inInternational Conference on Innovative Techniques and Applications of Artificial Intelligence. Springer, 2023, pp. 253–258

work page 2023
[19]

Mmidr: Teaching large language model to interpret multimodal misinformation via knowledge distillation,

L. Wang, X. Xu, L. Zhang, J. Lu, Y . Xu, H. Xu, and C. Zhang, “Mmidr: Teaching large language model to interpret multimodal misinformation via knowledge distillation,”arXiv preprint arXiv:2403.14171, 2024

work page arXiv 2024
[20]

Lemma: towards lvlm-enhanced multimodal misinformation detection with external knowledge augmentation.arXiv preprint arXiv:2402.11943, 2024

K. Xuan, L. Yi, F. Yang, R. Wu, Y . R. Fung, and H. Ji, “Lemma: To- wards lvlm-enhanced multimodal misinformation detection with external knowledge augmentation,”arXiv preprint arXiv:2402.11943, 2024

work page arXiv 2024
[21]

Few-shot in- context learning for implicit semantic multimodal content detection and interpretation,

X. Wang, L. Wang, Y . Su, H. Tian, G. Jin, and A.-A. Liu, “Few-shot in- context learning for implicit semantic multimodal content detection and interpretation,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 9, pp. 9545–9558, 2025

work page 2025
[22]

“image, tell me your story!

J. Tonglet, M.-F. Moens, and I. Gurevych, ““image, tell me your story!” predicting the original meta-context of visual misinformation,” inProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Y . Al-Onaizan, M. Bansal, and Y .-N. Chen, Eds. Miami, Florida, USA: Association for Computational Linguistics, Nov. 2024, pp. 784...

work page 2024
[23]

Improving factuality and reasoning in language models through multiagent debate,

Y . Du, S. Li, A. Torralba, J. B. Tenenbaum, and I. Mordatch, “Improving factuality and reasoning in language models through multiagent debate,” inProceedings of the 41st International Conference on Machine Learn- ing, ser. ICML’24. JMLR.org, 2024

work page 2024
[24]

Defame: Dynamic evidence-based fact-checking with multimodal experts,

T. Braun, M. Rothermel, M. Rohrbach, and A. Rohrbach, “DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts,” inProceedings of the 42nd International Conference on Machine Learning, 2025. [Online]. Available: https://arxiv.org/abs/2412.10510

work page arXiv 2025
[25]

Mdfend: Multi-domain fake news detection,

Q. Nan, J. Cao, Y . Zhu, Y . Wang, and J. Li, “Mdfend: Multi-domain fake news detection,” inProceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 3343– 3347

work page 2021
[26]

Fakenewsnet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media,

K. Shu, D. Mahudeswaran, S. Wang, D. Lee, and H. Liu, “Fakenewsnet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media,”Big data, vol. 8, no. 3, pp. 171–188, 2020

work page 2020
[27]

A coarse-to- fine cascaded evidence-distillation neural network for explainable fake news detection,

Z. Yang, J. Ma, H. Chen, H. Lin, Z. Luo, and Y . Chang, “A coarse-to- fine cascaded evidence-distillation neural network for explainable fake news detection,” inProceedings of the 29th International Conference on Computational Linguistics, N. Calzolari, C.-R. Huang, H. Kim, J. Pustejovsky, L. Wanner, K.-S. Choi, P.-M. Ryu, H.-H. Chen, L. Donatelli, H. Ji,...

work page 2022
[28]

Mumin: A large-scale multilingual multimodal fact-checked misinformation social network dataset,

D. S. Nielsen and R. McConville, “Mumin: A large-scale multilingual multimodal fact-checked misinformation social network dataset,” inPro- ceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). ACM, 2022

work page 2022
[29]

Mr2: A benchmark for multimodal retrieval-augmented rumor detection in social media,

X. Hu, Z. Guo, J. Chen, L. Wen, and P. S. Yu, “Mr2: A benchmark for multimodal retrieval-augmented rumor detection in social media,” inProceedings of the 46th international ACM SIGIR conference on research and development in information retrieval, 2023, pp. 2901–2912

work page 2023
[30]

Fakeddit: A new multimodal benchmark dataset for fine-grained fake news detection,

K. Nakamura, S. Levy, and W. Y . Wang, “Fakeddit: A new multimodal benchmark dataset for fine-grained fake news detection,”Conference on Language Resources and Evaluation (LREC 2020), pp. 6149–6157, 2020. 12

work page 2020
[31]

Averitec: A dataset for real-world claim verification with evidence from the web,

M. Schlichtkrull, Z. Guo, and A. Vlachos, “Averitec: A dataset for real-world claim verification with evidence from the web,”Advances in Neural Information Processing Systems, vol. 36, pp. 65 128–65 167, 2023

work page 2023
[32]

Metasumperceiver: Multi- modal multi-document evidence summarization for fact-checking,

T.-C. Chen, C.-W. Tang, and C. Thomas, “Metasumperceiver: Multi- modal multi-document evidence summarization for fact-checking,”ACL, 2024

work page 2024
[33]

Multimedia semantic integrity assessment using joint embedding of images and text,

A. Jaiswal, E. Sabir, W. AbdAlmageed, and P. Natarajan, “Multimedia semantic integrity assessment using joint embedding of images and text,” inProceedings of the 25th ACM international conference on Multimedia, 2017, pp. 1465–1471

work page 2017
[34]

Cosmos: catching out-of-context image misuse using self-supervised learning,

S. Aneja, C. Bregler, and M. Nießner, “Cosmos: catching out-of-context image misuse using self-supervised learning,” inProceedings of the AAAI conference on artificial intelligence, vol. 37, no. 12, 2023, pp. 14 084–14 092

work page 2023
[35]

Newsclippings: Automatic generation of out-of-context multimodal media,

G. Luo, T. Darrell, and A. Rohrbach, “Newsclippings: Auto- matic generation of out-of-context multimodal media,”arXiv preprint arXiv:2104.05893, 2021

work page arXiv 2021
[36]

Synthetic misinformers: Generating and combating multimodal misin- formation,

S.-I. Papadopoulos, C. Koutlis, S. Papadopoulos, and P. Petrantonakis, “Synthetic misinformers: Generating and combating multimodal misin- formation,” inProceedings of the 2nd ACM International Workshop on Multimedia AI against Disinformation, 2023, pp. 36–44

work page 2023
[37]

Multimodal analytics for real-world news using measures of cross- modal entity consistency,

E. M ¨uller-Budack, J. Theiner, S. Diering, M. Idahl, and R. Ewerth, “Multimodal analytics for real-world news using measures of cross- modal entity consistency,” inProceedings of the 2020 international conference on multimedia retrieval, 2020, pp. 16–25

work page 2020
[38]

Capturing the style of fake news,

P. Przybyla, “Capturing the style of fake news,” inProceedings of the AAAI conference on artificial intelligence, vol. 34, no. 01, 2020, pp. 490–497

work page 2020
[39]

Hierarchical propa- gation networks for fake news detection: Investigation and exploitation,

K. Shu, D. Mahudeswaran, S. Wang, and H. Liu, “Hierarchical propa- gation networks for fake news detection: Investigation and exploitation,” inProceedings of the international AAAI conference on web and social media, vol. 14, 2020, pp. 626–637

work page 2020
[40]

Safe: Similarity-aware multi-modal fake news detection,

X. Zhou, J. Wu, and R. Zafarani, “Safe: Similarity-aware multi-modal fake news detection,” inPacific-Asia Conference on Knowledge Discov- ery and Data Mining. Springer, 2020, pp. 354–367

work page 2020
[41]

Causal inference for leveraging image-text matching bias in multi-modal fake news detection,

L. Hu, Z. Chen, Z. Zhao, J. Yin, and L. Nie, “Causal inference for leveraging image-text matching bias in multi-modal fake news detection,”IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 11, pp. 11 141–11 152, 2023

work page 2023
[42]

Similarity over factuality: Are we making progress on multimodal out-of-context misinformation detection?

S.-I. Papadopoulos, C. Koutlis, S. Papadopoulos, and P. C. Petrantonakis, “Similarity over factuality: Are we making progress on multimodal out-of-context misinformation detection?” inProceedings of the Winter Conference on Applications of Computer Vision (WACV), February 2025, pp. 5570–5579

work page 2025
[43]

Open-domain, content-based, multi-modal fact-checking of out-of-context images via online re- sources,

S. Abdelnabi, R. Hasan, and M. Fritz, “Open-domain, content-based, multi-modal fact-checking of out-of-context images via online re- sources,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 14 940–14 949

work page 2022
[44]

Mm-vet: Evaluating large multimodal models for integrated capabilities,

W. Yu, Z. Yang, L. Li, J. Wang, K. Lin, Z. Liu, X. Wang, and L. Wang, “Mm-vet: Evaluating large multimodal models for integrated capabilities,” inInternational conference on machine learning. PMLR, 2024

work page 2024
[45]

Measuring massive multitask language understanding,

D. Hendrycks, C. Burns, S. Basart, A. Zou, M. Mazeika, D. Song, and J. Steinhardt, “Measuring massive multitask language understanding,” Proceedings of the International Conference on Learning Representa- tions (ICLR), 2021

work page 2021
[46]

Mmfakebench: A mixed-source mul- timodal misinformation detection benchmark for lvlms,

X. Liu, Z. Li, P. Li, S. Xia, X. Cui, L. Huang, H. Huang, W. Deng, and Z. He, “Mmfakebench: A mixed-source multimodal misinformation detection benchmark for lvlms,”arXiv preprint arXiv:2406.08772, 2024

work page arXiv 2024
[47] J.-B. Alayrac, J. Donahue, P. Luc, A. Miech, I. Barr, Y. Hasson, K. Lenc, A. Mensch, K. Millican, M. Reynolds, R. Ring, E. Rutherford, S. Cabi, T. Han, Z. Gong, S. Samangooei, M. Monteiro, J. Menick, S. Borgeaud, A. Brock, A. Nematzadeh, S. Sharifzadeh, M. Binkowski, R. Barreira, O. Vinyals, A. Zisserman, and K. Simonyan, “Flamingo: a visual language model for few-shot learning,” ..., 2022.
[48] Z. Wang, J. Bao, W. Zhou, W. Wang, H. Hu, H. Chen, and H. Li, “DIRE for diffusion-generated image detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 22445–22455.
[49] B. M. Yao, A. Shah, L. Sun, J.-H. Cho, and L. Huang, “End-to-end multimodal fact-checking and explanation generation: A challenging dataset and models,” in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR ’23. New York, NY, USA: Association for Computing Machinery, 2023, pp. 2733–27... doi:10.1145/3539618.3591879
[50] K. Kim, S. Lee, K.-H. Huang, H. P. Chan, M. Li, and H. Ji, “Can LLMs produce faithful explanations for fact-checking? Towards faithful explainable fact-checking via multi-agent debate,” arXiv preprint arXiv:2402.07401, 2024.

DANNI XU is currently a Ph.D. student with the School of Computing, National University of Singapore. She received the B.S. degree...
