XNote: Benchmarking Automated Community Notes Generation for Image-based Contextual Deception

Ethan Anderson; Feng Luo; Jingwen Yan; Jinkyung Katie Park; Jin Ma; Long Cheng; Mohammed Aldeen; Taran Kavuru

arxiv: 2603.22453 · v2 · pith:DBLGS64Cnew · submitted 2026-03-23 · 💻 cs.CL · cs.SI

Jin Ma, Jingwen Yan, Mohammed Aldeen, Ethan Anderson, Taran Kavuru, Jinkyung Katie Park, Feng Luo, Long Cheng

Pith reviewed 2026-05-21 10:34 UTC · model grok-4.3

classification 💻 cs.CL cs.SI

keywords: community notes, contextual deception, image deception, automated note generation, benchmarking, large vision language models, social media

The pith

Researchers create the XNote dataset to benchmark automated generation of Community Notes for posts with authentic images but misleading contexts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to fill the gap in datasets for training and testing systems that automatically write Community Notes to counter image-based contextual deception on social media. Community Notes work by providing missing context to users, but relying on humans limits their speed and scale. By assembling XNote from real X posts that have human notes, adding annotations on topics and deception types, and then testing large vision language models along with other systems on generating similar notes, the authors demonstrate current performance levels and difficulties. This matters to readers because effective automation could allow corrections to reach users much sooner and on a larger number of deceptive posts.

Core claim

The authors curate the XNote dataset from X posts with associated Community Notes and external contexts, annotate it with topics and deceptive factors, and benchmark a range of frontier large vision language models on both deception detection and note generation. The results show how difficult it is to produce concise, grounded notes that help users recover the missing or corrected context, and they point to the need for methods and metrics tailored to this task.

What carries the argument

The XNote dataset of real-world X posts paired with human Community Notes and new annotations for topics and deceptive factors, which enables evaluation of automated systems on generating helpful corrective notes rather than binary deception labels.
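To make the shape of such a record concrete, here is a minimal sketch of how one entry might be represented. Every field name below is an assumption made for illustration; the source only states that an entry pairs an X post and its image with a human Community Note, external context, topic labels, and deceptive factors such as time, entity, and event.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class DeceptiveFactor(Enum):
    # Factor names follow the examples given in the abstract (time, entity,
    # event); the paper's full taxonomy may contain more or different values.
    TIME = "time"
    ENTITY = "entity"
    EVENT = "event"


@dataclass
class XNoteEntry:
    # All field names are hypothetical, chosen for illustration only.
    post_id: str
    post_text: str
    image_path: str
    community_note: str  # human-written reference note
    external_context: List[str] = field(default_factory=list)
    topics: List[str] = field(default_factory=list)
    deceptive_factors: List[DeceptiveFactor] = field(default_factory=list)

    def is_annotated(self) -> bool:
        """Return True when both annotation layers (topics, factors) are present."""
        return bool(self.topics) and bool(self.deceptive_factors)


# Placeholder values only; this is not a real record from the dataset.
example = XNoteEntry(
    post_id="0001",
    post_text="Placeholder post text.",
    image_path="images/0001.jpg",
    community_note="Placeholder note supplying the missing context.",
    topics=["politics"],
    deceptive_factors=[DeceptiveFactor.TIME],
)
```

Keeping the human note and the two annotation layers in the same record is what would let a benchmark score a generated note against a reference and then slice results by topic or factor.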

If this is right

Evaluation moves beyond binary true or false detection to assess whether generated notes recover the specific missing context.
Frontier models exhibit limitations in creating concise, grounded Community Notes for these cases.
Both specialized systems and commercial tools require advancements to handle this task effectively.
New metrics and methods tailored to note generation will be necessary to make progress.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the benchmark proves useful, social platforms could deploy similar AI systems to assist or scale up Community Notes production.
Collecting more data in this format could allow training models specifically for context recovery in deceptive posts.
Connections to other misinformation correction tasks, such as fact-checking, may benefit from similar grounded generation approaches.

Load-bearing premise

The selected X posts and the added annotations for topics and deceptive factors accurately reflect typical cases of image-based contextual deception and provide dependable ground truth for assessing automated note generation.

What would settle it

If future models achieve high agreement with human Community Notes on a diverse set of new posts, as judged by independent raters on helpfulness and accuracy in correcting the context, this would indicate that the challenges highlighted can be overcome with current or near-term techniques.

Figures

Figures reproduced from arXiv: 2603.22453 by Ethan Anderson, Feng Luo, Jingwen Yan, Jinkyung Katie Park, Jin Ma, Long Cheng, Mohammed Aldeen, Taran Kavuru.

**Figure 2.** XCHECK dataset collection pipeline. The XCHECK dataset was constructed in four stages.

**Figure 3.** Example data entry in XCHECK, with image, structured post metadata, and external context.

**Figure 5.** Number of posts over time for the top-5 topics.

**Figure 6.** Source URLs analysis in XCHECK. From the surrounding text: citing social posts carries reliability risks, as these platforms are also major venues for deception. Archival services (e.g., archive.ph) are also valuable resources, since they help preserve volatile content and provide durable citations when original pages are altered or removed.

**Figure 7.** System design of the proposed ACCNOTE framework.

**Figure 8.** The original post for Example 2, with post text …

**Figure 9.** One example post with three different notes used in the user study. Method names are anonymized in the survey.

**Figure 10.** Statistical results from the user study. Bars indicate …

**Figure 11.** The original post for Example 3. Example 3 (qualitative example of different notes): SNIFFER (web search): "The image is wrongly used in a different news context. The given news caption and image are inconsistent in person. The person in caption is Tulsi Gabbard, and the person in image is John Kerry." GPT5-mini (web search): "The people pictured are not the individuals named in the post: the woman shown is o…"

Original abstract

Community Notes have emerged as an effective crowd-sourced mechanism for combating online deception on social media platforms. However, its reliance on human contributors limits both the timeliness and scalability. In this work, we study the automated Community Notes generation task for image-based contextual deception, where an authentic image is paired with misleading context (e.g., time, entity, and event). Unlike prior work that primarily focuses on deception detection (i.e., judging whether a post is true or false in a binary manner), automated Community Notes generation requires producing concise and grounded notes that help users recover the missing or corrected context. This problem remains underexplored due to the scarcity of datasets that support this task. To address this gap, we curate a real-world dataset, XNote, comprising X posts with associated Community Notes and external contexts, along with annotations of topics and deceptive factors. We further benchmark a range of frontier large vision language models (LVLMs) on XNote, evaluating their performance on both deception detection and note generation tasks. We also compare against an end-to-end approach, SNIFFER, and a commercial tool, GPT-5. Our results highlight the challenges in automated Community Notes generation, underscoring the need for improved methods and metrics tailored for this task.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

XNote gives a real-world dataset for generating corrective Community Notes on image deceptions, but missing details on annotation quality make the benchmarks hard to trust.

The main point is that this paper curates XNote from actual X posts that pair authentic images with misleading context, pulls in existing Community Notes, and adds labels for topics plus deception factors like time, entity, or event mismatches. They then benchmark frontier LVLMs on both spotting the deception and writing concise corrective notes, with side comparisons to SNIFFER and GPT-5. This shifts the focus from binary detection, which has been the usual target, to producing grounded notes that recover missing context.

Referee Report

2 major / 1 minor

Summary. The paper introduces the XNote dataset, curated from real X posts with associated Community Notes and external contexts, augmented with annotations for topics and deceptive factors (time/entity/event mismatches). It benchmarks frontier LVLMs on deception detection and automated generation of concise, grounded Community Notes to help users recover missing context, with comparisons to SNIFFER and GPT-5, highlighting challenges in this underexplored task.

Significance. If the annotations are shown to be reliable, the dataset and benchmark could meaningfully advance research on scalable, automated support for Community Notes by shifting focus from binary deception detection to contextual correction. The real-world sourcing from X posts and inclusion of existing notes provide a practical foundation for evaluating LVLM performance on image-based misinformation.

major comments (2)

[Abstract] Abstract and setup: no details are provided on evaluation metrics for note generation, inter-annotator agreement, data splits, or the protocol used to judge note quality. These omissions are load-bearing for the central claim that XNote enables reliable benchmarking of 'concise and grounded' notes.
[Dataset Curation] Dataset curation: the new annotations of deceptive factors (time/entity/event mismatches) lack any reported validation, consistency checks, or external corroboration. Without this, model performance numbers on note generation risk being driven by annotation noise rather than genuine ability to recover context.
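For readers who want a concrete picture of the kind of consistency check the second comment asks for, the following is a minimal sketch of Cohen's kappa for two annotators labelling the same posts. The two-annotator setup, the label values, and the choice of kappa are assumptions made for illustration; the manuscript as summarized here does not report which agreement statistic, if any, was used.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labelled independently at random
    # according to their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical deceptive-factor labels for four posts.
annotator_1 = ["time", "entity", "event", "time"]
annotator_2 = ["time", "entity", "time", "time"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

In this toy case the annotators agree on three of four posts (0.75 raw agreement), but kappa is lower because some of that agreement would be expected by chance given how often both use the "time" label.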

minor comments (1)

[Experiments] Clarify whether note quality evaluation relies on automated metrics, human raters, or both, and report any agreement statistics.
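As an illustration of what an automated metric for note quality could look like, here is a minimal sketch of a unigram-overlap F1 between a generated note and the human reference note. This is a stand-in chosen for simplicity, not the metric used in the paper, and the tokenizer is an assumption; surface overlap of this kind says nothing about whether a note is grounded or helpful, which is why the human-rater question above matters.

```python
from collections import Counter
import re


def tokenize(text: str) -> list:
    """Lowercase and split into alphanumeric tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_f1(generated: str, reference: str) -> float:
    """Harmonic mean of unigram precision and recall against the reference note."""
    gen, ref = tokenize(generated), tokenize(reference)
    if not gen or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both notes.
    overlap = sum((Counter(gen) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical notes, for illustration only.
score = unigram_f1("Photo from 2015", "This photo is from 2015, not 2024")
```

A short generated note can reach perfect precision while still missing most of the reference, which the recall term penalizes.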

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments correctly identify areas where additional transparency is needed to support the reliability of the XNote benchmark. We address each major comment below and will incorporate the requested clarifications in the revised manuscript.

Referee: [Abstract] Abstract and setup: no details are provided on evaluation metrics for note generation, inter-annotator agreement, data splits, or the protocol used to judge note quality. These omissions are load-bearing for the central claim that XNote enables reliable benchmarking of 'concise and grounded' notes.

Authors: We agree that these elements were not described in the abstract or setup and that their absence weakens the central benchmarking claim. In the revision we will expand the abstract with a concise statement of the evaluation approach and add an explicit subsection (new Section 3.5) that reports the metrics used for note generation, inter-annotator agreement statistics for the annotations, the train/validation/test splits, and the human evaluation protocol for assessing conciseness and groundedness.
Revision: yes
Referee: [Dataset Curation] Dataset curation: the new annotations of deceptive factors (time/entity/event mismatches) lack any reported validation, consistency checks, or external corroboration. Without this, model performance numbers on note generation risk being driven by annotation noise rather than genuine ability to recover context.

Authors: We acknowledge that the current manuscript does not report validation or consistency checks for the deceptive-factor annotations. We will add a dedicated paragraph in Section 3 describing the annotation guidelines, the number of annotators, the procedure for resolving disagreements, and the resulting inter-annotator agreement. We will also include a small set of annotated examples to permit external scrutiny. If time permits we will attempt a limited external corroboration step; otherwise the internal validation details will be provided to reduce the risk of annotation noise affecting the reported results.
Revision: yes
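The train/validation/test splits promised in the first response could be made reproducible in several ways. One common option, sketched here under assumptions, is to assign each post to a split by hashing its identifier, so the assignment does not depend on file order or a random seed. The function name, the 80/10/10 proportions, and the use of a post ID as the key are all hypothetical; the source does not say how the splits are or will be built.

```python
import hashlib


def assign_split(post_id: str, val_pct: int = 10, test_pct: int = 10) -> str:
    """Map a post ID to 'train', 'validation', or 'test' deterministically."""
    # SHA-256 gives a stable digest across machines and Python versions,
    # unlike the built-in hash(), which is salted per process.
    digest = hashlib.sha256(post_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # integer in 0..99
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "validation"
    return "train"


# Hypothetical post IDs, for illustration only.
splits = {pid: assign_split(pid) for pid in ["0001", "0002", "0003"]}
```

Because the split depends only on the ID, posts added in a later release keep earlier posts in their original split, which helps avoid test-set leakage between dataset versions.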

Circularity Check

0 steps flagged

No circularity: empirical benchmark on externally sourced dataset

The paper curates the XNote dataset from real X posts paired with existing Community Notes and external contexts, then adds topic and deceptive-factor annotations to benchmark LVLM performance on detection and note generation. No equations, fitted parameters, or model predictions appear in the described chain. Evaluation metrics are computed against human-provided annotations and compared to independent baselines (SNIFFER, GPT-5), so results do not reduce to quantities defined by the authors' own prior fits or self-citations. The work is therefore self-contained against external benchmarks and exhibits no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that existing human-written Community Notes and the authors' added annotations of topics and deceptive factors provide a faithful proxy for real-world deception cases; no free parameters or invented entities are introduced.

axioms (2)

domain assumption: Community Notes have emerged as an effective crowd-sourced mechanism for combating online deception.
Opening sentence of the abstract; used to motivate the automation task.
domain assumption: The curated XNote dataset accurately captures real-world image-based contextual deception.
Implicit in the decision to benchmark on this dataset as ground truth.

pith-pipeline@v0.9.0 · 5776 in / 1426 out tokens · 46175 ms · 2026-05-21T10:34:32.168185+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "We curate a real-world dataset, XNote, comprising X posts with associated Community Notes and external contexts, along with annotations of topics and deceptive factors."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "We introduce a new evaluation metric, Context Helpfulness Score (CHS)"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


work page

[1] Sara Abdali, Sina Shaham, and Bhaskar Krishnamachari. Multi-modal misinformation detection: Approaches, challenges and opportunities. ACM Computing Surveys, 57(3):1–29, 2024.

[2] Sahar Abdelnabi, Rakibul Hasan, and Mario Fritz. Open-domain, content-based, multi-modal fact-checking of out-of-context images via online resources. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14940–14949, 2022.

[3] Adobe. Photoshop. https://www.adobe.com/products/photoshop, 2025. Accessed: September, 2025.

[4] Esma Aïmeur, Sabrine Amri, and Gilles Brassard. Fake news, disinformation and misinformation in social media: a review. Social Network Analysis and Mining, 13(1):30, 2023.

[5] Jennifer Allen, Duncan J Watts, and David G Rand. Quantifying the impact of misinformation and vaccine-skeptical content on facebook. Science, 384(6699):eadk3451, 2024.

[6] Amazon. Amazon mechanical turk. https://www.mturk.com/, 2025. Accessed: September, 2025.

[7] Junjie Aw, Jun Jie Benjamin Seng, Sharna Si Ying Seah, and Lian Leng Low. Covid-19 vaccine hesitancy—a scoping review of literature in high-income countries. Vaccines, 9(8):900, 2021.

[8] Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, Humen Zhong, Yuanzhi Zhu, Mingkun Yang, Zhaohai Li, Jianqiang Wan, Pengfei Wang, Wei Ding, Zheren Fu, Yiheng Xu, Jiabo Ye, Xi Zhang, Tianbao Xie, Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, and Junyang Lin. Qwen2.5-vl technical report....

[9] Satanjeev Banerjee and Alon Lavie. Meteor: An automatic metric for mt evaluation with improved correlation with human judgments. In Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, pages 65–72, 2005.

[10] Chia-Yuan Chang, Zhimeng Jiang, Vineeth Rakesh, Menghai Pan, Chin-Chia Michael Yeh, Guanchu Wang, Mingzhi Hu, Zhichao Xu, Yan Zheng, Mahashweta Das, et al. Main-rag: Multi-agent filtering retrieval-augmented generation. arXiv preprint arXiv:2501.00332, 2024.

[11] Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, et al. Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24185–24198, 2024.

[12] Soham De, Michiel A Bakker, Jay Baxter, and Martin Saveski. Supernotes: Driving consensus in crowd-sourced fact-checking. In Proceedings of the ACM on Web Conference 2025, pages 3751–3761, 2025.

[13] Nicholas Dufour, Arkanath Pathak, Pouya Samangouei, Nikki Hariri, Shashi Deshetti, Andrew Dudfield, Christopher Guess, Pablo Hernández Escayola, Bobby Tran, Mevan Babakar, et al. Ammeba: A large-scale survey and dataset of media-based misinformation in-the-wild. arXiv preprint arXiv:2405.11697, 2024.

[14] FactCheck.org. Factcheck. https://www.factcheck.org/, 2025. Accessed: January, 2026.

[15] Google Cloud. Detect web entities and pages. https://cloud.google.com/vision/docs/detecting-web. Accessed: April, 2025.

[17] Google Search Central. Fact check (claimreview) structured data. https://developers.google.com/search/docs/appearance/structured-data/factcheck. Accessed: October, 2025.

[18] Bo Hu, Zhendong Mao, and Yongdong Zhang. An overview of fake news detection: From a new perspective. Fundamental Research, 5(1):332–346, 2025.

[19] LangChain. Langchain community. https://python.langchain.com/api_reference/community/index.html, 2025. Accessed: June, 2025.

[20] Stephan Lewandowsky, Ullrich KH Ecker, John Cook, Sander Van Der Linden, Jon Roozenbeek, and Naomi Oreskes. Misinformation and the epistemic integrity of democracy. Current opinion in psychology, 54:101711, 2023.

[21] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems, 33:9459–9474, 2020.

[22] Yiyi Li and Ying Xie. Is a picture worth a thousand words? an empirical study of image content and social media engagement. Journal of marketing research, 57(1):1–19, 2020.

[23] Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, pages 74–81, 2004.

[24] Li Lin, Neeraj Gupta, Yue Zhang, Hainan Ren, Chun-Hao Liu, Feng Ding, Xin Wang, Xin Li, Luisa Verdoliva, and Shu Hu. Detecting multimedia generated by large ai models: A survey. arXiv preprint arXiv:2402.00045, 2024.

[25] Fuxiao Liu, Yinghan Wang, Tianlu Wang, and Vicente Ordonez. Visual news: Benchmark and challenges in news image captioning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6761–6771, 2021.

[26] Haotian Liu, Chunyuan Li, Yuheng Li, Bo Li, Yuanhan Zhang, Sheng Shen, and Yong Jae Lee. Llavanext: Improved reasoning, ocr, and world knowledge, 2024.

[27] Xuannan Liu, Zekun Li, Peipei Li, Huaibo Huang, Shuhan Xia, Xing Cui, Linzhi Huang, Weihong Deng, and Zhaofeng He. Mmfakebench: A mixed-source multimodal misinformation detection benchmark for lvlms. arXiv preprint arXiv:2406.08772, 2024.

[28] Zhijian Liu, Ligeng Zhu, Baifeng Shi, Zhuoyang Zhang, Yuming Lou, Shang Yang, Haocheng Xi, Shiyi Cao, Yuxian Gu, Dacheng Li, et al. Nvila: Efficient frontier visual language models. arXiv preprint arXiv:2412.04468, 2024.

[29] Steven Loria. Textblob: Simplified text processing. https://textblob.readthedocs.io/en/dev/index.html, 2026. Accessed: January, 2026.

[30] Grace Luo, Trevor Darrell, and Anna Rohrbach. Newsclippings: Automatic generation of out-of-context multimodal media. arXiv preprint arXiv:2104.05893, 2021.

[31] Jiatong Ma, Linmei Hu, Rang Li, and Wenbo Fu. Local: Logical and causal fact-checking with llm-based multi-agents. In Proceedings of the ACM on Web Conference 2025, pages 1614–1625, 2025.

[32] Meta. Introducing community notes. https://www.meta.com/technologies/community-notes/, 2025. Accessed: September, 2025.

[33] Yisroel Mirsky and Wenke Lee. The creation and detection of deepfakes: A survey. ACM computing surveys (CSUR), 54(1):1–41, 2021.

[34] Kai Nakamura, Sharon Levy, and William Yang Wang. r/fakeddit: A new multimodal benchmark dataset for fine-grained fake news detection. arXiv preprint arXiv:1911.03854, 2019.

[35] OpenAI. Openai models. https://platform.openai.com/docs/models/, 2025. Accessed: September, 2025.

[36] Stefanos-Iordanis Papadopoulos, Christos Koutlis, Symeon Papadopoulos, and Panagiotis C Petrantonakis. Verite: a robust benchmark for multimodal misinformation detection accounting for unimodal bias. International Journal of Multimedia Information Retrieval, 13(1):4, 2024.

[37] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pages 311–318, 2002.

[38] Gordon Pennycook and David G Rand. The psychology of fake news. Trends in cognitive sciences, 25(5):388–402, 2021.

[39] Peng Qi, Zehong Yan, Wynne Hsu, and Mong Li Lee. Sniffer: Multimodal large language model for explainable out-of-context misinformation detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 13052–13062, 2024.

[40] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. arXiv preprint arXiv:2103.00020, 2021.

[41] Aman Rangapur, Haoran Wang, Ling Jian, and Kai Shu. Fin-fact: A benchmark dataset for multimodal financial fact-checking and explanation generation. In Companion Proceedings of the ACM on Web Conference 2025, pages 785–788, 2025.

[42] Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using siamese bert-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, pages 3982–3992, 2019. doi:10.18653/v1/D19-1410.

[43] Melanie Revilla and Jan Karem Höhne. How long do respondents think online surveys should be? new evidence from two online panels in germany. International Journal of Market Research, 62(5):538–545, 2020.

[44] Alireza Salemi and Hamed Zamani. Evaluating retrieval quality in retrieval-augmented generation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2395–2400, 2024.

[45] Schema.org. Claimreview — schema.org type. https://schema.org/ClaimReview. Accessed: October, 2025.

[46] Snopes, Inc. Snopes. https://www.snopes.com/. Accessed: January, 2026.

[48] Kirill Solovev and Nicolas Pröllochs. References to unbiased sources increase the helpfulness of community fact-checks. Scientific Reports, 15(1):25749, 2025.

[49] Charles Spearman. The proof and measurement of association between two things. 1961.

[50] The Poynter Institute. Politifact. https://www.politifact.com/, 2026. Accessed: January, 2026.

[51] Michail Tsikerdekis and Sherali Zeadally. Online deception in social media. Communications of the ACM, 57(9):72–80, 2014.

[52] Bo Wang, Jing Ma, Hongzhan Lin, Zhiwei Yang, Ruichao Yang, Yuan Tian, and Yi Chang. Explainable fake news detection with large language model via defense among competing wisdom. In Proceedings of the ACM Web Conference 2024, pages 2452–2463, 2024.

[53] Yuping Wang, Fatemeh Tahmasbi, Jeremy Blackburn, Barry Bradlyn, Emiliano De Cristofaro, David Magerman, Savvas Zannettou, and Gianluca Stringhini. Understanding the use of fauxtography on social media. In Proceedings of the International AAAI Conference on Web and Social Media, volume 15, pages 776–786, 2021.

[54] Mika Westerlund. The emergence of deepfake technology: A review. Technology innovation management review, 9(11), 2019.

[55] X Corp. X community notes. https://communitynotes.x.com/guide/en/about/introduction, 2025. Accessed: September, 2025.

[56] X Corp. X developer platform api. https://developer.x.com/en/portal/dashboard, 2025. Accessed: April, 2025.

[57] Qingzheng Xu, Heming Du, Huiqiang Chen, Bo Liu, and Xin Yu. Mmooc: A multimodal misinformation dataset for out-of-context news analysis. In Australasian Conference on Information Security and Privacy, pages 444–459. Springer, 2024.

[58] Yunkang Yang, Trevor Davis, and Matthew Hindman. Visual misinformation on facebook. Journal of Communication, 73(4):316–328, 2023.

[59] Xin Yuan, Jie Guo, Weidong Qiu, Zheng Huang, and Shujun Li. Support or refute: Analyzing the stance of evidence to detect out-of-context mis-and disinformation. arXiv preprint arXiv:2311.01766, 2023.

[60] Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter Liu. Pegasus: Pre-training with extracted gap-sentences for abstractive summarization. In International conference on machine learning, pages 11328–11339. PMLR, 2020.

A XCHECK Dataset

A.1 Topics and Factors Classification

We used OpenAI GPT5 with the Prompt 1 to assign topical categories to each post. For...

- Identify the post’s main claim from the image, text, and date.
- If the claim is based on the image, check whether the image’s visual details and factual context support or contradict it.
- If the claim does not rely on the image, use knowledge and facts to support or contradict the claim.
- If external context is provided, use the provided context to support or contradict the claim.
- If any contradiction is found (e.g., claim vs. image, claim vs. knowledge, claim vs. external context), label “Deceptive”; if none, label “Non-deceptive”.

OUTPUT FORMAT (clear, unbiased, factual, relevant):
- Begin with “Deceptive” or “Non-deceptive”.
- Follow with 1-2 sentences citing specific visual details, knowledge, or relevant context.

EXTERNAL CO...
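The detection prompt asks the model to begin its answer with “Deceptive” or “Non-deceptive” and follow with a short rationale. A minimal sketch of how such a response could be split into a label and a rationale is shown below. The function name `parse_verdict` and the tolerance for quotes and punctuation are illustrative assumptions, not part of the paper's pipeline.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper: matches a leading "Deceptive" / "Non-deceptive" label,
# optionally wrapped in straight or curly double quotes, case-insensitively.
_VERDICT = re.compile(r'^\s*["“]?(non-deceptive|deceptive)\b', re.IGNORECASE)


def parse_verdict(response: str) -> Tuple[Optional[str], str]:
    """Return (label, rationale); label is None if the format is not followed."""
    match = _VERDICT.match(response)
    if match is None:
        # The model ignored the output format; keep the text for inspection.
        return None, response.strip()
    found = match.group(1).lower()
    label = "Non-deceptive" if found.startswith("non") else "Deceptive"
    # Drop separators between the label and the explanation.
    rationale = response[match.end():].lstrip(' \t\n"”.:;,-')
    return label, rationale


if __name__ == "__main__":
    print(parse_verdict("Deceptive. The photo was taken in 2015."))
```

Anchoring the match at the start of the response means a label mentioned only later in the text is treated as a format violation rather than silently accepted.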

- Source Credibility: cites reliable, trustworthy sources.
- Clarity: concise and easy to understand.
- Relevance: directly addresses the post’s image/text and context.
- Veracity: factually correct and evidence-based.
- Neutrality: neutral tone, no cultural/personal bias.

OUTPUT FORMAT:
- Begin with “Option X”, where X is the option number.
- Follow with 1-2 sentences explaining why this option is best.

POST DETAILS: Image: <image>; Text: <text>; Date: <date>

EVALUATION OPTIONS: [1. {Note 1}, 2. {Note 2}, . . .]

•Source Credibility •Clarity •Relevance •Font Size •Veracit...
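The evaluation prompt asks the judge to begin with “Option X”, where X is the option number. As a rough sketch of how that choice could be extracted and range-checked, consider the following. The name `parse_choice` and the decision to reject out-of-range numbers are assumptions made for illustration only.

```python
import re
from typing import Optional

# Hypothetical helper: matches a leading "Option <number>", optionally quoted.
_OPTION = re.compile(r'^\s*["“]?option\s+(\d+)\b', re.IGNORECASE)


def parse_choice(response: str, num_options: int) -> Optional[int]:
    """Return the 1-based option chosen by the judge, or None if invalid."""
    match = _OPTION.match(response)
    if match is None:
        return None  # the response does not start with "Option X"
    choice = int(match.group(1))
    # Reject numbers that do not correspond to one of the listed notes.
    return choice if 1 <= choice <= num_options else None


if __name__ == "__main__":
    print(parse_choice("Option 2. It cites a reliable source.", num_options=3))
```

Returning `None` for malformed or out-of-range answers keeps invalid judgments out of any downstream tally instead of counting them toward an arbitrary note.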