Fall into a Pit, Gain in a Wit: Cognitive-Guided Harmful Meme Detection via Misjudgment Risk Pattern Retrieval

Feiyan Duan; Jie Huang; Junjie Wang; Mingyang Li; Qing Wang; Wenshuo Wang; Yuekai Huang; Zhiyuan Chang; Ziyou Jiang

arxiv: 2510.15946 · v3 · submitted 2025-10-10 · 💻 cs.LG · cs.AI· cs.CR

Fall into a Pit, Gain in a Wit: Cognitive-Guided Harmful Meme Detection via Misjudgment Risk Pattern Retrieval

Wenshuo Wang , Ziyou Jiang , Junjie Wang , Mingyang Li , Jie Huang , Yuekai Huang , Zhiyuan Chang , Feiyan Duan

show 1 more author

Qing Wang

This is my paper

Pith reviewed 2026-05-18 08:38 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.CR

keywords harmful meme detectionmisjudgment risk patternsmultimodal large language modelsknowledge base retrievalfalse negative false positive mitigationimplicit rhetorical devicesadversarial robustness

0 comments

The pith

Retrieving misjudgment risk patterns guides multimodal models to detect harmful memes with fewer errors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PatMD, which builds a knowledge base by breaking down memes into patterns that explain why they might be misjudged as harmful when they are not or missed when they are. For any new meme, the system pulls the most relevant of these patterns and feeds them into a multimodal large language model to steer its reasoning away from those specific pitfalls. This moves detection past simple content matching toward anticipating common judgment failures on irony, metaphor, and other subtle cues. Experiments across more than six thousand memes in five tasks show consistent gains over prior methods and stronger results on memes the model has not seen before.

Core claim

PatMD detects harmful memes by first constructing a knowledge base in which each meme is paired with a misjudgment risk pattern that explains whether it tends to produce false negatives or false positives, then retrieving the most relevant patterns for a target meme and using them to dynamically guide the reasoning steps of a multimodal large language model.

What carries the argument

Misjudgment risk pattern retrieval, which deconstructs each meme into an explicit account of why an MLLM is likely to overlook its harm or overread harm into benign content and then injects those accounts into the model's current reasoning trace.

If this is right

Detection performance rises by roughly 8 percent in F1-score and 7.7 percent in accuracy across multiple harmful-meme benchmarks.
The method maintains its advantage on memes that were never seen during training or that contain deliberate adversarial changes.
Guidance from past error patterns reduces both missed harmful content and false alarms on harmless content.
The same retrieval step can be applied to any multimodal large language model without retraining the underlying model.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may transfer to other domains where models must interpret subtle or ironic multimodal signals, such as short videos or captioned images.
Automating the initial construction of the misjudgment pattern base could reduce the manual effort required to scale the method.
The technique highlights a general strategy of using recorded model failures as a form of external memory to improve future decisions.

Load-bearing premise

The patterns stored in the knowledge base must correctly identify the real reasons models err on memes, and feeding those patterns into the model must not introduce fresh mistakes or biases of its own.

What would settle it

Running PatMD on a fresh set of memes where the retrieved patterns produce no accuracy gain or even lower scores than the unaided model would demonstrate that the patterns are not the source of the reported improvement.

Figures

Figures reproduced from arXiv: 2510.15946 by Feiyan Duan, Jie Huang, Junjie Wang, Mingyang Li, Qing Wang, Wenshuo Wang, Yuekai Huang, Zhiyuan Chang, Ziyou Jiang.

**Figure 2.** Figure 2: The overview of PatMD. decision-centric perspective allows it to identify and guard against specific reasoning paths that lead to errors. 3 Overview of PatMD [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: The example of PatMD hierarchical harm tree and structural matching versus content matching. 3.1.2 Causal Misjudgment Attribution. Building upon the deconstructed harm profile, we then derive the misjudgment risk pattern. We analyze how the identified harmful concepts could lead an MLLM to specific misjudgments (e.g., overlooking ironic cues or misinterpreting contextual references), and produce the misju… view at source ↗

**Figure 4.** Figure 4: The case study of PatMD. which are strictly excluded from the previous training and testing datasets. As shown in [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 6.** Figure 6: Impact of the retrieved case’s number (𝐾). In addition to the ablation study in the Section 5.2, we also investigated the impact of the number of retrieved samples (top-𝐾) on model performance, with the results shown in [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

**Figure 7.** Figure 7: The case study of PatMD on four examples. “Harmful” Samples “Benign” Samples [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗

**Figure 8.** Figure 8: Examples of unseen memes [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗

**Figure 9.** Figure 9: Prompt for hierarchical harm tree construction. [PITH_FULL_IMAGE:figures/full_fig_p013_9.png] view at source ↗

**Figure 10.** Figure 10: Prompt for harmful memes detection [PITH_FULL_IMAGE:figures/full_fig_p013_10.png] view at source ↗

**Figure 11.** Figure 11: Prompt for misjudgment risk pattern generation. [PITH_FULL_IMAGE:figures/full_fig_p014_11.png] view at source ↗

read the original abstract

Internet memes have emerged as a popular multimodal medium, yet they are increasingly weaponized to convey harmful opinions through subtle rhetorical devices like irony and metaphor. Existing detection approaches, including Multimodal Large Language Model (MLLM)-based techniques, struggle with these implicit expressions, leading to frequent misjudgments. This paper introduces PatMD, a novel approach that detects harmful memes by learning from and proactively mitigating these potential misjudgment risks. Our core idea is to move beyond superficial content-level matching and instead identify the underlying misjudgment risk patterns, proactively guiding the MLLMs to avoid known misjudgment pitfalls. We first construct a knowledge base where each meme is deconstructed into a misjudgment risk pattern explaining why it might be misjudged, either overlooking harmful undertones (false negative) or overinterpreting benign content (false positive). For a given target meme, PatMD retrieves relevant patterns and utilizes them to dynamically guide the MLLM's reasoning. Experiments on a benchmark of 6,626 memes across 5 harmful detection tasks show that PatMD outperforms state-of-the-art baselines, achieving an average of 8.30% improvement in F1-score and 7.71% improvement in accuracy, while exhibiting consistent robustness on unseen and adversarial memes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces PatMD, a method for harmful meme detection that constructs a knowledge base by deconstructing each meme into a misjudgment risk pattern (explaining potential false-negative overlooking of harm or false-positive overinterpretation of benign content) and retrieves relevant patterns to dynamically guide MLLM reasoning. It reports that this yields an average 8.30% F1-score and 7.71% accuracy improvement over state-of-the-art baselines on a 6,626-meme benchmark spanning 5 tasks, with added robustness to unseen and adversarial memes.

Significance. If the central mechanism is shown to be causally responsible for the gains, the work offers a cognitively motivated way to mitigate implicit rhetorical devices in multimodal content, which could improve reliability of MLLM-based moderation systems. The scale of the benchmark and focus on proactive error-pattern retrieval are strengths that, if substantiated, would advance the subfield beyond standard prompting or fine-tuning approaches.

major comments (3)

[§3] §3 (Method, knowledge-base construction): The deconstruction of memes into misjudgment risk patterns lacks any reported validation against actual observed MLLM errors on held-out data (e.g., no correlation analysis, inter-annotator agreement with model outputs, or ablation measuring error reduction attributable to specific patterns). Without this, the patterns may function as post-hoc rationalizations, and the attribution of the 8.30% F1 gain specifically to risk-pattern retrieval cannot be verified.
[§4.2] §4.2 (Experiments, baselines and ablations): The manuscript provides no statistical significance tests, confidence intervals, or controls for confounding factors such as variations in prompt engineering or retrieval hyperparameters; this makes it impossible to isolate whether the reported improvements and robustness on adversarial memes stem from the proposed mechanism rather than other implementation choices.
[§4.1] §4.1 (Retrieval mechanism): The description of how patterns are retrieved and injected into the MLLM prompt is insufficiently detailed to assess whether the process introduces new biases or spurious guidance, which directly bears on the weakest assumption that retrieved patterns reliably mitigate the intended misjudgments.

minor comments (2)

[Abstract] The abstract and §4 should explicitly name the five harmful detection tasks and the exact baseline models to allow readers to assess the scope of the claimed gains.
[§3] Notation for false-negative vs. false-positive patterns could be made more consistent across figures and text to improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have carefully reviewed each major comment and provide point-by-point responses below, outlining the revisions we plan to incorporate to address the concerns raised.

read point-by-point responses

Referee: [§3] §3 (Method, knowledge-base construction): The deconstruction of memes into misjudgment risk patterns lacks any reported validation against actual observed MLLM errors on held-out data (e.g., no correlation analysis, inter-annotator agreement with model outputs, or ablation measuring error reduction attributable to specific patterns). Without this, the patterns may function as post-hoc rationalizations, and the attribution of the 8.30% F1 gain specifically to risk-pattern retrieval cannot be verified.

Authors: We agree that direct validation of the misjudgment risk patterns against observed MLLM errors on held-out data would strengthen the causal link to the reported gains. The patterns in the current manuscript are constructed from cognitive principles of meme interpretation drawn from prior literature on rhetorical devices and misjudgment risks. In the revised manuscript, we will add a dedicated analysis that correlates retrieved patterns with specific error categories (false negatives and false positives) made by baseline MLLMs on a held-out portion of the benchmark. We will also include an ablation that quantifies error reduction when patterns are used versus withheld, to better demonstrate that the patterns proactively mitigate the targeted misjudgments rather than serving solely as post-hoc rationalizations. revision: yes
Referee: [§4.2] §4.2 (Experiments, baselines and ablations): The manuscript provides no statistical significance tests, confidence intervals, or controls for confounding factors such as variations in prompt engineering or retrieval hyperparameters; this makes it impossible to isolate whether the reported improvements and robustness on adversarial memes stem from the proposed mechanism rather than other implementation choices.

Authors: We acknowledge that the absence of statistical significance testing and explicit controls for confounding factors such as prompt variations and retrieval hyperparameters weakens the isolation of the proposed mechanism's contribution. In the revised manuscript, we will add paired statistical tests (e.g., McNemar's test) along with 95% confidence intervals for all F1 and accuracy metrics across the five tasks. We will further include additional ablation experiments that systematically vary prompt templates and retrieval hyperparameters (including top-k values and similarity thresholds) while holding other components fixed, to show that the observed gains and robustness to adversarial memes remain consistent and are primarily driven by the risk-pattern retrieval component. revision: yes
Referee: [§4.1] §4.1 (Retrieval mechanism): The description of how patterns are retrieved and injected into the MLLM prompt is insufficiently detailed to assess whether the process introduces new biases or spurious guidance, which directly bears on the weakest assumption that retrieved patterns reliably mitigate the intended misjudgments.

Authors: We thank the referee for highlighting the need for greater specificity in the retrieval and injection process. The original manuscript provides only a high-level description in Section 4.1. In the revised version, we will substantially expand this section with a precise algorithmic account, including the embedding model and similarity function used for retrieval, the exact prompt template for pattern injection, and concrete examples illustrating how a retrieved pattern steers the MLLM away from a targeted misjudgment. We will also add a short discussion of potential biases that could arise from retrieval and how the cognitive grounding of the patterns is intended to reduce the risk of spurious guidance. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external benchmark evaluation

full rationale

The paper constructs a knowledge base of misjudgment risk patterns from memes and retrieves them to guide MLLM reasoning for harmful meme detection, with performance measured on an external benchmark of 6,626 memes across 5 tasks. No equations, self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or description. The claimed improvements (8.30% F1, 7.71% accuracy) are presented as empirical outcomes rather than by-construction equivalences to inputs. The approach is self-contained against external benchmarks, consistent with a normal non-circular finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are detailed beyond the core assumption that risk patterns exist and can be retrieved usefully.

axioms (1)

domain assumption MLLMs can be effectively guided by retrieved misjudgment risk patterns to reduce false negatives and false positives
Central premise of the PatMD approach stated in the abstract.

pith-pipeline@v0.9.0 · 5798 in / 1216 out tokens · 56007 ms · 2026-05-18T08:38:33.415984+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 7 internal anchors

[1]

Piush Aggarwal, Pranit Chawla, Mithun Das, Punyajoy Saha, Binny Mathew, Torsten Zesch, and Animesh Mukherjee. 2023. Hateproof: Are hateful meme detection systems really robust?. InProceedings of the ACM Web Conference 2023. 3734–3743

work page 2023
[2]

Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi

work page
[3]

Self-rag: Learning to retrieve, generate, and critique through self-reflection. (2024)

work page 2024
[4]

Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. 2025. Qwen2.5-vl technical report. arXiv preprint arXiv:2502.13923(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[5]

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners.Advances in neural information processing systems33 (2020), 1877–1901

work page 2020
[6]

Yitao Cai, Huiyu Cai, and Xiaojun Wan. 2019. Multi-modal sarcasm detection in twitter with hierarchical fusion model. InProceedings of the 57th annual meeting of the association for computational linguistics. 2506–2515

work page 2019
[7]

Rui Cao, Ming Shan Hee, Adriel Kuek, Wen-Haw Chong, Roy Ka-Wei Lee, and Jing Jiang. 2023. Pro-cap: Leveraging a frozen vision-language model for hateful meme detection. InProceedings of the 31st ACM international conference on multimedia. 5244–5252

work page 2023
[8]

Rui Cao, Roy Ka-Wei Lee, Wen-Haw Chong, and Jing Jiang. 2022. Prompting for Multimodal Hateful Meme Classification. InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 321–332

work page 2022
[9]

Rui Cao, Roy Ka-Wei Lee, and Jing Jiang. 2024. Modularized networks for few- shot hateful meme detection. InProceedings of the ACM Web Conference 2024. 4575–4584

work page 2024
[10]

Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, and Jingjing Liu. 2020. Uniter: Universal image-text representation learning. InEuropean conference on computer vision. Springer, 104–120

work page 2020
[11]

Zixin Chen, Hongzhan Lin, Kaixin Li, Ziyang Luo, Zhen Ye, Guang Chen, Zhiyong Huang, and Jing Ma. 2025. AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics. 4234–4253

work page 2025
[12]

Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Se- bastian Gehrmann, et al. 2023. Palm: Scaling language modeling with pathways. Journal of Machine Learning Research24, 240 (2023), 1–113

work page 2023
[13]

Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, et al. 2024. Scaling instruction-finetuned language models.Journal of Machine Learning Research25, 70 (2024), 1–53

work page 2024
[14]

Elisabetta Fersini, Francesca Gasparini, Giulia Rizzi, Aurora Saibene, Berta Chulvi, Paolo Rosso, Alyssa Lees, and Jeffrey Sorensen. 2022. SemEval-2022 Task 5: Multi- media automatic misogyny identification. InProceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022). 533–549

work page 2022
[15]

Michael Günther, Saba Sturua, Mohammad Kalim Akram, Isabelle Mohr, Andrei Ungureanu, Bo Wang, Sedigheh Eslami, Scott Martens, Maximilian Werk, Nan Wang, et al. 2025. jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval.arXiv preprint arXiv:2506.18902(2025)

work page arXiv 2025
[16]

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. 2020. Retrieval augmented language model pre-training. InInternational conference on machine learning. PMLR, 3929–3938

work page 2020
[17]

Ming Shan Hee, Roy Ka-Wei Lee, and Wen-Haw Chong. 2022. On explaining multimodal hateful meme detection models. InProceedings of the ACM web conference 2022. 3651–3655

work page 2022
[18]

Jianzhao Huang, Hongzhan Lin, Liu Ziyan, Ziyang Luo, Guang Chen, and Jing Ma. 2024. Towards Low-Resource Harmful Meme Detection with LMM Agents. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2269–2293

work page 2024
[19]

Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. 2024. Gpt-4o system card.arXiv preprint arXiv:2410.21276(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[20]

Junhui Ji, Xuanrui Lin, and Usman Naseem. 2024. Capalign: Improving cross modal alignment via informative captioning for harmful meme detection. In Proceedings of the ACM Web Conference 2024. 4585–4594

work page 2024
[21]

Junhui Ji, Wei Ren, and Usman Naseem. 2023. Identifying Creative Harmful Memes via Prompt based Approach. InProceedings of the ACM Web Conference 2023 (WWW ’23). Association for Computing Machinery, 3868–3872. doi:10.1145/ 3543507.3587427

work page arXiv 2023
[22]

Zhengbao Jiang, Frank F Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, and Graham Neubig. 2023. Active retrieval augmented generation. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 7969–7992

work page 2023
[23]

Douwe Kiela, Suvrat Bhooshan, Hamed Firooz, Ethan Perez, and Davide Testug- gine. 2019. Supervised multimodal bitransformers for classifying images and text.arXiv preprint arXiv:1909.02950(2019)

work page arXiv 2019
[24]

Douwe Kiela, Hamed Firooz, Aravind Mohan, Vedanuj Goswami, Amanpreet Singh, Pratik Ringshia, and Davide Testuggine. 2020. The hateful memes chal- lenge: Detecting hate speech in multimodal memes.Advances in neural informa- tion processing systems33 (2020), 2611–2624

work page 2020
[25]

Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. 2022. Large language models are zero-shot reasoners.Advances in neural information processing systems35 (2022), 22199–22213

work page 2022
[26]

Gonzalez, Hao Zhang, and Ion Stoica

Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica. 2023. Efficient Mem- ory Management for Large Language Model Serving with PagedAttention. In Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles

work page 2023
[27]

Roy Ka-Wei Lee, Rui Cao, Ziqing Fan, Jing Jiang, and Wen-Haw Chong. 2021. Disentangling hate in online memes. InProceedings of the 29th ACM international conference on multimedia. 5138–5147

work page 2021
[28]

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems33 (2020), 9459–9474

work page 2020
[29]

Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, and Kai-Wei Chang

work page
[30]

VisualBERT: A Simple and Performant Baseline for Vision and Language

VisualBERT: A simple and performant baseline for vision and language. arXiv preprint arXiv:1908.03557(2019)

work page internal anchor Pith review Pith/arXiv arXiv 1908
[31]

Hongzhan Lin, Ziyang Luo, Jing Ma, and Long Chen. 2023. Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models. InFindings of the Association for Computational Linguistics: EMNLP 2023, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, Singapore, 9114–9128. doi:...

work page doi:10.18653/v1/2023.findings- 2023
[32]

Hongzhan Lin, Ziyang Luo, Bo Wang, Ruichao Yang, and Jing Ma. 2024. Goat- bench: Safety insights to large multimodal models through meme-based social abuse.ACM Transactions on Intelligent Systems and Technology(2024)

work page 2024
[33]

Xuanrui Lin, Chao Jia, Junhui Ji, Hui Han, and Usman Naseem. 2025. Ask, acquire, understand: A multimodal agent-based framework for social abuse detection in memes. InProceedings of the ACM on Web Conference 2025. 4734–4744

work page 2025
[34]

Phillip Lippe, Nithin Holla, Shantanu Chandra, Santhosh Rajamanickam, Geor- gios Antoniou, Ekaterina Shutova, and Helen Yannakoudakis. 2020. A multimodal framework for the detection of hateful memes.arXiv preprint arXiv:2012.12871 (2020)

work page arXiv 2020
[35]

Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. 2024. Improved baselines with visual instruction tuning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 26296–26306

work page 2024
[36]

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach.arXiv preprint arXiv:1907.11692 (2019)

work page internal anchor Pith review Pith/arXiv arXiv 2019
[37]

Ziyan Liu, Chunxiao Fan, Haoran Lou, Yuexin Wu, and Kaiwei Deng. 2025. MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection. InProceedings Conference acronym ’XX, June 03–05, 2018, Woodstock, NY Wang et al. of the 63rd Annual Meeting of the Association for Computational Linguistics. 923– 947

work page 2025
[38]

Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. 2019. ViLBERT: Pretrain- ing task-agnostic visiolinguistic representations for vision-and-language tasks. Advances in neural information processing systems32 (2019)

work page 2019
[39]

Abdullah Mazhar, Zuhair Hasan Shaik, Aseem Srivastava, Polly Ruhnke, La- vanya Vaddavalli, Sri Keshav Katragadda, Shweta Yadav, and Md Shad Akhtar

work page
[40]

InProceedings of the ACM on Web Conference 2025

Figurative-cum-Commonsense Knowledge Infusion for Multimodal Mental Health Meme Classification. InProceedings of the ACM on Web Conference 2025. 637–648

work page 2025
[41]

Jingbiao Mei, Jinghong Chen, Guangyu Yang, Weizhe Lin, and Bill Byrne. 2025. Robust adaptation of large multimodal models for retrieval augmented hateful meme detection. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 23817–23839

work page 2025
[42]

Niklas Muennighoff. 2020. Vilio: State-of-the-art visio-linguistic models applied to hateful memes.arXiv preprint arXiv:2012.07788(2020)

work page arXiv 2020
[43]

Shad Akhtar, Preslav Nakov, and Tanmoy Chakraborty

Shraman Pramanick, Shivam Sharma, Dimitar Dimitrov, Md. Shad Akhtar, Preslav Nakov, and Tanmoy Chakraborty. 2021. MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets. InFindings of the Associa- tion for Computational Linguistics: EMNLP 2021. Association for Computational Linguistics, 4439–4455. doi:10.18653/v1/2021.findings-emnlp.379

work page doi:10.18653/v1/2021.findings-emnlp.379 2021
[44]

Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, and Yoav Shoham. 2023. In-context retrieval-augmented language models.Transactions of the Association for Computational Linguistics11 (2023), 1316–1331

work page 2023
[45]

Vlad Sandulescu. 2020. Detecting hateful memes using a multimodal deep en- semble.arXiv preprint arXiv:2012.13235(2020)

work page arXiv 2020
[46]

Shivam Sharma, Firoj Alam, Md Shad Akhtar, Dimitar Dimitrov, Giovanni Da San Martino, Hamed Firooz, Alon Halevy, Fabrizio Silvestri, Preslav Nakov, and Tanmoy Chakraborty. 2022. Detecting and understanding harmful memes: A survey.arXiv preprint arXiv:2205.04274(2022)

work page arXiv 2022
[47]

Shardul Suryawanshi, Bharathi Raja Chakravarthi, Mihael Arcan, and Paul Buite- laar. 2020. Multimodal meme dataset (MultiOFF) for identifying offensive content in image and text. InProceedings of the second workshop on trolling, aggression and cyberbullying. 32–41

work page 2020
[48]

Weiyun Wang, Zhangwei Gao, Lixin Gu, Hengjun Pu, Long Cui, Xingguang Wei, Zhaoyang Liu, Linglin Jing, Shenglong Ye, Jie Shao, et al. 2025. Internvl3.5: Ad- vancing open-source multimodal models in versatility, reasoning, and efficiency. arXiv preprint arXiv:2508.18265(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[49]

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models.Advances in neural information processing systems35 (2022), 24824–24837

work page 2022
[50]

Zhepei Wei, Wei-Lin Chen, and Yu Meng. 2024. Instructrag: Instructing retrieval-augmented generation via self-synthesized rationales.arXiv preprint arXiv:2406.13629(2024)

work page arXiv 2024
[51]

Zhuosheng Zhang, Aston Zhang, Mu Li, and Alex Smola. 2022. Automatic chain of thought prompting in large language models.arXiv preprint arXiv:2210.03493 (2022)

work page internal anchor Pith review Pith/arXiv arXiv 2022
[52]

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Jinguo Zhu, Weiyun Wang, Zhe Chen, Zhaoyang Liu, Shenglong Ye, Lixin Gu, Hao Tian, Yuchen Duan, Weijie Su, Jie Shao, et al. 2025. Internvl3: Exploring advanced training and test-time recipes for open-source multimodal models. arXiv preprint arXiv:2504.10479(2025). A Analysis of Retrieved Samples (𝐾) 72 74 76 78 80 1 2 3 4 5 6 top-K 54 56 58 60 62 64 66 Ha...

work page internal anchor Pith review Pith/arXiv arXiv 2025

[1] [1]

Piush Aggarwal, Pranit Chawla, Mithun Das, Punyajoy Saha, Binny Mathew, Torsten Zesch, and Animesh Mukherjee. 2023. Hateproof: Are hateful meme detection systems really robust?. InProceedings of the ACM Web Conference 2023. 3734–3743

work page 2023

[2] [2]

Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi

work page

[3] [3]

Self-rag: Learning to retrieve, generate, and critique through self-reflection. (2024)

work page 2024

[4] [4]

Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. 2025. Qwen2.5-vl technical report. arXiv preprint arXiv:2502.13923(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[5] [5]

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners.Advances in neural information processing systems33 (2020), 1877–1901

work page 2020

[6] [6]

Yitao Cai, Huiyu Cai, and Xiaojun Wan. 2019. Multi-modal sarcasm detection in twitter with hierarchical fusion model. InProceedings of the 57th annual meeting of the association for computational linguistics. 2506–2515

work page 2019

[7] [7]

Rui Cao, Ming Shan Hee, Adriel Kuek, Wen-Haw Chong, Roy Ka-Wei Lee, and Jing Jiang. 2023. Pro-cap: Leveraging a frozen vision-language model for hateful meme detection. InProceedings of the 31st ACM international conference on multimedia. 5244–5252

work page 2023

[8] [8]

Rui Cao, Roy Ka-Wei Lee, Wen-Haw Chong, and Jing Jiang. 2022. Prompting for Multimodal Hateful Meme Classification. InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 321–332

work page 2022

[9] [9]

Rui Cao, Roy Ka-Wei Lee, and Jing Jiang. 2024. Modularized networks for few- shot hateful meme detection. InProceedings of the ACM Web Conference 2024. 4575–4584

work page 2024

[10] [10]

Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, and Jingjing Liu. 2020. Uniter: Universal image-text representation learning. InEuropean conference on computer vision. Springer, 104–120

work page 2020

[11] [11]

Zixin Chen, Hongzhan Lin, Kaixin Li, Ziyang Luo, Zhen Ye, Guang Chen, Zhiyong Huang, and Jing Ma. 2025. AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics. 4234–4253

work page 2025

[12] [12]

Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Se- bastian Gehrmann, et al. 2023. Palm: Scaling language modeling with pathways. Journal of Machine Learning Research24, 240 (2023), 1–113

work page 2023

[13] [13]

Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, et al. 2024. Scaling instruction-finetuned language models.Journal of Machine Learning Research25, 70 (2024), 1–53

work page 2024

[14] [14]

Elisabetta Fersini, Francesca Gasparini, Giulia Rizzi, Aurora Saibene, Berta Chulvi, Paolo Rosso, Alyssa Lees, and Jeffrey Sorensen. 2022. SemEval-2022 Task 5: Multi- media automatic misogyny identification. InProceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022). 533–549

work page 2022

[15] [15]

Michael Günther, Saba Sturua, Mohammad Kalim Akram, Isabelle Mohr, Andrei Ungureanu, Bo Wang, Sedigheh Eslami, Scott Martens, Maximilian Werk, Nan Wang, et al. 2025. jina-embeddings-v4: Universal Embeddings for Multimodal Multilingual Retrieval.arXiv preprint arXiv:2506.18902(2025)

work page arXiv 2025

[16] [16]

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. 2020. Retrieval augmented language model pre-training. InInternational conference on machine learning. PMLR, 3929–3938

work page 2020

[17] [17]

Ming Shan Hee, Roy Ka-Wei Lee, and Wen-Haw Chong. 2022. On explaining multimodal hateful meme detection models. InProceedings of the ACM web conference 2022. 3651–3655

work page 2022

[18] [18]

Jianzhao Huang, Hongzhan Lin, Liu Ziyan, Ziyang Luo, Guang Chen, and Jing Ma. 2024. Towards Low-Resource Harmful Meme Detection with LMM Agents. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2269–2293

work page 2024

[19] [19]

Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. 2024. Gpt-4o system card.arXiv preprint arXiv:2410.21276(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[20] [20]

Junhui Ji, Xuanrui Lin, and Usman Naseem. 2024. Capalign: Improving cross modal alignment via informative captioning for harmful meme detection. In Proceedings of the ACM Web Conference 2024. 4585–4594

work page 2024

[21] [21]

Junhui Ji, Wei Ren, and Usman Naseem. 2023. Identifying Creative Harmful Memes via Prompt based Approach. InProceedings of the ACM Web Conference 2023 (WWW ’23). Association for Computing Machinery, 3868–3872. doi:10.1145/ 3543507.3587427

work page arXiv 2023

[22] [22]

Zhengbao Jiang, Frank F Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, and Graham Neubig. 2023. Active retrieval augmented generation. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 7969–7992

work page 2023

[23] [23]

Douwe Kiela, Suvrat Bhooshan, Hamed Firooz, Ethan Perez, and Davide Testug- gine. 2019. Supervised multimodal bitransformers for classifying images and text.arXiv preprint arXiv:1909.02950(2019)

work page arXiv 2019

[24] [24]

Douwe Kiela, Hamed Firooz, Aravind Mohan, Vedanuj Goswami, Amanpreet Singh, Pratik Ringshia, and Davide Testuggine. 2020. The hateful memes chal- lenge: Detecting hate speech in multimodal memes.Advances in neural informa- tion processing systems33 (2020), 2611–2624

work page 2020

[25] [25]

Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. 2022. Large language models are zero-shot reasoners.Advances in neural information processing systems35 (2022), 22199–22213

work page 2022

[26] [26]

Gonzalez, Hao Zhang, and Ion Stoica

Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica. 2023. Efficient Mem- ory Management for Large Language Model Serving with PagedAttention. In Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles

work page 2023

[27] [27]

Roy Ka-Wei Lee, Rui Cao, Ziqing Fan, Jing Jiang, and Wen-Haw Chong. 2021. Disentangling hate in online memes. InProceedings of the 29th ACM international conference on multimedia. 5138–5147

work page 2021

[28] [28]

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems33 (2020), 9459–9474

work page 2020

[29] [29]

Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, and Kai-Wei Chang

work page

[30] [30]

VisualBERT: A Simple and Performant Baseline for Vision and Language

VisualBERT: A simple and performant baseline for vision and language. arXiv preprint arXiv:1908.03557(2019)

work page internal anchor Pith review Pith/arXiv arXiv 1908

[31] [31]

Hongzhan Lin, Ziyang Luo, Jing Ma, and Long Chen. 2023. Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models. InFindings of the Association for Computational Linguistics: EMNLP 2023, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, Singapore, 9114–9128. doi:...

work page doi:10.18653/v1/2023.findings- 2023

[32] [32]

Hongzhan Lin, Ziyang Luo, Bo Wang, Ruichao Yang, and Jing Ma. 2024. Goat- bench: Safety insights to large multimodal models through meme-based social abuse.ACM Transactions on Intelligent Systems and Technology(2024)

work page 2024

[33] [33]

Xuanrui Lin, Chao Jia, Junhui Ji, Hui Han, and Usman Naseem. 2025. Ask, acquire, understand: A multimodal agent-based framework for social abuse detection in memes. InProceedings of the ACM on Web Conference 2025. 4734–4744

work page 2025

[34] [34]

Phillip Lippe, Nithin Holla, Shantanu Chandra, Santhosh Rajamanickam, Geor- gios Antoniou, Ekaterina Shutova, and Helen Yannakoudakis. 2020. A multimodal framework for the detection of hateful memes.arXiv preprint arXiv:2012.12871 (2020)

work page arXiv 2020

[35] [35]

Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. 2024. Improved baselines with visual instruction tuning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 26296–26306

work page 2024

[36] [36]

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach.arXiv preprint arXiv:1907.11692 (2019)

work page internal anchor Pith review Pith/arXiv arXiv 2019

[37] [37]

Ziyan Liu, Chunxiao Fan, Haoran Lou, Yuexin Wu, and Kaiwei Deng. 2025. MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection. InProceedings Conference acronym ’XX, June 03–05, 2018, Woodstock, NY Wang et al. of the 63rd Annual Meeting of the Association for Computational Linguistics. 923– 947

work page 2025

[38] [38]

Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. 2019. ViLBERT: Pretrain- ing task-agnostic visiolinguistic representations for vision-and-language tasks. Advances in neural information processing systems32 (2019)

work page 2019

[39] [39]

Abdullah Mazhar, Zuhair Hasan Shaik, Aseem Srivastava, Polly Ruhnke, La- vanya Vaddavalli, Sri Keshav Katragadda, Shweta Yadav, and Md Shad Akhtar

work page

[40] [40]

InProceedings of the ACM on Web Conference 2025

Figurative-cum-Commonsense Knowledge Infusion for Multimodal Mental Health Meme Classification. InProceedings of the ACM on Web Conference 2025. 637–648

work page 2025

[41] [41]

Jingbiao Mei, Jinghong Chen, Guangyu Yang, Weizhe Lin, and Bill Byrne. 2025. Robust adaptation of large multimodal models for retrieval augmented hateful meme detection. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 23817–23839

work page 2025

[42] [42]

Niklas Muennighoff. 2020. Vilio: State-of-the-art visio-linguistic models applied to hateful memes.arXiv preprint arXiv:2012.07788(2020)

work page arXiv 2020

[43] [43]

Shad Akhtar, Preslav Nakov, and Tanmoy Chakraborty

Shraman Pramanick, Shivam Sharma, Dimitar Dimitrov, Md. Shad Akhtar, Preslav Nakov, and Tanmoy Chakraborty. 2021. MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets. InFindings of the Associa- tion for Computational Linguistics: EMNLP 2021. Association for Computational Linguistics, 4439–4455. doi:10.18653/v1/2021.findings-emnlp.379

work page doi:10.18653/v1/2021.findings-emnlp.379 2021

[44] [44]

Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, and Yoav Shoham. 2023. In-context retrieval-augmented language models.Transactions of the Association for Computational Linguistics11 (2023), 1316–1331

work page 2023

[45] [45]

Vlad Sandulescu. 2020. Detecting hateful memes using a multimodal deep en- semble.arXiv preprint arXiv:2012.13235(2020)

work page arXiv 2020

[46] [46]

Shivam Sharma, Firoj Alam, Md Shad Akhtar, Dimitar Dimitrov, Giovanni Da San Martino, Hamed Firooz, Alon Halevy, Fabrizio Silvestri, Preslav Nakov, and Tanmoy Chakraborty. 2022. Detecting and understanding harmful memes: A survey.arXiv preprint arXiv:2205.04274(2022)

work page arXiv 2022

[47] [47]

Shardul Suryawanshi, Bharathi Raja Chakravarthi, Mihael Arcan, and Paul Buite- laar. 2020. Multimodal meme dataset (MultiOFF) for identifying offensive content in image and text. InProceedings of the second workshop on trolling, aggression and cyberbullying. 32–41

work page 2020

[48] [48]

Weiyun Wang, Zhangwei Gao, Lixin Gu, Hengjun Pu, Long Cui, Xingguang Wei, Zhaoyang Liu, Linglin Jing, Shenglong Ye, Jie Shao, et al. 2025. Internvl3.5: Ad- vancing open-source multimodal models in versatility, reasoning, and efficiency. arXiv preprint arXiv:2508.18265(2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[49] [49]

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models.Advances in neural information processing systems35 (2022), 24824–24837

work page 2022

[50] [50]

Zhepei Wei, Wei-Lin Chen, and Yu Meng. 2024. Instructrag: Instructing retrieval-augmented generation via self-synthesized rationales.arXiv preprint arXiv:2406.13629(2024)

work page arXiv 2024

[51] [51]

Zhuosheng Zhang, Aston Zhang, Mu Li, and Alex Smola. 2022. Automatic chain of thought prompting in large language models.arXiv preprint arXiv:2210.03493 (2022)

work page internal anchor Pith review Pith/arXiv arXiv 2022

[52] [52]

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Jinguo Zhu, Weiyun Wang, Zhe Chen, Zhaoyang Liu, Shenglong Ye, Lixin Gu, Hao Tian, Yuchen Duan, Weijie Su, Jie Shao, et al. 2025. Internvl3: Exploring advanced training and test-time recipes for open-source multimodal models. arXiv preprint arXiv:2504.10479(2025). A Analysis of Retrieved Samples (𝐾) 72 74 76 78 80 1 2 3 4 5 6 top-K 54 56 58 60 62 64 66 Ha...

work page internal anchor Pith review Pith/arXiv arXiv 2025