pith. sign in

arxiv: 2605.24344 · v1 · pith:DGWFIGNQnew · submitted 2026-05-23 · 💻 cs.CL

Distinguishing Right from Wrong in Debates: Attribution Analysis of Chinese Harmful Memes

Pith reviewed 2026-06-30 13:51 UTC · model grok-4.3

classification 💻 cs.CL
keywords harmful meme detectionChinese memesattribution analysiscultural knowledge baseintent reasoningEx-ToxiCN-MMRIKE framework
0
0 comments X

The pith

RIKE framework uses cultural knowledge and relative intent reasoning to attribute harm in ambiguous Chinese memes more accurately than baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper targets the gap in Chinese harmful meme detection caused by heavy reliance on cultural context and high semantic ambiguity that makes harm subjective. It builds the Ex-ToxiCN-MM dataset that supplies each meme with two opposing interpretations labeled harmful and non-harmful, plus the C-HarmKB knowledge base of Chinese cultural concepts and offensive terms. The RIKE attribution framework adds an Attribution Knowledge Enhancement module and a Relative Intent Reasoning module to supply missing background and compare competing intents. Experiments across quantitative metrics and qualitative cases show consistent gains over mainstream baselines in the attribution task.

Core claim

By pairing the Ex-ToxiCN-MM dataset of opposing interpretations with the C-HarmKB knowledge base and the RIKE framework that combines Attribution Knowledge Enhancement with Relative Intent Reasoning, models achieve higher accuracy in distinguishing harmful from non-harmful readings of culturally grounded and semantically ambiguous Chinese memes, outperforming standard baselines on multiple evaluation metrics.

What carries the argument

The RIKE framework, which integrates an Attribution Knowledge Enhancement module (AKE) to inject cultural priors and a Relative Intent Reasoning module (RIR) to compare competing interpretations, supplies the missing background and resolves ambiguity that standard models lack.

If this is right

  • Content moderation systems gain an explicit mechanism for surfacing the cultural knowledge and intent comparisons needed for Chinese memes.
  • The same opposing-interpretation evaluation design can be reused to test models on other ambiguous social-media content.
  • Open release of the dataset and C-HarmKB allows direct replication and extension by other researchers working on culturally specific harm detection.
  • The two-module structure separates knowledge injection from intent comparison, making each component easier to ablate or improve independently.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may transfer to non-meme debates where two sides offer conflicting readings of the same statement.
  • If the knowledge base can be updated incrementally, the framework could track evolving cultural references without full retraining.
  • Real-world deployment would require checking whether the performance lift holds when memes appear in live streams rather than curated test sets.

Load-bearing premise

The dataset that assigns each meme two opposing harmful and non-harmful interpretations supplies a rigorous test of whether models can handle culturally grounded ambiguity.

What would settle it

If the RIKE model shows no improvement or underperforms the baselines on accuracy, F1, or other metrics when evaluated on the held-out portion of Ex-ToxiCN-MM, the superiority claim is false.

Figures

Figures reproduced from arXiv: 2605.24344 by Bo Xu, Han Wang, Hongfei Lin, Junyu Lu, Liang Yang, Weiming Wang, Xiaokun Zhang, Zewen Bai.

Figure 1
Figure 1. Figure 1: Examples of harmful memes in Chinese: (a) Uses a cute “puppy” [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Annotation process for Ex-ToxiCN-MM TABLE I EVALUATION RESULTS ON DATASETS WITH DIFFERENT EXPLANATIONS. FIVE EVALUATION CRITERIA: INFORMATIVENESS (INFO.), SOUNDNESS (SOUND.), CULTURAL RELEVANCE (CULT.), CONCISENESS (CONC.), PERSUASIVENESS (PERS.). Dataset Explanation Metrics Info. Sound. Cult. Conc. Pers. Harmful Harmful 4.72 4.81 4.05 4.53 4.79 non-harmful 3.85 4.25 3.75 4.61 3.92 non-harmful Harmful 3.53… view at source ↗
Figure 3
Figure 3. Figure 3: Illustration of the Attribution Analysis Framework, integrating the Relative Intent Reasoning module and Attribution Knowledge Enhancement module. [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Case study of attribution process. Presents three types of interpretive and retrieval knowledge derived from annotated datasets, baseline models, and [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗
read the original abstract

Research on harmful meme detection has garnered significant attention, resulting in the development of numerous datasets and methods. However, progress in detecting Chinese harmful memes lags considerably, primarily due to two challenges: first, accurately assessing a meme's harmfulness depends heavily on understanding deep cultural context; second, many memes are semantically ambiguous, making harmfulness highly subjective. To address these issues, we focus on the interpretable detection of Chinese harmful memes by constructing the first Chinese harmful meme explanation dataset, Ex-ToxiCN-MM. This dataset offers opposing interpretations, categorized as "harmful" and "non-harmful", for each meme, aiming to rigorously evaluate a model's ability to discern and comprehend ambiguous, culturally grounded content. We built a specialized knowledge base of Chinese cultural concepts and offensive vocabulary to supply models with essential prior knowledge (C-HarmKB). To address the ambiguity and lack of background knowledge in meme attribution, we have developed a comprehensive attribution analysis framework, RIKE, which includes an Attribution Knowledge Enhancement module (AKE) and a Relative Intent Reasoning module (RIR). Extensive quantitative and qualitative experiments demonstrate that our method outperforms mainstream baseline models across multiple metrics in the task of attributing harmful memes in Chinese. The code, Ex-ToxiCN-MM dataset, and Chinese Harmful Semantic Knowledge Base (C-HarmKB) involved in this study have been open-sourced at https://github.com/wimiw123/Ex-ToxiCN-MM

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Ex-ToxiCN-MM, the first Chinese harmful meme explanation dataset providing opposing 'harmful' and 'non-harmful' interpretations for each meme to test cultural and semantic ambiguity. It constructs C-HarmKB, a knowledge base of Chinese cultural concepts and offensive terms, and proposes the RIKE framework with an Attribution Knowledge Enhancement (AKE) module and Relative Intent Reasoning (RIR) module. The central claim is that RIKE outperforms mainstream baseline models across multiple metrics on the task of attributing harmfulness to Chinese memes, with all artifacts open-sourced.

Significance. If the evaluation holds, the work would fill a documented gap in Chinese-language harmful meme research by supplying a dataset explicitly designed for disambiguation of culturally grounded ambiguity, a supporting knowledge base, and an open-sourced attribution framework. The open-sourcing of code, Ex-ToxiCN-MM, and C-HarmKB constitutes a concrete community contribution that could enable follow-on work on interpretable detection.

major comments (2)
  1. [§3] §3 (dataset construction): No inter-annotator agreement statistics (e.g., Cohen’s kappa or Fleiss’ kappa) or details on the number of annotators and resolution process are reported for assigning the opposing interpretations as ground-truth harmful/non-harmful labels. Because the central claim rests on RIKE’s ability to handle culturally ambiguous content using these labels, the absence of documented multi-annotator validation makes the reported performance gains difficult to interpret as evidence of genuine disambiguation rather than fit to author-provided framing.
  2. [Evaluation section] Evaluation section (likely §4 or §5): The abstract and summary assert outperformance on 'multiple metrics' without specifying the exact baselines, data splits, statistical significance tests, or controls for annotation bias in the provided text. If these details are present in the full manuscript, they must be cross-referenced to the dataset construction to confirm that the evaluation is not circular with respect to the authors’ own cultural interpretations.
minor comments (2)
  1. [Abstract] The abstract and introduction use 'attribution analysis' and 'interpretable detection' interchangeably; a brief clarification of the precise output format (e.g., which interpretation is selected and why) would improve readability.
  2. [Figure captions] Figure or table captions for the RIKE architecture should explicitly label the AKE and RIR modules and their inputs/outputs to match the textual description.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback on dataset validation and evaluation clarity. We respond point-by-point to the major comments below.

read point-by-point responses
  1. Referee: [§3] §3 (dataset construction): No inter-annotator agreement statistics (e.g., Cohen’s kappa or Fleiss’ kappa) or details on the number of annotators and resolution process are reported for assigning the opposing interpretations as ground-truth harmful/non-harmful labels. Because the central claim rests on RIKE’s ability to handle culturally ambiguous content using these labels, the absence of documented multi-annotator validation makes the reported performance gains difficult to interpret as evidence of genuine disambiguation rather than fit to author-provided framing.

    Authors: We acknowledge this is a valid concern. The Ex-ToxiCN-MM opposing interpretations were developed by the authors drawing on specialized cultural expertise rather than through a multi-annotator process with independent labelers. Consequently, we do not have inter-annotator agreement statistics to report. In the revised manuscript we will explicitly state this limitation in §3 and note it as an area for future dataset refinement, but we cannot add retrospective agreement metrics. revision: no

  2. Referee: [Evaluation section] Evaluation section (likely §4 or §5): The abstract and summary assert outperformance on 'multiple metrics' without specifying the exact baselines, data splits, statistical significance tests, or controls for annotation bias in the provided text. If these details are present in the full manuscript, they must be cross-referenced to the dataset construction to confirm that the evaluation is not circular with respect to the authors’ own cultural interpretations.

    Authors: The full manuscript (§4) already specifies the baselines (BERT, RoBERTa, CLIP, VisualBERT and meme-specific models), the 70/15/15 data splits, and statistical significance via paired t-tests (p < 0.05). We will add explicit cross-references from §4 back to §3 and expand the discussion of annotation bias controls (e.g., external sourcing for C-HarmKB entries) to demonstrate that evaluation is not circular with respect to author framing. revision: yes

standing simulated objections not resolved
  • Absence of inter-annotator agreement statistics for Ex-ToxiCN-MM labels, as multi-annotator validation was not performed during dataset construction.

Circularity Check

0 steps flagged

No circularity: empirical ML evaluation on new dataset and framework

full rationale

The paper constructs Ex-ToxiCN-MM and C-HarmKB, proposes the RIKE framework with AKE and RIR modules, and reports that it outperforms baselines on attribution metrics. No equations, derivations, or parameter-fitting steps are described that reduce a claimed prediction to its inputs by construction. No self-citations are used to justify uniqueness theorems or ansatzes that bear on the central result. The work is self-contained as standard empirical ML research introducing artifacts and benchmarking them; the evaluation does not collapse into a re-statement of the authors' own labeling choices.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Review based on abstract only; full methods, model details, and experimental setup unavailable so ledger entries are minimal and provisional.

axioms (1)
  • domain assumption External cultural knowledge bases can supply essential prior knowledge that improves model performance on ambiguous, culture-specific tasks.
    Invoked when introducing C-HarmKB as necessary to address lack of background knowledge.
invented entities (1)
  • RIKE framework (Attribution Knowledge Enhancement module + Relative Intent Reasoning module) no independent evidence
    purpose: To address ambiguity and lack of background knowledge in meme attribution analysis.
    New modules introduced in the paper without reference to prior independent validation.

pith-pipeline@v0.9.1-grok · 5810 in / 1212 out tokens · 51413 ms · 2026-06-30T13:51:05.944772+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

39 extracted references · 17 canonical work pages · 4 internal anchors

  1. [1]

    Memes in a digital world: Reconciling with a conceptual troublemaker,

    L. Shifman, “Memes in a digital world: Reconciling with a conceptual troublemaker,”Journal of computer-mediated communication, vol. 18, no. 3, pp. 362–377, 2013

  2. [2]

    A web-scale analysis of the community origins of image memes,

    D. Morina and M. S. Bernstein, “A web-scale analysis of the community origins of image memes,”Proceedings of the ACM on Human-Computer Interaction, vol. 6, no. CSCW1, pp. 1–25, 2022

  3. [3]

    “meme-ing

    Y . Zhang, S. Zhao, and K. Merritt, ““meme-ing” across cultures: Understanding how non-eu international students in the uk use internet memes for cultural adaptation and identity,”Behavioral Sciences, vol. 15, no. 5, p. 693, 2025

  4. [4]

    Beneath the surface: Unveiling harmful memes with multimodal reasoning distilled from large language models,

    H. Lin, Z. Luo, J. Ma, and L. Chen, “Beneath the surface: Unveiling harmful memes with multimodal reasoning distilled from large language models,”arXiv preprint arXiv:2312.05434, 2023

  5. [5]

    Momenta: A multimodal framework for detecting harmful memes and their targets,

    S. Pramanick, S. Sharma, D. Dimitrov, M. S. Akhtar, P. Nakov, and T. Chakraborty, “Momenta: A multimodal framework for detecting harmful memes and their targets,”arXiv preprint arXiv:2109.05184, 2021

  6. [6]

    Beyond (mis) representation: Visuals in covid-19 misinformation,

    J. S. Brennen, F. M. Simon, and R. K. Nielsen, “Beyond (mis) representation: Visuals in covid-19 misinformation,”The International Journal of Press/Politics, vol. 26, no. 1, pp. 277–299, 2021

  7. [7]

    Visual disinformation in a digital age: A literature synthesis and research agenda,

    T. Weikmann and S. Lecheler, “Visual disinformation in a digital age: A literature synthesis and research agenda,”New Media & Society, vol. 25, no. 12, pp. 3696–3713, 2023

  8. [8]

    Emotion-aware multimodal fusion for meme emotion detection,

    S. Sharma, R. S., M. S. Akhtar, and T. Chakraborty, “Emotion-aware multimodal fusion for meme emotion detection,”IEEE Trans. Affect. Comput., vol. 15, no. 3, pp. 1800–1811, 2024. [Online]. Available: https://doi.org/10.1109/TAFFC.2024.3378698

  9. [9]

    SCARE: A novel framework to enhance chinese harmful memes detection,

    T. Gu, M. Feng, X. Feng, and X. Wang, “SCARE: A novel framework to enhance chinese harmful memes detection,”IEEE Trans. Affect. Comput., vol. 16, no. 2, pp. 933–945, 2025. [Online]. Available: https://doi.org/10.1109/TAFFC.2024.3481419

  10. [10]

    Benchmark dataset of memes with text transcriptions for automatic detection of multi-modal misogynistic content,

    F. Gasparini, G. Rizzi, A. Saibene, and E. Fersini, “Benchmark dataset of memes with text transcriptions for automatic detection of multi-modal misogynistic content,”Data in brief, vol. 44, p. 108526, 2022

  11. [11]

    arXiv preprint arXiv:2305.17678 , year=

    M. S. Hee, W.-H. Chong, and R. K.-W. Lee, “Decoding the underlying meaning of multimodal hateful memes,”arXiv preprint arXiv:2305.17678, 2023

  12. [12]

    What do you meme? generating explanations for visual semantic role labelling in memes,

    S. Sharma, S. Agarwal, T. Suresh, P. Nakov, M. S. Akhtar, and T. Chakraborty, “What do you meme? generating explanations for visual semantic role labelling in memes,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 8, 2023, pp. 9763–9771

  13. [13]

    Towards comprehensive detection of chinese harmful memes,

    J. Lu, B. Xu, X. Zhang, H. Wang, H. Zhu, D. Zhang, L. Yang, and H. Lin, “Towards comprehensive detection of chinese harmful memes,”Advances in Neural Information Processing Systems, vol. 37, pp. 13 302–13 320, 2024

  14. [14]

    Mememind: A large- scale multimodal dataset with chain-of-thought reasoning for harmful meme detection,

    H. Gu, Q. Yu, S. Hou, Z. Fang, H. Wu, and Z. He, “Mememind: A large- scale multimodal dataset with chain-of-thought reasoning for harmful meme detection,”arXiv preprint arXiv:2506.18919, 2025

  15. [15]

    Insightvision: A comprehensive, multi-level chinese-based benchmark for evaluating implicit visual semantics in large vision language models,

    X. Yin, Y . Hong, Y . Guo, Y . Tu, W. Wang, G. Liuet al., “Insightvision: A comprehensive, multi-level chinese-based benchmark for evaluating implicit visual semantics in large vision language models,”arXiv preprint arXiv:2502.15812, 2025

  16. [16]

    Towards detecting chinese harmful memes with fine-grained explanatory augmentation,

    X. Chen, D. Wen, and D. Zuo, “Towards detecting chinese harmful memes with fine-grained explanatory augmentation,”Electronics, vol. 14, no. 17, p. 3504, 2025. 10

  17. [17]

    An overview of the misogyny meme detection shared task for chinese social media,

    B. R. Chakravarthi, R. Ponnusamy, P. Du, X. Zhuang, S. Rajiakodi, P. Buitelaar, B. Sivagnanam, A. KA, S. Lavanyaet al., “An overview of the misogyny meme detection shared task for chinese social media,” in Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion, 2025, pp. 200–208

  18. [18]

    Computational meme understanding: A survey,

    K. Nguyen and V . Ng, “Computational meme understanding: A survey,” inProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024, pp. 21 251–21 267

  19. [19]

    Annotators with attitudes: How annotator beliefs and identities bias toxic language detection,

    M. Sap, S. Swayamdipta, L. Vianna, X. Zhou, Y . Choi, and N. A. Smith, “Annotators with attitudes: How annotator beliefs and identities bias toxic language detection,”arXiv preprint arXiv:2111.07997, 2021

  20. [20]

    A Survey on Retrieval-Augmented Text Generation for Large Language Models

    Y . Huang and J. Huang, “A survey on retrieval-augmented text generation for large language models,”arXiv preprint arXiv:2404.10981, 2024

  21. [21]

    R 2ag: Incorporating retrieval information into retrieval augmented generation,

    F. Ye, S. Li, Y . Zhang, and L. Chen, “R 2ag: Incorporating retrieval information into retrieval augmented generation,” inEMNLP (Findings), 2024

  22. [22]

    Self-rag: Learning to retrieve, generate, and critique through self-reflection,

    A. Asai, Z. Wu, Y . Wang, A. Sil, and H. Hajishirzi, “Self-rag: Learning to retrieve, generate, and critique through self-reflection,” 2024

  23. [23]

    Memegraphs: Linking memes to knowledge graphs,

    V . Kougia, S. Fetzel, T. Kirchmair, E. C ¸ano, S. M. Baharlou, S. Shar- ifzadeh, and B. Roth, “Memegraphs: Linking memes to knowledge graphs,” inInternational Conference on Document Analysis and Recog- nition. Springer, 2023, pp. 534–551

  24. [24]

    Kermit: Knowledge- empowered model in harmful meme detection,

    B. Grasso, V . La Gatta, V . Moscato, and G. Sperl`ı, “Kermit: Knowledge- empowered model in harmful meme detection,”Information Fusion, vol. 106, p. 102269, 2024

  25. [25]

    I know what you meme! understanding and detecting harmful memes with multimodal large language models

    Y . Zhuang, K. Guo, J. Wang, Y . Jing, X. Xu, W. Yi, M. Yang, B. Zhao, and H. Hu, “I know what you meme! understanding and detecting harmful memes with multimodal large language models.” inNDSS, 2025

  26. [26]

    Detecting harmful memes with decoupled understanding and guided cot reasoning,

    F. Pan, A. T. Luu, and X. Wu, “Detecting harmful memes with decoupled understanding and guided cot reasoning,”arXiv preprint arXiv:2506.08477, 2025

  27. [27]

    Visual instruction tuning,

    H. Liu, C. Li, Q. Wu, and Y . J. Lee, “Visual instruction tuning,”Advances in neural information processing systems, vol. 36, pp. 34 892–34 916, 2023

  28. [28]

    Lora: Low-rank adaptation of large language models

    E. J. Hu, Y . Shen, P. Wallis, Z. Allen-Zhu, Y . Li, S. Wang, L. Wang, W. Chenet al., “Lora: Low-rank adaptation of large language models.” ICLR, vol. 1, no. 2, p. 3, 2022

  29. [29]

    arXiv preprint arXiv:2102.12060 , year=

    S. Wiegreffe and A. Marasovic, “Teach me to explain: A review of datasets for explainable nlp,”arXiv preprint arXiv:2102.12060, vol. 3, 2021

  30. [30]

    Multimodal fake news video explanation: Dataset, analysis and evaluation,

    L. Chen, Z. Qian, P. Li, and Q. Zhu, “Multimodal fake news video explanation: Dataset, analysis and evaluation,”arXiv preprint arXiv:2501.08514, 2025

  31. [31]

    Analyzing and interpreting data from likert-type scales,

    G. M. Sullivan and A. R. Artino Jr, “Analyzing and interpreting data from likert-type scales,”Journal of graduate medical education, vol. 5, no. 4, p. 541, 2013

  32. [32]

    Hyrr: Hybrid infused reranking for passage retrieval,

    J. Lu, K. Hall, J. Ma, and J. Ni, “Hyrr: Hybrid infused reranking for passage retrieval,”arXiv preprint arXiv:2212.10528, 2022

  33. [33]

    A hybrid approach to information retrieval and answer generation for regulatory texts,

    J. Rayo, R. de La Rosa, and M. Garrido, “A hybrid approach to information retrieval and answer generation for regulatory texts,”arXiv preprint arXiv:2502.16767, 2025

  34. [34]

    ROUGE 2.0: Updated and Improved Measures for Evaluation of Summarization Tasks

    K. Ganesan, “Rouge 2.0: Updated and improved measures for evaluation of summarization tasks,”arXiv preprint arXiv:1803.01937, 2018

  35. [35]

    Out of the bleu: how should we assess quality of the code generation models?

    M. Evtikhiev, E. Bogomolov, Y . Sokolov, and T. Bryksin, “Out of the bleu: how should we assess quality of the code generation models?” Journal of Systems and Software, vol. 203, p. 111741, 2023

  36. [36]

    A Survey on LLM-as-a-Judge

    J. Gu, X. Jiang, Z. Shi, H. Tan, X. Zhai, C. Xu, W. Li, Y . Shen, S. Ma, H. Liuet al., “A survey on llm-as-a-judge,”arXiv preprint arXiv:2411.15594, 2024

  37. [37]

    Qwen2.5-VL Technical Report

    S. Bai, K. Chen, X. Liu, J. Wang, W. Ge, S. Song, K. Dang, P. Wang, S. Wang, J. Tanget al., “Qwen2. 5-vl technical report,”arXiv preprint arXiv:2502.13923, 2025

  38. [38]

    Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks,

    Z. Chen, J. Wu, W. Wang, W. Su, G. Chen, S. Xing, M. Zhong, Q. Zhang, X. Zhu, L. Luet al., “Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 24 185–24 198

  39. [39]

    Improved baselines with visual instruction tuning,

    H. Liu, C. Li, Y . Li, and Y . J. Lee, “Improved baselines with visual instruction tuning,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 26 296–26 306