Decoding Multimodal Cues: Unveiling the Implicit Meaning Behind Hateful Videos
Pith reviewed 2026-06-27 10:08 UTC · model grok-4.3
The pith
The IARE framework detects hateful videos at state-of-the-art levels while generating accurate contextual rationales.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that the Information Augmentation and Reasoning Enhancement (IARE) framework, by using multimodal chain-of-thought to integrate harmful elements in an information augmentation phase and Direct Preference Optimization to favor correct reasoning paths in a reasoning enhancement phase, reaches state-of-the-art detection performance while also producing accurate contextual rationales on the Ex-HateMM and Ex-ImpliHateVid datasets.
What carries the argument
The IARE framework, which augments multimodal information via chain-of-thought to enrich rationale evidence and then applies Direct Preference Optimization to improve logical coherence of justifications.
If this is right
- Detection systems can supply contextual rationales that integrate relevant multimodal evidence with logical reasoning.
- The logical coherence of model justifications increases relative to prior methods.
- Models become able to reveal implicit meanings behind hateful judgments in video content.
- New benchmarks for explainable multimodal detection are set on the two released datasets.
Where Pith is reading between the lines
- The same augmentation and optimization steps could be tested on related tasks such as detecting misinformation in short videos.
- Platforms might incorporate the generated rationales to give users clearer feedback on flagged content.
- Scaling the approach will likely require ways to produce or verify contextual rationales without full human annotation.
Load-bearing premise
The human-provided contextual rationales in Ex-HateMM and Ex-ImpliHateVid are reliable and consistent enough to serve as ground truth for training reasoning paths and measuring generated rationale accuracy.
What would settle it
Human raters scoring IARE-generated rationales as less accurate or coherent than baseline outputs on a new set of videos, or IARE detection accuracy falling below reported levels when tested on additional unlabeled hateful video collections.
Figures
read the original abstract
Hateful videos have become prevalent on online platforms, highlighting an urgent need for effective detection. However, existing studies primarily focus on binary classification and fail to provide contextual rationales that reveal the implicit meanings behind these judgments, significantly undermining model explainability. To fill this gap, we aim to achieve explainable hateful video detection, enabling models to provide contextual rationales that integrate relevant evidence and logical reasoning alongside decisions. This approach can comprehensively enhance the understanding of video content and the explainability of the decision-making process. We first introduce two datasets, Ex-HateMM and Ex-ImpliHateVid, for explainable hateful video detection. Each dataset provides fine-grained annotations of multimodal harmful elements, along with contextual rationales. We then propose an Information Augmentation and Reasoning Enhancement (IARE) framework designed for explainable detection. The framework employs an information augmentation phase that leverages the multimodal chain-of-thought to integrate harmful elements, thereby enriching rationale evidence. Additionally, IARE incorporates a reasoning enhancement phase, in which Direct Preference Optimization guides the model toward correct reasoning paths and away from incorrect ones, thereby improving the logical coherence of its justifications. We conduct extensive experiments on the two datasets, comparing multiple baselines with our proposed IARE framework. The results demonstrate that IARE achieves state-of-the-art performance while also generating accurate rationales.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces two new datasets (Ex-HateMM and Ex-ImpliHateVid) for explainable hateful video detection, each providing fine-grained multimodal annotations of harmful elements along with contextual rationales. It proposes the IARE framework, which performs information augmentation via multimodal chain-of-thought to enrich rationale evidence and reasoning enhancement via Direct Preference Optimization (DPO) to improve logical coherence, claiming state-of-the-art detection performance and accurate generated rationales on the datasets.
Significance. If the central claims hold after addressing validation gaps, the work would advance explainable multimodal hate detection by supplying new annotated resources and a method that generates contextual rationales for implicit hate, which is valuable for content moderation applications. The dataset contributions are a clear strength that could support follow-on research in the field.
major comments (2)
- [Datasets section] The central claim that IARE generates 'accurate rationales' and that DPO training succeeds rests on the human-provided contextual rationales serving as reliable ground truth. The Datasets section provides no inter-annotator agreement scores, external validation, or consistency checks for these rationales, despite the known subjectivity of implicit hateful meaning; this directly undermines both the optimization signal and the accuracy evaluation.
- [Experimental Evaluation section] The abstract asserts SOTA results and accurate rationales, yet the Experimental Evaluation section reports no quantitative metrics (e.g., detection accuracy, rationale quality scores), no ablation studies on the augmentation or DPO components, and no details on how rationale accuracy was judged against the human annotations.
minor comments (2)
- [Abstract] The abstract would benefit from naming the specific metrics and baseline models used to support the SOTA claim.
- [Framework Description] Notation for the IARE phases (information augmentation and reasoning enhancement) could be introduced with a diagram or explicit equations for clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on validation and evaluation details. We address each major comment below and will revise the manuscript to strengthen these aspects.
read point-by-point responses
-
Referee: [Datasets section] The central claim that IARE generates 'accurate rationales' and that DPO training succeeds rests on the human-provided contextual rationales serving as reliable ground truth. The Datasets section provides no inter-annotator agreement scores, external validation, or consistency checks for these rationales, despite the known subjectivity of implicit hateful meaning; this directly undermines both the optimization signal and the accuracy evaluation.
Authors: We agree that formal inter-annotator agreement metrics and explicit consistency checks are important to report given the subjectivity of implicit hate. The rationales in Ex-HateMM and Ex-ImpliHateVid were produced by three domain-expert annotators who first labeled independently and then resolved all disagreements via discussion until full consensus; no external validators were used. We will add a dedicated subsection detailing the annotation protocol, annotator backgrounds, and any post-hoc consistency procedures to the Datasets section. revision: yes
-
Referee: [Experimental Evaluation section] The abstract asserts SOTA results and accurate rationales, yet the Experimental Evaluation section reports no quantitative metrics (e.g., detection accuracy, rationale quality scores), no ablation studies on the augmentation or DPO components, and no details on how rationale accuracy was judged against the human annotations.
Authors: The Experimental Evaluation section contains comparative results against multiple baselines that support the SOTA claim, yet we acknowledge it lacks explicit numerical tables, component ablations, and a precise description of rationale evaluation. In revision we will insert quantitative detection accuracy figures, rationale quality scores (both automatic and human), ablation studies isolating multimodal CoT augmentation and DPO, and an explicit account of how generated rationales were judged against the human annotations (via expert matching). revision: yes
Circularity Check
No circularity: empirical framework and new datasets are self-contained
full rationale
The paper introduces two new annotated datasets (Ex-HateMM, Ex-ImpliHateVid) and an IARE framework that augments multimodal chain-of-thought then applies DPO for reasoning enhancement. All performance and rationale-accuracy claims rest on standard experimental comparisons against baselines using the authors' own held-out splits. No equations, derivations, or fitted parameters are presented that reduce to the inputs by construction. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work appear in the text. The central results are externally falsifiable via the released data and code, satisfying the criteria for a non-circular empirical contribution.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Cleber Alcântara, Viviane Pereira Moreira, and Diego de Vargas Feijó. 2020. Offensive Video Detection: Dataset and Baseline Results. InProceedings of The 12th Language Resources and Evaluation Conference, LREC 2020, Marseille, France, May 11-16, 2020. European Language Resources Association, 4309–4319
2020
-
[2]
Berta Céspedes-Sarrias, Carlos Collado-Capell, Pablo Rodenas-Ruiz, Olena Hry- nenko, and Andrea Cavallaro. 2025. MM-HSD: Multi-Modal Hate Speech Detec- tion in Videos.arXiv preprint arXiv:2508.20546(2025)
arXiv 2025
-
[3]
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebas- tian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bra...
2023
-
[4]
Zheng Chu, Jingchang Chen, Qianglong Chen, Weijiang Yu, Tao He, Haotian Wang, Weihua Peng, Ming Liu, Bing Qin, and Ting Liu. 2023. A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future.CoRRabs/2309.15402 (2023). arXiv:2309.15402 doi:10.48550/ARXIV.2309.15402
-
[5]
Mithun Das, Rohit Raj, Punyajoy Saha, Binny Mathew, Manish Gupta, and Ani- mesh Mukherjee. 2023. HateMM: A Multi-Modal Dataset for Hate Video Clas- sification. InProceedings of the Seventeenth International AAAI Conference on Web and Social Media, ICWSM 2023, Limassol, Cyprus, June 5-8, 2023. AAAI Press, 1014–1023
2023
-
[6]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Associa- tion for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, ...
2019
-
[7]
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xi- aohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In9th Interna- tional Conference on Learning Representations, IC...
2021
-
[8]
Mai ElSherief, Caleb Ziems, David Muchlinski, Vaishnavi Anupindi, Jordyn Sey- bolt, Munmun De Choudhury, and Diyi Yang. 2021. Latent Hatred: A Benchmark for Understanding Implicit Hate Speech. InProceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November,...
2021
-
[9]
Gongane, Mousami V
Vaishali U. Gongane, Mousami V. Munot, and Alwin D. Anuse. 2024. A survey of explainable AI techniques for detection of fake news and hate speech on social media platforms.J. Comput. Soc. Sci.7, 1 (2024), 587–623
2024
-
[10]
Ming Shan Hee, Wen-Haw Chong, and Roy Ka-Wei Lee. 2023. Decoding the Underlying Meaning of Multimodal Hateful Memes. InProceedings of the Thirty- Second International Joint Conference on Artificial Intelligence, IJCAI 2023, 19th- 25th August 2023, Macao, SAR, China. ijcai.org, 5995–6003
2023
-
[11]
Deyi Ji, Yuekui Yang, Liqun Liu, Peng Shu, Haiyang Wu, Shaogang Tang, Xudong Chen, Shaoping Ma, Tianrun Chen, and Lanyun Zhu. 2025. RAVEN++: Pinpoint- ing Fine-Grained Violations in Advertisement Videos with Active Reinforcement Reasoning. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track. 1–10
2025
-
[12]
Deyi Ji, Yuekui Yang, Haiyang Wu, Shaoping Ma, Tianrun Chen, and Lanyun Zhu. 2025. RAVEN: Robust advertisement video violation temporal grounding via reinforcement reasoning. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track). 22–31
2025
-
[13]
Jiaming Ji, Xinyu Chen, Rui Pan, Han Zhu, Conghui Zhang, Jiahao Li, Donghai Hong, Boyuan Chen, Jiayi Zhou, Kaile Wang, Juntao Dai, Chi-Min Chan, Sirui Han, Yike Guo, and Yaodong Yang. 2025. Safe RLHF-V: Safe Reinforcement Learning from Human Feedback in Multimodal Large Language Models.CoRR abs/2503.17682 (2025). arXiv:2503.17682 doi:10.48550/ARXIV.2503.17682
-
[14]
Hannah Kim, Kushan Mitra, Rafael Li Chen, Sajjadur Rahman, and Dan Zhang
-
[15]
InPro- ceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 - System Demonstrations, St
MEGAnno+: A Human-LLM Collaborative Annotation System. InPro- ceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 - System Demonstrations, St. Julians, Malta, March 17-22, 2024, Nikolaos Aletras and Orphée De Clercq (Eds.). Association for Computational Linguistics, 168–176. https://aclanthol...
2024
-
[16]
Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. 2022. Large Language Models are Zero-Shot Reasoners. InAdvances in Neural Information Processing Systems 35: Annual Conference on Neural Informa- tion Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, Sanmi Koyejo, S. Mohamed, A. A...
2022
-
[17]
Koushik, Diptesh Kanojia, and Helen Treharne
Girish A. Koushik, Diptesh Kanojia, and Helen Treharne. 2025. Towards a Robust Framework for Multimodal Hate Detection: A Study on Videovs.Image-based Content. InCompanion Proceedings of the ACM on Web Conference 2025, WWW 2025, Sydney, NSW, Australia, 28 April 2025 - 2 May 2025. ACM, 2014–2023
2025
-
[18]
Jiayi Kuang, Ying Shen, Jingyou Xie, Haohao Luo, Zhe Xu, Ronghao Li, Yinghui Li, Xianfeng Cheng, Xika Lin, and Yu Han. 2025. Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey.ACM Comput. Surv.57, 8 (2025), 190:1–190:36. doi:10.1145/3711680
-
[19]
Jian Lang, Rongpei Hong, Jin Xu, Yili Li, Xovee Xu, and Fan Zhou. 2025. Biting Off More Than You Can Detect: Retrieval-Augmented Multimodal Experts for Short Video Hate Detection. InProceedings of the ACM on Web Conference 2025, WWW 2025, Sydney, NSW, Australia, 28 April 2025- 2 May 2025. ACM, 2763–2774
2025
-
[20]
Yangyang Li, Yuelin Li, Shihuai Zhang, Guangyuan Liu, Yanqiao Chen, Ronghua Shang, and Licheng Jiao. 2024. An attention-based, context-aware multimodal fusion method for sarcasm detection using inter-modality inconsistency.Knowl. Based Syst.287 (2024), 111457
2024
-
[21]
Hongzhan Lin, Ziyang Luo, Wei Gao, Jing Ma, Bo Wang, and Ruichao Yang. 2024. Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models. InProceedings of the ACM on Web Conference 2024, WWW 2024, Singapore, May 13-17, 2024, Tat-Seng Chua, Chong-Wah Ngo, Ravi Kumar, Hady W. Lauw, and Roy Ka-Wei Lee (Eds.). ACM, 2359–...
arXiv 2024
-
[22]
Junyu Lu, Bo Xu, Xiaokun Zhang, Hongbo Wang, Haohao Zhu, Dongyu Zhang, Liang Yang, and Hongfei Lin. 2024. Towards comprehensive detection of chinese harmful memes.Advances in Neural Information Processing Systems37 (2024), 13302–13320
2024
-
[23]
Junyu Lu, Bo Xu, Xiaokun Zhang, Haohao Zhu, Kaichun Wang, Liang Yang, and Hongfei Lin. 2025. Is Having Rationales Enough? Rethinking Knowledge Enhancement for Multimodal Hateful Meme Detection. InProceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2025, Padua, Italy, July 13-18, 2025, Nico...
-
[24]
OpenAI. 2023. GPT-4 Technical Report.CoRRabs/2303.08774 (2023). arXiv:2303.08774 doi:10.48550/ARXIV.2303.08774
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2303.08774 2023
-
[25]
Andrew Owens and Alexei A. Efros. 2018. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features. InComputer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part VI (Lecture Notes in Computer Science, Vol. 11210), Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu, and Yair Weiss (Eds.)...
2018
-
[26]
Richard Yuanzhe Pang, Weizhe Yuan, He He, Kyunghyun Cho, Sainbayar Sukhbaatar, and Jason Weston. 2024. Iterative Reasoning Preference Op- timization. InAdvances in Neural Information Processing Systems 38: An- nual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024, Amir Globersons, Lester...
2024
-
[27]
Mohammad Zia Ur Rehman, Anukriti Bhatnagar, Omkar Kabde, Shubhi Bansal, and Nagendra Kumar. 2025. ImpliHateVid: A Benchmark Dataset and Two-stage Contrastive Learning Framework for Implicit Hate Speech Detection in Videos. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2025, Vienna, A...
2025
-
[28]
Association for Computational Linguistics, 17209–17221
-
[29]
Smith, and Yejin Choi
Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi. 2020. Social Bias Frames: Reasoning about Social and Power Implications of Language. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel R. Tet...
2020
-
[30]
Zhen Tan, Dawei Li, Song Wang, Alimohammad Beigi, Bohan Jiang, Amrita Bhat- tacharjee, Mansooreh Karami, Jundong Li, Lu Cheng, and Huan Liu. 2024. Large Language Models for Data Annotation and Synthesis: A Survey. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024, Miami, FL, USA, November 12-16, 2024, Yas...
-
[31]
Yunlong Tang, Jing Bi, Siting Xu, Luchuan Song, Susan Liang, Teng Wang, Daoan Zhang, Jie An, Jingyang Lin, Rongyi Zhu, Ali Vosoughi, Chao Huang, Zeliang Zhang, Pinxin Liu, Mingqian Feng, Feng Zheng, Jianguo Zhang, Ping Luo, Jiebo Luo, and Chenliang Xu. 2025. Video Understanding with Large Language Models: A Survey.IEEE Transactions on Circuits and Systems...
-
[32]
Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kul- shreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, YaGuang Li, Hongrae Lee, Huaixiu Steven Zheng, Amin Ghafouri, Marcelo Menegali, Yanping Huang, Maxim Krikun, Dmitry Lepikhin, James Qin, Dehao Chen, Yuanzhong Xu, Zhifeng Chen, Adam Roberts, Maarten Bosma, Yanqi...
Pith/arXiv arXiv 2022
-
[33]
Zico Kolter, Louis-Philippe Morency, and Ruslan Salakhutdinov
Yao-Hung Hubert Tsai, Shaojie Bai, Paul Pu Liang, J. Zico Kolter, Louis-Philippe Morency, and Ruslan Salakhutdinov. 2019. Multimodal Transformer for Unaligned Multimodal Language Sequences. InProceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long Papers. Assoc...
2019
-
[34]
Joachim Vanneste, Manisha Verma, and Debasis Ganguly. 2024. Detecting and Explaining Emotions in Video Advertisements. InProceedings of the 47th In- ternational ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024, Washington DC, USA, July 14-18, 2024. ACM, 2734–2738
2024
-
[35]
Rabiul Awal, Kenny Tsu Wei Choo, and Roy Ka- Wei Lee
Han Wang, Ming Shan Hee, Md. Rabiul Awal, Kenny Tsu Wei Choo, and Roy Ka- Wei Lee. 2023. Evaluating GPT-3 Generated Explanations for Hateful Content Moderation. InProceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, 19th-25th August 2023, Macao, SAR, China. ijcai.org, 6255–6263
2023
-
[36]
Han Wang, Deyi Ji, Junyu Lu, Lanyun Zhu, Hailong Zhang, Haiyang Wu, Liqun Liu, Peng Shu, and Roy Ka-Wei Lee. 2026. Multi-agent vlms guided self-training with pnu loss for low-resource offensive content detection. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 40. 39387–39396
2026
-
[37]
Han Wang, Deyi Ji, Lanyun Zhu, Jiebo Luo, and Roy Ka-Wei Lee. 2026. Stream- Sense: Streaming Social Task Detection with Selective Vision–Language Model Routing. InProceedings of the ACM Web Conference 2026(United Arab Emi- rates)(WWW ’26). Association for Computing Machinery, New York, NY, USA, 8897–8906. doi:10.1145/3774904.3793046
-
[38]
Hongbo Wang, Junyu Lu, Yan Han, Kai Ma, Liang Yang, and Hongfei Lin. 2024. Towards Patronizing and Condescending Language in Chinese Videos: A Multi- modal Dataset and Detector.ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)(2024), 1–5
2024
-
[39]
Han Wang, Rui Yang Tan, and Roy Ka-Wei Lee. 2025. Cross-Modal Transfer from Memes to Videos: Addressing Data Scarcity in Hateful Video Detection. InProceedings of the ACM on Web Conference 2025, WWW 2025, Sydney, NSW, Australia, 28 April 2025- 2 May 2025. ACM, 5255–5263
2025
-
[40]
Han Wang, Tan Rui Yang, Usman Naseem, and Roy Ka-Wei Lee. 2024. Multi- HateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube and Bilibili. InProceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024 - 1 November
2024
-
[41]
Xinru Wang, Hannah Kim, Sajjadur Rahman, Kushan Mitra, and Zhengjie Miao
-
[42]
Human-LLM Collaborative Annotation Through Effective Verification of LLM Labels. InProceedings of the CHI Conference on Human Factors in Computing Systems, CHI 2024, Honolulu, HI, USA, May 11-16, 2024, Florian ’Floyd’ Mueller, Penny Kyburz, Julie R. Williamson, Corina Sas, Max L. Wilson, Phoebe O. Toups Dugas, and Irina Shklovski (Eds.). ACM, 303:1–303:21...
-
[43]
Le, Ed H
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2023. Self-Consistency Improves Chain of Thought Reasoning in Language Models. InThe Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net. https://openreview.net/pdf?id=1PL1NIMMrw
2023
-
[44]
Yongjin Yang, Joonkee Kim, Yujin Kim, Namgyu Ho, James Thorne, and Se- Young Yun. 2023. HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning. InFindings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, 5490–5505
2023
-
[45]
Zonghao Ying, Aishan Liu, Siyuan Liang, Lei Huang, Jinyang Guo, Wenbo Zhou, Xianglong Liu, and Dacheng Tao. 2026. SafeBench: A Safety Evaluation Frame- work for Multimodal Large Language Models.Int. J. Comput. Vis.134, 1 (2026),
2026
-
[46]
doi:10.1007/S11263-025-02613-1
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.