KSAFE-MM is a two-part multimodal safety benchmark for Korean contexts that shows MLLMs are more vulnerable to culturally grounded jailbreaks than to generic ones, and notes a trade-off between safety and over-refusal.
Byungjin Choi, Seongsu Bae, Sunjun Kweon, and Edward Choi
3 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (3)
Years: 2026 (3)
Representative citing papers:
KMMMU benchmark demonstrates that leading multimodal models achieve at most 52.42% accuracy on hard Korean exam questions, highlighting limitations in non-English multimodal understanding.
K-MetBench shows LLMs have large gaps in interpreting meteorology diagrams and Korean-specific context, with smaller local models beating much larger global ones.
citing papers explorer
- KMMMU: Evaluation of Massive Multi-discipline Multimodal Understanding in Korean Language and Context