KMMMU benchmark demonstrates that leading multimodal models achieve at most 52.42% accuracy on hard Korean exam questions, highlighting limitations in non-English multimodal understanding.
Byungjin Choi, Seongsu Bae, Sunjun Kweon, and Ed- ward Choi
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CL 2years
2026 2representative citing papers
K-MetBench shows LLMs have large gaps in interpreting meteorology diagrams and Korean-specific context, with smaller local models beating much larger global ones.
citing papers explorer
-
KMMMU: Evaluation of Massive Multi-discipline Multimodal Understanding in Korean Language and Context
KMMMU benchmark demonstrates that leading multimodal models achieve at most 52.42% accuracy on hard Korean exam questions, highlighting limitations in non-English multimodal understanding.
-
K-MetBench: A Multi-Dimensional Benchmark for Fine-Grained Evaluation of Expert Reasoning, Locality, and Multimodality in Meteorology
K-MetBench shows LLMs have large gaps in interpreting meteorology diagrams and Korean-specific context, with smaller local models beating much larger global ones.