This material must be used as the absolute standard for detecting ‘factual errors’ or ‘hallucinations’ in the <Response for Evaluation >

<Expert Verification Reference Material >: Model explanation material with 100% verified facts

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

K-MetBench: A Multi-Dimensional Benchmark for Fine-Grained Evaluation of Expert Reasoning, Locality, and Multimodality in Meteorology

cs.CL · 2026-04-27 · unverdicted · novelty 6.0

K-MetBench shows LLMs have large gaps in interpreting meteorology diagrams and Korean-specific context, with smaller local models beating much larger global ones.

citing papers explorer

Showing 1 of 1 citing paper.

K-MetBench: A Multi-Dimensional Benchmark for Fine-Grained Evaluation of Expert Reasoning, Locality, and Multimodality in Meteorology · cs.CL · 2026-04-27 · unverdicted · none · ref 54

