GhazalBench: Evaluating LLM Understanding and Canonical Surface-Form Access in Persian Ghazals

Ghazal Kalhor; Yadollah Yaghoobzadeh

Persian poetry plays an active role in Iranian cultural practice, where verses by canonical poets such as Hafez are frequently quoted, paraphrased, or completed from partial cues. Supporting such interactions requires language models to engage not only with poetic meaning but also with culturally canonical surface form. We introduce GhazalBench, a benchmark for evaluating how large language models (LLMs) interact with Persian ghazals under usage-grounded conditions. Unlike prior work that primarily studies memorization as a liability, GhazalBench examines settings where access to exact surface form is functionally important for culturally grounded interaction. The benchmark evaluates two complementary abilities: poem-to-prose understanding and canonical surface-form access under varying semantic and lexical cues. Across several proprietary and open-weight multilingual LLMs, we observe a consistent dissociation: models generally capture poetic meaning but struggle to produce exact verse completions in open-ended settings, while recognition-based settings substantially reduce this gap. Parallel experiments on English sonnets show markedly stronger completion performance, suggesting that these limitations are tied more to differences in training exposure than to inherent architectural constraints. Our findings highlight the need for evaluation frameworks that jointly assess meaning, form, and cue-dependent access to culturally significant texts. GhazalBench is available at https://anonymous.4open.science/r/GhazalBench/.
