Camellia: Benchmarking Cultural Biases in LLMs for Asian Languages

arxiv: 2510.05291 · v2 · pith:N5D53IKSnew · submitted 2025-10-06 · 💻 cs.CL

Tarek Naous, Anagha Savit, Carlos Rafael Catalan, Geyang Guo, Jaehyeok Lee, Kyungdon Lee, Lheane Marie Dizon, Mengyu Ye, Neel Kothari, Sahajpreet Singh, Sarah Masud, Tanish Patwa, Trung Thanh Tran, Zohaib Khan, Alan Ritter, Tanmoy Chakraborty, Yuki Arase, Keisuke Sakaguchi, JinYeong Bak, Wei Xu

keywords: llms, asian, biases, cultural, languages, camellia, cultures, entities

As Large Language Models (LLMs) develop stronger multilingual capabilities, their sensitivity to culturally diverse entities becomes increasingly important. Prior work by Naous et al. (2024) has shown that LLMs often favor Western-associated entities in Arabic. Due to the lack of entity-centric multilingual benchmarks, it remains unclear if such biases also manifest in various non-Western languages. In this paper, we introduce Camellia, a benchmark for evaluating entity-centric cultural biases in nine Asian languages, spanning six Asian cultures. Camellia includes 19,530 manually annotated entities associated with the covered Asian or Western cultures, as well as 2,173 masked contexts for these entities derived from social media posts. Using Camellia, we evaluate cultural biases in four recent multilingual LLMs across three tasks: cultural context adaptation, sentiment association, and entity extractive QA. Our analyses show that LLMs struggle with cultural adaptation across these languages, with performance differing across models developed in different regions. We further observe that different LLM families can hold distinct biases, reflected in the ways they link cultures to particular sentiments. Lastly, we find that LLMs can struggle with context understanding in some Asian languages, creating performance gaps between cultures in entity extraction.
