Dialect prejudice predicts AI decisions about people's character, employability, and criminality
4 Pith papers cite this work.
representative citing papers
AI functions as a determinant of health with ambient and personal exposure types, requiring new epidemiological study designs beyond current experiments.
citing papers explorer
- Cultural Benchmarking of LLMs in Standard and Dialectal Arabic Dialogues
ArabCulture-Dialogue dataset shows LLMs perform worse on dialectal Arabic than Modern Standard Arabic across cultural reasoning, translation, and generation tasks.
- Negative Before Positive: Asymmetric Valence Processing in Large Language Models
Negative valence localizes to early layers and positive valence to mid-to-late layers in LLMs, with the directions being causally steerable.
- Lessons from the Trenches on Reproducible Evaluation of Language Models