{"paper":{"title":"Mitigating Cross-Lingual Cultural Inconsistencies in LLMs via Consensus-Driven Preference Optimisation","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Consensus-driven preference optimization raises cross-language cultural consistency in multilingual LLMs by up to 0.10 points on a new metric.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Anna Korhonen, Isabelle Augenstein, Lucas Resck","submitted_at":"2026-04-02T14:04:06Z","abstract_excerpt":"Despite their impressive capabilities, multilingual large language models (MLLMs) frequently exhibit inconsistent behaviour when the prompt's language changes. While such adaptation is generally desirable, it becomes a critical failure when a user's identity is explicitly defined. For instance, given a fixed British persona and an ambiguous everyday knowledge query about literature, the prompt's language frequently overwrites the system persona -- yielding Shakespeare in English but Cervantes in Spanish. 
To robustly quantify this Cross-lingual Cultural Inconsistency, we introduce Singleton Fle"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"C-3PO achieves up to a 0.10-point absolute increase in κ_S over unaligned models, outperforming strong prompting and representation steering baselines.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the consensus across languages in C-3PO represents genuine cultural consistency rather than an average that erases valid cultural differences, and that κ_S accurately isolates inconsistency without confounding factors.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Multilingual LLMs display cross-lingual cultural inconsistency that a new metric quantifies and a consensus-driven preference optimization method reduces by up to 0.10 points.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Consensus-driven preference optimization raises cross-language cultural consistency in multilingual LLMs by up to 0.10 points on a new metric.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"59649de43d920e1f6e716a91117de4598af1d6d7fbd0f2e838d7ba34c657a420"},"source":{"id":"2605.12515","kind":"arxiv","version":1},"verdict":{"id":"939a6a76-c957-47cb-bda7-ebc4eebf5cbd","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T21:53:55.907146Z","strongest_claim":"C-3PO achieves up to a 0.10-point absolute increase in κ_S over unaligned models, outperforming strong prompting and representation steering baselines.","one_line_summary":"Multilingual LLMs display cross-lingual cultural inconsistency that a new metric quantifies and a consensus-driven preference 
optimization method reduces by up to 0.10 points.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the consensus across languages in C-3PO represents genuine cultural consistency rather than an average that erases valid cultural differences, and that κ_S accurately isolates inconsistency without confounding factors.","pith_extraction_headline":"Consensus-driven preference optimization raises cross-language cultural consistency in multilingual LLMs by up to 0.10 points on a new metric."},"references":{"count":32,"sample":[{"doi":"10.18653/v1/2025.emnlp-industry.9","year":2025,"title":"Aligning LLMs for Multilingual Consistency in Enterprise Applications","work_id":"92d41103-eded-4852-a122-f01caabcc787","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.18653/v1/2025.emnlp-main.328","year":2025,"title":"Mengyu Bu, Shaolei Zhang, Zhongjun He, Hua Wu, and Yang Feng. 2025. https://doi.org/10.18653/v1/2025.emnlp-main.328 AlignX: Advancing Multilingual Large Language Models with Multilingual Representati","work_id":"e48cf2ba-6934-4719-bbce-03e61dcc960d","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.1162/coli.a.583","year":2025,"title":"doi: 10.1162/COLI.a.583","work_id":"4619fdd0-b039-4474-aef5-d7fddb980730","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.18653/v1/2025.naacl-long.280","year":2025,"title":"Menglong Cui, Pengzhi Gao, Wei Liu, Jian Luan, and Bin Wang. 2025. https://doi.org/10.18653/v1/2025.naacl-long.280 Multilingual Machine Translation with Open Large Language Models at Practical Scale:","work_id":"1290086c-8eb2-49d2-8698-c7a4448fea7f","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.18653/v1/2022.findings-acl.240","year":2022,"title":"Constanza Fierro and Anders Søgaard. 2022. https://doi.org/10.18653/v1/2022.findings-acl.240 Factual Consistency of Multilingual Pretrained Language Models. 
In Findings of the Association for Computa","work_id":"032b928a-f659-4fc8-bedc-f7f06739aaa8","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":32,"snapshot_sha256":"edb96773750d68381f7ebe2db1eef5225fef1e44bd62ba2b1ea7bf287d90bcfa","internal_anchors":3},"formal_canon":{"evidence_count":2,"snapshot_sha256":"ff738823566b2123693aaafc0f474fe8c9d524e8c7c483b1ca942f8222117361"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}