Proceedings of the AAAI Conference on Artificial Intelligence
2 Pith papers cite this work.
Fields: cs.CL (2)
Years: 2026 (2)
Representative citing papers
A survey that catalogs threat models, detection approaches, and mitigation strategies for toxicity in multilingual LLMs while identifying challenges such as uneven language coverage and culturally variable harm definitions.
citing papers explorer
- Multilingual Refusal Alignment for Safer Large Language Models
  English-only safety alignment fails to transfer cross-lingually, while multilingual DPO training on the new RefusEU dataset improves safety across 12 European languages without degrading Global MMLU performance.
- A Survey of Toxicity Detection and Mitigation Strategies for Multilingual Language Models
  A survey that catalogs threat models, detection approaches, and mitigation strategies for toxicity in multilingual LLMs while identifying challenges such as uneven language coverage and culturally variable harm definitions.