Tongue-Tied: Breaking LLMs Safety Through New Language Learning
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
citing papers explorer
- ROK-FORTRESS: Measuring the Effect of Geopolitical Transcreation for National Security and Public Safety
ROK-FORTRESS shows Korean-language prompts increase LLM safety suppression compared with English, while Korean geopolitical grounding often reduces that suppression, indicating translation-only evaluations miss language-context interactions.
- A Survey of Toxicity Detection and Mitigation Strategies for Multilingual Language Models
A survey that catalogs threat models, detection approaches, and mitigation strategies for toxicity in multilingual LLMs while identifying challenges such as uneven language coverage and culturally variable harm definitions.