Cage: A framework for culturally adaptive red-teaming benchmark generation. arXiv preprint arXiv:2602.20170
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)

Citing papers
- ROK-FORTRESS: Measuring the Effect of Geopolitical Transcreation for National Security and Public Safety
  ROK-FORTRESS shows that Korean-language prompts increase LLM safety suppression compared with English, while Korean geopolitical grounding often reduces that suppression, indicating that translation-only evaluations miss language-context interactions.
- Culturally-Adapted Red-Teaming Across East and Southeast Asian Contexts: A Methodological and Comparative Analysis
  Culturally adapted (CA) red-teaming prompts raise attack success rate (ASR) by a mean of 9.3 percentage points over direct translations (DT) across 16 language-model pairs in four Asian languages, with DT scoring a mean cultural depth of 0.17 versus up to 2.51 for CA.