LASA improves LLM safety by aligning at language-agnostic semantic bottlenecks, reducing the average attack success rate (ASR) from 24.7% to 2.8% on LLaMA-3.1-8B and to 3-4% on Qwen models.
In Findings of the Association for Computational Linguistics: ACL 2024, pages 9954–9972, Bangkok, Thailand.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety