LASA improves LLM safety by aligning at language-agnostic semantic bottlenecks, reducing the average attack success rate (ASR) from 24.7% to 2.8% on LLaMA-3.1-8B and to 3-4% on Qwen models.
In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13392–13413
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety