SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression

Amir H. Abdi; Huiqiang Jiang; Lili Qiu; Surin Ahn; Yucheng Li; Yuqing Yang

arxiv: 2506.12707 · v1 · pith:D6EZOACYnew · submitted 2025-06-15 · 💻 cs.CR · cs.CL

SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression

Yucheng Li , Surin Ahn , Huiqiang Jiang , Amir H. Abdi , Yuqing Yang , Lili Qiu This is my paper

classification 💻 cs.CR cs.CL

keywords promptsecuritylinguaattacksllmsmaliciousintentionsafetyadversarial

0 comments

read the original abstract

Large language models (LLMs) have achieved widespread adoption across numerous applications. However, many LLMs are vulnerable to malicious attacks even after safety alignment. These attacks typically bypass LLMs' safety guardrails by wrapping the original malicious instructions inside adversarial jailbreaks prompts. Previous research has proposed methods such as adversarial training and prompt rephrasing to mitigate these safety vulnerabilities, but these methods often reduce the utility of LLMs or lead to significant computational overhead and online latency. In this paper, we propose SecurityLingua, an effective and efficient approach to defend LLMs against jailbreak attacks via security-oriented prompt compression. Specifically, we train a prompt compressor designed to discern the "true intention" of the input prompt, with a particular focus on detecting the malicious intentions of adversarial prompts. Then, in addition to the original prompt, the intention is passed via the system prompt to the target LLM to help it identify the true intention of the request. SecurityLingua ensures a consistent user experience by leaving the original input prompt intact while revealing the user's potentially malicious intention and stimulating the built-in safety guardrails of the LLM. Moreover, thanks to prompt compression, SecurityLingua incurs only a negligible overhead and extra token cost compared to all existing defense methods, making it an especially practical solution for LLM defense. Experimental results demonstrate that SecurityLingua can effectively defend against malicious attacks and maintain utility of the LLM with negligible compute and latency overhead. Our code is available at https://aka.ms/SecurityLingua.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs
cs.LG 2026-04 unverdicted novelty 6.0

DACO curates a 15,000-concept dictionary from 400K image-caption pairs and uses it to initialize an SAE that enables granular, concept-specific steering of MLLM activations, raising safety scores on MM-SafetyBench and...
CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks
cs.CR 2026-04 unverdicted novelty 6.0

CoopGuard deploys cooperative agents to track conversation history and counter evolving multi-round attacks on LLMs, achieving a 78.9% reduction in attack success rate on a new 5,200-sample benchmark.
Semantic Attacks on Tool-Augmented LLMs: Securing the Model Context Protocol Against Descriptor-Level Manipulation
cs.CR 2025-12 unverdicted novelty 6.0

Descriptor-level manipulation in the Model Context Protocol can drive LLMs to unsafe tool selections in up to 36% of cases; a layered defense of integrity checks, auxiliary-LLM vetting, and runtime guardrails reduces ...
Prompt Governance? On Governing Technologies Governed by Natural Language
cs.CY 2026-04 unverdicted novelty 4.0

Literature on system prompts for AI shows fragmented and contradictory claims that complicate policy efforts to use them as reliable governance mechanisms.