Sandwich attack: Multi-language mixture adaptive attack on LLMs. arXiv preprint arXiv:2404.07242

Bibek Upadhayay, Vahid Behzadan · arXiv 2404.07242

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

Safety Targeted Embedding Exploit via Refinement

cs.AI · 2026-07-02 · unverdicted · novelty 6.0

STEER is a gradient-guided attack that iteratively translates refusal-triggering words into low-resource languages to jailbreak LLMs, reaching 93-96.7% success on open models and 35.5% transfer to GPT-4o-mini.

citing papers explorer

Safety Targeted Embedding Exploit via Refinement · cs.AI · 2026-07-02 · unverdicted · none · ref 12
