CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations

Xiaohu Li, Yunfeng Ning, Zepeng Bao, Mayi Xu, Jianhao Chen, Tieyun Qian · 2025 · Findings of the Association for Computational Linguistics: ACL 2025 · DOI 10.18653/v1/2025.findings-acl.346

1 Pith paper cites this work, alongside 1 external citation. Polarity classification is still indexing.

1 external citation · Crossref

representative citing papers

MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

cs.CR · 2026-05-14 · unverdicted · novelty 7.0

MetaBackdoor shows that LLMs can be backdoored using positional triggers like sequence length, enabling stealthy activation on clean inputs to leak system prompts or trigger malicious behavior.

citing papers explorer

Showing 1 of 1 citing paper.

MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

cs.CR · 2026-05-14 · unverdicted · none · ref 31
