BEAP is a black-box embedding-aware prompting attack using LLM-guided search that raises attack success rate over 60% against unlearned diffusion models while keeping prompts undetectable.
Shieldlm: Empowering llms as aligned, cus- tomizable and explainable safety detectors.arXiv preprint arXiv:2402.16444, 2024
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Erased but Exploitable: Black-box Embedding-Aware Prompting Against Unlearned Text-to-Image Diffusion Models
BEAP is a black-box embedding-aware prompting attack using LLM-guided search that raises attack success rate over 60% against unlearned diffusion models while keeping prompts undetectable.