Concept steerers: Leveraging k-sparse autoencoders for controllable generations
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- Look But Don't Touch with Sparse Autoencoders for Unlearning in Diffusion Models
  SAEs detect concepts well in diffusion models but fail as direct intervention points for unlearning; a detection-guided patch replacement method yields significantly cleaner erasure results.
- Closed-Form Concept Erasure via Double Projections
  A training-free double-projection linear transformation erases target concepts from generative models by computing a proxy projection then applying a constrained update in the left null space of known directions.
- Robust and Generalizable Safety Steering for Text-to-Image Diffusion Transformers
  SafeDIG applies position-aware sparse feature transfer via SAEs in DiT models to reduce unsafe generations in target risk domains on FLUX.1 Dev and SD 3.5 while keeping source safety and quality.