Concept steerers: Leveraging k-sparse autoencoders for controllable generations

Kim, D · 2025 · arXiv 2501.19066

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Look But Don't Touch with Sparse Autoencoders for Unlearning in Diffusion Models

cs.CV · 2026-06-30 · unverdicted · novelty 7.0 · 2 refs

SAEs detect concepts well in diffusion models but fail as direct intervention points for unlearning; a detection-guided patch replacement method yields significantly cleaner erasure results.

Closed-Form Concept Erasure via Double Projections

cs.LG · 2026-04-11 · unverdicted · novelty 6.0

A training-free double-projection linear transformation erases target concepts from generative models by computing a proxy projection then applying a constrained update in the left null space of known directions.

Robust and Generalizable Safety Steering for Text-to-Image Diffusion Transformers

cs.AI · 2026-05-28 · unverdicted · novelty 4.0

SafeDIG applies position-aware sparse feature transfer via SAEs in DiT models to reduce unsafe generations in target risk domains on FLUX.1 Dev and SD 3.5 while keeping source safety and quality.

citing papers explorer

Look But Don't Touch with Sparse Autoencoders for Unlearning in Diffusion Models cs.CV · 2026-06-30 · unverdicted · none · ref 11 · 2 links
Closed-Form Concept Erasure via Double Projections cs.LG · 2026-04-11 · unverdicted · none · ref 36
Robust and Generalizable Safety Steering for Text-to-Image Diffusion Transformers cs.AI · 2026-05-28 · unverdicted · none · ref 23
