3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 3
verdicts
UNVERDICTED 3
representative citing papers
BARRIER applies interval arithmetic to SVD-based activation projections to create bounded forget regions that enable aggressive unlearning while providing formal protection for retain distributions via tail bounds on functional drift.
GrOCE uses dynamic semantic graphs for online, training-free erasure of target concepts from diffusion model prompts via cluster identification and selective severing.
citing papers explorer
- SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training
  SafeDiffusion-R1 uses online GRPO with CLIP embedding steering to cut inappropriate content from 48.9% to 18.07% and nudity detections from 646 to 15 in diffusion models while raising GenEval scores from 42.08% to 47.83% and generalizing across seven harm categories without supervised pairs or extra
- BARRIER: Bounded Activation Regions for Robust Information Erasure
  BARRIER applies interval arithmetic to SVD-based activation projections to create bounded forget regions that enable aggressive unlearning while providing formal protection for retain distributions via tail bounds on functional drift.
- GrOCE: Graph-Guided Online Concept Erasure for Text-to-Image Diffusion Models
  GrOCE uses dynamic semantic graphs for online, training-free erasure of target concepts from diffusion model prompts via cluster identification and selective severing.