Title resolution pending

DiffuGuard: How Intrinsic Safety is Lost · 2025 · arXiv 2509.24296

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Machine Unlearning for Masked Diffusion Language Models

cs.CL · 2026-05-18 · unverdicted · novelty 7.0

MDU minimizes forward KL divergence from prompt-conditional to prompt-masked unconditional predictions at masked positions to unlearn knowledge in MDLMs while trading off privacy and utility via temperature scaling.

SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

SafeDiffusion-R1 uses online GRPO with CLIP embedding steering to cut inappropriate content from 48.9% to 18.07% and nudity detections from 646 to 15 in diffusion models while raising GenEval scores from 42.08% to 47.83% and generalizing across seven harm categories without supervised pairs or extra

Evaluation without Generation: Non-Generative Assessment of Harmful Model Specialization with Applications to CSAM

cs.LG · 2026-04-28 · unverdicted · novelty 6.0

Gaussian probing infers harmful model specialization from parameter perturbations and internal representation responses to Gaussian latent ensembles rather than from generated outputs.

citing papers explorer

Showing 3 of 3 citing papers.

Machine Unlearning for Masked Diffusion Language Models cs.CL · 2026-05-18 · unverdicted · none · ref 9
MDU minimizes forward KL divergence from prompt-conditional to prompt-masked unconditional predictions at masked positions to unlearn knowledge in MDLMs while trading off privacy and utility via temperature scaling.
SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training cs.CV · 2026-05-18 · unverdicted · none · ref 98
SafeDiffusion-R1 uses online GRPO with CLIP embedding steering to cut inappropriate content from 48.9% to 18.07% and nudity detections from 646 to 15 in diffusion models while raising GenEval scores from 42.08% to 47.83% and generalizing across seven harm categories without supervised pairs or extra
Evaluation without Generation: Non-Generative Assessment of Harmful Model Specialization with Applications to CSAM cs.LG · 2026-04-28 · unverdicted · none · ref 31
Gaussian probing infers harmful model specialization from parameter perturbations and internal representation responses to Gaussian latent ensembles rather than from generated outputs.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer