SneakyPrompt: Jailbreaking text-to-image generative models
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Citing papers
- 3DEditSafe: Defending 3D Editing Pipelines from Unsafe Generation
  3DEditSafe adds generation-stage guidance, 3D safety regularization, semantic projection, residue suppression, and mask-aware preservation to reduce unsafe semantic alignment in 3D editing while noting a safety-quality tradeoff.
- TrajShield: Trajectory-Level Safety Mediation for Defending Text-to-Video Models Against Jailbreak Attacks
  TrajShield is a training-free defense that reduces jailbreak success rates by 52.44% on average in text-to-video models by localizing and neutralizing risks through trajectory simulation and causal intervention.
- Evaluation without Generation: Non-Generative Assessment of Harmful Model Specialization with Applications to CSAM
  Gaussian probing infers harmful model specialization from parameter perturbations and internal representation responses to Gaussian latent ensembles rather than from generated outputs.