3DEditSafe: Defending 3D Editing Pipelines from Unsafe Generation

Meng Jiang; Nicole Meng; Yingjie Lao; Zheyuan Liu

arxiv: 2605.15398 · v1 · pith:KEWHDBO2new · submitted 2026-05-14 · 💻 cs.GR · cs.CV


Pith reviewed 2026-05-19 15:02 UTC · model grok-4.3


keywords: 3D editing · Gaussian Splatting · unsafe generation · safety regularization · NSFW content · text-to-3D · semantic projection


The pith

3DEditSafe steers 3D Gaussian Splatting edits away from unsafe semantic directions using layered safety constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that text-driven 3D editing pipelines can turn unsafe prompts into coherent, multi-view NSFW content because edits propagate and optimize across rendered views. It introduces 3DEditSafe, a framework that adds generation-stage safety guidance, rendered-view regularization, safe semantic projection, residue suppression, and mask-aware preservation to keep optimization from following unsafe paths. Experiments on EditSplat scenes with an object-compatible unsafe prompt set demonstrate lower unsafe semantic alignment and fewer successful view-level attacks than 2D guidance alone. The work also documents a clear safety-quality tradeoff where stronger suppression can add artifacts or weaken fidelity to the original unsafe prompt.
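One plausible reading of "safe semantic projection" is the removal of the component of an edit embedding that lies along an unsafe direction. The sketch below illustrates only that reading; the paper's actual formulation is not given on this page, and the names `project_away`, `edit_vec`, and `unsafe_dir` are hypothetical.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_away(edit_vec, unsafe_dir):
    """Return edit_vec with its component along unsafe_dir removed.

    Assumption: an orthogonal projection onto the complement of a single
    unsafe direction; this is an illustration, not the authors' method.
    """
    norm_sq = dot(unsafe_dir, unsafe_dir)
    if norm_sq == 0.0:
        # No usable unsafe direction: leave the edit unchanged.
        return list(edit_vec)
    scale = dot(edit_vec, unsafe_dir) / norm_sq
    return [e - scale * u for e, u in zip(edit_vec, unsafe_dir)]


# Toy 3-dimensional embeddings (placeholder values, not measured data).
edit_vec = [0.6, 0.8, 0.0]
unsafe_dir = [1.0, 0.0, 0.0]
safe_vec = project_away(edit_vec, unsafe_dir)
print(safe_vec)
```

After projection the edit vector has no component along `unsafe_dir`, while the component orthogonal to it is left unchanged.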

Core claim

3DEditSafe is a safety-regularized 3D editing framework that constrains unsafe semantic propagation during optimization by combining generation-stage safety guidance with rendered-view 3D safety regularization, safe semantic projection, residue suppression, and mask-aware preservation. On EditSplat scenes and an object-compatible unsafe prompt benchmark, the method reduces unsafe semantic alignment and view-level attack success rates while showing that 2D safety guidance alone is not consistently sufficient and that stronger suppression trades off against edit quality and prompt fidelity.

What carries the argument

The multi-stage safety-regularized optimization that integrates safety guidance, view regularization, semantic projection, residue suppression, and mask-aware preservation to redirect 3D Gaussian Splatting updates away from unsafe regions.
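The combined objective quoted in the Lean-theorems section of this page gates three safety terms behind a prompt-risk indicator. A minimal sketch of that gating follows. The weight values, the `other_terms` placeholder for the terms elided in the quoted formula, and all function and argument names are assumptions for illustration.

```python
def total_loss(l_edit, l_unsafe_3d, l_safe_3d, l_preserve,
               prompt_is_risky,
               lam_u=1.0, lam_s=1.0, lam_p=1.0,
               other_terms=0.0):
    """Gated objective: L_edit + (elided terms) + 1_risk(p) * safety terms.

    other_terms stands in for whatever the quoted formula elides with "...";
    it is deliberately left as an opaque number here.
    """
    gate = 1.0 if prompt_is_risky else 0.0
    safety = lam_u * l_unsafe_3d + lam_s * l_safe_3d + lam_p * l_preserve
    return l_edit + other_terms + gate * safety


# Benign prompt: the safety terms are switched off entirely.
benign = total_loss(1.0, 2.0, 3.0, 4.0, prompt_is_risky=False)

# Risky prompt: the weighted safety terms are added to the edit loss.
risky = total_loss(1.0, 2.0, 3.0, 4.0, prompt_is_risky=True,
                   lam_u=0.5, lam_s=0.25, lam_p=0.125)
print(benign, risky)
```

Gating on the prompt means benign edits are optimized exactly as in the unprotected pipeline, which is one way to limit the safety-quality tradeoff to prompts flagged as risky.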

If this is right

2D safety guidance by itself fails to block coherent unsafe edits that span multiple rendered views in 3D pipelines.
Increasing the strength of unsafe suppression produces measurable drops in visual quality or fidelity to the input prompt.
Effective defenses against unsafe 3D generation must act directly on the optimized 3D representation rather than only on 2D projections.
Unsafe prompts can produce multi-view-consistent NSFW content unless optimization is explicitly constrained at multiple stages.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar layered safety mechanisms could be adapted to other 3D representations such as NeRF or mesh-based editors.
Production 3D content tools may eventually require user-selectable safety strength settings to balance protection against creative intent.
Automated unsafe-prompt classifiers tailored to 3D consistency could be combined with this approach for earlier intervention.

Load-bearing premise

That the listed combination of safety techniques can consistently steer the 3D optimization away from unsafe directions without unacceptable loss of fidelity or introduction of artifacts in typical scenes.

What would settle it

Running the 3DEditSafe pipeline on a set of unsafe prompts and finding that the final 3D representation still exhibits high unsafe semantic alignment scores or high view-level attack success rates.
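Such a check could be scripted roughly as follows. This sketch assumes that each rendered view receives an unsafe-alignment score in [0, 1] and that a view counts as a successful attack when its score meets a threshold; the threshold, the function names, and the scores are placeholders rather than the paper's definitions or results.

```python
def view_level_asr(view_scores, threshold=0.5):
    """Fraction of rendered views whose unsafe score meets the threshold."""
    if not view_scores:
        raise ValueError("need at least one rendered view")
    hits = sum(1 for s in view_scores if s >= threshold)
    return hits / len(view_scores)


def mean_alignment(view_scores):
    """Average unsafe-alignment score over the rendered views."""
    if not view_scores:
        raise ValueError("need at least one rendered view")
    return sum(view_scores) / len(view_scores)


# Made-up per-view scores for one scene, for illustration only.
unprotected = [0.9, 0.8, 0.7, 0.4]
defended = [0.3, 0.6, 0.2, 0.1]

print(view_level_asr(unprotected), view_level_asr(defended))
print(mean_alignment(unprotected), mean_alignment(defended))
```

Under this reading, the claim would be undermined if the defended scores produced an attack success rate or mean alignment comparable to the unprotected ones.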

Figures

Figures reproduced from arXiv: 2605.15398 by Meng Jiang, Nicole Meng, Yingjie Lao, Zheyuan Liu.

**Figure 1.** Unsafe generation in EditSplat [14]. Starting from a clean 3D scene (top row), a benign prompt produces a safe and view-consistent marble edit (second row). In contrast, an unsafe prompt such as “Make his face shredded like horror” generates graphic content that persists across rendered viewpoints (third row). Applying a diffusion-level 2D safety defense alone still fails to remove the unsafe content after…

**Figure 2.** Comparison on face object between unprotected 3D editing and 3DEditSafe under the…

**Figure 3.** Comparison on fangzhou object between unprotected 3D editing and 3DEditSafe under…

**Figure 4.** Comparison on person object between unprotected 3D editing and 3DEditSafe under the…

**Figure 5.** Comparison on bear object between unprotected 3D editing and 3DEditSafe under the…

Original abstract

Recent advances in 3D generative editing, particularly pipelines based on 3D Gaussian Splatting (3DGS), have achieved high-fidelity, multi-view-consistent scene manipulation from text prompts. However, we find that these pipelines also introduce new safety risks when unsafe prompts produce edits that are propagated and optimized across views. In this work, we study unsafe generation in 3D editing pipelines and show that such behavior can lead to coherent, undesirable Not-Safe-For-Work (NSFW) content in the final 3D representation. To address this, we propose 3DEditSafe, a safety-regularized 3D editing framework that constrains unsafe semantic propagation during optimization. 3DEditSafe combines generation-stage safety guidance with rendered-view 3D safety regularization, safe semantic projection, residue suppression, and mask-aware preservation to steer optimization away from unsafe editing directions. We evaluate our approach on EditSplat scenes using an object-compatible unsafe prompt benchmark and show that 2D safety guidance alone is not consistently sufficient to prevent unsafe 3D edits. 3DEditSafe reduces unsafe semantic alignment and view-level attack success rates, while revealing a safety-quality tradeoff in which stronger unsafe suppression can introduce artifacts or reduce unsafe-prompt fidelity. To our knowledge, this work is the first attempt to study and defend against unsafe generation in text-driven 3D editing pipelines, highlighting the need for safety mechanisms that operate directly on optimized 3D representations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper flags a genuine safety gap in text-driven 3D editing but rests on high-level claims without numbers, equations, or component ablations.


The core takeaway is that text prompts in 3D Gaussian Splatting pipelines can produce coherent unsafe content across views, and 3DEditSafe is the first explicit attempt to counter it with a five-part regularization setup. The authors show that plain 2D safety guidance falls short for multi-view consistency and propose adding generation-stage guidance, rendered-view regularization, safe semantic projection, residue suppression, and mask-aware preservation to steer the optimization away from bad directions. That framing and the safety-quality tradeoff they note are useful observations for anyone working on generative 3D tools.

Referee Report

3 major / 1 minor

Summary. The paper claims that text-driven 3D editing pipelines based on 3D Gaussian Splatting can propagate unsafe prompts into coherent NSFW content across views. It proposes 3DEditSafe, a safety-regularized framework combining generation-stage safety guidance, rendered-view 3D safety regularization, safe semantic projection, residue suppression, and mask-aware preservation to steer optimization away from unsafe directions. Evaluation on EditSplat scenes with an object-compatible unsafe prompt benchmark shows that 2D safety guidance alone is insufficient, while the full framework reduces unsafe semantic alignment and view-level attack success rates, at the cost of a safety-quality tradeoff that can introduce artifacts or reduce prompt fidelity. The work positions itself as the first to study and defend against unsafe generation in such 3D pipelines.

Significance. If the empirical claims hold after verification, the work is significant as the first systematic treatment of safety risks specific to 3D editing pipelines. By demonstrating that 2D safety measures do not transfer reliably to multi-view 3D optimization and by identifying an explicit safety-quality tradeoff, the paper supplies a concrete engineering baseline and a set of failure modes that future 3D safety research can build upon or refute.

major comments (3)

[Abstract] Abstract: the central empirical claim that 3DEditSafe 'reduces unsafe semantic alignment and view-level attack success rates' is stated without any quantitative metrics, tables, or numerical results, so the magnitude and statistical reliability of the improvement cannot be assessed from the provided text.
[Method] Method description of 3DEditSafe: the five components are listed but no combined loss equation, weighting schedule, or optimization procedure is supplied, leaving the mechanism by which the suite steers 3DGS optimization away from unsafe directions as an unverified assumption.
[Evaluation] Evaluation section: no ablation isolating the contribution of each regularization term (generation-stage guidance, 3D safety regularization, semantic projection, residue suppression, mask-aware preservation) is reported, which is required to substantiate that the full combination is both necessary and non-conflicting with the original editing objective.

minor comments (1)

[Abstract] The phrase 'object-compatible unsafe prompt benchmark' is used without a citation or short description of its construction or relation to existing 2D safety benchmarks.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to strengthen the presentation of results, methods, and evaluation.


Referee: [Abstract] Abstract: the central empirical claim that 3DEditSafe 'reduces unsafe semantic alignment and view-level attack success rates' is stated without any quantitative metrics, tables, or numerical results, so the magnitude and statistical reliability of the improvement cannot be assessed from the provided text.

Authors: We agree that the abstract would benefit from explicit quantitative indicators. In the revised version we will incorporate the key measured improvements (e.g., the observed drop in unsafe semantic alignment and the reduction in view-level attack success rate) directly into the abstract while preserving its length and readability.

Revision: yes

Referee: [Method] Method description of 3DEditSafe: the five components are listed but no combined loss equation, weighting schedule, or optimization procedure is supplied, leaving the mechanism by which the suite steers 3DGS optimization away from unsafe directions as an unverified assumption.

Authors: We acknowledge that an explicit combined objective would clarify the integration. The individual loss terms are defined in Section 3; we will add a single combined-loss equation together with the weighting schedule and the precise optimization steps used for 3D Gaussian Splatting, making the steering mechanism fully verifiable.

Revision: yes

Referee: [Evaluation] Evaluation section: no ablation isolating the contribution of each regularization term (generation-stage guidance, 3D safety regularization, semantic projection, residue suppression, mask-aware preservation) is reported, which is required to substantiate that the full combination is both necessary and non-conflicting with the original editing objective.

Authors: We agree that component-wise ablations strengthen the claims. Although the current experiments already contrast the full framework against 2D guidance alone, we will insert a new ablation table that incrementally adds each term and reports the resulting safety metrics and editing-quality trade-offs, thereby demonstrating necessity and compatibility.

Revision: yes
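An incremental ablation of the kind described in the last response could enumerate its configurations as sketched below. The order in which terms are added and the identifier names are assumptions; the page does not say how the authors would order the rows of such a table.

```python
# The five components named on this page, in an assumed order of addition.
COMPONENTS = [
    "generation_stage_guidance",
    "rendered_view_3d_regularization",
    "safe_semantic_projection",
    "residue_suppression",
    "mask_aware_preservation",
]


def incremental_ablation(components):
    """Return cumulative configurations: none, first, first two, and so on."""
    return [list(components[:k]) for k in range(len(components) + 1)]


configs = incremental_ablation(COMPONENTS)
for row, enabled in enumerate(configs):
    # Row 0 is the unprotected baseline; the last row is the full framework.
    print(row, enabled if enabled else ["baseline (no safety terms)"])
```

Each row would then be evaluated with the same safety and quality metrics, so the marginal effect of every added term is visible.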

Circularity Check

0 steps flagged

No circularity: empirical engineering framework without derivations or self-referential predictions


The paper describes a multi-component safety framework for 3D Gaussian Splatting editing pipelines and evaluates it on benchmarks, but contains no equations, loss functions with explicit derivations, fitted parameters presented as predictions, or first-principles results. All claims rest on empirical measurements of unsafe semantic alignment and attack success rates rather than any chain that reduces to its own inputs by construction. The approach is therefore self-contained as an applied defense method whose validity is assessed externally via the reported experiments.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides insufficient technical detail to enumerate specific free parameters, axioms, or invented entities; the framework introduces regularization techniques whose exact formulations and hyperparameter choices are not specified.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: 3DEditSafe combines generation-stage safety guidance with rendered-view 3D safety regularization, safe semantic projection, residue suppression, and mask-aware preservation to steer optimization away from unsafe editing directions... $L_{\text{total}} = L_{\text{edit}} + \dots + \mathbb{1}_{\text{risk}}(p)\,\bigl(\lambda_u L^{3D}_{\text{unsafe}} + \lambda_s L^{3D}_{\text{safe}} + \lambda_p L_{\text{preserve}}\bigr)$

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear

Paper passage: We construct a small benchmark of 30 prompt-scene pairs... evaluate on EditSplat scenes using an object-compatible unsafe prompt benchmark

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 2 internal anchors

[1] 4chan. https://www.4chan.org/

[2] Lexica. https://lexica.art/

[3] Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5470–5479, 2022.

[4] Die Chen, Zhiwen Li, Cen Chen, Yuexiang Xie, Xiaodan Li, Jinyan Ye, Yingda Chen, and Yaliang Li. Comprehensive evaluation and analysis for nsfw concept erasure in text-to-image diffusion models. arXiv preprint arXiv:2505.15450, 2025.

[5] Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, and Guosheng Lin. Gaussianeditor: Swift and controllable 3d editing with gaussian splatting, 2023.

[6] Yijia Guo, Wenkai Huang, Yang Li, Gaolei Li, Hang Zhang, Liwen Hu, Jianhua Li, Tiejun Huang, and Lei Ma. Splats in splats: Robust and effective 3d steganography towards gaussian splatting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 4485–4493, 2026.

[7] Dong Han, Salaheldin Mohamed, and Yong Li. Shielddiff: Suppressing sexual content generation from diffusion models through reinforcement learning. arXiv preprint arXiv:2410.05309, 2024.

[8] Ayaan Haque, Matthew Tancik, Alexei A Efros, Aleksander Holynski, and Angjoo Kanazawa. Instruct-nerf2nerf: Editing 3d scenes with instructions. In Proceedings of the IEEE/CVF international conference on computer vision, pages 19740–19750, 2023.

[9] Kai He, Chin-Hsuan Wu, and Igor Gilitschenski. Ctrl-d: Controllable dynamic 3d scene editing with personalized 2d diffusion. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 26630–26640, 2025.

[10] Shuting He, Peilin Ji, Yitong Yang, Changshuo Wang, Jiayi Ji, Yinglin Wang, and Henghui Ding. A survey on 3d gaussian splatting applications: Segmentation, editing, and generation. arXiv preprint arXiv:2508.09977, 2025.

[11] Ziming Hong, Tianyu Huang, Runnan Chen, Shanshan Ye, Mingming Gong, Bo Han, and Tongliang Liu. Adlift: Lifting adversarial perturbations to safeguard 3d gaussian splatting assets against instruction-driven editing. arXiv preprint arXiv:2512.07247, 2025.

[12] Yi Huang, Jiancheng Huang, Yifan Liu, Mingfu Yan, Jiaxi Lv, Jianzhuang Liu, Wei Xiong, He Zhang, Liangliang Cao, and Shifeng Chen. Diffusion model-based image editing: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.

[13] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), July 2023. URL https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/

[14] Dong In Lee, Hyeongcheol Park, Jiyoung Seo, Eunbyung Park, Hyunje Park, Ha Dam Baek, Sangheon Shin, Sangmin Kim, and Sangpil Kim. Editsplat: Multi-view fusion and attention-guided optimization for view-consistent 3d scene editing with 3d gaussian splatting. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 11135–11145, 2025.

[15] Warren Leu, Yuta Nakashima, and Noa Garcia. Auditing image-based nsfw classifiers for content filtering. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, pages 1163–1173, 2024.

[16] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755. Springer, 2014.

[17] Lingzhuang Meng, Mingwen Shao, Yuanjian Qiao, and Xiang Lv. Degauss: Defending against malicious 3d editing for gaussian splatting. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.

[18] Nicole Meng, Caleb Manicke, Ronak Sahu, Caiwen Ding, and Yingjie Lao. Advancing adversarial robustness in gnerfs: The il2-nerf attack. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 16388–16397, 2025.

[19] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.

[20] Thanh Tam Nguyen, Zhao Ren, Trinh Pham, Phi Le Nguyen, Quoc Viet Hung Nguyen, and Hongzhi Yin. A review of instruction-guided image editing. Engineering Applications of Artificial Intelligence, 163:112953, 2026.

[21] Maria Parelli, Michael Oechsle, Michael Niemeyer, Federico Tombari, and Andreas Geiger. 3d-latte: Latent space 3d editing from textual instructions. arXiv preprint arXiv:2509.00269, 2025.

[22] Yiting Qu, Xinyue Shen, Xinlei He, Michael Backes, Savvas Zannettou, and Yang Zhang. Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models. In ACM SIGSAC Conference on Computer and Communications Security (CCS). ACM, 2023.

[23] Yiting Qu, Xinyue Shen, Xinlei He, Michael Backes, Savvas Zannettou, and Yang Zhang. Unsafe diffusion: On the generation of unsafe images and hateful memes from text-to-image models. In Proceedings of the 2023 ACM SIGSAC conference on computer and communications security, pages 3403–3417, 2023.

[24] Michael Rubloff. Google announces new google maps experience featuring neural radiance fields (nerfs). Radiance Fields, 2023.

[25] Shaswati Saha, Sourajit Saha, Manas Gaur, and Tejas Gokhale. Side effects of erasing concepts from diffusion models. arXiv preprint arXiv:2508.15124, 2025.

[26] Matthias Schneider and Thilo Hagendorff. When image generation goes wrong: A safety analysis of stable diffusion models. arXiv preprint arXiv:2411.15516, 2024.

[27] Patrick Schramowski, Manuel Brack, Björn Deiseroth, and Kristian Kersting. Safe latent diffusion: Mitigating inappropriate degeneration in diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 22522–22531, 2023.

[28] Jiwoo Shin, Byeonghu Na, Mina Kang, Wonhyeok Choi, and Il-Chul Moon. Prompt-based safety guidance is ineffective for unlearned text-to-image diffusion models. arXiv preprint arXiv:2511.04834, 2025.

[29] Kiran Thorat, Nicole Meng, Mostafa Karami, Caiwen Ding, Yingjie Lao, and Zhijie Jerry Shi. Gif: A conditional multimodal generative framework for ir drop imaging in chip layouts. arXiv preprint arXiv:2604.09999, 2026.

[30] Jordan Vice, Naveed Akhtar, Mubarak Shah, Richard Hartley, and Ajmal Saeed Mian. Safety without semantic disruptions: Editing-free safe image generation via context-preserving dual latent reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2306–2316, 2025.

[31] Tsung-Han Wu, Long Lian, Joseph E Gonzalez, Boyi Li, and Trevor Darrell. Self-correcting llm-controlled diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6327–6336, 2024.

[32] Zongyu Wu, Hongcheng Gao, Yueze Wang, Xiang Zhang, and Suhang Wang. Universal prompt optimizer for safe text-to-image generation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6340–6354, 2024.

[33] Yu Xie, Chengjie Zeng, Lingyun Zhang, and Yanwei Fu. Nsfw-classifier guided prompt sanitization for safe text-to-image generation. arXiv preprint arXiv:2506.18325, 2025.

[34] Yuchen Yang, Bo Hui, Haolin Yuan, Neil Gong, and Yinzhi Cao. Sneakyprompt: Jailbreaking text-to-image generative models. In 2024 IEEE symposium on security and privacy (SP), pages 897–912. IEEE, 2024.

[35] Yao Yao, Zixin Luo, Shiwei Li, Jingyang Zhang, Yufan Ren, Lei Zhou, Tian Fang, and Long Quan. Blendedmvs: A large-scale dataset for generalized multi-view stereo networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1790–1799, 2020.

[36] Mingqiao Ye, Martin Danelljan, Fisher Yu, and Lei Ke. Gaussian grouping: Segment and edit anything in 3d scenes. In European conference on computer vision, pages 162–179. Springer, 2024.

[37] Lingzhi Yuan, Xinfeng Li, Chejian Xu, Guanhong Tao, Xiaojun Jia, Yihao Huang, Wei Dong, Yang Liu, and Bo Li. Promptguard: Soft prompt-guided unsafe content moderation for text-to-image models. IEEE Transactions on Information Forensics and Security, 2026.

[38] Qihang Zhang, Yinghao Xu, Chaoyang Wang, Hsin-Ying Lee, Gordon Wetzstein, Bolei Zhou, and Ceyuan Yang. 3ditscene: Editing any scene via language-guided disentangled gaussian splatting. arXiv preprint arXiv:2405.18424, 2024.

[39] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018.

[40] Yuyang Zhang, Kangjie Chen, Xudong Jiang, Jiahui Wen, Yihui Jin, Ziyou Liang, Yihao Huang, Run Wang, and Lina Wang. USD: NSFW content detection for Text-to-Image models via scene graph. In 34th USENIX Security Symposium (USENIX Security 25), pages 879–895, 2025.

A Appendix

A.1 Benchmark Construction Details

We construct an object-compatible benchma...

... person,” we minimally adapt the object noun to match the scene, e.g., replacing “person ...

[1] [1]

4chan.https://www.4chan.org/

work page

[2] [2]

Lexica.https://lexica.art/

work page

[3] [3]

Mip-nerf 360: Unbounded anti-aliased neural radiance fields

Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5470–5479, 2022

work page 2022

[4] [4]

Comprehensive evaluation and analysis for nsfw concept erasure in text-to-image diffusion models.arXiv preprint arXiv:2505.15450, 2025

Die Chen, Zhiwen Li, Cen Chen, Yuexiang Xie, Xiaodan Li, Jinyan Ye, Yingda Chen, and Yaliang Li. Comprehensive evaluation and analysis for nsfw concept erasure in text-to-image diffusion models.arXiv preprint arXiv:2505.15450, 2025

work page arXiv 2025

[5] [5]

Gaussianeditor: Swift and controllable 3d editing with gaussian splatting, 2023

Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, and Guosheng Lin. Gaussianeditor: Swift and controllable 3d editing with gaussian splatting, 2023

work page 2023

[6] [6]

Splats in splats: Robust and effective 3d steganography towards gaussian splatting

Yijia Guo, Wenkai Huang, Yang Li, Gaolei Li, Hang Zhang, Liwen Hu, Jianhua Li, Tiejun Huang, and Lei Ma. Splats in splats: Robust and effective 3d steganography towards gaussian splatting. InProceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 4485–4493, 2026

work page 2026

[7] [7]

Shielddiff: Suppressing sexual content generation from diffusion models through reinforcement learning.arXiv preprint arXiv:2410.05309, 2024

Dong Han, Salaheldin Mohamed, and Yong Li. Shielddiff: Suppressing sexual content generation from diffusion models through reinforcement learning.arXiv preprint arXiv:2410.05309, 2024

work page arXiv 2024

[8] [8]

Instruct- nerf2nerf: Editing 3d scenes with instructions

Ayaan Haque, Matthew Tancik, Alexei A Efros, Aleksander Holynski, and Angjoo Kanazawa. Instruct- nerf2nerf: Editing 3d scenes with instructions. InProceedings of the IEEE/CVF international conference on computer vision, pages 19740–19750, 2023

work page 2023

[9] [9]

Ctrl-d: Controllable dynamic 3d scene editing with personalized 2d diffusion

Kai He, Chin-Hsuan Wu, and Igor Gilitschenski. Ctrl-d: Controllable dynamic 3d scene editing with personalized 2d diffusion. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 26630–26640, 2025

work page 2025

[10] [10]

A Survey on 3D Gaussian Splatting Applications: Segmentation, Editing, and Generation

Shuting He, Peilin Ji, Yitong Yang, Changshuo Wang, Jiayi Ji, Yinglin Wang, and Henghui Ding. A survey on 3d gaussian splatting applications: Segmentation, editing, and generation.arXiv preprint arXiv:2508.09977, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[11] Ziming Hong, Tianyu Huang, Runnan Chen, Shanshan Ye, Mingming Gong, Bo Han, and Tongliang Liu. Adlift: Lifting adversarial perturbations to safeguard 3d gaussian splatting assets against instruction-driven editing. arXiv preprint arXiv:2512.07247, 2025

[12] Yi Huang, Jiancheng Huang, Yifan Liu, Mingfu Yan, Jiaxi Lv, Jianzhuang Liu, Wei Xiong, He Zhang, Liangliang Cao, and Shifeng Chen. Diffusion model-based image editing: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

[13] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), July 2023. URL https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/

[14] Dong In Lee, Hyeongcheol Park, Jiyoung Seo, Eunbyung Park, Hyunje Park, Ha Dam Baek, Sangheon Shin, Sangmin Kim, and Sangpil Kim. Editsplat: Multi-view fusion and attention-guided optimization for view-consistent 3d scene editing with 3d gaussian splatting. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 11135–11145, 2025

[15] Warren Leu, Yuta Nakashima, and Noa Garcia. Auditing image-based nsfw classifiers for content filtering. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, pages 1163–1173, 2024

[16] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755. Springer, 2014

[17] Lingzhuang Meng, Mingwen Shao, Yuanjian Qiao, and Xiang Lv. Degauss: Defending against malicious 3d editing for gaussian splatting. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

[18] Nicole Meng, Caleb Manicke, Ronak Sahu, Caiwen Ding, and Yingjie Lao. Advancing adversarial robustness in gnerfs: The il2-nerf attack. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 16388–16397, 2025

[19] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021

[20] Thanh Tam Nguyen, Zhao Ren, Trinh Pham, Phi Le Nguyen, Quoc Viet Hung Nguyen, and Hongzhi Yin. A review of instruction-guided image editing. Engineering Applications of Artificial Intelligence, 163:112953, 2026

[21] Maria Parelli, Michael Oechsle, Michael Niemeyer, Federico Tombari, and Andreas Geiger. 3d-latte: Latent space 3d editing from textual instructions. arXiv preprint arXiv:2509.00269, 2025

[22] Yiting Qu, Xinyue Shen, Xinlei He, Michael Backes, Savvas Zannettou, and Yang Zhang. Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models. In ACM SIGSAC Conference on Computer and Communications Security (CCS). ACM, 2023

[23] Yiting Qu, Xinyue Shen, Xinlei He, Michael Backes, Savvas Zannettou, and Yang Zhang. Unsafe diffusion: On the generation of unsafe images and hateful memes from text-to-image models. In Proceedings of the 2023 ACM SIGSAC conference on computer and communications security, pages 3403–3417, 2023

[24] Michael Rubloff. Google announces new google maps experience featuring neural radiance fields (nerfs). Radiance Fields, 2023

[25] Shaswati Saha, Sourajit Saha, Manas Gaur, and Tejas Gokhale. Side effects of erasing concepts from diffusion models. arXiv preprint arXiv:2508.15124, 2025

[26] Matthias Schneider and Thilo Hagendorff. When image generation goes wrong: A safety analysis of stable diffusion models. arXiv preprint arXiv:2411.15516, 2024

[27] Patrick Schramowski, Manuel Brack, Björn Deiseroth, and Kristian Kersting. Safe latent diffusion: Mitigating inappropriate degeneration in diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 22522–22531, 2023

[28] Jiwoo Shin, Byeonghu Na, Mina Kang, Wonhyeok Choi, and Il-Chul Moon. Prompt-based safety guidance is ineffective for unlearned text-to-image diffusion models. arXiv preprint arXiv:2511.04834, 2025

[29] Kiran Thorat, Nicole Meng, Mostafa Karami, Caiwen Ding, Yingjie Lao, and Zhijie Jerry Shi. Gif: A conditional multimodal generative framework for ir drop imaging in chip layouts. arXiv preprint arXiv:2604.09999, 2026

[30] Jordan Vice, Naveed Akhtar, Mubarak Shah, Richard Hartley, and Ajmal Saeed Mian. Safety without semantic disruptions: Editing-free safe image generation via context-preserving dual latent reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2306–2316, 2025

[31] Tsung-Han Wu, Long Lian, Joseph E Gonzalez, Boyi Li, and Trevor Darrell. Self-correcting llm-controlled diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6327–6336, 2024

[32] Zongyu Wu, Hongcheng Gao, Yueze Wang, Xiang Zhang, and Suhang Wang. Universal prompt optimizer for safe text-to-image generation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6340–6354, 2024

[33] Yu Xie, Chengjie Zeng, Lingyun Zhang, and Yanwei Fu. Nsfw-classifier guided prompt sanitization for safe text-to-image generation. arXiv preprint arXiv:2506.18325, 2025

[34] Yuchen Yang, Bo Hui, Haolin Yuan, Neil Gong, and Yinzhi Cao. Sneakyprompt: Jailbreaking text-to-image generative models. In 2024 IEEE symposium on security and privacy (SP), pages 897–912. IEEE, 2024

[35] Yao Yao, Zixin Luo, Shiwei Li, Jingyang Zhang, Yufan Ren, Lei Zhou, Tian Fang, and Long Quan. Blendedmvs: A large-scale dataset for generalized multi-view stereo networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1790–1799, 2020

[36] Mingqiao Ye, Martin Danelljan, Fisher Yu, and Lei Ke. Gaussian grouping: Segment and edit anything in 3d scenes. In European conference on computer vision, pages 162–179. Springer, 2024

[37] Lingzhi Yuan, Xinfeng Li, Chejian Xu, Guanhong Tao, Xiaojun Jia, Yihao Huang, Wei Dong, Yang Liu, and Bo Li. Promptguard: Soft prompt-guided unsafe content moderation for text-to-image models. IEEE Transactions on Information Forensics and Security, 2026

[38] Qihang Zhang, Yinghao Xu, Chaoyang Wang, Hsin-Ying Lee, Gordon Wetzstein, Bolei Zhou, and Ceyuan Yang. 3ditscene: Editing any scene via language-guided disentangled gaussian splatting. arXiv preprint arXiv:2405.18424, 2024

[39] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018

[40] Yuyang Zhang, Kangjie Chen, Xudong Jiang, Jiahui Wen, Yihui Jin, Ziyou Liang, Yihao Huang, Run Wang, and Lina Wang. USD: NSFW content detection for Text-to-Image models via scene graph. In 34th USENIX Security Symposium (USENIX Security 25), pages 879–895, 2025

A Appendix

A.1 Benchmark Construction Details

We construct an object-compatible benchma...