
arxiv: 2605.15398 · v1 · pith:KEWHDBO2new · submitted 2026-05-14 · 💻 cs.GR · cs.CV

3DEditSafe: Defending 3D Editing Pipelines from Unsafe Generation

Pith reviewed 2026-05-19 15:02 UTC · model grok-4.3

classification 💻 cs.GR cs.CV
keywords 3D editing, Gaussian Splatting, unsafe generation, safety regularization, NSFW content, text-to-3D, semantic projection

The pith

3DEditSafe steers 3D Gaussian Splatting edits away from unsafe semantic directions using layered safety constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that text-driven 3D editing pipelines can turn unsafe prompts into coherent, multi-view NSFW content because edits propagate and optimize across rendered views. It introduces 3DEditSafe, a framework that adds generation-stage safety guidance, rendered-view regularization, safe semantic projection, residue suppression, and mask-aware preservation to keep optimization from following unsafe paths. Experiments on EditSplat scenes with an object-compatible unsafe prompt set demonstrate lower unsafe semantic alignment and fewer successful view-level attacks than 2D guidance alone. The work also documents a clear safety-quality tradeoff where stronger suppression can add artifacts or weaken fidelity to the original unsafe prompt.

Core claim

3DEditSafe is a safety-regularized 3D editing framework that constrains unsafe semantic propagation during optimization by combining generation-stage safety guidance with rendered-view 3D safety regularization, safe semantic projection, residue suppression, and mask-aware preservation. On EditSplat scenes and an object-compatible unsafe prompt benchmark, the method reduces unsafe semantic alignment and view-level attack success rates while showing that 2D safety guidance alone is not consistently sufficient and that stronger suppression trades off against edit quality and prompt fidelity.
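
The two safety metrics named in the core claim are not defined in this summary. As a rough sketch only, assuming that "unsafe semantic alignment" is a mean cosine similarity between rendered-view embeddings and an unsafe-concept embedding, and that "view-level attack success rate" is the fraction of rendered views whose unsafe score exceeds a threshold, they could be computed as follows. The function names, the embedding source, and the 0.5 threshold are hypothetical and do not come from the paper.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def unsafe_alignment(view_embeddings, unsafe_embedding):
    """Assumed metric: mean cosine similarity between each rendered-view
    embedding and the embedding of the unsafe concept."""
    sims = [cosine_similarity(v, unsafe_embedding) for v in view_embeddings]
    return float(np.mean(sims))


def view_level_asr(view_scores, threshold=0.5):
    """Assumed metric: fraction of rendered views whose unsafe score
    (e.g. from an image classifier) is strictly above the threshold."""
    scores = np.asarray(view_scores, dtype=float)
    return float(np.mean(scores > threshold))
```

Under these assumed definitions, a defense is doing its job when both numbers drop relative to the unprotected edit of the same scene and prompt.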

What carries the argument

The multi-stage safety-regularized optimization that integrates safety guidance, view regularization, semantic projection, residue suppression, and mask-aware preservation to redirect 3D Gaussian Splatting updates away from unsafe regions.
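
The summary does not say how "safe semantic projection" is formulated. A common generic construction, shown here purely as an illustration, removes from an edit direction its component along an unsafe direction in embedding space. The function `project_out` and its `strength` parameter are hypothetical names, not the paper's API.

```python
import numpy as np


def project_out(direction, unsafe_direction, strength=1.0):
    """Remove (a fraction of) the component of `direction` that lies
    along `unsafe_direction`.

    strength=1.0 gives the full orthogonal projection; smaller values
    suppress the unsafe component only partially.
    """
    d = np.asarray(direction, dtype=float)
    u = np.asarray(unsafe_direction, dtype=float)
    u_hat = u / np.linalg.norm(u)  # unit vector along the unsafe direction
    return d - strength * (d @ u_hat) * u_hat
```

With `strength=1.0` the result is orthogonal to the unsafe direction; intermediate values give a simple knob for trading suppression against fidelity to the original edit.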

If this is right

  • 2D safety guidance by itself fails to block coherent unsafe edits that span multiple rendered views in 3D pipelines.
  • Increasing the strength of unsafe suppression produces measurable drops in visual quality or fidelity to the input prompt.
  • Effective defenses against unsafe 3D generation must act directly on the optimized 3D representation rather than only on 2D projections.
  • Unsafe prompts can produce multi-view-consistent NSFW content unless optimization is explicitly constrained at multiple stages.
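
The safety-quality tradeoff in these points can be pictured with a weighted objective. The summary gives no combined loss, so the following is only one plausible shape, with every symbol hypothetical: an editing loss plus weighted safety and preservation terms. Generation-stage guidance and semantic projection are assumed here to act upstream on the edited target views rather than as loss terms.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{edit}}
  + \lambda_{\text{3D}}\,\mathcal{L}_{\text{3D-safe}}
  + \lambda_{\text{res}}\,\mathcal{L}_{\text{residue}}
  + \lambda_{\text{mask}}\,\mathcal{L}_{\text{preserve}}
```

In such a form, raising the safety weights pushes the optimization harder away from unsafe directions, while the editing term keeps pulling toward the prompt, which is where artifacts and lost prompt fidelity would come from.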

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar layered safety mechanisms could be adapted to other 3D representations such as NeRF or mesh-based editors.
  • Production 3D content tools may eventually require user-selectable safety strength settings to balance protection against creative intent.
  • Automated unsafe-prompt classifiers tailored to 3D consistency could be combined with this approach for earlier intervention.

Load-bearing premise

That the listed combination of safety techniques can consistently steer the 3D optimization away from unsafe directions without unacceptable loss of fidelity or introduction of artifacts in typical scenes.

What would settle it

Running the 3DEditSafe pipeline on a set of unsafe prompts and finding that the final 3D representation still exhibits high unsafe semantic alignment scores or high view-level attack success rates.

Figures

Figures reproduced from arXiv: 2605.15398 by Meng Jiang, Nicole Meng, Yingjie Lao, Zheyuan Liu.

Figure 1: Unsafe generation in EditSplat [14]. Starting from a clean 3D scene (top row), a benign prompt produces a safe and view-consistent marble edit (second row). In contrast, an unsafe prompt such as “Make his face shredded like horror” generates graphic content that persists across rendered viewpoints (third row). Applying a diffusion-level 2D safety defense alone still fails to remove the unsafe content after…
Figure 2: Comparison on face object between unprotected 3D editing and 3DEditSafe under the…
Figure 3: Comparison on fangzhou object between unprotected 3D editing and 3DEditSafe under…
Figure 4: Comparison on person object between unprotected 3D editing and 3DEditSafe under the…
Figure 5: Comparison on bear object between unprotected 3D editing and 3DEditSafe under the…
Original abstract

Recent advances in 3D generative editing, particularly pipelines based on 3D Gaussian Splatting (3DGS), have achieved high-fidelity, multi-view-consistent scene manipulation from text prompts. However, we find that these pipelines also introduce new safety risks when unsafe prompts produce edits that are propagated and optimized across views. In this work, we study unsafe generation in 3D editing pipelines and show that such behavior can lead to coherent, undesirable Not-Safe-For-Work (NSFW) content in the final 3D representation. To address this, we propose 3DEditSafe, a safety-regularized 3D editing framework that constrains unsafe semantic propagation during optimization. 3DEditSafe combines generation-stage safety guidance with rendered-view 3D safety regularization, safe semantic projection, residue suppression, and mask-aware preservation to steer optimization away from unsafe editing directions. We evaluate our approach on EditSplat scenes using an object-compatible unsafe prompt benchmark and show that 2D safety guidance alone is not consistently sufficient to prevent unsafe 3D edits. 3DEditSafe reduces unsafe semantic alignment and view-level attack success rates, while revealing a safety-quality tradeoff in which stronger unsafe suppression can introduce artifacts or reduce unsafe-prompt fidelity. To our knowledge, this work is the first attempt to study and defend against unsafe generation in text-driven 3D editing pipelines, highlighting the need for safety mechanisms that operate directly on optimized 3D representations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper claims that text-driven 3D editing pipelines based on 3D Gaussian Splatting can propagate unsafe prompts into coherent NSFW content across views. It proposes 3DEditSafe, a safety-regularized framework combining generation-stage safety guidance, rendered-view 3D safety regularization, safe semantic projection, residue suppression, and mask-aware preservation to steer optimization away from unsafe directions. Evaluation on EditSplat scenes with an object-compatible unsafe prompt benchmark shows that 2D safety guidance alone is insufficient, while the full framework reduces unsafe semantic alignment and view-level attack success rates, at the cost of a safety-quality tradeoff that can introduce artifacts or reduce prompt fidelity. The work positions itself as the first to study and defend against unsafe generation in such 3D pipelines.

Significance. If the empirical claims hold after verification, the work is significant as the first systematic treatment of safety risks specific to 3D editing pipelines. By demonstrating that 2D safety measures do not transfer reliably to multi-view 3D optimization and by identifying an explicit safety-quality tradeoff, the paper supplies a concrete engineering baseline and a set of failure modes that future 3D safety research can build upon or refute.

major comments (3)
  1. [Abstract] Abstract: the central empirical claim that 3DEditSafe 'reduces unsafe semantic alignment and view-level attack success rates' is stated without any quantitative metrics, tables, or numerical results, so the magnitude and statistical reliability of the improvement cannot be assessed from the provided text.
  2. [Method] Method description of 3DEditSafe: the five components are listed but no combined loss equation, weighting schedule, or optimization procedure is supplied, leaving the mechanism by which the suite steers 3DGS optimization away from unsafe directions as an unverified assumption.
  3. [Evaluation] Evaluation section: no ablation isolating the contribution of each regularization term (generation-stage guidance, 3D safety regularization, semantic projection, residue suppression, mask-aware preservation) is reported, which is required to substantiate that the full combination is both necessary and non-conflicting with the original editing objective.
minor comments (1)
  1. [Abstract] The phrase 'object-compatible unsafe prompt benchmark' is used without a citation or short description of its construction or relation to existing 2D safety benchmarks.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to strengthen the presentation of results, methods, and evaluation.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central empirical claim that 3DEditSafe 'reduces unsafe semantic alignment and view-level attack success rates' is stated without any quantitative metrics, tables, or numerical results, so the magnitude and statistical reliability of the improvement cannot be assessed from the provided text.

    Authors: We agree that the abstract would benefit from explicit quantitative indicators. In the revised version we will incorporate the key measured improvements (e.g., the observed drop in unsafe semantic alignment and the reduction in view-level attack success rate) directly into the abstract while preserving its length and readability. revision: yes

  2. Referee: [Method] Method description of 3DEditSafe: the five components are listed but no combined loss equation, weighting schedule, or optimization procedure is supplied, leaving the mechanism by which the suite steers 3DGS optimization away from unsafe directions as an unverified assumption.

    Authors: We acknowledge that an explicit combined objective would clarify the integration. The individual loss terms are defined in Section 3; we will add a single combined-loss equation together with the weighting schedule and the precise optimization steps used for 3D Gaussian Splatting, making the steering mechanism fully verifiable. revision: yes

  3. Referee: [Evaluation] Evaluation section: no ablation isolating the contribution of each regularization term (generation-stage guidance, 3D safety regularization, semantic projection, residue suppression, mask-aware preservation) is reported, which is required to substantiate that the full combination is both necessary and non-conflicting with the original editing objective.

    Authors: We agree that component-wise ablations strengthen the claims. Although the current experiments already contrast the full framework against 2D guidance alone, we will insert a new ablation table that incrementally adds each term and reports the resulting safety metrics and editing-quality trade-offs, thereby demonstrating necessity and compatibility. revision: yes
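
The incremental ablation described in the last response can be laid out mechanically. The sketch below only enumerates the configurations, starting from the unprotected baseline and adding one component at a time. The component names follow the referee's list; the order in which they are added is an assumption, and the function name is hypothetical.

```python
# Components as listed in the referee report; the ordering is assumed.
COMPONENTS = [
    "generation_stage_guidance",
    "3d_safety_regularization",
    "semantic_projection",
    "residue_suppression",
    "mask_aware_preservation",
]


def incremental_ablations(components):
    """Return one configuration per ablation row: the empty baseline,
    then each prefix of `components` with one more term enabled."""
    return [tuple(components[:k]) for k in range(len(components) + 1)]


if __name__ == "__main__":
    for row, enabled in enumerate(incremental_ablations(COMPONENTS)):
        print(row, ", ".join(enabled) if enabled else "(unprotected baseline)")
```

Each row would then be evaluated with the same safety and editing-quality metrics, so that the marginal effect of every added term is visible.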

Circularity Check

0 steps flagged

No circularity: empirical engineering framework without derivations or self-referential predictions

Full rationale

The paper describes a multi-component safety framework for 3D Gaussian Splatting editing pipelines and evaluates it on benchmarks, but contains no equations, loss functions with explicit derivations, fitted parameters presented as predictions, or first-principles results. All claims rest on empirical measurements of unsafe semantic alignment and attack success rates rather than any chain that reduces to its own inputs by construction. The approach is therefore self-contained as an applied defense method whose validity is assessed externally via the reported experiments.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides insufficient technical detail to enumerate specific free parameters, axioms, or invented entities; the framework introduces regularization techniques whose exact formulations and hyperparameter choices are not specified.



Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 2 internal anchors

  [1] 4chan. https://www.4chan.org/

  [2] Lexica. https://lexica.art/

  [3] Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5470–5479, 2022.

  [4] Die Chen, Zhiwen Li, Cen Chen, Yuexiang Xie, Xiaodan Li, Jinyan Ye, Yingda Chen, and Yaliang Li. Comprehensive evaluation and analysis for nsfw concept erasure in text-to-image diffusion models. arXiv preprint arXiv:2505.15450, 2025.

  [5] Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, and Guosheng Lin. Gaussianeditor: Swift and controllable 3d editing with gaussian splatting, 2023.

  [6] Yijia Guo, Wenkai Huang, Yang Li, Gaolei Li, Hang Zhang, Liwen Hu, Jianhua Li, Tiejun Huang, and Lei Ma. Splats in splats: Robust and effective 3d steganography towards gaussian splatting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 4485–4493, 2026.

  [7] Dong Han, Salaheldin Mohamed, and Yong Li. Shielddiff: Suppressing sexual content generation from diffusion models through reinforcement learning. arXiv preprint arXiv:2410.05309, 2024.

  [8] Ayaan Haque, Matthew Tancik, Alexei A Efros, Aleksander Holynski, and Angjoo Kanazawa. Instruct-nerf2nerf: Editing 3d scenes with instructions. In Proceedings of the IEEE/CVF international conference on computer vision, pages 19740–19750, 2023.

  [9] Kai He, Chin-Hsuan Wu, and Igor Gilitschenski. Ctrl-d: Controllable dynamic 3d scene editing with personalized 2d diffusion. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 26630–26640, 2025.

  [10] Shuting He, Peilin Ji, Yitong Yang, Changshuo Wang, Jiayi Ji, Yinglin Wang, and Henghui Ding. A survey on 3d gaussian splatting applications: Segmentation, editing, and generation. arXiv preprint arXiv:2508.09977, 2025.

  [11] Ziming Hong, Tianyu Huang, Runnan Chen, Shanshan Ye, Mingming Gong, Bo Han, and Tongliang Liu. Adlift: Lifting adversarial perturbations to safeguard 3d gaussian splatting assets against instruction-driven editing. arXiv preprint arXiv:2512.07247, 2025.

  [12] Yi Huang, Jiancheng Huang, Yifan Liu, Mingfu Yan, Jiaxi Lv, Jianzhuang Liu, Wei Xiong, He Zhang, Liangliang Cao, and Shifeng Chen. Diffusion model-based image editing: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.

  [13] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), July 2023. URL https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/

  [14] Dong In Lee, Hyeongcheol Park, Jiyoung Seo, Eunbyung Park, Hyunje Park, Ha Dam Baek, Sangheon Shin, Sangmin Kim, and Sangpil Kim. Editsplat: Multi-view fusion and attention-guided optimization for view-consistent 3d scene editing with 3d gaussian splatting. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 11135–11145, 2025.

  [15] Warren Leu, Yuta Nakashima, and Noa Garcia. Auditing image-based nsfw classifiers for content filtering. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, pages 1163–1173, 2024.

  [16] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755. Springer, 2014.

  [17] Lingzhuang Meng, Mingwen Shao, Yuanjian Qiao, and Xiang Lv. Degauss: Defending against malicious 3d editing for gaussian splatting. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.

  [18] Nicole Meng, Caleb Manicke, Ronak Sahu, Caiwen Ding, and Yingjie Lao. Advancing adversarial robustness in gnerfs: The il2-nerf attack. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 16388–16397, 2025.

  [19] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.

  [20] Thanh Tam Nguyen, Zhao Ren, Trinh Pham, Phi Le Nguyen, Quoc Viet Hung Nguyen, and Hongzhi Yin. A review of instruction-guided image editing. Engineering Applications of Artificial Intelligence, 163:112953, 2026.

  [21] Maria Parelli, Michael Oechsle, Michael Niemeyer, Federico Tombari, and Andreas Geiger. 3d-latte: Latent space 3d editing from textual instructions. arXiv preprint arXiv:2509.00269, 2025.

  [22] Yiting Qu, Xinyue Shen, Xinlei He, Michael Backes, Savvas Zannettou, and Yang Zhang. Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models. In ACM SIGSAC Conference on Computer and Communications Security (CCS). ACM, 2023.

  [23] Yiting Qu, Xinyue Shen, Xinlei He, Michael Backes, Savvas Zannettou, and Yang Zhang. Unsafe diffusion: On the generation of unsafe images and hateful memes from text-to-image models. In Proceedings of the 2023 ACM SIGSAC conference on computer and communications security, pages 3403–3417, 2023.

  [24] Michael Rubloff. Google announces new google maps experience featuring neural radiance fields (nerfs). Radiance Fields, 2023.

  [25] Shaswati Saha, Sourajit Saha, Manas Gaur, and Tejas Gokhale. Side effects of erasing concepts from diffusion models. arXiv preprint arXiv:2508.15124, 2025.

  [26] Matthias Schneider and Thilo Hagendorff. When image generation goes wrong: A safety analysis of stable diffusion models. arXiv preprint arXiv:2411.15516, 2024.

  [27] Patrick Schramowski, Manuel Brack, Björn Deiseroth, and Kristian Kersting. Safe latent diffusion: Mitigating inappropriate degeneration in diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 22522–22531, 2023.

  [28] Jiwoo Shin, Byeonghu Na, Mina Kang, Wonhyeok Choi, and Il-Chul Moon. Prompt-based safety guidance is ineffective for unlearned text-to-image diffusion models. arXiv preprint arXiv:2511.04834, 2025.

  [29] Kiran Thorat, Nicole Meng, Mostafa Karami, Caiwen Ding, Yingjie Lao, and Zhijie Jerry Shi. Gif: A conditional multimodal generative framework for ir drop imaging in chip layouts. arXiv preprint arXiv:2604.09999, 2026.

  [30] Jordan Vice, Naveed Akhtar, Mubarak Shah, Richard Hartley, and Ajmal Saeed Mian. Safety without semantic disruptions: Editing-free safe image generation via context-preserving dual latent reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2306–2316, 2025.

  [31] Tsung-Han Wu, Long Lian, Joseph E Gonzalez, Boyi Li, and Trevor Darrell. Self-correcting llm-controlled diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6327–6336, 2024.

  [32] Zongyu Wu, Hongcheng Gao, Yueze Wang, Xiang Zhang, and Suhang Wang. Universal prompt optimizer for safe text-to-image generation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6340–6354, 2024.

  [33] Yu Xie, Chengjie Zeng, Lingyun Zhang, and Yanwei Fu. Nsfw-classifier guided prompt sanitization for safe text-to-image generation. arXiv preprint arXiv:2506.18325, 2025.

  [34] Yuchen Yang, Bo Hui, Haolin Yuan, Neil Gong, and Yinzhi Cao. Sneakyprompt: Jailbreaking text-to-image generative models. In 2024 IEEE symposium on security and privacy (SP), pages 897–912. IEEE, 2024.

  [35] Yao Yao, Zixin Luo, Shiwei Li, Jingyang Zhang, Yufan Ren, Lei Zhou, Tian Fang, and Long Quan. Blendedmvs: A large-scale dataset for generalized multi-view stereo networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1790–1799, 2020.

  [36] Mingqiao Ye, Martin Danelljan, Fisher Yu, and Lei Ke. Gaussian grouping: Segment and edit anything in 3d scenes. In European conference on computer vision, pages 162–179. Springer, 2024.

  [37] Lingzhi Yuan, Xinfeng Li, Chejian Xu, Guanhong Tao, Xiaojun Jia, Yihao Huang, Wei Dong, Yang Liu, and Bo Li. Promptguard: Soft prompt-guided unsafe content moderation for text-to-image models. IEEE Transactions on Information Forensics and Security, 2026.

  [38] Qihang Zhang, Yinghao Xu, Chaoyang Wang, Hsin-Ying Lee, Gordon Wetzstein, Bolei Zhou, and Ceyuan Yang. 3ditscene: Editing any scene via language-guided disentangled gaussian splatting. arXiv preprint arXiv:2405.18424, 2024.

  [39] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018.

  [40] Yuyang Zhang, Kangjie Chen, Xudong Jiang, Jiahui Wen, Yihui Jin, Ziyou Liang, Yihao Huang, Run Wang, and Lina Wang. USD: NSFW content detection for Text-to-Image models via scene graph. In 34th USENIX Security Symposium (USENIX Security 25), pages 879–895, 2025.