
arxiv: 2605.18132 · v1 · pith:WTDDUA6A · submitted 2026-05-18 · 💻 cs.CV · cs.AI

Who Generated This 3D Asset? Learning Source Attribution for Generative 3D Models

Pith reviewed 2026-05-20 11:50 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords 3D generation · source attribution · generative models · fingerprinting · multi-view fusion · benchmark · content provenance · geometric artifacts

The pith

Generative 3D models leave stable fingerprints that reveal which model created a given asset.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that generative 3D models leave two stable types of fingerprints consisting of cross-view inconsistencies and structural artifacts visible in geometric statistics and frequency-domain cues. It builds the first benchmark covering 22 representative generators and tests attribution under full supervision, few-shot learning with 1 percent of labels, and realistic mixed real-synthetic conditions. A hierarchical multi-view multi-modal Transformer is introduced to fuse appearance, geometric, and frequency features inside each view while relating signals across views. Experiments reach 97.22 percent accuracy with full data and 77.17 percent with fewer than five samples per generator. This matters because it supplies a concrete way to trace the origin of 3D content used in games, robotics, and immersive environments.
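This summary does not specify which frequency-domain cues the model uses. As a rough illustration only, the sketch below computes one common kind of spectral statistic on a single rendered view: the share of non-DC spectral energy above a radial frequency cutoff. The function names and the cutoff value are hypothetical, and a naive DFT stands in for whatever transform the paper actually applies.

```python
import cmath
import math


def dft2(img):
    """Naive O(N^4) 2D discrete Fourier transform of a small grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    spec = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    acc += img[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
            spec[u][v] = acc
    return spec


def high_freq_energy_ratio(img, cutoff=0.25):
    """Share of non-DC spectral power above a radial cutoff (hypothetical cue).

    cutoff is in cycles per pixel; the Nyquist limit along each axis is 0.5.
    """
    h, w = len(img), len(img[0])
    spec = dft2(img)
    total = high = 0.0
    for u in range(h):
        for v in range(w):
            if u == 0 and v == 0:
                continue  # skip the DC term so overall brightness does not dominate
            # Map DFT bin indices to signed frequencies in cycles per pixel.
            fu = (u if u <= h // 2 else u - h) / h
            fv = (v if v <= w // 2 else v - w) / w
            power = abs(spec[u][v]) ** 2
            total += power
            if math.hypot(fu, fv) > cutoff:
                high += power
    return high / total if total > 0 else 0.0


# Toy 8x8 "renderings": a checkerboard (pure high frequency) and a smooth horizontal ramp.
checker = [[(x + y) % 2 for x in range(8)] for y in range(8)]
ramp = [[x / 7 for x in range(8)] for _ in range(8)]
print(round(high_freq_energy_ratio(checker), 3), round(high_freq_energy_ratio(ramp), 3))
```

On the checkerboard nearly all non-DC energy sits at the Nyquist frequency, so the ratio is close to 1; the ramp concentrates its energy at low frequencies and scores far lower. A practical pipeline would use an FFT (for example numpy.fft.fft2) on full-resolution renderings.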

Core claim

Modern generative 3D models leave two types of stable fingerprints: cross-view inconsistency and structural artifacts reflected in geometric statistics and frequency-domain cues. By constructing a benchmark with 22 generators under standard, few-shot, and realistic deployment protocols and training a hierarchical multi-view multi-modal Transformer that fuses appearance, geometric, and frequency-domain features within each view while modeling global relationships across views, the work demonstrates that these dispersed signals support source attribution at 97.22 percent accuracy under full supervision and 77.17 percent accuracy with only 1 percent training data.
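The claim ties "1 percent training data" to "fewer than five samples per generator". How the paper draws that subset is not described here. The sketch below assumes per-generator (stratified) sampling that rounds up and keeps at least one sample. This is one plausible reading, not the authors' protocol. Under that assumption, fewer than five samples per generator means each generator contributes at most 400 training assets. The function name and the pool sizes are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_percent_split(labels, percent=1, seed=0):
    """Pick ceil(percent% of each class), at least one, per generator label.

    Integer arithmetic avoids floating-point rounding at exact boundaries.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    chosen = []
    for label in sorted(by_class):
        pool = by_class[label]
        k = max(1, -(-len(pool) * percent // 100))  # ceiling division
        chosen.extend(rng.sample(pool, k))
    return sorted(chosen)


# Hypothetical pool: 22 generators with 400 training assets each.
labels = [g for g in range(22) for _ in range(400)]
subset = stratified_percent_split(labels, percent=1)
per_generator = Counter(labels[i] for i in subset)
print(len(subset), set(per_generator.values()))
```

With 22 generators and 400 assets each, this rule keeps 4 assets per generator, or 88 in total.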

What carries the argument

Hierarchical multi-view multi-modal Transformer that fuses appearance, geometric, and frequency-domain features within each view and models global relationships across views to capture dispersed attribution signals.
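The summary gives only the high-level structure of the model. The sketch below is a deliberately stripped-down, hypothetical skeleton of the two-stage idea. In the first stage, it attends over the appearance, geometry, and frequency tokens of each view and mean-pools them into one token per view. In the second stage, it attends across the view tokens and applies a linear head over candidate generators. It omits learned Q/K/V projections, modality embeddings, multiple heads, feed-forward layers, residual connections, and training. It is not the authors' architecture.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(d)])
    return out


def mean_pool(tokens):
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


def attribute(views, class_weights):
    """views: per-view dicts of equal-length 'appearance', 'geometry', 'frequency' vectors.
    class_weights: one weight vector per candidate generator (hypothetical linear head).
    Returns (index of highest-scoring generator, logits)."""
    # Stage 1: intra-view fusion across modalities, one token per view.
    view_tokens = []
    for view in views:
        modal_tokens = [view["appearance"], view["geometry"], view["frequency"]]
        view_tokens.append(mean_pool(self_attention(modal_tokens)))
    # Stage 2: cross-view reasoning over the per-view tokens.
    asset_embedding = mean_pool(self_attention(view_tokens))
    logits = [sum(w * x for w, x in zip(wc, asset_embedding)) for wc in class_weights]
    best = max(range(len(logits)), key=logits.__getitem__)
    return best, logits


# Toy usage: two candidate generators, 2-dimensional features, three views.
W = [[1.0, 0.0], [0.0, 1.0]]
views = [{"appearance": [1.0, 0.0], "geometry": [1.0, 0.0], "frequency": [1.0, 0.0]}] * 3
print(attribute(views, W))
```

Because both stages apply self-attention and then mean pooling, the output does not depend on the order in which views are supplied. This is a reasonable property when camera order carries no meaning.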

Load-bearing premise

The fingerprints remain stable and detectable under realistic deployment constraints including scarce labels, degraded prompts, and mixed real or synthetic assets.

What would settle it

A substantial drop in attribution accuracy when evaluating on assets produced with heavily varied prompts, post-processed geometry, or combinations of generators absent from the original benchmark.

Figures

Figures reproduced from arXiv: 2605.18132 by Dacheng Tao, Sihan Ma, Siyuan Liang.

Figure 1: Overview of our attribution framework. Given multi-view renderings and structural priors of a 3D asset, our model learns discriminative fingerprints through intra-view fusion and cross-view reasoning, enabling accurate attribution across 22 generative 3D models.
Figure 2: Representative benchmark samples.
Figure 3: Dispersed attribution fingerprints in generated 3D assets.
Figure 4: Overview of our attribution model. Given multi-view renderings and structural priors from a 3D asset, our model learns dispersed attribution signals across viewpoints and modalities through multi-modal fusion within each single view and cross-view reasoning, achieving strong attribution accuracy and interpretable clustering across 22 generative 3D models.
Figure 5: Visualization of attribution fingerprints captured by our framework.
Figure 6: Overview of our attribution benchmark.
Figure 7: View number ablation (effect of the number of rendered views).
Figure 8: Row-normalized confusion matrix heatmap. Each value denotes the proportion of samples …
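Figure 8 reports a row-normalized confusion matrix. The sketch below shows how such a matrix is typically computed from true and predicted generator labels. The function names and toy labels are illustrative. Each row is divided by its total, so the diagonal entry of row i is the recall for generator i. Overall accuracy, by contrast, counts every sample equally.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[i][j] counts samples from generator i predicted as generator j."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def row_normalize(cm):
    """Divide each row by its total so entry (i, j) is the share of class-i samples predicted as j."""
    out = []
    for row in cm:
        total = sum(row)
        out.append([c / total if total else 0.0 for c in row])
    return out


def accuracy(cm):
    """Overall accuracy: correct predictions over all samples."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(row_normalize(cm))  # [[0.75, 0.25, 0.0], [0.0, 1.0, 0.0], [0.5, 0.0, 0.5]]
print(accuracy(cm))  # 0.75
```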
Original abstract

Generative 3D models are deployed in gaming, robotics, and immersive creation, making source attribution critical: given a 3D asset, can we identify whether and which generative model created it? This problem faces two core challenges: dispersed attribution signals, where 3D fingerprints are distributed across multi-view, geometric, and frequency-domain cues; and realistic deployment constraints, where scarce labels, degraded prompts, and mixed real/synthetic assets undermine attribution reliability. To systematically study this problem, we construct, to the best of our knowledge, the first passive source attribution benchmark for modern generated assets, covering 22 representative 3D generators under standard, few-shot, and realistic deployment protocols. Based on this benchmark, we find that generative 3D models leave two types of stable fingerprints: cross-view inconsistency and structural artifacts reflected in geometric statistics and frequency-domain cues. To capture these dispersed signals, we propose a hierarchical multi-view multi-modal Transformer that fuses appearance, geometric, and frequency-domain features within each view and models global relationships across views. Extensive experiments demonstrate strong performance, achieving 97.22% accuracy under full supervision and 77.17% accuracy with only 1% training data, corresponding to fewer than five samples per generator. These results show that modern 3D generators leave stable and attributable fingerprints, establishing a new benchmark and methodological foundation for trustworthy 3D content provenance.
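The abstract names cross-view inconsistency as one of the fingerprints but does not define a metric for it here. One simple proxy uses descriptors extracted for the same surface patch as seen from several viewpoints and computes the mean pairwise cosine distance between them. A perfectly view-consistent asset scores 0. The proxy is hypothetical, and the descriptor source and function names are assumptions, not the paper's definition.

```python
import itertools
import math


def cosine_distance(a, b):
    """1 - cosine similarity; treats a zero vector as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def cross_view_inconsistency(patch_descriptors):
    """Mean pairwise cosine distance between descriptors of one surface patch seen from several views."""
    if len(patch_descriptors) < 2:
        raise ValueError("need descriptors from at least two views")
    pairs = list(itertools.combinations(patch_descriptors, 2))
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


consistent = [[0.2, 0.4, 0.6]] * 4  # same appearance from every view
inconsistent = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # view-dependent texture
print(cross_view_inconsistency(consistent), cross_view_inconsistency(inconsistent))
```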

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces the first passive source attribution benchmark for 3D generative models, spanning 22 generators under standard, few-shot, and realistic deployment protocols. It identifies two types of stable fingerprints—cross-view inconsistency and structural artifacts in geometric statistics and frequency-domain cues—and proposes a hierarchical multi-view multi-modal Transformer to fuse appearance, geometric, and frequency features across views. Experiments report 97.22% accuracy under full supervision and 77.17% accuracy with only 1% training data (fewer than five samples per generator), supporting the claim that modern 3D generators leave attributable fingerprints and establishing a foundation for 3D content provenance.

Significance. If the results hold, this work is significant for enabling trustworthy provenance in generative 3D content, an area of growing importance in gaming, robotics, and immersive applications. The construction of a new benchmark covering multiple generators and protocols, combined with strong empirical performance in the low-data regime, provides a concrete methodological foundation and falsifiable baseline for future attribution research.

major comments (2)
  1. [Section 3] Benchmark Construction and Evaluation Protocols (Section 3): The central claim that fingerprints are 'stable' and remain detectable under realistic constraints is load-bearing for both the 77.17% few-shot result and the motivation for multi-modal fusion, yet the paper does not report experiments testing invariance to common post-processing operations such as mesh simplification, format conversion, coordinate normalization, or integration into mixed real/synthetic scenes. The benchmark treats collected assets as fixed representations, leaving open the possibility that reported accuracies partly reflect export-pipeline artifacts rather than intrinsic generator properties.
  2. [Section 4] Hierarchical Fusion Method (Section 4): While the Transformer fuses multi-view and multi-modal cues, there is insufficient ablation or analysis quantifying the individual contribution of cross-view inconsistency versus geometric/frequency artifacts, particularly in the 1%-data regime; this weakens the justification for the hierarchical design over simpler baselines.
minor comments (2)
  1. [Abstract] The abstract and introduction could more explicitly situate the new benchmark against prior 2D attribution or forensic work to strengthen the 'first' claim.
  2. [Section 3] Clarify the precise definition and implementation of 'realistic deployment protocols' to make the generalization claims easier to evaluate.
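The first major comment asks whether the fingerprints survive common post-processing such as coordinate normalization. The sketch below illustrates that concern on a toy tetrahedron, using hypothetical function names. Recentering and rescaling a mesh changes absolute geometric statistics such as mean edge length. A scale-invariant statistic such as the coefficient of variation of edge lengths stays the same. Features of the first kind could partly encode export conventions rather than generator behavior.

```python
import math


def normalize_vertices(vertices):
    """Center at the centroid and scale so the farthest vertex lies on the unit sphere."""
    n = len(vertices)
    centroid = [sum(v[k] for v in vertices) / n for k in range(3)]
    centered = [tuple(v[k] - centroid[k] for k in range(3)) for v in vertices]
    radius = max(math.sqrt(sum(c * c for c in v)) for v in centered)
    return [tuple(c / radius for c in v) for v in centered]


def edge_lengths(vertices, edges):
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(vertices[i], vertices[j]))) for i, j in edges]


def mean_edge_length(vertices, edges):
    """Absolute statistic: changes when the mesh is rescaled."""
    lengths = edge_lengths(vertices, edges)
    return sum(lengths) / len(lengths)


def edge_length_cv(vertices, edges):
    """Coefficient of variation (std / mean): invariant to translation and uniform scaling."""
    lengths = edge_lengths(vertices, edges)
    mean = sum(lengths) / len(lengths)
    std = math.sqrt(sum((x - mean) ** 2 for x in lengths) / len(lengths))
    return std / mean


# Toy tetrahedron as a stand-in for an exported generator mesh.
verts = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 2.0)]
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
norm = normalize_vertices(verts)
print(mean_edge_length(verts, edges), mean_edge_length(norm, edges))
print(edge_length_cv(verts, edges), edge_length_cv(norm, edges))
```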

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the robustness and analysis in our work on source attribution for generative 3D models. We address each major comment below and will incorporate revisions to improve the manuscript.

Point-by-point responses
  1. Referee: [Section 3] Benchmark Construction and Evaluation Protocols (Section 3): The central claim that fingerprints are 'stable' and remain detectable under realistic constraints is load-bearing for both the 77.17% few-shot result and the motivation for multi-modal fusion, yet the paper does not report experiments testing invariance to common post-processing operations such as mesh simplification, format conversion, coordinate normalization, or integration into mixed real/synthetic scenes. The benchmark treats collected assets as fixed representations, leaving open the possibility that reported accuracies partly reflect export-pipeline artifacts rather than intrinsic generator properties.

    Authors: We agree that explicit tests for post-processing operations would further validate the stability of the identified fingerprints. Our benchmark already incorporates realistic deployment protocols, including few-shot settings and mixed real/synthetic scenes, but we did not include dedicated experiments on operations such as mesh simplification or format conversion. In the revised manuscript, we will add a new analysis subsection with preliminary results on a subset of generators evaluating attribution performance after mesh simplification and coordinate normalization. We will also explicitly discuss the possibility of export-pipeline artifacts as a limitation. This addresses the concern while preserving the core claim that the fingerprints are attributable to generator properties. revision: yes

  2. Referee: [Section 4] Hierarchical Fusion Method (Section 4): While the Transformer fuses multi-view and multi-modal cues, there is insufficient ablation or analysis quantifying the individual contribution of cross-view inconsistency versus geometric/frequency artifacts, particularly in the 1%-data regime; this weakens the justification for the hierarchical design over simpler baselines.

    Authors: We acknowledge that more detailed ablations would strengthen the justification for the hierarchical multi-view multi-modal design. The architecture is motivated by the dispersed signals observed in the benchmark (cross-view inconsistency and structural artifacts in geometric and frequency domains), but we did not provide component-wise breakdowns specifically for the 1% data regime. In the revision, we will expand Section 4 with additional ablation studies, including performance when removing the cross-view modeling module and when using only geometric or frequency features, reported under both full supervision and 1% training data. These results will better quantify contributions and compare against simpler baselines. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical benchmark construction and model evaluation


The paper constructs a new benchmark covering 22 3D generators and reports empirical accuracies (97.22% full supervision, 77.17% with 1% data) for a proposed hierarchical multi-view multi-modal Transformer. No derivation chain, equations, or predictions are presented that reduce by construction to fitted inputs, self-citations, or renamed ansatzes. The fingerprints are observed from the benchmark rather than defined into existence, and the method is a standard fusion architecture trained and evaluated on held-out data. The work is self-contained against its external benchmark with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Review based on abstract only; specific model hyperparameters and exact benchmark construction details are not provided.

free parameters (1)
  • Transformer fusion and attention hyperparameters
    The hierarchical multi-view multi-modal Transformer requires choices for layer counts, attention mechanisms, and feature fusion weights that are typically tuned on the training data.
axioms (1)
  • domain assumption Generative 3D models produce assets containing stable, detectable fingerprints in cross-view, geometric, and frequency domains.
    This assumption underpins both the benchmark findings and the design of the attribution model.
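The free-parameter entry lists layer counts, attention settings, and fusion weights without giving values. The configuration object below only illustrates how those choices might be recorded and sanity-checked. Every field name and default is a placeholder assumption, except num_classes, which matches the 22 benchmark generators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionConfig:
    """Hypothetical record of the model's free parameters; defaults are placeholders, not reported values."""

    num_classes: int = 22            # one class per benchmark generator
    num_views: int = 8               # placeholder; the paper ablates view count (Figure 7)
    d_model: int = 256
    num_heads: int = 8
    intra_view_layers: int = 2       # modality fusion within one view
    cross_view_layers: int = 4       # reasoning across views
    dropout: float = 0.1
    modality_weights: tuple = (1.0, 1.0, 1.0)  # appearance, geometry, frequency

    def __post_init__(self):
        # Basic sanity checks that a tuning sweep should never violate.
        if self.d_model % self.num_heads != 0:
            raise ValueError("d_model must be divisible by num_heads")
        if len(self.modality_weights) != 3:
            raise ValueError("expected one fusion weight per modality")
        if not 0.0 <= self.dropout < 1.0:
            raise ValueError("dropout must lie in [0, 1)")

    @property
    def head_dim(self):
        return self.d_model // self.num_heads


cfg = FusionConfig()
print(cfg.head_dim)  # 32
```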

pith-pipeline@v0.9.0 · 5786 in / 1379 out tokens · 66851 ms · 2026-05-20T11:50:21.175441+00:00 · methodology


