pith. machine review for the scientific record.

arxiv: 2605.09071 · v1 · submitted 2026-05-09 · 💻 cs.CV

Recognition: no theorem link

Probability-Flow Distillation: Exact Wasserstein Gradient Flow for High-Fidelity 3D Generation

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:21 UTC · model grok-4.3

classification 💻 cs.CV
keywords probability flow distillation · wasserstein gradient flow · score distillation sampling · text-to-3D generation · diffusion models · 3D generation · mode collapse

The pith

Probability-Flow Distillation replaces the posterior-mean shortcut with the full probability-flow trajectory, matching the Wasserstein gradient flow exactly and yielding higher-fidelity 3D models from 2D diffusion priors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Score Distillation Sampling and its inversion variant often yield over-smoothed or mode-collapsed 3D outputs because they rely on a posterior mean estimator. The paper shows this estimator equals a single-step Euler approximation of the deterministic reverse diffusion trajectory, which prevents full capture of the target distribution. Probability-Flow Distillation replaces that estimator with the complete probability-flow trajectory. The authors prove the resulting objective is identical to the Wasserstein gradient flow, which supplies principled dynamics for matching distributions. This change produces 3D assets that retain finer details and align more closely with the intended output distribution.
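The single-step-Euler diagnosis can be checked on a toy problem where the score is known in closed form. The sketch below is an illustrative stand-in, not the paper's code: a 1-D Gaussian target, an exact noise predictor, and two estimates of the clean sample — the posterior-mean estimator (one Euler jump of the reverse DDIM trajectory) versus a multi-step deterministic DDIM integration of the probability-flow ODE.

```python
import numpy as np

# Toy check of the review's diagnosis, on a 1-D Gaussian target where the
# score is known in closed form. Everything here is an illustrative stand-in
# for the paper's 3D setting, not the authors' code.

rng = np.random.default_rng(0)
mu0, var0 = 2.0, 0.25                     # target data distribution: N(mu0, var0)

def marginal(abar):
    """Mean/variance of x_t = sqrt(abar)*x0 + sqrt(1-abar)*eps under the target."""
    return np.sqrt(abar) * mu0, abar * var0 + (1.0 - abar)

def eps_hat(x, abar):
    """Exact noise prediction, eps_hat = -sqrt(1-abar) * score_t(x)."""
    m, v = marginal(abar)
    return np.sqrt(1.0 - abar) * (x - m) / v

def posterior_mean(x, abar):
    """Posterior-mean estimator: one Euler jump of the reverse DDIM trajectory."""
    return (x - np.sqrt(1.0 - abar) * eps_hat(x, abar)) / np.sqrt(abar)

def probability_flow(x, abars):
    """Multi-step deterministic DDIM: the full probability-flow trajectory."""
    for a, a_next in zip(abars[:-1], abars[1:]):
        e = eps_hat(x, a)
        x0 = (x - np.sqrt(1.0 - a) * e) / np.sqrt(a)
        x = np.sqrt(a_next) * x0 + np.sqrt(1.0 - a_next) * e
    return x

abar_start = 0.01                          # heavily noised timestep
m, v = marginal(abar_start)
x_t = m + np.sqrt(v) * rng.standard_normal(20000)

single = posterior_mean(x_t, abar_start)               # SDI-style estimate
multi = probability_flow(x_t, np.linspace(abar_start, 0.9999, 500))

print(np.var(single))   # collapses toward 0: the over-smoothing failure mode
print(np.var(multi))    # ~0.25: the target variance is recovered
```

The mode-collapse mechanism is visible directly: with exact scores, the posterior mean is a contraction toward the distribution's mean, while the full trajectory transports the noisy samples onto the target distribution.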

Core claim

Probability-Flow Distillation (PFD) corresponds exactly to a Wasserstein gradient flow, thereby inducing principled distribution-matching dynamics that address the mode collapse and incomplete sampling seen in prior score distillation methods for text-to-3D generation.

What carries the argument

Probability-Flow Distillation, the extension of score-distillation-via-inversion that substitutes the full probability-flow ODE for the posterior-mean estimator and is shown to equal the Wasserstein gradient flow.

Load-bearing premise

The mathematical equivalence between PFD and the Wasserstein gradient flow holds without additional approximations or hidden steps.

What would settle it

A direct derivation of the PFD gradient that fails to match the standard Wasserstein gradient flow equation, or a controlled comparison of 3D models generated by PFD versus SDI on identical prompts that shows no gain in detail fidelity or distribution coverage.

Figures

Figures reproduced from arXiv: 2605.09071 by A. N. Rajagopalan, Rohith Ramanan.

Figure 1. Examples of 3D objects generated using PFD. [PITH_FULL_IMAGE:figures/full_fig_p002_1.png]
Figure 2. Comparison of different distillation gradients. [PITH_FULL_IMAGE:figures/full_fig_p005_2.png]
Figure 3. Visualization of 2D particle evolution targeting a concentric circles dataset. [PITH_FULL_IMAGE:figures/full_fig_p007_3.png]
Figure 4. Illustration of the proposed text-to-3D generation pipeline based on PFD. [PITH_FULL_IMAGE:figures/full_fig_p008_4.png]
Figure 5. Qualitative comparison of 3D objects generated by SDS, SDI, VSD, CSD, and PFD. [PITH_FULL_IMAGE:figures/full_fig_p009_5.png]
Figure 6. (a, b) SDI at 64 × 64 and 512 × 512; (c, d) PFD at 64 × 64 and 512 × 512. As the CFG guidance scale increases, the generated outputs become increasingly noisy and distorted. [PITH_FULL_IMAGE:figures/full_fig_p019_6.png]
Figure 7. Larger guidance scales amplify the guidance term, making the discretized DDIM-ODE less stable during optimization; moderate scales in the range of 5 to 12 balance convergence speed and prompt alignment, and a CFG scale of 7.5 is used in all experiments. [PITH_FULL_IMAGE:figures/full_fig_p019_7.png]
Figure 8. Effect of forward and reverse CFG scales. [PITH_FULL_IMAGE:figures/full_fig_p020_8.png]
Figure 9. Effect of time annealing during late-stage optimization on generated 3D assets. [PITH_FULL_IMAGE:figures/full_fig_p020_9.png]
Figure 10. Illustration of the Janus problem: generated 3D objects exhibit inconsistent geometry. [PITH_FULL_IMAGE:figures/full_fig_p020_10.png]
read the original abstract

Score Distillation Sampling (SDS) and its variants have been widely used for text-to-3D generation by distilling 2D image diffusion priors. However, the standard SDS objective is prone to severe mode collapse, frequently yielding over-smoothed and over-saturated results. Although recent advancements, such as Score Distillation via Inversion (SDI), mitigate these artifacts and produce visually sharper models, they ultimately fail to faithfully capture the full target distribution. In this work, we show that the bottleneck limiting the sampling capacity of SDI stems from its reliance on the posterior mean estimator, which is mathematically equivalent to a single-step Euler approximation of the deterministic reverse DDIM trajectory. To address this, we propose a naturally motivated extension termed Probability-Flow Distillation (PFD). We establish that PFD corresponds exactly to a Wasserstein gradient flow, thereby inducing principled distribution-matching dynamics. Finally, we show that PFD can synthesize 3D assets with fine-grained, high-fidelity details and achieve improved quality compared to existing methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper identifies limitations in Score Distillation Sampling (SDS) and its variant Score Distillation via Inversion (SDI) for text-to-3D generation, attributing SDI's failure to capture the full target distribution to its reliance on the posterior mean estimator, which is equivalent to a single-step Euler approximation of the deterministic reverse DDIM trajectory. It introduces Probability-Flow Distillation (PFD) and claims that PFD corresponds exactly to a Wasserstein gradient flow on the space of measures, thereby inducing principled distribution-matching dynamics that yield 3D assets with fine-grained, high-fidelity details and improved quality over prior methods.

Significance. If the claimed exact equivalence between PFD and the Wasserstein gradient flow holds without unaccounted discretization or approximation residuals, the work would supply a theoretically grounded alternative to heuristic distillation objectives, potentially enabling more faithful matching to the target distribution in 3D generation tasks. This could strengthen the link between optimal transport theory and practical diffusion-prior distillation, with implications for reducing mode collapse and over-smoothing artifacts.

major comments (2)
  1. [Abstract and §3] Abstract and §3 (method derivation): The central claim that 'PFD corresponds exactly to a Wasserstein gradient flow' is load-bearing for the paper's contribution, yet the provided abstract supplies no derivation steps, no explicit velocity-field matching, and no accounting for residual terms arising from score estimation, Monte-Carlo sampling over camera poses, or discretization of the probability-flow ODE. The manuscript must exhibit the precise steps showing that the PFD objective and its gradient reproduce the WGF velocity field identically, without Euler-like truncation errors analogous to those identified for SDI.
  2. [§4] §4 (experiments) and the discrete implementation: The skeptical concern that the exact match may hold only in the continuous-time limit must be addressed by showing that the practical PFD update rule (including any multi-step flow solver) introduces no residual discretization error that would undermine the 'principled distribution-matching dynamics' assertion; otherwise, the improvement in fidelity cannot be attributed to the WGF equivalence.
minor comments (2)
  1. [Abstract] The abstract states that SDI 'ultimately fail[s] to faithfully capture the full target distribution' but provides no quantitative metrics or ablation details supporting this; the full manuscript should include such evidence to ground the motivation.
  2. [§2] Notation for the probability-flow ODE and the posterior mean estimator should be introduced with explicit equation numbers early in the method section to aid readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of our theoretical claims and their practical implications. We address each major comment below and describe the revisions we will implement.

read point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (method derivation): The central claim that 'PFD corresponds exactly to a Wasserstein gradient flow' is load-bearing for the paper's contribution, yet the provided abstract supplies no derivation steps, no explicit velocity-field matching, and no accounting for residual terms arising from score estimation, Monte-Carlo sampling over camera poses, or discretization of the probability-flow ODE. The manuscript must exhibit the precise steps showing that the PFD objective and its gradient reproduce the WGF velocity field identically, without Euler-like truncation errors analogous to those identified for SDI.

    Authors: We agree that the derivation of the exact equivalence requires greater explicitness. In the revised manuscript we will expand Section 3 with a complete step-by-step derivation: starting from the probability-flow ODE, we show that the PFD loss gradient exactly recovers the Wasserstein gradient flow velocity field on the space of measures when the expectation is taken over camera poses. We will add an explicit velocity-field matching lemma and clarify that, in the continuous-time setting, score estimation is treated as exact (standard in the diffusion literature) and that PFD avoids the single-step Euler truncation that characterizes SDI. Residual terms from finite discretization and Monte-Carlo sampling will be isolated and analyzed in a new subsection. We will also revise the abstract to reference these key steps. revision: yes

  2. Referee: [§4] §4 (experiments) and the discrete implementation: The skeptical concern that the exact match may hold only in the continuous-time limit must be addressed by showing that the practical PFD update rule (including any multi-step flow solver) introduces no residual discretization error that would undermine the 'principled distribution-matching dynamics' assertion; otherwise, the improvement in fidelity cannot be attributed to the WGF equivalence.

    Authors: We acknowledge that the continuous-time equivalence does not automatically extend to the discrete solver without further analysis. In the revision we will add to Section 4 both a theoretical error bound for the multi-step probability-flow ODE integrator and empirical measurements of discretization residuals across the step sizes used in our experiments. These additions will demonstrate that the observed fidelity gains are consistent with reduced deviation from the Wasserstein gradient flow relative to SDI. We will explicitly state that the term 'exact' applies to the continuous limit while the practical algorithm remains a high-fidelity approximation. revision: partial
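The promised residual measurement can be prototyped on a case with a closed-form probability-flow map. In the hedged sketch below (a 1-D Gaussian stand-in with exact scores, not the authors' 3D experiment), the multi-step DDIM endpoint is compared against the exact transport map between two noise levels; the residual shrinking with step count is the behavior the "exact in the continuous limit" claim requires.

```python
import numpy as np

# For a 1-D Gaussian target, the probability-flow map between two noise levels
# has a closed form, so the discretization residual of a multi-step DDIM
# solver can be measured directly. A hedged stand-in for the rebuttal's
# promised error measurements, not the authors' experiment.

rng = np.random.default_rng(2)
mu0, var0 = 2.0, 0.25

def marginal(abar):
    return np.sqrt(abar) * mu0, abar * var0 + (1.0 - abar)

def eps_hat(x, abar):
    m, v = marginal(abar)
    return np.sqrt(1.0 - abar) * (x - m) / v

def ddim(x, abars):
    """Deterministic DDIM integration of the probability-flow ODE."""
    for a, a_next in zip(abars[:-1], abars[1:]):
        e = eps_hat(x, a)
        x = (np.sqrt(a_next) * (x - np.sqrt(1.0 - a) * e) / np.sqrt(a)
             + np.sqrt(1.0 - a_next) * e)
    return x

a0, a1 = 0.01, 0.999
m0, v0 = marginal(a0)
m1, v1 = marginal(a1)
x = m0 + np.sqrt(v0) * rng.standard_normal(4096)
x_exact = m1 + np.sqrt(v1 / v0) * (x - m0)   # exact monotone transport map

errs = [np.abs(ddim(x, np.linspace(a0, a1, n + 1)) - x_exact).mean()
        for n in (2, 10, 50, 250)]
print(errs)   # residual shrinks as the number of solver steps grows
```

This is exactly the shape of evidence the referee asks for: an empirical error-versus-step-count curve, here computed where the ground-truth flow is known.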

Circularity Check

0 steps flagged

No significant circularity detected; derivation presented as independent establishment

full rationale

The provided abstract motivates PFD as a natural extension addressing the explicit single-step Euler limitation of SDI's posterior mean estimator. It then states that PFD 'corresponds exactly' to the Wasserstein gradient flow as an established result. No equations, self-citations, or definitional reductions are quoted that would make the correspondence tautological by construction. No fitted parameters are renamed as predictions, no uniqueness theorems are imported, and no ansatz is smuggled via prior work. The central claim therefore remains a non-circular derivation step within the given text.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard diffusion-model assumptions (DDIM reverse process, posterior mean estimator) and the mathematical identification of probability flow with Wasserstein gradient flow; no new free parameters or invented entities are introduced in the abstract.

axioms (2)
  • domain assumption The deterministic reverse DDIM trajectory can be approximated by its posterior mean estimator in a single Euler step.
    Invoked to identify the bottleneck in SDI.
  • standard math Wasserstein gradient flow provides the correct continuous dynamics for matching the target distribution in distillation.
    Used to establish that PFD induces principled distribution-matching.
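The second axiom can be made concrete: by the classical Jordan–Kinderlehrer–Otto result, the Wasserstein gradient flow of KL(q ∥ p) is the Fokker–Planck equation, which unadjusted Langevin dynamics simulates at the particle level. A minimal numerical sketch, where the 1-D Gaussian target and all constants are illustrative choices rather than anything from the paper:

```python
import numpy as np

# WGF of KL(q || p) realized as particle-level (unadjusted) Langevin dynamics:
# dx = score(x) dt + sqrt(2) dW. Particles initialized far from the target
# p = N(mu, var) should converge to its mean and variance.

rng = np.random.default_rng(1)
mu, var = 2.0, 0.25
score = lambda x: -(x - mu) / var          # grad log p for a Gaussian target

x = rng.standard_normal(20000)             # initial particles ~ N(0, 1)
h = 0.005                                  # step size (stationary bias is O(h))
for _ in range(2000):
    x = x + h * score(x) + np.sqrt(2.0 * h) * rng.standard_normal(x.shape)

print(x.mean(), x.var())   # ≈ 2.0 and ≈ 0.25, up to O(h) discretization bias
```

The small variance bias of the discretized chain mirrors, in miniature, the continuous-versus-discrete gap the referee raises against PFD's "exact" equivalence claim.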

pith-pipeline@v0.9.0 · 5486 in / 1373 out tokens · 61044 ms · 2026-05-12T02:21:59.997954+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · 1 internal anchor

  1. [1]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, volume 33, pages 6840–6851, 2020

  2. [2]

    High-Resolution Image Synthesis with Latent Diffusion Models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolution image synthesis with latent diffusion models.CoRR, abs/2112.10752, 2021. URL https://arxiv.org/abs/2112.10752

  3. [3]

    Fleet, and Mohammad Norouzi

    Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Raphael Gontijo-Lopes, Burcu Karagol Ayan, Tim Salimans, Jonathan Ho, David J. Fleet, and Mohammad Norouzi. Photorealistic text-to-image diffusion models with deep language understanding. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Ky...

  4. [4]

    Elucidating the design space of diffusion-based generative models

    Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors,Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=k7FuTOWMOc7

  5. [5]

    Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. InThe Eleventh International Conference on Learning Representations, 2023. URLhttps://openreview.net/forum?id=PqvMRDCJT9t

  6. [6]

    Barron, and Ben Mildenhall

    Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion.arXiv, 2022

  7. [7]

    Srinivasan, Matthew Tancik, Jonathan T

    Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. InECCV, 2020

  8. [8]

    URL https://doi.org/10.1145/3528223

    Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding.ACM Trans. Graph., 41(4):102:1–102:15, July 2022. doi: 10.1145/3528223.3530127. URL https://doi.org/10.1145/3528223. 3530127

  9. [9]

    Deep marching tetrahedra: a hybrid representation for high-resolution 3d shape synthesis

    Tianchang Shen, Jun Gao, Kangxue Yin, Ming-Yu Liu, and Sanja Fidler. Deep marching tetrahedra: a hybrid representation for high-resolution 3d shape synthesis. InAdvances in Neural Information Processing Systems (NeurIPS), 2021

  10. [10]

    3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42(4), July

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42(4), July

  11. [11]

    URLhttps://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/

  12. [12]

    2023 , url =

    Amir Hertz, Kfir Aberman, and Daniel Cohen-Or. Delta denoising score. In2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 2328–2337, 2023. doi: 10.1109/ ICCV51070.2023.00221

  13. [13]

    Noise-free score distillation

    Oren Katzir, Or Patashnik, Daniel Cohen-Or, and Dani Lischinski. Noise-free score distillation. InThe Twelfth International Conference on Learning Representations, 2024. URL https: //openreview.net/forum?id=dlIMcmlAdk

  14. [14]

    Magic3d: High-resolution text-to-3d content creation

    Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. Magic3d: High-resolution text-to-3d content creation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 300–309, 2023

  15. [15]

    Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation

    Rui Chen, Yongwei Chen, Ningxin Jiao, and Kui Jia. Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 22246–22256, October 2023

  16. [16]

    Gaussiandreamer: Fast generation from text to 3d gaussians by bridging 2d and 3d diffusion models

    Taoran Yi, Jiemin Fang, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Qi Tian, and Xinggang Wang. Gaussiandreamer: Fast generation from text to 3d gaussians by bridging 2d and 3d diffusion models. InCVPR, 2024

  17. [17]

    Text-to-3d with classifier score distillation

    Xin Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Song-Hai Zhang, and XIAOJUAN QI. Text-to-3d with classifier score distillation. InThe Twelfth International Conference on Learning Representations, 2024. URLhttps://openreview.net/forum?id=ktG8Tun1Cy. 10

  18. [18]

    Pro- lificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation

    Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, and Jun Zhu. Pro- lificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation. InAdvances in Neural Information Processing Systems (NeurIPS), 2023

  19. [19]

    LoRA: Low-rank adaptation of large language models

    Edward J Hu, yelong shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022. URL https://openreview. net/forum?id=nZeVKeeFYf9

  20. [20]

    Score distillation via reparametrized ddim.Advances in Neural Information Processing Systems, 37:26011–26044, 2024

    Artem Lukoianov, Haitz S’aez de Oc’ariz Borde, Kristjan Greenewald, Vitor Guizilini, Timur Bagautdinov, Vincent Sitzmann, and Justin M Solomon. Score distillation via reparametrized ddim.Advances in Neural Information Processing Systems, 37:26011–26044, 2024

  21. [21]

    Jacobs, Alexei A

    David McAllister, Songwei Ge, Jia-Bin Huang, David W. Jacobs, Alexei A. Efros, Aleksander Holynski, and Angjoo Kanazawa. Rethinking score distillation as a bridge between image distributions. InAdvances in Neural Information Processing Systems, 2024

  22. [22]

    Dual diffusion implicit bridges for image-to-image translation

    Xuan Su, Jiaming Song, Chenlin Meng, and Stefano Ermon. Dual diffusion implicit bridges for image-to-image translation. InThe Eleventh International Conference on Learning Representa- tions, 2023. URLhttps://openreview.net/forum?id=5HLoTvVGDe

  23. [23]

    Anderson

    Brian D.O. Anderson. Reverse-time diffusion equation models.Stochastic Processes and their Applications, 12(3):313–326, 1982. ISSN 0304-4149. doi: https://doi.org/10.1016/ 0304-4149(82)90051-5. URL https://www.sciencedirect.com/science/article/ pii/0304414982900515

  24. [24]

    A connection between score matching and denoising autoencoders.Neural computation, 23(7):1661–1674, 2011

    Pascal Vincent. A connection between score matching and denoising autoencoders.Neural Computation, 23(7):1661–1674, 2011. doi: 10.1162/NECO_a_00142

  25. [25]

    Score-based generative modeling through stochastic differential equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021. URL https://openreview. net/forum?id=PxTIG12RRHS

  26. [26]

    Denoising diffusion implicit models

    Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021. URL https://openreview. net/forum?id=St1giarCHLP

  27. [27]

    Diffusion models beat GANs on image synthesis

    Prafulla Dhariwal and Alexander Quinn Nichol. Diffusion models beat GANs on image synthesis. In A. Beygelzimer, Y . Dauphin, P. Liang, and J. Wortman Vaughan, editors,Advances in Neural Information Processing Systems, 2021. URL https://openreview.net/forum? id=AAWuCvzaVt

  28. [28]

    Assran, Q

    Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Null-text inversion for editing real images using guided diffusion models. In2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6038–6047, 2023. doi: 10.1109/ CVPR52729.2023.00585

  29. [29]

    American Mathematical Society, Providence, RI, 2003

    Cédric Villani.Topics in Optimal Transportation, volume 58 ofGraduate Studies in Mathemat- ics. American Mathematical Society, Providence, RI, 2003

  30. [30]

    An introduction to optimal transport and wasserstein gradient flows,

    Alessio Figalli. An introduction to optimal transport and wasserstein gradient flows,

  31. [31]

    Optimal Transport on Quantum Structures

    URL https://people.math.ethz.ch/~afigalli/lecture-notes-pdf/ An-introduction-to-optimal-transport-and-Wasserstein-gradient-flows. pdf. Lecture notes from the School “Optimal Transport on Quantum Structures”, Erd˝os Center, Alfréd Rényi Institute of Mathematics, September 19–23, 2022

  32. [32]

    Birkhäuser, 2008

    Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré.Gradient Flows in Metric Spaces and in the Space of Probability Measures. Birkhäuser, 2008

  33. [33]

    Langevin dynamics — Wikipedia, the free encyclopedia

    Wikipedia contributors. Langevin dynamics — Wikipedia, the free encyclopedia. https: //en.wikipedia.org/w/index.php?title=Langevin_dynamics&oldid=1348039185

  34. [34]

    Classifier-free diffusion guidance

    Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. InNeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021. URLhttps://openreview. net/forum?id=qw8AKxfYbI

  35. [35]

    Yeh, and Greg Shakhnarovich

    Haochen Wang, Xiaodan Du, Jiahao Li, Raymond A. Yeh, and Greg Shakhnarovich. Score jacobian chaining: Lifting pretrained 2d diffusion models for 3d generation. InProceedings 11 of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7047–7056, 2023

  36. [36]

    HIFA: High-fidelity text-to-3d generation with advanced diffusion guidance

    Junzhe Zhu, Peiye Zhuang, and Sanmi Koyejo. HIFA: High-fidelity text-to-3d generation with advanced diffusion guidance. InThe Twelfth International Conference on Learning Representations, 2024. URLhttps://openreview.net/forum?id=IZMPWmcS3H

  37. [37]

    Lucid- dreamer: Towards high-fidelity text-to-3d generation via interval score matching

    Yixun Liang, Xin Yang, Jiantao Lin, Haodong Li, Xiaogang Xu, and Yingcong Chen. Lucid- dreamer: Towards high-fidelity text-to-3d generation via interval score matching. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6517– 6526, June 2024

  38. [38]

    Zero-1-to-3: Zero-shot one image to 3d object

    Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, and Carl V ondrick. Zero-1-to-3: Zero-shot one image to 3d object. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9298–9309, October 2023

  39. [39]

    Syncdreamer: Generating multiview-consistent images from a single-view image

    Yuan Liu, Cheng Lin, Zijiao Zeng, Xiaoxiao Long, Lingjie Liu, Taku Komura, and Wenping Wang. Syncdreamer: Generating multiview-consistent images from a single-view image. InThe Twelfth International Conference on Learning Representations, 2024. URL https: //openreview.net/forum?id=MN3yH2ovHb

  40. [40]

    In: CVPR

    Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin, Yuan Liu, Zhiyang Dou, Lingjie Liu, Yuexin Ma, Song-Hai Zhang, Marc Habermann, Christian Theobalt, and Wenping Wang. Wonder3d: Single image to 3d using cross-domain diffusion. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024. doi: 10.1109/CVPR52733.2024. 00951

  41. [41]

    MVDream: Multi-view diffusion for 3d generation

    Yichun Shi, Peng Wang, Jianglong Ye, Long Mai, Kejie Li, and Xiao Yang. MVDream: Multi-view diffusion for 3d generation. InThe Twelfth International Conference on Learning Representations, 2024. URLhttps://openreview.net/forum?id=FUgrjq2pbB

  42. [42]

    Re-imagine the negative prompt algorithm: Transform 2d diffusion into 3d, alleviate janus problem and beyond.arXiv preprint arXiv:2304.04968, 2023

    Mohammadreza Armandpour, Huangjie Zheng, Ali Sadeghian, Amir Sadeghian, and Mingyuan Zhou. Re-imagine the negative prompt algorithm: Transform 2d diffusion into 3d, alleviate janus problem and beyond.arXiv preprint arXiv:2304.04968, 2023

  43. [43]

    Stein variational gradient descent: A general purpose bayesian inference algorithm

    Qiang Liu and Dilin Wang. Stein variational gradient descent: A general purpose bayesian inference algorithm. InAdvances in Neural Information Processing Systems 29 (NeurIPS 2016), 2016

  44. [44]

    Chapter 10 - geometry in sampling methods: A review on man- ifold mcmc and particle-based variational inference methods

    Chang Liu and Jun Zhu. Chapter 10 - geometry in sampling methods: A review on man- ifold mcmc and particle-based variational inference methods. In Arni S.R. Srinivasa Rao, G. Alastair Young, and C.R. Rao, editors,Advancements in Bayesian Methods and Imple- mentation, volume 47 ofHandbook of Statistics, pages 239–293. Elsevier, 2022. doi: https: //doi.org/...

  45. [45]

    A unified particle- optimization framework for scalable bayesian sampling

    Changyou Chen, Ruiyi Zhang, Wenlin Wang, Bai Li, and Liqun Chen. A unified particle- optimization framework for scalable bayesian sampling. InConference on Uncertainty in Artifi- cial Intelligence, 2018. URL https://api.semanticscholar.org/CorpusID:44111731

  46. [46]

    CLIPScore: A reference-free evaluation metric for image captioning

    Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. CLIPScore: A reference-free evaluation metric for image captioning. In Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih, editors,Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7514–7528, Online and Punta Cana, D...

  47. [47]

    Exploring clip for assessing the look and feel of images

    Jianyi Wang, Kelvin CK Chan, and Chen Change Loy. Exploring clip for assessing the look and feel of images. InAAAI, 2023

  48. [48]

    Torch- metrics - measuring reproducibility in pytorch.Journal of Open Source Software, 7(70):4101,

    Nicki Skafte Detlefsen, Jiri Borovec, Justus Schock, Ananya Harsh Jha, Teddy Koker, Luca Di Liello, Daniel Stancl, Changsheng Quan, Maxim Grechkin, and William Falcon. Torch- metrics - measuring reproducibility in pytorch.Journal of Open Source Software, 7(70):4101,

  49. [49]

    URLhttps://doi.org/10.21105/joss.04101

    doi: 10.21105/joss.04101. URLhttps://doi.org/10.21105/joss.04101

  50. [50]

    Imagereward: learning and evaluating human preferences for text-to-image generation

    Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. Imagereward: learning and evaluating human preferences for text-to-image generation. 12 InProceedings of the 37th International Conference on Neural Information Processing Systems, pages 15903–15935, 2023

  51. [51]

    Greiner and J

    W. Greiner and J. Reinhardt.Field Quantization. Springer, 1996. ISBN 9783540591795. URL https://books.google.co.in/books?id=VvBAvf0wSrIC

  52. [52]

    Engel and R.M

    E. Engel and R.M. Dreizler.Density Functional Theory: An Advanced Course. Theoretical and Mathematical Physics. Springer Berlin Heidelberg, 2011. ISBN 9783642140891. URL https://books.google.co.in/books?id=o9byjwEACAAJ

  53. [53]

    Society for Industrial and Applied Mathemat- ics, second edition, 2002

    Philip Hartman.Ordinary Differential Equations. Society for Industrial and Applied Mathemat- ics, second edition, 2002. doi: 10.1137/1.9780898719222. URL https://epubs.siam.org/ doi/abs/10.1137/1.9780898719222

  54. [54]

    threestudio: A uni- fied framework for 3d content generation

    Yuan-Chen Guo, Ying-Tian Liu, Ruizhi Shao, Christian Laforte, Vikram V oleti, Guan Luo, Chia-Hao Chen, Zi-Xin Zou, Chen Wang, Yan-Pei Cao, and Song-Hai Zhang. threestudio: A uni- fied framework for 3d content generation. https://github.com/threestudio-project/ threestudio, 2023

  55. [55]

    DPM-solver++: Fast solver for guided sampling of diffusion probabilistic models, 2023

    Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. DPM-solver++: Fast solver for guided sampling of diffusion probabilistic models, 2023. URL https:// openreview.net/forum?id=4vGwQqviud5

  56. [56]

    DeepFloyd IF.https://github.com/deep-floyd/IF, 2023

    StabilityAI. DeepFloyd IF.https://github.com/deep-floyd/IF, 2023

  57. [57]

    A marble bust of a mouse

    Patrick von Platen, Suraj Patil, Anton Lozhkov, Pedro Cuenca, Nathan Lambert, Kashif Rasul, Mishig Davaadorj, Dhruv Nair, Sayak Paul, William Berman, Yiyi Xu, Steven Liu, and Thomas Wolf. Diffusers: State-of-the-art diffusion models. https://github.com/huggingface/ diffusers, 2022. A Proof of Theorem 1 Assumption and Definitions.We assume that the underly...