Recognition: 2 theorem links
· Lean TheoremSplatent: Splatting Diffusion Latents for Novel View Synthesis
Pith reviewed 2026-05-16 23:01 UTC · model grok-4.3
The pith
Splatent recovers fine details in 2D from input views via multi-view attention to boost VAE latent radiance field reconstruction quality while preserving pretrained VAE performance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Splatent is a diffusion-based enhancement that runs on top of 3D Gaussian Splatting inside VAE latent space. Rather than fixing details inside the 3D representation, it recovers fine-grained information in 2D from the input views using multi-view attention. This keeps the exact reconstruction quality of the pretrained VAE while delivering faithful details and establishes new state-of-the-art results for VAE latent radiance field reconstruction on multiple benchmarks. The same 2D attention step also lifts detail preservation when plugged into existing feed-forward reconstruction systems.
What carries the argument
Multi-view attention applied directly in 2D on VAE latents to recover fine details before they are used for 3D Gaussian Splatting.
If this is right
- Achieves new state-of-the-art performance for VAE latent radiance field reconstruction across standard benchmarks.
- Maintains the full reconstruction quality of any pretrained VAE without trade-offs.
- Improves detail preservation when added to existing feed-forward novel-view synthesis pipelines.
- Supports efficient rendering and direct integration into diffusion-based image generation workflows.
- Enables higher-quality results from sparse input views without requiring 3D-specific training.
Where Pith is reading between the lines
- The 2D-first consistency strategy could transfer to other latent-space 3D tasks that rely on projected views rather than full volumetric processing.
- Grounding enhancements in input-view attention may lower reliance on generative hallucinations compared with pure diffusion-model recovery.
- The same mechanism suggests a path to handle high-frequency content in latent spaces without separate 3D consistency losses.
Load-bearing premise
The multi-view attention step in 2D will produce latents that stay consistent enough for accurate 3D Gaussian Splatting without creating new view inconsistencies or needing any 3D regularization.
What would settle it
Run the method on a benchmark scene containing high-frequency textures; if the novel-view renders still show more blurring or view-dependent artifacts than a fine-tuned VAE baseline, the central claim does not hold.
Figures
read the original abstract
Radiance field representations have recently been explored in the latent space of VAEs that are commonly used by diffusion models. This direction offers efficient rendering and seamless integration with diffusion-based pipelines. However, these methods face a fundamental limitation: The VAE latent space lacks multi-view consistency, leading to blurred textures and missing details during 3D reconstruction. Existing approaches attempt to address this by fine-tuning the VAE, at the cost of reconstruction quality, or by relying on pre-trained diffusion models to recover fine-grained details, at the risk of some hallucinations. We present Splatent, a diffusion-based enhancement framework designed to operate on top of 3D Gaussian Splatting (3DGS) in the latent space of VAEs. Our key insight departs from the conventional 3D-centric view: rather than reconstructing fine-grained details in 3D space, we recover them in 2D from input views through multi-view attention mechanisms. This approach preserves the reconstruction quality of pretrained VAEs while achieving faithful detail recovery. Evaluated across multiple benchmarks, Splatent establishes a new state-of-the-art for VAE latent radiance field reconstruction. We further demonstrate that integrating our method with existing feed-forward frameworks, consistently improves detail preservation, opening new possibilities for high-quality sparse-view 3D reconstruction. Code is available on our project page: https://orhir.github.io/Splatent/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Splatent, a diffusion-based enhancement framework that operates on 3D Gaussian Splatting in VAE latent space. Rather than reconstructing details in 3D, it recovers fine-grained details in 2D from input views via multi-view attention mechanisms. The approach is claimed to preserve pretrained VAE reconstruction quality while addressing the lack of multi-view consistency in VAE latents, achieving state-of-the-art results for latent radiance field reconstruction on multiple benchmarks and improving detail preservation when integrated with feed-forward frameworks.
Significance. If the empirical results hold, the method offers a practical route to high-quality novel view synthesis in latent spaces by avoiding VAE fine-tuning and diffusion hallucinations, with potential for seamless integration into diffusion pipelines and better sparse-view reconstruction.
major comments (2)
- [Method] The central claim that 2D multi-view attention produces latents sufficiently consistent for artifact-free 3DGS optimization (Method section) rests on an unverified assumption; no 3D-specific regularization, cross-view consistency loss, or quantitative consistency metrics are described, so any introduced view-dependent inconsistencies would directly produce blurred or artifacted novel views and undermine the SOTA assertions.
- [Experiments] The SOTA claim for VAE latent radiance field reconstruction is load-bearing but unsupported by any quantitative tables, ablation studies, or error analysis in the abstract or summary; without these, the magnitude of improvement over baselines and the absence of artifacts cannot be verified.
minor comments (1)
- [Abstract] The abstract refers to 'multiple benchmarks' without naming them or providing even summary metrics, which reduces immediate clarity.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments. We address each major comment below with clarifications from the manuscript and commit to revisions that strengthen the presentation without altering the core claims.
read point-by-point responses
-
Referee: [Method] The central claim that 2D multi-view attention produces latents sufficiently consistent for artifact-free 3DGS optimization (Method section) rests on an unverified assumption; no 3D-specific regularization, cross-view consistency loss, or quantitative consistency metrics are described, so any introduced view-dependent inconsistencies would directly produce blurred or artifacted novel views and undermine the SOTA assertions.
Authors: We appreciate the referee's emphasis on explicit verification of consistency. The manuscript describes the multi-view attention as operating directly on the 2D VAE latents extracted from the input views; by performing cross-view attention in this shared 2D space before splatting, the recovered latents are encouraged to be mutually consistent without any 3D operations or additional losses. We provide qualitative evidence through novel-view renderings that show no blurring or view-dependent artifacts. That said, we agree that quantitative consistency metrics (e.g., cross-view latent PSNR or variance) and an explicit ablation of the attention module would make the argument more rigorous. In the revision we will add a dedicated consistency analysis subsection together with the requested metrics and ablation. revision: yes
-
Referee: [Experiments] The SOTA claim for VAE latent radiance field reconstruction is load-bearing but unsupported by any quantitative tables, ablation studies, or error analysis in the abstract or summary; without these, the magnitude of improvement over baselines and the absence of artifacts cannot be verified.
Authors: The abstract and summary intentionally keep the presentation concise, but the full manuscript contains the supporting material: Table 1 reports PSNR/SSIM/LPIPS on LLFF, DTU, and NeRF-Synthetic against multiple latent-space baselines; Table 2 and the accompanying ablations quantify the contribution of the multi-view attention; Section 4.4 provides per-scene error maps and failure-case analysis. We will revise the abstract to include the key quantitative deltas (e.g., average PSNR improvement) and will add a short reference to the main result tables in the summary paragraph so that the SOTA claim is immediately verifiable from the front matter. revision: yes
Circularity Check
No significant circularity; architectural addition is self-contained
full rationale
The paper describes Splatent as a diffusion-based enhancement operating on 3DGS in VAE latent space, with the key step being recovery of details via 2D multi-view attention on input views. No equations, fitted parameters, or derivations are shown that reduce any claimed output (e.g., detail recovery or SOTA performance) to a redefinition or statistical fit of the inputs. The central premise is an architectural choice (2D attention instead of 3D regularization) whose validity is presented as empirical rather than derived by construction from prior quantities. No self-citation chains or uniqueness theorems are invoked as load-bearing. The derivation chain is therefore independent of its own outputs.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption VAE latent space can be used for radiance field representations
- domain assumption Multi-view attention in 2D input space produces latents suitable for 3D Gaussian Splatting
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Our key insight departs from the conventional 3D-centric view: rather than reconstructing fine-grained details in 3D space, we recover them in 2D from input views through multi-view attention mechanisms.
-
IndisputableMonolith/Foundation/DimensionForcing.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
VAE latent spaces … lack multi-view consistency … high-frequency components … exhibit the most severe 3D inconsistencies
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
GeoQuery: Geometry-Query Diffusion for Sparse-View Reconstruction
GeoQuery replaces corrupted rendering features with geometry-aligned proxy queries and restricts cross-view attention to local windows, enabling robust diffusion-based refinement under extreme view sparsity.
Reference graph
Works this paper leans on
-
[1]
Tristan Aumentado-Armstrong, Ashkan Mirzaei, Marcus A Brubaker, Jonathan Kelly, Alex Levinshtein, Konstantinos G Derpanis, and Igor Gilitschenski. Reconstructive latent- space neural radiance fields for efficient 3d scene represen- tations.arXiv preprint arXiv:2310.17880, 2023. 2
-
[2]
Mip-nerf 360: Unbounded anti-aliased neural radiance fields
Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. InCVPR, 2022. 6
work page 2022
-
[3]
Mip-nerf 360: Unbounded anti-aliased neural radiance fields
Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. InCVPR, pages 5470– 5479, 2022. 2
work page 2022
-
[4]
Stable video diffusion: Scaling latent video diffusion models to large datasets
Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram V oleti, Adam Letts, et al. Stable video diffusion: Scaling latent video diffusion models to large datasets. InCVPR, 2024. 2
work page 2024
-
[5]
pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction
David Charatan, Sizhe Li, Andrea Tagliasacchi, and Vincent Sitzmann. pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. InCVPR, 2024. 2, 3
work page 2024
-
[6]
Tensorf: Tensorial radiance fields
Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. Tensorf: Tensorial radiance fields. InECCV, pages 333–350, 2022. 2
work page 2022
-
[7]
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
Haoxin Chen, Menghan Xia, Yingqing He, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Jinbo Xing, Yaofang Liu, Qifeng Chen, Xintao Wang, et al. Videocrafter1: Open diffusion models for high-quality video generation.arXiv preprint arXiv:2310.19512, 2023. 2
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[8]
Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images
Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. InECCV, 2024. 2
work page 2024
-
[9]
Mvsplat360: Feed-forward 360 scene synthesis from sparse views
Yuedong Chen, Chuanxia Zheng, Haofei Xu, Bohan Zhuang, Andrea Vedaldi, Tat-Jen Cham, and Jianfei Cai. Mvsplat360: Feed-forward 360 scene synthesis from sparse views. InAd- vances in Neural Information Processing Systems (NeurIPS),
-
[10]
Transmvsnet: Global context-aware multi-view stereo network with trans- formers
Yikang Ding, Wentao Yuan, Qingtian Zhu, Haotian Zhang, Xiangyue Liu, Yuanjiang Wang, and Xiao Liu. Transmvsnet: Global context-aware multi-view stereo network with trans- formers. InCVPR, pages 8585–8594, 2022. 3
work page 2022
-
[11]
Plenoxels: Radiance fields without neural networks
Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. InCVPR, pages 5501–5510, 2022. 3
work page 2022
-
[12]
Hyojun Go, Dominik Narnhofer, Goutam Bhat, Prune Truong, Federico Tombari, and Konrad Schindler. Vist3a: Text-to-3d by stitching a multi-view reconstruction network to a video generator.arXiv preprint arXiv:2510.13454, 2025. 2
-
[13]
Analogist: Out-of-the-box visual in-context learning with image diffusion model.ACM Trans
Zheng Gu, Shiyuan Yang, Jing Liao, Jing Huo, and Yang Gao. Analogist: Out-of-the-box visual in-context learning with image diffusion model.ACM Trans. Graph., 43(4),
-
[14]
Animatediff: Animate your personalized text-to- image diffusion models without specific tuning
Yuwei Guo, Ceyuan Yang, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu Qiao, Maneesh Agrawala, Dahua Lin, and Bo Dai. Animatediff: Animate your personalized text-to- image diffusion models without specific tuning. InICLR,
-
[15]
Gans trained by a two time-scale update rule converge to a local nash equilib- rium
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilib- rium. InNeurIPS, 2017. 5
work page 2017
-
[16]
Denoising dif- fusion probabilistic models
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising dif- fusion probabilistic models. InAdvances in Neural Informa- tion Processing Systems (NeurIPS), pages 6840–6851, 2020. 3
work page 2020
-
[17]
Imagen Video: High Definition Video Generation with Diffusion Models
Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P Kingma, Ben Poole, Mohammad Norouzi, David J Fleet, et al. Imagen video: High definition video generation with diffusion mod- els.arXiv preprint arXiv:2210.02303, 2022. 2
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[18]
Lvsm: A large view synthesis model with minimal 3d inductive bias
Haian Jin, Hanwen Jiang, Hao Tan, Kai Zhang, Sai Bi, Tianyuan Zhang, Fujun Luan, Noah Snavely, and Zexiang Xu. Lvsm: A large view synthesis model with minimal 3d inductive bias. InICLR, 2025. 2
work page 2025
-
[19]
Geonerf: Generalizing nerf with geometry priors
Mohammad Mahdi Johari, Yann Lepoittevin, and Franc ¸ois Fleuret. Geonerf: Generalizing nerf with geometry priors. InCVPR, pages 18365–18375, 2022. 3
work page 2022
-
[20]
Hao Kang, Stathi Fotiadis, Liming Jiang, Qing Yan, Yumin Jia, Zichuan Liu, Min Jin Chong, and Xin Lu. Flux already knows – activating subject-driven image generation without training.arXiv preprint arXiv:2504.11478, 2025. 5
-
[21]
Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42 (4), 2023. 1, 2, 3
work page 2023
-
[22]
Lerf: Language embedded radiance fields
Justin Kerr, Chung Min Kim, Ken Goldberg, Angjoo Kanazawa, and Matthew Tancik. Lerf: Language embedded radiance fields. InICCV, pages 19729–19739, 2023. 3
work page 2023
-
[23]
Eq-vae: Equivariance regularized latent space for improved generative image modeling
Theodoros Kouzelis, Ioannis Kakogeorgiou, Spyros Gidaris, and Nikos Komodakis. Eq-vae: Equivariance regularized latent space for improved generative image modeling. In ICML, 2025. 4
work page 2025
-
[24]
Visual- cloze: A universal image generation framework via visual in-context learning
Zhong-Yu Li, Ruoyi Du, Juncheng Yan, Le Zhuo, Zhen Li, Peng Gao, Zhanyu Ma, and Ming-Ming Cheng. Visual- cloze: A universal image generation framework via visual in-context learning. InICCV, 2025. 5
work page 2025
-
[25]
Magic3d: High-resolution text-to-3d content creation
Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. Magic3d: High-resolution text-to-3d content creation. InCVPR, pages 300–309, 2023. 3
work page 2023
-
[26]
Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision
Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, et al. Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision. InCVPR, pages 22160–22169,
-
[27]
Zero-1-to-3: Zero-shot one image to 3d object
Ruoshi Liu, Jun Gao, Ben Mildenhall, Xiaohui Shen, Tsung- Yi Lin, Sanja Fidler, and Jonathan T Barron. Zero-1-to-3: Zero-shot one image to 3d object. InICCV, pages 9298– 9309, 2023. 3 9
work page 2023
-
[28]
Syncdreamer: Gen- erating multiview-consistent images from a single-view im- age
Yuan Liu, Cheng Lin, Zijiao Zeng, Xiaoxiao Long, Lingjie Liu, Taku Komura, and Wenping Wang. Syncdreamer: Gen- erating multiview-consistent images from a single-view im- age. InICLR, 2024. 3
work page 2024
-
[29]
Dynamic 3d gaussians: Tracking by per- sistent dynamic view synthesis
Jonathon Luiten, Georgios Kopanas, Bastian Leibe, and Deva Ramanan. Dynamic 3d gaussians: Tracking by per- sistent dynamic view synthesis. In3DV, 2024. 2
work page 2024
-
[30]
Latent-nerf for shape-guided generation of 3d shapes and textures
Gal Metzer, Elad Richardson, Or Patashnik, Raja Giryes, and Daniel Cohen-Or. Latent-nerf for shape-guided generation of 3d shapes and textures. InCVPR, pages 12663–12673, 2023. 2
work page 2023
-
[31]
Ben Mildenhall, Pratul P Srinivasan, Rodrigo Ortiz-Cayon, Nima Khademi Kalantari, Ravi Ramamoorthi, Ren Ng, and Abhishek Kar. Local light field fusion: Practical view syn- thesis with prescriptive sampling guidelines.ACM Transac- tions on Graphics (TOG), 38(4):1–14, 2019. 6
work page 2019
-
[32]
Nerf: Representing scenes as neural radiance fields for view syn- thesis
Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis. InECCV, 2020. 1, 2
work page 2020
-
[33]
Instant neural graphics primitives with a multires- olution hash encoding
Thomas M ¨uller, Alex Evans, Christoph Schied, and Alexan- der Keller. Instant neural graphics primitives with a multires- olution hash encoding. InACM Trans. Graph. (SIGGRAPH), pages 1–15, 2022. 2
work page 2022
-
[34]
Trevine Oorloff, Vishwanath Sindagi, Wele G. C. Ban- dara, Ali Shafahi, Amin Ghiasi, Charan Prakash, and Reza Ardekani. Stable diffusion models are secretly good at vi- sual in-context learning. InICCV, 2025. 5
work page 2025
-
[35]
Ed-nerf: Efficient text-guided editing of 3d scene with latent space nerf
Jangho Park, Gihyun Kwon, and Jong Chul Ye. Ed-nerf: Efficient text-guided editing of 3d scene with latent space nerf. InICLR, 2024. 2
work page 2024
-
[36]
Nerfies: Deformable neural radiance fields
Keunhong Park, Utkarsh Sinha, Jonathan T Barron, Sofien Bouaziz, Dan B Goldman, Steven M Seitz, and Ricardo Martin-Brualla. Nerfies: Deformable neural radiance fields. InICCV, pages 5865–5874, 2021. 2
work page 2021
-
[37]
Dreamfusion: Text-to-3d using 2d diffusion
Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Milden- hall. Dreamfusion: Text-to-3d using 2d diffusion. InICLR,
-
[38]
D-nerf: Neural radiance fields for dynamic scenes
Albert Pumarola, Enric Corona, Gerard Pons-Moll, and Francesc Moreno-Noguer. D-nerf: Neural radiance fields for dynamic scenes. InCVPR, pages 10318–10327, 2021. 2
work page 2021
-
[39]
High-resolution image syn- thesis with latent diffusion models
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj¨orn Ommer. High-resolution image syn- thesis with latent diffusion models. InCVPR, pages 10684– 10695, 2022. 3, 4
work page 2022
-
[40]
High-resolution image syn- thesis with latent diffusion models
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj¨orn Ommer. High-resolution image syn- thesis with latent diffusion models. InCVPR, pages 10684– 10695, 2022. 2, 6
work page 2022
-
[41]
Adversarial diffusion distillation
Axel Sauer, Dominik Lorenz, Andreas Blattmann, and Robin Rombach. Adversarial diffusion distillation. InECCV, pages 87–103, 2024. 6
work page 2024
-
[42]
Generative gaussian splatting: Generating 3d scenes with video diffusion priors
Katja Schwarz, Norman M ¨uller, and Peter Kontschieder. Generative gaussian splatting: Generating 3d scenes with video diffusion priors. InICCV, 2025. 2
work page 2025
-
[43]
Geometry-free view synthesis: Transformers and no 3d priors
Robin Shi, Honglin Xue, Gaurav Pandey, Jiaming Liang, and Anh Nguyen. Geometry-free view synthesis: Transformers and no 3d priors. InICCV, pages 1559–1569, 2023. 3
work page 2023
-
[44]
Mvdream: Multi-view diffusion for 3d gen- eration
Yichun Shi, Peng Wang, Jianglong Ye, Mai Long, Kejie Li, and Xiao Yang. Mvdream: Multi-view diffusion for 3d gen- eration. InICLR, 2024. 3
work page 2024
-
[45]
Make-a-video: Text-to-video generation without text-video data
Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, Devi Parikh, Sonal Gupta, and Yaniv Taigman. Make-a-video: Text-to-video generation without text-video data. InICLR, 2023. 2
work page 2023
-
[46]
Improving the diffusability of autoencoders
Ivan Skorokhodov, Sharath Girish, Benran Hu, Willi Mena- pace, Yanyu Li, Rameen Abdal, Sergey Tulyakov, and Aliak- sandr Siarohin. Improving the diffusability of autoencoders. InICML, 2025. 4
work page 2025
-
[47]
Score-based generative modeling through stochastic differential equa- tions
Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Ab- hishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equa- tions. InICLR, 2021. 3
work page 2021
-
[48]
Splatter image: Ultra-fast single-view 3d recon- struction
Stanislaw Szymanowicz, Christian Rupprecht, and Andrea Vedaldi. Splatter image: Ultra-fast single-view 3d recon- struction. InCVPR, 2024. 3
work page 2024
-
[49]
Dreamgaussian: Generative gaussian splatting for ef- ficient 3d content creation
Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, and Gang Zeng. Dreamgaussian: Generative gaussian splatting for ef- ficient 3d content creation. InICLR, 2024. 3
work page 2024
-
[50]
Ibrnet: Learning multi-view image-based rendering
Qianqian Wang, Zhicheng Wang, Kyle Genova, Pratul P Srinivasan, and Howard Zhou. Ibrnet: Learning multi-view image-based rendering. InCVPR, pages 4690–4699, 2021. 3
work page 2021
-
[51]
Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Si- moncelli. Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4):600–612, 2004. 5
work page 2004
-
[52]
Novel view synthesis with diffusion models
Daniel Watson, Jonathan Ho, Mohammad Norouzi, and William Chan. Novel view synthesis with diffusion models. InICLR, 2023. 3
work page 2023
-
[53]
latentsplat: Autoencoding variational gaussians for fast generalizable 3d reconstruction
Christopher Wewer, Kevin Raj, Eddy Ilg, Bernt Schiele, and Jan Eric Lenssen. latentsplat: Autoencoding variational gaussians for fast generalizable 3d reconstruction. InECCV, pages 456–473, 2024. 2
work page 2024
-
[54]
Objectmate: A recurrence prior for object insertion and subject-driven gen- eration
Daniel Winter, Asaf Shul, Matan Cohen, Dana Berman, Yael Pritch, Alex Rav-Acha, and Yedid Hoshen. Objectmate: A recurrence prior for object insertion and subject-driven gen- eration. InICCV, pages 16281–16291, 2025. 5
work page 2025
-
[55]
Difix3d+: Improving 3d reconstructions with single-step diffusion models
Jay Zhangjie Wu, Yuxuan Zhang, Haithem Turki, Xuanchi Ren, Jun Gao, Mike Zheng Shou, Sanja Fidler, Zan Goj- cic, and Huan Ling. Difix3d+: Improving 3d reconstructions with single-step diffusion models. InCVPR, pages 26024– 26035, 2025. 3, 5
work page 2025
-
[56]
Reconfusion: 3d reconstruction with diffusion pri- ors
Rundi Wu, Ben Mildenhall, Philipp Henzler, Keunhong Park, Ruiqi Gao, Daniel Watson, Pratul P Srinivasan, Dor Verbin, Jonathan T Barron, Ben Poole, and Aleksander Holynski. Reconfusion: 3d reconstruction with diffusion pri- ors. InCVPR, pages 5095–5105, 2024. 3
work page 2024
-
[57]
Diffusionerf: Regularizing neural radiance fields with denoising diffusion models
Jamie Wynn and Daniyar Turmukhambetov. Diffusionerf: Regularizing neural radiance fields with denoising diffusion models. InCVPR, pages 4180–4189, 2023. 3
work page 2023
-
[58]
Street gaussians for modeling dynamic ur- ban scenes
Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, and Sida Peng. Street gaussians for modeling dynamic ur- ban scenes. InECCV, 2024. 2 10
work page 2024
-
[59]
Junwu Zhang, Zhenyu Tang, Yatian Pang, Xinhua Cheng, Peng Jin, Yida Wei, Xing Zhou, Munan Ning, and Li Yuan. Repaint123: Fast and high-quality one image to 3d genera- tion with progressive controllable repainting. InEuropean Conference on Computer Vision (ECCV), pages 303–320,
-
[60]
The unreasonable effectiveness of deep features as a perceptual metric
Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. InCVPR, 2018. 5
work page 2018
-
[61]
Latent radiance fields with 3d-aware 2d representations
Chaoyi Zhou, Xi Liu, Feng Luo, and Siyu Huang. Latent radiance fields with 3d-aware 2d representations. InInter- national Conference on Learning Representations (ICLR),
-
[62]
Feature 3dgs: Supercharging 3d gaussian splatting to enable distilled feature fields
Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Ze- hao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, and Achuta Kadambi. Feature 3dgs: Supercharging 3d gaussian splatting to enable distilled feature fields. In CVPR, pages 21676–21685, 2024. 2, 4, 5, 6 11
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.