HAD: Hallucination-Aware Diffusion Priors for 3D Reconstruction
Pith reviewed 2026-05-19 20:21 UTC · model grok-4.3
pith:7MNTUTZT Add to your LaTeX paper
What is a Pith Number?\usepackage{pith}
\pithnumber{7MNTUTZT}
Prints a linked pith:7MNTUTZT badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more
The pith
HAD estimates pixel-wise hallucination scores from a pre-trained novel view synthesis network to mask unreliable pixels in diffusion-augmented images during sparse-view 3D reconstruction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HAD estimates pixel-wise hallucination score maps for augmented images by leveraging multi-view reasoning capabilities from a feedforward novel view synthesis network pre-trained on large-scale 3D data. These hallucination scores enable selective masking of unreliable pixels during the progressive 3D reconstruction procedure, preventing the introduction of non-existent artifacts into the 3D model. To further enhance performance, multiple versions of augmented images at each novel view are created by conditioning the diffusion prior on different input views and then fused into a final image that leverages the broader context across all input views.
What carries the argument
Pixel-wise hallucination score maps produced by a pre-trained feedforward novel view synthesis network, which identify inconsistent pixels for selective masking in diffusion-augmented training views during progressive 3D reconstruction.
If this is right
- Selective masking prevents non-existent artifacts from being baked into the final 3D model.
- Fusing multiple conditioned augmentations at each novel viewpoint incorporates broader context from all input views.
- The overall procedure substantially reduces hallucination artifacts compared to standard diffusion-assisted reconstruction.
- The approach reaches state-of-the-art results on multiple benchmarks for novel view synthesis from sparse inputs.
Where Pith is reading between the lines
- The same scoring mechanism could be tested on other generative priors used in 3D tasks to check whether multi-view consistency filtering generalizes beyond diffusion models.
- If the pre-trained network's reasoning is the key enabler, similar networks might serve as lightweight consistency checkers in related pipelines such as dynamic scene reconstruction.
- An extension worth checking is whether the fusion step remains effective when the number of original input views drops below the levels tested in the benchmarks.
Load-bearing premise
The pre-trained novel view synthesis network can reliably produce hallucination scores that accurately identify pixels inconsistent with the original input views, and that masking these pixels improves rather than harms the final 3D model quality.
What would settle it
A direct comparison showing that reconstructions using the hallucination-masked augmented views produce no measurable improvement or even lower quality on standard novel view synthesis metrics than reconstructions that use the unmasked diffusion outputs.
Figures
read the original abstract
Diffusion priors have recently demonstrated strong capability in enhancing the quality of sparse-view 3D reconstruction by augmenting training views at novel viewpoints, but they inevitably introduce hallucinated content -- artifacts inconsistent with the input views -- into the final 3D model. To address this challenge, we propose Hallucination-Aware Diffusion prior (HAD), which estimates pixel-wise hallucination score maps for augmented images by leveraging multi-view reasoning capabilities from a feedforward novel view synthesis (NVS) network pre-trained on large-scale 3D data. These hallucination scores enable selective masking of unreliable pixels during the progressive 3D reconstruction procedure, preventing the introduction of non-existent artifacts into the 3D model. To further enhance performance, we create multiple versions of augmented images at each novel view by conditioning the diffusion prior on different input views, which are then fused into a final image that leverages the broader context across all input views. We show that our method substantially reduces hallucination artifacts in diffusion-assisted 3D reconstruction, thereby achieving state-of-the-art performance across multiple benchmarks on novel view synthesis. Our project are publicly available at \href{https://xiliu8006.github.io/HAD-Project-website/}{project website}.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Hallucination-Aware Diffusion prior (HAD) for sparse-view 3D reconstruction. It estimates pixel-wise hallucination score maps on diffusion-augmented novel-view images by exploiting multi-view reasoning from a pre-trained feedforward novel view synthesis (NVS) network. These scores enable selective masking of unreliable pixels during progressive reconstruction; multiple conditioned augmentations are fused to leverage broader context. The method is claimed to substantially reduce hallucination artifacts and deliver state-of-the-art novel-view synthesis results on multiple benchmarks.
Significance. If the quantitative claims hold, the work would be significant because it supplies a practical, training-free mechanism to detect and suppress diffusion-induced inconsistencies using an existing large-scale NVS prior. The combination of per-pixel masking and multi-view fusion directly targets a known failure mode of diffusion-assisted reconstruction pipelines and could improve reliability in downstream applications that require geometrically consistent 3D models from limited input views.
major comments (2)
- [Method / Experiments] The central claim that masking pixels flagged by the pre-trained NVS network improves final 3D reconstruction quality (rather than discarding useful signal) is load-bearing yet rests on an unverified assumption. The manuscript should provide a direct ablation (e.g., reconstruction metrics with vs. without masking) together with qualitative examples showing that masked regions correspond to genuine hallucinations rather than view-consistent content.
- [Abstract / Experiments] The abstract asserts SOTA performance across benchmarks, but the provided description contains no quantitative tables, error bars, or statistical significance tests. Without these data the magnitude of improvement attributable to HAD versus prior diffusion-augmented baselines cannot be assessed.
minor comments (2)
- [Method] Clarify the exact architecture and training data of the feedforward NVS network used for scoring; a brief citation or diagram would help readers reproduce the pipeline.
- [Method] The fusion step for multiple conditioned augmentations is described at a high level; a short algorithmic outline or pseudocode would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which helps clarify the empirical validation of our approach. We address each major comment below and outline the revisions we will make to strengthen the manuscript.
read point-by-point responses
-
Referee: [Method / Experiments] The central claim that masking pixels flagged by the pre-trained NVS network improves final 3D reconstruction quality (rather than discarding useful signal) is load-bearing yet rests on an unverified assumption. The manuscript should provide a direct ablation (e.g., reconstruction metrics with vs. without masking) together with qualitative examples showing that masked regions correspond to genuine hallucinations rather than view-consistent content.
Authors: We agree that a direct ablation study is necessary to substantiate the benefit of the hallucination-aware masking. In the revised manuscript, we will add a dedicated ablation in the Experiments section that reports reconstruction metrics (PSNR, SSIM, LPIPS) on the same benchmarks with and without the masking step. We will also include qualitative examples that visualize the hallucination score maps overlaid on the augmented views, highlighting regions that are masked and demonstrating their inconsistency with the input views (e.g., via multi-view consistency checks) rather than view-consistent geometry. This addition directly addresses the concern and will be supported by the existing multi-view reasoning mechanism described in Section 3. revision: yes
-
Referee: [Abstract / Experiments] The abstract asserts SOTA performance across benchmarks, but the provided description contains no quantitative tables, error bars, or statistical significance tests. Without these data the magnitude of improvement attributable to HAD versus prior diffusion-augmented baselines cannot be assessed.
Authors: The full manuscript already contains quantitative tables in the Experiments section that compare HAD against prior diffusion-augmented baselines on multiple benchmarks, reporting standard novel-view synthesis metrics. To improve clarity and address the referee's point, we will revise the abstract to include a concise statement of the key quantitative gains and will augment the tables with error bars (computed over multiple runs or scenes) as well as statistical significance tests (e.g., paired t-tests) where appropriate. These changes will make the magnitude of improvement more transparent without altering the core claims. revision: partial
Circularity Check
No significant circularity in derivation chain
full rationale
The paper describes a pipeline that estimates pixel-wise hallucination scores using a pre-trained external feedforward NVS network on large-scale 3D data, then applies selective masking during progressive reconstruction and fuses multiple conditioned augmentations. No equations, fitted parameters, or self-citations are shown reducing the hallucination scores or performance gains to quantities defined by the method's own inputs or outputs. The approach is benchmarked on external NVS tasks with claimed SOTA results, confirming the derivation remains independent of self-referential definitions or forced predictions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption A pre-trained feedforward NVS network can produce reliable pixel-wise hallucination scores that reflect inconsistency with input views.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
estimates pixel-wise hallucination score maps … leveraging multi-view reasoning capabilities from a feedforward novel view synthesis (NVS) network … selective masking of unreliable pixels during the progressive 3D reconstruction
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
multi-sampling strategy … fuse … ArgMin fusion … lowest hallucination score
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Instant uncertainty calibration of nerfs us- ing a meta-calibrator
Niki Amini-Naieni, Tomas Jakab, Andrea Vedaldi, and Ronald Clark. Instant uncertainty calibration of nerfs us- ing a meta-calibrator. InEuropean Conference on Computer Vision, pages 309–324. Springer, 2024. 3
work page 2024
-
[2]
Mip-nerf: A multiscale representation for anti-aliasing neu- ral radiance fields
Jonathan T Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P Srinivasan. Mip-nerf: A multiscale representation for anti-aliasing neu- ral radiance fields. InProceedings of the IEEE/CVF inter- national conference on computer vision, pages 5855–5864,
-
[3]
Mip-nerf 360: Unbounded anti-aliased neural radiance fields
Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5470–5479, 2022. 2, 6, 7, 9
work page 2022
-
[4]
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram V oleti, Adam Letts, et al. Stable video diffusion: Scaling latent video diffusion models to large datasets.arXiv preprint arXiv:2311.15127, 2023. 3
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[5]
Tensorf: Tensorial radiance fields
Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. Tensorf: Tensorial radiance fields. InEuropean con- ference on computer vision, pages 333–350. Springer, 2022. 2
work page 2022
-
[6]
Danpeng Chen, Hai Li, Weicai Ye, Yifan Wang, Weijian Xie, Shangjin Zhai, Nan Wang, Haomin Liu, Hujun Bao, and Guofeng Zhang. Pgsr: Planar-based gaussian splatting for ef- ficient and high-fidelity surface reconstruction.IEEE Trans- actions on Visualization and Computer Graphics, 2024. 2
work page 2024
-
[7]
Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images
Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. InEuropean Conference on Computer Vision, pages 370–386. Springer, 2024. 3
work page 2024
-
[8]
Yuedong Chen, Chuanxia Zheng, Haofei Xu, Bohan Zhuang, Andrea Vedaldi, Tat-Jen Cham, and Jianfei Cai. Mvs- plat360: Feed-forward 360 scene synthesis from sparse views.Advances in Neural Information Processing Systems, 37:107064–107086, 2024. 3
work page 2024
-
[9]
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, et al. An image is worth 16x16 words: Trans- formers for image recognition at scale.arXiv preprint arXiv:2010.11929, 2020. 3
work page internal anchor Pith review Pith/arXiv arXiv 2010
-
[10]
Lightgaussian: Unbounded 3d gaussian compression with 15x reduction and 200+ fps
Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang, et al. Lightgaussian: Unbounded 3d gaussian compression with 15x reduction and 200+ fps. Advances in neural information processing systems, 37: 140138–140158, 2024. 2
work page 2024
-
[11]
Flowr: Flowing from sparse to dense 3d reconstructions
Tobias Fischer, Samuel Rota Bul `o, Yung-Hsu Yang, Nikhil Keetha, Lorenzo Porzi, Norman M ¨uller, Katja Schwarz, Jonathon Luiten, Marc Pollefeys, and Peter Kontschieder. Flowr: Flowing from sparse to dense 3d reconstructions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 27702–27712, 2025. 1, 2, 3
work page 2025
-
[12]
Plenoxels: Radiance fields without neural networks
Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5501–5510, 2022. 2
work page 2022
-
[13]
CAT3D: Create Anything in 3D with Multi-View Diffusion Models
Ruiqi Gao, Aleksander Holynski, Philipp Henzler, Arthur Brussee, Ricardo Martin-Brualla, Pratul Srinivasan, Jonathan T Barron, and Ben Poole. Cat3d: Create anything in 3d with multi-view diffusion models.arXiv preprint arXiv:2405.10314, 2024. 3
work page internal anchor Pith review arXiv 2024
-
[14]
Bayes’ rays: Uncertainty quan- tification for neural radiance fields
Lily Goli, Cody Reading, Silvia Sell ´an, Alec Jacobson, and Andrea Tagliasacchi. Bayes’ rays: Uncertainty quan- tification for neural radiance fields. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20061–20070, 2024. 3
work page 2024
-
[15]
Jiaxin Guo, Jiangliu Wang, Ruofeng Wei, Di Kang, Qi Dou, and Yun-Hui Liu. Uc-nerf: Uncertainty-aware conditional neural radiance fields from endoscopic sparse views.IEEE Transactions on Medical Imaging, 44(3):1284–1296, 2024. 3
work page 2024
-
[16]
2d gaussian splatting for geometrically accu- rate radiance fields
Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically accu- rate radiance fields. InSIGGRAPH 2024 Conference Papers. Association for Computing Machinery, 2024. 2
work page 2024
-
[17]
Putting nerf on a diet: Semantically consistent few-shot view synthesis
Ajay Jain, Matthew Tancik, and Pieter Abbeel. Putting nerf on a diet: Semantically consistent few-shot view synthesis. InICCV, pages 5885–5894, 2021. 2
work page 2021
-
[18]
Rayzer: A self-supervised large view synthe- sis model
Hanwen Jiang, Hao Tan, Peng Wang, Haian Jin, Yue Zhao, Sai Bi, Kai Zhang, Fujun Luan, Kalyan Sunkavalli, Qixing Huang, et al. Rayzer: A self-supervised large view synthe- sis model. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 4918–4929, 2025. 3, 8
work page 2025
-
[19]
Lihan Jiang, Yucheng Mao, Linning Xu, Tao Lu, Kerui Ren, Yichen Jin, Xudong Xu, Mulin Yu, Jiangmiao Pang, Feng Zhao, et al. Anysplat: Feed-forward 3d gaussian splatting from unconstrained views.ACM Transactions on Graphics (TOG), 44(6):1–16, 2025. 3
work page 2025
-
[20]
Lvsm: A large view synthesis model with minimal 3d inductive bias
Haian Jin, Hanwen Jiang, Hao Tan, Kai Zhang, Sai Bi, Tianyuan Zhang, Fujun Luan, Noah Snavely, and Zexiang Xu. Lvsm: A large view synthesis model with minimal 3d inductive bias. InThe Thirteenth International Conference on Learning Representations, 2025. 2, 3, 5, 6, 7
work page 2025
-
[21]
Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42 (4), 2023. 1, 2, 3, 4, 6, 7
work page 2023
-
[22]
Shakiba Kheradmand, Daniel Rebain, Gopal Sharma, Wei- wei Sun, Yang-Che Tseng, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, and Kwang Moo Yi. 3d gaussian splat- ting as markov chain monte carlo.Advances in Neural In- formation Processing Systems, 37:80965–80986, 2024. 1, 6, 7
work page 2024
-
[23]
Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, and Lin Gu. Dngaussian: Optimizing sparse-view 3d gaussian radiance fields with global-local depth normaliza- tion. InProceedings of the IEEE/CVF conference on com- puter vision and pattern recognition, pages 20775–20785,
-
[24]
Variational multi-scale rep- resentation for estimating uncertainty in 3d gaussian splat- ting
Ruiqi Li and Yiu-ming Cheung. Variational multi-scale rep- resentation for estimating uncertainty in 3d gaussian splat- ting. InAdvances in Neural Information Processing Systems, pages 87934–87958, 2024. 3
work page 2024
-
[25]
Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision
Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, et al. Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22160–22169, 2024. 2, 6, 7, 8
work page 2024
-
[26]
Fangfu Liu, Wenqiang Sun, Hanyang Wang, Yikai Wang, Haowen Sun, Junliang Ye, Jun Zhang, and Yueqi Duan. Re- conx: Reconstruct any scene from sparse views with video diffusion model.IEEE Transactions on Image Processing,
-
[27]
Zero-1-to- 3: Zero-shot one image to 3d object
Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tok- makov, Sergey Zakharov, and Carl V ondrick. Zero-1-to- 3: Zero-shot one image to 3d object. InProceedings of the IEEE/CVF international conference on computer vision, pages 9298–9309, 2023. 3
work page 2023
-
[28]
Xi Liu, Chaoyi Zhou, and Siyu Huang. 3dgs-enhancer: Enhancing unbounded 3d gaussian splatting with view- consistent 2d diffusion priors.Advances in Neural Informa- tion Processing Systems, 37:133305–133327, 2024. 1, 2, 3
work page 2024
-
[29]
Sparseneus: Fast generalizable neural sur- face reconstruction from sparse views
Xiaoxiao Long, Cheng Lin, Peng Wang, Taku Komura, and Wenping Wang. Sparseneus: Fast generalizable neural sur- face reconstruction from sparse views. InEuropean Confer- ence on Computer Vision, pages 210–227. Springer, 2022. 2
work page 2022
-
[30]
Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht, Natalia Neverova, Andrea Vedaldi, Oran Gafni, and Filippos Kokki- nos. Im-3d: Iterative multiview diffusion and reconstruction for high-quality 3d generation.International Conference on Machine Learning, 2024. 3
work page 2024
-
[31]
Srinivasan, Matthew Tancik, Jonathan T
Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis. InECCV, 2020. 1, 2
work page 2020
-
[32]
Hardware acceleration of neu- ral graphics
Muhammad Husnain Mubarik, Ramakrishna Kanungo, To- bias Zirr, and Rakesh Kumar. Hardware acceleration of neu- ral graphics. InProceedings of the 50th Annual International Symposium on Computer Architecture, pages 1–12, 2023. 2
work page 2023
-
[33]
Reg- nerf: Regularizing neural radiance fields for view synthesis from sparse inputs
Michael Niemeyer, Jonathan T Barron, Ben Mildenhall, Mehdi SM Sajjadi, Andreas Geiger, and Noha Radwan. Reg- nerf: Regularizing neural radiance fields for view synthesis from sparse inputs. InProceedings of the IEEE/CVF con- ference on computer vision and pattern recognition, pages 5480–5490, 2022. 2
work page 2022
-
[34]
Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Milden- hall. Dreamfusion: Text-to-3d using 2d diffusion. InThe Eleventh International Conference on Learning Representa- tions, 2023. 3
work page 2023
-
[35]
Estimating 3d uncertainty field: Quantify- ing uncertainty for neural radiance fields
Jianxiong Shen, Ruijie Ren, Adria Ruiz, and Francesc Moreno-Noguer. Estimating 3d uncertainty field: Quantify- ing uncertainty for neural radiance fields. In2024 IEEE In- ternational Conference on Robotics and Automation (ICRA), pages 2375–2381. IEEE, 2024. 3
work page 2024
-
[36]
MVDream: Multi-view Diffusion for 3D Generation
Yichun Shi, Peng Wang, Jianglong Ye, Long Mai, Kejie Li, and Xiao Yang. Mvdream: Multi-view diffusion for 3d gen- eration.arXiv:2308.16512, 2023. 3
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[37]
Attention is all you need.Advances in neural information processing systems, 30, 2017
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszko- reit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017. 3
work page 2017
-
[38]
Sparsenerf: Distilling depth ranking for few-shot novel view synthesis
Guangcong Wang, Zhaoxi Chen, Chen Change Loy, and Zi- wei Liu. Sparsenerf: Distilling depth ranking for few-shot novel view synthesis. InProceedings of the IEEE/CVF inter- national conference on computer vision, pages 9065–9076,
-
[39]
Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Si- moncelli. Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4):600–612, 2004. 6
work page 2004
-
[40]
Sparse2dgs: Geometry-prioritized gaussian splatting for surface reconstruction from sparse views
Jiang Wu, Rui Li, Yu Zhu, Rong Guo, Jinqiu Sun, and Yan- ning Zhang. Sparse2dgs: Geometry-prioritized gaussian splatting for surface reconstruction from sparse views. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 11307–11316, 2025. 2
work page 2025
-
[41]
Difix3d+: Improving 3d reconstruc- tions with single-step diffusion models
Jay Zhangjie Wu, Yuxuan Zhang, Haithem Turki, Xuanchi Ren, Jun Gao, Mike Zheng Shou, Sanja Fidler, Zan Goj- cic, and Huan Ling. Difix3d+: Improving 3d reconstruc- tions with single-step diffusion models. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 26024–26035, 2025. 1, 3, 4, 5, 6, 7, 8
work page 2025
-
[42]
Reconfusion: 3d reconstruction with diffusion priors
Rundi Wu, Ben Mildenhall, Philipp Henzler, Keunhong Park, Ruiqi Gao, Daniel Watson, Pratul P Srinivasan, Dor Verbin, Jonathan T Barron, Ben Poole, et al. Reconfusion: 3d reconstruction with diffusion priors. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 21551–21561, 2024. 3, 6
work page 2024
-
[43]
Genfusion: Closing the loop between recon- struction and generation via videos
Sibo Wu, Congrong Xu, Binbin Huang, Andreas Geiger, and Anpei Chen. Genfusion: Closing the loop between recon- struction and generation via videos. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 6078–6088, 2025. 2, 6, 7, 1
work page 2025
-
[44]
Depthsplat: Connecting gaussian splatting and depth
Haofei Xu, Songyou Peng, Fangjinhua Wang, Hermann Blum, Daniel Barath, Andreas Geiger, and Marc Pollefeys. Depthsplat: Connecting gaussian splatting and depth. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 16453–16463, 2025. 3, 6
work page 2025
-
[45]
Freenerf: Im- proving few-shot neural rendering with free frequency reg- ularization
Jiawei Yang, Marco Pavone, and Yue Wang. Freenerf: Im- proving few-shot neural rendering with free frequency reg- ularization. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8254–8263,
-
[46]
Gaussiandreamer: Fast generation from text to 3d gaussians by bridging 2d and 3d diffusion models
Taoran Yi, Jiemin Fang, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Qi Tian, and Xinggang Wang. Gaussiandreamer: Fast generation from text to 3d gaussians by bridging 2d and 3d diffusion models. InCVPR,
-
[47]
Xingyilang Yin, Qi Zhang, Jiahao Chang, Ying Feng, Qing- nan Fan, Xi Yang, Chi-Man Pun, Huaqi Zhang, and Xi- aodong Cun. Gsfixer: Improving 3d gaussian splatting with reference-guided video diffusion priors.arXiv preprint arXiv:2508.09667, 2025. 2
-
[48]
Plenoctrees for real-time rendering of neural radiance fields
Alex Yu, Ruilong Li, Matthew Tancik, Hao Li, Ren Ng, and Angjoo Kanazawa. Plenoctrees for real-time rendering of neural radiance fields. InProceedings of the IEEE/CVF international conference on computer vision, pages 5752– 5761, 2021. 2
work page 2021
-
[49]
Mip-splatting: Alias-free 3d gaussian splat- ting
Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-splatting: Alias-free 3d gaussian splat- ting. InProceedings of the IEEE/CVF conference on com- puter vision and pattern recognition, pages 19447–19456,
-
[50]
The unreasonable effectiveness of deep features as a perceptual metric
Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shecht- man, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE conference on computer vision and pattern recogni- tion, pages 586–595, 2018. 6
work page 2018
-
[51]
Stable virtual camera: Generative view synthesis with diffusion models
Jensen Zhou, Hang Gao, Vikram V oleti, Aaryaman Vasishta, Chun-Han Yao, Mark Boss, Philip Torr, Christian Rupprecht, and Varun Jampani. Stable virtual camera: Generative view synthesis with diffusion models. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 12405–12414, 2025. 2, 7, 1, 3
work page 2025
-
[52]
Fsgs: Real-time few-shot view synthesis using gaussian splatting
Zehao Zhu, Zhiwen Fan, Yifan Jiang, and Zhangyang Wang. Fsgs: Real-time few-shot view synthesis using gaussian splatting. InEuropean conference on computer vision, pages 145–163. Springer, 2024. 7
work page 2024
-
[53]
Matthias Zwicker, Hanspeter Pfister, Jeroen van Baar, and Markus Gross. Surface splatting. InProceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, page 371–378, New York, NY , USA, 2001. As- sociation for Computing Machinery. 2 HAD: Hallucination-Aware Diffusion Priors for 3D Reconstruction Supplementary Material Overv...
work page 2001
-
[54]
Hallucination in NVS via diffusion We provide additional analysis to deepen understanding the hallucination issue introduced by diffusion models in NVS task. Specifically, we evaluate recent state-of-the-art meth- ods from two representative paradigms:Diffusion-assisted NVS with explicit 3DGS model(e.g., Difix3D [41], GenFu- sion [43] and 3DGS-enhancer [2...
-
[55]
Details of Hallucination Scoring Network Overview of hallucination score network.We provide a detailed model architecture Fig. 6. Training dataset curation.We provide additional details on the constructing training dataset for the hallucination score network. For all training scenes, we follow the Di- fix3D [41] pipeline under the 9-view setting to first ...
-
[56]
Generalizing to other diffusion models 9.1. Improving GenFusion We integrate our hallucination scoring network into Gen- Fusion [43] – the state-of-the-art video-diffusion-assisted 3DGS training pipeline. Importantly, we apply the same HAD model as in the main paper without any additional fine-tuning on video diffusion data. To ensure a fair com- parison,...
-
[57]
More Results 10.1. Additional Qualitative Comparisons We provide additional qualitative results on both the DL3DV – as shown in Fig. 11 and Fig. 10, and MipN- eRF360 datasets – see Fig. 12. Note we also include the corresponding rendered videos in project website, provid- ing a clearer comparison across viewpoints. Both the qual- itative results and video...
work page 1974
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.