HAD: Hallucination-Aware Diffusion Priors for 3D Reconstruction

Chris Broaddus; Laurent Guigues; Siyu Huang; Weiwei Sun; Xi Liu; Zhou Ren

arxiv: 2605.16873 · v1 · pith:7MNTUTZTnew · submitted 2026-05-16 · 💻 cs.CV

Xi Liu, Weiwei Sun, Zhou Ren, Chris Broaddus, Siyu Huang, Laurent Guigues

Pith reviewed 2026-05-19 20:21 UTC · model grok-4.3

classification 💻 cs.CV

keywords 3D reconstruction · diffusion priors · hallucination detection · novel view synthesis · sparse view reconstruction · multi-view consistency · artifact reduction

The pith

HAD estimates pixel-wise hallucination scores from a pre-trained novel view synthesis network to mask unreliable pixels in diffusion-augmented images during sparse-view 3D reconstruction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper targets the introduction of invented content by diffusion models when they generate extra views to help build 3D models from limited input images. It does so by running those generated images through a separate network that can compare information across multiple viewpoints and assign a score to each pixel indicating how inconsistent it is with the real inputs. Selective masking of high-score pixels during the reconstruction process then keeps only the reliable additions, while fusing several differently conditioned versions of each new view brings in more surrounding context from the original set.

Core claim

HAD estimates pixel-wise hallucination score maps for augmented images by leveraging multi-view reasoning capabilities from a feedforward novel view synthesis network pre-trained on large-scale 3D data. These hallucination scores enable selective masking of unreliable pixels during the progressive 3D reconstruction procedure, preventing the introduction of non-existent artifacts into the 3D model. To further enhance performance, multiple versions of augmented images at each novel view are created by conditioning the diffusion prior on different input views and then fused into a final image that leverages the broader context across all input views.

What carries the argument

Pixel-wise hallucination score maps produced by a pre-trained feedforward novel view synthesis network, which identify inconsistent pixels for selective masking in diffusion-augmented training views during progressive 3D reconstruction.
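As a rough illustration of how per-pixel scores could gate supervision, the sketch below applies a hard threshold to a score map and averages an L1 photometric error over the pixels that remain. The function name `masked_l1_loss`, the `threshold` value, and the choice of hard rather than soft masking are assumptions made for illustration; the source does not state which loss or cutoff HAD uses.

```python
import numpy as np

def masked_l1_loss(rendered, augmented, scores, threshold=0.5):
    """Mean absolute error over pixels whose hallucination score is below threshold.

    rendered, augmented: float arrays of shape (H, W, 3)
    scores: float array of shape (H, W); higher means more likely hallucinated
    threshold: hypothetical cutoff, not a value taken from the paper
    """
    keep = scores < threshold                    # (H, W) boolean reliability mask
    if not keep.any():                           # every pixel flagged: no supervision signal
        return 0.0
    per_pixel = np.abs(rendered - augmented).mean(axis=-1)  # (H, W) per-pixel error
    return float(per_pixel[keep].mean())

# Toy 2x2 example: one pixel of the augmented view disagrees strongly with the render
rendered = np.zeros((2, 2, 3))
augmented = np.zeros((2, 2, 3))
augmented[0, 0] = 1.0                            # stands in for a hallucinated pixel
augmented[1, 1] = 0.2                            # small, plausible difference
scores = np.array([[0.9, 0.1],
                   [0.1, 0.1]])                  # only pixel (0, 0) is flagged

loss_masked = masked_l1_loss(rendered, augmented, scores)
loss_unmasked = masked_l1_loss(rendered, augmented, np.zeros_like(scores))
print(loss_masked, loss_unmasked)
```

In this toy case the flagged pixel dominates the unmasked loss and contributes nothing to the masked one, which is the behavior the masking step is meant to provide.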

If this is right

Selective masking prevents non-existent artifacts from being baked into the final 3D model.
Fusing multiple conditioned augmentations at each novel viewpoint incorporates broader context from all input views.
The overall procedure substantially reduces hallucination artifacts compared to standard diffusion-assisted reconstruction.
The approach reaches state-of-the-art results on multiple benchmarks for novel view synthesis from sparse inputs.
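The fusion of several conditioned augmentations can be read as a per-pixel selection across candidates. The sketch below keeps, at each pixel, the candidate with the lowest hallucination score, in the spirit of the "ArgMin fusion" phrase quoted elsewhere on this page. The function name `argmin_fuse` and the array layouts are assumptions, and the paper's actual fusion rule may differ in its details.

```python
import numpy as np

def argmin_fuse(candidates, scores):
    """Fuse K candidate images by taking the lowest-score candidate at each pixel.

    candidates: float array of shape (K, H, W, 3), one image per conditioning view
    scores: float array of shape (K, H, W), one hallucination map per candidate
    Returns the fused image (H, W, 3) and its per-pixel score map (H, W).
    """
    best = np.argmin(scores, axis=0)                              # (H, W) winning index
    fused = np.take_along_axis(
        candidates, best[None, ..., None], axis=0)[0]             # (H, W, 3)
    fused_scores = np.take_along_axis(scores, best[None], axis=0)[0]  # (H, W)
    return fused, fused_scores

# Toy example: two candidates for a 1x2 image
candidates = np.stack([np.zeros((1, 2, 3)), np.ones((1, 2, 3))])
scores = np.array([[[0.2, 0.8]],
                   [[0.6, 0.1]]])
fused, fused_scores = argmin_fuse(candidates, scores)
print(fused[0, 0], fused[0, 1], fused_scores)
```

The fused score map can then feed the same masking step as a single augmented view, so pixels that no candidate explains reliably are still excluded.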

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same scoring mechanism could be tested on other generative priors used in 3D tasks to check whether multi-view consistency filtering generalizes beyond diffusion models.
If the pre-trained network's reasoning is the key enabler, similar networks might serve as lightweight consistency checkers in related pipelines such as dynamic scene reconstruction.
An extension worth checking is whether the fusion step remains effective when the number of original input views drops below the levels tested in the benchmarks.

Load-bearing premise

Two things must hold: the pre-trained novel view synthesis network must reliably produce hallucination scores that identify pixels inconsistent with the original input views, and masking those pixels must improve rather than harm the final 3D model quality.

What would settle it

A direct comparison showing that reconstructions using the hallucination-masked augmented views produce no measurable improvement or even lower quality on standard novel view synthesis metrics than reconstructions that use the unmasked diffusion outputs.
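A minimal sketch of such a comparison, assuming held-out ground-truth views and two sets of renderings (one from a reconstruction trained with masking, one without). PSNR is used here only as an example metric; the arrays are synthetic placeholders and carry no information about the paper's results.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    mse = float(np.mean((pred - target) ** 2))
    if mse == 0.0:
        return float("inf")                      # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

# Synthetic stand-ins for a held-out view and two renderings of it
ground_truth = np.full((4, 4, 3), 0.5)
render_masked = ground_truth + 0.1               # hypothetical render, masking enabled
render_unmasked = ground_truth + 0.2             # hypothetical render, masking disabled

psnr_masked = psnr(render_masked, ground_truth)
psnr_unmasked = psnr(render_unmasked, ground_truth)
gain = psnr_masked - psnr_unmasked
print(f"masked: {psnr_masked:.2f} dB, unmasked: {psnr_unmasked:.2f} dB, gain: {gain:.2f} dB")
```

A gain that is zero or negative across scenes on real data would be the outcome that undercuts the premise.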

Figures

Figures reproduced from arXiv: 2605.16873 by Chris Broaddus, Laurent Guigues, Siyu Huang, Weiwei Sun, Xi Liu, Zhou Ren.

**Figure 1.** While diffusion priors [41] enhance the quality of 3D reconstruction, they introduce detrimental aliens – the hallucinated elements that do not exist in the observed regions, as highlighted in boxes. This work addresses this issue through hallucination score modeling, achieving high-quality 3D reconstruction with improved fidelity.

**Figure 2.** Overview of framework – We train 3DGS with input images and HAD-augmented novel views. HAD combines a pretrained diffusion prior (which generates images from 3DGS-rendered views conditioned on reference input images) with our hallucination score network (which predicts pixel-wise reliability maps). Our multi-sampling strategy fuses multiple generated versions into refined augmented views. Hallucination sco…

**Figure 3.** Examples on DL3DV [25] – We show novel-view rendering obtained by ours, Gspat-mcmc [22], LVSM [20] and Difix3D [41]. Our approach achieves sharper rendering and better fidelity to the ground truth.

**Figure 4.** Qualitative demonstration of hallucination patterns in diffusion-assisted 3DGS pipelines. Both Difix3D (image diffusion) and …

**Figure 5.** Hallucination analysis for multi-view diffusion (SVC). Hallucination maps are computed against ground-truth images to highlight …

**Figure 6.** Overview of hallucination scoring network – The network predicts a pixel-wise hallucination score map s for a hallucinated novel view ˜iG. It consists of a multi-view feature encoder V (the frozen feature backbone of a pre-trained LVSM) and a three-layer U-Net score branch S, which estimates hallucination scores using both multi-view features and the novel view image. The model is trained on curated mu…

**Figure 7.** Hallucination Scoring for GenFusion. Our hallucination scoring network can also mitigate hallucinations in video diffusion without fine-tuning …

**Figure 8.** Generalization of our hallucination scoring network to video diffusion (GenFusion). Our model is …

**Figure 9.** Generalization of our hallucination scoring network to multi-view diffusion (SVC). Our model is …

**Figure 10.** More Qualitative Results on DL3DV [25]

**Figure 11.** More Qualitative Results on DL3DV [25]

**Figure 12.** More Qualitative Results on MipNeRF-360 [3]

Original abstract

Diffusion priors have recently demonstrated strong capability in enhancing the quality of sparse-view 3D reconstruction by augmenting training views at novel viewpoints, but they inevitably introduce hallucinated content -- artifacts inconsistent with the input views -- into the final 3D model. To address this challenge, we propose Hallucination-Aware Diffusion prior (HAD), which estimates pixel-wise hallucination score maps for augmented images by leveraging multi-view reasoning capabilities from a feedforward novel view synthesis (NVS) network pre-trained on large-scale 3D data. These hallucination scores enable selective masking of unreliable pixels during the progressive 3D reconstruction procedure, preventing the introduction of non-existent artifacts into the 3D model. To further enhance performance, we create multiple versions of augmented images at each novel view by conditioning the diffusion prior on different input views, which are then fused into a final image that leverages the broader context across all input views. We show that our method substantially reduces hallucination artifacts in diffusion-assisted 3D reconstruction, thereby achieving state-of-the-art performance across multiple benchmarks on novel view synthesis. Our project is publicly available at the project website: https://xiliu8006.github.io/HAD-Project-website/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

HAD introduces hallucination scoring and masking for diffusion priors in 3D reconstruction, using a pre-trained NVS network. It tackles artifacts directly but needs more experimental detail before it can be fully assessed.

The one or two things to know: HAD uses a pre-trained novel view synthesis network to create pixel-wise hallucination score maps for images generated by diffusion priors, allowing selective masking of inconsistent pixels during 3D reconstruction, combined with fusing multiple augmented versions at each novel view. What is new is this hallucination-aware masking mechanism and the multi-view fusion strategy for diffusion-assisted reconstruction.

The paper does well at clearly describing how the NVS network's multi-view capabilities can identify artifacts that don't match the original sparse inputs, which is a direct response to a limitation in prior diffusion prior work. The method avoids some common pitfalls by not relying on self-supervised fitting that could be circular. It builds on large-scale pre-trained models, which is efficient.

Soft spots include the fact that claims of substantially reduced hallucinations and SOTA performance are made without specific quantitative results or ablation studies in the provided abstract. The effectiveness hinges on the NVS network producing reliable scores, and it's possible that aggressive masking could remove valid details in some cases. These are not deal-breakers but need to be addressed with data in the full paper.

This paper is for researchers working on improving 3D reconstruction from limited views using generative models. It would be of interest to those focused on practical deployment where hallucination artifacts are a concern. I would recommend sending it for peer review to get expert input on the experiments and to see if the gains are as substantial as claimed.

Referee Report

2 major / 2 minor

Summary. The paper proposes Hallucination-Aware Diffusion prior (HAD) for sparse-view 3D reconstruction. It estimates pixel-wise hallucination score maps on diffusion-augmented novel-view images by exploiting multi-view reasoning from a pre-trained feedforward novel view synthesis (NVS) network. These scores enable selective masking of unreliable pixels during progressive reconstruction; multiple conditioned augmentations are fused to leverage broader context. The method is claimed to substantially reduce hallucination artifacts and deliver state-of-the-art novel-view synthesis results on multiple benchmarks.

Significance. If the quantitative claims hold, the work would be significant because it supplies a practical, training-free mechanism to detect and suppress diffusion-induced inconsistencies using an existing large-scale NVS prior. The combination of per-pixel masking and multi-view fusion directly targets a known failure mode of diffusion-assisted reconstruction pipelines and could improve reliability in downstream applications that require geometrically consistent 3D models from limited input views.

major comments (2)

[Method / Experiments] The central claim that masking pixels flagged by the pre-trained NVS network improves final 3D reconstruction quality (rather than discarding useful signal) is load-bearing yet rests on an unverified assumption. The manuscript should provide a direct ablation (e.g., reconstruction metrics with vs. without masking) together with qualitative examples showing that masked regions correspond to genuine hallucinations rather than view-consistent content.
[Abstract / Experiments] The abstract asserts SOTA performance across benchmarks, but the provided description contains no quantitative tables, error bars, or statistical significance tests. Without these data the magnitude of improvement attributable to HAD versus prior diffusion-augmented baselines cannot be assessed.

minor comments (2)

[Method] Clarify the exact architecture and training data of the feedforward NVS network used for scoring; a brief citation or diagram would help readers reproduce the pipeline.
[Method] The fusion step for multiple conditioned augmentations is described at a high level; a short algorithmic outline or pseudocode would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which helps clarify the empirical validation of our approach. We address each major comment below and outline the revisions we will make to strengthen the manuscript.

Referee: [Method / Experiments] The central claim that masking pixels flagged by the pre-trained NVS network improves final 3D reconstruction quality (rather than discarding useful signal) is load-bearing yet rests on an unverified assumption. The manuscript should provide a direct ablation (e.g., reconstruction metrics with vs. without masking) together with qualitative examples showing that masked regions correspond to genuine hallucinations rather than view-consistent content.

Authors: We agree that a direct ablation study is necessary to substantiate the benefit of the hallucination-aware masking. In the revised manuscript, we will add a dedicated ablation in the Experiments section that reports reconstruction metrics (PSNR, SSIM, LPIPS) on the same benchmarks with and without the masking step. We will also include qualitative examples that visualize the hallucination score maps overlaid on the augmented views, highlighting regions that are masked and demonstrating their inconsistency with the input views (e.g., via multi-view consistency checks) rather than view-consistent geometry. This addition directly addresses the concern and will be supported by the existing multi-view reasoning mechanism described in Section 3.

revision: yes

Referee: [Abstract / Experiments] The abstract asserts SOTA performance across benchmarks, but the provided description contains no quantitative tables, error bars, or statistical significance tests. Without these data the magnitude of improvement attributable to HAD versus prior diffusion-augmented baselines cannot be assessed.

Authors: The full manuscript already contains quantitative tables in the Experiments section that compare HAD against prior diffusion-augmented baselines on multiple benchmarks, reporting standard novel-view synthesis metrics. To improve clarity and address the referee's point, we will revise the abstract to include a concise statement of the key quantitative gains and will augment the tables with error bars (computed over multiple runs or scenes) as well as statistical significance tests (e.g., paired t-tests) where appropriate. These changes will make the magnitude of improvement more transparent without altering the core claims.

revision: partial
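For readers unfamiliar with the paired test mentioned in the response, here is a minimal sketch of the statistic computed over per-scene metric pairs, using only the Python standard library. The function name `paired_t_statistic` and the PSNR values are made up for illustration and are not results from the paper.

```python
import math
import statistics

def paired_t_statistic(with_masking, without_masking):
    """Paired t statistic for per-scene scores measured under two conditions.

    Both lists must be aligned scene by scene. The result should be compared
    against a t distribution with len(list) - 1 degrees of freedom.
    Raises ZeroDivisionError if all per-scene differences are identical.
    """
    if len(with_masking) != len(without_masking) or len(with_masking) < 2:
        raise ValueError("need two equal-length lists covering at least two scenes")
    diffs = [a - b for a, b in zip(with_masking, without_masking)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))  # sample std / sqrt(n)
    return mean_diff / std_err

# Made-up per-scene PSNR values in dB, for illustration only
psnr_with = [25.0, 27.0, 24.0, 26.0]
psnr_without = [24.0, 25.0, 23.5, 24.5]

t_stat = paired_t_statistic(psnr_with, psnr_without)
print(f"t = {t_stat:.3f} with {len(psnr_with) - 1} degrees of freedom")
```

Pairing by scene matters because scene difficulty varies widely; the test looks only at the within-scene differences.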

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper describes a pipeline that estimates pixel-wise hallucination scores using a pre-trained external feedforward NVS network on large-scale 3D data, then applies selective masking during progressive reconstruction and fuses multiple conditioned augmentations. No equations, fitted parameters, or self-citations are shown reducing the hallucination scores or performance gains to quantities defined by the method's own inputs or outputs. The approach is benchmarked on external NVS tasks with claimed SOTA results, confirming the derivation remains independent of self-referential definitions or forced predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that the pre-trained NVS network provides accurate hallucination detection without introducing new biases, and that selective masking preserves geometric consistency. No explicit free parameters or invented entities are described in the abstract.

axioms (1)

domain assumption A pre-trained feedforward NVS network can produce reliable pixel-wise hallucination scores that reflect inconsistency with input views.
Invoked when using the NVS network to estimate scores for masking in the reconstruction procedure.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Paper passage: estimates pixel-wise hallucination score maps … leveraging multi-view reasoning capabilities from a feedforward novel view synthesis (NVS) network … selective masking of unreliable pixels during the progressive 3D reconstruction

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: multi-sampling strategy … fuse … ArgMin fusion … lowest hallucination score

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · 4 internal anchors

[1] Niki Amini-Naieni, Tomas Jakab, Andrea Vedaldi, and Ronald Clark. Instant uncertainty calibration of nerfs using a meta-calibrator. In European Conference on Computer Vision, pages 309–324. Springer, 2024.
[2] Jonathan T Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P Srinivasan. Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF international conference on computer vision, pages 5855–5864,
[3] Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5470–5479, 2022.
[4] Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, et al. Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127, 2023.
[5] Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. Tensorf: Tensorial radiance fields. In European conference on computer vision, pages 333–350. Springer, 2022.
[6] Danpeng Chen, Hai Li, Weicai Ye, Yifan Wang, Weijian Xie, Shangjin Zhai, Nan Wang, Haomin Liu, Hujun Bao, and Guofeng Zhang. Pgsr: Planar-based gaussian splatting for efficient and high-fidelity surface reconstruction. IEEE Transactions on Visualization and Computer Graphics, 2024.
[7] Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. In European Conference on Computer Vision, pages 370–386. Springer, 2024.
[8] Yuedong Chen, Chuanxia Zheng, Haofei Xu, Bohan Zhuang, Andrea Vedaldi, Tat-Jen Cham, and Jianfei Cai. Mvsplat360: Feed-forward 360 scene synthesis from sparse views. Advances in Neural Information Processing Systems, 37:107064–107086, 2024.
[9] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
[10] Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang, et al. Lightgaussian: Unbounded 3d gaussian compression with 15x reduction and 200+ fps. Advances in neural information processing systems, 37:140138–140158, 2024.
[11] Tobias Fischer, Samuel Rota Bulò, Yung-Hsu Yang, Nikhil Keetha, Lorenzo Porzi, Norman Müller, Katja Schwarz, Jonathon Luiten, Marc Pollefeys, and Peter Kontschieder. Flowr: Flowing from sparse to dense 3d reconstructions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 27702–27712, 2025.
[12] Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5501–5510, 2022.
[13] Ruiqi Gao, Aleksander Holynski, Philipp Henzler, Arthur Brussee, Ricardo Martin-Brualla, Pratul Srinivasan, Jonathan T Barron, and Ben Poole. Cat3d: Create anything in 3d with multi-view diffusion models. arXiv preprint arXiv:2405.10314, 2024.
[14] Lily Goli, Cody Reading, Silvia Sellán, Alec Jacobson, and Andrea Tagliasacchi. Bayes' rays: Uncertainty quantification for neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20061–20070, 2024.
[15] Jiaxin Guo, Jiangliu Wang, Ruofeng Wei, Di Kang, Qi Dou, and Yun-Hui Liu. Uc-nerf: Uncertainty-aware conditional neural radiance fields from endoscopic sparse views. IEEE Transactions on Medical Imaging, 44(3):1284–1296, 2024.
[16] Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically accurate radiance fields. In SIGGRAPH 2024 Conference Papers. Association for Computing Machinery, 2024.
[17] Ajay Jain, Matthew Tancik, and Pieter Abbeel. Putting nerf on a diet: Semantically consistent few-shot view synthesis. In ICCV, pages 5885–5894, 2021.
[18] Hanwen Jiang, Hao Tan, Peng Wang, Haian Jin, Yue Zhao, Sai Bi, Kai Zhang, Fujun Luan, Kalyan Sunkavalli, Qixing Huang, et al. Rayzer: A self-supervised large view synthesis model. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4918–4929, 2025.
[19] Lihan Jiang, Yucheng Mao, Linning Xu, Tao Lu, Kerui Ren, Yichen Jin, Xudong Xu, Mulin Yu, Jiangmiao Pang, Feng Zhao, et al. Anysplat: Feed-forward 3d gaussian splatting from unconstrained views. ACM Transactions on Graphics (TOG), 44(6):1–16, 2025.
[20] Haian Jin, Hanwen Jiang, Hao Tan, Kai Zhang, Sai Bi, Tianyuan Zhang, Fujun Luan, Noah Snavely, and Zexiang Xu. Lvsm: A large view synthesis model with minimal 3d inductive bias. In The Thirteenth International Conference on Learning Representations, 2025.
[21] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), 2023.
[22] Shakiba Kheradmand, Daniel Rebain, Gopal Sharma, Weiwei Sun, Yang-Che Tseng, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, and Kwang Moo Yi. 3d gaussian splatting as markov chain monte carlo. Advances in Neural Information Processing Systems, 37:80965–80986, 2024.
[23] Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, and Lin Gu. Dngaussian: Optimizing sparse-view 3d gaussian radiance fields with global-local depth normalization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 20775–20785,
[24] Ruiqi Li and Yiu-ming Cheung. Variational multi-scale representation for estimating uncertainty in 3d gaussian splatting. In Advances in Neural Information Processing Systems, pages 87934–87958, 2024.
[25] Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, et al. Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22160–22169, 2024.
[26] Fangfu Liu, Wenqiang Sun, Hanyang Wang, Yikai Wang, Haowen Sun, Junliang Ye, Jun Zhang, and Yueqi Duan. Reconx: Reconstruct any scene from sparse views with video diffusion model. IEEE Transactions on Image Processing,
[27] Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, and Carl Vondrick. Zero-1-to-3: Zero-shot one image to 3d object. In Proceedings of the IEEE/CVF international conference on computer vision, pages 9298–9309, 2023.
[28] Xi Liu, Chaoyi Zhou, and Siyu Huang. 3dgs-enhancer: Enhancing unbounded 3d gaussian splatting with view-consistent 2d diffusion priors. Advances in Neural Information Processing Systems, 37:133305–133327, 2024.
[29] Xiaoxiao Long, Cheng Lin, Peng Wang, Taku Komura, and Wenping Wang. Sparseneus: Fast generalizable neural surface reconstruction from sparse views. In European Conference on Computer Vision, pages 210–227. Springer, 2022.
[30] Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht, Natalia Neverova, Andrea Vedaldi, Oran Gafni, and Filippos Kokkinos. Im-3d: Iterative multiview diffusion and reconstruction for high-quality 3d generation. International Conference on Machine Learning, 2024.
[31] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
[32] Muhammad Husnain Mubarik, Ramakrishna Kanungo, Tobias Zirr, and Rakesh Kumar. Hardware acceleration of neural graphics. In Proceedings of the 50th Annual International Symposium on Computer Architecture, pages 1–12, 2023.
[33] Michael Niemeyer, Jonathan T Barron, Ben Mildenhall, Mehdi SM Sajjadi, Andreas Geiger, and Noha Radwan. Regnerf: Regularizing neural radiance fields for view synthesis from sparse inputs. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5480–5490, 2022.
[34] Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. In The Eleventh International Conference on Learning Representations, 2023.
[35] Jianxiong Shen, Ruijie Ren, Adria Ruiz, and Francesc Moreno-Noguer. Estimating 3d uncertainty field: Quantifying uncertainty for neural radiance fields. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 2375–2381. IEEE, 2024.
[36] Yichun Shi, Peng Wang, Jianglong Ye, Long Mai, Kejie Li, and Xiao Yang. Mvdream: Multi-view diffusion for 3d generation. arXiv:2308.16512, 2023.
[37] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.
[38]

Sparsenerf: Distilling depth ranking for few-shot novel view synthesis

Guangcong Wang, Zhaoxi Chen, Chen Change Loy, and Zi- wei Liu. Sparsenerf: Distilling depth ranking for few-shot novel view synthesis. InProceedings of the IEEE/CVF inter- national conference on computer vision, pages 9065–9076,

work page
[39]

Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4):600–612, 2004

Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Si- moncelli. Image quality assessment: from error visibility to structural similarity.IEEE transactions on image processing, 13(4):600–612, 2004. 6

work page 2004
[40]

Sparse2dgs: Geometry-prioritized gaussian splatting for surface reconstruction from sparse views

Jiang Wu, Rui Li, Yu Zhu, Rong Guo, Jinqiu Sun, and Yan- ning Zhang. Sparse2dgs: Geometry-prioritized gaussian splatting for surface reconstruction from sparse views. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 11307–11316, 2025. 2

work page 2025
[41]

Difix3d+: Improving 3d reconstruc- tions with single-step diffusion models

Jay Zhangjie Wu, Yuxuan Zhang, Haithem Turki, Xuanchi Ren, Jun Gao, Mike Zheng Shou, Sanja Fidler, Zan Gojcic, and Huan Ling. Difix3d+: Improving 3d reconstructions with single-step diffusion models. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 26024–26035, 2025.

[42]

Rundi Wu, Ben Mildenhall, Philipp Henzler, Keunhong Park, Ruiqi Gao, Daniel Watson, Pratul P Srinivasan, Dor Verbin, Jonathan T Barron, Ben Poole, et al. Reconfusion: 3d reconstruction with diffusion priors. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 21551–21561, 2024.

[43]

Sibo Wu, Congrong Xu, Binbin Huang, Andreas Geiger, and Anpei Chen. Genfusion: Closing the loop between reconstruction and generation via videos. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 6078–6088, 2025.

[44]

Haofei Xu, Songyou Peng, Fangjinhua Wang, Hermann Blum, Daniel Barath, Andreas Geiger, and Marc Pollefeys. Depthsplat: Connecting gaussian splatting and depth. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 16453–16463, 2025.

[45]

Jiawei Yang, Marco Pavone, and Yue Wang. Freenerf: Improving few-shot neural rendering with free frequency regularization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8254–8263,

[46]

Taoran Yi, Jiemin Fang, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Qi Tian, and Xinggang Wang. Gaussiandreamer: Fast generation from text to 3d gaussians by bridging 2d and 3d diffusion models. In CVPR,

[47]

Xingyilang Yin, Qi Zhang, Jiahao Chang, Ying Feng, Qingnan Fan, Xi Yang, Chi-Man Pun, Huaqi Zhang, and Xiaodong Cun. Gsfixer: Improving 3d gaussian splatting with reference-guided video diffusion priors. arXiv preprint arXiv:2508.09667, 2025.

[48]

Alex Yu, Ruilong Li, Matthew Tancik, Hao Li, Ren Ng, and Angjoo Kanazawa. Plenoctrees for real-time rendering of neural radiance fields. In Proceedings of the IEEE/CVF international conference on computer vision, pages 5752–5761, 2021.

[49]

Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-splatting: Alias-free 3d gaussian splatting. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19447–19456,

[50]

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018.

[51]

Jensen Zhou, Hang Gao, Vikram Voleti, Aaryaman Vasishta, Chun-Han Yao, Mark Boss, Philip Torr, Christian Rupprecht, and Varun Jampani. Stable virtual camera: Generative view synthesis with diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12405–12414, 2025.

[52]

Zehao Zhu, Zhiwen Fan, Yifan Jiang, and Zhangyang Wang. Fsgs: Real-time few-shot view synthesis using gaussian splatting. In European conference on computer vision, pages 145–163. Springer, 2024.

[53]

Matthias Zwicker, Hanspeter Pfister, Jeroen van Baar, and Markus Gross. Surface splatting. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, pages 371–378, New York, NY, USA, 2001. Association for Computing Machinery.

HAD: Hallucination-Aware Diffusion Priors for 3D Reconstruction Supplementary Material Overv...

Hallucination in NVS via diffusion

We provide additional analysis to deepen understanding of the hallucination issue introduced by diffusion models in the NVS task. Specifically, we evaluate recent state-of-the-art methods from two representative paradigms: Diffusion-assisted NVS with explicit 3DGS model (e.g., Difix3D [41], GenFusion [43] and 3DGS-enhancer [2...

Details of Hallucination Scoring Network

Overview of hallucination score network. We provide a detailed model architecture in Fig. 6.

Training dataset curation. We provide additional details on constructing the training dataset for the hallucination score network. For all training scenes, we follow the Difix3D [41] pipeline under the 9-view setting to first ...

Generalizing to other diffusion models

9.1. Improving GenFusion

We integrate our hallucination scoring network into GenFusion [43], the state-of-the-art video-diffusion-assisted 3DGS training pipeline. Importantly, we apply the same HAD model as in the main paper without any additional fine-tuning on video diffusion data. To ensure a fair comparison,...

More Results

10.1. Additional Qualitative Comparisons

We provide additional qualitative results on both the DL3DV (Fig. 10 and Fig. 11) and MipNeRF360 (Fig. 12) datasets. Note that we also include the corresponding rendered videos on the project website, providing a clearer comparison across viewpoints. Both the qualitative results and video...

