pith. sign in

arxiv: 2510.09880 · v2 · submitted 2025-10-10 · 💻 cs.CV

Geometry-Aware Scene Configurations for Novel View Synthesis

Pith reviewed 2026-05-18 07:30 UTC · model grok-4.3

classification 💻 cs.CV
keywords novel view synthesisneural radiance fieldsindoor scene reconstructiongeometry-aware configurationscalable NeRFbasis placementvirtual viewpointsgeometric priors
0
0 comments X

The pith

Geometric priors guide adaptive base placement in scalable NeRFs to improve indoor novel view synthesis over uniform arrangements.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes scene-adaptive strategies that allocate representation capacity more efficiently for novel view synthesis in indoor environments from incomplete observations. It records observation statistics on an estimated geometric scaffold to determine optimal placement of bases, replacing the uniform arrangements used in prior scalable NeRF methods. The approach also introduces scene-adaptive virtual viewpoints that impose regularization to address deficiencies in the input view trajectory. These choices are shown to deliver better rendering quality and lower memory use across several large-scale indoor scenes with clutter, occlusion, and irregular layouts.

Core claim

The central claim is that recording observation statistics on the estimated geometric scaffold and using them to guide optimal placement of bases in scalable NeRF representations greatly improves upon uniform basis arrangements, while scene-adaptive virtual viewpoints compensate for geometric deficiencies in the input trajectory; comprehensive analysis in large indoor scenes demonstrates significant enhancements in rendering quality and memory requirements compared to regular-placement baselines.

What carries the argument

Observation statistics recorded on an estimated geometric scaffold that guide adaptive placement of representation bases and selection of virtual viewpoints.

If this is right

  • Adaptive base placement utilizes limited resources more effectively in irregular multi-room layouts with varying complexity.
  • Scene-adaptive virtual viewpoints impose regularization that compensates for deficiencies in the original input trajectory.
  • Overall memory requirements decrease while rendering quality rises compared with regular-placement baselines.
  • The method handles clutter, occlusion, and flat walls more robustly than uniform configurations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same observation-statistic principle could be tested on outdoor or dynamic scenes once reliable geometric scaffolds become available.
  • Integration with denser input capture or multi-view stereo priors might further reduce the number of required bases.
  • Memory savings could enable deployment on resource-constrained devices for immersive VR walkthroughs of real indoor spaces.

Load-bearing premise

Geometric priors estimated after pre-processing stages are accurate and reliable enough to guide optimal basis placement and virtual viewpoint selection without introducing errors in cluttered or occluded indoor scenes.

What would settle it

A direct side-by-side rendering comparison on a cluttered indoor scene with heavy occlusion where the geometry-guided adaptive placement produces lower PSNR or visible artifacts relative to a uniform baseline of the same memory budget.

Figures

Figures reproduced from arXiv: 2510.09880 by Changwoon Choi, Minkwan Kim, Young Min Kim.

Figure 1
Figure 1. Figure 1: Geometry-aware basis configuration and novel view synthesis results. After recording measurement statistics as coverage weights on geometry scaffold, we find optimal basis positions and training-view configurations in indoor environments. Abstract We propose scene-adaptive strategies to efficiently allo￾cate representation capacity for generating immersive ex￾periences of indoor environments from incomplet… view at source ↗
Figure 2
Figure 2. Figure 2: Scene-splitting strategies. (a) The original NeRF repre￾sents the entire scene with a single block. Scalable scene represen￾tations set multiple bases (b) evenly along the camera trajectory or (c) uniformly dividing the scene’s spatial extent. (d) We propose an adaptive approach based on scene configurations. Red curves denote camera trajectories and green dots mark the bases’ centers. 2. Related Work Scen… view at source ↗
Figure 3
Figure 3. Figure 3: Method overview. (a) We first extract the geometric scaffold from multi-view image observation. (b) Then we define coverage weights wi for surface points xi on the obtained mesh surface which consider both scene geometry and observation statistics. Starting from bases sampled along the camera trajectory using the FPS algorithm, we optimize their positions by minimizing energy function Lcov. (c) Finally, we… view at source ↗
Figure 4
Figure 4. Figure 4: Basis placement pipeline. (a) After accumulating cov￾erage weights wi along surface points xi, (b) bases positions are updated using these weights and their distance from the surface. NeRF blocks We modify the block locations to evaluate our placement strategy for multiple NeRF blocks. We set centroids of NeRF blocks at scene-adaptive basis positions while LocalRF [25] sequentially assigns NeRF blocks alon… view at source ↗
Figure 5
Figure 5. Figure 5: Novel view geometric regularization. (a) The illustra￾tion of selected virtual views (yellow) with rendered depth images on the right. (b) Test viewpoint images (red) show the effect of each component, added sequentially from left to right. the cluster centers after applying K-means clustering on the basis probe locations. When rendering, NeLF-pro aggregates features from a set of probes near the rendering… view at source ↗
Figure 6
Figure 6. Figure 6: Qualitative results of novel view synthesis on ScanNet++ [48] dataset. Best viewed on screen. (3) Depth-NeRFacto extends NeRFacto with depth super￾vision. We present results using SparseNeRF [13] depth supervision which showed the best performance among the available options. (4) 3DGS [15] is a state-of-the-art ex￾plicit scene representation method. Gaussians are initialized from the same feature points th… view at source ↗
Figure 7
Figure 7. Figure 7: Qualitative results of novel view synthesis on Zip-NeRF [2] dataset. Best viewed on screen. Method PSNR ↑ SSIM ↑ LPIPS ↓ Memory Time MonoSDF [52] 20.61 0.733 0.459 61 MB 6 h NeRFacto [39] 19.78 0.692 0.491 65 MB 14 min NeRFacto [39]-Big 17.31 0.626 0.556 197 MB 1 h NeRFacto [39]-Huge 17.41 0.627 0.514 211 MB 1.5 h Depth-NeRFacto [39] 20.64 0.710 0.486 65 MB 17 min ZipNeRF [2] 23.53 0.758 0.415 1.2 GB 42 mi… view at source ↗
Figure 8
Figure 8. Figure 8: View extrapolation on Zip-NeRF [2] datasets. LocalRF variants require more memory due to the TensoRF framework, Mega-NeRF uses less memory with the origi￾nal NeRF implementation, taking over 2 days to train for up to 200k iterations. Meanwhile, Hierarchical-3DGS en￾ables very fast rendering using 3D Gaussians within multi￾ple chunks, leading to excessive memory consumption as the scene gets larger. To test… view at source ↗
Figure 9
Figure 9. Figure 9: PSNR comparison on a different number of bases. and (128,16), where the first value indicates the total num￾ber of bases and the second denotes the number of nearby bases. The number of core probes is fixed at 3. To ad￾just the number of NeRF blocks in the original LocalRF, we set different values for the maximum allowable drift before adding a new block. We examined PSNRs of the test views with the changi… view at source ↗
Figure 1
Figure 1. Figure 1: 2D toy examples of basis placement. We evaluate basis placement using Lcov (a) with and (b) without the denominator. Initial basis (circle outlines), optimized basis (solid green circles), and observation views (blue frustums) are visualized. 12 [PITH_FULL_IMAGE:figures/full_fig_p012_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Coverage weights and optimized basis positions. (a) Given training (blue) and test (red) cameras, initial bases (yellow spheres) are obtained using FPS on cameras. (b) Optimized bases are distributed at balanced positions regarding the coverage weight. Initial bases locations are indicated with sphere outlines, while the optimized bases are solid spheres in a bright green color. age closeness in Euclidean … view at source ↗
Figure 3
Figure 3. Figure 3: Shifting optimized basis positions. (a) PSNR com￾parisons with different levels of perturbation noise scales. (b) En￾largements of rendered images for visual comparison of sampled noise levels. (a) Performance comparisons (b) Memory comparisons [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Experiments for different number of bases. (a) Per￾formance comparisons and (b) memory comparisons are plotted. set. To clearly analyze the effects of basis perturbation, we used neural lightfield probes, reducing the number of basis probes to 16, with 4 nearby probes contributing to render￾ing. We have tested on ScanNet++ [48] e91722b5a3 scene for this experiment. As the bases are perturbed, they deviate … view at source ↗
Figure 5
Figure 5. Figure 5: Examples of inaccurate geometries. (a) Rendered RGBD from GT mesh and (b) example depth inputs with different noise levels. Blur PSNR Noise PSNR Reference PSNR σ = 0.25 27.61 σ = 0.25 27.61 NeLF-pro 26.83 σ = 0.5 27.49 σ = 0.5 27.52 Ours w/o depth 26.96 σ = 1.0 27.40 σ = 1.0 27.27 Ours w/ GT depth (σ = 0) 27.67 σ = 1.5 27.21 σ = 1.5 27.23 Ours w/ MonoSDF depth 27.20 [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Qualitative comparisons in Scuol dataset. We com￾pared our framework to NeLF-pro and Depth-NeRFacto. Tanks (Barn) Scuol PSNR↑ SSIM↑ LPIPS↓ PSNR↑ SSIM↑ LPIPS↓ Depth-NeRFacto 17.04 0.634 0.571 18.21 0.676 0.471 NeLF-pro 21.12 0.652 0.556 25.44 0.779 0.3 NeLF-pro+Ours 21.88 0.654 0.523 25.73 0.781 0.236 LocalRF 21.33 0.706 0.566 22.54 0.776 0.418 LocalRF+Ours 22.20 0.711 0.511 23.01 0.780 0.359 [PITH_FULL_IM… view at source ↗
Figure 7
Figure 7. Figure 7: Comparisons to SOTA 3DGS algorithms. We com￾pared our frameworks using both (a) ScanNet++ and (b) Zip-NeRF datasets. F. Additional Comparisons Comparison to SOTA 3DGS algorithms We addition￾ally conducted visual comparisons of our methods with SOTA 3DGS baselines. We compared our methods to FSGS [55] and Hierarchical-3DGS [16] as in the main manuscript. For NeLF-pro [49] and LocalRF [25] variants, we evalu… view at source ↗
Figure 8
Figure 8. Figure 8: Examples of limitation. (a) Baseline comparison PSNR ± std (b) Ablation study PSNR ± std [PITH_FULL_IMAGE:figures/full_fig_p018_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Error bars for our experiments. We compared (a) baselines and (b) ablation studies on ScanNet++ (left) and Zip￾NeRF (right) dataset. tested NeLF-pro variants on ScanNet++ dataset and Lo￾calRF variants on Zip-NeRF dataset, respectively. As illus￾trated in both figures, while optimal basis placement helps reduce blurry artifacts, our geometric regularization with virtual viewpoints further enhances stability… view at source ↗
Figure 10
Figure 10. Figure 10: Sampled virtual depths from ScanNet++ [48] dataset Full GT Full NeLF-pro +② +②,③ +① +①,② +①,②,④ [PITH_FULL_IMAGE:figures/full_fig_p019_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Qualitative results for ablation study on ScanNet++ [48] dataset. Full GT Full LocalRF +② +②,③ +① +①,② +①,②,④ [PITH_FULL_IMAGE:figures/full_fig_p019_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Qualitative results for ablation study on Zip-NeRF [2] dataset. 19 [PITH_FULL_IMAGE:figures/full_fig_p019_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Additional visualizations of novel view synthesis results on ScanNet++ [48] dataset. 20 [PITH_FULL_IMAGE:figures/full_fig_p020_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Additional visualizations of novel view synthesis results on Zip-NeRF [2] dataset. 21 [PITH_FULL_IMAGE:figures/full_fig_p021_14.png] view at source ↗
read the original abstract

We propose scene-adaptive strategies to efficiently allocate representation capacity for generating immersive experiences of indoor environments from incomplete observations. Indoor scenes with multiple rooms often exhibit irregular layouts with varying complexity, containing clutter, occlusion, and flat walls. We maximize the utilization of limited resources with guidance from geometric priors, which are often readily available after pre-processing stages. We record observation statistics on the estimated geometric scaffold and guide the optimal placement of bases, which greatly improves upon the uniform basis arrangements adopted by previous scalable Neural Radiance Field (NeRF) representations. We also suggest scene-adaptive virtual viewpoints to compensate for geometric deficiencies inherent in view configurations in the input trajectory and impose the necessary regularization. We present a comprehensive analysis and discussion regarding rendering quality and memory requirements in several large-scale indoor scenes, demonstrating significant enhancements compared to baselines that employ regular placements. Project page is available at: https://mkjjang3598.github.io/Geo-Scene-Config.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes scene-adaptive strategies for novel view synthesis in indoor environments using scalable NeRF representations. It records observation statistics on an estimated geometric scaffold derived from pre-processing to guide non-uniform optimal placement of bases, improving upon uniform grid arrangements in prior work. The approach also introduces scene-adaptive virtual viewpoints to compensate for deficiencies in input trajectories and provides regularization. The authors claim significant gains in rendering quality and reduced memory requirements, supported by analysis on several large-scale indoor scenes with clutter, occlusion, and irregular layouts.

Significance. If the central claims hold with proper validation, the work could meaningfully advance efficient NeRF deployment for complex indoor scenes by showing how readily available geometric priors enable better capacity allocation than fixed regular placements. This addresses practical challenges in resource-limited immersive rendering and could influence follow-on methods that incorporate scene geometry for adaptive representations.

major comments (2)
  1. Abstract: The central claim of 'significant enhancements' and 'greatly improves upon the uniform basis arrangements' is load-bearing but unsupported, as the abstract (and visible description) contains no quantitative results, error metrics, ablation studies, or validation procedures to demonstrate the improvement magnitude or reliability.
  2. Method (geometric scaffold and basis placement): The improvement over prior scalable NeRFs rests on the assumption that observation statistics from the pre-processed geometric scaffold produce superior non-uniform base placement; however, no robustness analysis or failure-case experiments are described for the cluttered, occluded, or flat-wall indoor scenes explicitly flagged in the introduction, leaving open the risk that estimation errors propagate directly into misplaced bases.
minor comments (2)
  1. Introduction: The description of how geometric priors transition to virtual viewpoint selection could be expanded with a short diagram or pseudocode for clarity.
  2. Project page reference: The link is provided but the manuscript should explicitly state which supplementary materials (e.g., additional quantitative tables) are hosted there to aid reviewers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the recommendation for major revision. We address each major comment below, indicating where revisions will be made to strengthen the presentation of our results and method.

read point-by-point responses
  1. Referee: [—] Abstract: The central claim of 'significant enhancements' and 'greatly improves upon the uniform basis arrangements' is load-bearing but unsupported, as the abstract (and visible description) contains no quantitative results, error metrics, ablation studies, or validation procedures to demonstrate the improvement magnitude or reliability.

    Authors: We agree that the abstract would benefit from explicit quantitative support for the claimed improvements. The full manuscript reports quantitative comparisons of rendering quality (PSNR, SSIM) and memory usage against uniform grid baselines across multiple large indoor scenes. In the revised version we will update the abstract to include representative numerical results from these experiments. revision: yes

  2. Referee: [—] Method (geometric scaffold and basis placement): The improvement over prior scalable NeRFs rests on the assumption that observation statistics from the pre-processed geometric scaffold produce superior non-uniform base placement; however, no robustness analysis or failure-case experiments are described for the cluttered, occluded, or flat-wall indoor scenes explicitly flagged in the introduction, leaving open the risk that estimation errors propagate directly into misplaced bases.

    Authors: The experiments section evaluates the method on large-scale indoor scenes that contain clutter, occlusion, and irregular layouts, with results showing consistent gains over uniform placements. We acknowledge that the current manuscript does not provide dedicated robustness analysis or explicit failure-case studies for errors in the geometric scaffold estimation. We will add a discussion of potential limitations and sensitivity to scaffold accuracy in the revised manuscript. revision: partial

Circularity Check

0 steps flagged

No circularity: geometric priors are external pre-processing input, placement heuristic is independent design choice

full rationale

The paper's core proposal records observation statistics from a separately estimated geometric scaffold (obtained via pre-processing) to inform non-uniform base placement and scene-adaptive viewpoints. This is a methodological heuristic applied to an external input rather than a derivation that reduces the final rendering claim or capacity allocation back to fitted parameters by construction. No equations or self-citations in the provided text create a self-definitional loop, fitted-input prediction, or load-bearing uniqueness theorem. The improvement over uniform grids is presented as an empirical outcome evaluated on indoor scenes, not a tautological renaming or ansatz smuggled via prior author work. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review based on abstract only; no explicit free parameters, axioms, or invented entities are stated. The approach assumes geometric priors are available from standard pre-processing without detailing how they are computed or any new entities introduced.

pith-pipeline@v0.9.0 · 5687 in / 1044 out tokens · 48857 ms · 2026-05-18T07:30:37.078289+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · 3 internal anchors

  1. [1]

    Barron, Ben Mildenhall, Dor Verbin, Pratul P

    Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Mip-nerf 360: Unbounded anti-aliased neural radiance fields.CVPR, 2022. 1, 4, 5, 6

  2. [2]

    Barron, Ben Mildenhall, Dor Verbin, Pratul P

    Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Zip-nerf: Anti-aliased grid- based neural radiance fields.ICCV, 2023. 2, 5, 6, 7, 8, 12, 14, 15, 16, 17, 18, 19, 21

  3. [3]

    ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth

    Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, and Matthias M ¨uller. Zoedepth: Zero-shot trans- fer by combining relative and metric depth.arXiv preprint arXiv:2302.12288, 2023. 3

  4. [4]

    Nope-nerf: Optimising neu- ral radiance field with no pose prior

    Wenjing Bian, Zirui Wang, Kejie Li, Jia-Wang Bian, and Victor Adrian Prisacariu. Nope-nerf: Optimising neu- ral radiance field with no pose prior. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4160–4169, 2023. 3

  5. [5]

    The opencv library.Dr

    Gary Bradski. The opencv library.Dr. Dobb’s Journal: Soft- ware Tools for the Professional Programmer, 25(11):120– 123, 2000. 15

  6. [6]

    Tensorf: Tensorial radiance fields

    Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. Tensorf: Tensorial radiance fields. InEuropean Conference on Computer Vision (ECCV), 2022. 4

  7. [7]

    Bal- anced spherical grid for egocentric view synthesis

    Changwoon Choi, Sang Min Kim, and Young Min Kim. Bal- anced spherical grid for egocentric view synthesis. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16590–16599, 2023. 1

  8. [8]

    Dongyoung Choi, Hyeonjoong Jang, and Min H. Kim. Om- nilocalrf: Omnidirectional local radiance fields from dy- namic videos. InIEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024. 2, 4

  9. [9]

    Depth-supervised NeRF: Fewer views and faster training for free

    Kangle Deng, Andrew Liu, Jun-Yan Zhu, and Deva Ra- manan. Depth-supervised NeRF: Fewer views and faster training for free. InProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR),

  10. [10]

    Omnidata: A scalable pipeline for making multi- task mid-level vision datasets from 3d scans

    Ainaz Eftekhar, Alexander Sax, Jitendra Malik, and Amir Zamir. Omnidata: A scalable pipeline for making multi- task mid-level vision datasets from 3d scans. InProceedings of the IEEE/CVF International Conference on Computer Vi- sion, pages 10786–10796, 2021. 15, 16

  11. [11]

    A generic and flexible regularization framework for nerfs

    Thibaud Ehret, Roger Mar ´ı, and Gabriele Facciolo. A generic and flexible regularization framework for nerfs. In Proceedings of the IEEE/CVF Winter Conference on Ap- plications of Computer Vision (WACV), pages 3088–3097,

  12. [12]

    Plenoxels: Radiance fields without neural networks

    Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. InCVPR, 2022. 1, 2, 16

  13. [13]

    Sparsenerf: Distilling depth ranking for few-shot novel view synthesis.IEEE/CVF International Conference on Computer Vision (ICCV), 2023

    Guangcong, Zhaoxi Chen, Chen Change Loy, and Ziwei Liu. Sparsenerf: Distilling depth ranking for few-shot novel view synthesis.IEEE/CVF International Conference on Computer Vision (ICCV), 2023. 2, 3, 6, 15

  14. [14]

    Neural 3d scene reconstruction with the manhattan-world assumption

    Haoyu Guo, Sida Peng, Haotong Lin, Qianqian Wang, Guofeng Zhang, Hujun Bao, and Xiaowei Zhou. Neural 3d scene reconstruction with the manhattan-world assumption. InCVPR, 2022. 3

  15. [15]

    3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42 (4), 2023

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42 (4), 2023. 6, 7, 16, 20

  16. [16]

    A hierarchical 3d gaussian representation for real-time ren- dering of very large datasets.ACM Transactions on Graph- ics, 43(4), 2024

    Bernhard Kerbl, Andreas Meuleman, Georgios Kopanas, Michael Wimmer, Alexandre Lanvin, and George Drettakis. A hierarchical 3d gaussian representation for real-time ren- dering of very large datasets.ACM Transactions on Graph- ics, 43(4), 2024. 6, 7, 16, 17

  17. [17]

    Tanks and temples: Benchmarking large-scale scene reconstruction.ACM Transactions on Graphics, 36(4), 2017

    Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. Tanks and temples: Benchmarking large-scale scene reconstruction.ACM Transactions on Graphics, 36(4), 2017. 16

  18. [18]

    Improving NeRF Quality by Progressive Camera Placement for Free- Viewpoint Navigation

    Georgios Kopanas and George Drettakis. Improving NeRF Quality by Progressive Camera Placement for Free- Viewpoint Navigation. InVision, Modeling, and Visualiza- tion. The Eurographics Association, 2023. 2, 14

  19. [19]

    Dngaussian: Optimizing sparse-view 3d gaussian radiance fields with global-local depth normaliza- tion.arXiv preprint arXiv:2403.06912, 2024

    Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, and Lin Gu. Dngaussian: Optimizing sparse-view 3d gaussian radiance fields with global-local depth normaliza- tion.arXiv preprint arXiv:2403.06912, 2024. 6, 7, 16

  20. [20]

    NeRF-XL: Scaling nerfs with multiple GPUs

    Ruilong Li, Sanja Fidler, Angjoo Kanazawa, and Francis Williams. NeRF-XL: Scaling nerfs with multiple GPUs. In European Conference on Computer Vision (ECCV), 2024. 2, 4

  21. [21]

    KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d.Pattern Analysis and Machine Intelligence (PAMI),

    Yiyi Liao, Jun Xie, and Andreas Geiger. KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2d and 3d.Pattern Analysis and Machine Intelligence (PAMI),

  22. [22]

    Neural sparse voxel fields.NeurIPS,

    Lingjie Liu, Jiatao Gu, Kyaw Zaw Lin, Tat-Seng Chua, and Christian Theobalt. Neural sparse voxel fields.NeurIPS,

  23. [23]

    Lorensen and Harvey E

    William E. Lorensen and Harvey E. Cline. Marching cubes: A high resolution 3d surface construction algorithm. In Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, page 163–169, New York, NY , USA, 1987. Association for Computing Machin- ery. 4

  24. [24]

    Julien N. P. Martel, David B. Lindell, Connor Z. Lin, Eric R. Chan, Marco Monteiro, and Gordon Wetzstein. Acorn: Adaptive coordinate networks for neural scene representa- tion.ACM Trans. Graph. (SIGGRAPH), 40(4), 2021. 2

  25. [25]

    Kim, and Johannes Kopf

    Andreas Meuleman, Yu-Lun Liu, Chen Gao, Jia-Bin Huang, Changil Kim, Min H. Kim, and Johannes Kopf. Progres- sively optimized local radiance fields for robust view synthe- sis. InCVPR, 2023. 2, 4, 6, 7, 8, 16, 17, 21

  26. [26]

    Srinivasan, Rodrigo Ortiz-Cayon, Nima Khademi Kalantari, Ravi Ramamoorthi, Ren Ng, and Abhishek Kar

    Ben Mildenhall, Pratul P. Srinivasan, Rodrigo Ortiz-Cayon, Nima Khademi Kalantari, Ravi Ramamoorthi, Ren Ng, and Abhishek Kar. Local light field fusion: Practical view syn- thesis with prescriptive sampling guidelines.ACM Transac- tions on Graphics (TOG), 2019. 6

  27. [27]

    Srinivasan, Matthew Tancik, Jonathan T

    Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: 9 Representing scenes as neural radiance fields for view syn- thesis. InECCV, 2020. 1, 6

  28. [28]

    Instant neural graphics primitives with a multires- olution hash encoding.ACM Trans

    Thomas M ¨uller, Alex Evans, Christoph Schied, and Alexan- der Keller. Instant neural graphics primitives with a multires- olution hash encoding.ACM Trans. Graph., 41(4):102:1– 102:15, 2022. 1, 5

  29. [29]

    Barron, Ben Mildenhall, Mehdi S

    Michael Niemeyer, Jonathan T. Barron, Ben Mildenhall, Mehdi S. M. Sajjadi, Andreas Geiger, and Noha Radwan. Regnerf: Regularizing neural radiance fields for view syn- thesis from sparse inputs. InProc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2022. 2, 8, 15

  30. [30]

    Ac- tivenerf: Learning where to see with uncertainty estimation

    Xuran Pan, Zihang Lai, Shiji Song, and Gao Huang. Ac- tivenerf: Learning where to see with uncertainty estimation. InComputer Vision–ECCV 2022: 17th European Confer- ence, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXIII, pages 230–246. Springer, 2022. 2, 14

  31. [31]

    Vi- sion transformers for dense prediction.ICCV, 2021

    Ren ´e Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. Vi- sion transformers for dense prediction.ICCV, 2021. 3, 8, 16

  32. [32]

    Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer.IEEE Transactions on Pattern Analysis and Ma- chine Intelligence, 44(3), 2022

    Ren ´e Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer.IEEE Transactions on Pattern Analysis and Ma- chine Intelligence, 44(3), 2022. 3, 16

  33. [33]

    Barron, Ben Mildenhall, Pratul P

    Barbara Roessle, Jonathan T. Barron, Ben Mildenhall, Pratul P. Srinivasan, and Matthias Nießner. Dense depth pri- ors for neural radiance fields from sparse input views. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), 2022. 2, 3

  34. [34]

    K-planes: Explicit radiance fields in space, time, and appearance

    Sara Fridovich-Keil and Giacomo Meanti, Frederik Rahbæk Warburg, Benjamin Recht, and Angjoo Kanazawa. K-planes: Explicit radiance fields in space, time, and appearance. In CVPR, 2023. 1

  35. [35]

    Structure-from-motion revisited

    Johannes Lutz Sch ¨onberger and Jan-Michael Frahm. Structure-from-motion revisited. InConference on Com- puter Vision and Pattern Recognition (CVPR), 2016. 1, 3, 15

  36. [36]

    ViP-NeRF: Visibility prior for sparse input neural radiance fields.ACM Trans

    Nagabhushan Somraj and Rajiv Soundararajan. ViP-NeRF: Visibility prior for sparse input neural radiance fields.ACM Trans. Graph., 2023. 2

  37. [37]

    Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction

    Cheng Sun, Min Sun, and Hwann-Tzong Chen. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. InCVPR, 2022. 1

  38. [38]

    Srinivasan, Jonathan T

    Matthew Tancik, Vincent Casser, Xinchen Yan, Sabeek Prad- han, Ben Mildenhall, Pratul P. Srinivasan, Jonathan T. Bar- ron, and Henrik Kretzschmar. Block-nerf: Scalable large scene neural view synthesis.arXiv, 2022. 2, 4

  39. [39]

    Nerfstudio: A modular framework for neural radiance field development

    Matthew Tancik, Ethan Weber, Evonne Ng, Ruilong Li, Brent Yi, Justin Kerr, Terrance Wang, Alexander Kristof- fersen, Jake Austin, Kamyar Salahi, Abhik Ahuja, David McAllister, and Angjoo Kanazawa. Nerfstudio: A modular framework for neural radiance field development. InACM SIGGRAPH 2023 Conference Proceedings, 2023. 5, 6, 7, 16, 20

  40. [40]

    Mega-nerf: Scalable construction of large- scale nerfs for virtual fly-throughs

    Haithem Turki, Deva Ramanan, and Mahadev Satya- narayanan. Mega-nerf: Scalable construction of large- scale nerfs for virtual fly-throughs. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12922–12931, 2022. 2, 4, 6, 7, 16, 21

  41. [41]

    Scade: Nerfs from space carving with ambiguity-aware depth estimates

    Mikaela Angelina Uy, Ricardo Martin-Brualla, Leonidas Guibas, and Ke Li. Scade: Nerfs from space carving with ambiguity-aware depth estimates. InConference on Com- puter Vision and Pattern Recognition (CVPR), 2023. 2, 3

  42. [42]

    NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction

    Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, and Wenping Wang. Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. arXiv preprint arXiv:2106.10689, 2021. 3

  43. [43]

    DiffusioNeRF: Regularizing Neural Radiance Fields with Denoising Diffu- sion Models

    Jamie Wynn and Daniyar Turmukhambetov. DiffusioNeRF: Regularizing Neural Radiance Fields with Denoising Diffu- sion Models. InCVPR, 2023. 8

  44. [44]

    Nerf director: Revisiting view selection in neural vol- ume rendering

    Wenhui Xiao, Rodrigo Santa Cruz, David Ahmedt- Aristizabal, Olivier Salvado, Clinton Fookes, and Leo Le- brat. Nerf director: Revisiting view selection in neural vol- ume rendering. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024. 2, 5, 14

  45. [45]

    Neural visibility field for uncertainty-driven active mapping

    Shangjie Xue, Jesse Dill, Pranay Mathur, Frank Dellaert, Panagiotis Tsiotra, and Danfei Xu. Neural visibility field for uncertainty-driven active mapping. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18122–18132, 2024. 2

  46. [46]

    Ner- fvs: Neural radiance fields for free view synthesis via geom- etry scaffolds

    Chen Yang, Peihao Li, Zanwei Zhou, Shanxin Yuan, Bing- bing Liu, Xiaokang Yang, Weichao Qiu, and Wei Shen. Ner- fvs: Neural radiance fields for free view synthesis via geom- etry scaffolds. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16549– 16558, 2023. 3, 5

  47. [47]

    V olume rendering of neural implicit surfaces

    Lior Yariv, Jiatao Gu, Yoni Kasten, and Yaron Lipman. V olume rendering of neural implicit surfaces. InThirty- Fifth Conference on Neural Information Processing Systems,

  48. [48]

    Scannet++: A high-fidelity dataset of 3d indoor scenes

    Chandan Yeshwanth, Yueh-Cheng Liu, Matthias Nießner, and Angela Dai. Scannet++: A high-fidelity dataset of 3d indoor scenes. InProceedings of the International Confer- ence on Computer Vision (ICCV), 2023. 2, 5, 6, 8, 12, 13, 14, 15, 17, 18, 19, 20

  49. [49]

    Nelf-pro: Neural light field probes for multi-scale novel view synthe- sis

    Zinuo You, Andreas Geiger, and Anpei Chen. Nelf-pro: Neural light field probes for multi-scale novel view synthe- sis. InConference on Computer Vision and Pattern Recog- nition (CVPR), 2024. 2, 4, 5, 6, 7, 8, 12, 14, 16, 17, 20, 21

  50. [50]

    PlenOctrees for real-time rendering of neural radiance fields

    Alex Yu, Ruilong Li, Matthew Tancik, Hao Li, Ren Ng, and Angjoo Kanazawa. PlenOctrees for real-time rendering of neural radiance fields. InICCV, 2021. 1, 2

  51. [51]

    Sdfstudio: A unified framework for surface reconstruction, 2022

    Zehao Yu, Anpei Chen, Bozidar Antic, Songyou Peng, Apra- tim Bhattacharyya, Michael Niemeyer, Siyu Tang, Torsten Sattler, and Andreas Geiger. Sdfstudio: A unified framework for surface reconstruction, 2022. 5, 15

  52. [52]

    Monosdf: Exploring monocu- lar geometric cues for neural implicit surface reconstruc- tion.Advances in Neural Information Processing Systems (NeurIPS), 2022

    Zehao Yu, Songyou Peng, Michael Niemeyer, Torsten Sat- tler, and Andreas Geiger. Monosdf: Exploring monocu- lar geometric cues for neural implicit surface reconstruc- tion.Advances in Neural Information Processing Systems (NeurIPS), 2022. 3, 5, 6, 7, 8, 15, 16, 17 10

  53. [53]

    Nerf++: Analyzing and improving neural radiance fields.arXiv preprint arXiv:2010.07492, 2020

    Kai Zhang, Gernot Riegler, Noah Snavely, and Vladlen Koltun. Nerf++: Analyzing and improving neural radiance fields.arXiv preprint arXiv:2010.07492, 2020. 1

  54. [54]

    Open3D: A Modern Library for 3D Data Processing

    Qian-Yi Zhou, Jaesik Park, and Vladlen Koltun. Open3D: A modern library for 3D data processing.arXiv:1801.09847,

  55. [55]

    Fsgs: Real-time few-shot view synthesis using gaussian splatting

    Zehao Zhu, Zhiwen Fan, Yifan Jiang, and Zhangyang Wang. Fsgs: Real-time few-shot view synthesis using gaussian splatting. InEuropean conference on computer vision, pages 145–163. Springer, 2024. 6, 7, 16, 17 11 Geometry-Aware Scene Configurations for Novel View Synthesis Supplementary Material A. Basis Placement Optimization A.1. Algorithms and Optimizati...