pith. machine review for the scientific record.

arxiv: 2604.08370 · v1 · submitted 2026-04-09 · 💻 cs.CV

Recognition: 2 Lean theorem links

SurfelSplat: Learning Efficient and Generalizable Gaussian Surfel Representations for Sparse-View Surface Reconstruction

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 17:46 UTC · model grok-4.3

classification 💻 cs.CV
keywords Gaussian surfels · sparse-view reconstruction · feed-forward network · surface reconstruction · 3D Gaussian Splatting · Nyquist sampling · cross-view aggregation · low-pass filters

The pith

A feed-forward network reconstructs accurate 3D surfaces from sparse images by predicting Gaussian surfels after cross-view Nyquist filtering.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to build a fast, generalizable model that turns a handful of input photographs into precise 3D surface geometry represented as Gaussian surfels. It starts from the observation that ordinary feed-forward networks cannot recover fine geometric details because the spatial frequencies of pixel-aligned primitives exceed the sampling limits of the input views. To correct this, the authors insert a cross-view aggregation step that first damps the surfel geometry with low-pass filters scaled to each view's sampling rate, then projects the filtered surfels across all inputs to collect feature correlations, and finally routes those correlations through a fusion network that regresses accurate surfel parameters. The resulting system matches the reconstruction quality of slow per-scene optimization methods on standard benchmarks while finishing in roughly one second and requiring no scene-specific training. This matters because it removes the need for dense camera arrays or lengthy computation, making high-quality surface capture feasible in ordinary settings.
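
To make the filtering step concrete, here is a minimal Python sketch of sampling-rate-guided low-pass filtering, assuming the common variance-dilation implementation (convolving each surfel with a Gaussian kernel sized to the view's pixel footprint, in the spirit of Mip-Splatting's smoothing filters) rather than the paper's exact operator; `pixel_footprint`, `lowpass_filter_scales`, and the kernel factor `k` are illustrative names, not the authors' API.

```python
import numpy as np

def pixel_footprint(depth: np.ndarray, focal: float) -> np.ndarray:
    """World-space size of one pixel at a given depth (pinhole camera).

    This is the per-view sampling interval: a surfel at depth d is
    sampled by the image grid at spacing roughly d / f.
    """
    return depth / focal

def lowpass_filter_scales(scales, depths, focals, k=0.5):
    """Damp surfel scales so their spatial frequency stays below the
    per-view Nyquist limit 1 / (2 * footprint).

    Convolving two Gaussians adds their variances, so adding the filter
    variance lower-bounds each surfel's std at k * footprint.
    """
    # footprint of each surfel in every input view: (num_views, num_surfels)
    fp = np.stack([pixel_footprint(d, f) for d, f in zip(depths, focals)])
    # one possible choice: take the coarsest view's footprint, so every
    # view satisfies its Nyquist limit after filtering
    fp_max = fp.max(axis=0)                     # (num_surfels,)
    filt_var = (k * fp_max)[:, None] ** 2       # isotropic low-pass variance
    return np.sqrt(scales ** 2 + filt_var)      # filtered per-axis stds

# toy usage: 2 views, 3 surfels with anisotropic (x, y) scales
scales = np.array([[0.002, 0.001], [0.010, 0.004], [0.0005, 0.0005]])
depths = [np.array([1.0, 2.0, 0.5]), np.array([1.5, 2.5, 0.8])]
print(lowpass_filter_scales(scales, depths, focals=[1000.0, 800.0]))
```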

Core claim

SurfelSplat generates pixel-aligned Gaussian surfel representations from sparse-view images by adapting the surfels' geometric forms with spatial sampling rate-guided low-pass filters, projecting the filtered surfels across all input views to obtain cross-view feature correlations, and processing those correlations through a feature fusion network to regress Gaussian surfels with precise geometry, yielding efficient and generalizable surface reconstructions.

What carries the argument

The cross-view feature aggregation module, which first damps Gaussian surfel geometry with spatial sampling rate-guided low-pass filters, projects the results across views to extract correlations, and fuses them to regress accurate parameters.
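
A hedged sketch of the aggregation pattern in Python: the pinhole projection and nearest-neighbour feature gather below are generic multi-view machinery, and the plain concatenation at the end stands in for the paper's learned fusion network (every name here is hypothetical, not the authors' code).

```python
import numpy as np

def project(points, K, w2c):
    """Project world-space surfel centers into one view's pixel grid."""
    pts_h = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    cam = (w2c @ pts_h.T).T[:, :3]               # world -> camera frame
    uvw = (K @ cam.T).T                          # camera -> image plane
    return uvw[:, :2] / uvw[:, 2:3], cam[:, 2]   # pixel coords, depth

def aggregate_features(points, feat_maps, Ks, w2cs):
    """Gather per-view features at each surfel's projection.

    Returns one cross-view correlation vector per surfel; a fusion
    network would map it to refined surfel parameters, so the raw
    concatenation here is only a stand-in.
    """
    gathered = []
    for feats, K, w2c in zip(feat_maps, Ks, w2cs):
        uv, depth = project(points, K, w2c)
        h, w, _ = feats.shape
        # nearest-neighbour sampling; a real pipeline would interpolate
        x = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
        y = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
        valid = (depth > 0)[:, None]             # zero out points behind camera
        gathered.append(feats[y, x] * valid)
    return np.concatenate(gathered, axis=1)      # (num_surfels, views * C)
```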

If this is right

  • The model produces reconstruction quality comparable to state-of-the-art optimization pipelines on DTU benchmarks.
  • Gaussian surfels are predicted in approximately one second per scene.
  • No per-scene optimization or training is required, allowing the same weights to be used on new scenes.
  • The output representations remain efficient for both surface extraction and downstream rendering tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same sampling-rate-guided filtering step could be inserted into other feed-forward 3D reconstruction pipelines that currently suffer from high-frequency artifacts.
  • The one-second inference time opens the possibility of near-real-time surface capture on mobile or embedded hardware.
  • Extending the aggregation to handle video sequences rather than static sparse sets could support online 3D modeling without additional changes to the core architecture.
  • If the low-pass filters were made content-adaptive, the method might maintain accuracy even when input views are fewer or more irregularly spaced than those used in current benchmarks.

Load-bearing premise

That the main reason feed-forward networks lose geometric accuracy is the spatial-frequency mismatch between pixel-aligned primitives and the Nyquist limit of sparse inputs, and that low-pass filtering plus cross-view aggregation fully corrects it.

What would settle it

A direct comparison of geometric error on the same DTU scenes run once with the low-pass filters and cross-view fusion enabled and once with both disabled; if the disabled version shows no large increase in surface error, the claimed mechanism does not carry the result.
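
Scoring that ablation is straightforward once both reconstructions are sampled to point clouds; a brute-force Chamfer distance in Python (`reconstruct`, `dtu_scenes`, and `scene.gt` are hypothetical stand-ins for whatever harness runs the two configurations):

```python
import numpy as np

def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets of shape (N, 3), (M, 3).

    Brute-force O(N * M) time and memory; fine for a sanity check,
    use a KD-tree for full DTU-scale point clouds.
    """
    d2 = ((pred[:, None, :] - gt[None, :, :]) ** 2).sum(-1)  # (N, M)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

# hypothetical ablation loop: same scenes, aggregation module toggled
# for scene in dtu_scenes:
#     cd_on  = chamfer_distance(reconstruct(scene, nyquist_filter=True),  scene.gt)
#     cd_off = chamfer_distance(reconstruct(scene, nyquist_filter=False), scene.gt)
```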

Figures

Figures reproduced from arXiv: 2604.08370 by Chensheng Dai, Min Chen, Shengjun Zhang, Yueqi Duan.

Figure 1. Our method delivers state-of-the-art surface reconstruction with ultra-fast inference speed.
Figure 2. Experimental Observation. (a) Current feed-forward networks generate geometrically …
Figure 3. Pipeline. Given an image pair, our method first extracts initial image features using a …
Figure 4. Qualitative Comparison of Surface Reconstruction with Sparse Views on DTU Benchmarks.
Figure 5. Visualization of Nyquist Theorem Verification.
Figure 6. Nyquist Theorem Verification: (a) Before adaptation, most surfels exceed the Nyquist …
Figure 7. Given a pair of images, our method exhibits consistent and stable performance across …
Figure 8. Visual comparison of novel view synthesis on DTU dataset.
Original abstract

3D Gaussian Splatting (3DGS) has demonstrated impressive performance in 3D scene reconstruction. Beyond novel view synthesis, it shows great potential for multi-view surface reconstruction. Existing methods employ optimization-based reconstruction pipelines that achieve precise and complete surface extractions. However, these approaches typically require dense input views and high time consumption for per-scene optimization. To address these limitations, we propose SurfelSplat, a feed-forward framework that generates efficient and generalizable pixel-aligned Gaussian surfel representations from sparse-view images. We observe that conventional feed-forward structures struggle to recover accurate geometric attributes of Gaussian surfels because the spatial frequency of pixel-aligned primitives exceeds Nyquist sampling rates. Therefore, we propose a cross-view feature aggregation module based on the Nyquist sampling theorem. Specifically, we first adapt the geometric forms of Gaussian surfels with spatial sampling rate-guided low-pass filters. We then project the filtered surfels across all input views to obtain cross-view feature correlations. By processing these correlations through a specially designed feature fusion network, we can finally regress Gaussian surfels with precise geometry. Extensive experiments on DTU reconstruction benchmarks demonstrate that our model achieves comparable results with state-of-the-art methods and predicts Gaussian surfels within 1 second, offering a 100x speedup without costly per-scene training.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes SurfelSplat, a feed-forward framework for generating pixel-aligned Gaussian surfel representations from sparse-view images for 3D surface reconstruction. It addresses the challenge of high spatial frequencies in pixel-aligned primitives exceeding Nyquist rates by introducing a cross-view feature aggregation module with spatial sampling rate-guided low-pass filters, followed by projection for cross-view correlations and a feature fusion network to regress precise geometry. The method claims to achieve comparable performance to state-of-the-art on DTU benchmarks while providing a 100x speedup and predicting within 1 second without per-scene optimization.

Significance. Should the claims hold under rigorous validation, this work would be significant for enabling efficient and generalizable surface reconstruction in sparse-view settings, extending 3D Gaussian Splatting to practical applications by avoiding costly per-scene training. The feed-forward design and 100x speedup, if substantiated, offer clear practical value for real-time or large-scale deployment.

major comments (2)
  1. §3 (cross-view feature aggregation module): The core assumption that spatial sampling rate-guided low-pass filters followed by cross-view fusion can recover high-frequency geometric attributes filtered out by the Nyquist limit lacks any frequency-domain analysis, pre/post-filter spectra comparison to ground-truth surfaces, or explicit verification that the correlations restore the lost details using only sparse views.
  2. §4 (DTU experiments): The reported 'comparable results' to SOTA methods provide no error bars, no ablation isolating the low-pass filter component, and no details on how geometry accuracy was measured (e.g., Chamfer distance computation or surface normal evaluation), making it impossible to attribute performance gains to the proposed module versus the base network.
minor comments (2)
  1. The abstract would be strengthened by replacing the qualitative phrase 'comparable results' with specific quantitative metrics (e.g., mean Chamfer distance or PSNR values) from the DTU tables.
  2. Notation for the filtered surfel parameters (e.g., how the low-pass cutoff is computed from the sampling rate) should be defined explicitly with an equation in the method section; one illustrative form is sketched below.
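
For illustration only, one conventional way to tie the cutoff to the sampling rate (our assumption, not the paper's definition): with per-view sampling interval s_u, the Nyquist frequency is 1/(2 s_u), and a Gaussian low-pass of width proportional to s_u can be folded directly into the surfel covariance.

```latex
% illustrative Nyquist-guided cutoff, not the paper's equation:
% sampling interval s_u  =>  Nyquist frequency  \nu_{\max} = \frac{1}{2 s_u}
% Gaussian low-pass with width \sigma_f, folded into the covariance:
\Sigma' = \Sigma + \sigma_f^2 I, \qquad \sigma_f = k \, s_u
```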

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and will revise the paper to incorporate additional analysis and experimental details.

Point-by-point responses
  1. Referee: §3 (cross-view feature aggregation module): The core assumption that spatial sampling rate-guided low-pass filters followed by cross-view fusion can recover high-frequency geometric attributes filtered out by the Nyquist limit lacks any frequency-domain analysis, pre/post-filter spectra comparison to ground-truth surfaces, or explicit verification that the correlations restore the lost details using only sparse views.

    Authors: We agree that the manuscript would be strengthened by explicit frequency-domain analysis. Section 3 motivates the low-pass filters from the Nyquist theorem to address high spatial frequencies in pixel-aligned surfels, but does not include spectra plots or direct verification. In the revision we will add a dedicated analysis subsection with pre- and post-filter frequency spectra comparisons on DTU surfaces and quantitative verification that cross-view correlations recover high-frequency details from the sparse input views. revision: yes

  2. Referee: §4 (DTU experiments): The reported 'comparable results' to SOTA methods provide no error bars, no ablation isolating the low-pass filter component, and no details on how geometry accuracy was measured (e.g., Chamfer distance computation or surface normal evaluation), making it impossible to attribute performance gains to the proposed module versus the base network.

    Authors: We acknowledge the need for more rigorous reporting. The current experiments follow the standard DTU protocol for Chamfer distance and normal evaluation, but omit error bars and the requested ablation. The revised manuscript will report mean and standard deviation over multiple runs, add an ablation isolating the low-pass filter, and explicitly describe the metric computation (Chamfer distance on reconstructed meshes and normal consistency as in prior DTU surface reconstruction papers). revision: yes

Circularity Check

0 steps flagged

No circularity: empirical feed-forward network with external validation

full rationale

The paper describes a learned feed-forward architecture that applies Nyquist-inspired low-pass filtering and cross-view fusion to regress Gaussian surfels. Its central result is obtained by training on external data and evaluating on DTU benchmarks, with no equations or steps that reduce the claimed output to a fitted input, self-definition, or self-citation chain by construction. The frequency-mismatch observation motivates the design but does not tautologically determine the network's learned parameters or predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that low-pass filtering guided by spatial sampling rate plus cross-view correlation fusion is sufficient to recover sub-Nyquist geometry; no new physical entities or ad-hoc constants are introduced beyond standard neural-network training.

axioms (2)
  • domain assumption Pixel-aligned Gaussian surfels have spatial frequencies that exceed the Nyquist rate of the input views.
    Invoked in the second paragraph of the abstract to motivate the cross-view module.
  • domain assumption Cross-view feature correlations after low-pass filtering contain sufficient information to regress accurate surfel geometry.
    Core premise of the feature fusion network described in the abstract.

pith-pipeline@v0.9.0 · 5543 in / 1356 out tokens · 24138 ms · 2026-05-10T17:46:44.415186+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem.

    We observe that conventional feed-forward structures struggle to recover accurate geometric attributes of Gaussian surfels because the spatial frequency of pixel-aligned primitives exceeds Nyquist sampling rates. Therefore, we propose a cross-view feature aggregation module based on the Nyquist sampling theorem. Specifically, we first adapt the geometric forms of Gaussian surfels with spatial sampling rate-guided low-pass filters.

  • IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

    Relation between the paper passage and the cited Recognition theorem.

After adaptation, the spatial frequency can be constrained by setting s_u > 2/π: ν_k = (1/(s_u π)) √(1 + 1/ν̂_k²) < ν̂_k / 2

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

69 extracted references · 10 canonical work pages · 1 internal anchor

  [1] Yao Yao, Zixin Luo, Shiwei Li, Tian Fang, and Long Quan. Mvsnet: Depth inference for unstructured multi-view stereo. In Proceedings of the European Conference on Computer Vision (ECCV), pages 767–783, 2018.
  [2] Yikang Ding, Wentao Yuan, Qingtian Zhu, Haotian Zhang, Xiangyue Liu, Yuanjiang Wang, and Xiao Liu. Transmvsnet: Global context-aware multi-view stereo network with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8585–8594, 2022.
  [3] Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, and Wenping Wang. Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. arXiv preprint arXiv:2106.10689, 2021.
  [4] Lior Yariv, Jiatao Gu, Yoni Kasten, and Yaron Lipman. Volume rendering of neural implicit surfaces. Advances in Neural Information Processing Systems, 34:4805–4815, 2021.
  [5] Zehao Yu, Songyou Peng, Michael Niemeyer, Torsten Sattler, and Andreas Geiger. Monosdf: Exploring monocular geometric cues for neural implicit surface reconstruction. Advances in Neural Information Processing Systems, 35:25018–25032, 2022.
  [6] Han Huang, Yulun Wu, Junsheng Zhou, Ge Gao, Ming Gu, and Yu-Shen Liu. Neusurf: On-surface priors for neural surface reconstruction from sparse input views. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 2312–2320, 2024.
  [7] Xiaoxiao Long, Cheng Lin, Peng Wang, Taku Komura, and Wenping Wang. Sparseneus: Fast generalizable neural surface reconstruction from sparse views. In European Conference on Computer Vision, pages 210–227. Springer, 2022.
  [8] Youngju Na, Woo Jae Kim, Kyu Beom Han, Suhyeon Ha, and Sung-Eui Yoon. Uforecon: Generalizable sparse-view surface reconstruction from arbitrary and unfavorable sets. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5094–5104, 2024.
  [9] Jiaming Sun, Yiming Xie, Linghao Chen, Xiaowei Zhou, and Hujun Bao. Neuralrecon: Real-time coherent 3d reconstruction from monocular video. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
  [10] Yiming Wang, Qin Han, Marc Habermann, Kostas Daniilidis, Christian Theobalt, and Lingjie Liu. Neus2: Fast learning of neural implicit surfaces for multi-view reconstruction. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023.
  [11] Haoyu Wu, Alexandros Graikos, and Dimitris Samaras. S-volsdf: Sparse multi-view stereo regularization of neural implicit surfaces. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023.
  [12] Lior Yariv, Jiatao Gu, Yoni Kasten, and Yaron Lipman. Volume rendering of neural implicit surfaces. In Proceedings of the 35th International Conference on Neural Information Processing Systems, 2021.
  [13] Zhaoshuo Li, Thomas Müller, Alex Evans, Russell H. Taylor, Mathias Unberath, Ming-Yu Liu, and Chen-Hsuan Lin. Neuralangelo: High-fidelity neural surface reconstruction. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
  [14] Luoyuan Xu, Tao Guan, Yuesong Wang, Wenkai Liu, Zhaojie Zeng, Junle Wang, and Wei Yang. C2f2neus: Cascade cost frustum fusion for high fidelity and generalizable neural surface reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 18291–18301, 2023.
  [15] Yixun Liang, Hao He, and Yingcong Chen. Retr: Modeling rendering via transformer for generalizable neural surface reconstruction. Advances in Neural Information Processing Systems, 36:62332–62351, 2023.
  [16] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4), 2023.
  [17] Antoine Guédon and Vincent Lepetit. Sugar: Surface-aligned gaussian splatting for efficient 3d mesh reconstruction and high-quality mesh rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5354–5363, 2024.
  [18] Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically accurate radiance fields. In ACM SIGGRAPH 2024 Conference Papers, pages 1–11, 2024.
  [19] Pinxuan Dai, Jiamin Xu, Wenxiang Xie, Xinguo Liu, Huamin Wang, and Weiwei Xu. High-quality surface reconstruction using gaussian surfels. In ACM SIGGRAPH 2024 Conference Papers, pages 1–11, 2024.
  [20] Zehao Yu, Torsten Sattler, and Andreas Geiger. Gaussian opacity fields: Efficient adaptive surface reconstruction in unbounded scenes. ACM Transactions on Graphics (TOG), 43(6):1–13, 2024.
  [21] Mulin Yu, Tao Lu, Linning Xu, Lihan Jiang, Yuanbo Xiangli, and Bo Dai. Gsdf: 3dgs meets sdf for improved rendering and reconstruction. arXiv preprint arXiv:2403.16964, 2024.
  [22] Stanislaw Szymanowicz, Christian Rupprecht, and Andrea Vedaldi. Splatter image: Ultra-fast single-view 3d reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10208–10217, 2024.
  [23] David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann. pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19457–19467, 2024.
  [24] Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. In European Conference on Computer Vision, pages 370–386. Springer, 2024.
  [25] Christopher Wewer, Kevin Raj, Eddy Ilg, Bernt Schiele, and Jan Eric Lenssen. latentsplat: Autoencoding variational gaussians for fast generalizable 3d reconstruction. In European Conference on Computer Vision, pages 456–473. Springer, 2024.
  [26] Botao Ye, Sifei Liu, Haofei Xu, Xueting Li, Marc Pollefeys, Ming-Hsuan Yang, and Songyou Peng. No pose, no problem: Surprisingly simple 3d gaussian splats from sparse unposed images. arXiv preprint arXiv:2410.24207, 2024.
  [27] Shengjun Zhang, Xin Fei, Fangfu Liu, Haixu Song, and Yueqi Duan. Gaussian graph network: Learning efficient and generalizable gaussian representations from multi-view images. Advances in Neural Information Processing Systems, 37:50361–50380, 2024.
  [28] Shengji Tang, Weicai Ye, Peng Ye, Weihao Lin, Yang Zhou, Tao Chen, and Wanli Ouyang. Hisplat: Hierarchical 3d gaussian splatting for generalizable sparse-view reconstruction. arXiv preprint arXiv:2410.06245, 2024.
  [29] Chuanrui Zhang, Yingshuang Zou, Zhuoling Li, Minmin Yi, and Haoqian Wang. Transplat: Generalizable 3d gaussian splatting from sparse multi-view images with transformers. AAAI, 2025.
  [30] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
  [31] Jiawei Yang, Marco Pavone, and Yue Wang. Freenerf: Improving few-shot neural rendering with free frequency regularization. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
  [32] Anpei Chen, Zexiang Xu, Fuqiang Zhao, Xiaoshuai Zhang, Fanbo Xiang, Jingyi Yu, and Hao Su. Mvsnerf: Fast generalizable radiance field reconstruction from multi-view stereo. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
  [33] Jonathan T Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P Srinivasan. Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In ICCV, pages 5855–5864, 2021.
  [34] Christian Reiser, Songyou Peng, Yiyi Liao, and Andreas Geiger. Kilonerf: Speeding up neural radiance fields with thousands of tiny mlps. In ICCV, pages 14335–14345, 2021.
  [35] Qiangeng Xu, Zexiang Xu, Julien Philip, Sai Bi, Zhixin Shu, Kalyan Sunkavalli, and Ulrich Neumann. Point-nerf: Point-based neural radiance fields. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
  [36] Stephan J Garbin, Marek Kowalski, Matthew Johnson, Jamie Shotton, and Julien Valentin. Fastnerf: High-fidelity neural rendering at 200fps. In ICCV, pages 14346–14355, 2021.
  [37] Tao Hu, Shu Liu, Yilun Chen, Tiancheng Shen, and Jiaya Jia. Efficientnerf: Efficient neural radiance fields. In CVPR, pages 12902–12911, 2022.
  [38] Mijeong Kim, Seonguk Seo, and Bohyung Han. Infonerf: Ray entropy minimization for few-shot neural volume rendering. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
  [39] Congyue Deng, Chiyu Max Jiang, Charles R. Qi, Xinchen Yan, Yin Zhou, Leonidas Guibas, and Dragomir Anguelov. Nerdi: Single-view nerf synthesis with language-guided diffusion as general image priors. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
  [40] Lingjie Liu, Jiatao Gu, Kyaw Zaw Lin, Tat-Seng Chua, and Christian Theobalt. Neural sparse voxel fields. NeurIPS, 33:15651–15663, 2020.
  [41] Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and Andreas Geiger. Occupancy networks: Learning 3d reconstruction in function space. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  [42] Albert Pumarola, Enric Corona, Gerard Pons-Moll, and Francesc Moreno-Noguer. D-nerf: Neural radiance fields for dynamic scenes. In CVPR, pages 10318–10327, 2021.
  [43] Linning Xu, Yuanbo Xiangli, Sida Peng, Xingang Pan, Nanxuan Zhao, Christian Theobalt, Bo Dai, and Dahua Lin. Grid-guided neural radiance fields for large urban scenes. In CVPR, pages 8296–8306, 2023.
  [44] Haithem Turki, Deva Ramanan, and Mahadev Satyanarayanan. Mega-nerf: Scalable construction of large-scale nerfs for virtual fly-throughs. In CVPR, pages 12922–12931, 2022.
  [45] Matthew Tancik, Vincent Casser, Xinchen Yan, Sabeek Pradhan, Ben Mildenhall, Pratul P Srinivasan, Jonathan T Barron, and Henrik Kretzschmar. Block-nerf: Scalable large scene neural view synthesis. In CVPR, pages 8248–8258, 2022.
  [46] Haotong Lin, Sida Peng, Zhen Xu, Yunzhi Yan, Qing Shuai, Hujun Bao, and Xiaowei Zhou. Efficient neural radiance fields for interactive free-viewpoint video. In SIGGRAPH Asia, pages 1–9, 2022.
  [47] Alex Yu, Vickie Ye, Matthew Tancik, and Angjoo Kanazawa. pixelnerf: Neural radiance fields from one or few images. In CVPR, pages 4578–4587, 2021.
  [48] Yuanbo Xiangli, Linning Xu, Xingang Pan, Nanxuan Zhao, Anyi Rao, Christian Theobalt, Bo Dai, and Dahua Lin. Bungeenerf: Progressive neural radiance field for extreme multi-scale scene rendering. In ECCV, pages 106–122. Springer, 2022.
  [49] Chen Gao, Ayush Saraf, Johannes Kopf, and Jia-Bin Huang. Dynamic view synthesis from dynamic monocular video. In ICCV, 2021.
  [50] Zak Murez, Tarrence Van As, James Bartolozzi, Ayan Sinha, Vijay Badrinarayanan, and Andrew Rabinovich. Atlas: End-to-end 3d scene reconstruction from posed images. In European Conference on Computer Vision, pages 414–431. Springer, 2020.
  [51] Yufan Ren, Fangjinhua Wang, Tong Zhang, Marc Pollefeys, and Sabine Süsstrunk. Volrecon: Volume rendering of signed ray distance functions for generalizable multi-view reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16685–16695, 2023.
  [52] Jonathon Luiten, Georgios Kopanas, Bastian Leibe, and Deva Ramanan. Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis. In 2024 International Conference on 3D Vision (3DV), 2024.
  [53] Shunyuan Zheng, Boyao Zhou, Ruizhi Shao, Boning Liu, Shengping Zhang, Liqiang Nie, and Yebin Liu. Gps-gaussian: Generalizable pixel-wise 3d gaussian splatting for real-time human novel view synthesis. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
  [54] Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-splatting: Alias-free 3d gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19447–19456, 2024.
  [55] Johannes L Schonberger and Jan-Michael Frahm. Structure-from-motion revisited. In CVPR, pages 4104–4113, 2016.
  [56] Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, and Zhangyang Wang. Lightgaussian: Unbounded 3d gaussian compression with 15x reduction and 200+ fps. arXiv preprint arXiv:2311.17245, 2023.
  [57] Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, and Bo Dai. Scaffold-gs: Structured 3d gaussians for view-adaptive rendering. arXiv preprint arXiv:2312.00109, 2023.
  [58] Kai Katsumata, Duc Minh Vo, and Hideki Nakayama. An efficient 3d gaussian representation for monocular/multi-view dynamic scenes. arXiv preprint arXiv:2311.12897, 2023.
  [59] Hanlin Chen, Chen Li, and Gim Hee Lee. Neusg: Neural implicit surface reconstruction with 3d gaussian splatting guidance. arXiv preprint arXiv:2312.00846, 2023.
  [60] Haixu Song, Xiaoke Yang, Shengjun Zhang, Jiwen Lu, and Yueqi Duan. Uniquesplat: View-conditioned 3d gaussian splatting for generalizable 3d reconstruction. IEEE Transactions on Image Processing, 34:8376–8389, 2025.
  [61] Yifan Liu, Shengjun Zhang, Chensheng Dai, Yang Chen, Hao Liu, Chen Li, and Yueqi Duan. Learning efficient and generalizable human representation with human gaussian model. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 11797–11806, 2025.
  [62] Harry Nyquist. Certain topics in telegraph transmission theory. Transactions of the American Institute of Electrical Engineers, 47(2):617–644, 2009.
  [63] Henrik Aanæs, Rasmus Ramsbøl Jensen, George Vogiatzis, Engin Tola, and Anders Bjorholm Dahl. Large-scale data for multiple-view stereopsis. International Journal of Computer Vision, 120:153–168, 2016.
  [64] Tinghui Zhou, Richard Tucker, John Flynn, Graham Fyffe, and Noah Snavely. Stereo magnification: Learning view synthesis using multiplane images. arXiv preprint arXiv:1805.09817, 2018.
  [65] Yao Yao, Zixin Luo, Shiwei Li, Jingyang Zhang, Yufan Ren, Lei Zhou, Tian Fang, and Long Quan. Blendedmvs: A large-scale dataset for generalized multi-view stereo networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1790–1799, 2020.
  [66] Han Huang, Yulun Wu, Chao Deng, Ge Gao, Ming Gu, and Yu-Shen Liu. Fatesgs: Fast and accurate sparse-view surface reconstruction using gaussian splatting with depth-feature consistency. arXiv preprint arXiv:2501.04628, 2025.
  [67] Hanspeter Pfister, Matthias Zwicker, Jeroen Van Baar, and Markus Gross. Surfels: Surface elements as rendering primitives. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pages 335–342, 2000.
  [68] Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. Vggt: Visual geometry grounded transformer. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 5294–5306, 2025.
  [69] Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. Dust3r: Geometric 3d vision made easy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20697–20709, 2024.