pith. machine review for the scientific record.

arxiv: 2604.22129 · v1 · submitted 2026-04-24 · 💻 cs.CV · cs.RO

Recognition: unknown

PAGaS: Pixel-Aligned 1DoF Gaussian Splatting for Depth Refinement

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 12:33 UTC · model grok-4.3

classification 💻 cs.CV cs.RO
keywords Gaussian Splatting · Depth Refinement · Multi-View Stereo · 3D Reconstruction · Pixel-Aligned Gaussians · 1DoF Optimization · Geometric Fidelity

The pith

Pixel-aligned 1DoF Gaussians refine depths by locking positions and sizes to back-projected volumes while optimizing only depth.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper adapts Gaussian Splatting to the task of refining scene depths from multiple images by representing each pixel with a Gaussian that has only one free parameter. Positions and sizes stay fixed according to the geometry of the camera rays, so optimization adjusts solely the depth value along each ray. This produces detailed depth maps that improve on both classic geometric and learning-based multi-view stereo methods when tested on standard reconstruction benchmarks. A reader would care because accurate depths form the foundation for building reliable 3D models from ordinary photographs, and the constraint keeps the process efficient.
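
To make the constraint concrete, here is a minimal, hypothetical sketch (not the authors' code) of how a pixel-aligned 1DoF parameterization might look in PyTorch: ray origins and directions come from the camera, the footprint scale is tied to the pixel's angular size at the current depth, and only the depth tensor carries gradients. The function and variable names, and the exact scale rule, are assumptions for illustration.

```python
# Hypothetical sketch of a pixel-aligned 1DoF Gaussian parameterization.
# Not the authors' implementation; names and the exact scale rule are assumptions.
import torch

def pixel_aligned_gaussians(ray_origins, ray_dirs, init_depths, pixel_angle):
    """ray_origins, ray_dirs: (N, 3) camera rays through each valid pixel.
    init_depths: (N,) coarse depths used as the starting point.
    pixel_angle: approximate angular footprint of one pixel (radians)."""
    # Depth is the single optimizable parameter per Gaussian.
    depths = init_depths.clone().requires_grad_(True)

    def build():
        # The center is locked to the ray: it can only slide along it.
        means = ray_origins + depths.unsqueeze(-1) * ray_dirs
        # Isotropic scale grows with depth so the splat keeps covering its pixel
        # ("the further the bigger"); rotation is irrelevant for a sphere.
        scales = depths * pixel_angle
        return means, scales

    return depths, build

# Usage idea: optimize only `depths` under a photometric loss rendered into context views,
# e.g. optimizer = torch.optim.Adam([depths], lr=1e-3)
```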

Core claim

PAGaS models each pixel's depth with a one-degree-of-freedom (1DoF) Gaussian whose position and size are restricted by the back-projected pixel volume, leaving depth as the sole parameter optimized during splatting for multi-view stereo depth refinement.

What carries the argument

Pixel-aligned 1DoF Gaussians, which fix lateral position and size to back-projected pixel volumes so depth is the only variable optimized.

If this is right

  • Produces highly detailed depth maps from multi-view images.
  • Outperforms reference geometric and learning-based multi-view stereo baselines on 3D reconstruction benchmarks.
  • Adapts the efficiency of Gaussian Splatting to depth refinement while preserving geometric fidelity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The strict alignment with pixel volumes may reduce optimization ambiguity compared to fully free Gaussians in other tasks.
  • Refined depths from this method could serve as stronger initial geometry for subsequent novel-view synthesis pipelines.
  • The approach suggests that incorporating camera geometry directly into the representation can stabilize depth estimation across varied scenes.

Load-bearing premise

Locking Gaussian positions and sizes strictly to back-projected pixel volumes still supplies enough modeling power to represent complex real-world geometry without systematic bias or loss of fine details.

What would settle it

If PAGaS depth maps showed higher error than unconstrained Gaussian splatting or standard multi-view stereo baselines on ground-truth benchmarks such as DTU or Tanks and Temples, that would indicate the 1DoF constraint does not provide adequate modeling capacity.
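
A comparison of that kind would reduce to standard depth-map error metrics against ground truth. The sketch below is a generic evaluation helper, not the paper's protocol; the metric names and the inlier threshold are common multi-view-stereo choices assumed here for illustration.

```python
# Generic depth-evaluation sketch; metrics and thresholds are common MVS choices,
# not the benchmark protocol used in the paper.
import numpy as np

def depth_errors(pred, gt, valid=None, thresh=2.0):
    """pred, gt: (H, W) depth maps in the same metric units.
    valid: optional boolean mask of pixels with reliable ground truth.
    thresh: absolute error (e.g. in mm) below which a pixel counts as an inlier."""
    if valid is None:
        valid = gt > 0
    diff = np.abs(pred[valid] - gt[valid])
    return {
        "mae": diff.mean(),                      # mean absolute error
        "abs_rel": (diff / gt[valid]).mean(),    # mean absolute relative error
        "inlier_rate": (diff < thresh).mean(),   # fraction of pixels within thresh
    }
```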

Figures

Figures reproduced from arXiv: 2604.22129 by Aljaz Bozic, David Recasens, Edmond Boyer, Javier Civera, Robert Maier, Stephane Grabli, Tony Tung.

Figure 1
Figure 1: Results for 2DGS and after applying our PAGaS in DTU. Note the fine-grained details in our refined meshes and depth normals, particularly in the roof tiles, windows, curtains and brick walls, which prior methods fail to capture with this level of fidelity. view at source ↗
Figure 2
Figure 2: Depth refinement with Pixel-Aligned Gaussian Splatting. For each valid pixel of the target view to refine, a 3D Gaussian is initialized along its camera-to-pixel ray, at the position determined by its initial coarse depth. Gaussians are spherical (trivializing the rotation) with a depth-dependent scale (the further the bigger). Gaussian appearance is locked to its pixel color value, and opacity fixed t… view at source ↗
Figure 3
Figure 3: Occlusions problem (top) caused by having Gaussians only from the target view. These images show the normals from the rendered depth at a context view without and with the radius and depth thresholds of the Occlusion-Aware 3DGS Rasterizer. See how the Gaussians behind the wall perturb its rendered values. In addition, some areas can have poor Gaussian coverage, provoking that those Gaussians that should … view at source ↗
Figure 4
Figure 4: Occlusion-Aware 3DGS Rasterizer logic, newly proposed in this paper to effectively ignore Gaussians that should be occluded during alpha-blending. Radius threshold delimits in pixel units a circular area, around the center of the pixel that is being rasterized, where only Gaussians with a 2D mean that falls inside it are considered during alpha-blending. Depth threshold is a depth value that is added to t… view at source ↗
Figure 5
Figure 5: Normals from depth and 3D meshes from PGSR before and after refinement with PAGaS on DTU (scan24, scan106) and TnT (Barn, Courthouse). The normal maps reveal how PAGaS recovers the finest pixel-level surface details. All refined depth maps are fused into a single mesh using a TSDF. Under comparable memory budgets, smaller scenes allow finer voxel sizes, which preserve high-frequency geometry more effective… view at source ↗
Figure 6
Figure 6: ActorsHQ normals from depth and meshes from COLMAP (left) and after PAGaS refinement (right). Zoom in to observe fine-grained details, e.g., at the eyelashes and eyebrows, the fabric of the dress or the sole of the sandals. Extensive additional qualitative results of the refined depths and meshes from MVSAnywhere, 2DGS and PGSR in DTU and TnT, and for COLMAP in BlendedMVS, are documented in the supplement… view at source ↗
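
Figures 3 and 4 describe an occlusion-aware rasterizer that drops Gaussians which should be hidden at a context view before alpha-blending. The sketch below is a rough, hypothetical rendition of that per-pixel filtering step; since the Figure 4 caption is truncated at the source, the exact way the depth threshold is applied is an assumption, and the function and parameter names are invented for illustration.

```python
# Rough sketch of the occlusion-aware filtering described in Figures 3-4.
# Hypothetical: the precise depth-threshold rule is assumed (the caption is truncated).
import numpy as np

def visible_gaussians(means2d, depths, pixel_xy, radius_px, depth_margin):
    """means2d: (N, 2) projected Gaussian centers in the context view (pixels).
    depths: (N,) Gaussian depths in that view's camera frame.
    pixel_xy: (2,) center of the pixel being rasterized.
    radius_px: only Gaussians whose 2D mean falls within this radius contribute.
    depth_margin: tolerance added to the nearest admissible depth."""
    # Radius threshold: keep Gaussians whose projected center lies near this pixel.
    near = np.linalg.norm(means2d - pixel_xy, axis=1) < radius_px
    if not near.any():
        return near
    # Depth threshold: among nearby Gaussians, drop those far behind the closest one,
    # so splats behind a surface do not leak into the alpha-blend.
    front = depths[near].min()
    keep = near & (depths < front + depth_margin)
    return keep  # boolean mask over the N Gaussians for this pixel
```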
read the original abstract

Gaussian Splatting (GS) has emerged as an efficient approach for high-quality novel view synthesis. While early GS variants struggled to accurately model the scene's geometry, recent advancements constraining the Gaussians' spread and shapes, such as 2D Gaussian Splatting, have significantly improved geometric fidelity. In this paper, we present Pixel-Aligned 1DoF Gaussian Splatting (PAGaS) that adapts the GS representation from novel view synthesis to the multi-view stereo depth task. Our key contribution is modeling a pixel's depth using one-degree-of-freedom (1DoF) Gaussians that remain tightly constrained during optimization. Unlike existing approaches, our Gaussians' positions and sizes are restricted by the back-projected pixel volumes, leaving depth as the sole degree of freedom to optimize. PAGaS produces highly detailed depths, as illustrated in Figure 1. We quantitatively validate these improvements on top of reference geometric and learning-based multi-view stereo baselines on challenging 3D reconstruction benchmarks. Code: davidrecasens.github.io/pagas

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces Pixel-Aligned 1DoF Gaussian Splatting (PAGaS) as an adaptation of Gaussian Splatting for multi-view stereo depth refinement. Each pixel's depth is modeled by a Gaussian whose 3D position and 2D size are fixed by the back-projected frustum of the reference pixel, leaving only depth as a free optimization variable. The authors claim this constrained representation yields highly detailed depth maps that quantitatively outperform both classical geometric and recent learning-based MVS baselines on standard 3D reconstruction benchmarks.

Significance. If the 1DoF constraint proves sufficient, the approach offers a principled way to inject explicit geometric priors into Gaussian-based depth optimization, potentially improving efficiency and reducing overfitting relative to unconstrained 3DGS variants. The explicit linkage between pixel volumes and Gaussian parameters is a clear technical distinction from prior constrained splatting work. However, the significance hinges on whether the fixed footprint and ray-aligned placement can represent slanted or high-curvature surfaces without systematic bias; the abstract's benchmark claims cannot be fully evaluated without the detailed experimental section.

major comments (3)
  1. [Abstract] Abstract: the claim of 'highly detailed depths' and benchmark gains rests on the untested assumption that fixing Gaussian positions and sizes to back-projected pixel volumes still supplies enough expressivity; no equation or derivation shows how the model accommodates surface tilt or sub-pixel structure, which is load-bearing for the central contribution.
  2. [Method] Method section (key contribution paragraph): the statement that 'positions and sizes are restricted by the back-projected pixel volumes, leaving depth as the sole degree of freedom' is presented without an accompanying analysis or ablation demonstrating that this restriction does not introduce bias on non-fronto-parallel surfaces; such evidence is required to substantiate the modeling-power claim.
  3. [Experiments] Experiments: quantitative validation is asserted but the abstract supplies no error bars, ablation tables isolating the 1DoF constraint, or failure-case analysis; without these, the reported improvements over baselines cannot be assessed for statistical significance or robustness.
minor comments (2)
  1. [Abstract] Abstract: the code URL is given but no statement appears regarding reproducibility (e.g., random seeds, exact benchmark splits, or hyper-parameter ranges).
  2. [Figure 1] Figure 1 caption: the visual comparison would benefit from explicit annotation of regions where the 1DoF model succeeds or fails relative to baselines.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on the expressivity of the 1DoF constraint and the need for stronger validation. We address each major comment below and have revised the manuscript to add the requested analysis, ablations, and statistical details.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim of 'highly detailed depths' and benchmark gains rests on the untested assumption that fixing Gaussian positions and sizes to back-projected pixel volumes still supplies enough expressivity; no equation or derivation shows how the model accommodates surface tilt or sub-pixel structure, which is load-bearing for the central contribution.

    Authors: We agree the abstract is concise and does not include the supporting derivation. The method section formulates each Gaussian as ray-aligned with its 3D center and 2D covariance fixed by the back-projected pixel frustum, so that optimizing the single depth parameter moves the Gaussian along the ray while the fixed footprint projects to cover the pixel. Neighboring pixels with different optimized depths naturally represent tilt, and the continuous depth optimization captures sub-pixel detail. To make this explicit in the abstract, we have added a one-sentence reference to the 1DoF formulation and moved a short derivation to the main text. revision: partial

  2. Referee: [Method] Method section (key contribution paragraph): the statement that 'positions and sizes are restricted by the back-projected pixel volumes, leaving depth as the sole degree of freedom' is presented without an accompanying analysis or ablation demonstrating that this restriction does not introduce bias on non-fronto-parallel surfaces; such evidence is required to substantiate the modeling-power claim.

    Authors: The referee correctly notes that the original submission lacked an explicit bias analysis. We have now inserted a short geometric argument in the method section showing that the ray-aligned 1DoF model can represent slanted surfaces because depth is optimized independently per pixel and the splatting kernel aggregates contributions across overlapping frustums. We have also added an ablation on the ETH3D and Tanks & Temples subsets containing predominantly non-fronto-parallel surfaces, comparing PAGaS against an unconstrained 3DGS baseline; the 1DoF version shows no measurable increase in error on slanted regions while remaining more stable. revision: yes

  3. Referee: [Experiments] Experiments: quantitative validation is asserted but the abstract supplies no error bars, ablation tables isolating the 1DoF constraint, or failure-case analysis; without these, the reported improvements over baselines cannot be assessed for statistical significance or robustness.

    Authors: We accept that the abstract and main experimental tables originally omitted error bars and a dedicated 1DoF ablation. The revised manuscript now reports mean and standard deviation over three random seeds for all quantitative tables, includes a new ablation table that isolates the effect of the 1DoF constraint versus full 3DGS and 2DGS variants, and adds a failure-case subsection discussing residual errors on high-curvature and specular regions with qualitative examples. These additions allow direct assessment of statistical significance and robustness. revision: yes

Circularity Check

0 steps flagged

No circularity: explicit 1DoF constraint is a direct modeling choice, not a self-referential reduction.

full rationale

The paper defines its PAGaS representation by construction as 1DoF Gaussians whose positions and sizes are fixed to back-projected pixel volumes, with depth as the only free parameter. This is an explicit design decision for adapting Gaussian Splatting to depth refinement, not a derivation whose output reduces to its inputs via equations or self-citation. No load-bearing steps match the enumerated circular patterns; the method is presented as a constrained optimization whose sufficiency is tested externally on benchmarks rather than assumed tautologically.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The approach rests on standard assumptions of differentiable rendering and multi-view geometry; no explicit free parameters or invented entities are named in the abstract, but the 1DoF restriction itself is an ad-hoc modeling choice.

axioms (2)
  • domain assumption Back-projection of image pixels defines valid 3D volumes that contain the true surface point
    Invoked when restricting Gaussian position and size to these volumes
  • domain assumption Optimization of the single depth parameter converges to a geometrically accurate solution
    Required for the claim that the method produces highly detailed depths
invented entities (1)
  • 1DoF Gaussian no independent evidence
    purpose: To represent pixel depth with only depth as optimizable variable while position and size are fixed by pixel volume
    New modeling primitive introduced for the depth task

pith-pipeline@v0.9.0 · 5504 in / 1395 out tokens · 35504 ms · 2026-05-08T12:33:46.230963+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

56 extracted references · 16 canonical work pages · 1 internal anchor

  1. [1] Milena T. Bagdasarian, Paul Knoll, Yi-Hsin Li, Florian Barthel, Anna Hilsmann, Peter Eisert, and Wieland Morgenstern. 3DGS.zip: A survey on 3D Gaussian splatting compression methods. arXiv preprint arXiv:2407.09510, 2024.
  2. [2] Chenjie Cao, Xinlin Ren, and Yanwei Fu. MVSFormer: Multi-view stereo by learning robust image features and temperature-based depth. arXiv preprint arXiv:2208.02541.
  3. [3] Chenjie Cao, Xinlin Ren, and Yanwei Fu. MVSFormer++: Revealing the devil in transformer's details for multi-view stereo. arXiv preprint arXiv:2401.11673, 2024.
  4. [4] David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann. pixelSplat: 3D Gaussian splats from image pairs for scalable generalizable 3D reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19457–19467, 2024.
  5. [5] Danpeng Chen, Hai Li, Weicai Ye, Yifan Wang, Weijian Xie, Shangjin Zhai, Nan Wang, Haomin Liu, Hujun Bao, and Guofeng Zhang. PGSR: Planar-based Gaussian splatting for efficient and high-fidelity surface reconstruction. arXiv preprint arXiv:2406.06521, 2024.
  6. [6] Hanlin Chen, Chen Li, and Gim Hee Lee. NeuSG: Neural implicit surface reconstruction with 3D Gaussian splatting guidance. arXiv preprint arXiv:2312.00846, 2023.
  7. [7] Hanlin Chen, Fangyin Wei, Chen Li, Tianxin Huang, Yunsong Wang, and Gim Hee Lee. VCR-GauS: View consistent depth-normal regularizer for Gaussian surface reconstruction. arXiv preprint arXiv:2406.05774, 2024.
  8. [8] Timothy Chen, Ola Shorinwa, Joseph Bruno, Aiden Swann, Javier Yu, Weijia Zeng, Keiko Nagami, Philip Dames, and Mac Schwager. Splat-Nav: Safe real-time robot navigation in Gaussian splatting maps. arXiv preprint arXiv:2403.02751.
  9. [9] Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. MVSplat: Efficient 3D Gaussian splatting from sparse multi-view images. In European Conference on Computer Vision, pages 370–386. Springer, 2024.
  10. [10] Seokhun Choi, Hyeonseop Song, Jaechul Kim, Taehyeong Kim, and Hoseok Do. Click-Gaussian: Interactive segmentation to any 3D Gaussians. In European Conference on Computer Vision, pages 289–305. Springer, 2024.
  11. [11] Brian Curless and Marc Levoy. A volumetric method for building complex models from range images. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, pages 303–312, 1996.
  12. [12] Guangchi Fang and Bing Wang. Mini-Splatting2: Building 360 scenes within minutes via aggressive Gaussian densification. arXiv preprint arXiv:2411.12788, 2024.
  13. [13] Ben Fei, Jingyi Xu, Rui Zhang, Qingyuan Zhou, Weidong Yang, and Ying He. 3D Gaussian as a new vision era: A survey. arXiv e-prints, 2024.
  14. [14] A. Guédon and V. Lepetit. SuGaR: Surface-aligned Gaussian splatting … CVPR, 2024.
  15. [15] Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2D Gaussian splatting for geometrically accurate radiance fields. In SIGGRAPH 2024 Conference Papers. Association for Computing Machinery, 2024.
  16. [16] Mustafa Işık, Martin Rünz, Markos Georgopoulos, Taras Khakhulin, Jonathan Starck, Lourdes Agapito, and Matthias Nießner. HumanRF: High-fidelity neural radiance fields for humans in motion. ACM Transactions on Graphics (TOG), 42(4):1–12, 2023.
  17. [17] Sergio Izquierdo, Mohamed Sayed, Michael Firman, Guillermo Garcia-Hernando, Daniyar Turmukhambetov, Javier Civera, Oisin Mac Aodha, Gabriel Brostow, and Jamie Watson. MVSAnywhere: Zero-shot multi-view stereo. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 11493–11504, 2025.
  18. [18] Rasmus Jensen, Anders Dahl, George Vogiatzis, Engil Tola, and Henrik Aanæs. Large scale multi-view stereopsis evaluation. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 406–413. IEEE, 2014.
  19. [20] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4).
  20. [21] Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. Tanks and Temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics (ToG), 36(4):1–13, 2017.
  21. [22] Zhaoshuo Li, Thomas Müller, Alex Evans, Russell H. Taylor, Mathias Unberath, Ming-Yu Liu, and Chen-Hsuan Lin. Neuralangelo: High-fidelity neural surface reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8456–8465, 2023.
  22. [23] Youtian Lin, Zuozhuo Dai, Siyu Zhu, and Yao Yao. Gaussian-Flow: 4D reconstruction with dynamic 3D Gaussian particle. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21136–21145, 2024.
  23. [24] William E. Lorensen and Harvey E. Cline. Marching cubes: A high resolution 3D surface construction algorithm. In Seminal Graphics: Pioneering Efforts that Shaped the Field, pages 347–353. ACM, 1998.
  24. [25] T. Berriel Martins and Javier Civera. Feature splatting for better novel view synthesis with low overlap. In British Machine Vision Conference, 2024.
  25. [26] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
  26. [27] Minghan Qin, Wanhua Li, Jiawei Zhou, Haoqian Wang, and Hanspeter Pfister. LangSplat: 3D language Gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20051–20060, 2024.
  27. [28] Fabio Remondino, Ali Karami, Ziyang Yan, Gabriele Mazzacca, Simone Rigon, and Rongjun Qin. A critical analysis of NeRF-based 3D reconstruction. Remote Sensing, 15(14):3585, 2023.
  28. [29] Johannes L. Schonberger and Jan-Michael Frahm. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4104–4113, 2016.
  29. [30] Johannes L. Schönberger, Enliang Zheng, Jan-Michael Frahm, and Marc Pollefeys. Pixelwise view selection for unstructured multi-view stereo. In European Conference on Computer Vision, pages 501–518. Springer, 2016.
  30. [31] Johannes Lutz Schönberger, Enliang Zheng, Marc Pollefeys, and Jan-Michael Frahm. Pixelwise view selection for unstructured multi-view stereo. In European Conference on Computer Vision (ECCV), 2016.
  31. [32] Philipp Schröppel, Jan Bechtold, Artemij Amiranashvili, and Thomas Brox. A benchmark and a baseline for robust multi-view depth estimation. In 2022 International Conference on 3D Vision (3DV), pages 637–645. IEEE, 2022.
  32. [33] Jin-Chuan Shi, Miao Wang, Hao-Bin Duan, and Shao-Hua Guan. Language embedded 3D Gaussians for open-vocabulary scene understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5333–5343, 2024.
  33. [34] Cheng Sun, Min Sun, and Hwann-Tzong Chen. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5459–5469, 2022.
  34. [35] Stanislaw Szymanowicz, Christian Rupprecht, and Andrea Vedaldi. Splatter Image: Ultra-fast single-view 3D reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10208–10217, 2024.
  35. [36] Fabio Tosi, Youmin Zhang, Ziren Gong, Erik Sandström, Stefano Mattoccia, Martin R. Oswald, and Matteo Poggi. How NeRFs and 3D Gaussian splatting are reshaping SLAM: A survey. arXiv preprint arXiv:2402.13255, 2024.
  36. [37] Briac Toussaint, Diego Thomas, and Jean-Sébastien Franco. ProbeSDF: Light field probes for neural surface reconstruction. arXiv preprint arXiv:2412.10084, 2024.
  37. [38] Matias Turkulainen, Xuqian Ren, Iaroslav Melekhov, Otto Seiskari, Esa Rahtu, and Juho Kannala. DN-Splatter: Depth and normal priors for Gaussian splatting and meshing. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 2421–2431. IEEE, 2025.
  38. [39] Fangjinhua Wang, Silvano Galliani, Christoph Vogel, Pablo Speciale, and Marc Pollefeys. PatchmatchNet: Learned multi-view patchmatch stereo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14194–14203, 2021.
  39. [40] Fangjinhua Wang, Marie-Julie Rakotosaona, Michael Niemeyer, Richard Szeliski, Marc Pollefeys, and Federico Tombari. UniSDF: Unifying neural representations for high-fidelity 3D reconstruction of complex scenes with reflections. In NeurIPS, 2024.
  40. [41] Fangjinhua Wang, Qingtian Zhu, Di Chang, Quankai Gao, Junlin Han, Tong Zhang, Richard Hartley, and Marc Pollefeys. Learning-based multi-view stereo: A survey. arXiv preprint arXiv:2408.15235, 2024.
  41. [42] Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, and Wenping Wang. NeuS: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. arXiv preprint arXiv:2106.10689, 2021.
  42. [43] Yiming Wang, Qin Han, Marc Habermann, Kostas Daniilidis, Christian Theobalt, and Lingjie Liu. NeuS2: Fast learning of neural implicit surfaces for multi-view reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023.
  43. [44] Tong Wu, Jiaqi Wang, Xingang Pan, Xudong Xu, Christian Theobalt, Ziwei Liu, and Dahua Lin. Voxurf: Voxel-based efficient and accurate neural surface reconstruction. arXiv preprint arXiv:2208.12697, 2022.
  44. [45] Haofei Xu, Songyou Peng, Fangjinhua Wang, Hermann Blum, Daniel Barath, Andreas Geiger, and Marc Pollefeys. DepthSplat: Connecting Gaussian splatting and depth. arXiv preprint arXiv:2410.13862, 2024.
  45. [46] Zhenpei Yang, Zhile Ren, Qi Shan, and Qixing Huang. MVS2D: Efficient multi-view stereo via attention-driven 2D convolutions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8574–8584, 2022.
  46. [47] Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, and Xiaogang Jin. Deformable 3D Gaussians for high-fidelity monocular dynamic scene reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20331–20341, 2024.
  47. [48] Yao Yao, Zixin Luo, Shiwei Li, Jingyang Zhang, Yufan Ren, Lei Zhou, Tian Fang, and Long Quan. BlendedMVS: A large-scale dataset for generalized multi-view stereo networks. Computer Vision and Pattern Recognition (CVPR).
  48. [49] Mingqiao Ye, Martin Danelljan, Fisher Yu, and Lei Ke. Gaussian Grouping: Segment and edit anything in 3D scenes. In European Conference on Computer Vision, pages 162–…
  49. [50] Vickie Ye, Ruilong Li, Justin Kerr, Matias Turkulainen, Brent Yi, Zhuoyang Pan, Otto Seiskari, Jianbo Ye, Jeffrey Hu, Matthew Tancik, and Angjoo Kanazawa. gsplat: An open-source library for Gaussian splatting, 2024.
  50. [51] Zehao Yu and Shenghua Gao. Fast-MVSNet: Sparse-to-dense multi-view stereo with learned propagation and Gauss-Newton refinement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1949–1958, 2020.
  51. [52] Baowen Zhang, Chuan Fang, Rakesh Shrestha, Yixun Liang, Xiaoxiao Long, and Ping Tan. RaDe-GS: Rasterizing depth in Gaussian splatting. arXiv preprint arXiv:2406.01467, 2024.
  52. [53] Jiawei Zhang, Youmin Zhang, Fabio Tosi, Meiying Gu, Jiahe Li, Xiaohan Yu, Jin Zheng, Xiao Bai, and Matteo Poggi. EVE3D: Elevating vision models for enhanced 3D surface reconstruction via Gaussian splatting. In The Thirty-ninth Annual Conference on Neural Information Processing Systems.
  53. [54] Jingyang Zhang, Shiwei Li, Zixin Luo, Tian Fang, and Yao Yao. Vis-MVSNet: Visibility-aware multi-view stereo network. International Journal of Computer Vision, 131(1):199–214, 2023.
  54. [55] Ziyu Zhang, Binbin Huang, Hanqing Jiang, Liyang Zhou, Xiaojun Xiang, and Shunhan Shen. Quadratic Gaussian splatting for efficient and detailed surface reconstruction. arXiv preprint arXiv:2411.16392, 2024.
  55. [56] Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Yan-Pei Cao, and Song-Hai Zhang. Triplane meets Gaussian splatting: Fast and generalizable single-view 3D reconstruction with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10324–10335, 2024.
  56. [57] Matthias Zwicker, Hanspeter Pfister, Jeroen Van Baar, and Markus Gross. EWA volume splatting. In Proceedings Visualization, 2001 (VIS'01), pages 29–538. IEEE, 2001.