2D-SuGaR: Surface-Aware Gaussian Splatting for Geometrically Accurate Mesh Reconstruction

Divyam Sheth; Jinjoo Ha; Justus Thies; Mirela Ostrek; Prajwal Gupta C. R.

arXiv: 2605.00569 · v1 · submitted 2026-05-01 · cs.CV · cs.GR

Pith reviewed 2026-05-09 19:15 UTC · model grok-4.3

keywords 2D Gaussian Splatting · mesh reconstruction · monocular depth priors · surface reconstruction · novel view synthesis · DTU dataset · geometric accuracy

The pith

Monocular depth and normal priors guide 2D Gaussian Splatting to produce more accurate surface meshes from multi-view images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to fix a key weakness in 2D Gaussian Splatting where reliance on Structure-from-Motion points often leads to poor surface geometry on difficult scenes. By feeding monocular depth and normal estimates into the initialization of the Gaussian primitives and using a clustering step to remove bad ones, the approach aims to enforce better surface awareness while keeping fast, high-quality rendering. A reader would care because many practical uses of 3D reconstruction, from robotics to content creation, need both precise meshes and photorealistic views from the same model. If successful, the method shows that cheap single-image priors can compensate for unreliable multi-view geometry without extra sensors.

Core claim

The authors present 2D-SuGaR as an enhancement to 2D Gaussian Splatting that adds monocular depth and normal priors for depth-guided Gaussian initialization together with a clustering-based pruning step that removes degenerate primitives. This combination reduces sensitivity to weak SfM initializations. On the DTU dataset the resulting meshes reach state-of-the-art geometric accuracy while novel-view rendering quality remains comparable to the original 2DGS baseline.

What carries the argument

Depth-guided initialization of 2D Gaussians paired with clustering-based pruning of degenerate primitives, which uses monocular priors to steer placement and remove unreliable splats.
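
The abstract does not spell out how depth-guided initialization works, so the following is only a minimal sketch of one plausible reading: back-project a monocular depth map through a pinhole camera and seed one oriented 2D Gaussian (surfel) per sampled pixel. The function name `init_surfels_from_depth`, the `stride` parameter, and the pixel-footprint scale rule are assumptions for illustration, not the authors' implementation.

```python
import math


def init_surfels_from_depth(depth, normals, fx, fy, cx, cy, stride=2):
    """Seed surfels from a depth map and a normal map (hypothetical helper).

    depth   : list of rows of camera-space z values
    normals : list of rows of (nx, ny, nz) tuples, same layout as depth
    fx, fy, cx, cy : pinhole intrinsics in pixels
    """
    surfels = []
    for v in range(0, len(depth), stride):
        for u in range(0, len(depth[v]), stride):
            z = depth[v][u]
            if not math.isfinite(z) or z <= 0.0:
                continue  # skip pixels without a usable depth estimate
            nx, ny, nz = normals[v][u]
            norm = math.sqrt(nx * nx + ny * ny + nz * nz)
            if norm == 0.0:
                continue  # skip pixels without a usable normal estimate
            # Pinhole back-projection of pixel (u, v) at depth z.
            center = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            surfels.append({
                "center": center,
                "normal": (nx / norm, ny / norm, nz / norm),
                # Assumed rule: initial size equals the sampled pixel footprint.
                "scale": stride * z / fx,
            })
    return surfels


# Tiny 2x2 example; one pixel has invalid depth and is skipped.
depth = [[2.0, 2.0], [0.0, 4.0]]
normals = [[(0.0, 0.0, -1.0)] * 2 for _ in range(2)]
surfels = init_surfels_from_depth(
    depth, normals, fx=100.0, fy=100.0, cx=0.5, cy=0.5, stride=1
)
```

A real pipeline would additionally transform the camera-space centers and normals into world space with each view's pose, which is omitted here.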

If this is right

Mesh reconstruction accuracy improves on standard multi-view benchmarks such as DTU.
The system remains robust when Structure-from-Motion initialization is poor.
High-quality novel view synthesis is preserved alongside the geometric gains.
Surface-aware Gaussians become usable for applications that require both geometry and rendering.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same prior-guided pruning could be applied to other splatting variants that currently suffer from floating artifacts.
Scenes with strong textureless regions might still require additional regularization beyond the monocular priors.
Integration with video sequences could test whether temporal consistency further stabilizes the depth guidance.

Load-bearing premise

Monocular depth and normal estimates are accurate enough to correctly place Gaussians and identify degenerate ones even when SfM points are sparse or noisy.

What would settle it

Running the method on DTU scenes with deliberately degraded SfM points and finding no improvement in mesh error metrics such as Chamfer distance compared with plain 2DGS would disprove the central claim.
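
As a rough illustration of the metric this test would compare, here is a brute-force symmetric Chamfer distance between two small point sets. It is a sketch only: the official DTU evaluation involves masking, resampling, and separate accuracy and completeness terms that are not reproduced here, and `chamfer_distance` is a hypothetical helper name.

```python
import math


def nearest_distance(point, cloud):
    """Euclidean distance from `point` to its nearest neighbour in `cloud`."""
    return min(math.dist(point, q) for q in cloud)


def chamfer_distance(a, b):
    """Average of the two directed mean nearest-neighbour distances."""
    a_to_b = sum(nearest_distance(p, b) for p in a) / len(a)
    b_to_a = sum(nearest_distance(q, a) for q in b) / len(b)
    return 0.5 * (a_to_b + b_to_a)


# Reference surface samples and a reconstruction offset by 0.1 along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reconstruction = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)]
cd = chamfer_distance(reference, reconstruction)
```

The brute-force search is quadratic in the number of points, so anything beyond a toy example would normally use a spatial index such as a k-d tree.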

Figures

Figures reproduced from arXiv: 2605.00569 by Divyam Sheth, Jinjoo Ha, Justus Thies, Mirela Ostrek, Prajwal Gupta C. R.

**Figure 1.** Overview. We propose 2D-SuGaR, a method to reconstruct 3D meshes from multi-view input images by leveraging 2D Gaussian Splatting. We initialize and regularize the 2D Gaussian primitives with pretrained normal and depth priors, and extract a mesh from this volumetric representation. We refine the mesh afterwards with additional re-rendering losses.

**Figure 2.** Effect of normal loss. Our new normal loss reduces 2DGS reconstruction artifacts, especially in specular regions, and leads to a smoother and more faithful representation of the surface.

… that the Gaussians remain bound to their respective triangles during optimization, we specify the Gaussian means in barycentric coordinates with respect to the mesh vertices. We refine it using $L_r = L_p + \gamma L_{\mathrm{Lap}} + \delta L_m$ (3), w…

**Figure 3.** Qualitative comparison of meshes on the DTU benchmark. Our method produces more detailed and complete meshes as can be seen in the zoom-ins.

| Method | 24 | 37 | 40 | 55 | 63 | 65 | 69 | 83 | 97 | 105 | 106 | 110 | 114 | 118 | 122 | Mean↓ | Time↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NeRF [MST∗20] | 1.90 | 1.60 | 1.85 | 0.58 | 2.28 | 1.27 | 1.47 | 1.67 | 2.05 | 1.07 | 0.88 | 2.53 | 1.06 | 1.15 | 0.96 | 1.49 | >12h |
| VolSDF [YGKL21] | 1.14 | 1.26 | 0.81 | 0.49 | 1.25 | 0.70 | 0.72 | 1.29 | 1.18 | 0.70 | 0.66 | 1.08 | 0.42 | 0.61 | 0.55 | 0.86 | >12… |

**Figure 4.** Qualitative comparison to 2DGS. Our meshes have fine geometric details such as windows, eyes, and feathers.

… dimension ($s_z = 10^{-3}$) to 2D surfels and refines using 3D Gaussian rasterization with losses from [KKLD23] and [GL24]. Strict 2D maintains rigid surfels using a surfel rasterizer. Without additional losses, results degrade as primitives drift along depth axis. With the regularization losses from [H…
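
The text excerpt next to Figure 2 says the Gaussian means are specified in barycentric coordinates with respect to the mesh vertices so that the Gaussians stay bound to their triangles. A minimal sketch of that parameterization follows; `barycentric_to_world` is a hypothetical helper, and the paper's actual data layout is not given in the excerpt.

```python
def barycentric_to_world(weights, triangle):
    """Map barycentric weights (w0, w1, w2) to a 3D point on a triangle.

    triangle is a tuple of three vertices, each an (x, y, z) tuple.
    """
    w0, w1, w2 = weights
    if abs(w0 + w1 + w2 - 1.0) > 1e-9:
        raise ValueError("barycentric weights must sum to 1")
    v0, v1, v2 = triangle
    return tuple(w0 * a + w1 * b + w2 * c for a, b, c in zip(v0, v1, v2))


weights = (0.5, 0.25, 0.25)
triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
mean_before = barycentric_to_world(weights, triangle)

# Moving a vertex (as mesh refinement would) moves the bound mean with it.
moved = ((0.0, 0.0, 2.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
mean_after = barycentric_to_world(weights, moved)
```

Because only the weights are stored per Gaussian, any update to the vertex positions carries the Gaussian along with its triangle, which is the binding behaviour the excerpt describes.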

Original abstract

3D Gaussian Splatting (3DGS) has emerged as a powerful technique for generating photorealistic renderings of a scene in real-time. However, the volumetric nature of 3DGS limits its ability to accurately capture surface geometry. To address this, 2D Gaussian Splatting (2DGS) was proposed to enable view-consistent and geometrically accurate surface reconstruction from multi-view images. However, 2DGS can be sensitive to the initialization of the Gaussian primitives. Reliance on Structure-from-Motion (SfM) initializations, which can produce poor estimates on challenging image sets, may lead to subpar results. In this work, we enhance 2DGS by incorporating monocular depth and normal priors to improve both geometric accuracy and robustness. We propose a depth-guided initialization strategy for Gaussians and introduce a clustering-based technique for pruning degenerate Gaussians. We evaluate our method on the DTU dataset, where it achieves state-of-the-art results in mesh reconstruction while preserving high-quality novel view synthesis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

2D-SuGaR adds depth-guided init and clustering pruning to 2DGS to handle weak SfM starts, but the DTU claims rest on unmeasured monocular prior accuracy.

The core contribution is a pair of practical fixes for 2D Gaussian Splatting: using monocular depth and normals to place the initial primitives more reliably, plus a clustering step to drop degenerate ones. This directly targets the known fragility of 2DGS when SfM gives poor starting points on hard scenes. The idea is simple and fits naturally into the existing pipeline without adding heavy machinery. If the mesh gains on DTU turn out to be real while novel-view quality stays high, the work gives people a usable route to extractable surfaces from a renderer that is already popular. The paper builds cleanly on prior 2DGS and monocular estimation results and keeps the focus on the geometry problem rather than chasing unrelated bells and whistles.

The main gap is the missing evidence on the priors themselves. The abstract asserts state-of-the-art mesh results on DTU, yet the text supplies no numbers, no baseline tables, and no ablation that isolates the new initialization and pruning steps. More critically, there is no reported error for the chosen monocular depth and normal estimator against DTU ground truth on the evaluated scenes. Without that, it is impossible to tell whether the reported improvement comes from the proposed techniques or from the priors happening to be decent on those particular views. DTU contains specular and textureless regions where monocular estimates often degrade, so the robustness claim stays untested.

This paper is aimed at researchers already working with Gaussian splatting who need better surface output for downstream tasks like meshing or content creation. A reader who wants concrete implementation ideas for initialization and pruning will find value even before the numbers are tightened. It deserves a serious referee because the problem is real, the proposed solution is plausible, and the field would benefit from a properly documented version of this approach.

Referee Report

3 major / 1 minor

Summary. The paper proposes 2D-SuGaR, an extension of 2D Gaussian Splatting (2DGS) that incorporates monocular depth and normal priors to enable depth-guided initialization of Gaussian primitives and a clustering-based pruning strategy for removing degenerate Gaussians. It claims that this yields state-of-the-art mesh reconstruction accuracy on the DTU dataset while preserving high-quality novel-view synthesis.

Significance. If the quantitative claims hold, the work would be a useful incremental advance for surface-aware Gaussian splatting, addressing a known sensitivity of 2DGS to SfM initialization quality by leveraging readily available monocular estimators. The combination of depth-guided seeding and explicit degeneracy pruning is a concrete, implementable idea that could improve geometric fidelity in multi-view reconstruction pipelines.

major comments (3)

[Abstract] Abstract: the central claim that the method 'achieves state-of-the-art results in mesh reconstruction' on DTU is unsupported by any quantitative metrics, baseline tables, Chamfer distances, normal consistency scores, or ablation results in the provided text. Without these, the SOTA assertion cannot be verified and the contribution of the proposed initialization and pruning steps remains unquantified.
[Method] Method description (implied in abstract): the approach assumes monocular depth and normal priors are sufficiently accurate to guide initialization and enable effective pruning when SfM is poor, yet no error statistics (depth MAE, normal angular error) of the chosen monocular estimator versus DTU ground-truth geometry are reported on the evaluated scenes. This leaves open whether observed gains derive from the priors themselves or from other implementation choices, and whether performance would degrade on specular or textureless DTU regions where monocular estimates typically fail.
[Experiments] Evaluation: no ablation studies isolating the depth-guided initialization versus the clustering pruning, nor comparisons against the original 2DGS or prior SuGaR variants, are described. This makes it impossible to determine which component drives any reported improvement and whether the method is robust beyond the specific DTU scenes tested.
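
The error statistics requested in the second comment could be computed along the following lines. This is a hedged sketch over flattened per-pixel lists; the validity rule (ground-truth depth greater than zero) and the function names are assumptions, and it does not reflect any protocol used by the authors.

```python
import math


def depth_mae(pred, gt):
    """Mean absolute depth error over pixels with valid ground truth (gt > 0)."""
    errors = [abs(p - g) for p, g in zip(pred, gt) if g > 0.0]
    return sum(errors) / len(errors)


def mean_angular_error_deg(pred_normals, gt_normals):
    """Mean angle in degrees between predicted and ground-truth normals."""
    total = 0.0
    for p, g in zip(pred_normals, gt_normals):
        dot = sum(pi * gi for pi, gi in zip(p, g))
        norm = math.hypot(*p) * math.hypot(*g)
        # Clamp to guard against rounding slightly outside [-1, 1].
        cosine = max(-1.0, min(1.0, dot / norm))
        total += math.degrees(math.acos(cosine))
    return total / len(pred_normals)


# The third depth pixel has no ground truth and is ignored.
mae = depth_mae([1.0, 2.5, 3.0], [1.0, 2.0, 0.0])
angle = mean_angular_error_deg(
    [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)],
)
```

Reporting both numbers per scene, and separately for specular or textureless regions, would speak directly to the concern raised above.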

minor comments (1)

[Abstract] The abstract mentions 'clustering-based technique for pruning degenerate Gaussians' without defining the clustering criterion or distance metric used; a brief equation or pseudocode would clarify the procedure.
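
To make the request concrete, here is one way such a procedure might look. The reference list includes the GDBSCAN paper, so a density-based clustering of Gaussian centers is a plausible reading, but the criterion below (Euclidean distance between centers, noise points treated as degenerate) and the parameters `eps` and `min_pts` are assumptions for illustration, not the authors' stated method.

```python
import math


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: return a cluster id per point, or -1 for noise."""
    UNVISITED, NOISE = -2, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Includes i itself, as in the standard definition.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point, keep expanding
        cluster += 1
    return labels


def prune_isolated_gaussians(centers, eps, min_pts):
    """Return indices of Gaussians whose centers belong to some cluster."""
    labels = dbscan(centers, eps, min_pts)
    return [i for i, label in enumerate(labels) if label != -1]


centers = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
           (0.1, 0.1, 0.0), (5.0, 5.0, 5.0)]
kept = prune_isolated_gaussians(centers, eps=0.2, min_pts=3)
```

In this toy case the four nearby centers form one cluster and the distant fifth center is labeled noise and dropped. A degeneracy test based on scale, opacity, or normal agreement would need a different feature space.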

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that additional quantitative details are needed to support the claims and will revise the manuscript accordingly.

Referee: [Abstract] Abstract: the central claim that the method 'achieves state-of-the-art results in mesh reconstruction' on DTU is unsupported by any quantitative metrics, baseline tables, Chamfer distances, normal consistency scores, or ablation results in the provided text. Without these, the SOTA assertion cannot be verified and the contribution of the proposed initialization and pruning steps remains unquantified.

Authors: We agree the abstract should include key quantitative results for verifiability. The full manuscript contains tables in the Experiments section with Chamfer distances, normal consistency scores, and comparisons to baselines including 2DGS. We will revise the abstract to summarize these metrics (e.g., specific Chamfer and normal consistency improvements) to directly support the SOTA claim and highlight the contributions of initialization and pruning.

revision: yes

Referee: [Method] Method description (implied in abstract): the approach assumes monocular depth and normal priors are sufficiently accurate to guide initialization and enable effective pruning when SfM is poor, yet no error statistics (depth MAE, normal angular error) of the chosen monocular estimator versus DTU ground-truth geometry are reported on the evaluated scenes. This leaves open whether observed gains derive from the priors themselves or from other implementation choices, and whether performance would degrade on specular or textureless DTU regions where monocular estimates typically fail.

Authors: We will add a new table in the Experiments section reporting depth MAE and normal angular error of the monocular estimator against DTU ground truth on the evaluated scenes. We will also include discussion of performance on specular and textureless regions, noting any observed limitations or robustness measures.

revision: yes

Referee: [Experiments] Evaluation: no ablation studies isolating the depth-guided initialization versus the clustering pruning, nor comparisons against the original 2DGS or prior SuGaR variants, are described. This makes it impossible to determine which component drives any reported improvement and whether the method is robust beyond the specific DTU scenes tested.

Authors: We will expand the Experiments section with ablation studies that isolate the depth-guided initialization from the clustering pruning. Direct quantitative comparisons to original 2DGS and prior SuGaR variants will be added using the same DTU metrics. Additional results on a broader set of scenes will be included to assess robustness.

revision: yes

Circularity Check

0 steps flagged

No circularity: method extends external priors and prior 2DGS without self-referential reductions

The paper describes an enhancement to 2D Gaussian Splatting by adding monocular depth and normal priors (from separate estimators) for depth-guided Gaussian initialization and clustering-based pruning of degenerate primitives. These components are presented as extensions of existing techniques rather than derivations that reduce to the paper's own fitted quantities or self-defined terms. No equations or claims in the provided text equate a 'prediction' to its input by construction, and the SOTA mesh reconstruction claim on DTU is framed as an empirical outcome, not a mathematical necessity derived from the method's definitions. The approach is self-contained against external benchmarks and prior work without load-bearing self-citation chains or ansatz smuggling.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review prevents identification of concrete free parameters, axioms, or invented entities; the method implicitly relies on the accuracy of existing monocular depth/normal estimators and the 2DGS surface representation.

pith-pipeline@v0.9.0 · 5500 in / 1056 out tokens · 46882 ms · 2026-05-09T19:15:37.948851+00:00 · methodology

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

[1] Metric3D v2: A versatile monocular geometric foundation model for zero-shot metric depth and surface normal estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] 3D Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph.

[3] 2D Gaussian splatting for geometrically accurate radiance fields. ACM SIGGRAPH 2024 Conference Papers, 2024.

[4] SuGaR: Surface-aligned Gaussian splatting for efficient 3D mesh reconstruction and high-quality mesh rendering. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5] NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. 2020.

[6] NeuS: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. arXiv.

[7] Volume rendering of neural implicit surfaces. Advances in Neural Information Processing Systems.

[8] Gaussian opacity fields: Efficient adaptive surface reconstruction in unbounded scenes. ACM Transactions on Graphics (ToG), 2024.

[9] RaDe-GS: Rasterizing depth in Gaussian splatting. arXiv.

[10] MILo: Mesh-in-the-loop Gaussian splatting for detailed and efficient surface reconstruction. ACM Transactions on Graphics (TOG), 2025.

[11] Large scale multi-view stereopsis evaluation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

[12] Density-based clustering in spatial databases: The algorithm GDBSCAN and its applications. Data Mining and Knowledge Discovery, 1998.
