pith. machine review for the scientific record.

arxiv: 2511.19172 · v4 · submitted 2025-11-24 · 💻 cs.CV

Recognition: 2 theorem links · Lean Theorem

MetroGS: Efficient and Stable Reconstruction of Geometrically Accurate High-Fidelity Large-Scale Scenes

Authors on Pith: no claims yet

Pith reviewed 2026-05-17 06:18 UTC · model grok-4.3

classification 💻 cs.CV
keywords Gaussian Splatting · large-scale scene reconstruction · geometric accuracy · urban environments · dense enhancement · hybrid optimization · appearance modeling · 3D reconstruction

The pith

MetroGS builds large-scale urban scenes with better geometric accuracy using distributed 2D Gaussians and hybrid refinement.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MetroGS to solve the problem of achieving both efficiency and high geometric fidelity when reconstructing complex urban environments with Gaussian Splatting techniques. It starts from a distributed 2D Gaussian representation and adds a dense enhancement step that draws on SfM priors plus a pointmap model, followed by progressive optimization that mixes monocular and multi-view signals and a depth-guided model that separates geometry from appearance. A sympathetic reader would care because existing methods frequently produce incomplete or inconsistent results on city-scale data. If the claims hold, the work would supply a single pipeline that produces both accurate shapes and stable renderings without separate post-processing stages.

Core claim

MetroGS establishes a distributed 2D Gaussian Splatting representation as the core backbone. It adds a structured dense enhancement scheme that uses SfM priors and a pointmap model to produce denser initialization together with a sparsity compensation mechanism. A progressive hybrid geometric optimization strategy then combines monocular and multi-view optimization for refinement. Finally, depth-guided appearance modeling learns spatially consistent features to decouple geometry from appearance and improve overall stability.
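The last of these modules, decoupling geometry from appearance through per-image embeddings, has a toy linear analogue. The sketch below is illustrative only: a closed-form log-space solve stands in for the learnable embedding l_i the paper attaches to each training image, and all names are invented.

```python
import numpy as np

def decouple_appearance(observed):
    """Toy geometry/appearance decoupling. Each image i observes a shared
    albedo (the geometry-consistent signal) scaled by an unknown per-image
    exposure e_i. Solving in log space recovers both, up to a global scale,
    mirroring the role the per-image embedding plays in the paper."""
    log_obs = np.log(observed)                       # (n_images, n_points)
    log_exposure = log_obs.mean(axis=1)              # per-image factor
    log_albedo = (log_obs - log_exposure[:, None]).mean(axis=0)
    return np.exp(log_albedo), np.exp(log_exposure)

# Two images of the same three points under different exposures.
albedo = np.array([1.0, 2.0, 4.0])
exposure = np.array([1.0, 2.0])
observed = np.outer(exposure, albedo)
alb_rec, exp_rec = decouple_appearance(observed)
# The product of the recovered factors reconstructs the observations, so
# appearance variation no longer contaminates the shared signal.
```

The scale ambiguity (albedo times a constant, exposures divided by it) is inherent to any such factorization; only the product is identifiable, which is all the rendering loss sees.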

What carries the argument

Distributed 2D Gaussian Splatting representation serves as the unified backbone that supports the subsequent dense enhancement, hybrid optimization, and appearance modeling modules.

If this is right

  • Denser initialization in sparse regions improves completeness of the final reconstruction.
  • Hybrid monocular and multi-view optimization produces more accurate geometry than either cue alone.
  • Depth-guided appearance modeling reduces inconsistencies across different viewpoints.
  • The combined pipeline yields both higher geometric accuracy and better rendering quality on city-scale data.
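The second bullet, hybrid monocular and multi-view optimization, can be sketched as a weighted depth loss with a progressive schedule. The linear schedule, L1 terms, and all names below are assumptions for illustration; the paper does not publish these exact choices.

```python
import numpy as np

def hybrid_depth_loss(d_render, d_mono, d_mv, mv_mask, step, total_steps):
    """Hedged sketch of progressive hybrid geometric optimization: trust
    the dense monocular prior early in training, then shift weight to the
    sparser but more accurate multi-view refined depth."""
    t = step / total_steps                     # training progress in [0, 1]
    w_mono, w_mv = 1.0 - t, t                  # linear hand-off, illustrative
    l_mono = np.abs(d_render - d_mono).mean()
    l_mv = np.abs((d_render - d_mv)[mv_mask]).mean() if mv_mask.any() else 0.0
    return w_mono * l_mono + w_mv * l_mv

d = np.ones((4, 4))
mask = np.ones((4, 4), dtype=bool)
# Early on, the monocular term dominates ...
early = hybrid_depth_loss(d, d + 0.1, d + 1.0, mask, step=0, total_steps=100)
# ... while late in training the multi-view term takes over.
late = hybrid_depth_loss(d, d + 0.1, d + 1.0, mask, step=100, total_steps=100)
```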

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same backbone might support incremental updates when new images arrive without restarting the entire optimization.
  • Replacing the pointmap model with a learned depth estimator trained on the target domain could reduce dependence on SfM quality.
  • The geometric emphasis could make the output directly usable for tasks such as path planning that require precise surface positions.

Load-bearing premise

SfM priors combined with a pointmap model will produce a denser and more complete initialization without introducing systematic geometric errors in complex urban environments.
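A minimal sketch of what this premise asks of the initialization stage: union the SfM cloud with pointmap predictions, then flag voxels that remain under-populated as candidates for the sparsity compensation mechanism. The voxel size and occupancy threshold are invented for the example, not the paper's values.

```python
import numpy as np

def dense_init(sfm_pts, pointmap_pts, voxel_size=1.0, min_pts=3):
    """Merge SfM and pointmap point clouds, then report which voxels are
    still sparse (and would need compensation). Brute-force sketch."""
    pts = np.vstack([sfm_pts, pointmap_pts])
    keys = np.floor(pts / voxel_size).astype(int)      # voxel index per point
    voxels, counts = np.unique(keys, axis=0, return_counts=True)
    sparse = voxels[counts < min_pts]                  # still under-populated
    return pts, sparse

sfm = np.random.default_rng(0).uniform(0, 1, (5, 3))   # one densely covered voxel
pointmap = np.array([[5.5, 5.5, 5.5]])                 # one lone prediction
merged, sparse_voxels = dense_init(sfm, pointmap)
```

The premise is exactly that `merged` is denser and more complete than `sfm` alone without importing systematic errors; the referee's major comment 1 targets the untested half of that statement.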

What would settle it

On a held-out large urban dataset, compute mean geometric error against ground-truth points. If MetroGS fails to reduce this error below the best prior Gaussian Splatting baselines, the core claim does not hold.
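The proposed test reduces to a point-to-point error metric. A brute-force version of the mean geometric error (a one-directional, Chamfer-style distance) might look like:

```python
import numpy as np

def mean_geometric_error(pred_pts, gt_pts):
    """Mean nearest-neighbour distance from predicted to ground-truth
    points. O(N*M) pairwise distances; fine for small clouds, and a
    simple stand-in for the metrics such an evaluation would report."""
    dists = np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=-1)
    return dists.min(axis=1).mean()

pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
gt = np.array([[0.0, 0.0, 0.1], [1.0, 0.0, 0.0]])
err = mean_geometric_error(pred, gt)    # (0.1 + 0.0) / 2 = 0.05
```

A symmetric Chamfer distance averages this quantity in both directions; at city scale a k-d tree or GPU nearest-neighbour search replaces the pairwise matrix.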

Figures

Figures reproduced from arXiv: 2511.19172 by Feng Dai, Hao Jiang, Honglong Zhao, Kehua Chen, Shuqin Gao, Tianlu Mao, Xinzhu Ma, Yucheng Zhang, Zehao Li, Zhaoqi Wang, Zihan Liu.

Figure 1: Illustration of the superiority of our method. (a) Our method accurately reconstructs the geometric structure of large-scale urban scenes, faithfully restoring fine details such as buildings, vegetation, and roads. (b) Compared with the SOTA method CityGSV2 [33], our results are more complete and geometrically precise. (c) Benefiting from a well-designed training framework, our method achieves superior conv…

Figure 2: Overview. Starting with the input image sequences, we first utilize the prior information provided by SfM, combined with a pointmap model, to generate a high-quality initial point cloud. Next, an additional sparsity compensation optimization is introduced during the densification process to further refine sparse regions. We then combine monocular depth priors with multi-view consistency optimization to ac…

Figure 3: Visualization of hybrid multi-view refinement. (a) Strict geometric consistency yields reliable PM-refined depth. (b) and (c) show the restored refined depths, highlighting the effectiveness of patch-based alignment for local restoration. When the alignment error between the aligned depth and the filtered depth falls below a predefined threshold, the filtered depth is preserved. The restored depth Dmv is…

Figure 4: Qualitative comparison on the MatrixCity [23] dataset. Image rendering and mesh reconstruction are compared between our method and CityGSV2 [33].

Figure 5: Qualitative results on the GauU-Scene [47] dataset. We present the image and depth rendering results of our method compared with state-of-the-art methods.

Figure 6: Visualization results of ablation study. The top row shows the results without the corresponding modules, while the bottom row shows the results with the modules. Further visualizations are available in the supplementary materials.

Figure 7: Supplementary visualization of ablation study results. The top row shows results without the modules, and the bottom row shows results with them. Our components yield a significant improvement in depth quality, effectively addressing challenges across diverse and complex scenes.

Figure 8: Qualitative comparison of meshes on the GauU-Scene [47] dataset. Our method achieves higher-quality results.

Figure 9: Mesh visualization comparison on MatrixCity-Aerial [47]. Our method provides better results than the baselines.

Figure 10: Qualitative results on Mill-19 [40] and Urbanscene3D [30] datasets. We compare against CityGS.
read the original abstract

Recently, 3D Gaussian Splatting and its derivatives have achieved significant breakthroughs in large-scale scene reconstruction. However, how to efficiently and stably achieve high-quality geometric fidelity remains a core challenge. To address this issue, we introduce MetroGS, a novel Gaussian Splatting framework for efficient and robust reconstruction in complex urban environments. Our method is built upon a distributed 2D Gaussian Splatting representation as the core foundation, serving as a unified backbone for subsequent modules. To handle potential sparse regions in complex scenes, we propose a structured dense enhancement scheme that utilizes SfM priors and a pointmap model to achieve a denser initialization, while incorporating a sparsity compensation mechanism to improve reconstruction completeness. Furthermore, we design a progressive hybrid geometric optimization strategy that organically integrates monocular and multi-view optimization to achieve efficient and accurate geometric refinement. Finally, to address the appearance inconsistency commonly observed in large-scale scenes, we introduce a depth-guided appearance modeling approach that learns spatial features with 3D consistency, facilitating effective decoupling between geometry and appearance and further enhancing reconstruction stability. Experiments on large-scale urban datasets demonstrate that MetroGS achieves superior geometric accuracy and rendering quality, offering a unified solution for high-fidelity large-scale scene reconstruction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces MetroGS, a Gaussian Splatting framework for efficient and stable reconstruction of geometrically accurate high-fidelity large-scale scenes in complex urban environments. It uses a distributed 2D Gaussian Splatting representation as backbone, proposes a structured dense enhancement scheme leveraging SfM priors and a pointmap model for denser initialization with sparsity compensation, designs a progressive hybrid geometric optimization integrating monocular and multi-view cues, and introduces depth-guided appearance modeling for 3D-consistent spatial features. Experiments on large-scale urban datasets are claimed to demonstrate superior geometric accuracy and rendering quality, positioning the method as a unified solution.

Significance. If the geometric accuracy claims are substantiated with robust quantitative controls, this work could advance large-scale 3D reconstruction by providing an efficient, stable pipeline that addresses sparsity, geometric refinement, and appearance inconsistency. The modular integration of priors with optimization strategies offers a practical contribution to the field, though its impact depends on demonstrating that the initialization does not propagate uncorrectable biases.

major comments (2)
  1. [§3.2] Structured dense enhancement: The central claim of superior geometric accuracy rests on this module producing a reliable, error-free denser initialization. The scheme explicitly depends on SfM priors and an off-the-shelf pointmap model, yet the manuscript provides no targeted experiments or analysis showing robustness to urban-specific failures (reflective surfaces, repetitive facades, dynamic elements). Subsequent progressive hybrid optimization is described only as refinement, not as a mechanism to detect or remove systematic initialization bias; this is load-bearing for the headline result.
  2. [§5] Experiments: The results section asserts superior geometric accuracy and rendering quality, but supplies no quantitative metrics for geometry (e.g., depth error, surface normal consistency, or Chamfer distance), no ablation isolating the dense enhancement contribution, and no error bars or statistical tests. This prevents assessment of whether improvements survive standard controls or post-hoc dataset choices and directly weakens the cross-method comparison.
minor comments (2)
  1. [Abstract] The abstract would be strengthened by including one or two concrete quantitative results (e.g., PSNR gain or geometric error reduction) to support the superiority claims.
  2. A pipeline diagram clarifying data flow between the distributed 2D Gaussian backbone, dense enhancement, hybrid optimization, and appearance modeling would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have carefully addressed each major comment below and revised the manuscript to strengthen the presentation of our method and results.

read point-by-point responses
  1. Referee: [§3.2] Structured dense enhancement: The central claim of superior geometric accuracy rests on this module producing a reliable, error-free denser initialization. The scheme explicitly depends on SfM priors and an off-the-shelf pointmap model, yet the manuscript provides no targeted experiments or analysis showing robustness to urban-specific failures (reflective surfaces, repetitive facades, dynamic elements). Subsequent progressive hybrid optimization is described only as refinement, not as a mechanism to detect or remove systematic initialization bias; this is load-bearing for the headline result.

    Authors: We agree that robustness to urban-specific challenges such as reflective surfaces and repetitive facades is important to substantiate. The original manuscript evaluated the full pipeline on large-scale urban datasets that contain these elements, and the sparsity compensation was introduced precisely to address incomplete SfM and pointmap outputs. In the revised manuscript we have added targeted experiments on challenging subsets exhibiting reflective surfaces and repetitive patterns, with quantitative comparisons before and after the dense enhancement. We have also revised the description of the progressive hybrid geometric optimization to clarify that it iteratively integrates monocular depth cues with multi-view consistency to reduce initialization biases, supported by new visualizations of geometry refinement. revision: yes

  2. Referee: [§5] Experiments: The results section asserts superior geometric accuracy and rendering quality, but supplies no quantitative metrics for geometry (e.g., depth error, surface normal consistency, or Chamfer distance), no ablation isolating the dense enhancement contribution, and no error bars or statistical tests. This prevents assessment of whether improvements survive standard controls or post-hoc dataset choices and directly weakens the cross-method comparison.

    Authors: We acknowledge that additional quantitative controls would strengthen the geometric accuracy claims. In the revised manuscript we have added depth error and Chamfer distance metrics on the evaluated urban datasets, included an ablation study that isolates the contribution of the structured dense enhancement module, and reported error bars based on multiple runs. These additions provide a more rigorous basis for the reported improvements and cross-method comparisons. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in MetroGS derivation

full rationale

The paper constructs MetroGS from an established distributed 2D Gaussian Splatting backbone, then adds three explicitly described modules: structured dense enhancement (SfM priors + pointmap model plus sparsity compensation), progressive hybrid geometric optimization (monocular + multi-view integration), and depth-guided appearance modeling. These steps are presented as independent engineering additions that address sparsity, geometric refinement, and appearance inconsistency, respectively. Central performance claims rest on experimental results from large-scale urban datasets rather than any quantity defined in terms of itself, any fitted parameter relabeled as a prediction, or a load-bearing self-citation chain. No equations or uniqueness theorems are shown reducing to prior author work by construction; the derivation chain therefore remains externally grounded and self-contained.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The approach rests on standard computer-vision assumptions (SfM points are sufficiently accurate, monocular depth estimates provide useful geometric signal) plus multiple implementation choices whose values are not derived from first principles.

free parameters (2)
  • weights and schedules in progressive hybrid geometric optimization
    Balance between monocular and multi-view terms must be chosen or tuned; these are free parameters that directly affect the final geometry.
  • sparsity compensation thresholds and pointmap model parameters
    Densification rules and pointmap usage introduce tunable thresholds that control completeness versus noise.
axioms (1)
  • domain assumption SfM priors combined with a pointmap model produce a denser and more reliable initialization than standard sparse SfM alone
    Invoked in the structured dense enhancement scheme without independent verification in the abstract.
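Collected in one place, the ledger's tunable quantities might look like the following configuration sketch. Every name and default below is invented for illustration; the paper tunes its own values.

```python
from dataclasses import dataclass

@dataclass
class LedgerParams:
    """Free parameters from the ledger as an explicit config (hypothetical)."""
    mono_weight: float = 1.0        # monocular term in the hybrid loss
    mv_weight: float = 1.0          # multi-view term in the hybrid loss
    mv_rampup_steps: int = 10_000   # schedule shifting trust between the two
    sparsity_min_pts: int = 3       # voxel occupancy triggering compensation
    pointmap_conf: float = 0.5      # confidence cutoff for pointmap points

params = LedgerParams()
```

Making these choices explicit is what a reproduction would need; none are derived from first principles, which is the ledger's point.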

pith-pipeline@v0.9.0 · 5547 in / 1336 out tokens · 73810 ms · 2026-05-17T06:18:18.365072+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 3 internal anchors

  1. [1]

    nuscenes: A multi- modal dataset for autonomous driving

    Holger Caesar, Varun Bankiti, Alex H Lang, Sourabh V ora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Gi- ancarlo Baldan, and Oscar Beijbom. nuscenes: A multi- modal dataset for autonomous driving. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11621–11631, 2020. 1

  2. [2]

    pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction

    David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann. pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19457–19467, 2024. 2

  3. [3]

    Tensorf: Tensorial radiance fields

    Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. Tensorf: Tensorial radiance fields. InEuropean con- ference on computer vision, pages 333–350. Springer, 2022. 2

  4. [4]

    Pgsr: Planar-based gaussian splatting for ef- ficient and high-fidelity surface reconstruction.IEEE Trans- actions on Visualization and Computer Graphics, 2024

    Danpeng Chen, Hai Li, Weicai Ye, Yifan Wang, Weijian Xie, Shangjin Zhai, Nan Wang, Haomin Liu, Hujun Bao, and Guofeng Zhang. Pgsr: Planar-based gaussian splatting for ef- ficient and high-fidelity surface reconstruction.IEEE Trans- actions on Visualization and Computer Graphics, 2024. 2, 3

  5. [5]

    Gigags: Scaling up planar-based 3d gaus- sians for large scene surface reconstruction.arXiv preprint arXiv:2409.06685, 2024

    Junyi Chen, Weicai Ye, Yifan Wang, Danpeng Chen, Di Huang, Wanli Ouyang, Guofeng Zhang, Yu Qiao, and Tong He. Gigags: Scaling up planar-based 3d gaus- sians for large scene surface reconstruction.arXiv preprint arXiv:2409.06685, 2024. 2, 3

  6. [6]

    Dual-level precision edges guided multi-view stereo with accurate planarization

    Kehua Chen, Zhenlong Yuan, Tianlu Mao, and Zhaoqi Wang. Dual-level precision edges guided multi-view stereo with accurate planarization. InProceedings of the AAAI Con- ference on Artificial Intelligence, pages 2105–2113, 2025. 1

  7. [7]

    Learning multi-view stereo with geometry-aware prior.IEEE Transactions on Circuits and Systems for Video Technology, 2025

    Kehua Chen, Zhenlong Yuan, Haihong Xiao, Tianlu Mao, and Zhaoqi Wang. Learning multi-view stereo with geometry-aware prior.IEEE Transactions on Circuits and Systems for Video Technology, 2025. 1

  8. [8]

    Mixedgaussianavatar: Realisti- cally and geometrically accurate head avatar via mixed 2d-3d gaussian splatting.arXiv preprint arXiv:2412.04955, 2024

    Peng Chen, Xiaobao Wei, Qingpo Wuwu, Xinyi Wang, Xingyu Xiao, and Ming Lu. Mixedgaussianavatar: Realisti- cally and geometrically accurate head avatar via mixed 2d-3d gaussian splatting.arXiv preprint arXiv:2412.04955, 2024. 2

  9. [9]

    3d gaussian splatting for fine- detailed surface reconstruction in large-scale scene.arXiv preprint arXiv:2506.17636, 2025

    Shihan Chen, Zhaojin Li, Zeyu Chen, Qingsong Yan, Gaoyang Shen, and Ran Duan. 3d gaussian splatting for fine- detailed surface reconstruction in large-scale scene.arXiv preprint arXiv:2506.17636, 2025. 2

  10. [10]

    Alexandre Delplanque, Julie Linchant, Xavier Vincke, Richard Lamprey, J´erˆome Th´eau, C´edric Vermeulen, Samuel Foucher, Amara Ouattara, Roger Kouadio, and Philippe Lejeune. Will artificial intelligence revolutionize aerial sur- veys? a first large-scale semi-automated survey of african wildlife using oblique imagery and deep learning.Ecologi- cal Inform...

  11. [11]

    Trim 3d gaussian splatting for accurate geometry representation.arXiv preprint arXiv:2406.07499,

    Lue Fan, Yuxue Yang, Minxing Li, Hongsheng Li, and Zhaoxiang Zhang. Trim 3d gaussian splatting for accurate geometry representation.arXiv preprint arXiv:2406.07499,

  12. [12]

    Mini-splatting: Repre- senting scenes with a constrained number of gaussians

    Guangchi Fang and Bing Wang. Mini-splatting: Repre- senting scenes with a constrained number of gaussians. In European Conference on Computer Vision, pages 165–181. Springer, 2024. 2

  13. [13]

    Cosurfgs: Collaborative 3d surface gaus- sian splatting with distributed learning for large scene recon- struction.arXiv preprint arXiv:2412.17612, 2024

    Yuanyuan Gao, Yalun Dai, Hao Li, Weicai Ye, Junyi Chen, Danpeng Chen, Dingwen Zhang, Tong He, Guofeng Zhang, and Junwei Han. Cosurfgs: Collaborative 3d surface gaus- sian splatting with distributed learning for large scene recon- struction.arXiv preprint arXiv:2412.17612, 2024. 3

  14. [14]

    Citygs- x: A scalable architecture for efficient and geometrically accurate large-scale scene reconstruction.arXiv preprint arXiv:2503.23044, 2025

    Yuanyuan Gao, Hao Li, Jiaqi Chen, Zhengyu Zou, Zhihang Zhong, Dingwen Zhang, Xiao Sun, and Junwei Han. Citygs- x: A scalable architecture for efficient and geometrically accurate large-scale scene reconstruction.arXiv preprint arXiv:2503.23044, 2025. 3, 7

  15. [15]

    Are we ready for autonomous driving? the kitti vision benchmark suite

    Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? the kitti vision benchmark suite. In2012 IEEE conference on computer vision and pat- tern recognition, pages 3354–3361. IEEE, 2012. 1

  16. [16]

    Ue4-nerf: Neural radiance field for real-time rendering of large-scale scene.Advances in Neural Information Processing Systems, 36:59124–59136, 2023

    Jiaming Gu, Minchao Jiang, Hongsheng Li, Xiaoyuan Lu, Guangming Zhu, Syed Afaq Ali Shah, Liang Zhang, and Mohammed Bennamoun. Ue4-nerf: Neural radiance field for real-time rendering of large-scale scene.Advances in Neural Information Processing Systems, 36:59124–59136, 2023. 1

  17. [17]

    Sugar: Surface- aligned gaussian splatting for efficient 3d mesh reconstruc- tion and high-quality mesh rendering

    Antoine Gu ´edon and Vincent Lepetit. Sugar: Surface- aligned gaussian splatting for efficient 3d mesh reconstruc- tion and high-quality mesh rendering. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5354–5363, 2024. 3, 7

  18. [18]

    Tri-miprf: Tri-mip represen- tation for efficient anti-aliasing neural radiance fields

    Wenbo Hu, Yuling Wang, Lin Ma, Bangbang Yang, Lin Gao, Xiao Liu, and Yuewen Ma. Tri-miprf: Tri-mip represen- tation for efficient anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 19774–19783, 2023. 2, 5

  19. [19]

    2d gaussian splatting for geometrically ac- curate radiance fields

    Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically ac- curate radiance fields. InACM SIGGRAPH 2024 conference papers, pages 1–11, 2024. 2, 3, 5, 7, 8

  20. [20]

    Fatesgs: Fast and accurate sparse-view surface reconstruction using gaussian splatting with depth- feature consistency

    Han Huang, Yulun Wu, Chao Deng, Ge Gao, Ming Gu, and Yu-Shen Liu. Fatesgs: Fast and accurate sparse-view surface reconstruction using gaussian splatting with depth- feature consistency. InProceedings of the AAAI Conference on Artificial Intelligence, pages 3644–3652, 2025. 3

  21. [21]

    Halogs: Loose coupling of compact geometry and gaussian splats for 3d scenes.arXiv preprint arXiv:2505.20267, 2025

    Changjian Jiang, Kerui Ren, Linning Xu, Jiong Chen, Jiang- miao Pang, Yu Zhang, Bo Dai, and Mulin Yu. Halogs: Loose coupling of compact geometry and gaussian splats for 3d scenes.arXiv preprint arXiv:2505.20267, 2025. 3

  22. [22]

    3d gaussian splatting for real-time radiance field rendering.ACM Trans

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Trans. Graph., 42(4):139–1,

  23. [23]

    Matrixcity: A large-scale city dataset for city-scale neural rendering and beyond

    Yixuan Li, Lihan Jiang, Linning Xu, Yuanbo Xiangli, Zhen- zhi Wang, Dahua Lin, and Bo Dai. Matrixcity: A large-scale city dataset for city-scale neural rendering and beyond. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3205–3215, 2023. 6, 7, 8, 1

  24. [24]

    Neuralangelo: High-fidelity neural surface reconstruction

    Zhaoshuo Li, Thomas M ¨uller, Alex Evans, Russell H Tay- lor, Mathias Unberath, Ming-Yu Liu, and Chen-Hsuan Lin. Neuralangelo: High-fidelity neural surface reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 8456–8465, 2023. 7 9

  25. [25]

    Gradiseg: Gradient-guided gaussian segmentation with enhanced 3d boundary precision.arXiv preprint arXiv:2412.00392, 2024

    Zehao Li, Wenwei Han, Yujun Cai, Hao Jiang, Baolong Bi, Shuqin Gao, Honglong Zhao, and Zhaoqi Wang. Gradiseg: Gradient-guided gaussian segmentation with enhanced 3d boundary precision.arXiv preprint arXiv:2412.00392, 2024. 1

  26. [26]

    Stdr: Spatio-temporal decou- pling for real-time dynamic scene rendering.arXiv preprint arXiv:2505.22400, 2025

    Zehao Li, Hao Jiang, Yujun Cai, Jianing Chen, Baolong Bi, Shuqin Gao, Honglong Zhao, Yiwei Wang, Tianlu Mao, and Zhaoqi Wang. Stdr: Spatio-temporal decou- pling for real-time dynamic scene rendering.arXiv preprint arXiv:2505.22400, 2025. 2

  27. [27]

    Ulsr-gs: Urban large- scale surface reconstruction gaussian splatting with multi- view geometric consistency.ISPRS Journal of Photogram- metry and Remote Sensing, 230:861–880, 2025

    Zhuoxiao Li, Shanliang Yao, Taoyu Wu, Yong Yue, Wu- fan Zhao, Rongjun Qin, ´Angel F Garc´ıa-Fern´andez, Andrew Levers, Jason Ralph, and Xiaohui Zhu. Ulsr-gs: Urban large- scale surface reconstruction gaussian splatting with multi- view geometric consistency.ISPRS Journal of Photogram- metry and Remote Sensing, 230:861–880, 2025. 2

  28. [28]

    Longsplat: Robust unposed 3d gaussian splatting for casual long videos

    Chin-Yang Lin, Cheng Sun, Fu-En Yang, Min-Hung Chen, Yen-Yu Lin, and Yu-Lun Liu. Longsplat: Robust unposed 3d gaussian splatting for casual long videos. InProceedings of the IEEE/CVF International Conference on Computer Vi- sion, pages 27412–27422, 2025. 2

  29. [29]

    Vastgaussian: Vast 3d gaussians for large scene reconstruction

    Jiaqi Lin, Zhihao Li, Xiao Tang, Jianzhuang Liu, Shiyong Liu, Jiayue Liu, Yangdi Lu, Xiaofei Wu, Songcen Xu, You- liang Yan, et al. Vastgaussian: Vast 3d gaussians for large scene reconstruction. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 5166–5175, 2024. 2, 3, 5

  30. [30]

    Capturing, reconstructing, and simulating: the urbanscene3d dataset

    Liqiang Lin, Yilin Liu, Yue Hu, Xingguang Yan, Ke Xie, and Hui Huang. Capturing, reconstructing, and simulating: the urbanscene3d dataset. InEuropean Conference on Computer Vision, pages 93–109. Springer, 2022. 3

  31. [31]

    Holistic large-scale scene reconstruction via mixed gaussian splatting.arXiv preprint arXiv:2505.23280, 2025

    Chuandong Liu, Huijiao Wang, Lei Yu, and Gui-Song Xia. Holistic large-scale scene reconstruction via mixed gaussian splatting.arXiv preprint arXiv:2505.23280, 2025. 2, 3

  32. [32]

    Citygaussian: Real-time high-quality large-scale scene rendering with gaussians

    Yang Liu, Chuanchen Luo, Lue Fan, Naiyan Wang, Jun- ran Peng, and Zhaoxiang Zhang. Citygaussian: Real-time high-quality large-scale scene rendering with gaussians. In European Conference on Computer Vision, pages 265–282. Springer, 2024. 7

  33. [33]

    Citygaussianv2: Efficient and geometri- cally accurate reconstruction for large-scale scenes.arXiv preprint arXiv:2411.00771, 2024

    Yang Liu, Chuanchen Luo, Zhongkai Mao, Junran Peng, and Zhaoxiang Zhang. Citygaussianv2: Efficient and geometri- cally accurate reconstruction for large-scale scenes.arXiv preprint arXiv:2411.00771, 2024. 1, 2, 3, 5, 6, 7, 8

  34. [34]

    Taming 3dgs: High-quality radiance fields with limited resources

    Saswat Subhajyoti Mallick, Rahul Goel, Bernhard Kerbl, Markus Steinberger, Francisco Vicente Carrasco, and Fer- nando De La Torre. Taming 3dgs: High-quality radiance fields with limited resources. InSIGGRAPH Asia 2024 Con- ference Papers, pages 1–11, 2024. 3

  35. [35]

    Nerf in the wild: Neural radiance fields for uncon- strained photo collections

    Ricardo Martin-Brualla, Noha Radwan, Mehdi SM Sajjadi, Jonathan T Barron, Alexey Dosovitskiy, and Daniel Duck- worth. Nerf in the wild: Neural radiance fields for uncon- strained photo collections. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7210–7219, 2021. 2

  36. [36]

    Nerf: Representing scenes as neural radiance fields for view syn- thesis.Communications of the ACM, 65(1):99–106, 2021

    Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis.Communications of the ACM, 65(1):99–106, 2021. 2

  37. [37] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (TOG), 41(4):1–15, 2022.

  38. [38] Kerui Ren, Lihan Jiang, Tao Lu, Mulin Yu, Linning Xu, Zhangkai Ni, and Bo Dai. Octree-gs: Towards consistent real-time rendering with lod-structured 3d gaussians. arXiv preprint arXiv:2403.17898, 2024.

  39. [39] Johannes L Schonberger and Jan-Michael Frahm. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4104–4113, 2016.

  40. [40] Haithem Turki, Deva Ramanan, and Mahadev Satyanarayanan. Mega-nerf: Scalable construction of large-scale nerfs for virtual fly-throughs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12922–12931, 2022.

  41. [41] Jiepeng Wang, Yuan Liu, Peng Wang, Cheng Lin, Junhui Hou, Xin Li, Taku Komura, and Wenping Wang. Gaussurf: Geometry-guided 3d gaussian splatting for surface reconstruction. arXiv preprint arXiv:2411.19454, 2024.

  42. [42] Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, and Wenping Wang. Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. arXiv preprint arXiv:2106.10689, 2021.

  43. [43] Ruicheng Wang, Sicheng Xu, Yue Dong, Yu Deng, Jianfeng Xiang, Zelong Lv, Guangzhong Sun, Xin Tong, and Jiaolong Yang. Moge-2: Accurate monocular geometry with metric scale and sharp details. arXiv preprint arXiv:2507.02546, 2025.

  44. [44] Yifan Wang, Jianjun Zhou, Haoyi Zhu, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Jiangmiao Pang, Chunhua Shen, and Tong He. π³: Scalable permutation-equivariant visual geometry learning. arXiv preprint arXiv:2507.13347, 2025.

  45. [45] Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 4d gaussian splatting for real-time dynamic scene rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20310–20320, 2024.

  46. [46] Jiang Wu, Rui Li, Yu Zhu, Rong Guo, Jinqiu Sun, and Yanning Zhang. Sparse2dgs: Geometry-prioritized gaussian splatting for surface reconstruction from sparse views. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 11307–11316, 2025.

  47. [47] Butian Xiong, Nanjun Zheng, Junhua Liu, and Zhen Li. Gauu-scene v2: Assessing the reliability of image-based metrics with expansive lidar image dataset using 3dgs and nerf. arXiv preprint arXiv:2404.04880, 2024.

  48. [48] Zongxin Ye, Wenyu Li, Sidun Liu, Peng Qiao, and Yong Dou. Absgs: Recovering fine details in 3d gaussian splatting. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 1053–1061, 2024.

  49. [49] Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-splatting: Alias-free 3d gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19447–19456, 2024.

  50. [50] Zehao Yu, Torsten Sattler, and Andreas Geiger. Gaussian opacity fields: Efficient adaptive surface reconstruction in unbounded scenes. ACM Transactions on Graphics (TOG), 43(6):1–13, 2024.

  51. [51] Zhensheng Yuan, Haozhi Huang, Zhen Xiong, Di Wang, and Guanghua Yang. Robust and efficient 3d gaussian splatting for urban scene reconstruction. arXiv preprint arXiv:2507.23006, 2025.

  52. [52] Andy Zeng, Shuran Song, Matthias Nießner, Matthew Fisher, Jianxiong Xiao, and Thomas Funkhouser. 3dmatch: Learning local geometric descriptors from rgb-d reconstructions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1802–1811, 2017.

  53. [53] Youjia Zhang, Anpei Chen, Yumin Wan, Zikai Song, Junqing Yu, Yawei Luo, and Wei Yang. Ref-gs: Directional factorization for 2d gaussian splatting. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 26483–26492, 2025.

  54. [54] Zheng Zhang, Wenbo Hu, Yixing Lao, Tong He, and Hengshuang Zhao. Pixel-gs: Density control with pixel-aware gradient for 3d gaussian splatting. In European Conference on Computer Vision, pages 326–342. Springer, 2024.

  55. [55] Hexu Zhao, Haoyang Weng, Daohan Lu, Ang Li, Jinyang Li, Aurojit Panda, and Saining Xie. On scaling up 3d gaussian splatting training. In European Conference on Computer Vision, pages 14–36. Springer, 2024.

  56. [56] Hexu Zhao, Xiwen Min, Xiaoteng Liu, Moonjun Gong, Yiming Li, Ang Li, Saining Xie, Jinyang Li, and Aurojit Panda. Clm: Removing the gpu memory barrier for 3d gaussian splatting. arXiv preprint arXiv:2511.04951, 2025.

MetroGS: Efficient and Stable Reconstruction of Geometrically Accurate High-Fidelity Large-Scale Scenes — Supplementary Material

A. Imple...