A LoD of Gaussians: Unified Training and Rendering for Ultra-Large Scale Reconstruction with External Memory

Dieter Schmalstieg; Felix Windisch; Lukas Radl; Markus Steinberger; Mattia D'Urso; Michael Steiner; Thomas Köhler

arXiv: 2507.01110 · v4 · submitted 2025-07-01 · cs.GR · cs.LG

Felix Windisch, Thomas Köhler, Lukas Radl, Mattia D'Urso, Michael Steiner, Dieter Schmalstieg, Markus Steinberger

Pith reviewed 2026-05-19 06:19 UTC · model grok-4.3

keywords Gaussian Splatting · Level of Detail · Out-of-core rendering · Large-scale reconstruction · Novel view synthesis · External memory · Real-time rendering · Multi-scale scenes

The pith

A single consumer GPU trains and renders city-scale Gaussian scenes by dynamically streaming level-of-detail representations from external memory without scene partitioning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that keeps an entire large scene in CPU memory and trains a level-of-detail Gaussian representation directly on that data. Only the Gaussians needed for the current viewpoint are streamed to the GPU, allowing training and rendering to move smoothly from wide aerial views down to fine street-level detail. This approach removes the need to split scenes into chunks, which previously created boundary artifacts and made multi-scale training difficult. A hybrid data structure and a caching system that uses temporal coherence keep the process efficient enough for interactive rates on ordinary hardware.

Core claim

We introduce A LoD of Gaussians, a framework for training and rendering ultra-large-scale Gaussian scenes on a single consumer-grade GPU without partitioning. Our method stores the full scene out-of-core in CPU memory and trains a Level-of-Detail representation directly, dynamically streaming only the relevant Gaussians. A hybrid data structure combining Gaussian hierarchies with Sequential Point Trees enables efficient view-dependent LoD selection, while a lightweight caching and view scheduling system exploits temporal coherence to support real-time streaming and rendering.

What carries the argument

Hybrid data structure of Gaussian hierarchies combined with Sequential Point Trees, paired with a lightweight caching and view scheduling system, that selects and streams the appropriate level of Gaussians for the current view.
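
To make the selection step concrete, here is a minimal sketch of a distance-based cut over a small node hierarchy. It assumes the cut condition quoted from the paper further down this page: node i is selected when m_d(parent(i)) > d ≥ m_d(i), where d is the squared distance between the tree root and the camera. The function name `select_cut`, the flat `min_dist` / `parent` arrays, and the toy threshold values are illustrative assumptions, not the authors' implementation.

```python
import math


def select_cut(min_dist, parent, root_pos, cam_pos):
    """Return the indices of nodes that lie on the cut for one camera.

    min_dist[i]: threshold m_d(i) of node i (hypothetical values, in the
                 same units as the squared root-to-camera distance)
    parent[i]:   index of the parent of node i, or -1 for the root
    """
    # Squared Euclidean distance between the tree root and the camera.
    d = sum((r - c) ** 2 for r, c in zip(root_pos, cam_pos))
    selected = []
    for i, md in enumerate(min_dist):
        # The root has no parent; use an infinite threshold so the root
        # itself can be chosen when the camera is far away.
        parent_md = min_dist[parent[i]] if parent[i] >= 0 else math.inf
        if parent_md > d >= md:
            selected.append(i)
    return selected


# Toy hierarchy: root 0, inner nodes 1-2, leaves 3-6.
min_dist = [100.0, 25.0, 25.0, 0.0, 0.0, 0.0, 0.0]
parent = [-1, 0, 0, 1, 1, 2, 2]
root = (0.0, 0.0, 0.0)

far = select_cut(min_dist, parent, root, (20.0, 0.0, 0.0))   # coarse
mid = select_cut(min_dist, parent, root, (7.0, 0.0, 0.0))    # medium
near = select_cut(min_dist, parent, root, (2.0, 0.0, 0.0))   # fine
```

When thresholds never increase from parent to child, exactly one node per root-to-leaf path passes the test, so the selected nodes form a consistent cut. Sequential Point Trees are usually described as storing nodes in a sorted array so that such a test touches only a contiguous range; this sketch skips that optimization and scans every node.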

If this is right

Training and rendering can proceed across wide ranges of scale in one continuous model rather than separate chunk models.
Boundary artifacts that appear when scenes are split into independent parts are avoided.
The full scene remains available for training without loading every part into GPU memory at once.
Interactive visualization becomes possible for complex outdoor environments on standard consumer hardware.
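
As a back-of-envelope illustration of why GPU memory becomes the bottleneck, the sketch below estimates parameter storage under the standard 3D Gaussian Splatting parameterization: 3 position, 3 scale, 4 rotation, 1 opacity, and 48 spherical-harmonic values (degree 3, RGB), all stored as float32. That layout, the function name `param_bytes`, and the example Gaussian counts are assumptions for illustration; the paper may store Gaussians differently, and gradients and optimizer state during training come on top of these figures.

```python
# Assumed per-Gaussian layout (standard 3DGS, not confirmed for this paper):
# position, scale, rotation quaternion, opacity, SH coefficients.
FLOATS_PER_GAUSSIAN = 3 + 3 + 4 + 1 + 48
BYTES_PER_FLOAT = 4  # float32


def param_bytes(num_gaussians):
    """Bytes needed to hold the parameters of `num_gaussians` Gaussians."""
    return num_gaussians * FLOATS_PER_GAUSSIAN * BYTES_PER_FLOAT


# Hypothetical scene sizes, chosen only to show the scaling.
for n in (1_000_000, 100_000_000):
    print(f"{n:>11,d} Gaussians -> {param_bytes(n) / 2**30:.2f} GiB")
```

Under these assumptions each Gaussian takes 236 bytes, so a scene with 100 million Gaussians needs roughly 22 GiB for parameters alone. Keeping the full set in CPU memory and streaming only the view-relevant subset is what lets the GPU footprint stay decoupled from total scene size.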

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same streaming LoD idea could be applied to other point-based or radiance-field representations that currently hit memory limits on large scenes.
Real-time applications such as city navigation or drone mapping might become feasible without high-end server hardware.
Further work could test whether the same hierarchy supports dynamic scene changes or online updates without full retraining.

Load-bearing premise

The caching and scheduling system can reliably exploit temporal coherence to choose and transfer the right Gaussians fast enough to avoid both visual artifacts and slowdowns when scenes reach city scale.
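
One way to picture the temporal-coherence premise: if consecutive views need mostly the same Gaussians, only the set difference has to be transferred from CPU to GPU memory each frame. The sketch below is a toy model of that idea using Python sets. The class `ResidentCache`, its `update` method, and the eviction rule (drop everything the new view does not need) are hypothetical and much simpler than a practical cache, which would likely retain recently used entries.

```python
class ResidentCache:
    """Toy model of the set of Gaussian IDs resident on the GPU (hypothetical)."""

    def __init__(self):
        self.resident = set()

    def update(self, needed):
        """Make `needed` resident and return (uploaded, evicted) ID sets."""
        needed = set(needed)
        uploaded = needed - self.resident  # must be streamed from CPU memory
        evicted = self.resident - needed   # not required by the current view
        self.resident = needed
        return uploaded, evicted


cache = ResidentCache()
up1, ev1 = cache.update(range(0, 1000))    # first view: everything is new
up2, ev2 = cache.update(range(50, 1050))   # nearby view: mostly overlapping
print(len(up1), len(up2), len(ev2))        # prints: 1000 50 50
```

In this toy run the second view reuses 950 of the 1000 resident entries, so only 50 need to be uploaded. The premise fails exactly when consecutive views share little, for example under abrupt camera jumps, which is why view scheduling matters alongside caching.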

What would settle it

Render a full city-scale scene from multiple changing viewpoints that include both aerial and ground-level perspectives while measuring whether frame rate remains interactive and whether visible seams or quality drops appear at transitions between detail levels.
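
A sketch of how the frame-rate half of such a test could be summarized, assuming per-frame render times in milliseconds have already been recorded along the camera path. The function name `summarize_frame_times`, the 30 FPS interactivity target, and the sample numbers are illustrative assumptions, not values from the paper.

```python
def summarize_frame_times(frame_ms, target_fps=30.0):
    """Summarize recorded frame times (milliseconds) for one camera path."""
    budget_ms = 1000.0 / target_fps
    ordered = sorted(frame_ms)
    n = len(ordered)
    # Nearest-rank 99th percentile, computed as ceil(0.99 * n) in integers.
    rank = (99 * n + 99) // 100
    return {
        "mean_fps": 1000.0 * n / sum(frame_ms),
        "p99_ms": ordered[rank - 1],
        "frames_over_budget": sum(1 for t in frame_ms if t > budget_ms),
    }


# Hypothetical measurements: four smooth frames and one streaming stall.
stats = summarize_frame_times([10.0, 12.0, 11.0, 60.0, 10.0])
```

Reporting a high percentile next to the mean matters here because a streaming stall at a level-of-detail transition shows up as a few slow frames that the average hides. Visible seams or quality drops would still need a separate image-quality comparison against held-out views.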

Figures

Figures reproduced from arXiv: 2507.01110 by Dieter Schmalstieg, Felix Windisch, Lukas Radl, Markus Steinberger, Mattia D'Urso, Michael Steiner, Thomas Köhler.

**Figure 1.** We introduce a fully hierarchical 3D Gaussian representation trained directly across unstructured, multi-scale image...

**Figure 2.** A Sequential Point Tree and Gaussian hierarchy

**Figure 4.** A Gaussian hierarchy is converted to an HSPT by...

**Figure 5.** Caching strategy overview. Gaussians required for...

**Figure 7.** Method Overview: Steps ① to ⑧ show the process of a single training iteration, while Ⓐ through Ⓓ show a densification step

**Figure 8.** Overview and examples of the street and aerial views (red) of the MatrixCity-Scale Dataset along with the COLMAP...

**Figure 9.** Frustum Culling and LoD selection (left) greatly...

**Figure 11.** SPTs for a frame of MatrixCity rendered in differ...

**Figure 12.** Training image rendered during the iteration de...

**Figure 13.** Images from Hierarchical 3DGS on the MatrixCity

Original abstract

Gaussian Splatting has emerged as a high-performance technique for novel view synthesis, enabling real-time rendering and high-quality reconstruction of small scenes. However, scaling to larger environments has so far relied on partitioning the scene into chunks -- a strategy that introduces artifacts at chunk boundaries, complicates training across varying scales, and is poorly suited to unstructured scenarios such as city-scale flyovers combined with street-level views. Moreover, rendering remains fundamentally limited by GPU memory, as all visible chunks must reside in VRAM simultaneously. We introduce A LoD of Gaussians, a framework for training and rendering ultra-large-scale Gaussian scenes on a single consumer-grade GPU -- without partitioning. Our method stores the full scene out-of-core (e.g., in CPU memory) and trains a Level-of-Detail (LoD) representation directly, dynamically streaming only the relevant Gaussians. A hybrid data structure combining Gaussian hierarchies with Sequential Point Trees enables efficient, view-dependent LoD selection, while a lightweight caching and view scheduling system exploits temporal coherence to support real-time streaming and rendering. Together, these innovations enable seamless multi-scale reconstruction and interactive visualization of complex scenes -- from broad aerial views to fine-grained ground-level details.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This paper trains a LoD Gaussian representation directly on the full out-of-core scene and streams via a hybrid hierarchy plus Sequential Point Trees to skip chunking.

The main point is that they store the entire scene externally, train a level-of-detail structure on it without splitting into chunks, and then pull in only the Gaussians needed for the current view. This targets the boundary seams and VRAM limits that show up in city-scale work today. The hybrid data structure with Sequential Point Trees for view-dependent selection plus the caching scheduler is the concrete mechanism they add on top of standard Gaussian Splatting. That combination is new relative to the partitioned baselines they cite.

The approach looks workable for multi-scale flyovers down to street level on one consumer GPU. The temporal coherence trick in the scheduler is a sensible engineering choice that should help keep frame times stable. One soft spot is that the abstract gives no numbers on training time, peak memory during streaming, or side-by-side error metrics against chunked methods, so the real overhead and artifact levels are still unclear. If the hierarchy construction turns out expensive or the LoD transitions need extra tuning, the practical gain could shrink.

The paper is aimed at people already working on large-scale novel view synthesis who hit memory walls with current tools. A reader who needs seamless urban reconstructions or fly-throughs would get the most from the architecture details. It deserves peer review because the core claim is coherent and the problem it attacks is genuine, even if the experiments will need careful checking.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces 'A LoD of Gaussians', a framework for unified training and rendering of ultra-large-scale Gaussian Splatting scenes on a single consumer GPU without scene partitioning. The full scene is stored out-of-core (e.g., CPU memory), a Level-of-Detail representation is trained directly, and only relevant Gaussians are dynamically streamed using a hybrid data structure that combines Gaussian hierarchies with Sequential Point Trees for view-dependent LoD selection, together with lightweight caching and view scheduling that exploits temporal coherence for real-time performance.

Significance. If the empirical claims hold, the work would be a meaningful contribution to novel view synthesis by removing the need for chunk-based partitioning and its associated boundary artifacts, while supporting seamless multi-scale reconstruction from aerial to ground-level views in unstructured large environments. This could broaden applicability to city-scale modeling and flyover scenarios on commodity hardware.

major comments (2)

[§4.2] §4.2, hybrid Gaussian hierarchy + Sequential Point Tree construction: the description of view-dependent LoD selection does not include a formal analysis of selection cost or a proof that the structure preserves the original Gaussian optimization objective across scales; without this, it is unclear whether the streaming mechanism can guarantee artifact-free results at the claimed scales.
[Table 3] Table 3, large-scene rows: the reported PSNR and rendering FPS values are given only for the proposed method; the absence of direct comparison against a partitioned baseline at identical memory budgets makes it difficult to quantify the claimed advantage of the out-of-core LoD approach.

minor comments (2)

[§5.3] §5.3, caching policy: the pseudocode for the view scheduler would benefit from an accompanying diagram showing the temporal coherence exploitation loop.
[Figure 7] Figure 7 caption: the scene extents and exact GPU model used for the timing measurements should be stated explicitly.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the encouraging summary, the recognition of the contribution, and the recommendation for minor revision. We address each major comment below.

Referee: [§4.2] §4.2, hybrid Gaussian hierarchy + Sequential Point Tree construction: the description of view-dependent LoD selection does not include a formal analysis of selection cost or a proof that the structure preserves the original Gaussian optimization objective across scales; without this, it is unclear whether the streaming mechanism can guarantee artifact-free results at the claimed scales.

Authors: We thank the referee for this observation. The hybrid hierarchy is constructed by bottom-up merging of Gaussians that share similar spatial and appearance properties, with the Sequential Point Tree providing logarithmic-time view-dependent traversal and culling. Because the LoD representation is trained directly against the full out-of-core scene, the optimization objective is preserved at training time; at inference the hierarchy approximates the original set of Gaussians. We will add an explicit complexity analysis of the selection procedure (O(log N) per query) together with a discussion of the approximation error introduced by hierarchical merging in the revised §4.2.

revision: yes

Referee: [Table 3] Table 3, large-scene rows: the reported PSNR and rendering FPS values are given only for the proposed method; the absence of direct comparison against a partitioned baseline at identical memory budgets makes it difficult to quantify the claimed advantage of the out-of-core LoD approach.

Authors: We agree that a side-by-side comparison at matched memory budgets would strengthen the evaluation. For the largest scenes, however, any partitioned baseline that keeps multiple chunks resident simultaneously exceeds the VRAM limits we target, which is precisely the limitation our method removes. In the revision we will augment Table 3 with memory-footprint numbers for representative partitioned baselines on the same scenes (where they can be run) and add a short paragraph quantifying the memory advantage of the unified out-of-core LoD approach.

revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper describes a systems-level framework for out-of-core LoD Gaussian training and rendering that relies on a hybrid data structure (Gaussian hierarchies combined with Sequential Point Trees), lightweight caching, and view-dependent scheduling to enable streaming from external memory. No equations, derivations, fitted parameters, or first-principles results are presented that reduce to their own inputs by construction. The central claims concern the design and empirical behavior of the new architecture rather than any self-referential mathematical step or load-bearing self-citation chain. The contribution is therefore self-contained as an engineering and data-structure innovation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The framework depends on the unproven effectiveness of the hybrid hierarchy and caching system for maintaining visual quality and real-time performance at city scale; these components are introduced without independent empirical support in the provided abstract.

invented entities (1)

Hybrid Gaussian hierarchy with Sequential Point Trees for LoD selection · no independent evidence
purpose: Enable efficient view-dependent streaming and rendering of ultra-large scenes from external memory
New data structure proposed in the abstract to solve memory and scale limitations; no independent validation or falsifiable prediction supplied.

pith-pipeline@v0.9.0 · 5768 in / 1282 out tokens · 58713 ms · 2026-05-19T06:19:47.002608+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Paper passage: A hybrid data structure combining Gaussian hierarchies with Sequential Point Trees enables efficient, view-dependent LoD selection... cut condition $c_{\mathrm{SPT}}(i, \mathrm{cam}) = m_d(\mathrm{parent}(i)) > \lVert \mu_{\mathrm{root}} - p_{\mathrm{cam}} \rVert^2 \ge m_d(i)$

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear
Paper passage: hierarchy densification strategy... spawn two new child nodes for a leaf instead of splitting a Gaussian

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
