GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens
3 Pith papers cite this work.
abstract
The efficient spatial allocation of primitives serves as the foundation of 3D Gaussian Splatting, as it directly dictates the synergy between representation compactness, reconstruction speed, and rendering fidelity. Previous solutions, whether based on iterative optimization or feed-forward inference, suffer from significant trade-offs between these goals, mainly because they rely on local, heuristic-driven allocation strategies that lack global scene awareness. Specifically, current feed-forward methods are largely pixel-aligned or voxel-aligned. By unprojecting pixels into dense, view-aligned primitives, they bake redundancy into the 3D asset. As more input views are added, the representation size increases and global consistency becomes fragile. To this end, we introduce GlobalSplat, a framework built on the principle of "align first, decode later". Our approach learns a compact, global, latent scene representation that encodes multi-view input and resolves cross-view correspondences before decoding any explicit 3D geometry. Crucially, this formulation enables compact, globally consistent reconstructions without relying on pretrained pixel-prediction backbones or reusing latent features from dense baselines. Using a coarse-to-fine training curriculum that gradually increases decoded capacity, GlobalSplat natively prevents representation bloat. On RealEstate10K and ACID, our model achieves competitive novel-view synthesis performance while using as few as 16K Gaussians, significantly fewer than dense pipelines require, for a light 4MB footprint. Further, GlobalSplat enables significantly faster inference than the baselines, running in under 78 milliseconds for a single forward pass. The project page is available at https://r-itk.github.io/globalsplat/
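As a rough sanity check on the 16K-Gaussian, 4MB figure, the arithmetic below estimates storage under an assumed parameterization. The abstract does not state how each Gaussian is stored, so the 59-float layout (position, scale, rotation quaternion, opacity, degree-3 spherical harmonics for RGB), float32 storage, reading "16K" as 16 × 1024, and the two-view 256×256 pixel-aligned comparison are all illustrative assumptions, not details taken from the paper.

```python
# Back-of-the-envelope footprint estimate. Every constant here is an
# assumption for illustration; the abstract does not give the layout.
FLOATS_PER_GAUSSIAN = 3 + 3 + 4 + 1 + 48  # position, scale, quaternion, opacity, SH degree 3 (RGB)
BYTES_PER_FLOAT = 4                       # float32 storage assumed


def footprint_mib(num_gaussians: int) -> float:
    """Uncompressed storage in MiB for the assumed per-Gaussian layout."""
    return num_gaussians * FLOATS_PER_GAUSSIAN * BYTES_PER_FLOAT / 2**20


def pixel_aligned_count(num_views: int, height: int, width: int) -> int:
    """One Gaussian per input pixel, as in a pixel-aligned pipeline."""
    return num_views * height * width


compact = footprint_mib(16 * 1024)                     # "16K" read as 16 * 1024
dense = footprint_mib(pixel_aligned_count(2, 256, 256))  # hypothetical 2-view input

print(f"compact: {compact:.4f} MiB, dense: {dense:.1f} MiB")
# → compact: 3.6875 MiB, dense: 29.5 MiB
```

Under these assumptions the compact budget lands close to the reported 4MB, while the pixel-aligned count grows linearly with the number of input views and their resolution.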
citation summary
fields: cs.CV (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: background (1)
citing papers explorer
- AdaptSplat: Adapting Vision Foundation Models for Feed-Forward 3D Gaussian Splatting
  AdaptSplat adds a Frequency-Preserving Adapter to vision foundation models to boost high-frequency fidelity and cross-domain performance in feed-forward 3D Gaussian Splatting.
- PRISM: Feed-Forward Single-Image 3D Reconstruction via Geometric Warp-Residual Modeling
  PRISM is a feed-forward framework that decomposes single-image 3D reconstruction into a geometric warp prior plus residual correction, claiming competitive quality at 36-second inference.
- Learning Stable Canonical Worlds for Novel View Synthesis and Beyond
  CanonicalGS aggregates view-centric evidence into a canonical latent world with uncertainty-aware fusion to improve novel view synthesis and downstream perception tasks.