pith. machine review for the scientific record.

arxiv: 2604.03069 · v1 · submitted 2026-04-03 · 💻 cs.CV

Recognition: no theorem link

SparseSplat: Towards Applicable Feed-Forward 3D Gaussian Splatting with Pixel-Unaligned Prediction

Pith reviewed 2026-05-13 19:57 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D Gaussian Splatting · feed-forward prediction · sparse representation · entropy sampling · scene reconstruction · rendering quality · point cloud network · adaptive density

The pith

SparseSplat generates compact 3D Gaussian maps that deliver state-of-the-art rendering quality using only 22 percent of the usual primitives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a feed-forward approach to 3D Gaussian Splatting that varies the number and size of Gaussians according to local scene complexity instead of spreading them uniformly. It introduces entropy-based sampling to place larger, fewer Gaussians in plain areas and smaller, denser ones where detail is high, plus a dedicated point cloud network that extracts context and predicts attributes directly. This produces much smaller representations than prior feed-forward methods while keeping or improving image quality. A reader would care because the resulting maps use far less memory and compute, making them easier to store, transmit, and use in downstream tasks such as reconstruction or robotics.

Core claim

SparseSplat is the first feed-forward 3DGS model that adaptively adjusts Gaussian density according to scene structure and information richness of local regions. It does so through entropy-based probabilistic sampling, which creates large sparse Gaussians in textureless areas and small dense Gaussians in rich regions, together with a specialized point cloud network that encodes local context and decodes it into 3DGS attributes to fix the receptive-field mismatch with standard optimization pipelines.

What carries the argument

Entropy-based probabilistic sampling paired with a specialized point cloud network that encodes local context and predicts 3DGS attributes, allowing density to vary with scene structure.
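
As a rough illustration of the sampling idea described above, the sketch below computes a per-tile Shannon entropy map from a grayscale image, turns it into a probability map, and draws sample locations from it. The tile size, histogram bin count, probability floor, and the function names `local_entropy` and `sample_tiles` are all assumptions made for this example; this page does not specify how the paper computes entropy or converts it into probabilities.

```python
import numpy as np

def local_entropy(gray, patch=8, bins=16):
    """Shannon entropy (bits) of the intensity histogram in each patch x patch tile."""
    h, w = gray.shape
    rows, cols = h // patch, w // patch
    ent = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            tile = gray[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
            hist, _ = np.histogram(tile, bins=bins, range=(0.0, 1.0))
            p = hist[hist > 0] / tile.size
            ent[r, c] = -(p * np.log2(p)).sum()
    return ent

def sample_tiles(ent, n_samples, rng, floor=1e-3):
    """Draw tile indices with probability proportional to entropy plus a small floor."""
    prob = (ent + floor).ravel()
    prob = prob / prob.sum()
    idx = rng.choice(prob.size, size=n_samples, replace=True, p=prob)
    return np.stack(np.unravel_index(idx, ent.shape), axis=1)  # (n_samples, 2): row, col

rng = np.random.default_rng(0)
img = np.full((64, 64), 0.5)          # flat, textureless image
img[:, 32:] = rng.random((64, 32))    # right half: high-entropy noise
ent = local_entropy(img)
picks = sample_tiles(ent, 2000, rng)
frac_right = (picks[:, 1] >= 4).mean()  # share of samples in the textured half
print(f"samples in textured half: {frac_right:.3f}")
```

On this synthetic image almost all samples land in the noisy right half, which is the behaviour the summary attributes to the method: few primitives in textureless areas, many in detailed ones. How Gaussian size is tied to sampling density is not covered by this sketch.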

If this is right

  • Downstream reconstruction tasks can now use the generated maps directly because they are no longer uniformly dense and redundant.
  • Memory and storage requirements for 3D scene representations drop sharply while image fidelity stays high.
  • Reasonable quality is retained even when the map is reduced to 1.5 percent of the Gaussians produced by prior uniform methods.
  • The approach removes the need for post-hoc pruning steps that current feed-forward pipelines require.
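
The percentages above can be cross-checked against the numbers quoted in the Figure 4 and Figure 6 captions on this page (688k Gaussians for the pixel-aligned baseline, 150k/40k/10k for SparseSplat, 71.9 vs. 208.6 FPS). Mapping 22 percent to the 150k model and 1.5 percent to the 10k model is an inference from those captions, not something the page states outright; the helper name `percent_of_dense` is made up for this example.

```python
DENSE = 688_000  # Gaussians in the pixel-aligned baseline (Figure 4 / Figure 6 captions)

def percent_of_dense(n_gaussians: int) -> float:
    """Share of the dense baseline, in percent, rounded to one decimal."""
    return round(100 * n_gaussians / DENSE, 1)

shares = {n: percent_of_dense(n) for n in (150_000, 40_000, 10_000)}
speedup = round(208.6 / 71.9, 2)  # FPS of the 150k model over the baseline FPS

for n, pct in shares.items():
    print(f"{n:>7,d} Gaussians = {pct}% of the dense map")
print(f"rendering speedup at 150k: {speedup}x")
```

This gives roughly 21.8, 5.8 and 1.5 percent, and a speedup of about 2.9x, in line with the 22 percent, 1.5 percent and ∼3× figures quoted on this page.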

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same entropy-driven placement could be tested on dynamic scenes to see whether temporal consistency emerges without extra regularization.
  • Because the maps are already sparse, they may integrate more easily with existing compression or level-of-detail pipelines for large environments.
  • The point cloud network design suggests a route to replace the full 3DGS optimization loop with a single forward pass in other primitive-based representations.

Load-bearing premise

That entropy reliably measures information richness and scene structure and that the point cloud network fully compensates for the receptive-field difference between feed-forward prediction and standard 3DGS optimization.
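
For orientation, the receptive-field difference can be stated with the standard alpha-compositing rule used by 3DGS rasterizers (Kerbl et al., entry [15] in the reference list below). This is the textbook formulation, written here as a reminder rather than copied from the paper; the symbols $c_i$ (colour), $\alpha_i(p)$ (opacity contribution of primitive $i$ at pixel $p$) and $T_i(p)$ (transmittance) are the usual notation and are not defined on this page.

```latex
C(p) = \sum_{i=1}^{N} c_i \, \alpha_i(p) \, T_i(p),
\qquad
T_i(p) = \prod_{j=1}^{i-1} \bigl(1 - \alpha_j(p)\bigr),
\qquad
\frac{\partial C(p)}{\partial c_i} = \alpha_i(p) \, T_i(p).
```

A primitive receives gradient only from pixels where $\alpha_i(p)$ is non-zero, and that gradient is scaled by the transmittance left over by the primitives in front of it. Per-Gaussian optimization is therefore driven by a small set of overlapping neighbours, which is arguably the kind of local context a feed-forward predictor would need to see in order to match it.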

What would settle it

A test scene in which entropy sampling visibly drops rendering quality below the claimed levels even at 22 percent density, or produces artifacts traceable to mismatched receptive fields in the point cloud network.

Figures

Figures reproduced from arXiv: 2604.03069 by Ke Wu, Wenchao Ding, Xiangting Meng, Zicheng Zhang.

Figure 1
Figure 1: SparseSplat achieves state-of-the-art rendering quality on DL3DV […
Figure 2
Figure 2: Overall Pipeline of SparseSplat. Our method begins with using a frozen backbone [39] to generate feature maps and depth maps from multi-view posed images. Next, in the Adaptive Primitive Sampling stage, the entropy maps are calculated and transformed into probability maps to perform sampling, resulting in sparse 2D pixels. These pixels are then back-projected into 3D space using the predicted depth to form…
Figure 3
Figure 3: The Locality of Classic 3DGS Optimization. In this example, three 3D Gaussian primitives are splatted onto the 2D image plane. Primitive gc covers two pixels: one covered exclusively by gc, and another accumulating contributions from all three primitives. During backpropagation, gradients propagate to gc through both pixels. Notably, ga and gb modulate the gradient flow at the shared pixel by affecting th…
Figure 4
Figure 4: Rendering quality comparisons on DL3DV. Our model matches the SOTA rendering quality of DepthSplat with only 150k Gaussians (vs. 688k). Under sparse settings (40k and 10k), our method maintains structural integrity and shows minor progressive blurring.
Figure 5
Figure 5: Additional qualitative comparisons.
Figure 6
Figure 6: Rendering Efficiency vs. Gaussian Count. X-axis log-scaled. We evaluate the rendering frame rate (FPS) across scenes with varying degrees of sparsity. While pixel-aligned baselines (typified by ∼688k Gaussians) operate at 71.9 FPS, our sparse-by-design approach significantly accelerates rendering. Our 150k model achieves ∼3× speedup (208.6 FPS), and extremely sparse settings (10k–40k) unlock rates suitabl…
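
The back-projection step mentioned in the Figure 2 caption can be sketched as follows. This is a generic pinhole-camera lift, not the authors' code: the intrinsics matrix `K`, the `cam_to_world` pose, the integer pixel-coordinate convention, and the reading of depth as distance along the camera z-axis are all assumptions for illustration.

```python
import numpy as np

def backproject(pixels_uv, depth, K, cam_to_world):
    """Lift integer (u, v) pixels with per-pixel z-depth to world-space 3D points."""
    u, v = pixels_uv[:, 0], pixels_uv[:, 1]
    z = depth[v, u]                                    # depth map is indexed [row, col]
    ones = np.ones_like(z)
    rays = np.linalg.inv(K) @ np.stack([u, v, ones])   # (3, N) camera rays with z = 1
    pts_cam = rays * z                                 # scale each ray by its depth
    pts_h = np.vstack([pts_cam, ones])                 # homogeneous coordinates, (4, N)
    return (cam_to_world @ pts_h)[:3].T                # (N, 3) world-space points

# Toy setup: 64x64 image, focal length 100 px, principal point at (32, 32).
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 32.0],
              [0.0, 0.0, 1.0]])
depth = np.full((64, 64), 2.0)           # constant depth of 2 units
sampled = np.array([[32, 32], [42, 32]])  # sparse (u, v) pixels from the sampling stage
points = backproject(sampled, depth, K, np.eye(4))
print(points)
```

With an identity pose, the principal-point pixel maps to a point straight ahead of the camera at the given depth, and a pixel 10 columns to the right is shifted by 10 / 100 × 2 = 0.2 units along x.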
Original abstract

Recent progress in feed-forward 3D Gaussian Splatting (3DGS) has notably improved rendering quality. However, the spatially uniform and highly redundant 3DGS map generated by previous feed-forward 3DGS methods limits their integration into downstream reconstruction tasks. We propose SparseSplat, the first feed-forward 3DGS model that adaptively adjusts Gaussian density according to scene structure and information richness of local regions, yielding highly compact 3DGS maps. To achieve this, we propose entropy-based probabilistic sampling, generating large, sparse Gaussians in textureless areas and assigning small, dense Gaussians to regions with rich information. Additionally, we designed a specialized point cloud network that efficiently encodes local context and decodes it into 3DGS attributes, addressing the receptive field mismatch between the general 3DGS optimization pipeline and feed-forward models. Extensive experimental results demonstrate that SparseSplat can achieve state-of-the-art rendering quality with only 22% of the Gaussians and maintain reasonable rendering quality with only 1.5% of the Gaussians. Project page: https://victkk.github.io/SparseSplat-page/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. SparseSplat proposes the first feed-forward 3D Gaussian Splatting model that adaptively adjusts Gaussian density according to scene structure and information richness using entropy-based probabilistic sampling (large sparse Gaussians in textureless regions, small dense ones in rich regions) together with a specialized point cloud network to resolve receptive-field mismatch between general 3DGS pipelines and feed-forward prediction. The central claim is that this yields highly compact maps while achieving state-of-the-art rendering quality with only 22% of the Gaussians and reasonable quality with only 1.5% of the Gaussians.

Significance. If the efficiency and quality claims hold under rigorous validation, the work would meaningfully advance practical feed-forward 3DGS by reducing spatial redundancy and producing compact representations better suited to downstream reconstruction tasks, directly addressing a core limitation of prior uniform-density feed-forward methods.

major comments (2)
  1. [Abstract] Abstract: the headline performance numbers (SOTA quality at 22% Gaussians, usable at 1.5%) rest on the unverified assumption that 2D entropy-based probabilistic sampling reliably identifies information richness and 3D structure; without explicit 3D consistency checks or geometric importance, thin structures or view-dependent effects risk under-sampling and silent quality degradation.
  2. [Method] Method description: the specialized point cloud network is asserted to fix receptive-field mismatch, yet its benefit is conditional on the sampling already placing Gaussians correctly; the two components are not independently validated, leaving the load-bearing contribution of each unclear for the reported efficiency gains.
minor comments (2)
  1. [Abstract] Abstract: specify the exact datasets, baselines, and quantitative metrics (PSNR/SSIM/LPIPS) supporting the 22% and 1.5% claims.
  2. [Experiments] Experiments: include ablation isolating entropy sampling from the point-cloud network and test cases with thin geometry or specular surfaces to probe the sampling assumption.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and detailed comments on our manuscript. We address each major comment point by point below, clarifying our approach and outlining planned revisions to strengthen the presentation of results.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the headline performance numbers (SOTA quality at 22% Gaussians, usable at 1.5%) rest on the unverified assumption that 2D entropy-based probabilistic sampling reliably identifies information richness and 3D structure; without explicit 3D consistency checks or geometric importance, thin structures or view-dependent effects risk under-sampling and silent quality degradation.

    Authors: We appreciate the referee's point on the need for stronger validation of the sampling strategy. The entropy computation is performed in 2D image space to estimate local information richness, which we then use to guide non-uniform 3D Gaussian placement; our experiments across multiple datasets demonstrate that this yields compact maps without visible degradation in rendering quality, including on scenes containing thin structures. To directly address the concern, we will add in the revision: (i) qualitative visualizations overlaying sampled Gaussians on scene geometry, (ii) quantitative comparisons of Gaussian density versus local depth variance, and (iii) targeted evaluation on thin-structure subsets. These additions will make the link between 2D entropy and 3D structure explicit. revision: yes

  2. Referee: [Method] Method description: the specialized point cloud network is asserted to fix receptive-field mismatch, yet its benefit is conditional on the sampling already placing Gaussians correctly; the two components are not independently validated, leaving the load-bearing contribution of each unclear for the reported efficiency gains.

    Authors: We agree that the individual contributions should be isolated for clarity. The point-cloud network is specifically designed to process the irregularly distributed points produced by entropy sampling (using local neighborhood aggregation that respects the non-uniform density), which standard 2D CNN backbones cannot do efficiently. In the revised manuscript we will include a dedicated ablation that keeps the entropy sampling fixed and replaces the specialized network with a baseline (standard PointNet-style encoder followed by per-point MLP decoder). The resulting performance drop will quantify the network's role in handling receptive-field mismatch and enabling the reported efficiency-quality trade-off. revision: yes
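
To make "local neighborhood aggregation" concrete, the sketch below gathers the k nearest neighbours of every 3D point and pools their features with distance-based softmax weights. It is a stand-in for whatever learned head the paper uses: the brute-force search, the temperature `tau`, the fixed (non-learned) weights, and the names `knn_indices` and `aggregate` are all assumptions for illustration.

```python
import numpy as np

def knn_indices(points, k):
    """Brute-force k nearest neighbours (excluding the point itself) per 3D point."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # (N, N) squared distances
    np.fill_diagonal(d2, np.inf)                                   # never pick yourself
    return np.argsort(d2, axis=1)[:, :k]                           # (N, k)

def aggregate(points, feats, k=4, tau=0.1):
    """Softmax-weighted pooling of neighbour features; closer neighbours weigh more."""
    idx = knn_indices(points, k)
    d2 = ((points[:, None, :] - points[idx]) ** 2).sum(-1)         # (N, k)
    logits = -d2 / tau
    logits -= logits.max(axis=1, keepdims=True)                    # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    return (w[..., None] * feats[idx]).sum(axis=1)                 # (N, C)

rng = np.random.default_rng(0)
points = rng.random((50, 3))   # irregular 3D samples
feats = rng.random((50, 8))    # per-point features
out = aggregate(points, feats)
print(out.shape)
```

Because neighbours are chosen by rank rather than by a fixed radius, each point aggregates over the same number of neighbours whether it sits in a dense or a sparse region, which is one simple way to respect non-uniform density. The O(N²) distance matrix is only practical for small N; an accelerated nearest-neighbour search would be needed at realistic Gaussian counts.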

Circularity Check

0 steps flagged

No significant circularity; claims rest on architectural novelty and experimental validation

full rationale

The paper introduces entropy-based probabilistic sampling and a specialized point-cloud network as core innovations for adaptive Gaussian density in feed-forward 3DGS. No equations, derivations, or self-citations are shown that reduce performance metrics (e.g., quality at 22% or 1.5% Gaussians) to quantities defined by fitted parameters or prior self-referential results. The approach is presented as an empirical architectural advance with external benchmarking, making the derivation chain self-contained against independent validation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that local entropy is a sufficient proxy for information richness and on the unstated design choice that a point-cloud network can be trained to output valid 3DGS attributes at variable densities.

axioms (1)
  • domain assumption: Local entropy computed from input images accurately reflects scene information richness for deciding Gaussian density.
    Invoked to justify the probabilistic sampling rule that places sparse large Gaussians in low-entropy regions.

pith-pipeline@v0.9.0 · 5515 in / 1189 out tokens · 56844 ms · 2026-05-13T19:57:41.256796+00:00 · methodology

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. PointForward: Feedforward Driving Reconstruction through Point-Aligned Representations

    cs.CV · 2026-05 · unverdicted · novelty 7.0

    PointForward uses sparse world-space 3D queries and scene graphs to deliver consistent single-pass reconstruction of dynamic driving scenes via point-aligned representations.

  2. Genie Sim PanoRecon: Fast Immersive Scene Generation from Single-View Panorama

    cs.RO · 2026-04 · unverdicted · novelty 4.0

    A feed-forward Gaussian-splatting system reconstructs photo-realistic 3D scenes from single-view panoramas in seconds via cube-map decomposition and depth-aware fusion for robotic simulation use.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · cited by 2 Pith papers · 1 internal anchor

  1. [1]

    sibr: A system for image based rendering, 2020

    Sebastien Bonopera, Peter Hedman, Jerome Esnault, Siddhant Prakash, Simon Rodriguez, Theo Thonat, Mehdi Benadel, Gaurav Chaurasia, Julien Philip, and George Drettakis. sibr: A system for image based rendering, 2020.

  2. [2]

    pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction

    David Charatan, Sizhe Li, Andrea Tagliasacchi, and Vincent Sitzmann. pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. In CVPR, 2024.

  3. [3]

    Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images

    Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. arXiv preprint arXiv:2403.14627, 2024.

  4. [4]

    Mvsplat360: Feed-forward 360 scene synthesis from sparse views

    Yuedong Chen, Chuanxia Zheng, Haofei Xu, Bohan Zhuang, Andrea Vedaldi, Tat-Jen Cham, and Jianfei Cai. Mvsplat360: Feed-forward 360 scene synthesis from sparse views. 2024

  5. [5]

    Splatformer: Point transformer for robust 3d gaussian splatting

    Yutong Chen, Marko Mihajlovic, Xiyi Chen, Yiming Wang, Sergey Prokudin, and Siyu Tang. Splatformer: Point transformer for robust 3d gaussian splatting. In International Conference on Learning Representations (ICLR), 2025.

  6. [6]

    Nearest neighbor pattern classification

    T. Cover and P. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1):21–27, 1967.

  7. [7]

    The faiss library

    Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. The faiss library. IEEE Transactions on Big Data, 2025.

  8. [8]

    Plenoxels: Radiance fields without neural networks

    Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. In CVPR, 2022.

  9. [9]

    Cascade cost volume for high-resolution multi-view stereo and stereo matching

    Xiaodong Gu, Zhiwen Fan, Siyu Zhu, Zuozhuo Dai, Feitong Tan, and Ping Tan. Cascade cost volume for high-resolution multi-view stereo and stereo matching. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2495–2504, 2020.

  10. [10]

    Pct: Point cloud transformer

    Meng-Hao Guo, Jun-Xiong Cai, Zheng-Ning Liu, Tai-Jiang Mu, Ralph R Martin, and Shi-Min Hu. Pct: Point cloud transformer. Computational visual media, 7(2):187–199,

  11. [11]

    Statistical and structural approaches to texture

    Robert M Haralick. Statistical and structural approaches to texture. Proceedings of the IEEE, 67(5):786–804, 1979.

  12. [12]

    Mvsanywhere: Zero-shot multi-view stereo

    Sergio Izquierdo, Mohamed Sayed, Michael Firman, Guillermo Garcia-Hernando, Daniyar Turmukhambetov, Javier Civera, Oisin Mac Aodha, Gabriel Brostow, and Jamie Watson. Mvsanywhere: Zero-shot multi-view stereo. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 11493–11504, 2025.

  13. [13]

    Anysplat: Feed-forward 3d gaussian splatting from unconstrained views

    Lihan Jiang, Yucheng Mao, Linning Xu, Tao Lu, Kerui Ren, Yichen Jin, Xudong Xu, Mulin Yu, Jiangmiao Pang, Feng Zhao, et al. Anysplat: Feed-forward 3d gaussian splatting from unconstrained views. arXiv preprint arXiv:2505.23716,

  14. [14]

    Splatam: Splat, track & map 3d gaussians for dense rgb-d slam

    Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, and Jonathon Luiten. Splatam: Splat, track & map 3d gaussians for dense rgb-d slam. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.

  15. [15]

    3d gaussian splatting for real-time radiance field rendering

    B. Kerbl, Georgios Kopanas, Thomas Leimkuehler, and G. Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 2023.

  16. [16]

    Pointcnn: Convolution on x-transformed points

    Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, and Baoquan Chen. Pointcnn: Convolution on x-transformed points. Advances in neural information processing systems, 31, 2018.

  17. [17]

    Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision

    Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, et al. Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22160–22169, 2024.

  18. [18]

    Theory of edge detection

    David Marr and Ellen Hildreth. Theory of edge detection. Proceedings of the Royal Society of London. Series B. Biological Sciences, 207(1167):187–217, 1980.

  19. [19]

    Nerf: Representing scenes as neural radiance fields for view synthesis

    Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis, 2020.

  20. [20]

    Instant neural graphics primitives with a multiresolution hash encoding

    Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph., 41(4):102:1–102:15, 2022.

  21. [21]

    Pointnet: Deep learning on point sets for 3d classification and segmentation

    Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 652–660,

  22. [22]

    Pointnet++: Deep hierarchical feature learning on point sets in a metric space

    Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in neural information processing systems, 30, 2017.

  23. [23]

    Pointnext: Revisiting pointnet++ with improved training and scaling strategies

    Guocheng Qian, Yuchen Li, Houwen Peng, Jinjie Mai, Hasan Hammoud, Mohamed Elhoseiny, and Bernard Ghanem. Pointnext: Revisiting pointnet++ with improved training and scaling strategies. Advances in neural information processing systems, 35:23192–23204, 2022.

  24. [24]

    Advancing Extended Reality with 3D Gaussian Splatting: Innovations and Prospects

    Shi Qiu, Binzhu Xie, Qixuan Liu, and Pheng-Ann Heng. Advancing Extended Reality with 3D Gaussian Splatting: Innovations and Prospects. In 2025 IEEE International Conference on Artificial Intelligence and eXtended and Virtual Reality (AIxVR), pages 203–208, Los Alamitos, CA, USA,

  25. [25]

    IEEE Computer Society.

  26. [26]

    Entropy-based adaptive sampling

    Jaume Rigau, Miquel Feixas, and Mateu Sbert. Entropy-based adaptive sampling. In Graphics Interface, pages 79–87, 2003.

  27. [27]

    A mathematical theory of communication

    Claude E Shannon. A mathematical theory of communication. The Bell system technical journal, 27(3):379–423,

  28. [28]

    Julian Straub, Thomas Whelan, Lingni Ma, Yufan Chen, Erik Wijmans, Simon Green, Jakob J. Engel, Raul Mur-Artal, Carl Ren, Shobhit Verma, Anton Clarkson, Mingfei Yan, Brian Budge, Yajie Yan, Xiaqing Pan, June Yon, Yuyang Zou, Kimberly Leon, Nigel Carter, Jesus Briales, Tyler Gillingham, Elias Mueggler, Luis Pesqueira, Manolis Savva, Dhruv Batra, Hauke M. S...

  29. [29]

    Kpconv: Flexible and deformable convolution for point clouds

    Hugues Thomas, Charles R Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, François Goulette, and Leonidas J Guibas. Kpconv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6411–6420, 2019.

  30. [30]

    How nerfs and 3d gaussian splatting are reshaping slam: A survey

    F Tosi, Y Zhang, Z Gong, E Sandström, S Mattoccia, MR Oswald, and M Poggi. How nerfs and 3d gaussian splatting are reshaping slam: A survey. arXiv preprint arXiv:2402.13255, 2024.

  31. [31]

    Attention is all you need

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.

  32. [32]

    Learning-based multi-view stereo: A survey

    Fangjinhua Wang, Qingtian Zhu, Di Chang, Quankai Gao, Junlin Han, Tong Zhang, Richard Hartley, and Marc Pollefeys. Learning-based multi-view stereo: A survey. arXiv preprint arXiv:2408.15235, 2024.

  33. [33]

    Vggt: Visual geometry grounded transformer

    Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. Vggt: Visual geometry grounded transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025.

  34. [34]

    Volsplat: Rethinking feed-forward 3d gaussian splatting with voxel-aligned prediction

    Weijie Wang, Yeqing Chen, Zeyu Zhang, Hengyu Liu, Haoxiao Wang, Zhiyuan Feng, Wenkang Qin, Zheng Zhu, Donny Y. Chen, and Bohan Zhuang. Volsplat: Rethinking feed-forward 3d gaussian splatting with voxel-aligned prediction, 2025.

  35. [35]

    Dynamic graph cnn for learning on point clouds

    Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, and Justin M. Solomon. Dynamic graph cnn for learning on point clouds. ACM Transactions on Graphics (TOG), 2019.

  36. [36]

    Dynamic graph cnn for learning on point clouds

    Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E Sarma, Michael M Bronstein, and Justin M Solomon. Dynamic graph cnn for learning on point clouds. ACM Transactions on Graphics (tog), 38(5):1–12, 2019.

  37. [37]

    Image quality assessment: from error visibility to structural similarity

    Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004.

  38. [38]

    Vings-mono: Visual-inertial gaussian splatting monocular slam in large scenes

    Ke Wu, Zicheng Zhang, Muer Tie, Ziqing Ai, Zhongxue Gan, and Wenchao Ding. Vings-mono: Visual-inertial gaussian splatting monocular slam in large scenes. IEEE Transactions on Robotics, pages 1–20, 2025.

  39. [39]

    Segformer: Simple and efficient design for semantic segmentation with transformers

    Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. Segformer: Simple and efficient design for semantic segmentation with transformers. Advances in neural information processing systems, 34:12077–12090, 2021.

  40. [40]

    Depthsplat: Connecting gaussian splatting and depth

    Haofei Xu, Songyou Peng, Fangjinhua Wang, Hermann Blum, Daniel Barath, Andreas Geiger, and Marc Pollefeys. Depthsplat: Connecting gaussian splatting and depth. In CVPR, 2025.

  41. [41]

    Gs-slam: Dense visual slam with 3d gaussian splatting

    Chi Yan, Delin Qu, Dan Xu, Bin Zhao, Zhigang Wang, Dong Wang, and Xuelong Li. Gs-slam: Dense visual slam with 3d gaussian splatting. In CVPR, 2024.

  42. [42]

    Foldingnet: Point cloud auto-encoder via deep grid deformation

    Yaoqing Yang, Chen Feng, Yiru Shen, and Dong Tian. Foldingnet: Point cloud auto-encoder via deep grid deformation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 206–215, 2018.

  43. [43]

    Mvsnet: Depth inference for unstructured multi-view stereo

    Yao Yao, Zixin Luo, Shiwei Li, Tian Fang, and Long Quan. Mvsnet: Depth inference for unstructured multi-view stereo. European Conference on Computer Vision (ECCV), 2018.

  44. [44]

    Recurrent mvsnet for high-resolution multi-view stereo depth inference

    Yao Yao, Zixin Luo, Shiwei Li, Tianwei Shen, Tian Fang, and Long Quan. Recurrent mvsnet for high-resolution multi-view stereo depth inference. Computer Vision and Pattern Recognition (CVPR), 2019.

  45. [45]

    The unreasonable effectiveness of deep features as a perceptual metric

    Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018.

  46. [46]

    Gaussian graph network: Learning efficient and generalizable gaussian representations from multi-view images

    Shengjun Zhang, Xin Fei, Fangfu Liu, Haixu Song, and Yueqi Duan. Gaussian graph network: Learning efficient and generalizable gaussian representations from multi-view images. Neural Information Processing Systems, 2025.

  47. [47]

    Point transformer

    Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip HS Torr, and Vladlen Koltun. Point transformer. In Proceedings of the IEEE/CVF international conference on computer vision, pages 16259–16268, 2021.

  48. [48]

    3d gaussian splatting in robotics: A survey

    Siting Zhu, Guangming Wang, Xin Kong, Dezhi Kong, and Hesheng Wang. 3d gaussian splatting in robotics: A survey. arXiv preprint arXiv:2410.12262, 2024.

  49. [49]

    Long-lrm: Long-sequence large reconstruction model for wide-coverage gaussian splats

    Chen Ziwen, Hao Tan, Kai Zhang, Sai Bi, Fujun Luan, Yicong Hong, Li Fuxin, and Zexiang Xu. Long-lrm: Long-sequence large reconstruction model for wide-coverage gaussian splats. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025.

  50. [50]

    Applicability to Downstream Tasks

    Applicability to Downstream Tasks. To demonstrate the practical value of SparseSplat, we evaluate how it integrates into various downstream tasks. We categorize these tasks based on two different forms of “real-time” requirements: Reconstruction Real-time-ness, which underpins online mapping and robotic perception, and Rendering Real-time-ness, which is in...

  51. [51]

    Runtime Breakdown. We present a detailed runtime breakdown of individual components across varying Gaussian counts in Tab. 7. The latency of backbone inference and entropy-based sampling remains constant regardless of the sparsity level. In contrast, the computational costs of the KNN query and the Attention prediction head scale with the number of gen...

  52. [52]

    Structure of Different Heads

    Structure of Different Heads. As described in Sec. 3.3, our 3D-Local Attribute Prediction framework employs a lightweight predictor to regress Gaussian attributes based on K-nearest neighbors in 3D space. We explored four different prediction head architectures, all sharing the same dual projection strategy for processing geometric and image features b...