arxiv: 2604.03069 · v1 · submitted 2026-04-03 · 💻 cs.CV

SparseSplat: Towards Applicable Feed-Forward 3D Gaussian Splatting with Pixel-Unaligned Prediction

Zicheng Zhang , Xiangting Meng , Ke Wu , Wenchao Ding

Pith reviewed 2026-05-13 19:57 UTC · model grok-4.3

classification 💻 cs.CV

keywords 3D Gaussian Splatting · feed-forward prediction · sparse representation · entropy sampling · scene reconstruction · rendering quality · point cloud network · adaptive density

The pith

SparseSplat generates compact 3D Gaussian maps that deliver state-of-the-art rendering quality using only 22 percent of the Gaussians that prior feed-forward methods produce.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a feed-forward approach to 3D Gaussian Splatting that varies the number and size of Gaussians according to local scene complexity instead of spreading them uniformly. It introduces entropy-based sampling to place larger, fewer Gaussians in plain areas and smaller, denser ones where detail is high, plus a dedicated point cloud network that extracts context and predicts attributes directly. This produces much smaller representations than prior feed-forward methods while keeping or improving image quality. A reader would care because the resulting maps use far less memory and compute, making them easier to store, transmit, and use in downstream tasks such as reconstruction or robotics.

Core claim

SparseSplat is the first feed-forward 3DGS model that adaptively adjusts Gaussian density according to scene structure and information richness of local regions. It does so through entropy-based probabilistic sampling, which creates large sparse Gaussians in textureless areas and small dense Gaussians in rich regions, together with a specialized point cloud network that encodes local context and decodes it into 3DGS attributes to fix the receptive-field mismatch with standard optimization pipelines.

What carries the argument

Entropy-based probabilistic sampling paired with a specialized point cloud network that encodes local context and predicts 3DGS attributes, allowing density to vary with scene structure.
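
As a rough illustration of the sampling idea (not the authors' implementation), the sketch below computes a Shannon entropy per image patch, turns the entropy map into a probability map, and draws patch locations from it. Everything specific here is an assumption made for the example: the function names `local_entropy` and `sample_patches`, the use of an intensity histogram over non-overlapping patches, and the parameters `patch`, `bins`, and `floor`.

```python
import numpy as np

def local_entropy(gray, patch=8, bins=16):
    """Shannon entropy (bits) of the intensity histogram of each non-overlapping patch.

    `gray` is a 2D float image with values in [0, 1]. Hypothetical helper.
    """
    h, w = gray.shape
    gh, gw = h // patch, w // patch
    ent = np.zeros((gh, gw))
    for i in range(gh):
        for j in range(gw):
            block = gray[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            hist, _ = np.histogram(block, bins=bins, range=(0.0, 1.0))
            p = hist[hist > 0] / block.size
            ent[i, j] = -(p * np.log2(p)).sum()
    return ent

def sample_patches(ent, n_samples, floor=1e-3, rng=None):
    """Draw distinct patch indices with probability proportional to entropy + floor.

    The small `floor` (an assumption) keeps flat patches reachable, so textureless
    regions still receive a few samples.
    """
    rng = np.random.default_rng() if rng is None else rng
    prob = (ent + floor).ravel()
    prob /= prob.sum()
    idx = rng.choice(prob.size, size=n_samples, replace=False, p=prob)
    return np.stack(np.unravel_index(idx, ent.shape), axis=1)  # columns: (row, col)

# Toy image: flat left half, noisy right half.
rng = np.random.default_rng(0)
img = np.full((64, 64), 0.5)
img[:, 32:] = rng.random((64, 32))
ent = local_entropy(img)
picks = sample_patches(ent, n_samples=16, rng=rng)
```

On this toy image the flat half has zero entropy, so nearly all samples land in the noisy half. The sketch only covers where samples are placed; how the method sizes Gaussians in low-entropy regions is not modelled here.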

If this is right

Downstream reconstruction tasks can now use the generated maps directly because they are no longer uniformly dense and redundant.
Memory and storage requirements for 3D scene representations drop sharply while image fidelity stays high.
Reasonable quality is retained even when the map is reduced to 1.5 percent of the Gaussians produced by prior uniform methods.
The approach removes the need for post-hoc pruning steps that current feed-forward pipelines require.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same entropy-driven placement could be tested on dynamic scenes to see whether temporal consistency emerges without extra regularization.
Because the maps are already sparse, they may integrate more easily with existing compression or level-of-detail pipelines for large environments.
The point cloud network design suggests a route to replace the full 3DGS optimization loop with a single forward pass in other primitive-based representations.

Load-bearing premise

That entropy reliably measures information richness and scene structure and that the point cloud network fully compensates for the receptive-field difference between feed-forward prediction and standard 3DGS optimization.

What would settle it

A test scene in which entropy sampling visibly drops rendering quality below the claimed levels even at 22 percent density, or produces artifacts traceable to mismatched receptive fields in the point cloud network.

Figures

Figures reproduced from arXiv: 2604.03069 by Ke Wu, Wenchao Ding, Xiangting Meng, Zicheng Zhang.

**Figure 2.** Overall Pipeline of SparseSplat. Our method begins with using a frozen backbone [39] to generate feature maps and depth maps from multi-view posed images. Next, in the Adaptive Primitive Sampling stage, the entropy maps are calculated and transformed into probability maps to perform sampling, resulting in sparse 2D pixels. These pixels are then back-projected into 3D space using the predicted depth to form…
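
The back-projection step named in the Figure 2 caption can be sketched with a standard pinhole camera model. This is a minimal example under stated assumptions, not the paper's code: the function name `backproject`, the `(u, v)` pixel convention, the optional `cam_to_world` matrix, and the reading of depth as distance along the optical axis are all choices made for illustration.

```python
import numpy as np

def backproject(pixels_uv, depth, K, cam_to_world=None):
    """Lift integer (u, v) pixels to 3D points using per-pixel depth.

    pixels_uv: (N, 2) integer array of (column, row) coordinates.
    depth:     (H, W) depth map, assumed to be z-depth along the optical axis.
    K:         (3, 3) pinhole intrinsics.
    cam_to_world: optional (4, 4) pose; if given, points are returned in world space.
    """
    u = pixels_uv[:, 0].astype(float)
    v = pixels_uv[:, 1].astype(float)
    z = depth[pixels_uv[:, 1], pixels_uv[:, 0]]
    x = (u - K[0, 2]) / K[0, 0] * z
    y = (v - K[1, 2]) / K[1, 1] * z
    pts = np.stack([x, y, z], axis=1)
    if cam_to_world is not None:
        pts = pts @ cam_to_world[:3, :3].T + cam_to_world[:3, 3]
    return pts

# Toy camera: focal length 100 px, principal point at (32, 24), constant depth 2.
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
depth = np.full((48, 64), 2.0)
pixels = np.array([[32, 24], [42, 24]])
points_cam = backproject(pixels, depth, K)

pose = np.eye(4)
pose[:3, 3] = [1.0, 0.0, 0.0]  # camera shifted one unit along x
points_world = backproject(pixels, depth, K, cam_to_world=pose)
```

The pixel at the principal point maps to a point straight ahead of the camera, and the second pixel is offset sideways in proportion to its depth.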

**Figure 3.** The Locality of Classic 3DGS Optimization. In this example, three 3D Gaussian primitives are splatted onto the 2D image plane. Primitive gc covers two pixels: one covered exclusively by gc, and another accumulating contributions from all three primitives. During backpropagation, gradients propagate to gc through both pixels. Notably, ga and gb modulate the gradient flow at the shared pixel by affecting th…

**Figure 4.** Rendering quality comparisons on DL3DV. Our model matches the SOTA rendering quality of DepthSplat with only 150k Gaussians (vs. 688k). Under sparse settings (40k and 10k), our method maintains structural integrity and shows minor progressive blurring.

**Figure 5.** Additional qualitative comparisons.

**Figure 6.** Rendering Efficiency vs. Gaussian Count. X-axis log-scaled. We evaluate the rendering frame rate (FPS) across scenes with varying degrees of sparsity. While pixel-aligned baselines (typified by ∼688k Gaussians) operate at 71.9 FPS, our sparse-by-design approach significantly accelerates rendering. Our 150k model achieves ∼3× speedup (208.6 FPS), and extremely sparse settings (10k–40k) unlock rates suitabl…
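
A quick arithmetic check relates the Gaussian counts and frame rates quoted in the Figure 4 and Figure 6 captions to the headline percentages. That the 22% and 1.5% figures correspond to the 150k and 10k settings against a 688k baseline is an inference from the captions, not something stated on this page.

```python
baseline = 688_000                       # pixel-aligned Gaussian count quoted in Figures 4 and 6
for count in (150_000, 40_000, 10_000):  # sparse settings quoted in Figure 4
    print(f"{count:>7,d} Gaussians -> {100 * count / baseline:.2f}% of baseline")

speedup = 208.6 / 71.9                   # FPS figures quoted in Figure 6
print(f"150k speedup: {speedup:.2f}x")
```

Under that reading, 150k Gaussians is about 21.8% of the baseline, 10k is about 1.45%, and the 150k speedup is about 2.90×, consistent with the rounded "22%", "1.5%", and "∼3×" figures.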

Original abstract

Recent progress in feed-forward 3D Gaussian Splatting (3DGS) has notably improved rendering quality. However, the spatially uniform and highly redundant 3DGS map generated by previous feed-forward 3DGS methods limits their integration into downstream reconstruction tasks. We propose SparseSplat, the first feed-forward 3DGS model that adaptively adjusts Gaussian density according to scene structure and information richness of local regions, yielding highly compact 3DGS maps. To achieve this, we propose entropy-based probabilistic sampling, generating large, sparse Gaussians in textureless areas and assigning small, dense Gaussians to regions with rich information. Additionally, we designed a specialized point cloud network that efficiently encodes local context and decodes it into 3DGS attributes, addressing the receptive field mismatch between the general 3DGS optimization pipeline and feed-forward models. Extensive experimental results demonstrate that SparseSplat can achieve state-of-the-art rendering quality with only 22% of the Gaussians and maintain reasonable rendering quality with only 1.5% of the Gaussians. Project page: https://victkk.github.io/SparseSplat-page/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SparseSplat's entropy-driven sampling for compact feed-forward 3DGS is a practical step but rests on an unproven link between 2D entropy and actual 3D detail needs.

The main takeaway is that this paper tries to fix the redundancy problem in feed-forward 3D Gaussian Splatting by making Gaussian density adaptive instead of uniform. The authors use entropy-based probabilistic sampling to place larger, sparser Gaussians in low-information regions and smaller, denser ones where the scene has more structure, plus a point-cloud network that predicts attributes while addressing the receptive-field mismatch between feed-forward models and the standard 3DGS optimization pipeline. The headline numbers are that they reach prior SOTA quality with 22% of the Gaussians and still get reasonable results at 1.5% density. That direction matters for anyone who wants lighter 3D maps for downstream tasks like reconstruction or rendering on constrained hardware.

What they do well is identify a real bottleneck in existing feed-forward methods and propose a direct architectural fix rather than post-processing. The specialized decoder looks like a reasonable attempt to bridge the gap between image-based prediction and 3DGS optimization.

The soft spots are around validation of the sampling itself. The claim that entropy reliably flags information richness assumes it captures 3D geometric importance, but if the entropy comes mainly from 2D patches it could under-sample thin structures or view-dependent effects without anyone noticing in aggregate metrics. The paper reports extensive experiments, yet without seeing the ablations on sampling variants or per-region error analysis it is hard to judge whether the quality holds where it counts.

This is aimed at researchers already working on efficient novel-view synthesis who need smaller outputs without losing too much fidelity. A reader who cares about practical deployment would find the compression ratios interesting even if the method needs tuning.
I would send it to peer review because the core idea is testable and the efficiency angle is relevant, though the authors will likely need to add clearer evidence that the entropy step does not silently degrade geometry in tricky areas.

Referee Report

2 major / 2 minor

Summary. SparseSplat proposes the first feed-forward 3D Gaussian Splatting model that adaptively adjusts Gaussian density according to scene structure and information richness using entropy-based probabilistic sampling (large sparse Gaussians in textureless regions, small dense ones in rich regions) together with a specialized point cloud network to resolve receptive-field mismatch between general 3DGS pipelines and feed-forward prediction. The central claim is that this yields highly compact maps while achieving state-of-the-art rendering quality with only 22% of the Gaussians and reasonable quality with only 1.5% of the Gaussians.

Significance. If the efficiency and quality claims hold under rigorous validation, the work would meaningfully advance practical feed-forward 3DGS by reducing spatial redundancy and producing compact representations better suited to downstream reconstruction tasks, directly addressing a core limitation of prior uniform-density feed-forward methods.

major comments (2)

[Abstract] The headline performance numbers (SOTA quality at 22% of the Gaussians, usable quality at 1.5%) rest on the unverified assumption that 2D entropy-based probabilistic sampling reliably identifies information richness and 3D structure. Without explicit 3D consistency checks or a measure of geometric importance, thin structures and view-dependent effects risk being under-sampled, degrading quality silently.
[Method] The specialized point cloud network is asserted to fix the receptive-field mismatch, yet its benefit is conditional on the sampling already placing Gaussians correctly. The two components are not independently validated, so the contribution of each to the reported efficiency gains is unclear.

minor comments (2)

[Abstract] Specify the exact datasets, baselines, and quantitative metrics (PSNR/SSIM/LPIPS) supporting the 22% and 1.5% claims.
[Experiments] Include an ablation isolating entropy sampling from the point-cloud network, and test cases with thin geometry or specular surfaces to probe the sampling assumption.
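
One way to make the thin-geometry probe concrete is to report PSNR over a region mask next to the whole-image value. The sketch below is illustrative only: `masked_psnr` is a hypothetical helper, and where such a mask comes from (hand annotation, an edge detector, a depth-discontinuity map) is left open.

```python
import numpy as np

def masked_psnr(pred, target, mask, max_val=1.0):
    """PSNR in dB over the pixels where `mask` is True.

    pred, target: float images of shape (H, W, 3) with values in [0, max_val].
    mask:         boolean array of shape (H, W) with at least one True entry.
    """
    diff = (pred - target)[mask]
    mse = float(np.mean(diff ** 2))
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_val ** 2 / mse))

# Toy case: a single wrong pixel sitting inside a thin (one-row) region.
target = np.zeros((4, 4, 3))
pred = target.copy()
pred[0, 0] = 0.1
thin = np.zeros((4, 4), dtype=bool)
thin[0, :] = True
everything = np.ones((4, 4), dtype=bool)

psnr_thin = masked_psnr(pred, target, thin)        # about 26.0 dB
psnr_all = masked_psnr(pred, target, everything)   # about 32.0 dB
```

The same error costs about 6 dB more inside the thin region than in the whole-image average, which is the kind of gap aggregate metrics can hide.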

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and detailed comments on our manuscript. We address each major comment point by point below, clarifying our approach and outlining planned revisions to strengthen the presentation of results.

Referee: [Abstract] The headline performance numbers (SOTA quality at 22% of the Gaussians, usable quality at 1.5%) rest on the unverified assumption that 2D entropy-based probabilistic sampling reliably identifies information richness and 3D structure. Without explicit 3D consistency checks or a measure of geometric importance, thin structures and view-dependent effects risk being under-sampled, degrading quality silently.

Authors: We appreciate the referee's point on the need for stronger validation of the sampling strategy. The entropy computation is performed in 2D image space to estimate local information richness, which we then use to guide non-uniform 3D Gaussian placement; our experiments across multiple datasets demonstrate that this yields compact maps without visible degradation in rendering quality, including on scenes containing thin structures. To directly address the concern, we will add in the revision: (i) qualitative visualizations overlaying sampled Gaussians on scene geometry, (ii) quantitative comparisons of Gaussian density versus local depth variance, and (iii) targeted evaluation on thin-structure subsets. These additions will make the link between 2D entropy and 3D structure explicit.

Revision: yes

Referee: [Method] The specialized point cloud network is asserted to fix the receptive-field mismatch, yet its benefit is conditional on the sampling already placing Gaussians correctly. The two components are not independently validated, so the contribution of each to the reported efficiency gains is unclear.

Authors: We agree that the individual contributions should be isolated for clarity. The point-cloud network is specifically designed to process the irregularly distributed points produced by entropy sampling (using local neighborhood aggregation that respects the non-uniform density), which standard 2D CNN backbones cannot do efficiently. In the revised manuscript we will include a dedicated ablation that keeps the entropy sampling fixed and replaces the specialized network with a baseline (standard PointNet-style encoder followed by per-point MLP decoder). The resulting performance drop will quantify the network's role in handling receptive-field mismatch and enabling the reported efficiency-quality trade-off.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on architectural novelty and experimental validation

The paper introduces entropy-based probabilistic sampling and a specialized point-cloud network as core innovations for adaptive Gaussian density in feed-forward 3DGS. No equations, derivations, or self-citations are shown that reduce performance metrics (e.g., quality at 22% or 1.5% Gaussians) to quantities defined by fitted parameters or prior self-referential results. The approach is presented as an empirical architectural advance with external benchmarking, making the derivation chain self-contained against independent validation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that local entropy is a sufficient proxy for information richness and on the unstated design choice that a point-cloud network can be trained to output valid 3DGS attributes at variable densities.

axioms (1)

Domain assumption: Local entropy computed from input images accurately reflects scene information richness for deciding Gaussian density.
Invoked to justify the probabilistic sampling rule that places sparse large Gaussians in low-entropy regions.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

PointForward: Feedforward Driving Reconstruction through Point-Aligned Representations
cs.CV 2026-05 unverdicted novelty 7.0

PointForward uses sparse world-space 3D queries and scene graphs to deliver consistent single-pass reconstruction of dynamic driving scenes via point-aligned representations.

Genie Sim PanoRecon: Fast Immersive Scene Generation from Single-View Panorama
cs.RO 2026-04 unverdicted novelty 4.0

A feed-forward Gaussian-splatting system reconstructs photo-realistic 3D scenes from single-view panoramas in seconds via cube-map decomposition and depth-aware fusion for robotic simulation use.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · cited by 2 Pith papers · 1 internal anchor

[1] Sebastien Bonopera, Peter Hedman, Jerome Esnault, Siddhant Prakash, Simon Rodriguez, Theo Thonat, Mehdi Benadel, Gaurav Chaurasia, Julien Philip, and George Drettakis. sibr: A system for image based rendering, 2020.
[2] David Charatan, Sizhe Li, Andrea Tagliasacchi, and Vincent Sitzmann. pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. In CVPR, 2024.
[3] Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. arXiv preprint arXiv:2403.14627, 2024.
[4] Yuedong Chen, Chuanxia Zheng, Haofei Xu, Bohan Zhuang, Andrea Vedaldi, Tat-Jen Cham, and Jianfei Cai. Mvsplat360: Feed-forward 360 scene synthesis from sparse views. 2024.
[5] Yutong Chen, Marko Mihajlovic, Xiyi Chen, Yiming Wang, Sergey Prokudin, and Siyu Tang. Splatformer: Point transformer for robust 3d gaussian splatting. In International Conference on Learning Representations (ICLR), 2025.
[6] T. Cover and P. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1):21–27, 1967.
[7] Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. The faiss library. IEEE Transactions on Big Data, 2025.
[8] Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. In CVPR, 2022.
[9] Xiaodong Gu, Zhiwen Fan, Siyu Zhu, Zuozhuo Dai, Feitong Tan, and Ping Tan. Cascade cost volume for high-resolution multi-view stereo and stereo matching. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2495–2504, 2020.
[10] Meng-Hao Guo, Jun-Xiong Cai, Zheng-Ning Liu, Tai-Jiang Mu, Ralph R Martin, and Shi-Min Hu. Pct: Point cloud transformer. Computational visual media, 7(2):187–199.
[11] Robert M Haralick. Statistical and structural approaches to texture. Proceedings of the IEEE, 67(5):786–804, 1979.
[12] Sergio Izquierdo, Mohamed Sayed, Michael Firman, Guillermo Garcia-Hernando, Daniyar Turmukhambetov, Javier Civera, Oisin Mac Aodha, Gabriel Brostow, and Jamie Watson. Mvsanywhere: Zero-shot multi-view stereo. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 11493–11504, 2025.
[13] Lihan Jiang, Yucheng Mao, Linning Xu, Tao Lu, Kerui Ren, Yichen Jin, Xudong Xu, Mulin Yu, Jiangmiao Pang, Feng Zhao, et al. Anysplat: Feed-forward 3d gaussian splatting from unconstrained views. arXiv preprint arXiv:2505.23716.
[14] Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, and Jonathon Luiten. Splatam: Splat, track & map 3d gaussians for dense rgb-d slam. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.
[15] B. Kerbl, Georgios Kopanas, Thomas Leimkuehler, and G. Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 2023.
[16] Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, and Baoquan Chen. Pointcnn: Convolution on x-transformed points. Advances in neural information processing systems, 31, 2018.
[17] Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, et al. Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22160–22169, 2024.
[18] David Marr and Ellen Hildreth. Theory of edge detection. Proceedings of the Royal Society of London. Series B. Biological Sciences, 207(1167):187–217, 1980.
[19] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis, 2020.
[20] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph., 41(4):102:1–102:15, 2022.
[21] Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 652–660.
[22] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in neural information processing systems, 30, 2017.
[23] Guocheng Qian, Yuchen Li, Houwen Peng, Jinjie Mai, Hasan Hammoud, Mohamed Elhoseiny, and Bernard Ghanem. Pointnext: Revisiting pointnet++ with improved training and scaling strategies. Advances in neural information processing systems, 35:23192–23204, 2022.
[24] Shi Qiu, Binzhu Xie, Qixuan Liu, and Pheng-Ann Heng. Advancing Extended Reality with 3D Gaussian Splatting: Innovations and Prospects. In 2025 IEEE International Conference on Artificial Intelligence and eXtended and Virtual Reality (AIxVR), pages 203–208, Los Alamitos, CA, USA. IEEE Computer Society.
[25] Jaume Rigau, Miquel Feixas, and Mateu Sbert. Entropy-based adaptive sampling. In Graphics Interface, pages 79–87, 2003.
[26] Claude E Shannon. A mathematical theory of communication. The Bell system technical journal, 27(3):379–423.
[27] Julian Straub, Thomas Whelan, Lingni Ma, Yufan Chen, Erik Wijmans, Simon Green, Jakob J. Engel, Raul Mur-Artal, Carl Ren, Shobhit Verma, Anton Clarkson, Mingfei Yan, Brian Budge, Yajie Yan, Xiaqing Pan, June Yon, Yuyang Zou, Kimberly Leon, Nigel Carter, Jesus Briales, Tyler Gillingham, Elias Mueggler, Luis Pesqueira, Manolis Savva, Dhruv Batra, Hauke M. S...
[28] Hugues Thomas, Charles R Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, François Goulette, and Leonidas J Guibas. Kpconv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6411–6420, 2019.
[29] F Tosi, Y Zhang, Z Gong, E Sandström, S Mattoccia, MR Oswald, and M Poggi. How nerfs and 3d gaussian splatting are reshaping slam: A survey. arXiv preprint arXiv:2402.13255, 2024.
[30] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.
[31] Fangjinhua Wang, Qingtian Zhu, Di Chang, Quankai Gao, Junlin Han, Tong Zhang, Richard Hartley, and Marc Pollefeys. Learning-based multi-view stereo: A survey. arXiv preprint arXiv:2408.15235, 2024.
[32] Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. Vggt: Visual geometry grounded transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025.
[33] Weijie Wang, Yeqing Chen, Zeyu Zhang, Hengyu Liu, Haoxiao Wang, Zhiyuan Feng, Wenkang Qin, Zheng Zhu, Donny Y. Chen, and Bohan Zhuang. Volsplat: Rethinking feed-forward 3d gaussian splatting with voxel-aligned prediction, 2025.
[34] Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, and Justin M. Solomon. Dynamic graph cnn for learning on point clouds. ACM Transactions on Graphics (TOG), 2019.
[35] Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E Sarma, Michael M Bronstein, and Justin M Solomon. Dynamic graph cnn for learning on point clouds. ACM Transactions on Graphics (tog), 38(5):1–12, 2019.
[36] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004.
[37] Ke Wu, Zicheng Zhang, Muer Tie, Ziqing Ai, Zhongxue Gan, and Wenchao Ding. Vings-mono: Visual-inertial gaussian splatting monocular slam in large scenes. IEEE Transactions on Robotics, pages 1–20, 2025.
[38] Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. Segformer: Simple and efficient design for semantic segmentation with transformers. Advances in neural information processing systems, 34:12077–12090, 2021.
[39] Haofei Xu, Songyou Peng, Fangjinhua Wang, Hermann Blum, Daniel Barath, Andreas Geiger, and Marc Pollefeys. Depthsplat: Connecting gaussian splatting and depth. In CVPR, 2025.
[40] Chi Yan, Delin Qu, Dan Xu, Bin Zhao, Zhigang Wang, Dong Wang, and Xuelong Li. Gs-slam: Dense visual slam with 3d gaussian splatting. In CVPR, 2024.
[41] Yaoqing Yang, Chen Feng, Yiru Shen, and Dong Tian. Foldingnet: Point cloud auto-encoder via deep grid deformation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 206–215, 2018.
[42] Yao Yao, Zixin Luo, Shiwei Li, Tian Fang, and Long Quan. Mvsnet: Depth inference for unstructured multi-view stereo. European Conference on Computer Vision (ECCV), 2018.
[43] Yao Yao, Zixin Luo, Shiwei Li, Tianwei Shen, Tian Fang, and Long Quan. Recurrent mvsnet for high-resolution multi-view stereo depth inference. Computer Vision and Pattern Recognition (CVPR), 2019.
[44] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018.
[45] Shengjun Zhang, Xin Fei, Fangfu Liu, Haixu Song, and Yueqi Duan. Gaussian graph network: Learning efficient and generalizable gaussian representations from multi-view images. Neural Information Processing Systems, 2025.
[46] Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip HS Torr, and Vladlen Koltun. Point transformer. In Proceedings of the IEEE/CVF international conference on computer vision, pages 16259–16268, 2021.
[47] Siting Zhu, Guangming Wang, Xin Kong, Dezhi Kong, and Hesheng Wang. 3d gaussian splatting in robotics: A survey. arXiv preprint arXiv:2410.12262, 2024.
[48] Chen Ziwen, Hao Tan, Kai Zhang, Sai Bi, Fujun Luan, Yicong Hong, Li Fuxin, and Zexiang Xu. Long-lrm: Long-sequence large reconstruction model for wide-coverage gaussian splats. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025.

Excerpts from the paper's supplementary sections (truncated at source)

Applicability to Downstream Tasks. To demonstrate the practical value of SparseSplat, we evaluate how it integrates into various downstream tasks. We categorize these tasks based on two different forms of “real-time” requirements: Reconstruction Real-time-ness, which underpins online mapping and robotic perception, and Rendering Real-time-ness, which is in...

Runtime Breakdown. We present a detailed runtime breakdown of individual components across varying Gaussian counts in Tab. 7. The latency of backbone inference and entropy-based sampling remains constant regardless of the sparsity level. In contrast, the computational costs of the KNN query and the Attention prediction head scale with the number of gen...

Structure of Different Heads. As described in Sec. 3.3, our 3D-Local Attribute Prediction framework employs a lightweight predictor to regress Gaussian attributes based on K-nearest neighbors in 3D space. We explored four different prediction head architectures, all sharing the same dual projection strategy for processing geometric and image features b...
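
The K-nearest-neighbour lookup that the attribute-prediction excerpt relies on can be illustrated with a brute-force search. This is a stand-in chosen for readability, not the paper's implementation: `knn_indices` is a hypothetical helper with quadratic cost, and a system handling tens of thousands of points would more plausibly use an accelerated index.

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbours of every point, excluding the point itself.

    points: (N, 3) float array. Brute force, O(N^2) time and memory.
    """
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    return np.argsort(d2, axis=1)[:, :k]

# Four points on a line at x = 0, 1, 3, 10.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
neighbours = knn_indices(pts, k=2)
```

Each row of `neighbours` lists the two closest other points, nearest first. A prediction head would then gather features at those indices to regress each Gaussian's attributes from its local 3D context.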