VkSplat: High-Performance 3DGS Training in Vulkan Compute
Pith reviewed 2026-05-09 20:03 UTC · model grok-4.3
The pith
VkSplat trains 3D Gaussian Splatting models entirely in Vulkan compute at 3.3 times the speed and with 33 percent less VRAM than CUDA baselines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
VkSplat is the first fully-Vulkan-based 3DGS training pipeline that reaches state-of-the-art performance. It achieves 3.3 times the training speed and 33 percent VRAM reduction compared with a CUDA plus PyTorch baseline, preserves visual quality, and runs correctly on GPUs from multiple vendors.
What carries the argument
The full 3D Gaussian Splatting training loop implemented as optimized Vulkan compute shaders.
If this is right
- 3DGS training no longer requires NVIDIA hardware to reach high throughput.
- Lower VRAM demand allows larger scenes or higher-resolution training on the same cards.
- Existing CUDA pipelines can be replaced without changes to output quality or downstream use.
- Software projects gain simpler cross-vendor deployment for 3D reconstruction tasks.
Where Pith is reading between the lines
- The same Vulkan structure could be reused to accelerate other differentiable rendering methods beyond 3DGS.
- On-device or mobile implementations might become feasible once the compute shader path is established.
- Game engines could incorporate live 3DGS training for dynamic scene capture without vendor-specific code.
Load-bearing premise
The reported speed and memory gains hold for all common scenes and GPU models without introducing any quality loss under different viewing conditions.
What would settle it
A benchmark run on an untested GPU vendor or a more complex scene that either falls below 3 times the baseline speed or shows a measurable drop in PSNR or SSIM would disprove the central performance claim.
Figures
read the original abstract
We present VkSplat, a high-performance, cross-vendor 3D Gaussian Splatting (3DGS) training pipeline implemented fully in Vulkan compute, addressing performance and compatibility limitation of existing training pipelines. With various optimizations, we achieve $3.3\times$ speed and $33\%$ VRAM reduction over CUDA+PyTorch baseline, maintaining quality, and demonstrating compatibility across GPU vendors. To the best of our knowledge, this is the first fully-Vulkan-based 3DGS training pipeline that achieves state-of-the-art performance. Code: \href{https://github.com/harry7557558/vksplat}{https://github.com/harry7557558/vksplat}
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents VkSplat, a fully Vulkan-compute implementation of 3D Gaussian Splatting (3DGS) training. It claims that a combination of kernel fusion, memory-layout optimizations, and compute-shader scheduling yields a 3.3× training speedup and 33 % VRAM reduction relative to a CUDA+PyTorch baseline while preserving visual quality, and that the pipeline is the first cross-vendor, fully-Vulkan 3DGS trainer to reach state-of-the-art performance.
Significance. If the headline performance numbers and quality equivalence are shown to hold across diverse scenes, camera distributions, and non-NVIDIA GPUs, the work would be a meaningful step toward vendor-agnostic, high-performance 3DGS training. Releasing the code further strengthens the contribution by enabling direct reproduction and extension.
major comments (3)
- [§4.2, Table 2] §4.2 and Table 2: the 3.3× speedup and 33 % VRAM figures are reported as single scalar values without error bars, multiple random seeds, or per-scene breakdowns; it is therefore impossible to determine whether the gains are statistically robust or scene-dependent.
- [§5.1] §5.1: quality equivalence is asserted via aggregate PSNR/SSIM numbers, yet no per-scene tables, novel-view failure cases, or high-frequency detail comparisons are supplied; this leaves open the possibility that the reported optimizations trade off detail under certain viewing conditions.
- [§3.3] §3.3: the description of the fused splat-and-sort kernel does not quantify the arithmetic intensity or register pressure after fusion, making it difficult to verify that the claimed performance improvement is attributable to the fusion rather than to other unstated factors.
minor comments (3)
- The abstract states “maintaining quality” without defining the metric or the tolerance; a short sentence specifying the exact PSNR/SSIM thresholds used would improve clarity.
- Figure 3 caption refers to “various scenes” but does not list the scene names or their properties; adding the scene identifiers would aid reproducibility.
- The GitHub link in the abstract is given without a commit hash or release tag; pinning the code version would strengthen the reproducibility claim.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and have revised the manuscript to provide additional empirical details and technical clarifications where feasible.
read point-by-point responses
-
Referee: [§4.2, Table 2] §4.2 and Table 2: the 3.3× speedup and 33 % VRAM figures are reported as single scalar values without error bars, multiple random seeds, or per-scene breakdowns; it is therefore impossible to determine whether the gains are statistically robust or scene-dependent.
Authors: We agree that single aggregate values limit assessment of robustness. In the revised manuscript we will expand Table 2 and §4.2 with per-scene speedups and VRAM reductions for all evaluated scenes from the standard benchmarks, plus averages and standard deviations over three independent runs with different random seeds. These additions will demonstrate that the reported gains are consistent rather than scene-specific. revision: yes
-
Referee: [§5.1] §5.1: quality equivalence is asserted via aggregate PSNR/SSIM numbers, yet no per-scene tables, novel-view failure cases, or high-frequency detail comparisons are supplied; this leaves open the possibility that the reported optimizations trade off detail under certain viewing conditions.
Authors: We acknowledge the value of granular quality validation. The revised §5.1 will add a per-scene PSNR/SSIM table, supplementary qualitative figures comparing high-frequency details in novel views, and a brief discussion of any observed limitations. Current results on the Mip-NeRF 360 and Tanks & Temples sets show no meaningful degradation, but the expanded presentation will make equivalence more transparent. revision: yes
-
Referee: [§3.3] §3.3: the description of the fused splat-and-sort kernel does not quantify the arithmetic intensity or register pressure after fusion, making it difficult to verify that the claimed performance improvement is attributable to the fusion rather than to other unstated factors.
Authors: We will revise §3.3 to include quantitative metrics for the fused kernel: arithmetic intensity (FLOPs per byte transferred) before and after fusion, together with register-pressure estimates obtained from the Vulkan shader compiler and occupancy analysis. This will substantiate that the observed gains arise primarily from the fusion's reduction in memory traffic. revision: yes
Circularity Check
No circularity: empirical performance claims with no derivation chain
full rationale
The paper reports an engineering implementation of 3DGS training fully in Vulkan compute shaders, with kernel fusions, memory layout changes, and scheduling optimizations. Its headline results (3.3× speed, 33% VRAM reduction, maintained quality, cross-vendor compatibility) are direct benchmark measurements against a CUDA+PyTorch baseline on selected scenes. No equations, fitted parameters, self-definitional relations, or load-bearing self-citations appear in the provided text; the claims rest on external runtime measurements rather than any internal reduction to prior results or ansatzes. This is the expected non-circular outcome for a systems-performance paper.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
3D Gaussian Splatting for Real-Time Radiance Field Rendering , journal =
Kerbl, Bernhard and Kopanas, Georgios and Leimk. 3D Gaussian Splatting for Real-Time Radiance Field Rendering , journal =. 2023 , url =
work page 2023
-
[2]
Journal of Machine Learning Research , volume=
gsplat: An open-source library for Gaussian splatting , author=. Journal of Machine Learning Research , volume=
-
[3]
ACM Transactions on Graphics , number =
Radl, Lukas and Steiner, Michael and Parger, Mathias and Weinrauch, Alexander and Kerbl, Bernhard and Steinberger, Markus , title =. ACM Transactions on Graphics , number =
-
[4]
Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) , month =
Hanson, Alex and Tu, Allen and Lin, Geng and Singla, Vasu and Zwicker, Matthias and Goldstein, Tom , title =. Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) , month =. 2025 , pages =
work page 2025
-
[5]
LiteGS: A High-performance Framework to Train 3DGS in Subminutes via System and Algorithm Codesign , author=. 2025 , eprint=
work page 2025
-
[6]
Mallick, Saswat Subhajyoti and Goel, Rahul and Kerbl, Bernhard and Steinberger, Markus and Carrasco, Francisco Vicente and De La Torre, Fernando , title =. 2024 , isbn =. doi:10.1145/3680528.3687694 , booktitle =
-
[7]
arXiv preprint arXiv:2505.18764 , year=
Efficient Differentiable Hardware Rasterization for 3D Gaussian Splatting , author=. arXiv preprint arXiv:2505.18764 , year=
-
[8]
Moment-Based 3D Gaussian Splatting: Resolving Volumetric Occlusion with Order-Independent Transmittance , author=. 2025 , eprint=
work page 2025
- [9]
-
[10]
TC-GS: A Faster Gaussian Splatting Module Utilizing Tensor Cores , author=. 2025 , eprint=
work page 2025
-
[11]
Advances in Neural Information Processing Systems (NeurIPS) , year =
3D Gaussian Splatting as Markov Chain Monte Carlo , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =
-
[12]
Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields , author=. CVPR , year=
-
[13]
Fisheye-GS: Lightweight and Extensible Gaussian Splatting Module for Fisheye Cameras , author=. 2024 , eprint=
work page 2024
-
[14]
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , month =
Yu, Zehao and Chen, Anpei and Huang, Binbin and Sattler, Torsten and Geiger, Andreas , title =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , month =. 2024 , pages =
work page 2024
- [15]
-
[16]
arXiv preprint arXiv:2511.04283 , year=
FastGS: Training 3D Gaussian Splatting in 100 Seconds , author=. arXiv preprint arXiv:2511.04283 , year=
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.