Accelerating 3D Gaussian Splatting using Tensor Cores

Bo Yuan; Sheng Li; Xulong Tang; Yang Sui; Yue Dai; Yue Wu; Zhuoran Song

arxiv: 2605.17855 · v2 · pith:NQMLAYHHnew · submitted 2026-05-18 · 💻 cs.GR

Pith reviewed 2026-05-22 10:13 UTC · model grok-4.3

classification 💻 cs.GR

keywords 3D Gaussian Splatting · Tensor Cores · rasterization · GPU acceleration · neural rendering · FP16 · real-time rendering · matrix operations

The pith

Reformulating 3D Gaussian Splatting rasterization as dense matrix operations lets Tensor Cores deliver 1.65 times faster rendering at unchanged image quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the compute-heavy rasterization stage in 3D Gaussian Splatting can be accelerated by converting its per-pixel scalar work into regular matrix multiplications that Tensor Cores can execute in FP16. This matters for anyone using 3DGS in real-time applications because the stage currently dominates runtime and leaves modern GPU hardware underused. By adding cross-tile grouping, the method also reuses Gaussian data across neighboring tiles, cutting data movement. The reported result is a 1.65× end-to-end speedup with no visible quality loss on the test scenes.

Core claim

TensorGS tensorizes the dominant rasterization computation into Tensor-Core-compatible matrix operations and introduces cross-tile grouping to improve Gaussian reuse, amortize overhead, and increase Tensor Core utilization. Experimental results show that TensorGS improves end-to-end rendering performance by 1.65× while preserving image quality.

What carries the argument

Tensorization of rasterization into dense, regular matrix multiplications paired with cross-tile grouping that reuses each Gaussian across neighboring tiles.
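
As a rough illustration of what "tensorizing" the per-pixel work can mean, the sketch below rewrites the Gaussian exponent (the "power" term) as a dot product between a per-Gaussian coefficient row and a per-pixel feature column, so that every Gaussian-pixel pair in a tile falls out of one matrix product. The six-term expansion, the helper names (`gaussian_row`, `pixel_features`, `power_matrix`), and the plain-Python `matmul` are assumptions made for illustration; the paper's actual matrix layout is not reproduced here.

```python
def power_scalar(g, px, py):
    """Reference per-pixel evaluation, as a scalar rasterizer loop would do it."""
    gx, gy, a, b, c = g          # 2D mean (gx, gy) and conic coefficients (a, b, c)
    dx, dy = gx - px, gy - py
    return -0.5 * (a * dx * dx + c * dy * dy) - b * dx * dy


def gaussian_row(g):
    """Per-Gaussian coefficients of the expanded quadratic (hypothetical layout)."""
    gx, gy, a, b, c = g
    return [-0.5 * a, -0.5 * c, -b,
            a * gx + b * gy, c * gy + b * gx,
            -0.5 * a * gx * gx - 0.5 * c * gy * gy - b * gx * gy]


def pixel_features(px, py):
    """Per-pixel monomials matching the order used in gaussian_row."""
    return [px * px, py * py, px * py, px, py, 1.0]


def matmul(a, b):
    """Plain-Python product of an m x k and a k x n list-of-lists matrix."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def power_matrix(gaussians, pixels):
    """All Gaussian-pixel exponents at once: (G x 6) @ (6 x P)."""
    q = [gaussian_row(g) for g in gaussians]
    phi = [list(f) for f in zip(*(pixel_features(px, py) for px, py in pixels))]
    return matmul(q, phi)


# Two made-up Gaussians evaluated over one 16 x 16 tile (256 pixels).
gaussians = [(3.5, 4.25, 0.30, 0.05, 0.20), (10.0, 7.5, 0.10, -0.02, 0.15)]
pixels = [(float(x), float(y)) for y in range(16) for x in range(16)]
powers = power_matrix(gaussians, pixels)
```

The matrix form produces the same values as the scalar loop in double precision. In FP16 the expanded form subtracts larger intermediate terms than the scalar form does, so a real kernel would need to keep coordinates small, for example by working relative to the tile origin; this sketch does not examine that.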

If this is right

Rasterization becomes a dense matrix workload that modern GPUs can execute at full Tensor Core throughput.
3DGS scenes render fast enough for additional latency-sensitive uses such as interactive editing or mobile deployment.
FP16 arithmetic suffices for the final pixel contributions without measurable degradation.
Tile-based schedulers can be extended to share Gaussian data across tile boundaries rather than reloading it.
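
To make the data-sharing point concrete, here is a minimal counting sketch. It assumes each Gaussian is described by an inclusive bounding box in tile units and is loaded once per tile it overlaps (per-tile execution) or once per tile group it overlaps (grouped execution). The bounding boxes and function names are hypothetical, and the resulting number is not a measurement from the paper.

```python
def cells_covered(bbox, group=1):
    """Number of group x group tile blocks overlapped by an inclusive tile-space bbox."""
    x0, y0, x1, y1 = bbox
    return (x1 // group - x0 // group + 1) * (y1 // group - y0 // group + 1)


def load_reduction(bboxes, group=2):
    """Fraction of Gaussian loads saved by loading once per group instead of once per tile."""
    per_tile = sum(cells_covered(b, 1) for b in bboxes)
    per_group = sum(cells_covered(b, group) for b in bboxes)
    return 1.0 - per_group / per_tile


# Three made-up Gaussians: one aligned with a 2x2 group, one straddling
# four groups, and one covering a 4x4 block of tiles.
bboxes = [(0, 0, 1, 1), (1, 1, 2, 2), (0, 0, 3, 3)]
reduction = load_reduction(bboxes, group=2)   # 24 per-tile loads vs 9 per-group loads
```

The second box shows why alignment matters: a Gaussian that straddles group boundaries still touches four groups, so grouping saves nothing for it.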

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar tensorization may apply to other point-based or splatting renderers that currently run as irregular per-pixel loops.
Future GPU architectures could add graphics-specific matrix instructions if this pattern proves common.
The cross-tile grouping idea could combine with existing level-of-detail or culling passes to further reduce memory traffic.

Load-bearing premise

Rasterization can be turned into dense regular matrix operations that Tensor Cores handle efficiently without large overhead or visible quality loss.

What would settle it

Run the same 3DGS scenes on a Tensor-Core GPU, measure wall-clock rendering time and PSNR/SSIM, and check whether TensorGS still shows the reported speedup and quality match.
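
A minimal sketch of the two quantities such a check would compare, assuming images are flat lists of floats in [0, 1] and timings are per-frame milliseconds. The function names and the use of a median are choices made here for illustration, and SSIM is omitted for brevity.

```python
import math
import statistics


def psnr(reference, rendered, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized flat images."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, rendered)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)


def speedup(baseline_ms, candidate_ms):
    """Ratio of median frame times; values above 1 mean the candidate is faster."""
    return statistics.median(baseline_ms) / statistics.median(candidate_ms)
```

Reporting the median over many frames, after a few warm-up frames, keeps one-off stalls from dominating the comparison.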

Figures

Figures reproduced from arXiv: 2605.17855 by Bo Yuan, Sheng Li, Xulong Tang, Yang Sui, Yue Dai, Yue Wu, Zhuoran Song.

**Figure 1.** (a) Averaged end-to-end time breakdown of the 3DGS rendering pipeline across six representative scenes. (b) Averaged time breakdown within the rasterization stage.

**Figure 2.** Overview of the original 3DGS pipeline and our proposed TensorGS.

**Figure 3.** Gaussian loading reduction within the default 2×2 tile group across six scenes.

**Figure 4.** Speedup over the original 3DGS pipeline on six representative scenes. [Plot text: Speedup axis from 0.0 to 2.0; labels gsplat, AdR-Gaussian, FlashGS, GSCore; series Original and + TensorGS.]

**Figure 5.** Speedup (averaged over six scenes) when integrating TensorGS into the four representative 3DGS optimization methods.

Original abstract

3D Gaussian Splatting (3DGS) has become a leading technique for real-time neural rendering and 3D scene reconstruction, but its rendering cost remains too high for many latency-sensitive scenarios. In particular, the rasterization stage in 3DGS dominates end-to-end rendering time, during which the renderer repeatedly evaluates each Gaussian's contribution to each covered pixel, making this stage compute-bound. At the same time, modern GPUs provide high-throughput Tensor Cores for low-precision matrix operations, yet existing 3DGS systems execute rasterization entirely on CUDA cores and leave Tensor Cores idle. We find that 3DGS rendering can be executed in FP16 with negligible quality degradation, suggesting a promising opportunity for Tensor Core acceleration. However, exploiting Tensor Cores for 3DGS is non-trivial because rasterization does not naturally match their execution model. Existing 3DGS rasterization is expressed as irregular per-pixel scalar operations, whereas Tensor Cores require dense, regular, and reuse-rich matrix workloads. Moreover, conventional tile-by-tile execution fails to exploit Gaussian reuse across neighboring tiles, resulting in repeated data loading and thus high data movement overhead. To this end, we present TensorGS, a 3DGS acceleration framework using Tensor Cores. TensorGS tensorizes the dominant rasterization computation into Tensor-Core-compatible matrix operations and introduces cross-tile grouping to improve Gaussian reuse, amortize overhead, and increase Tensor Core utilization. Experimental results show that TensorGS improves end-to-end rendering performance by 1.65$\times$ while preserving image quality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

TensorGS maps 3DGS rasterization to Tensor Cores via matrix reformulation and cross-tile grouping for a claimed 1.65x speedup, but the overhead from making irregular work regular still needs clearer proof.

The main point is that this paper gets a 1.65× end-to-end rendering speedup on 3D Gaussian Splatting by reformulating the dominant rasterization stage as dense matrix operations for Tensor Cores and adding cross-tile grouping to improve reuse. They also note that FP16 runs with little quality loss, which frees up hardware that standard CUDA pipelines leave idle. The concrete new piece is the specific tensorization of per-pixel Gaussian contributions plus the grouping step that amortizes data movement across neighboring tiles. This turns an irregular, compute-bound process into something that can actually use the high-throughput units on modern GPUs.

The work is useful because it targets a real bottleneck in a widely adopted technique and shows a hardware-aware path that could extend to similar point-based or splatting renderers. The implementation choices around making the mapping efficient without big artifacts are the strongest part.

The soft spot is the evidence around overhead. Rasterization has variable Gaussian coverage per pixel and depth sorting, so any shift to regular matrix form requires grouping, padding, and extra transfers. The stress-test concern about reformatting costs eating into the gains is reasonable, and the abstract alone does not give utilization numbers, time breakdowns, or ablations that isolate the tensorization overhead from the baseline. If the full paper supplies those measurements and shows the net gain holds against optimized CUDA, the claim strengthens.

This is for graphics researchers and engineers focused on real-time neural rendering and low-level GPU optimizations. A reader building production 3DGS systems or exploring hardware features would pick up practical ideas here. I would send it to peer review so the measurements and code can be checked in detail.

Referee Report

2 major / 2 minor

Summary. The paper introduces TensorGS, a framework that accelerates the rasterization stage of 3D Gaussian Splatting by reformulating the dominant per-Gaussian per-pixel computations as dense, regular matrix operations suitable for Tensor Cores and by adding cross-tile grouping to improve Gaussian reuse across neighboring tiles. The central claim is that this yields a 1.65× end-to-end rendering speedup on modern GPUs while preserving image quality, with the work framed as an implementation and measurement study that exploits FP16 execution and underutilized Tensor Core hardware.

Significance. If the performance results and quality preservation hold under rigorous validation, the work would be significant for real-time neural rendering pipelines that rely on 3DGS. It demonstrates a practical way to map an irregular, compute-bound graphics kernel onto high-throughput matrix hardware, potentially reducing frame time in latency-sensitive applications. The emphasis on amortizing data-movement costs via cross-tile grouping addresses a key practical challenge in tensorizing graphics workloads.

major comments (2)

[Abstract and Experimental Results] Abstract and Experimental Results section: The 1.65× end-to-end speedup claim is presented without reported Tensor Core utilization percentages, per-stage time breakdowns (reformatting, padding, data movement vs. compute), or ablations that isolate the overhead of the tensorization step itself. Given the introduction's own emphasis on the irregular nature of rasterization and the risk of reformatting costs, these metrics are load-bearing for verifying that cross-tile grouping sufficiently offsets the added overheads.
[Method] Method section (tensorization description): The mapping of variable-coverage, depth-sorted alpha blending to dense matrix operations requires explicit quantification of padding ratios and grouping efficiency; without these numbers it is unclear whether the reformulation introduces unacceptable data-movement costs that would shrink or eliminate the reported net speedup relative to an already-optimized CUDA baseline.
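
One way to express the padding ratio the second comment asks for is the fraction of multiply-accumulates spent on padding once a matrix product is rounded up to hardware tile sizes. The sketch below assumes a 16×16×16 tile, which is one of the half-precision shapes exposed by CUDA's WMMA API; whether the paper uses that shape is not stated here, and the function names are hypothetical.

```python
def round_up(n, multiple):
    """Smallest multiple of `multiple` that is >= n."""
    return -(-n // multiple) * multiple


def padding_ratio(rows, inner, cols, tile=(16, 16, 16)):
    """Fraction of multiply-accumulates that are padding for a
    (rows x inner) @ (inner x cols) product tiled to tile = (m, n, k)."""
    m, n, k = tile
    padded = round_up(rows, m) * round_up(cols, n) * round_up(inner, k)
    useful = rows * inner * cols
    return 1.0 - useful / padded


# A 3-wide inner dimension padded to 16, with 16 Gaussians and 256 pixels.
ratio = padding_ratio(rows=16, inner=3, cols=256)   # 1 - 3/16
```

A high ratio does not by itself mean the kernel is slow, since padded Tensor Core work can still beat unpadded scalar work; it is a number to report alongside measured time, not a substitute for it.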

minor comments (2)

[Abstract] The abstract states 'negligible quality degradation' for FP16 but does not specify the exact PSNR/SSIM thresholds or test scenes used to support this; a short quantitative statement would improve clarity.
[Method] Notation for the matrix dimensions after cross-tile grouping should be defined once and used consistently to avoid ambiguity when describing reuse across tiles.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and agree that additional quantitative details will strengthen the manuscript.

Referee: [Abstract and Experimental Results] Abstract and Experimental Results section: The 1.65× end-to-end speedup claim is presented without reported Tensor Core utilization percentages, per-stage time breakdowns (reformatting, padding, data movement vs. compute), or ablations that isolate the overhead of the tensorization step itself. Given the introduction's own emphasis on the irregular nature of rasterization and the risk of reformatting costs, these metrics are load-bearing for verifying that cross-tile grouping sufficiently offsets the added overheads.

Authors: We agree that these metrics are important for rigorous validation. In the revised manuscript we will report Tensor Core utilization percentages, per-stage time breakdowns that separate reformatting/padding/data-movement from compute, and ablations isolating tensorization overhead. These additions will directly show how cross-tile grouping amortizes the costs highlighted in the introduction. revision: yes
Referee: [Method] Method section (tensorization description): The mapping of variable-coverage, depth-sorted alpha blending to dense matrix operations requires explicit quantification of padding ratios and grouping efficiency; without these numbers it is unclear whether the reformulation introduces unacceptable data-movement costs that would shrink or eliminate the reported net speedup relative to an already-optimized CUDA baseline.

Authors: We accept that explicit numbers are needed. We will augment the method section with measured padding ratios for the matrix formulation and quantitative grouping-efficiency statistics for cross-tile grouping. These figures will allow readers to assess data-movement overhead relative to the optimized CUDA baseline and confirm that the net 1.65× speedup is preserved. revision: yes

Circularity Check

0 steps flagged

No circularity: implementation and measurement study with no load-bearing derivations

The paper frames its contribution as an engineering reformulation of 3DGS rasterization into Tensor-Core matrix operations plus cross-tile grouping, followed by empirical timing and quality measurements. No equations, uniqueness theorems, fitted parameters, or predictions appear in the provided text that reduce by construction to the inputs. The 1.65× claim is presented as an experimental outcome rather than a derived result that is definitionally equivalent to its own assumptions. This is the common honest case of a self-contained systems paper whose central claims rest on external benchmarks and measurements, not on self-referential definitions or self-citation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that FP16 arithmetic is sufficient for 3DGS quality and on the engineering choice to treat rasterization as matrix multiplication; no new physical entities or free parameters are introduced in the abstract.

axioms (1)

domain assumption 3DGS rasterization can be executed in FP16 with negligible quality degradation
Explicitly stated as a finding that enables Tensor Core use.
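
A toy way to probe this assumption on a single pixel is to run front-to-back alpha blending twice, once in double precision and once with every intermediate rounded to half precision. The sketch below uses the standard library's `struct` half-float format for the rounding. The sample values are made up, and the blending loop is simplified (no alpha clamping or early termination), so it illustrates the kind of check involved rather than testing the paper's finding.

```python
import math
import struct


def fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def blend(samples, rnd=lambda v: v):
    """Front-to-back alpha blending of (power, opacity, color) samples for one
    pixel and one channel; `rnd` is applied after every arithmetic step."""
    color, transmittance = 0.0, 1.0
    for power, opacity, c in samples:
        alpha = rnd(rnd(opacity) * rnd(math.exp(rnd(power))))
        color = rnd(color + rnd(rnd(transmittance * alpha) * rnd(c)))
        transmittance = rnd(transmittance * rnd(1.0 - alpha))
    return color


# Made-up depth-sorted samples: (exponent, opacity, color) per Gaussian.
samples = [(-0.1, 0.8, 0.9), (-0.5, 0.6, 0.2), (-1.2, 0.9, 0.5), (-2.0, 0.7, 0.7)]
error = abs(blend(samples) - blend(samples, fp16))
```

A fuller check would repeat this over whole images and many overlapping Gaussians per pixel, then compare PSNR and SSIM against the FP32 render.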

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

The tag after each theorem gives the relation between the paper passage and the cited Recognition theorem.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: we organize all $G_{\text{batch}} \times 256$ interactions into a batched matrix-style computation... $P = Q\Phi$ ... pad the inner dimension from 3 to 16 ... $\tilde{P} = \tilde{Q}\tilde{\Phi}$

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Paper passage: cross-tile grouping ... 2×2 tile group ... Gaussian loading reduction reaches 70%

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages

[1] Jack Choquette, Wishwesh Gandhi, Olivier Giroux, Nick Stam, and Ronny Krashinsky. 2021. Nvidia a100 tensor core gpu: Performance and innovation. IEEE Micro 41, 2 (2021), 29–35.

[2] Guofeng Feng, Siyan Chen, Rong Fu, Zimu Liao, Yi Wang, Tao Liu, Boni Hu, Lining Xu, Zhilin Pei, Hengjie Li, Xiuhong Li, Ninghui Sun, Xingcheng Zhang, and Bo Dai. 2025. Flashgs: Efficient 3d gaussian splatting for large-scale and high-resolution rendering. In Proceedings of the Computer Vision and Pattern Recognition Conference. 26652–26662.

[3–4] Houshu He, Naifeng Jing, Li Jiang, Xiaoyao Liang, and Zhuoran Song. AGS: Accelerating 3D Gaussian Splatting SLAM via CODEC-Assisted Frame Covisibility Detection. In Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1. 20–34.

[5] Houshu He, Gang Li, Fangxin Liu, Li Jiang, Xiaoyao Liang, and Zhuoran Song. 2025. Gsarch: Breaking memory barriers in 3d gaussian splatting training via architectural support. In 2025 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 366–379.

[6] Peter Hedman, Julien Philip, True Price, Jan-Michael Frahm, George Drettakis, and Gabriel Brostow. 2018. Deep blending for free-viewpoint image-based rendering. ACM Transactions on Graphics (ToG) 37, 6 (2018), 1–15.

[7–8] Lukas Höllein, Aljaž Božič, Michael Zollhöfer, and Matthias Nießner. 3dgs-lm: Faster gaussian-splatting optimization with levenberg-marquardt. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 26740–26750.

[9] Xiaotong Huang, He Zhu, Tianrui Ma, Yuxiang Xiong, Fangxin Liu, Zhezhi He, Yiming Gan, Zihan Liu, Jingwen Leng, Yu Feng, and Minyi Guo. 2026. SPLATONIC: Architectural Support for 3D Gaussian Splatting SLAM via Sparse Processing. In 2026 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 1–14.

[10] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkuehler, and George Drettakis. 2023. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Trans. Graph. 42, 4, Article 139 (July 2023), 14 pages.

[11] Hyunjeong Kim and In-Kwon Lee. 2024. Is 3dgs useful?: Comparing the effectiveness of recent reconstruction methods in vr. In 2024 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). IEEE, 71–80.

[12] Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. 2017. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics (ToG) 36, 4 (2017), 1–13.

[13] Donghyun Lee, Dawoon Jeong, Jae W Lee, and Hongil Yoon. 2026. GS-Scale: Unlocking Large-Scale 3D Gaussian Splatting Training via Host Offloading. In Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. 860–875.

[14] Junseo Lee, Seokwon Lee, Jungi Lee, Junyong Park, and Jaewoong Sim. 2024. Gscore: Efficient radiance field rendering via architectural support for 3d gaussian splatting. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3. 497–511.

[15] Haomin Li, Yue Liang, Fangxin Liu, Bowen Zhu, Zongwu Wang, Yu Feng, Liqiang Lu, Li Jiang, and Haibing Guan. 2026. ORANGE: Exploring Ockham’s Razor for Neural Rendering by Accelerating 3DGS on NPUs with GEMM-Friendly Blending and Balanced Workloads. In 2026 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 1–15.

[16] Leshu Li, Jiayin Qin, Jie Peng, Zishen Wan, Huaizhi Qu, Ye Han, Pingqing Zheng, Hongsen Zhang, Yu Cao, Tianlong Chen, and Yang (Katie) Zhao. 2025. RTGS: Real-Time 3D Gaussian Splatting SLAM via Multi-Level Redundancy Reduction. In Proceedings of the 58th IEEE/ACM International Symposium on Microarchitecture. 1838–1851.

[17] Rongji Liao, Yuan Zhang, Wei Zhang, Lingjun Pu, Yu Guan, Yunpeng Jing, Tao Lin, and Jinyao Yan. 2025. 3DGS-enabled High-fidelity Low-cost Immersive Static 3D Video Streaming. IEEE Journal on Selected Areas in Communications (2025).

[18] Lingjie Liu, Jiatao Gu, Kyaw Zaw Lin, Tat-Seng Chua, and Christian Theobalt. 2020. Neural sparse voxel fields. Advances in Neural Information Processing Systems 33 (2020), 15651–15663.

[19] Stefano Markidis, Steven Wei Der Chien, Erwin Laure, Ivy Bo Peng, and Jeffrey S Vetter. 2018. Nvidia tensor core programmability, performance & precision. In 2018 IEEE international parallel and distributed processing symposium workshops (IPDPSW). IEEE, 522–531.

[20] Vitor Pereira Matias, Daniel Perazzo, Vinicius Silva, Alberto Raposo, Luiz Velho, Afonso Paiva, and Tiago Novello. 2025. From volume rendering to 3d gaussian splatting: Theory and applications. In 2025 38th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI). IEEE, 1–6.

[21] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. 2021. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 65, 1 (2021), 99–106.

[22] Seock-Hwan Noh, Banseok Shin, Jeik Choi, Seungpyo Lee, Jaeha Kung, and Yeseong Kim. 2025. FlexNeRFer: A Multi-Dataflow, Adaptive Sparsity-Aware Accelerator for On-Device NeRF Rendering. In Proceedings of the 52nd Annual International Symposium on Computer Architecture. 1894–1909.

[23] Changhun Oh, Seongryong Oh, Jinwoo Hwang, Yoonsung Kim, Hardik Sharma, and Jongse Park. 2026. Neo: Real-Time On-Device 3D Gaussian Splatting with Reuse-and-Update Sorting Acceleration. In Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. 1268–1284.

[24] Minnan Pei, Gang Li, Junwen Si, Zeyu Zhu, Zitao Mo, Peisong Wang, Zhuoran Song, Xiaoyao Liang, and Jian Cheng. 2025. GCC: A 3DGS Inference Architecture with Gaussian-Wise and Cross-Stage Conditional Processing. In Proceedings of the 58th IEEE/ACM International Symposium on Microarchitecture. 1824–1837.

[25] Shi Qiu, Binzhu Xie, Qixuan Liu, and Pheng-Ann Heng. 2025. Advancing extended reality with 3d gaussian splatting: Innovations and prospects. In 2025 IEEE International Conference on Artificial Intelligence and eXtended and Virtual Reality (AIxVR). IEEE, 203–208.

[26] Santosh Reddy, H Abhiram, and KS Archish. 2025. A survey of 3D Gaussian splatting: optimization techniques, applications, and AI-driven advancements. In 2025 International Conference on Intelligent and Innovative Technologies in Computing, Electrical and Electronics (IITCEE). IEEE, 1–6.

[27] Hongyi Wang, Zhenhua Zhu, Tianchen Zhao, Yunfei Xiang, Zehao Wang, Jincheng Yu, Huazhong Yang, Yuan Xie, and Yu Wang. 2025. REACT3D: Real-time Edge Accelerator for Incremental Training in 3D Gaussian Splatting based SLAM Systems. In Proceedings of the 58th IEEE/ACM International Symposium on Microarchitecture. 1852–1866.

[28] Xinzhe Wang, Ran Yi, and Lizhuang Ma. 2024. Adr-gaussian: Accelerating gaussian splatting with adaptive radius. In SIGGRAPH Asia 2024 Conference Papers. 1–10.

[29] Rui Wen, Zhifei Yue, Tianbo Liu, Xinkai Song, Jin Li, Di Huang, Jiaming Guo, Xing Hu, Zidong Du, Qi Guo, and Tianshi Chen. 2026. Cambricon-GS: An Accelerator for 3D Gaussian Splatting Training With Gaussian-Pixel Hybrid Parallelism. In 2026 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 1–14.

[30] Lizhou Wu, Haozhe Zhu, Siqi He, Jiapei Zheng, Chixiao Chen, and Xiaoyang Zeng. 2024. Gauspu: 3d gaussian splatting processor for real-time slam systems. In 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 1562–1573.

[31] Yiwei Xu, Yifei Yu, Wentian Gan, Tengfei Wang, Zongqian Zhan, Hao Cheng, and Xin Wang. 2025. Gaussian on-the-fly splatting: A progressive framework for robust near real-time 3dgs optimization. IEEE Robotics and Automation Letters 11, 1 (2025), 426–433.

[32] Vickie Ye, Ruilong Li, Justin Kerr, Matias Turkulainen, Brent Yi, Zhuoyang Pan, Otto Seiskari, Jianbo Ye, Jeffrey Hu, Matthew Tancik, and Angjoo Kanazawa. 2025. gsplat: An open-source library for Gaussian splatting. Journal of Machine Learning Research 26, 34 (2025), 1–17.

[33] Zhifan Ye, Yonggan Fu, Jingqun Zhang, Leshu Li, Yongan Zhang, Sixu Li, Cheng Wan, Chenxi Wan, Chaojian Li, Sreemanth Prathipati, and Yingyan Celine Lin. 2025. Gaussian blending unit: An edge gpu plug-in for real-time gaussian-based rendering in ar/vr. In 2025 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 353–365.

[34] Hexu Zhao, Xiwen Min, Xiaoteng Liu, Moonjun Gong, Yiming Li, Ang Li, Saining Xie, Jinyang Li, and Aurojit Panda. 2026. Clm: Removing the gpu memory barrier for 3d gaussian splatting. In Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. 377–393.

[35] Zhiyu Zhou, Feng Hui, Xing Li, and Yu Liu. 2025. Visual Localization Using 3D Gaussian Splatting Representation for Mobile Robots With Geometric Feature Correspondences Synthesis. IEEE Transactions on Automation Science and Engineering (2025).

[36] Siting Zhu, Guangming Wang, Xin Kong, Dezhi Kong, and Hesheng Wang. 2024. 3d gaussian splatting in robotics: A survey. arXiv preprint arXiv:2410.12262 (2024).

[1] [1]

Jack Choquette, Wishwesh Gandhi, Olivier Giroux, Nick Stam, and Ronny Krashinsky. 2021. Nvidia a100 tensor core gpu: Performance and innovation.IEEE Micro41, 2 (2021), 29–35

work page 2021

[2] [2]

Guofeng Feng, Siyan Chen, Rong Fu, Zimu Liao, Yi Wang, Tao Liu, Boni Hu, Lining Xu, Zhilin Pei, Hengjie Li, Xiuhong Li, Ninghui Sun, Xingcheng Zhang, and Bo Dai. 2025. Flashgs: Efficient 3d gaussian splatting for large-scale and high-resolution rendering. InProceedings of the Computer Vision and Pattern Recognition Conference. 26652– 26662

work page 2025

[3] [3]

Houshu He, Naifeng Jing, Li Jiang, Xiaoyao Liang, and Zhuoran Song

work page

[4] [4]

InProceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1

AGS: A ccelerating 3D G aussian Splatting S LAM via CODEC- Assisted Frame Covisibility Detection. InProceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1. 20–34

work page

[5] [5]

Houshu He, Gang Li, Fangxin Liu, Li Jiang, Xiaoyao Liang, and Zhuo- ran Song. 2025. Gsarch: Breaking memory barriers in 3d gaussian splatting training via architectural support. In2025 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 366–379

work page 2025

[6] [6]

Peter Hedman, Julien Philip, True Price, Jan-Michael Frahm, George Drettakis, and Gabriel Brostow. 2018. Deep blending for free- viewpoint image-based rendering.ACM Transactions on Graphics (ToG)37, 6 (2018), 1–15

work page 2018

[7] [7]

Lukas Höllein, Aljaž Božič, Michael Zollhöfer, and Matthias Nießner

work page

[8] [8]

InProceedings of the IEEE/CVF International Conference on Computer Vision

3dgs-lm: Faster gaussian-splatting optimization with levenberg- marquardt. InProceedings of the IEEE/CVF International Conference on Computer Vision. 26740–26750

work page

[9] [9]

Xiaotong Huang, He Zhu, Tianrui Ma, Yuxiang Xiong, Fangxin Liu, Zhezhi He, Yiming Gan, Zihan Liu, Jingwen Leng, Yu Feng, and Minyi Guo. 2026. SPLATONIC: Architectural Support for 3D Gaussian Splat- ting SLAM via Sparse Processing. In2026 IEEE International Sympo- sium on High Performance Computer Architecture (HPCA). IEEE, 1–14

work page 2026

[10] [10]

Bernhard Kerbl, Georgios Kopanas, Thomas Leimkuehler, and George Drettakis. 2023. 3D Gaussian Splatting for Real-Time Radiance Field Rendering.ACM Trans. Graph.42, 4, Article 139 (July 2023), 14 pages

work page 2023

[11] [11]

Hyunjeong Kim and In-Kwon Lee. 2024. Is 3dgs useful?: Comparing the effectiveness of recent reconstruction methods in vr. In2024 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). IEEE, 71–80

work page 2024

[12] [12]

Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. 2017. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics (ToG)36, 4 (2017), 1–13

work page 2017

[13] [13]

Donghyun Lee, Dawoon Jeong, Jae W Lee, and Hongil Yoon. 2026. GS- Scale: Unlocking Large-Scale 3D Gaussian Splatting Training via Host Offloading. InProceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. 860–875

work page 2026

[14] [14]

Junseo Lee, Seokwon Lee, Jungi Lee, Junyong Park, and Jaewoong Sim. 2024. Gscore: Efficient radiance field rendering via architectural support for 3d gaussian splatting. InProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3. 497–511

work page 2024

[15] [15]

Haomin Li, Yue Liang, Fangxin Liu, Bowen Zhu, Zongwu Wang, Yu Feng, Liqiang Lu, Li Jiang, and Haibing Guan. 2026. ORANGE: Ex- ploring Ockham’s Razor for Neural Rendering by Accelerating 3DGS on NPUs with GEMM-Friendly Blending and Balanced Workloads. In2026 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 1–15

work page 2026

[16] [16]

Leshu Li, Jiayin Qin, Jie Peng, Zishen Wan, Huaizhi Qu, Ye Han, Pingqing Zheng, Hongsen Zhang, Yu Cao, Tianlong Chen, and Yang (Katie) Zhao. 2025. RTGS: Real-Time 3D Gaussian Splatting SLAM via Multi-Level Redundancy Reduction. InProceedings of the 58th IEEE/ACM International Symposium on Microarchitecture. 1838– 1851

work page 2025

[17] [17]

Rongji Liao, Yuan Zhang, Wei Zhang, Lingjun Pu, Yu Guan, Yunpeng Jing, Tao Lin, and Jinyao Yan. 2025. 3DGS-enabled High-fidelity Low- cost Immersive Static 3D Video Streaming.IEEE Journal on Selected Areas in Communications(2025)

work page 2025

[18] [18]

Lingjie Liu, Jiatao Gu, Kyaw Zaw Lin, Tat-Seng Chua, and Christian Theobalt. 2020. Neural sparse voxel fields.Advances in Neural Infor- mation Processing Systems33 (2020), 15651–15663

work page 2020

[19] [19]

Stefano Markidis, Steven Wei Der Chien, Erwin Laure, Ivy Bo Peng, and Jeffrey S Vetter. 2018. Nvidia tensor core programmability, perfor- mance & precision. In2018 IEEE international parallel and distributed processing symposium workshops (IPDPSW). IEEE, 522–531

[20]

Vitor Pereira Matias, Daniel Perazzo, Vinicius Silva, Alberto Raposo, Luiz Velho, Afonso Paiva, and Tiago Novello. 2025. From volume rendering to 3d gaussian splatting: Theory and applications. In 2025 38th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI). IEEE, 1–6

[21]

Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. 2021. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 65, 1 (2021), 99–106

[22]

Seock-Hwan Noh, Banseok Shin, Jeik Choi, Seungpyo Lee, Jaeha Kung, and Yeseong Kim. 2025. FlexNeRFer: A Multi-Dataflow, Adaptive Sparsity-Aware Accelerator for On-Device NeRF Rendering. In Proceedings of the 52nd Annual International Symposium on Computer Architecture. 1894–1909

[23]

Changhun Oh, Seongryong Oh, Jinwoo Hwang, Yoonsung Kim, Hardik Sharma, and Jongse Park. 2026. Neo: Real-Time On-Device 3D Gaussian Splatting with Reuse-and-Update Sorting Acceleration. In Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. 1268–1284

[24]

Minnan Pei, Gang Li, Junwen Si, Zeyu Zhu, Zitao Mo, Peisong Wang, Zhuoran Song, Xiaoyao Liang, and Jian Cheng. 2025. GCC: A 3DGS Inference Architecture with Gaussian-Wise and Cross-Stage Conditional Processing. In Proceedings of the 58th IEEE/ACM International Symposium on Microarchitecture. 1824–1837

[25]

Shi Qiu, Binzhu Xie, Qixuan Liu, and Pheng-Ann Heng. 2025. Advancing extended reality with 3d gaussian splatting: Innovations and prospects. In 2025 IEEE International Conference on Artificial Intelligence and eXtended and Virtual Reality (AIxVR). IEEE, 203–208

[26]

Santosh Reddy, H Abhiram, and KS Archish. 2025. A survey of 3D Gaussian splatting: optimization techniques, applications, and AI-driven advancements. In 2025 International Conference on Intelligent and Innovative Technologies in Computing, Electrical and Electronics (IITCEE). IEEE, 1–6

[27]

Hongyi Wang, Zhenhua Zhu, Tianchen Zhao, Yunfei Xiang, Zehao Wang, Jincheng Yu, Huazhong Yang, Yuan Xie, and Yu Wang. 2025. REACT3D: Real-time Edge Accelerator for Incremental Training in 3D Gaussian Splatting based SLAM Systems. In Proceedings of the 58th IEEE/ACM International Symposium on Microarchitecture. 1852–1866

[28]

Xinzhe Wang, Ran Yi, and Lizhuang Ma. 2024. Adr-gaussian: Accelerating gaussian splatting with adaptive radius. In SIGGRAPH Asia 2024 Conference Papers. 1–10

[29]

Rui Wen, Zhifei Yue, Tianbo Liu, Xinkai Song, Jin Li, Di Huang, Jiaming Guo, Xing Hu, Zidong Du, Qi Guo, and Tianshi Chen. 2026. Cambricon-GS: An Accelerator for 3D Gaussian Splatting Training With Gaussian-Pixel Hybrid Parallelism. In 2026 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 1–14

[30]

Lizhou Wu, Haozhe Zhu, Siqi He, Jiapei Zheng, Chixiao Chen, and Xiaoyang Zeng. 2024. Gauspu: 3d gaussian splatting processor for real-time slam systems. In 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 1562–1573

[31]

Yiwei Xu, Yifei Yu, Wentian Gan, Tengfei Wang, Zongqian Zhan, Hao Cheng, and Xin Wang. 2025. Gaussian on-the-fly splatting: A progressive framework for robust near real-time 3dgs optimization. IEEE Robotics and Automation Letters 11, 1 (2025), 426–433

[32]

Vickie Ye, Ruilong Li, Justin Kerr, Matias Turkulainen, Brent Yi, Zhuoyang Pan, Otto Seiskari, Jianbo Ye, Jeffrey Hu, Matthew Tancik, and Angjoo Kanazawa. 2025. gsplat: An open-source library for Gaussian splatting. Journal of Machine Learning Research 26, 34 (2025), 1–17

[33]

Zhifan Ye, Yonggan Fu, Jingqun Zhang, Leshu Li, Yongan Zhang, Sixu Li, Cheng Wan, Chenxi Wan, Chaojian Li, Sreemanth Prathipati, and Yingyan Celine Lin. 2025. Gaussian blending unit: An edge gpu plug-in for real-time gaussian-based rendering in ar/vr. In 2025 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 353–365

[34]

Hexu Zhao, Xiwen Min, Xiaoteng Liu, Moonjun Gong, Yiming Li, Ang Li, Saining Xie, Jinyang Li, and Aurojit Panda. 2026. Clm: Removing the gpu memory barrier for 3d gaussian splatting. In Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. 377–393

[35]

Zhiyu Zhou, Feng Hui, Xing Li, and Yu Liu. 2025. Visual Localization Using 3D Gaussian Splatting Representation for Mobile Robots With Geometric Feature Correspondences Synthesis. IEEE Transactions on Automation Science and Engineering (2025)

[36]

Siting Zhu, Guangming Wang, Xin Kong, Dezhi Kong, and Hesheng Wang. 2024. 3d gaussian splatting in robotics: A survey. arXiv preprint arXiv:2410.12262 (2024)
