Compact 3D Gaussian Splatting For Dense Visual SLAM

Chang Nie; Danwei Wang; Hesheng Wang; Jianfei Yang; Jiuming Liu; Shenghai Yuan; Shuhong Liu; Tianchen Deng; Wenhua Wu

A geometry codebook plus sliding-window masking cuts redundant 3D Gaussians in SLAM while keeping reconstruction quality.

Reviewed by Pith at T0; open to challenge. T0 means a machine referee read the full paper against a public rubric.

T0 review · grok-4.3

2026-05-24 03:30 UTC pith:HYNSITQO

Load-bearing objection: the paper adds sliding-window masking and a geometry codebook to cut redundant 3D Gaussians in SLAM, but the similarity claim for the codebook has no numbers behind it.

arxiv 2403.11247 v3 pith:HYNSITQO submitted 2024-03-17 cs.CV cs.RO

Tianchen Deng, Chang Nie, Shuhong Liu, Wenhua Wu, Jianfei Yang, Shenghai Yuan, Jiuming Liu, Danwei Wang, Hesheng Wang

classification cs.CV cs.RO

keywords 3D Gaussian Splatting · Visual SLAM · Dense Reconstruction · Codebook Compression · Gaussian Ellipsoids · Bundle Adjustment · Real-time Rendering · Pose Estimation

verification ladder T0 review T1 audit T2 compute T3 formal T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a compact 3D Gaussian Splatting method for dense visual SLAM. Redundant ellipsoids are removed by a sliding window masking strategy, and the remaining geometric parameters are compressed by a codebook that exploits the observed similarity among most covariance matrices. A global bundle adjustment with reprojection loss keeps pose estimation accurate. The resulting system trains and renders faster than prior Gaussian SLAM approaches yet matches their scene quality on standard benchmarks. This directly lowers memory and storage demands that have limited real-world use of dense Gaussian mapping.

Core claim

The authors claim that a single geometry codebook can compress the covariance matrices of most 3D Gaussian ellipsoids without material loss of reconstruction or pose accuracy, and that a sliding-window mask can further eliminate redundant ellipsoids, producing a SLAM pipeline whose training and rendering are materially faster while scene representation quality remains state-of-the-art.

What carries the argument

The geometry codebook that encodes similar covariance matrices of 3D Gaussian ellipsoids, paired with the sliding-window masking strategy that prunes redundant ellipsoids.
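
To make the codebook idea concrete, the sketch below clusters per-Gaussian 3-vectors with plain k-means and keeps one index per Gaussian instead of the vector itself. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`fit_codebook`, `nearest`, `sq_dist`), the synthetic `scales` data, and the farthest-point initialisation are all hypothetical, and the paper's own figure captions point to a residual (R-VQ) scheme rather than single-stage k-means.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, v):
    """Index of the codebook entry closest to v."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], v))

def fit_codebook(vectors, k, iters=10):
    """Plain k-means; returns (codebook, one index per input vector)."""
    # Farthest-point initialisation (an assumption of this sketch): start
    # from the first vector, then repeatedly add the vector that is
    # farthest from every centre chosen so far.
    codebook = [list(vectors[0])]
    while len(codebook) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in codebook))
        codebook.append(list(far))
    for _ in range(iters):
        idx = [nearest(codebook, v) for v in vectors]
        for j in range(k):
            members = [v for v, i in zip(vectors, idx) if i == j]
            if members:  # an empty cluster keeps its previous centre
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook, [nearest(codebook, v) for v in vectors]

# Hypothetical data: 300 scale vectors scattered around three shapes.
rng = random.Random(0)
shapes = [(0.01, 0.01, 0.01), (0.05, 0.05, 0.005), (0.2, 0.02, 0.02)]
scales = [[s + rng.gauss(0.0, 0.001) for s in rng.choice(shapes)]
          for _ in range(300)]

codebook, idx = fit_codebook(scales, k=3)
decoded = [codebook[i] for i in idx]  # what a renderer would read back
mse = sum(sq_dist(v, d) for v, d in zip(scales, decoded)) / len(scales)
print(f"codebook size {len(codebook)}, reconstruction MSE {mse:.2e}")
```

With a codebook of K entries, each Gaussian stores an index of roughly log2(K) bits in place of three floats. The saving is only free of quality loss if the reconstruction error stays small, which is exactly the premise under dispute in this review.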

Load-bearing premise

The covariance matrices of most 3D Gaussian ellipsoids are similar enough that one codebook can represent them without harming reconstruction or pose accuracy.

What would settle it

Measure reconstruction PSNR, SSIM, and absolute pose error on Replica or TUM RGB-D sequences; a statistically significant drop relative to the uncompressed baseline would falsify the claim.
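
Two of the metrics named above can be computed in a few lines. The sketch below is a hedged illustration: `psnr` and `ate_rmse` are hypothetical helper names, images are assumed to be flat lists of floats in [0, 1], and the trajectories are assumed to be already aligned to a common frame (benchmark tooling normally performs that alignment first). SSIM is omitted because it needs windowed statistics.

```python
import math

def psnr(img_a, img_b, max_val=1.0):
    """PSNR in dB between two equally sized images (flat lists of floats)."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)

def ate_rmse(est, gt):
    """RMSE of the translational error between two aligned trajectories.

    est, gt: lists of (x, y, z) camera positions, one per frame.
    """
    sq = [sum((e - g) ** 2 for e, g in zip(p, q)) for p, q in zip(est, gt)]
    return math.sqrt(sum(sq) / len(sq))

# Toy inputs, purely illustrative.
rendered = [0.0, 0.5, 1.0]
reference = [0.1, 0.5, 0.9]
est_traj = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt_traj = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1)]

print(f"PSNR {psnr(rendered, reference):.2f} dB")
print(f"ATE RMSE {ate_rmse(est_traj, gt_traj):.4f} m")
```

Running both functions on the compressed and the uncompressed pipeline, over the same sequences, gives the paired numbers the falsifier asks for.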

If this is right

Training and rendering become faster than existing 3D Gaussian SLAM systems.
Memory and storage costs drop because fewer and smaller Gaussian parameters are stored.
Pose estimation remains accurate through global bundle adjustment with reprojection loss.
Scene representation quality stays at state-of-the-art levels on standard benchmarks.
Real-time rendering of dense maps becomes feasible with lower hardware requirements.
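
The reprojection loss behind the bundle-adjustment point can be sketched in its textbook pinhole form. This is an assumption-laden illustration rather than the paper's loss: the names `project`, `transform`, and `reprojection_loss` are hypothetical, the pose is taken to be a world-to-camera rotation `R` and translation `t`, and any weighting, robust kernel, or keyframe selection the authors use is not described in this review.

```python
def transform(R, t, p):
    """Apply a rigid world-to-camera transform: R (3x3 nested list), t (3,)."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = point_cam
    return (fx * x / z + cx, fy * y / z + cy)

def reprojection_loss(points_world, observations, R, t, intrinsics):
    """Mean squared pixel distance between projected and observed points."""
    total = 0.0
    for p, (u_obs, v_obs) in zip(points_world, observations):
        u, v = project(transform(R, t, p), *intrinsics)
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total / len(points_world)

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K = (500.0, 500.0, 320.0, 240.0)  # fx, fy, cx, cy (hypothetical camera)
points = [(0.0, 0.0, 2.0)]
pixels = [(320.0, 240.0)]

loss_true_pose = reprojection_loss(points, pixels, identity, (0.0, 0.0, 0.0), K)
loss_shifted = reprojection_loss(points, pixels, identity, (0.01, 0.0, 0.0), K)
print(loss_true_pose, loss_shifted)
```

A bundle adjuster minimises this quantity jointly over the poses (and possibly the map), so a pose that is off by one centimetre already shows up as a pixel-level residual.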

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same codebook idea could be tested on non-SLAM Gaussian reconstruction pipelines to see whether the similarity assumption holds outside mapping tasks.
If the compression ratio scales with scene size, the method might allow long-term mapping on memory-constrained robots without periodic map resets.
An adaptive codebook that updates when scene geometry changes could be a natural next step if fixed-codebook accuracy degrades over time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Desk Editor's Note

The paper's contribution is a sliding-window masking strategy to drop redundant 3D Gaussians followed by a geometry codebook that compresses their covariance parameters. The authors motivate the codebook by noting that most covariance matrices are extremely similar. They also keep a global bundle adjustment with reprojection loss for pose accuracy. This combination is not in the prior work they cite.

The approach directly targets the memory and compute cost of dense Gaussian maps in SLAM, which is a practical bottleneck. The masking is a simple way to limit growth, and the codebook idea is a reasonable compression step if the similarity holds. The claim of faster training and rendering while holding SOTA quality is the main result they report.

The soft spot is the lack of support for the similarity assumption. The abstract states it but does not give variance of the covariance parameters, codebook reconstruction error, or an ablation on how many entries are needed. If the similarity varies by scene or scale, the quality claim may not hold even when speed improves. The experiments are described as extensive, yet the abstract gives no error bars or precise baseline details, so the results are hard to assess without the full paper.

This paper is for people working on real-time dense visual SLAM with 3D Gaussians who want lower resource use. A reader in that area could get value from the implementation details if the full text supplies the missing quantifications. I would send it to peer review. The core technique is worth checking even if the current evidence for the codebook is thin.

Referee Report

2 major / 2 minor

Summary. The paper proposes Compact 3D Gaussian Splatting for Dense Visual SLAM, introducing a sliding-window masking strategy to prune redundant 3D Gaussians and a geometry codebook to compress covariance parameters based on the observation that most ellipsoids have extremely similar geometries; pose accuracy is maintained via global bundle adjustment with reprojection loss. The central claim is that the resulting system delivers faster training and rendering while preserving SOTA reconstruction and tracking quality.

Significance. If the similarity assumption and quality claims hold under quantitative scrutiny, the work would provide a practical route to lower-memory, higher-speed dense visual SLAM, directly addressing the storage and compute overhead that currently limits 3DGS-based systems on resource-constrained platforms. The combination of masking and codebook compression is a targeted engineering contribution that could be adopted by follow-on real-time reconstruction pipelines.

major comments (2)

[Abstract] The geometry-codebook motivation states that 'the covariance matrix (geometry) of most 3D Gaussian ellipsoids are extremely similar' yet supplies no quantitative support (variance of the six covariance parameters, codebook reconstruction error, or ablation on codebook size); because this unquantified similarity is the sole justification for replacing per-Gaussian covariances without material loss in reconstruction or pose accuracy, the SOTA-quality claim rests on an unverified premise.
[Experiments, results tables] The reported maintenance of SOTA quality is presented without error bars across runs, without an ablation isolating the effect of codebook size on pose and reconstruction metrics, and without explicit statement of data-exclusion rules or baseline re-implementation details; these omissions prevent verification that localized geometric degradation is not masked by the global bundle-adjustment stage.
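
The "variance of the six covariance parameters" requested in the first comment could be measured along the following lines. The sketch assumes the standard 3D Gaussian Splatting parameterisation, in which each covariance is built from a scale vector and a unit quaternion as Σ = R S Sᵀ Rᵀ; whether the reviewed system stores its geometry exactly this way is an assumption, and the helper names (`quat_to_rot`, `covariance_params`, `per_parameter_variance`) are hypothetical.

```python
import statistics

def quat_to_rot(q):
    """Rotation matrix from a quaternion (w, x, y, z); normalises first."""
    w, x, y, z = q
    n = (w * w + x * x + y * y + z * z) ** 0.5
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance_params(scale, quat):
    """Six unique entries of Sigma = R S S^T R^T (upper triangle, row-major)."""
    R = quat_to_rot(quat)
    s2 = [s * s for s in scale]
    sigma = [[sum(R[i][k] * s2[k] * R[j][k] for k in range(3))
              for j in range(3)] for i in range(3)]
    return [sigma[0][0], sigma[0][1], sigma[0][2],
            sigma[1][1], sigma[1][2], sigma[2][2]]

def per_parameter_variance(gaussians):
    """Population variance of each of the six parameters across Gaussians."""
    params = [covariance_params(s, q) for s, q in gaussians]
    return [statistics.pvariance(col) for col in zip(*params)]

# Two hypothetical Gaussians that differ only in their first scale.
gaussians = [((1.0, 1.0, 1.0), (1.0, 0.0, 0.0, 0.0)),
             ((3.0, 1.0, 1.0), (1.0, 0.0, 0.0, 0.0))]
print(per_parameter_variance(gaussians))
```

Reporting these six variances per scene, alongside the codebook reconstruction error for several codebook sizes, would turn "extremely similar" into a checkable number.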

minor comments (2)

[Method] The description of how the codebook is learned and applied during optimization would benefit from an explicit equation or short algorithm box to make the compression step reproducible.
[Figures] Figure captions for the qualitative results should state the exact scenes and metrics shown so readers can directly compare against the quantitative tables.
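
As an indication of what the explicit equation requested in the [Method] comment might look like, here is the generic form of residual vector quantization (R-VQ), the scheme named in the paper's Figure 6 caption. This is the textbook formulation, not the authors' equation: the symbols $v$, $C^{(l)}$, $K$, and $L$ are assumptions introduced here, and the paper's actual number of stages and codebook sizes are not stated in this review.

```latex
% Generic residual vector quantization with L stages.
% v: a per-Gaussian scale or rotation vector.
% C^{(l)}: codebook of stage l with K entries C^{(l)}_1, ..., C^{(l)}_K.
\begin{aligned}
r^{(0)} &= v, \\
i_l &= \arg\min_{k \in \{1,\dots,K\}} \bigl\lVert r^{(l-1)} - C^{(l)}_k \bigr\rVert_2^2, \\
r^{(l)} &= r^{(l-1)} - C^{(l)}_{i_l}, \qquad l = 1,\dots,L, \\
\hat{v} &= \sum_{l=1}^{L} C^{(l)}_{i_l}.
\end{aligned}
```

Each Gaussian then stores only the indices $i_1,\dots,i_L$, and the final residual $r^{(L)} = v - \hat{v}$ is the per-Gaussian reconstruction error that the major comments ask the authors to report.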

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and commit to revisions that strengthen the quantitative justification and experimental transparency.

Referee: [Abstract] The geometry-codebook motivation states that 'the covariance matrix (geometry) of most 3D Gaussian ellipsoids are extremely similar' yet supplies no quantitative support (variance of the six covariance parameters, codebook reconstruction error, or ablation on codebook size); because this unquantified similarity is the sole justification for replacing per-Gaussian covariances without material loss in reconstruction or pose accuracy, the SOTA-quality claim rests on an unverified premise.

Authors: We agree the abstract provides no quantitative metrics for the similarity observation. The full manuscript reports overall performance but does not include the requested statistics. In revision we will add: a histogram and variance statistics for the six covariance parameters across scenes, codebook reconstruction error versus codebook size, and an ablation table showing reconstruction/pose metrics for varying codebook sizes. These additions will directly support the premise.

Revision: yes

Referee: [Experiments, results tables] The reported maintenance of SOTA quality is presented without error bars across runs, without an ablation isolating the effect of codebook size on pose and reconstruction metrics, and without explicit statement of data-exclusion rules or baseline re-implementation details; these omissions prevent verification that localized geometric degradation is not masked by the global bundle-adjustment stage.

Authors: We accept that the current experiments lack these elements. The revision will add error bars from multiple runs, a dedicated ablation on codebook size reporting PSNR/SSIM/ATE, explicit statements on data splits (no test data leakage) and baseline re-implementations (official code, default hyperparameters). The new ablation will report metrics before and after codebook compression (with BA held fixed) to isolate any localized degradation.

Revision: yes
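
Error bars of the kind discussed in this exchange are usually reported as mean ± sample standard deviation over repeated runs. A minimal sketch follows; the `summarize` helper is a hypothetical name and the PSNR values are invented placeholders, not results from the paper.

```python
import statistics

def summarize(runs):
    """Mean and sample standard deviation of one metric over repeated runs."""
    return statistics.mean(runs), statistics.stdev(runs)

# Placeholder values for illustration only; NOT taken from the paper.
psnr_runs = [34.1, 34.4, 33.9, 34.2, 34.0]

mean, std = summarize(psnr_runs)
print(f"PSNR: {mean:.2f} ± {std:.2f} dB over {len(psnr_runs)} runs")
```

Reporting the same summary for every codebook size in the ablation makes it possible to tell a real quality drop from run-to-run noise.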

Circularity Check

0 steps flagged

No circularity: method rests on empirical observation and independent experiments

The paper introduces a sliding-window masking strategy and a geometry codebook motivated by the stated observation that most covariance matrices are similar. No equations, fitted parameters, or self-citations are shown that reduce the compactness claim or SOTA-quality result to the same inputs by construction. The derivation chain consists of algorithmic choices justified by external data and benchmarks rather than self-definition or renamed predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no explicit free parameters, axioms, or invented entities are stated beyond the core modeling choice that ellipsoids share similar covariances.

Original abstract

Recent work has shown that 3D Gaussian-based SLAM enables high-quality reconstruction, accurate pose estimation, and real-time rendering of scenes. However, these approaches are built on a tremendous number of redundant 3D Gaussian ellipsoids, leading to high memory and storage costs, and slow training speed. To address the limitation, we propose a compact 3D Gaussian Splatting SLAM system that reduces the number and the parameter size of Gaussian ellipsoids. A sliding window-based masking strategy is first proposed to reduce the redundant ellipsoids. Then we observe that the covariance matrix (geometry) of most 3D Gaussian ellipsoids are extremely similar, which motivates a novel geometry codebook to compress 3D Gaussian geometric attributes, i.e., the parameters. Robust and accurate pose estimation is achieved by a global bundle adjustment method with reprojection loss. Extensive experiments demonstrate that our method achieves faster training and rendering speed while maintaining the state-of-the-art (SOTA) quality of the scene representation.

Figures

Figures reproduced from arXiv: 2403.11247 by Chang Nie, Danwei Wang, Hesheng Wang, Jianfei Yang, Jiuming Liu, Shenghai Yuan, Shuhong Liu, Tianchen Deng, Wenhua Wu.

**Figure 1.** Our framework minimizes storage and accelerates rendering while maintaining the SOTA image reconstruction performance. The proposed …

**Figure 2.** The pipeline of our GS-based SLAM system. The input of our system is RGB-D images. We start the SLAM system by initializing the 3D …

**Figure 3.** The core distinctions between GS-based SLAM systems and …

**Figure 4.** The KL divergence distribution of the Gaussian ellipsoids with the online training of the SLAM system on different time steps (500, 1000, …

**Figure 5.** The left figure shows the learnable mask strategy. We perform frustum selection and sliding window reset to remove redundant Gaussian …

**Figure 6.** The R-VQ process to represent the scale and rotation of Gaussian ellipsoids. In the first stage, we cluster the scale and rotation vectors and …

**Figure 7.** The rendering visualization results on the Replica dataset …

**Figure 8.** The rendering visualization results on the TUM RGB-D dataset …

**Figure 9.** We visualized the learned codebook indices for rotation vector in the Replica dataset …

**Figure 10.** We visualized the learned codebook indices for scale vector in the Replica dataset …

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

GaussLite: Online Task-Conditioned 3D Gaussian Splatting for Real-Time Robotic Mapping
cs.CV 2026-06 unverdicted novelty 7.0
GaussLite conditions 3D Gaussian Splatting seeding density, gradient flow, and scaling on task relevance masks derived from LLM-parsed natural language and open-vocabulary detection, yielding +2.72 dB ROI PSNR gains o...

Mamba-VGGT: Persistent Long-Sequence Video Geometry Grounded Transformer via External Sliding Window Mamba Memory
cs.CV 2026-05 unverdicted novelty 7.0
Mamba-VGGT introduces a Sliding Window Mamba memory module and Zero-Init Spatial Memory Injector to enable persistent long-range geometric reasoning in VGGT for extended video sequences.

LiveEdit: Towards Real-Time Diffusion-Based Streaming Video Editing
cs.CV 2026-06 unverdicted novelty 6.0
LiveEdit distills a bidirectional video foundation model into a unidirectional streaming editor via three-stage training plus mask caching to reach 12.66 FPS with stable edits.

DINO-VO: Learning Where to Focus for Enhanced State Estimation
cs.CV 2026-04 unverdicted novelty 6.0
DINO-VO achieves state-of-the-art monocular visual odometry accuracy and generalization by training a differentiable patch selector together with multi-task features and inverse-depth bundle adjustment.

VGGT-Occ: Geometry-Grounded and Density-Aware Gated Fusion for 3D Occupancy Prediction
cs.CV 2026-05 unverdicted novelty 5.0
VGGT-Occ embeds geometric tokens via PA-DA and uses sequential coarse-to-fine gated fusion to reach 33.00% IoU and 21.08% mIoU on SurroundOcc-nuScenes while using only ~41M parameters in the occupancy head.

The Code Whisperer: LLM and Graph-Based AI for Smell and Vulnerability Resolution
cs.SE 2026-04 unverdicted novelty 5.0
A hybrid graph-plus-LLM framework improves detection and repair of code smells and vulnerabilities over graph-only or LLM-only baselines on multi-language datasets.

A Survey on 3D Gaussian Splatting
cs.CV 2024-01 unverdicted novelty 2.0
A survey compiling principles, applications, benchmarks, and challenges of 3D Gaussian Splatting for explicit 3D scene representation.

[1] [1]

Simultaneous localization and mapping: part i,

H. Durrant-Whyte and T. Bailey, “Simultaneous localization and mapping: part i,” IEEE robotics & automation magazine, vol. 13, no. 2, pp. 99–110, 2006

work page 2006

[2] [2]

Simultaneous localization and map- ping for augmented reality,

G. Reitmayr, T. Langlotz, D. Wagner, A. Mulloni, G. Schall, D. Schmalstieg, and Q. Pan, “Simultaneous localization and map- ping for augmented reality,” in 2010 International Symposium on Ubiquitous Virtual Reality. IEEE, 2010, pp. 5–8

work page 2010

[3] [3]

Orb-slam: A versatile and accurate monocular slam system,

R. Mur-Artal, J. M. M. Montiel, and J. D. Tard ´os, “Orb-slam: A versatile and accurate monocular slam system,” IEEE T ransactions on Robotics, vol. 31, no. 5, pp. 1147–1163, 2015

work page 2015

[4] [4]

Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras,

R. Mur-Artal and J. D. Tard ´os, “Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras,” IEEE T ransac- tions on Robotics , vol. 33, no. 5, pp. 1255–1262, 2017

work page 2017

[5] [5]

Vins-mono: A robust and versatile monocular visual-inertial state estimator,

T. Qin, P . Li, and S. Shen, “Vins-mono: A robust and versatile monocular visual-inertial state estimator,” IEEE T ransactions on Robotics, vol. 34, no. 4, pp. 1004–1020, 2018

work page 2018

[6] [6]

Long-term visual simul- taneous localization and mapping: Using a bayesian persistence filter-based global map prediction,

T. Deng, H. Xie, J. Wang, and W. Chen, “Long-term visual simul- taneous localization and mapping: Using a bayesian persistence filter-based global map prediction,” IEEE Robotics & Automation Magazine, vol. 30, no. 1, pp. 36–49, 2023

work page 2023

[7] [7]

Robust incremental long- term visual topological localization in changing environments,

H. Xie, T. Deng, J. Wang, and W. Chen, “Robust incremental long- term visual topological localization in changing environments,” IEEE T ransactions on Instrumentation and Measurement , vol. 72, pp. 1–14, 2022

work page 2022

[8] [8]

Past, present, and future of simultaneous localization and mapping: Toward the robust- perception age,

C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. Reid, and J. J. Leonard, “Past, present, and future of simultaneous localization and mapping: Toward the robust- perception age,” IEEE T ransactions on robotics , vol. 32, no. 6, pp. 1309–1332, 2016

work page 2016

[9] [9]

Dtam: Dense tracking and mapping in real-time,

R. A. Newcombe, S. J. Lovegrove, and A. J. Davison, “Dtam: Dense tracking and mapping in real-time,” in 2011 international conference on computer vision . IEEE, 2011, pp. 2320–2327

work page 2011

[10] [10]

Real-time large-scale dense rgb-d slam with volumetric fusion,

T. Whelan, M. Kaess, H. Johannsson, M. Fallon, J. J. Leonard, and J. McDonald, “Real-time large-scale dense rgb-d slam with volumetric fusion,” The International Journal of Robotics Research , vol. 34, no. 4-5, pp. 598–626, 2015

work page 2015

[11] [11]

Elasticfusion: Real-time dense slam and light source estimation,

T. Whelan, R. F. Salas-Moreno, B. Glocker, A. J. Davison, and S. Leutenegger, “Elasticfusion: Real-time dense slam and light source estimation,” The International Journal of Robotics Research , vol. 35, no. 14, pp. 1697–1716, 2016

work page 2016

[12] E. Sucar, S. Liu, J. Ortiz, and A. J. Davison, “imap: Implicit mapping and positioning in real-time,” in ICCV, October 2021, pp. 6229–6238.

[13] Z. Zhu, S. Peng, V. Larsson, W. Xu, H. Bao, Z. Cui, M. R. Oswald, and M. Pollefeys, “Nice-slam: Neural implicit scalable encoding for slam,” in CVPR, June 2022, pp. 12 786–12 796.

[14] M. M. Johari, C. Carta, and F. Fleuret, “Eslam: Efficient dense slam system based on hybrid representation of signed distance fields,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 17 408–17 419.

[15] H. Wang, J. Wang, and L. Agapito, “Co-slam: Joint coordinate and sparse parametric encodings for neural real-time slam,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 13 293–13 302.

[16] T. Deng, G. Shen, T. Qin, J. Wang, W. Zhao, J. Wang, D. Wang, and W. Chen, “Plgslam: Progressive neural scene represenation with local to global bundle adjustment,” arXiv preprint arXiv:2312.09866, 2023.

[17] B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis, “3d gaussian splatting for real-time radiance field rendering,” ACM Transactions on Graphics, vol. 42, no. 4, 2023.

[18] N. Keetha, J. Karhade, K. M. Jatavallabhula, G. Yang, S. Scherer, D. Ramanan, and J. Luiten, “Splatam: Splat, track & map 3d gaussians for dense rgb-d slam,” arXiv preprint arXiv:2312.02126, 2023.

[19] C. Yan, D. Qu, D. Wang, D. Xu, Z. Wang, B. Zhao, and X. Li, “Gs-slam: Dense visual slam with 3d gaussian splatting,” arXiv preprint arXiv:2311.11700, 2023.

[20] V. Yugay, Y. Li, T. Gevers, and M. R. Oswald, “Gaussian-slam: Photo-realistic dense slam with gaussian splatting,” arXiv preprint arXiv:2312.10070, 2023.

[21] H. Matsuki, R. Murai, P. H. Kelly, and A. J. Davison, “Gaussian splatting slam,” arXiv preprint arXiv:2312.06741, 2023.

[22] S. Izadi, D. Kim, O. Hilliges, D. Molyneaux, R. Newcombe, P. Kohli, J. Shotton, S. Hodges, D. Freeman, A. Davison et al., “Kinectfusion: real-time 3d reconstruction and interaction using a moving depth camera,” in Proceedings of the 24th annual ACM symposium on User interface software and technology, 2011, pp. 559–568.

[23] Z. Teed and J. Deng, “Droid-slam: Deep visual slam for monocular, stereo, and rgb-d cameras,” Advances in neural information processing systems, vol. 34, pp. 16 558–16 569, 2021.

[24] Z. Teed and J. Deng, “Raft: Recurrent all-pairs field transforms for optical flow,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II 16. Springer, 2020, pp. 402–419.

[25] M. Bloesch, J. Czarnowski, R. Clark, S. Leutenegger, and A. J. Davison, “Codeslam - learning a compact, optimisable representation for dense visual slam,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.

[26] Z. Teed, L. Lipson, and J. Deng, “Deep patch visual odometry,” Advances in Neural Information Processing Systems, vol. 36, 2024.

[27] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” in ECCV, 2020.

[28] X. Yang, H. Li, H. Zhai, Y. Ming, Y. Liu, and G. Zhang, “Vox-fusion: Dense tracking and mapping with voxel-based neural implicit representation,” in 2022 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2022, pp. 499–507.

[29] E. Sandström, Y. Li, L. Van Gool, and M. R. Oswald, “Point-slam: Dense neural point cloud-based slam,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 18 433–18 444.

[30] J. C. Lee, D. Rho, X. Sun, J. H. Ko, and E. Park, “Compact 3d gaussian representation for radiance field,” arXiv preprint arXiv:2311.13681, 2023.

[31] W. Morgenstern, F. Barthel, A. Hilsmann, and P. Eisert, “Compact 3d scene representation via self-organizing gaussian grids,” arXiv preprint arXiv:2312.13299, 2023.

[32] Z. Fan, K. Wang, K. Wen, Z. Zhu, D. Xu, and Z. Wang, “Lightgaussian: Unbounded 3d gaussian compression with 15x reduction and 200+ fps,” arXiv preprint arXiv:2311.17245, 2023.

[33] Y. Bengio, N. Léonard, and A. Courville, “Estimating or propagating gradients through stochastic neurons for conditional computation,” arXiv preprint arXiv:1308.3432, 2013.

[34] N. Zeghidour, A. Luebs, A. Omran, J. Skoglund, and M. Tagliasacchi, “Soundstream: An end-to-end neural audio codec,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 495–507, 2021.

[35] A. Défossez, J. Copet, G. Synnaeve, and Y. Adi, “High fidelity neural audio compression,” arXiv preprint arXiv:2210.13438, 2022.

[36] J. Straub, T. Whelan, L. Ma, Y. Chen, E. Wijmans, S. Green, J. J. Engel, R. Mur-Artal, C. Ren, S. Verma et al., “The replica dataset: A digital replica of indoor spaces,” arXiv preprint arXiv:1906.05797, 2019.

[37] A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Niessner, “Scannet: Richly-annotated 3d reconstructions of indoor scenes,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.

[38] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A benchmark for the evaluation of rgb-d slam systems,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012, pp. 573–580.