PRISM-SLAM: Probabilistic Ray-Grounded Inference for Scale-aware Metric SLAM

Eunsoo Im

arXiv: 2605.19257 · v1 · pith:ZKFFKZGR · submitted 2026-05-19 · 💻 cs.RO

Pith reviewed 2026-05-20 06:04 UTC · model grok-4.3

classification 💻 cs.RO

keywords: monocular SLAM, metric scale, vision foundation models, factor graph, dynamic environments, epistemic uncertainty, real-time localization

The pith

PRISM-SLAM anchors vision foundation model depth predictions with ray-distance factors to produce metric-scale trajectories from monocular RGB without correction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a real-time monocular SLAM system that folds zero-shot depth predictions from vision foundation models into a Bayesian factor graph. It adds a Plücker Ray-Distance Factor that ties each image observation to absolute metric distances, removing the scale ambiguity that normally requires post-hoc correction. A separate mechanism measures how consistent those depth predictions are across consecutive frames and uses the result to softly down-weight moving objects. The outcome is a pipeline that runs at 30 frames per second on RGB alone and delivers trajectories whose metric error nearly matches what an oracle scale alignment would achieve.

Core claim

PRISM-SLAM integrates VFM priors into a structured Bayesian factor graph for scale-aware metric SLAM. The Plücker Ray-Distance Factor anchors monocular observations in absolute space and makes metric scale Fisher-identifiable, eliminating drift. An epistemic uncertainty proxy derived from temporal depth consistency drives Dynamic Scene Uncertainty Gating that probabilistically down-weights dynamic distractors. On TUM RGB-D and 7-Scenes the metric SE(3) ATE is nearly identical to oracle-aligned Sim(3) error with no post-hoc scale correction required.
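The core claim compares two numbers: ATE after a rigid SE(3) alignment, and ATE after an oracle Sim(3) alignment that may also rescale the estimate. Below is a minimal sketch of that comparison. It assumes (N, 3) arrays of time-associated positions and uses the closed-form Umeyama alignment. The function names are hypothetical, and this is not the paper's evaluation code. If an estimate is already metric, the best Sim(3) scale is close to 1 and the two errors coincide. A scale-ambiguous estimate only looks good after Sim(3) alignment.

```python
import numpy as np


def umeyama(src, dst, with_scale):
    """Closed-form R, t, s minimising sum ||dst_i - (s * R @ src_i + t)||^2 (Umeyama 1991).

    src, dst: (N, 3) arrays of time-associated positions.
    with_scale=False gives a rigid SE(3) fit; True gives a Sim(3) fit.
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    x, y = src - mu_src, dst - mu_dst
    cov = y.T @ x / len(src)                      # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:  # avoid returning a reflection
        S[2, 2] = -1.0
    R = U @ S @ Vt
    s = float(D @ np.diag(S)) / (x ** 2).sum(axis=1).mean() if with_scale else 1.0
    t = mu_dst - s * R @ mu_src
    return R, t, s


def ate_rmse(est, gt, with_scale):
    """ATE RMSE (units of gt) after SE(3) or Sim(3) alignment, plus the fitted scale."""
    R, t, s = umeyama(est, gt, with_scale)
    aligned = s * est @ R.T + t
    return float(np.sqrt(((aligned - gt) ** 2).sum(axis=1).mean())), s


# Example: a metric estimate in an arbitrary frame versus the same estimate at half scale.
rng = np.random.default_rng(0)
gt = rng.normal(size=(200, 3))
c, sn = np.cos(0.3), np.sin(0.3)
R0 = np.array([[c, -sn, 0.0], [sn, c, 0.0], [0.0, 0.0, 1.0]])
metric_est = gt @ R0.T + np.array([1.0, 2.0, 3.0])
scaled_est = 0.5 * metric_est
print(ate_rmse(metric_est, gt, False)[0], ate_rmse(metric_est, gt, True)[0])  # both ~0
print(ate_rmse(scaled_est, gt, False)[0], ate_rmse(scaled_est, gt, True)[0])  # large vs ~0
```

Reporting both numbers side by side, as the referee requests, makes any residual scale error visible directly.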

What carries the argument

The Plücker Ray-Distance Factor, which converts monocular depth observations into absolute metric constraints inside the global factor graph.
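The Lean-canon excerpt on this page quotes this factor's residual as $e_{\mathrm{ray}}(T_i, X_k) = \lVert d_i \times X_k + m_i \rVert / \lVert d_i \rVert$. That is the distance from landmark $X_k$ to the Plücker line $(d_i, m_i)$ of an observation ray. Below is a minimal sketch of evaluating it. It assumes a pinhole camera with intrinsics `K`, a camera-to-world rotation `R_wc` and centre `c_w` taken from the pose, and the moment convention $m = c \times d$. Under that convention the quoted formula is exactly a point-to-line distance. The helper names are hypothetical. The sketch covers only the residual, not how the paper weights it or combines it with VFM depth inside the factor graph.

```python
import numpy as np


def plucker_ray(K, R_wc, c_w, uv):
    """World-frame Plücker coordinates (d, m) of the ray back-projected through pixel uv.

    K: 3x3 pinhole intrinsics; R_wc: camera-to-world rotation; c_w: camera centre in world.
    """
    d = R_wc @ np.linalg.solve(K, np.array([uv[0], uv[1], 1.0]))  # ray direction
    m = np.cross(c_w, d)                                          # moment m = c x d
    return d, m


def ray_distance(d, m, X):
    """e_ray = ||d x X + m|| / ||d||: distance from point X to the (infinite) ray line."""
    return float(np.linalg.norm(np.cross(d, X) + m) / np.linalg.norm(d))


# Example: camera at (1, 0, 0) looking down +z, principal-point pixel.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
d, m = plucker_ray(K, np.eye(3), np.array([1.0, 0.0, 0.0]), (320.0, 240.0))
print(ray_distance(d, m, np.array([1.0, 0.0, 5.0])))   # point on the ray
print(ray_distance(d, m, np.array([1.3, 0.0, 5.0])))   # point 0.3 m off the ray
```

Because both terms scale with $d$, the residual does not depend on how the direction is normalised.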

If this is right

Metric SE(3) trajectories are obtained directly without any post-hoc scale correction or alignment step.
The pipeline runs at 30 FPS on RGB input alone using asynchronous VFM inference and geometric tracking.
Dynamic objects are suppressed without semantic segmentation masks or extra sensors.
Scale drift is removed because the metric scale becomes identifiable through the ray-based factors.
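"Fisher-identifiable" has a concrete reading: the Fisher information about the global scale, $I(s) = J_s^\top \Sigma^{-1} J_s$, is non-zero. The toy illustration below works under stated assumptions and is not the derivation in the paper. Scaling every point by $s$ leaves pinhole projections unchanged, so whitened reprojection residuals carry zero information about $s$. Any residual that depends on absolute distance makes $I(s)$ positive. Here that role is played by a hypothetical metric-depth term with standard deviation `sigma_depth`, standing in for VFM-derived constraints. The Jacobian is taken by central differences, and all names and values are illustrative.

```python
import numpy as np


def project(P, f=500.0):
    """Pinhole projection of camera-frame points P (N, 3) to pixel offsets (N, 2)."""
    return f * P[:, :2] / P[:, 2:3]


def scale_fisher_information(X, s=1.0, sigma_px=1.0, sigma_depth=None, eps=1e-6):
    """Scalar Fisher information J^T J about a global scale s from whitened residuals.

    Measurements are constant, so the residual Jacobian is (minus) the prediction Jacobian.
    """
    def predictions(scale):
        parts = [(project(scale * X) / sigma_px).ravel()]   # bearing-only terms
        if sigma_depth is not None:
            parts.append(scale * X[:, 2] / sigma_depth)      # absolute-depth terms
        return np.concatenate(parts)

    J = (predictions(s + eps) - predictions(s - eps)) / (2.0 * eps)
    return float(J @ J)


rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(-1, 1, 50), rng.uniform(-1, 1, 50), rng.uniform(2, 5, 50)])
print(scale_fisher_information(X))                   # ~0: scale unobservable
print(scale_fisher_information(X, sigma_depth=0.1))  # > 0: scale observable
```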

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same ray-grounding idea could be tested on longer outdoor sequences to check whether metric consistency holds over kilometers.
Replacing the current VFM with a lighter depth predictor might preserve accuracy while lowering latency further.
The temporal-consistency uncertainty signal could be applied to other VFM-based perception tasks that must ignore transient scene elements.

Load-bearing premise

The assumption that an epistemic uncertainty proxy derived solely from temporal depth consistency between VFM predictions is sufficient to identify and probabilistically down-weight dynamic distractors across varied environments without semantic segmentation or additional sensors.
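The premise can be made concrete. The Figure 2 caption describes the proxy as a pose-compensated depth residual between consecutive DA3 depth maps. Below is a minimal sketch of such a residual. It assumes metric depth maps for two consecutive frames, pinhole intrinsics `K`, and a relative pose `T_cur_prev` from the tracker. It forward-warps the previous depth into the current frame with nearest-pixel rounding and no occlusion handling. The function name is hypothetical, and the paper's exact definition of the temporal term may differ. Static geometry with consistent depth gives a residual near zero, while anything moving independently of the camera does not. As the referee notes, depth that is inconsistent for other reasons, such as viewpoint or illumination changes, also gives a non-zero residual.

```python
import numpy as np


def temporal_depth_residual(depth_prev, depth_cur, K, T_cur_prev):
    """Pose-compensated residual |z_warped - z_cur| in metres; NaN where nothing warps in.

    depth_prev, depth_cur: (H, W) metric depth maps of consecutive frames.
    K: 3x3 intrinsics. T_cur_prev: 4x4 transform from previous to current camera frame.
    """
    H, W = depth_prev.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])    # 3 x HW homogeneous pixels
    pts = np.linalg.solve(K, pix) * depth_prev.ravel()        # back-project (prev frame)
    pts = T_cur_prev[:3, :3] @ pts + T_cur_prev[:3, 3:4]      # move into current frame
    z = pts[2]
    valid = z > 1e-6                                          # keep points in front
    proj = K @ pts[:, valid]
    uc = np.round(proj[0] / proj[2]).astype(int)
    vc = np.round(proj[1] / proj[2]).astype(int)
    inside = (uc >= 0) & (uc < W) & (vc >= 0) & (vc < H)
    uc, vc, z_warped = uc[inside], vc[inside], z[valid][inside]
    residual = np.full((H, W), np.nan)
    residual[vc, uc] = np.abs(z_warped - depth_cur[vc, uc])   # last write wins (no z-buffer)
    return residual


K = np.array([[50.0, 0.0, 16.0], [0.0, 50.0, 12.0], [0.0, 0.0, 1.0]])
prev = np.full((24, 32), 2.0)          # static wall 2 m away
cur = prev.copy()
cur[5:10, 5:10] = 1.5                  # a patch that moved 0.5 m closer
r = temporal_depth_residual(prev, cur, K, np.eye(4))
print(r[7, 7], r[20, 20])              # large inside the patch, ~0 elsewhere
```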

What would settle it

A benchmark sequence containing independently moving objects where depth predictions remain temporally consistent yet tracking produces large metric trajectory deviations from ground truth.

Figures

Figures reproduced from arXiv: 2605.19257 by Eunsoo Im.

**Figure 1.** PRISM-SLAM system architecture. Our decoupled pipeline operates across four concurrent processes. (1) Tracking: A CPU-based frontend (∼30 Hz) estimates initial poses and sparse points. (2) VFM Extraction: An asynchronous GPU worker extracts dense metric depth and uncertainty priors via DA3. (3) Scale Recovery (KF): A log-domain Kalman filter and WLS estimator dynamically fuse VFM priors with sparse points …
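The Figure 1 caption names a log-domain Kalman filter and a WLS estimator for scale recovery, but it is truncated before giving details. Below is a minimal sketch of one plausible reading, based on these assumptions:
- The filtered state is $\ell = \log s$, the ratio between VFM metric depth and up-to-scale sparse SLAM depth.
- Each keyframe yields a WLS estimate of $\ell$ from per-point log-depth ratios, with weights `w` treated as inverse variances.
- $\ell$ follows a random walk.

Every name and parameter value (`q`, the prior) is hypothetical. Working in the log domain keeps the scale positive and turns multiplicative depth errors into additive ones, which is what a linear Kalman update expects.

```python
import numpy as np
from dataclasses import dataclass


def wls_log_scale(z_sparse, z_vfm, w):
    """WLS estimate of log(s) from per-point depth ratios, and its variance.

    Minimises sum_i w_i * (log z_vfm_i - log z_sparse_i - log_s)^2 with w_i taken as
    inverse variances, so the closed-form estimate has variance 1 / sum(w).
    """
    log_ratio = np.log(z_vfm) - np.log(z_sparse)
    return float(np.sum(w * log_ratio) / np.sum(w)), float(1.0 / np.sum(w))


@dataclass
class LogScaleKF:
    """Scalar Kalman filter on log_s = log(s) with a random-walk process model."""
    log_s: float = 0.0   # prior mean (s = 1)
    P: float = 1.0       # prior variance of log_s
    q: float = 1e-4      # process noise added per update, lets the scale drift slowly

    def update(self, log_s_meas, r_meas):
        self.P += self.q                          # predict
        k = self.P / (self.P + r_meas)            # Kalman gain
        self.log_s += k * (log_s_meas - self.log_s)  # correct the mean
        self.P *= 1.0 - k                         # shrink the variance
        return float(np.exp(self.log_s))          # current scale estimate


rng = np.random.default_rng(2)
kf = LogScaleKF()
for _ in range(20):                                              # 20 keyframes
    z_sparse = rng.uniform(1.0, 4.0, 50)                         # up-to-scale SLAM depths
    z_vfm = 2.5 * z_sparse * np.exp(rng.normal(0.0, 0.05, 50))   # noisy metric priors
    scale = kf.update(*wls_log_scale(z_sparse, z_vfm, np.full(50, 1.0 / 0.05 ** 2)))
print(scale)  # close to the simulated scale of 2.5
```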

**Figure 2.** Temporal Uncertainty Modeling in Dynamic Scenes. (a) Input RGB frame from the TUM RGB-D fr3/walking static sequence. (b) Ground-truth depth map. (c) Pose-compensated depth residual of DA3 estimates, utilized as our DSUG epistemic uncertainty proxy u(p). Bright regions indicate high temporal depth variation, precisely capturing the geometrically unstable boundaries of moving subjects. By mapping this varian…

**Figure 3.** Impact of ViT-driven Loop Closure on the TUM fr1/xyz sequence. (a) Without Loop Closure: The purely visual odometry estimate (orange) progressively deviates from the ground truth (grey) due to accumulated scale and rotation drift, resulting in an ATE RMSE of 4.8 cm. (b) With Loop Closure: ViT-driven place recognition successfully detects 30 valid loops. Applying these geometric constraints (pink edges) glo…
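The Figure 3 caption credits ViT-driven place recognition for the detected loops. Below is a minimal sketch of the retrieval step. It assumes one global embedding per keyframe (for example, pooled ViT features), cosine similarity, a temporal exclusion window `min_gap`, and a similarity `threshold`. All names and values are hypothetical. Any geometric verification the paper applies before accepting a loop as valid is omitted.

```python
import numpy as np


def loop_candidates(descriptors, query_idx, min_gap=30, threshold=0.85):
    """Earlier keyframes whose global descriptor is cosine-similar to the query.

    descriptors: (N, D) array, one global embedding per keyframe.
    Keyframes within min_gap of the query are skipped so that temporal
    neighbours are not reported as loops. Results are sorted by similarity.
    """
    unit = descriptors / np.linalg.norm(descriptors, axis=1, keepdims=True)
    past = np.arange(max(query_idx - min_gap + 1, 0))   # indices 0 .. query_idx - min_gap
    sims = unit[past] @ unit[query_idx]
    order = np.argsort(-sims)
    keep = order[sims[order] >= threshold]
    return past[keep].tolist()


rng = np.random.default_rng(3)
desc = rng.normal(size=(100, 64))
desc[90] = desc[10] + 0.01 * rng.normal(size=64)   # keyframe 90 revisits keyframe 10
print(loop_candidates(desc, 90))                    # expected to contain 10
```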

**Figure 4.** KITTI trajectory demos. Cyan: PRISM-SLAM. Grey: GT. Yellow/orange dots: Map points.

D. Dense Map Quality Analysis

PRISM-SLAM produces dense colored point clouds as a high-fidelity output of the reconstruction backend. To isolate the geometric fidelity of the depth estimation models, we fuse depth maps into a TSDF volume (1 cm voxel, 4 cm truncation) using ground-truth (GT) poses. We compare three depth sou…
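The dense-map analysis fuses depth maps into a TSDF volume with 1 cm voxels and 4 cm truncation, using ground-truth poses. Below is a minimal sketch of that kind of fusion. It assumes a dense voxel grid, a pinhole camera without distortion, a camera-to-world pose `T_wc`, nearest-pixel depth lookup, and the usual running weighted average with unit weight per observation. The function and its parameters are hypothetical, not the tool used in the paper.

```python
import numpy as np


def integrate_tsdf(tsdf, weight, origin, voxel, depth, K, T_wc, trunc):
    """Fuse one depth map into a dense TSDF grid; returns updated (tsdf, weight).

    tsdf, weight: (nx, ny, nz) arrays; origin: world position of the grid corner;
    voxel: edge length (m); depth: (H, W) z-depth map (m); T_wc: 4x4 camera-to-world pose.
    """
    nx, ny, nz = tsdf.shape
    ijk = np.stack(np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz),
                               indexing="ij"), axis=-1).reshape(-1, 3)
    pts_w = origin + (ijk + 0.5) * voxel                  # voxel centres, world frame
    T_cw = np.linalg.inv(T_wc)
    pts_c = pts_w @ T_cw[:3, :3].T + T_cw[:3, 3]          # world -> camera frame
    z = pts_c[:, 2]
    H, W = depth.shape
    u = np.full(z.shape, -1)
    v = np.full(z.shape, -1)
    front = z > 1e-6
    u[front] = np.round(K[0, 0] * pts_c[front, 0] / z[front] + K[0, 2]).astype(int)
    v[front] = np.round(K[1, 1] * pts_c[front, 1] / z[front] + K[1, 2]).astype(int)
    ok = front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    d = np.zeros_like(z)
    d[ok] = depth[v[ok], u[ok]]
    sdf = d - z                                           # > 0 in front of the surface
    ok &= (d > 0) & (sdf >= -trunc)                       # ignore voxels far behind it
    new = np.clip(sdf / trunc, -1.0, 1.0)
    t, w = tsdf.reshape(-1).copy(), weight.reshape(-1).copy()
    t[ok] = (t[ok] * w[ok] + new[ok]) / (w[ok] + 1.0)     # running weighted average
    w[ok] += 1.0
    return t.reshape(tsdf.shape), w.reshape(weight.shape)


# Example: a wall 0.5 m in front of an identity-pose camera, 1 cm voxels, 4 cm truncation.
K = np.array([[100.0, 0.0, 16.0], [0.0, 100.0, 16.0], [0.0, 0.0, 1.0]])
depth = np.full((32, 32), 0.5)
tsdf, weight = integrate_tsdf(np.zeros((10, 10, 20)), np.zeros((10, 10, 20)),
                              np.array([-0.05, -0.05, 0.40]), 0.01, depth, K,
                              np.eye(4), trunc=0.04)
print(tsdf[5, 5, 9], tsdf[5, 5, 10])   # sign change brackets the surface at z = 0.5 m
```

The surface is then extracted at the zero crossing of the fused field, for example with marching cubes.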

**Figure 5.** Qualitative 3D Reconstruction and Metric Fidelity on fr1/desk2. This figure illustrates the dense, color-mapped point cloud generated by PRISM-SLAM using only monocular RGB input from the TUM sequence. The reconstruction demonstrates high geometric consistency and crisp surface boundaries. As indicated by the red measurement arrow, the vertical dimension of the computer monitor is estimated at 0.32 m withi…

**Figure 6.** Qualitative 3D Reconstruction and Large-Scale Metric Fidelity. This figure demonstrates the dense point cloud reconstructed on the TUM fr1/room sequence. The system successfully captures the global structure of the room with high geometric consistency. As indicated by the measurement arrow, the horizontal distance between the two walls is estimated at 1.58 m. This precise measurement confirms that our syst…

Original abstract

Monocular SLAM historically suffers from scale ambiguity and tracking failure in dynamic environments. While recent vision foundation models (VFMs) provide remarkable zero-shot depth priors, naively integrating these deterministic predictions ignores predictive uncertainty and frame-to-frame scale inconsistencies. We propose PRISM-SLAM, a real-time framework that rigorously integrates VFM priors into a structured Bayesian factor graph to achieve scale-aware, metric-consistent localization and mapping. Specifically, we introduce a Plücker Ray-Distance Factor to anchor monocular observations in absolute space within a globally consistent metric coordinate system, mathematically resolving scale drift by making the metric scale Fisher-identifiable. To handle environmental dynamics, we derive an epistemic uncertainty proxy from temporal depth consistency and formulate a Dynamic Scene Uncertainty Gating (DSUG) mechanism. This soft-gating approach probabilistically down-weights dynamic distractors without incurring the heavy computational overhead associated with traditional semantic segmentation masks. By employing a multi-process architecture that asynchronously processes VFM inference and geometric tracking, PRISM-SLAM provides verified metric output at 30 FPS using solely RGB input, bridging the gap between foundation models and real-world robotic applications. Evaluated on the TUM RGB-D and 7-Scenes benchmarks, PRISM-SLAM achieves a metric $SE(3)$ Absolute Trajectory Error (ATE) nearly identical to its oracle-aligned $Sim(3)$ error. This demonstrates that our system can produce deployment-ready metric trajectories by delivering robust metric SLAM solutions without any post-hoc scale correction. Project page: https://prismslam-cmd.github.io/prismslam_pr/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PRISM-SLAM adds a Plücker ray factor and temporal-consistency gating to monocular SLAM with VFMs, but the metric-scale claim rests on thin experimental support.

The main point is that the paper introduces a new Plücker Ray-Distance Factor into a factor graph to make metric scale identifiable from monocular RGB plus off-the-shelf depth priors. It then adds a Dynamic Scene Uncertainty Gating step that down-weights inconsistent regions using only frame-to-frame depth agreement. It reports metric SE(3) ATE nearly matching the oracle Sim(3) numbers on TUM RGB-D and 7-Scenes at 30 FPS, without post-hoc correction or extra sensors.

Referee Report

2 major / 2 minor

Summary. The paper presents PRISM-SLAM, a real-time monocular SLAM system that integrates zero-shot depth priors from vision foundation models (VFMs) into a Bayesian factor graph. It introduces a Plücker Ray-Distance Factor to make metric scale Fisher-identifiable within a globally consistent coordinate system and a Dynamic Scene Uncertainty Gating (DSUG) mechanism that derives an epistemic uncertainty proxy from temporal depth consistency to probabilistically down-weight dynamic distractors. Using a multi-process architecture for asynchronous VFM inference and geometric tracking, the system claims 30 FPS operation on RGB input alone. On TUM RGB-D and 7-Scenes benchmarks, it reports that metric SE(3) Absolute Trajectory Error (ATE) is nearly identical to oracle-aligned Sim(3) ATE, demonstrating deployment-ready metric trajectories without post-hoc scale correction.

Significance. If the central claims are substantiated, this would be a meaningful contribution to metric monocular SLAM by showing how VFM priors can be rigorously incorporated via probabilistic factors to resolve scale ambiguity and handle dynamics without semantic segmentation or extra sensors. The emphasis on Fisher-identifiability and real-time multi-process design are positive aspects that could influence practical robotic deployments. The work bridges foundation models and classical SLAM in a structured way, though its impact depends on stronger empirical validation of the key mechanisms.

major comments (2)

[Abstract and Evaluation] The central claim that metric SE(3) ATE is nearly identical to oracle Sim(3) ATE on TUM RGB-D and 7-Scenes is load-bearing for the assertion of scale-aware metric output without post-hoc correction, yet the abstract provides no quantitative ATE values, error bars, ablation tables, or explicit verification of Fisher-identifiability (e.g., how the Plücker Ray-Distance Factor renders scale observable). This leaves the result only moderately supported.
[DSUG mechanism] The derivation of the epistemic uncertainty proxy solely from frame-to-frame depth consistency of VFM predictions (as used to formulate the soft-gating in DSUG) is critical to down-weighting dynamic distractors and preserving metric scale identifiability. However, this proxy can be violated by non-dynamic factors such as VFM viewpoint sensitivity or illumination shifts; the manuscript should include targeted ablations or failure-case analysis showing reliable isolation of true dynamics without semantic cues or auxiliary sensors.

minor comments (2)

[Method] Clarify the exact mathematical definition of the Plücker Ray-Distance Factor and its integration into the factor graph (including any relevant equations) to improve reproducibility.
[Evaluation] Ensure all benchmark results include both SE(3) and Sim(3) metrics side-by-side with standard deviations across multiple runs for clearer comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of presentation and validation that we will address to strengthen the paper. We respond to each major comment below.

Referee: [Abstract and Evaluation] The central claim that metric SE(3) ATE is nearly identical to oracle Sim(3) ATE on TUM RGB-D and 7-Scenes is load-bearing for the assertion of scale-aware metric output without post-hoc correction, yet the abstract provides no quantitative ATE values, error bars, ablation tables, or explicit verification of Fisher-identifiability (e.g., how the Plücker Ray-Distance Factor renders scale observable). This leaves the result only moderately supported.

Authors: We agree that quantitative support in the abstract will improve clarity. In the revision we will insert concrete SE(3) ATE figures (with standard deviations) for the TUM and 7-Scenes sequences together with the corresponding oracle-aligned Sim(3) values. The Fisher-identifiability argument is derived in Section 3.2 via the information matrix contribution of the Plücker ray-distance factors; we will add a one-sentence reference to this derivation in the abstract. Existing ablation tables appear in the supplementary material and will be explicitly cited from the main text. revision: yes
Referee: [DSUG mechanism] The derivation of the epistemic uncertainty proxy solely from frame-to-frame depth consistency of VFM predictions (as used to formulate the soft-gating in DSUG) is critical to down-weighting dynamic distractors and preserving metric scale identifiability. However, this proxy can be violated by non-dynamic factors such as VFM viewpoint sensitivity or illumination shifts; the manuscript should include targeted ablations or failure-case analysis showing reliable isolation of true dynamics without semantic cues or auxiliary sensors.

Authors: We recognize that viewpoint sensitivity and illumination changes can affect the temporal-consistency proxy. We will add a new ablation subsection that isolates these effects on selected sequences (with and without DSUG) and will include failure-case visualizations together with quantitative metrics. These additions will be placed in the main paper or supplementary material to demonstrate that the gating still preferentially attenuates true dynamics. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivations are self-contained from geometric and probabilistic principles

The paper derives the Plücker Ray-Distance Factor from first-principles geometry to enforce Fisher-identifiability of metric scale, and the DSUG epistemic uncertainty proxy directly from frame-to-frame VFM depth consistency checks. Neither step reduces the reported SE(3) ATE equivalence to a fitted parameter or self-citation chain; the metric-scale result is an empirical outcome of the factor graph rather than an input quantity redefined as output. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the derivation chain.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 2 invented entities

The central claim rests on the reliability of VFM depth priors, the geometric validity of the Plücker factor, and the effectiveness of temporal consistency as an uncertainty signal; these are introduced or assumed rather than derived from first principles within the paper.

free parameters (1)

DSUG gating thresholds and weights
Parameters that control how strongly temporal inconsistency down-weights observations; their values are chosen to achieve the reported benchmark performance.
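The Lean-canon excerpt on this page quotes the gating rule these parameters control:
- $u(p) = \alpha\, u_{\mathrm{spatial}}(p) + (1-\alpha)\, u_{\mathrm{temporal}}(p)$
- $w(p) = \sigma\big((\tau - u(p))/T\big)$, where $\sigma$ is the logistic function.

Below is a minimal sketch of evaluating the rule per pixel, assuming the two uncertainty maps are already computed. $\alpha$, $\tau$ and $T$ are the free parameters; the values in the example are arbitrary, not the paper's. Pixels whose uncertainty is well below $\tau$ keep a weight near 1, pixels well above it are driven toward 0, and $T$ sets how soft the transition is.

```python
import numpy as np


def dsug_weights(u_spatial, u_temporal, alpha, tau, temperature):
    """Soft gate w = sigmoid((tau - u) / T) on the blended uncertainty u."""
    u = alpha * u_spatial + (1.0 - alpha) * u_temporal
    x = (tau - u) / temperature
    return 0.5 * (1.0 + np.tanh(0.5 * x))   # logistic sigmoid, overflow-free form


# Example: three pixels with low, borderline and high uncertainty.
w = dsug_weights(np.array([0.0, 0.1, 1.0]), np.array([0.0, 0.1, 1.0]),
                 alpha=0.5, tau=0.1, temperature=0.02)
print(w)  # near 1, 0.5, near 0
```

One common way to apply such a weight is to scale a factor's information matrix by $w(p)$, which is equivalent to scaling its whitened residual by $\sqrt{w(p)}$. The page does not say whether PRISM-SLAM does exactly this.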

axioms (2)

Domain assumption: Vision foundation models supply zero-shot depth estimates whose frame-to-frame inconsistencies can serve as a usable proxy for epistemic uncertainty in dynamic scenes.
Invoked when defining the Dynamic Scene Uncertainty Gating mechanism.
Domain assumption: A Plücker ray-distance factor renders metric scale Fisher-identifiable within the factor graph.
Central mathematical claim used to resolve scale drift.

invented entities (2)

Plücker Ray-Distance Factor (no independent evidence)
purpose: Anchors monocular observations to absolute metric space inside the factor graph.
New factor type introduced to make scale identifiable.
Dynamic Scene Uncertainty Gating (DSUG) (no independent evidence)
purpose: Soft probabilistic down-weighting of dynamic regions using temporal depth consistency.
New mechanism proposed to avoid semantic segmentation overhead.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.

Plücker Ray-Distance Factor ... $e_{\mathrm{ray}}(T_i, X_k) = \lVert d_i \times X_k + m_i \rVert / \lVert d_i \rVert$ ... renders the metric scale locally Fisher-identifiable

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem.

Dynamic Scene Uncertainty Gating (DSUG) ... $u(p) = \alpha\, u_{\mathrm{spatial}}(p) + (1-\alpha)\, u_{\mathrm{temporal}}(p)$ ... $w(p) = \sigma\big((\tau - u(p))/T\big)$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.



work page

[27] [27]

DINOv2: Learning Robust Visual Features without Supervision

Oquab, Maxime and Darcet, Timoth. arXiv preprint arXiv:2304.07193 , year =

work page internal anchor Pith review Pith/arXiv arXiv

[28] [28]

ACM Transactions on Graphics (ToG) , volume =

Real-time 3D reconstruction at scale using voxel hashing , author =. ACM Transactions on Graphics (ToG) , volume =. 2013 , publisher =

work page 2013

[29] [29]

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pages =

A benchmark for the evaluation of RGB-D SLAM systems , author =. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pages =. 2012 , organization =

work page 2012

[30] [30]

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pages =

ReFusion: 3D reconstruction in dynamic environments for RGB-D cameras exploiting residuals , author =. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pages =. 2019 , organization =

work page 2019

[31] [31]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages =

Gaussian Splatting SLAM , author =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages =

work page

[32] [32]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pages =

Scene coordinate regression forests for camera relocalization in RGB-D images , author =. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pages =

work page

[33] [33]

2026 , note =

Kim, Seonghun and Park, Jongwoo and Lee, Hyungtae , journal =. 2026 , note =

work page 2026

[34] [34]

Zhang, Youmin and Tosi, Fabio and Beker, Simon and Poggi, Matteo and Mattoccia, Stefano , booktitle =

work page

[35] [35]

European Conference on Computer Vision , pages =

Deep patch visual slam , author =. European Conference on Computer Vision , pages =. 2024 , organization =

work page 2024

[36] [36]

Proceedings of the Computer Vision and Pattern Recognition Conference , pages =

Splat-slam: Globally optimized rgb-only slam with 3d gaussians , author =. Proceedings of the Computer Vision and Pattern Recognition Conference , pages =

work page

[37] [37]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops , pages =

Sandstr. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops , pages =

work page

[38] [38]

Murai, Riku and Dexheimer, Eric and Davison, Andrew J , booktitle =

work page

[39] [39]

Hu, Lingxiang and Oufroukh, Naima Ait and Bonardi, Fabien and Ghandour, Raymond , journal =

work page

[40] [40]

Deng, Kai and others , journal =

work page

[41] [41]

Depth Anything 3: Recovering the Visual Space from Any Views

Depth Anything 3: Recovering the Visual Space from Any Views , author =. arXiv preprint arXiv:2511.10647 , year =

work page internal anchor Pith review Pith/arXiv arXiv

[42] [42]

arXiv preprint arXiv:1812.04605 , year =

Deepv2d: Video to depth with differentiable structure from motion , author =. arXiv preprint arXiv:1812.04605 , year =

work page arXiv

[43] [43]

IEEE Robotics and Automation Letters , volume =

Deepfactors: Real-time probabilistic dense monocular slam , author =. IEEE Robotics and Automation Letters , volume =. 2020 , publisher =

work page 2020

[44] [44]

IEEE Robotics and Automation Letters , volume =

Rodyn-slam: Robust dynamic dense rgb-d slam with neural radiance fields , author =. IEEE Robotics and Automation Letters , volume =. 2024 , publisher =

work page 2024

[45] [45]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages =

Megasam: Accurate, fast and robust structure and motion from casual dynamic videos , author =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages =

work page

[46] [46]

Proceedings of the Computer Vision and Pattern Recognition Conference , pages =

Vggt: Visual geometry grounded transformer , author =. Proceedings of the Computer Vision and Pattern Recognition Conference , pages =

work page

[47] [47]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages =

Unidepth: Universal monocular metric depth estimation , author =. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages =

work page

[48] [48]

European conference on computer vision , pages =

Grounding image matching in 3d with mast3r , author =. European conference on computer vision , pages =. 2024 , organization =

work page 2024

[49] [49]

VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold

Vggt-slam: Dense rgb slam optimized on the sl (4) manifold , author =. arXiv preprint arXiv:2505.12549 , year =

work page internal anchor Pith review arXiv

[50] [50]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages =

Omnidata: A scalable pipeline for making multi-task mid-level vision datasets from 3d scans , author =. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages =

work page

[51] [51]

The International Journal of Robotics Research , volume =

Observability-based rules for designing consistent EKF SLAM estimators , author =. The International Journal of Robotics Research , volume =. 2010 , publisher =

work page 2010

[52] [52]

Barroso-Laguna, Axel and Riba, Edgar and Ellis, Daniel and Mikolajczyk, Krystian , booktitle =

work page

[53] [53]

Advances in Neural Information Processing Systems (NeurIPS) , volume =

Tyszkiewicz, Micha. Advances in Neural Information Processing Systems (NeurIPS) , volume =

work page

[54] [54]

IEEE Transactions on Robotics , volume =

Mur-Artal, Raul and Tard. IEEE Transactions on Robotics , volume =. 2017 , publisher =

work page 2017

[55] [55]

2012 IEEE Conference on Computer Vision and Pattern Recognition , pages =

Are we ready for autonomous driving? The KITTI vision benchmark suite , author =. 2012 IEEE Conference on Computer Vision and Pattern Recognition , pages =. 2012 , organization =

work page 2012