pith. sign in

arxiv: 2605.23098 · v1 · pith:KLDWCTKBnew · submitted 2026-05-21 · 💻 cs.RO

UfM*: Uncertainty from Motion* for DNN Depth Estimation Using Gaussians

Pith reviewed 2026-05-25 05:14 UTC · model grok-4.3

classification 💻 cs.RO
keywords uncertainty estimationmonocular depth estimationDNNGaussian mixturemultiview disagreementaleatoric uncertaintyenergy efficiencyrobotics
0
0 comments X

The pith

UfM* measures multiview disagreement with a compact Gaussian mixture to calibrate monocular depth DNN uncertainty after one inference per image.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces UfM* to estimate uncertainty in DNN monocular depth estimation by quantifying disagreement between predictions of the same 3D region across different views. It does so with a compact Gaussian mixture that updates across frames, needing only a single network forward pass per image instead of ensembles or heavy sampling. When combined with aleatoric uncertainty, the method yields lower expected calibration error than ensembles on out-of-distribution data. The design targets energy and memory constraints typical of robotic platforms. A sympathetic reader would care because reliable uncertainty matters for safe robot operation, yet conventional approaches impose prohibitive overhead.

Core claim

UfM* is an algorithm that measures multiview disagreement by comparing previous and current views using a compact Gaussian mixture, requiring only a single DNN inference per image. Using Gaussians to compute this disagreement is more compute- and memory-efficient than a prior point-cloud approach and improves uncertainty by measuring disagreement across regions of 3D space. UfM* paired with aleatoric uncertainty improves expected calibration error by 24-28% compared to an ensemble while requiring only 3% of the energy and 0.02% of the memory on 100 out-of-distribution ScanNet sequences.

What carries the argument

UfM* (Uncertainty from Motion*), which maintains a compact Gaussian mixture to represent and compare multiview disagreement across 3D space.

If this is right

  • UfM* paired with aleatoric uncertainty improves expected calibration error by 24-28% compared to an ensemble on out-of-distribution sequences.
  • The method requires only 3% of the energy and 0.02% of the memory of ensemble methods.
  • UfM* runs real-time at 30 FPS while consuming 63 mJ per 224x224 image on an Arm Cortex-A76 CPU.
  • Measuring multiview disagreement with Gaussians enables efficient uncertainty for resource-constrained robotic systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The Gaussian representation could allow uncertainty values to propagate directly into 3D mapping or planning modules without additional conversion steps.
  • The same disagreement-tracking idea might transfer to video-based tasks such as optical flow or semantic segmentation where temporal consistency is available.
  • Further tests on outdoor or dynamic scenes would reveal whether the calibration gains hold when scene motion violates the static-region assumption implicit in the mixture update.

Load-bearing premise

A compact Gaussian mixture can represent multiview disagreement across 3D regions accurately enough to capture uncertainty without major loss relative to point clouds or full sampling.

What would settle it

Measure expected calibration error of UfM* against an ensemble baseline on a fresh collection of out-of-distribution indoor sequences and check whether the reported 24-28% improvement appears.

Figures

Figures reproduced from arXiv: 2605.23098 by Sertac Karaman, Soumya Sudhakar, Vivienne Sze.

Figure 1
Figure 1. Figure 1: Depth predictions that differ across views of the same [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Left: Illustration of use cases with error (distance of measurements from cube object) with pointwise multiview [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Overview of Uncertainty from Motion∗ (UfM∗ ) algorithm. 1) Ensemble-based: Given Θ = {θ1, . . . , θN } denotes an ensemble of trained networks (e.g., ensemble [7], BatchEnsemble [28]), we alternate models by selecting θn for a single inference on image i where n = (i mod N) + 1. 2) Sampling-based: Given Θ = {θ1} denotes a single stochastic DNN (e.g., MC-Dropout [21]), we run a single inference on θ1 which … view at source ↗
Figure 4
Figure 4. Figure 4: UfM∗ constructs a GMM from noisy DNN depth predictions to calculate multiview disagreement (represented by color of the Gaussians) for a room-scale environment. We see low multiview disagreement on the right side of the room and high multiview disagreement on the left side of the room. The representation is compact, requiring only 695 Gaussians. geodesic, and repeat if there is more than one correspond￾ing… view at source ↗
Figure 5
Figure 5. Figure 5: Qualitative results on baselines and UfM [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Visualization of uncertainty induced by loss function, multiple models, and multiview disagreement (MVD). We see [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗
Figure 8
Figure 8. Figure 8: Left: comparison against ensemble confident pixels, [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗
Figure 10
Figure 10. Figure 10: Average latency percentages for UfM∗ -Ensemble method on 100 ScanNet sequences (24 ms per image total); uncertainty regression using GMR dominates overhead. uncertainty estimation relative to the total cost of both depth prediction and uncertainty estimation. Clearly, using an ensemble is the most expensive. For an ensemble of size 10, uncertainty estimation accounts for approximately 90% of the total lat… view at source ↗
Figure 11
Figure 11. Figure 11: Calibration curves for Depth Anything V2 model [PITH_FULL_IMAGE:figures/full_fig_p014_11.png] view at source ↗
Figure 13
Figure 13. Figure 13: We show the robustness of UfM∗ -Ensemble to rotation noise (left) and translation noise (right) added to the pose estimate from a SLAM system [58]. 0 50 100 150 200 250 300 Sequence skip interval 0.0 0.5 1.0 1.5 2.0 2.5 3.0 N L L im p r o v e m e n t c o m p a r e d t o sin gle m o d el NLLAleatoric NLLUfM *Aleatoric [PITH_FULL_IMAGE:figures/full_fig_p015_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: UfM∗ -Aleatoric is robust to decreases in overlap between frames, still maintaining improvement over aleatoric uncertainty while converging towards it (gray dashed line). UfM∗ -Ensemble produce entropy values that remain compar￾atively low. This behavior highlights a limitation of multiview dis￾agreement: the predicted uncertainty is bounded by the range of depth predictions produced by the DNN. If the DN… view at source ↗
Figure 15
Figure 15. Figure 15: (a) Miniature car test platform with separate power sensors for computation and actuation, (b) experiment setup, (c) [PITH_FULL_IMAGE:figures/full_fig_p016_15.png] view at source ↗
read the original abstract

Reliable uncertainty estimation is critical for deploying monocular depth deep neural networks (DNNs) in safety-critical robotic systems. Conventional uncertainty methods such as ensembles and sampling-based approaches require multiple inferences per image, incurring substantial compute and memory overhead. Moreover, uncertainty predicted from a single image misses out on measuring disagreement between predictions across views of the same region. We propose Uncertainty from Motion* (UfM*), an uncertainty estimation algorithm that measures multiview disagreement efficiently by comparing previous and current views using a compact Gaussian mixture, requiring only a single DNN inference per image. Using Gaussians to compute multiview disagreement is not only more compute- and memory-efficient than a prior approach using a point cloud, but also improves uncertainty by measuring disagreement across regions of 3D space. UfM* paired with aleatoric uncertainty improves expected calibration error by 24-28% compared to an ensemble, while requiring only 3% of the energy and 0.02% of the memory on 100 out-of-distribution ScanNet sequences. We demonstrate UfM* consumes only 63 mJ per 224x224 image while running real-time at 30 FPS on an Arm Cortex-A76 CPU onboard a miniature energy-constrained robot, highlighting that measuring multiview disagreement using Gaussians enables efficient uncertainty for resource-constrained robotic systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces UfM*, an uncertainty estimation method for monocular depth DNNs that computes multiview disagreement via a compact Gaussian mixture over 3D regions using only a single forward pass per image. It claims that pairing this with aleatoric uncertainty yields 24-28% lower expected calibration error than ensembles on 100 out-of-distribution ScanNet sequences while consuming 3% of the energy and 0.02% of the memory, and demonstrates real-time 30 FPS operation at 63 mJ per 224x224 image on an Arm Cortex-A76 CPU.

Significance. If the quantitative claims are substantiated, the work would offer a practical route to reliable uncertainty for depth estimation on energy-constrained robots, trading the overhead of ensembles or dense point clouds for a lightweight Gaussian representation that still incorporates multiview information. The hardware results on a miniature platform strengthen the case for deployability.

major comments (2)
  1. [Abstract, §4] Abstract and experimental section: the headline 24-28% ECE improvement, energy, and memory figures are reported without error bars, without specifying the number of Gaussian components or the exact mixture-fitting procedure, without stating data-exclusion criteria for the 100 ScanNet sequences, and without a full experimental protocol, so the central performance claims cannot be independently verified.
  2. [§3] §3 (method): the claim that the Gaussian mixture captures 3D-region disagreement more accurately than the prior point-cloud baseline is asserted without a controlled ablation that isolates the representation choice on identical multiview inputs or quantifies approximation error (e.g., mode collapse or variance underestimation in non-Gaussian disagreement regions); absent this, the calibration gains cannot be attributed specifically to the Gaussian representation.
minor comments (1)
  1. [Title, Abstract] The asterisk in the title and acronym UfM* is never defined in the text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important aspects of reproducibility and attribution. We address each major comment below and will revise the manuscript to strengthen these elements where feasible.

read point-by-point responses
  1. Referee: [Abstract, §4] Abstract and experimental section: the headline 24-28% ECE improvement, energy, and memory figures are reported without error bars, without specifying the number of Gaussian components or the exact mixture-fitting procedure, without stating data-exclusion criteria for the 100 ScanNet sequences, and without a full experimental protocol, so the central performance claims cannot be independently verified.

    Authors: We agree these details are required for verification. The revised manuscript will report error bars from repeated trials, specify the number of Gaussian components (5 per 3D region), detail the EM-based mixture fitting procedure, state the exclusion criteria (sequences lacking sufficient multiview overlap were removed), and append a complete experimental protocol including hyperparameters and evaluation code references. revision: yes

  2. Referee: [§3] §3 (method): the claim that the Gaussian mixture captures 3D-region disagreement more accurately than the prior point-cloud baseline is asserted without a controlled ablation that isolates the representation choice on identical multiview inputs or quantifies approximation error (e.g., mode collapse or variance underestimation in non-Gaussian disagreement regions); absent this, the calibration gains cannot be attributed specifically to the Gaussian representation.

    Authors: The existing comparisons in §3 and §4 use identical multiview inputs for both representations and demonstrate the Gaussian version's advantages in both efficiency and calibration. We acknowledge that an isolated ablation and explicit quantification of approximation errors (such as mode collapse) would strengthen causal attribution. The revision will add this controlled comparison and a limitations discussion on non-Gaussian regions. revision: partial

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical validation of independent algorithm

full rationale

The provided text (abstract and description) presents UfM* as a new algorithmic procedure that computes multiview disagreement via a compact Gaussian mixture representation, then reports measured improvements in ECE, energy, and memory on ScanNet sequences. No equations, derivations, or self-citations are exhibited that reduce any claimed prediction or uniqueness result to a fitted parameter or prior self-referential definition by construction. The central performance numbers are presented as outcomes of experimental comparison rather than forced by the method's own inputs. This is the common honest case of a self-contained algorithmic contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; Gaussian representation is treated as a standard modeling choice.

pith-pipeline@v0.9.0 · 5776 in / 1041 out tokens · 27643 ms · 2026-05-25T05:14:37.058655+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

75 extracted references · 75 canonical work pages · 3 internal anchors

  1. [1]

    Depth anything v2,

    L. Yang, B. Kang, Z. Huang, Z. Zhao, X. Xu, J. Feng, and H. Zhao, “Depth anything v2,”Advances in Neural Information Processing Sys- tems, vol. 37, pp. 21 875–21 911, 2024

  2. [2]

    ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth

    S. F. Bhat, R. Birkl, D. Wofk, P. Wonka, and M. M ¨uller, “Zoedepth: Zero-shot transfer by combining relative and metric depth,”arXiv preprint arXiv:2302.12288, 2023

  3. [3]

    Scannet: Richly-annotated 3d reconstructions of indoor scenes,

    A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner, “Scannet: Richly-annotated 3d reconstructions of indoor scenes,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5828–5839

  4. [4]

    Vision-based uncertainty-aware motion planning based on probabilistic semantic segmentation,

    R. Roemer, A. Lederer, S. Tesfazgi, and S. Hirche, “Vision-based uncertainty-aware motion planning based on probabilistic semantic segmentation,”IEEE Robotics and Automation Letters, vol. 8, no. 11, pp. 7825–7832, 2023

  5. [5]

    Safe reinforcement learning with model uncertainty estimates,

    B. L ¨utjens, M. Everett, and J. P. How, “Safe reinforcement learning with model uncertainty estimates,” in2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 8662–8668

  6. [6]

    Dectrain: Deciding when to train a monocular depth dnn online,

    Z.-S. Fu, S. Sudhakar, S. Karaman, and V . Sze, “Dectrain: Deciding when to train a monocular depth dnn online,”IEEE Robotics and Automation Letters, 2025

  7. [7]

    Simple and scalable predictive uncertainty estimation using deep ensembles,

    B. Lakshminarayanan, A. Pritzel, and C. Blundell, “Simple and scalable predictive uncertainty estimation using deep ensembles,”Advances in Neural Information Processing Systems, vol. 30, 2017

  8. [8]

    What uncertainties do we need in bayesian deep learning for computer vision?

    A. Kendall and Y . Gal, “What uncertainties do we need in bayesian deep learning for computer vision?”Advances in Neural Information Processing Systems, vol. 30, 2017

  9. [9]

    Deep eviden- tial regression,

    A. Amini, W. Schwarting, A. Soleimany, and D. Rus, “Deep eviden- tial regression,”Advances in Neural Information Processing Systems, vol. 33, pp. 14 927–14 937, 2020

  10. [10]

    Natural posterior network: Deep bayesian predictive uncertainty for ex- ponential family distributions,

    B. Charpentier, O. Borchert, D. Z ¨ugner, S. Geisler, and S. G ¨unnemann, “Natural posterior network: Deep bayesian predictive uncertainty for ex- ponential family distributions,” inInternational Conference on Learning Representations, 2022

  11. [11]

    Weight uncertainty in neural networks,

    C. Blundell, J. Cornebise, K. Kavukcuoglu, and D. Wierstra, “Weight uncertainty in neural networks,” inInternational Conference on Machine Learning. PMLR, 2015, pp. 1613–1622

  12. [12]

    Uncertainty from motion for dnn monocular depth estimation,

    S. Sudhakar, V . Sze, and S. Karaman, “Uncertainty from motion for dnn monocular depth estimation,” in2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 8673–8679

  13. [13]

    Efficient uncertainty estimation for semantic segmentation in videos,

    P.-Y . Huang, W.-T. Hsu, C.-Y . Chiu, T.-F. Wu, and M. Sun, “Efficient uncertainty estimation for semantic segmentation in videos,” inProceed- ings of the European Conference on Computer Vision (ECCV), 2018, pp. 520–535

  14. [14]

    Neural rgb→(d) sensing: Depth and uncertainty from a video camera,

    C. Liu, J. Gu, K. Kim, S. G. Narasimhan, and J. Kautz, “Neural rgb→(d) sensing: Depth and uncertainty from a video camera,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 10 986–10 995

  15. [15]

    Video depth anything: Consistent depth estimation for super-long videos,

    S. Chen, H. Guo, S. Zhu, F. Zhang, Z. Huang, J. Feng, and B. Kang, “Video depth anything: Consistent depth estimation for super-long videos,”arXiv preprint arXiv:2501.12375, 2025

  16. [16]

    Robust consistent video depth estimation,

    J. Kopf, X. Rong, and J.-B. Huang, “Robust consistent video depth estimation,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1611–1621

  17. [17]

    Efficient epistemic uncertainty estimation in cerebrovascular segmentation,

    O. Rathore, R. Paul, A. Morrison, H. Scharr, and E. Pfaehler, “Efficient epistemic uncertainty estimation in cerebrovascular segmentation,”arXiv preprint arXiv:2503.22271, 2025

  18. [18]

    Uncertainty-guided never- ending learning to drive,

    L. Lai, E. Ohn-Bar, S. Arora, and J. S. K. Yi, “Uncertainty-guided never- ending learning to drive,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 15 088–15 098

  19. [19]

    Active learning-assisted directed evolution,

    J. Yang, R. G. Lal, J. C. Bowden, R. Astudillo, M. A. Hameedi, S. Kaur, M. Hill, Y . Yue, and F. H. Arnold, “Active learning-assisted directed evolution,”Nature Communications, vol. 16, no. 1, p. 714, 2025

  20. [20]

    Competence-aware path planning via introspective perception,

    S. Rabiee, C. Basich, K. H. Wray, S. Zilberstein, and J. Biswas, “Competence-aware path planning via introspective perception,”IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 3218–3225, 2022

  21. [21]

    Deep bayesian active learning with image data,

    Y . Gal, R. Islam, and Z. Ghahramani, “Deep bayesian active learning with image data,” inInternational Conference on Machine Learning. PMLR, 2017, pp. 1183–1192

  22. [22]

    En- coding the latent posterior of bayesian neural networks for uncertainty quantification,

    G. Franchi, A. Bursuc, E. Aldea, S. Dubuisson, and I. Bloch, “En- coding the latent posterior of bayesian neural networks for uncertainty quantification,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 4, pp. 2027–2040, 2023

  23. [23]

    Can you trust your model’s uncer- tainty? evaluating predictive uncertainty under dataset shift,

    Y . Ovadia, E. Fertig, J. Ren, Z. Nado, D. Sculley, S. Nowozin, J. Dillon, B. Lakshminarayanan, and J. Snoek, “Can you trust your model’s uncer- tainty? evaluating predictive uncertainty under dataset shift,”Advances in Neural Information Processing Systems, vol. 32, 2019

  24. [24]

    On the practicality of deterministic epistemic uncertainty,

    J. Postels, M. Seg `u, T. Sun, L. D. Sieber, L. Van Gool, F. Yu, and F. Tombari, “On the practicality of deterministic epistemic uncertainty,” inInternational Conference on Machine Learning. PMLR, 2022, pp. 17 870–17 909

  25. [25]

    Efficient self-ensemble for semantic segmenta- tion,

    W. Bousselham, G. Thibault, L. Pagano, A. Machireddy, J. Gray, Y . H. Chang, and X. Song, “Efficient self-ensemble for semantic segmenta- tion,”arXiv preprint arXiv:2111.13280, 2021

  26. [26]

    Prune and tune ensembles: low-cost en- semble learning with sparse independent subnetworks,

    T. Whitaker and D. Whitley, “Prune and tune ensembles: low-cost en- semble learning with sparse independent subnetworks,” inProceedings of the AAAI conference on artificial intelligence, vol. 36, no. 8, 2022, pp. 8638–8646

  27. [27]

    Deep ensembling with no overhead for either training or testing: The all-round blessings of dynamic sparsity,

    S. Liu, T. Chen, Z. Atashgahi, X. Chen, G. Sokar, E. Mocanu, M. Pechenizkiy, Z. Wang, and D. C. Mocanu, “Deep ensembling with no overhead for either training or testing: The all-round blessings of dynamic sparsity,” in10th International conference on Learning Representation, ICLR 2022, 2022

  28. [28]

    Batchensemble: an alternative approach to efficient ensemble and lifelong learning,

    Y . Wen, D. Tran, and J. Ba, “Batchensemble: an alternative approach to efficient ensemble and lifelong learning,” inInternational Conference on Learning Representations, 2019

  29. [29]

    Packed-ensembles for efficient uncertainty estimation,

    O. Laurent, A. Lafage, E. Tartaglione, G. Daniel, J.-M. Martinez, A. Bursuc, and G. Franchi, “Packed-ensembles for efficient uncertainty estimation,” inInternational Conference on Learning Representations, 2023

  30. [30]

    Training independent subnetworks for robust prediction,

    M. Havasi, R. Jenatton, S. Fort, J. Z. Liu, J. Snoek, B. Lakshmi- narayanan, A. M. Dai, and D. Tran, “Training independent subnetworks for robust prediction,” inInternational Conference on Learning Repre- sentations, 2021

  31. [31]

    Probabilistic mimo u-net: Efficient and accurate uncertainty estimation for pixel-wise regression,

    A. Baumann, T. Roßberg, and M. Schmitt, “Probabilistic mimo u-net: Efficient and accurate uncertainty estimation for pixel-wise regression,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4498–4506

  32. [32]

    Towards inference efficient deep ensemble learning,

    Z. Li, K. Ren, Y . Yang, X. Jiang, Y . Yang, and D. Li, “Towards inference efficient deep ensemble learning,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 7, 2023, pp. 8711– 8719

  33. [33]

    Estimating the mean and variance of the target probability distribution,

    D. A. Nix and A. S. Weigend, “Estimating the mean and variance of the target probability distribution,” inProceedings of 1994 ieee international conference on neural networks (ICNN’94), vol. 1. IEEE, 1994, pp. 55– 60

  34. [34]

    Dudes: Deep uncertainty distillation using ensembles for semantic segmentation,

    S. Landgraf, K. Wursthorn, M. Hillemann, and M. Ulrich, “Dudes: Deep uncertainty distillation using ensembles for semantic segmentation,” PFG–Journal of Photogrammetry, Remote Sensing and Geoinformation Science, vol. 92, no. 2, pp. 101–114, 2024

  35. [35]

    Distilling ensembles improves uncertainty estimates,

    Z. E. Mariet, R. Jenatton, F. Wenzel, and D. Tran, “Distilling ensembles improves uncertainty estimates,” inThird Symposium on Advances in Approximate Bayesian Inference, 2021

  36. [36]

    Streamlined and resource-efficient estimation of epistemic uncertainty in deep ensemble classification decision via regression,

    J. F. Masakuna, D. K. Nkashama, A. Soltani, M. Frappier, P. M. Tardif, and F. Kabanza, “Streamlined and resource-efficient estimation of epistemic uncertainty in deep ensemble classification decision via regression,”IEEE Transactions on Emerging Topics in Computational Intelligence, 2024

  37. [37]

    Predictive uncertainty estimation via prior networks,

    A. Malinin and M. Gales, “Predictive uncertainty estimation via prior networks,”Advances in Neural Information Processing Systems, vol. 31, 2018

  38. [38]

    Uncertainty in the Variational Information Bottleneck

    A. A. Alemi, I. Fischer, and J. V . Dillon, “Uncertainty in the variational information bottleneck,”arXiv preprint arXiv:1807.00906, 2018

  39. [39]

    A simple approach to improve single-model deep uncertainty via distance-awareness,

    J. Z. Liu, S. Padhy, J. Ren, Z. Lin, Y . Wen, G. Jerfel, Z. Nado, J. Snoek, D. Tran, and B. Lakshminarayanan, “A simple approach to improve single-model deep uncertainty via distance-awareness,”Journal of Machine Learning Research, vol. 24, no. 42, pp. 1–63, 2023

  40. [40]

    Conformal prediction: A gentle introduction,

    A. N. Angelopoulos and S. Bates, “Conformal prediction: A gentle introduction,”Foundations and Trends in Machine Learning, vol. 16, no. 4, pp. 494–591, 2023

  41. [41]

    Conformal prediction: a unified review of theory and new challenges,

    M. Fontana, G. Zeni, and S. Vantini, “Conformal prediction: a unified review of theory and new challenges,”Bernoulli, vol. 29, no. 1, pp. 1–23, 2023

  42. [42]

    Stereo processing by semiglobal matching and mutual information,

    H. Hirschmuller, “Stereo processing by semiglobal matching and mutual information,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 328–341, 2008

  43. [43]

    Mvsnet: Depth inference for unstructured multi-view stereo,

    Y . Yao, Z. Luo, S. Li, T. Fang, and L. Quan, “Mvsnet: Depth inference for unstructured multi-view stereo,” inProceedings of the European conference on computer vision (ECCV), 2018, pp. 767–783

  44. [44]

    Kinectfusion: Real-time dense surface mapping and tracking,

    R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohi, J. Shotton, S. Hodges, and A. Fitzgibbon, “Kinectfusion: Real-time dense surface mapping and tracking,” in2011 18 10th IEEE international symposium on mixed and augmented reality. IEEE, 2011, pp. 127–136

  45. [45]

    Real-time large-scale dense rgb-d slam with volumetric fusion,

    T. Whelan, M. Kaess, H. Johannsson, M. Fallon, J. J. Leonard, and J. McDonald, “Real-time large-scale dense rgb-d slam with volumetric fusion,”The International Journal of Robotics Research, vol. 34, no. 4-5, pp. 598–626, 2015

  46. [46]

    Consistent video depth estimation,

    X. Luo, J.-B. Huang, R. Szeliski, K. Matzen, and J. Kopf, “Consistent video depth estimation,”ACM Transactions on Graphics (ToG), vol. 39, no. 4, pp. 71–1, 2020

  47. [47]

    Activenerf: Learning where to see with uncertainty estimation,

    X. Pan, Z. Lai, S. Song, and G. Huang, “Activenerf: Learning where to see with uncertainty estimation,” inEuropean Conference on Computer Vision. Springer, 2022, pp. 230–246

  48. [48]

    Sources of uncertainty in 3d scene reconstruction,

    M. Klasson, R. Mereu, J. Kannala, and A. Solin, “Sources of uncertainty in 3d scene reconstruction,” inEuropean Conference on Computer Vision. Springer, 2024, pp. 271–289

  49. [49]

    Ramen: Real-time asynchronous multi-agent neural implicit mapping,

    H. Zhao, B. Ivanovic, and N. Mehr, “Ramen: Real-time asynchronous multi-agent neural implicit mapping,” 2025

  50. [50]

    3d gaussian splatting for real-time radiance field rendering,

    B. Kerbl, G. Kopanas, T. Leimk ¨uhler, and G. Drettakis, “3d gaussian splatting for real-time radiance field rendering,”ACM Transactions on Graphics (ToG), vol. 42, no. 4, pp. 1–14, 2023

  51. [51]

    Efficient parametric multi-fidelity surface mapping

    A. Dhawale and N. Michael, “Efficient parametric multi-fidelity surface mapping.” inRobotics: Science and Systems, 2020, pp. 1–9

  52. [52]

    Memory-efficient gaussian fitting for depth images in real time,

    P. Z. X. Li, S. Karaman, and V . Sze, “Memory-efficient gaussian fitting for depth images in real time,” in2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 8003–8009

  53. [53]

    3d normal distributions transform occupancy maps: An efficient represen- tation for mapping in dynamic environments,

    J. P. Saarinen, H. Andreasson, T. Stoyanov, and A. J. Lilienthal, “3d normal distributions transform occupancy maps: An efficient represen- tation for mapping in dynamic environments,”The International Journal of Robotics Research, vol. 32, no. 14, pp. 1627–1644, 2013

  54. [54]

    On-manifold gmm registra- tion,

    W. Tabib, C. O’Meadhra, and N. Michael, “On-manifold gmm registra- tion,”IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 3805– 3812, 2018

  55. [55]

    Gmmap: Memory-efficient contin- uous occupancy map using gaussian mixture model,

    P. Z. X. Li, S. Karaman, and V . Sze, “Gmmap: Memory-efficient contin- uous occupancy map using gaussian mixture model,”IEEE Transactions on Robotics, vol. 40, pp. 1339–1355, 2024

  56. [56]

    Gaussian Splatting SLAM,

    H. Matsuki, R. Murai, P. H. J. Kelly, and A. J. Davison, “Gaussian Splatting SLAM,” 2024

  57. [57]

    Gs- slam: Dense visual slam with 3d gaussian splatting,

    C. Yan, D. Qu, D. Xu, B. Zhao, Z. Wang, D. Wang, and X. Li, “Gs- slam: Dense visual slam with 3d gaussian splatting,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 19 595–19 604

  58. [58]

    Bundle- fusion: Real-time globally consistent 3d reconstruction using on-the-fly surface reintegration,

    A. Dai, M. Nießner, M. Zollh ¨ofer, S. Izadi, and C. Theobalt, “Bundle- fusion: Real-time globally consistent 3d reconstruction using on-the-fly surface reintegration,”ACM Transactions on Graphics (ToG), vol. 36, no. 4, p. 1, 2017

  59. [59]

    Orb-slam3: An accurate open-source library for visual, visual– inertial, and multimap slam,

    C. Campos, R. Elvira, J. J. G. Rodr ´ıguez, J. M. Montiel, and J. D. Tard´os, “Orb-slam3: An accurate open-source library for visual, visual– inertial, and multimap slam,”IEEE Transactions on Robotics, vol. 37, no. 6, pp. 1874–1890, 2021

  60. [60]

    Barycenters in the wasserstein space,

    M. Agueh and G. Carlier, “Barycenters in the wasserstein space,”SIAM Journal on Mathematical Analysis, vol. 43, no. 2, pp. 904–924, 2011

  61. [61]

    Variance min- imization in the wasserstein space for invariant causal prediction,

    G. G. Martinet, A. Strzalkowski, and B. Engelhardt, “Variance min- imization in the wasserstein space for invariant causal prediction,” inInternational Conference on Artificial Intelligence and Statistics. PMLR, 2022, pp. 8803–8851

  62. [62]

    The quadtree and related hierarchical data structures,

    H. Samet, “The quadtree and related hierarchical data structures,”ACM Computing Surveys (CSUR), vol. 16, no. 2, pp. 187–260, 1984

  63. [63]

    Gsfusion: Online rgb-d mapping where gaussian splatting meets tsdf fusion,

    J. Wei and S. Leutenegger, “Gsfusion: Online rgb-d mapping where gaussian splatting meets tsdf fusion,”IEEE Robotics and Automation Letters, vol. 9, no. 12, pp. 11 865–11 872, 2024

  64. [64]

    R-trees: A dynamic index structure for spatial searching,

    A. Guttman, “R-trees: A dynamic index structure for spatial searching,” inProceedings of the 1984 ACM SIGMOD international conference on Management of data, 1984, pp. 47–57

  65. [65]

    H. G. Sung,Gaussian mixture regression and classification. Rice University, 2004

  66. [66]

    On calibration of modern neural networks,

    C. Guo, G. Pleiss, Y . Sun, and K. Q. Weinberger, “On calibration of modern neural networks,” inInternational Conference on Machine Learning. PMLR, 2017, pp. 1321–1330

  67. [67]

    Accurate uncertainties for deep learning using calibrated regression,

    V . Kuleshov, N. Fenner, and S. Ermon, “Accurate uncertainties for deep learning using calibrated regression,” inInternational conference on machine learning. PMLR, 2018, pp. 2796–2804

  68. [68]

    Digging into self-supervised monocular depth estimation,

    C. Godard, O. Mac Aodha, M. Firman, and G. J. Brostow, “Digging into self-supervised monocular depth estimation,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 3828–3838

  69. [69]

    Fastdepth: Fast monocular depth estimation on embedded systems,

    D. Wofk, F. Ma, T.-J. Yang, S. Karaman, and V . Sze, “Fastdepth: Fast monocular depth estimation on embedded systems,” in2019 Interna- tional Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 6101–6108

  70. [70]

    On the uncertainty of self-supervised monocular depth estimation,

    M. Poggi, F. Aleotti, F. Tosi, and S. Mattoccia, “On the uncertainty of self-supervised monocular depth estimation,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 3227–3237

  71. [71]

    Indoor segmentation and support inference from rgbd images,

    N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from rgbd images,” inComputer Vision–ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part V 12. Springer, 2012, pp. 746– 760

  72. [72]

    Tartanair: A dataset to push the limits of visual slam,

    W. Wang, D. Zhu, X. Wang, Y . Hu, Y . Qiu, C. Wang, Y . Hu, A. Kapoor, and S. Scherer, “Tartanair: A dataset to push the limits of visual slam,” in2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020, pp. 4909–4916

  73. [73]

    Kitti-360: A novel dataset and bench- marks for urban scene understanding in 2d and 3d,

    Y . Liao, J. Xie, and A. Geiger, “Kitti-360: A novel dataset and bench- marks for urban scene understanding in 2d and 3d,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 3, pp. 3292– 3310, 2022

  74. [74]

    DINOv2: Learning Robust Visual Features without Supervision

    M. Oquab, T. Darcet, T. Moutakanni, H. V o, M. Szafraniec, V . Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Noubyet al., “Dinov2: Learning robust visual features without supervision,”arXiv preprint arXiv:2304.07193, 2023

  75. [75]

    Metrically-scaled monocular slam using learned scale factors,

    W. N. Greene and N. Roy, “Metrically-scaled monocular slam using learned scale factors,” in2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 43–50