Reliability-Guided Depth Fusion for Glare-Resilient Navigation Costmaps

Shang-En Tsai

arxiv: 2606.03421 · v1 · pith:6IFYXTUGnew · submitted 2026-06-02 · 💻 cs.RO

Pith reviewed 2026-06-28 09:27 UTC · model grok-4.3

keywords: depth reliability, specular glare, occupancy costmap, RGB-D fusion, glare-resilient navigation, real-time robotics, phantom obstacles, indoor navigation

The pith

Modeling per-pixel depth reliability lets robots fuse RGB-D data into glare-resilient costmaps without accumulating phantom obstacles.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that specular glare corrupts RGB-D depth into holes and spikes that turn into persistent false obstacles in occupancy grids. Instead of completing missing depth, it trains a lightweight network to predict measurement trustworthiness and uses those scores to gate and weight the fusion step before the map is updated. This produces cleaner free-space estimates on real hardware while running at real-time rates. A sympathetic reader would care because indoor navigation safety depends on not mistaking glare reflections for solid objects. The approach is tested across reflective floors, glass walls, and natural light with ablation studies and embedded profiling.

Core claim

Explicit per-pixel reliability prediction via DRM-Net, combined with reliability-guided weighted-and-gated fusion (RGF), produces occupancy costmaps that insert fewer false obstacles and preserve free space under specular glare. Training uses pose-aligned multi-view reference depth to reduce circular-supervision bias.

What carries the argument

DRM-Net (Depth Reliability Map network) outputs per-pixel trustworthiness scores, which the RGF (reliability-guided weighted-and-gated fusion) mechanism uses to modulate occupancy updates.
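To make "weighted-and-gated" concrete, here is a minimal sketch of how a reliability score could modulate a single map-cell update. It is an illustration under stated assumptions, not the paper's RGF rule: it uses a plain log-odds occupancy update as a stand-in for the paper's fusion backend, and the names `rgf_update`, `gate`, `p_hit`, `p_miss`, and `clamp`, along with their default values, are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Convert a probability into log-odds."""
    return math.log(p / (1.0 - p))


def rgf_update(cell_logodds: float, hit: bool, reliability: float,
               gate: float = 0.3, p_hit: float = 0.7, p_miss: float = 0.4,
               clamp: tuple = (-4.0, 4.0)) -> float:
    """Apply one reliability-modulated occupancy update to one cell.

    Assumed behaviour (not taken from the paper):
    - gating: measurements with reliability below `gate` are discarded;
    - weighting: surviving measurements are scaled by their reliability.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    if reliability < gate:
        # Gate: the measurement never reaches the map.
        return cell_logodds
    # Weight: a less trusted measurement moves the cell less.
    delta = logit(p_hit) if hit else logit(p_miss)
    updated = cell_logodds + reliability * delta
    # Clamp so a cell can still change state after many updates.
    return max(clamp[0], min(clamp[1], updated))


if __name__ == "__main__":
    cell = 0.0
    for _ in range(10):  # ten glare spikes, all scored as untrustworthy
        cell = rgf_update(cell, hit=True, reliability=0.05)
    print(cell)  # stays at 0.0: nothing was accumulated
```

The gate is what keeps repeated glare spikes from accumulating into a persistent obstacle, while the weight lets borderline measurements contribute without dominating the cell.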

If this is right

False obstacle insertion drops while free-space regions stay intact across reflective-floor, glass-wall, and natural-light glare.
Real-time throughput is preserved on Jetson Orin Nano hardware with the Intel RealSense D435.
Occupancy updates are modulated before corruption accumulates rather than repaired afterward.
The method is evaluated via fusion ablations, parameter sweeps, cross-condition tests, and reliability-map metrics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same reliability gate could be applied to other transient sensor corruptions such as motion blur or lens flare without retraining the full pipeline.
Navigation stacks could drop separate dense depth-completion stages if reliability scores already prevent bad measurements from reaching the map.
If the reference-depth construction generalizes, the approach might transfer to new camera placements or multi-robot mapping without new labeled data.

Load-bearing premise

The pose-aligned multi-view reference depths used for supervision are themselves free of the specular corruption the method is meant to fix.
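The premise is easier to weigh with a picture of how such targets might be built. The sketch below is a hypothetical per-pixel version: it assumes the depths from the other views have already been reprojected into the current frame using known poses, takes their median as the reference, and labels the current measurement reliable only if it agrees with that reference. The function names, the relative tolerance `rel_tol`, and the `min_views` requirement are all assumptions; the paper's actual construction is not given in this summary.

```python
from statistics import median
from typing import Optional, Sequence


def reference_depth(reprojected: Sequence[Optional[float]],
                    min_views: int = 3) -> Optional[float]:
    """Median of the valid reprojected depths, or None if support is too thin."""
    valid = [d for d in reprojected if d is not None and d > 0.0]
    if len(valid) < min_views:
        return None
    return median(valid)


def reliability_target(measured: Optional[float],
                       reprojected: Sequence[Optional[float]],
                       rel_tol: float = 0.05,
                       min_views: int = 3) -> Optional[int]:
    """Return 1 (reliable), 0 (unreliable), or None (no label for this pixel)."""
    ref = reference_depth(reprojected, min_views)
    if ref is None:
        return None  # too few views agree on anything; skip this pixel
    if measured is None or measured <= 0.0:
        return 0     # a hole in the current frame is unreliable by definition
    return 1 if abs(measured - ref) <= rel_tol * ref else 0


if __name__ == "__main__":
    # Made-up values: three consistent views, one hole, one glare spike.
    views = [2.0, 2.02, 0.0, 1.98, 0.6]
    print(reliability_target(2.01, views))  # agrees with the reference
    print(reliability_target(0.6, views))   # spike in the current frame
```

A median tolerates a minority of corrupted views, but if glare corrupts most views of a surface in the same way, the reference inherits the artifact and the labels become noisy. That is exactly the case the premise rules out.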

What would settle it

A side-by-side run on the same robot and glare scenes where the reliability-guided costmaps still insert phantom obstacles at the same rate as the baseline fusion method.

Figures

Figures reproduced from arXiv: 2606.03421 by Shang-En Tsai.

**Figure 1.** Overview of the proposed DRM+RGF pipeline. DRM-Net predicts a per-pixel depth reliability map from RGB-D input, and reliability-guided weighted-and-gated fusion suppresses glare-induced phantom obstacles before 3-D integration and 2-D costmap export for navigation.

**Figure 2.** Reference-target construction for DRM training and evaluation. The target-generation process incorporates pose-aligned multi-view reprojection consistency before reliability-target generation to reduce circular-supervision bias, while a geometry-derived reference costmap is used for downstream correctness evaluation.

**Figure 3.** Quantitative results under three glare severities. DRM+RGF reduces false obstacle rate and emergency stops while improving free-space recall and success rate across L0–L2 conditions.

Table V (Costmap Correctness and Navigation Performance), excerpt:

| Glare level | Method | FOR ↓ | FSR ↑ | Emerg. stops/10 m ↓ | Success (%) ↑ |
|---|---|---|---|---|---|
| L0 | Baseline TSDF | 0.009 | 0.993 | 0.12 | 100.0 |
| L0 | Validity/range gating | 0.008 | 0.993 | 0.12 | 100.0 |
| L0 | Spatial median … | | | | |

**Figure 4.** Trajectory detour analysis under severe glare. Baseline fusion forms glare-induced phantom obstacles that trigger detours, whereas DRM+RGF suppresses unreliable measurements and preserves a clean costmap with an efficient trajectory.

Table VI (Fusion-Variant Ablation Under L2 Glare), excerpt: the columns are Fusion variant, FOR ↓, FSR ↑, Emerg. stops/10 m ↓, Success (%) ↑, and Main observation; the first row reads Baseline TSDF, FOR 0.432, FSR 0.…
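The figure excerpts report a false obstacle rate (FOR) and a free-space recall (FSR) but do not define them here. As a rough sketch only, the code below assumes FOR is the share of cells marked occupied that the reference costmap says are free, and FSR is the share of reference-free cells that the costmap also marks free. Both definitions, the cell encoding, and the name `costmap_metrics` are assumptions; the paper's normalization may differ.

```python
from typing import List, Tuple

FREE, OCCUPIED, UNKNOWN = 0, 1, -1  # hypothetical cell encoding


def costmap_metrics(predicted: List[List[int]],
                    reference: List[List[int]]) -> Tuple[float, float]:
    """Return (FOR, FSR) under the assumed definitions above."""
    pred_occupied = 0   # cells the costmap marks as obstacles
    false_occupied = 0  # ...of which the reference says free
    ref_free = 0        # cells the reference says are free
    kept_free = 0       # ...of which the costmap also marks free
    for pred_row, ref_row in zip(predicted, reference):
        for p, r in zip(pred_row, ref_row):
            if p == OCCUPIED:
                pred_occupied += 1
                if r == FREE:
                    false_occupied += 1
            if r == FREE:
                ref_free += 1
                if p == FREE:
                    kept_free += 1
    # Guard against empty denominators instead of dividing by zero.
    false_rate = false_occupied / pred_occupied if pred_occupied else 0.0
    free_recall = kept_free / ref_free if ref_free else 0.0
    return false_rate, free_recall


if __name__ == "__main__":
    # Toy 2x3 grids with one phantom obstacle at row 1, column 1.
    predicted = [[1, 0, 0],
                 [0, 1, 0]]
    reference = [[1, 0, 0],
                 [0, 0, 0]]
    print(costmap_metrics(predicted, reference))
```

Under these assumed definitions a single phantom cell hurts both numbers at once, which is consistent with the figures reporting the two metrics moving together.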

Original abstract

Specular glare on reflective floors, glass boundaries, and glossy indoor surfaces frequently corrupts active-stereo RGB-D depth measurements, producing holes and spikes that accumulate as persistent phantom obstacles in occupancy-grid costmaps. This paper presents a glare-resilient costmap construction method based on explicit depth-reliability modeling. A lightweight Depth Reliability Map network (DRM-Net) predicts per-pixel measurement trustworthiness under specular interference, and a reliability-guided weighted-and-gated fusion (RGF) mechanism modulates occupancy updates before corrupted measurements are accumulated into the map. To support robust training and evaluation, the method uses pose-aligned multi-view reference-depth construction to reduce circular-supervision bias and is evaluated through fusion-variant ablations, parameter-sensitivity analysis, cross-condition tests, paired navigation comparisons, reliability-map metrics, and embedded runtime profiling. Experiments on a real mobile robotic platform equipped with an Intel RealSense D435 and a Jetson Orin Nano show that the proposed method reduces false obstacle insertion, improves free-space preservation, and maintains real-time throughput under reflective-floor, glass-wall, and natural-light glare conditions. These results support treating glare as a measurement-reliability problem rather than as a dense depth-completion problem for safety-critical indoor navigation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a practical DRM-Net plus RGF pipeline for glare in RGB-D costmaps using multi-view references, but the reference quality is the key open question.

The main takeaway is that this work puts together a lightweight reliability predictor and a gated fusion step to keep glare-induced depth errors from turning into persistent phantom obstacles in indoor costmaps. They train on pose-aligned multi-view depth to create the labels and then run the whole thing on a RealSense D435 plus Jetson Orin Nano setup, with ablations, sensitivity checks, and timing numbers.

What is actually new is the specific DRM-Net + RGF combination aimed at costmap construction rather than dense completion, plus the multi-view reference as a way to get supervision without obvious circularity. The real-robot experiments under reflective floors, glass, and natural light are the strongest part; they show the method keeps free space open and runs in real time, which matters for people who actually deploy these systems.

The soft spot is the reference construction itself. If the additional views come from the same active-stereo sensor under the same glare conditions, the fused reference can still contain holes and spikes on specular surfaces. That would make the reliability labels noisy and leave open whether the reported gains come from the network or from the reference step. The abstract gives no numbers or error bars, so the size of the improvement is hard to judge without the full results.

This is for robotics groups that build navigation stacks around RGB-D sensors in indoor environments. It is a targeted engineering improvement rather than a broad advance, but the hardware tests and multiple evaluation axes make it worth a serious referee. I would send it to review with the expectation that the reference signal and quantitative metrics will get the most questions.

Referee Report

2 major / 1 minor

Summary. The paper proposes DRM-Net, a lightweight network that predicts per-pixel depth reliability under specular glare, paired with a reliability-guided weighted-and-gated fusion (RGF) step that modulates occupancy-grid updates. Training and evaluation rely on pose-aligned multi-view reference-depth construction to reduce circular-supervision bias. Experiments on a RealSense D435 + Jetson Orin Nano platform report reduced false-obstacle insertion, better free-space preservation, and real-time performance across reflective-floor, glass-wall, and natural-light conditions; the work frames glare as a reliability rather than dense-completion problem.

Significance. If the multi-view reference construction supplies an unbiased supervisory signal, the explicit reliability modeling plus real-robot ablations, cross-condition tests, navigation comparisons, and embedded profiling would constitute a practical contribution to glare-resilient costmap construction for indoor mobile robots.

major comments (2)

[training-support description (abstract and method section on reference-depth construction)] The central training support (pose-aligned multi-view reference-depth construction) is described as reducing circular-supervision bias, yet the manuscript provides no quantitative check that the fused reference remains free of the same holes and spikes on specular surfaces that DRM-Net is trained to detect. If reference views share the identical active-stereo sensor and glare conditions, the constructed target may itself contain the artifacts, making it impossible to attribute reported reductions in false-obstacle insertion to learned reliability rather than to the reference itself.
[abstract and experimental-results summary] No numerical metrics, error bars, or statistical tests accompany the abstract's claims of reduced false-obstacle insertion and improved free-space preservation; without these, the strength of the experimental outcomes cannot be assessed from the provided summary.

minor comments (1)

[abstract] The abstract lists multiple evaluation axes (fusion ablations, parameter sensitivity, cross-condition tests, reliability-map metrics, runtime profiling) but does not indicate which tables or figures report the quantitative outcomes for each axis.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below with clarifications and note the revisions we will incorporate.

Referee: [training-support description (abstract and method section on reference-depth construction)] The central training support (pose-aligned multi-view reference-depth construction) is described as reducing circular-supervision bias, yet the manuscript provides no quantitative check that the fused reference remains free of the same holes and spikes on specular surfaces that DRM-Net is trained to detect. If reference views share the identical active-stereo sensor and glare conditions, the constructed target may itself contain the artifacts, making it impossible to attribute reported reductions in false-obstacle insertion to learned reliability rather than to the reference itself.

Authors: Different robot poses produce distinct specular reflection patterns because incidence angles and surface normals change relative to the fixed sensor. The pose-aligned fusion therefore aggregates measurements whose artifact locations are largely uncorrelated, yielding a reference with fewer persistent holes and spikes than any individual view. The manuscript does not contain an explicit quantitative artifact-count comparison between fused and single-view references on glare regions. We will add this analysis (including a table of hole/spike statistics before and after fusion) to the method and results sections of the revised manuscript.

Revision: yes

Referee: [abstract and experimental-results summary] No numerical metrics, error bars, or statistical tests accompany the abstract's claims of reduced false-obstacle insertion and improved free-space preservation; without these, the strength of the experimental outcomes cannot be assessed from the provided summary.

Authors: The abstract is written as a high-level summary; the full quantitative results (false-obstacle insertion rates, free-space preservation percentages, standard deviations across trials, and statistical tests) appear in Section 5 with tables and figures. To make the abstract self-contained, we will insert the principal numerical improvements (e.g., percentage reductions with error bars) into the abstract in the revised version.

Revision: yes
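The first response promises hole and spike statistics before and after fusion. One hypothetical way to compute such counts on a single depth scanline is sketched below: a hole is a non-positive depth, and a spike is a valid depth that departs from the median of its valid neighbours by more than a threshold. The function name, the `window` size, the `spike_tol` value, and the sample rows are illustrative assumptions, not data or definitions from the manuscript.

```python
from statistics import median
from typing import Sequence, Tuple


def hole_spike_counts(row: Sequence[float], window: int = 2,
                      spike_tol: float = 0.25) -> Tuple[int, int]:
    """Count holes and spikes in one depth scanline (metres)."""
    holes = sum(1 for d in row if d <= 0.0)
    spikes = 0
    for i, d in enumerate(row):
        if d <= 0.0:
            continue  # holes are counted separately
        lo, hi = max(0, i - window), min(len(row), i + window + 1)
        # Valid neighbours inside the window, excluding the pixel itself.
        neighbours = [row[j] for j in range(lo, hi)
                      if j != i and row[j] > 0.0]
        if neighbours and abs(d - median(neighbours)) > spike_tol:
            spikes += 1
    return holes, spikes


if __name__ == "__main__":
    # Made-up scanlines: one view with a hole and a spike, and a fused row.
    single_view = [2.0, 2.0, 0.0, 2.1, 0.5, 2.0, 2.0, 2.0]
    fused = [2.0, 2.0, 2.05, 2.1, 2.0, 2.0, 2.0, 2.0]
    print(hole_spike_counts(single_view))
    print(hole_spike_counts(fused))
```

Running the same counter over glare regions in single-view and fused references would give the kind of before-and-after comparison the referee asks for.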

Circularity Check

0 steps flagged

No circularity: method relies on external multi-view reference construction without self-referential fitting or definition loops

The provided abstract and description contain no equations, fitted parameters renamed as predictions, or self-citations that bear the central claim. The pose-aligned multi-view reference-depth construction is presented as an independent supervisory technique to reduce bias, not as a quantity derived from or equivalent to the DRM-Net outputs by construction. The reliability modeling and fusion steps are described as learned and modulated mechanisms evaluated via ablations and real-robot tests, with no reduction of the reported performance gains to the inputs themselves. This is a standard non-circular empirical robotics paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only review; free parameters, axioms, and invented entities cannot be enumerated precisely without the methods and equations sections. The central claim rests on the unstated assumption that the reliability network generalizes across glare conditions and that the reference-depth construction is sufficiently clean.

invented entities (1)

DRM-Net (no independent evidence)
Purpose: Predict per-pixel depth measurement trustworthiness under specular glare.
A lightweight network introduced to output reliability scores; the abstract provides no independent evidence for it.

