pith. machine review for the scientific record.

arxiv: 2603.10584 · v2 · submitted 2026-03-11 · 💻 cs.CV · cs.RO

Recognition: 1 theorem link · Lean Theorem

Need for Speed: Zero-Shot Depth Completion with Single-Step Diffusion

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 13:48 UTC · model grok-4.3

classification 💻 cs.CV cs.RO
keywords depth completion · diffusion models · single-step inference · zero-shot · 3D perception · sparse depth · computer vision

The pith

Marigold-SSD completes depth maps in one diffusion step by finetuning instead of test-time optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Marigold-SSD, a single-step late-fusion framework for depth completion that draws on diffusion priors. It replaces the slow iterative sampling usually required at inference with a single forward pass, enabled by a short finetuning phase rather than test-time optimization. This shift keeps the benefits of strong priors while cutting inference time enough for real-world use. Tests on four indoor and two outdoor benchmarks show competitive accuracy and generalization without further per-scene training. The work also examines how results change as the input depth becomes sparser.

Core claim

Marigold-SSD is a single-step late-fusion depth completion framework that leverages strong diffusion priors while eliminating the costly test-time optimization typically associated with diffusion-based methods. By shifting computational burden from inference to finetuning, the approach enables efficient and robust 3D perception under real-world latency constraints after only 4.5 GPU days of training. Evaluations across indoor and outdoor benchmarks demonstrate strong cross-domain generalization and zero-shot performance, narrowing the efficiency gap with discriminative models.

What carries the argument

Single-step late-fusion depth completion that fuses diffusion priors in one forward pass after finetuning.
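The cost shift this claim rests on can be made concrete with a toy comparison. This is a conceptual sketch, not the paper's implementation: `denoise` is a hypothetical stand-in for one network forward pass, and the step count of 50 is only illustrative.

```python
# Conceptual sketch (not the paper's code): the inference cost profile of
# multi-step diffusion depth completion vs. a single-step finetuned model.
calls = {"multi_step": 0, "single_step": 0}

def denoise(latent, mode):
    # Hypothetical stand-in for one U-Net evaluation.
    calls[mode] += 1
    return 0.9 * latent

def multi_step_completion(latent, num_steps=50):
    # Test-time iterative sampling: inference cost grows with num_steps.
    for _ in range(num_steps):
        latent = denoise(latent, "multi_step")
    return latent

def single_step_completion(latent):
    # After finetuning, one forward pass suffices: constant inference cost.
    return denoise(latent, "single_step")

multi_step_completion(1.0)
single_step_completion(1.0)
print(calls)  # {'multi_step': 50, 'single_step': 1}
```

The point of the design is exactly this ratio: the expensive part (finetuning) is paid once, while every inference call costs one evaluation instead of dozens.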

If this is right

  • Faster inference times make diffusion priors practical for latency-sensitive 3D perception tasks.
  • A one-time finetuning cost of 4.5 GPU days removes the need for repeated test-time optimization.
  • Zero-shot results hold across indoor and outdoor domains without scene-specific retraining.
  • Performance remains stable under varying levels of input depth sparsity.
  • The method closes much of the speed gap between diffusion and discriminative depth completion models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The single-step pattern could be adapted to speed up other diffusion tasks such as semantic segmentation or normal estimation.
  • Lower inference cost opens deployment on mobile or embedded hardware for robotics applications.
  • Rethinking sparsity protocols in evaluation may encourage more realistic benchmarks for depth completion methods.

Load-bearing premise

Finetuning a pre-trained diffusion model for single-step use keeps depth prediction accuracy and robustness across varied scenes.

What would settle it

A benchmark run where the single-step model produces clearly lower accuracy than standard multi-step diffusion depth completion on the same datasets would disprove the claim.
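The settling experiment reduces to a masked-error comparison on shared held-out pixels. A minimal sketch follows; the numbers, the toy predictions, and the `sample_sparse` helper are illustrative, not values from the paper.

```python
# Minimal sketch of the settling check: masked RMSE for two models on the
# same valid pixels, plus the sparsity-sampling protocol used in the figures.
import math
import random

def masked_rmse(pred, gt, mask):
    # RMSE over valid ground-truth pixels only, as in depth benchmarks.
    errs = [(p - g) ** 2 for p, g, m in zip(pred, gt, mask) if m]
    return math.sqrt(sum(errs) / len(errs))

def sample_sparse(gt, k, seed=0):
    # Keep k of the ground-truth depths as the sparse input condition,
    # mimicking the #500 / #1500 / #5000 sparsity levels in the figures.
    chosen = set(random.Random(seed).sample(range(len(gt)), k))
    return [i in chosen for i in range(len(gt))]

gt = [1.0, 2.0, 3.0, 4.0, 5.0]            # toy dense ground-truth depths (m)
pred_single = [1.1, 2.1, 2.9, 4.2, 4.8]   # hypothetical single-step output
pred_multi = [1.2, 2.2, 2.8, 4.3, 4.7]    # hypothetical multi-step output
valid = [True] * len(gt)

print(round(masked_rmse(pred_single, gt, valid), 3))  # 0.148
print(round(masked_rmse(pred_multi, gt, valid), 3))
print(sum(sample_sparse(gt, 3)))                      # 3 sparse samples kept
```

If the single-step RMSE were consistently and clearly higher than the multi-step RMSE under this protocol, the core claim would fail.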

Figures

Figures reproduced from arXiv: 2603.10584 by Jakub Gregorek, Konrad Schindler, Lazaros Nalpantidis, Nando Metzger, Paraskevas Pegios, Theodora Kontogianni.

Figure 1
Figure 1. Performance vs. speed trade-off. Comparison of our method Marigold-SSD with other diffusion-based approaches Marigold-DC [66] and Marigold-E2E [44] + LS (w/o sparse condition) as well as discriminative baselines [50, 88] on the KITTI dataset [18]. Marigold-SSD occupies a unique region in the trade-off space, closing the efficiency gap to discriminative methods while retaining the benefit of the strong diffusion… view at source ↗
Figure 2
Figure 2. Marigold-SSD for zero-shot depth completion. We present a single-step diffusion framework with end-to-end fine-tuning as an efficient alternative to the test-time optimization approach of Marigold-DC [66]. To this end, we introduce a conditional decoder with late fusion to incorporate sparse depth measurements. At inference, our method Marigold-SSD produces high-quality results in a single step, while Mari… view at source ↗
Figure 3
Figure 3. Internal architecture of the conditional decoder. DC consists of the VAE decoder D (top row) and blocks processing the sparse condition C (bottom row), adapted from the VAE encoder E (differing in down-sampling positions). Feature maps are concatenated channel-wise (⊕) at five levels and the fusion blocks use 1×1 convolutions (Eq. 1). Conv denotes standard convolution layers; UP, DOWN, and MID blocks are R… view at source ↗
Figure 4
Figure 4. Qualitative results. Marigold-SSD generally produces smoother depth maps than Marigold-DC [66], which tends to over-refine details that can lead to unrealistic scene structures. The black arrows highlight variations in the estimated depth, while the red and blue colors indicate the nearest and farthest regions. view at source ↗
Figure 5
Figure 5. Qualitative results. Both Marigold-SSD and Marigold-DC tend to underestimate sky depth on KITTI and DDAD, consistent with prior Marigold limitations and limited conditioning information in the sky, while they differ in how they estimate fine scene details. view at source ↗
Figure 6
Figure 6. Evaluation under multiple levels of depth density on NYUv2 and ScanNet. Depth density is denoted by the number of depth samples (#). See the supplementary material for all datasets. [plots: RMSE and MAE vs. sparsity level; series include Interpolation, Marigold-DC, Marigold-SSD*, Marigold-…] view at source ↗
Figure 7
Figure 7. Challenging the models on DDAD. At the commonly used sparsity level of 5000 points, even sophisticated models can be outperformed by trivial Barycentric interpolation. [plots: RMSE vs. sparsity level on ScanNet (indoor) and DDAD (outdoor), panels (A), (B), (C)] view at source ↗
Figure 8
Figure 8. Sampling Density. Models (A), (B) & (C) fine-tuned on different densities. See the supplementary material for all datasets. Marigold-SSD trained on our default broad spectrum of sparsity levels can achieve strong zero-shot performance across domains. The impact of condition sampling density could be less pronounced at higher densities, where much of the target information is already provided and lightweig… view at source ↗
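The late-fusion step that Figure 3 describes (channel-wise concatenation followed by a 1×1 convolution at each level) can be sketched in a few lines. Shapes and weights here are made up for illustration; the actual model applies this at five levels inside a finetuned VAE decoder.

```python
# Sketch (assumed shapes, not the paper's code) of one late-fusion level:
# decoder features and sparse-condition features are concatenated
# channel-wise, then mixed by a 1x1 convolution, which acts per pixel.
import numpy as np

rng = np.random.default_rng(0)

def fuse_1x1(dec_feat, cond_feat, weight):
    # dec_feat: (C1, H, W), cond_feat: (C2, H, W), weight: (C_out, C1 + C2)
    x = np.concatenate([dec_feat, cond_feat], axis=0)  # channel-wise (⊕)
    # A 1x1 convolution is a linear map over channels at every spatial site.
    return np.einsum("oc,chw->ohw", weight, x)

dec = rng.standard_normal((8, 4, 4))    # decoder feature map at one level
cond = rng.standard_normal((8, 4, 4))   # sparse-depth condition features
w = rng.standard_normal((8, 16))        # fusion weights for this level
fused = fuse_1x1(dec, cond, w)
print(fused.shape)  # (8, 4, 4): same resolution, fused channels
```

Because the fusion kernel is 1×1, spatial resolution is untouched; only the channel dimension is mixed, which is what lets the sparse condition steer the decoder without disturbing the pretrained diffusion prior's spatial structure.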
read the original abstract

We introduce Marigold-SSD, a single-step, late-fusion depth completion framework that leverages strong diffusion priors while eliminating the costly test-time optimization typically associated with diffusion-based methods. By shifting computational burden from inference to finetuning, our approach enables efficient and robust 3D perception under real-world latency constraints. Marigold-SSD achieves significantly faster inference with a training cost of only 4.5 GPU days. We evaluate our method across four indoor and two outdoor benchmarks, demonstrating strong cross-domain generalization and zero-shot performance compared to existing depth completion approaches. Our approach significantly narrows the efficiency gap between diffusion-based and discriminative models. Finally, we challenge common evaluation protocols by analyzing performance under varying input sparsity levels. Page: https://dtu-pas.github.io/marigold-ssd/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces Marigold-SSD, a single-step late-fusion depth completion framework that leverages diffusion priors from a base model via targeted finetuning, eliminating test-time optimization to achieve fast inference at a reported training cost of 4.5 GPU days. It evaluates the approach on four indoor and two outdoor benchmarks, claiming strong cross-domain generalization and zero-shot performance relative to existing depth completion methods, while also analyzing robustness under varying input sparsity levels.

Significance. If the quantitative claims hold, the work could meaningfully narrow the efficiency gap between diffusion-based and discriminative depth completion models, enabling practical real-world deployment under latency constraints by shifting compute to a one-time finetuning stage while retaining strong priors.

major comments (2)
  1. Abstract: the central claim of 'strong cross-domain generalization and zero-shot performance' is presented without any quantitative metrics, baseline comparisons, error bars, or ablation details, leaving the efficiency and accuracy assertions unsupported by verifiable evidence in the provided text.
  2. Evaluation section: the analysis of performance under varying input sparsity levels is positioned as challenging common protocols, but without explicit quantitative results or direct comparison to the zero-shot claim, it is unclear whether this supports or qualifies the main generalization assertions.
minor comments (1)
  1. The page link and method name (Marigold-SSD) should be consistently referenced with a brief expansion of the acronym on first use.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below and will revise the manuscript to strengthen the presentation of quantitative evidence.

read point-by-point responses
  1. Referee: Abstract: the central claim of 'strong cross-domain generalization and zero-shot performance' is presented without any quantitative metrics, baseline comparisons, error bars, or ablation details, leaving the efficiency and accuracy assertions unsupported by verifiable evidence in the provided text.

    Authors: We agree that the abstract would benefit from including key quantitative results to make the claims immediately verifiable. In the revised version, we will update the abstract to report specific metrics such as average RMSE and accuracy on the indoor and outdoor benchmarks, the 4.5 GPU-day training cost, inference speed, and direct comparisons to existing depth completion methods. revision: yes

  2. Referee: Evaluation section: the analysis of performance under varying input sparsity levels is positioned as challenging common protocols, but without explicit quantitative results or direct comparison to the zero-shot claim, it is unclear whether this supports or qualifies the main generalization assertions.

    Authors: We will revise the evaluation section to present the sparsity analysis results more explicitly, including tables or figures with quantitative metrics across sparsity levels and direct side-by-side comparisons to the zero-shot performance on the main benchmarks. This will clarify how the robustness analysis supports the cross-domain generalization claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The provided manuscript text and abstract describe an engineering contribution: a single-step late-fusion diffusion model (Marigold-SSD) obtained by targeted finetuning of a pre-trained diffusion prior, with inference cost reduced by removing test-time optimization. No equations, derivation steps, or self-citation chains are exhibited that reduce a claimed prediction or uniqueness result to a fitted parameter or prior author result by construction. The training cost (4.5 GPU days) and benchmark numbers are presented as empirical outcomes of the training procedure rather than quantities forced by the method's own definitions. The central design choice (shifting compute to finetuning) is explicitly stated and does not rely on a self-referential uniqueness theorem or ansatz smuggled via citation. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review limited to abstract; no explicit free parameters, axioms, or invented entities are stated. The approach relies on standard diffusion priors and late fusion without detailing new postulates.

pith-pipeline@v0.9.0 · 5457 in / 1160 out tokens · 49332 ms · 2026-05-15T13:48:49.084174+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

92 extracted references · 92 canonical work pages · 2 internal anchors

  1. [1]

    [Online; accessed 28-October-2025]

    huggingface/diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.https:// github.com/huggingface/diffusers, . [Online; accessed 28-October-2025]. 5

  2. [2]

    https : / / huggingface

    Hugging Face: GonzaloMG/marigold-e2e-ft-depth. https : / / huggingface . co / GonzaloMG / marigold - e2e - ft - depth, . [Online; accessed 28-October-2025]. 5

  3. [3]

    Bridging Depth Estimation and Completion for Mobile Robots Reliable 3D Perception

    Dimitrios Arapis, Milad Jami, and Lazaros Nalpantidis. Bridging Depth Estimation and Completion for Mobile Robots Reliable 3D Perception. InRobot Intelligence Tech- nology and Applications 7, pages 169–179, Cham, 2023. Springer International Publishing. 3

  4. [4]

    Revisiting Depth Completion from a Stereo Matching Perspective for Cross-domain Generaliza- tion

    Luca Bartolomei, Matteo Poggi, Andrea Conti, Fabio Tosi, and Stefano Mattoccia. Revisiting Depth Completion from a Stereo Matching Perspective for Cross-domain Generaliza- tion. In2024 International Conference on 3D Vision (3DV), pages 1360–1370, 2024. ISSN: 2475-7888. 1, 3, 5, 6, 7, 8

  5. [5]

    ZoeDepth: Zero-shot Transfer by Com- bining Relative and Metric Depth, 2023

    Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, and Matthias M¨uller. ZoeDepth: Zero-shot Transfer by Com- bining Relative and Metric Depth, 2023. 2

  6. [6]

    Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

    Aleksei Bochkovskii, Ama ¨el Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R Richter, and Vladlen Koltun. Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.arXiv preprint arXiv:2410.02073,

  7. [7]

    Virtual KITTI 2

    Yohann Cabon, Naila Murray, and Martin Humenberger. Vir- tual KITTI 2.CoRR, abs/2001.10773, 2020. 4, 5

  8. [8]

    Depth Estimation via Affinity Learned with Convolutional Spatial Propagation Network

    Xinjing Cheng, Peng Wang, and Ruigang Yang. Depth Estimation via Affinity Learned with Convolutional Spatial Propagation Network. pages 103–119, 2018. 2

  9. [9]

    Xinjing Cheng, Peng Wang, Chenye Guan, and Ruigang Yang. CSPN++: Learning Context and Resource Aware Convolutional Spatial Propagation Networks for Depth Completion.Proceedings of the AAAI Conference on Arti- ficial Intelligence, 34(07):10615–10622, 2020. Number: 07. 2

  10. [10]

    Diffusion pos- terior sampling for general noisy inverse problems

    Hyungjin Chung, Jeongsol Kim, Michael Thompson Mc- cann, Marc Louis Klasky, and Jong Chul Ye. Diffusion pos- terior sampling for general noisy inverse problems. InThe Eleventh International Conference on Learning Representa- tions, 2023. 4

  11. [11]

    Unsupervised confidence for LiDAR depth maps and applications

    Andrea Conti, Matteo Poggi, Filippo Aleotti, and Stefano Mattoccia. Unsupervised confidence for LiDAR depth maps and applications. In2022 IEEE/RSJ International Confer- ence on Intelligent Robots and Systems (IROS), pages 8352– 8359, 2022. 5

  12. [12]

    Spar- sity Agnostic Depth Completion

    Andrea Conti, Matteo Poggi, and Stefano Mattoccia. Spar- sity Agnostic Depth Completion. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 5871–5880, 2023. 1, 3, 5, 6

  13. [13]

    Chang, Manolis Savva, Maciej Hal- ber, Thomas Funkhouser, and Matthias Niessner

    Angela Dai, Angel X. Chang, Manolis Savva, Maciej Hal- ber, Thomas Funkhouser, and Matthias Niessner. ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 5, 6

  14. [14]

    Diffusion models beat gans on image synthesis.Advances in neural informa- tion processing systems, 34:8780–8794, 2021

    Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis.Advances in neural informa- tion processing systems, 34:8780–8794, 2021. 4

  15. [15]

    DTU Computing Center resources

    DTU Computing Center. DTU Computing Center resources. https : / / doi . org / 10 . 48714 / DTU . HPC . 0001,

  16. [16]

    Geowiz- ard: Unleashing the Diffusion Priors for 3D Geometry Es- timation from a Single Image

    Xiao Fu, Wei Yin, Mu Hu, Kaixuan Wang, Yuexin Ma, Ping Tan, Shaojie Shen, Dahua Lin, and Xiaoxiao Long. Geowiz- ard: Unleashing the Diffusion Priors for 3D Geometry Es- timation from a Single Image. InEuropean Conference on Computer Vision, pages 241–258. Springer, 2024. 1

  17. [17]

    Virtual Worlds as Proxy for Multi-Object Tracking Analysis

    Adrien Gaidon, Qiao Wang, Yohann Cabon, and Eleonora Vig. Virtual Worlds as Proxy for Multi-Object Tracking Analysis. InProceedings of the IEEE conference on Com- puter Vision and Pattern Recognition, pages 4340–4349,

  18. [18]

    Vision meets Robotics: The KITTI Dataset.Inter- national Journal of Robotics Research (IJRR), 2013

    Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. Vision meets Robotics: The KITTI Dataset.Inter- national Journal of Robotics Research (IJRR), 2013. 1, 5, 6

  19. [19]

    SteeredMarigold: Steering Diffusion Towards Depth Completion of Largely Incomplete Depth Maps

    Jakub Gregorek and Lazaros Nalpantidis. SteeredMarigold: Steering Diffusion Towards Depth Completion of Largely Incomplete Depth Maps. In2025 IEEE International Con- ference on Robotics and Automation (ICRA), pages 13304– 13311, 2025. 1, 3

  20. [20]

    DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching

    Ming Gui, Johannes Schusterbauer, Ulrich Prestel, Pingchuan Ma, Dmytro Kotovenko, Olga Grebenkova, Stefan Andreas Baumann, Vincent Tao Hu, and Bj ¨orn Ommer. DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching. InProceedings of the AAAI Conference on Artificial Intelligence, pages 3203–3211,

  21. [21]

    3D Packing for Self-Supervised Monocular Depth Estimation

    Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Allan Raven- tos, and Adrien Gaidon. 3D Packing for Self-Supervised Monocular Depth Estimation. InIEEE Conference on Com- puter Vision and Pattern Recognition (CVPR), 2020. 5

  22. [22]

    Lotus: Diffusion-based visual foundation model for high-quality dense prediction.arXiv preprint arXiv:2409.18124, 2024

    Jing He, Haodong Li, Wei Yin, Yixun Liang, Leheng Li, Kaiqiang Zhou, Hongbo Zhang, Bingbing Liu, and Ying- Cong Chen. Lotus: Diffusion-based visual foundation model for high-quality dense prediction.arXiv preprint arXiv:2409.18124, 2024. 1

  23. [23]

    Deep Residual Learning for Image Recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. InProceed- ings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. 3

  24. [24]

    Denoising Dif- fusion Probabilistic Models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising Dif- fusion Probabilistic Models. InAdvances in Neural Infor- mation Processing Systems, pages 6840–6851. Curran Asso- ciates, Inc., 2020. 3

  25. [25]

    Deep Depth Comple- tion From Extremely Sparse Data: A Survey.IEEE Trans- actions on Pattern Analysis and Machine Intelligence, 45(7): 8244–8264, 2023

    Junjie Hu, Chenyu Bao, Mete Ozay, Chenyou Fan, Qing Gao, Honghai Liu, and Tin Lun Lam. Deep Depth Comple- tion From Extremely Sparse Data: A Survey.IEEE Trans- actions on Pattern Analysis and Machine Intelligence, 45(7): 8244–8264, 2023. 1

  26. [26]

    PENet: Towards Precise and Efficient Image Guided Depth Completion

    Mu Hu, Shuling Wang, Bin Li, Shiyu Ning, Li Fan, and Xi- aojin Gong. PENet: Towards Precise and Efficient Image Guided Depth Completion. In2021 IEEE International Con- ference on Robotics and Automation (ICRA), pages 13656– 13662, 2021. ISSN: 2577-087X. 2

  27. [27]

    Test-Time Prompt Tuning for Zero-Shot Depth Com- pletion

    Chanhwi Jeong, Inhwan Bae, Jin-Hwi Park, and Hae-Gon Jeon. Test-Time Prompt Tuning for Zero-Shot Depth Com- pletion. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9443–9454,

  28. [28]

    Repurpos- ing Diffusion-Based Image Generators for Monocular Depth Estimation

    Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Met- zger, Rodrigo Caye Daudt, and Konrad Schindler. Repurpos- ing Diffusion-Based Image Generators for Monocular Depth Estimation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024. 1, 3, 4, 5, 6

  29. [29]

    Marigold: Affordable Adaptation of Diffusion- Based Image Generators for Image Analysis.IEEE Trans- actions on Pattern Analysis and Machine Intelligence, pages 1–18, 2025

    Bingxin Ke, Kevin Qu, Tianfu Wang, Nando Metzger, Shengyu Huang, Bo Li, Anton Obukhov, and Konrad Schindler. Marigold: Affordable Adaptation of Diffusion- Based Image Generators for Image Analysis.IEEE Trans- actions on Pattern Analysis and Machine Intelligence, pages 1–18, 2025. 3

  30. [30]

    Evaluation of CNN-based Single-Image Depth Estimation Methods

    Tobias Koch, Lukas Liebel, Friedrich Fraundorfer, and Marco Korner. Evaluation of CNN-based Single-Image Depth Estimation Methods. InProceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018. 5, 6

  31. [31]

    Orchid: Image latent diffusion for joint appearance and geometry generation.arXiv preprint arXiv:2501.13087,

    Akshay Krishnan, Xinchen Yan, Vincent Casser, and Abhijit Kundu. Orchid: Image latent diffusion for joint appearance and geometry generation.arXiv preprint arXiv:2501.13087,

  32. [32]

    Waslander

    Jason Ku, Ali Harakeh, and Steven L. Waslander. In Defense of Classical Image Processing: Fast Depth Completion on the CPU. In2018 15th Conference on Computer and Robot Vision (CRV), pages 16–22, 2018. 2

  33. [33]

    Dis- tilling Monocular Foundation Model for Fine-grained Depth Completion

    Yingping Liang, Yutao Hu, Wenqi Shao, and Ying Fu. Dis- tilling Monocular Foundation Model for Fine-grained Depth Completion. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22254–22265, 2025. 3, 5

  34. [34]

    Dis- tilling monocular foundation model for fine-grained depth completion

    Yingping Liang, Yutao Hu, Wenqi Shao, and Ying Fu. Dis- tilling monocular foundation model for fine-grained depth completion. InProceedings of the Computer Vision and Pat- tern Recognition Conference, pages 22254–22265, 2025. 6

  35. [35]

    Chen, Zhenyu Li, Guang Shi, Jiashi Feng, and Bingyi Kang

    Haotong Lin, Sili Chen, Junhao Liew, Donny Y . Chen, Zhenyu Li, Guang Shi, Jiashi Feng, and Bingyi Kang. Depth Anything 3: Recovering the Visual Space from Any Views,

  36. [36]

    Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

    Haotong Lin, Sida Peng, Jingxiao Chen, Songyou Peng, Ji- aming Sun, Minghuan Liu, Hujun Bao, Jiashi Feng, Xiaowei Zhou, and Bingyi Kang. Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17070–17080, 2025. 1, 3, 5

  37. [37]

    Common diffusion noise schedules and sample steps are flawed

    Shanchuan Lin, Bingchen Liu, Jiashi Li, and Xiao Yang. Common diffusion noise schedules and sample steps are flawed. InProceedings of the IEEE/CVF winter conference on applications of computer vision, pages 5404–5411, 2024. 4

  38. [38]

    DySPN: Learning Dynamic Affinity for Image-Guided Depth Completion.IEEE Transactions on Circuits and Systems for Video Technology, 34(6):4596– 4609, 2024

    Yuankai Lin, Hua Yang, Tao Cheng, Wending Zhou, and Zhouping Yin. DySPN: Learning Dynamic Affinity for Image-Guided Depth Completion.IEEE Transactions on Circuits and Systems for Video Technology, 34(6):4596– 4609, 2024. 2

  39. [39]

    Chen, Heli Ben-Hamu, Maximil- ian Nickel, and Matt Le

    Yaron Lipman, Ricky T.Q. Chen, Heli Ben-Hamu, Maximil- ian Nickel, and Matt Le. Flow matching for generative mod- eling. In11th International Conference on Learning Repre- sentations (ICLR), 2023. 3

  40. [40]

    Learning Affinity via Spatial Propagation Networks

    Sifei Liu, Shalini De Mello, Jinwei Gu, Guangyu Zhong, Ming-Hsuan Yang, and Jan Kautz. Learning Affinity via Spatial Propagation Networks. InAdvances in Neural Infor- mation Processing Systems. Curran Associates, Inc., 2017. 2

  41. [41]

    DepthLab: From Partial to Complete, 2024

    Zhiheng Liu, Ka Leong Cheng, Qiuyu Wang, Shuzhe Wang, Hao Ouyang, Bin Tan, Kai Zhu, Yujun Shen, Qifeng Chen, and Ping Luo. DepthLab: From Partial to Complete, 2024. 5, 6

  42. [42]

    Decoupled Weight De- cay Regularization

    Ilya Loshchilov and Frank Hutter. Decoupled Weight De- cay Regularization. InInternational Conference on Learning Representations, 2019. 5

  43. [43]

    Latent Consistency Models: Synthesizing High- Resolution Images with Few-Step Inference, 2023

    Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao. Latent Consistency Models: Synthesizing High- Resolution Images with Few-Step Inference, 2023. 3

  44. [44]

    Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

    Gonzalo Martin Garcia, Karim Abou Zeid, Christian Schmidt, Daan de Geus, Alexander Hermans, and Bastian Leibe. Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025. 1, 3, 4, 5, 6, 7

  45. [45]

    A Large Dataset to Train Convolutional Networks for Dispar- ity, Optical Flow, and Scene Flow Estimation

    Nikolaus Mayer, Eddy Ilg, Philip Hausser, Philipp Fischer, Daniel Cremers, Alexey Dosovitskiy, and Thomas Brox. A Large Dataset to Train Convolutional Networks for Dispar- ity, Optical Flow, and Scene Flow Estimation. InProceed- ings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. 7

  46. [46]

    Indoor Segmentation and Support Inference from RGBD Images

    Pushmeet Kohli Nathan Silberman, Derek Hoiem and Rob Fergus. Indoor Segmentation and Support Inference from RGBD Images. InECCV, 2012. 5, 6

  47. [47]

    SemAttNet: Toward Attention-Based Semantic Aware Guided Depth Comple- tion.IEEE Access, 10:120781–120791, 2022

    Danish Nazir, Alain Pagani, Marcus Liwicki, Didier Stricker, and Muhammad Zeshan Afzal. SemAttNet: Toward Attention-Based Semantic Aware Guided Depth Comple- tion.IEEE Access, 10:120781–120791, 2022. Conference Name: IEEE Access. 2

  48. [48]

    Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion

    Lucas Nunes, Rodrigo Marcuzzi, Benedikt Mersch, Jens Behley, and Cyrill Stachniss. Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition (CVPR), pages 14770–14780, 2024. 2

  49. [49]

    DINOv2: Learning Robust Visual Features without Supervision, 2024

    Maxime Oquab, Timoth ´ee Darcet, Th ´eo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mah- moud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herv ´e Je- gou, Julien Mairal, ...

  50. [50]

    Non-local Spatial Propagation Network for Depth Completion

    Jinsun Park, Kyungdon Joo, Zhe Hu, Chi-Kuei Liu, and In So Kweon. Non-local Spatial Propagation Network for Depth Completion. InComputer Vision – ECCV 2020, pages 120–136, Cham, 2020. Springer International Publishing. 1, 2, 5, 6

  51. [51]

    SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation

    Duc-Hai Pham, Tung Do, Phong Nguyen, Binh-Son Hua, Khoi Nguyen, and Rang Nguyen. SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 17060–17069,

  52. [52]

    UniDepth: Universal Monocular Metric Depth Estimation

    Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis, Mattia Segu, Siyuan Li, Luc Van Gool, and Fisher Yu. UniDepth: Universal Monocular Metric Depth Estimation. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10106–10116, 2024. 2

  53. [53]

    UniDepthV2: Universal Monocular Metric Depth Estima- tion Made Simpler.IEEE Transactions on Pattern Analysis and Machine Intelligence, 48(3):2354–2367, 2026

    Luigi Piccinelli, Christos Sakaridis, Yung-Hsu Yang, Mat- tia Segu, Siyuan Li, Wim Abbeloos, and Luc Van Gool. UniDepthV2: Universal Monocular Metric Depth Estima- tion Made Simpler.IEEE Transactions on Pattern Analysis and Machine Intelligence, 48(3):2354–2367, 2026. 2

  54. [54]

    Ren ´e Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun. Towards Robust Monocu- lar Depth Estimation: Mixing Datasets for Zero-Shot Cross- Dataset Transfer.IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3):1623–1637, 2022. 2, 4

  55. [55]

    Susskind

    Mike Roberts, Jason Ramapuram, Anurag Ranjan, Atulit Kumar, Miguel Angel Bautista, Nathan Paczan, Russ Webb, and Joshua M. Susskind. Hypersim: A Photorealistic Syn- thetic Dataset for Holistic Indoor Scene Understanding. In International Conference on Computer Vision (ICCV) 2021,

  56. [56]

    High-Resolution Image Synthesis with Latent Diffusion Models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-Resolution Image Synthesis with Latent Diffusion Models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, 2022. 1, 3

  57. [57]

    U- net: Convolutional networks for biomedical image segmen- tation

    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U- net: Convolutional networks for biomedical image segmen- tation. InInternational Conference on Medical image com- puting and computer-assisted intervention, pages 234–241. Springer, 2015. 3

  58. [58]

    Progressive distillation for fast sampling of diffusion models

    Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. InInternational Confer- ence on Learning Representations, 2022. 3

  59. [59]

    Shuwei Shao, Zhongcai Pei, Weihai Chen, Peter C. Y . Chen, and Zhengguo Li. NDDepth: Normal-Distance Assisted Monocular Depth Estimation and Completion.IEEE Trans- actions on Pattern Analysis and Machine Intelligence, 46 (12):8883–8899, 2024. 2

[60] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021.

[61] Jie Tang, Fei-Peng Tian, Wei Feng, Jian Li, and Ping Tan. Learning Guided Convolutional Network for Depth Completion. IEEE Transactions on Image Processing, 30:1116–1129, 2021.

[62] Jie Tang, Fei-Peng Tian, Boshi An, Jian Li, and Ping Tan. Bilateral Propagation Network for Depth Completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9763–9772, 2024.

[63] Jie Tang, Pingping Xie, Jian Li, and Ping Tan. Gaussian belief propagation network for depth completion. arXiv preprint arXiv:2601.21291, 2026.

[64] Jonas Uhrig, Nick Schneider, Lukas Schneider, Uwe Franke, Thomas Brox, and Andreas Geiger. Sparsity Invariant CNNs. In International Conference on 3D Vision (3DV), 2017.

[65] Aaron Van Den Oord, Oriol Vinyals, et al. Neural discrete representation learning. Advances in Neural Information Processing Systems, 30, 2017.

[66] Massimiliano Viola, Kevin Qu, Nando Metzger, Bingxin Ke, Alexander Becker, Konrad Schindler, and Anton Obukhov. Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion, 2024.

[67] Kun Wang, Zhiqiang Yan, Junkai Fan, Jun Li, and Jian Yang. Learning Inverse Laplacian Pyramid for Progressive Depth Completion. arXiv preprint arXiv:2502.07289, 2025.

[68] Ruicheng Wang, Sicheng Xu, Cassie Dai, Jianfeng Xiang, Yu Deng, Xin Tong, and Jiaolong Yang. MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5261–5271, 2025.

[69] Ruicheng Wang, Sicheng Xu, Yue Dong, Yu Deng, Jianfeng Xiang, Zelong Lv, Guangzhong Sun, Xin Tong, and Jiaolong Yang. MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details, 2025.

[70] Yufei Wang, Bo Li, Ge Zhang, Qi Liu, Tao Gao, and Yuchao Dai. LRRU: Long-short Range Recurrent Updating Networks for Depth Completion. pages 9422–9432, 2023.

[71] Yufei Wang, Yuxin Mao, Qi Liu, and Yuchao Dai. Decomposed Guided Dynamic Filters for Efficient RGB-Guided Depth Completion. IEEE Transactions on Circuits and Systems for Video Technology, 34(2):1186–1198, 2024.

[72] Yufei Wang, Ge Zhang, Shaoqian Wang, Bo Li, Qi Liu, Le Hui, and Yuchao Dai. Improving Depth Completion via Depth Feature Upsampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21104–21113, 2024.

[73] Nina Weng, Paraskevas Pegios, Eike Petersen, Aasa Feragen, and Siavash Bigdeli. Fast diffusion-based counterfactuals for shortcut removal and generation. In European Conference on Computer Vision, pages 338–357. Springer, 2024.

[74] Alex Wong, Xiaohan Fei, Stephanie Tsuei, and Stefano Soatto. Unsupervised Depth Completion from Visual Inertial Odometry. IEEE Robotics and Automation Letters, 5(2):1899–1906, 2020.

[75] Jijun Xiang, Longliang Liu, Xuan Zhu, Xianqi Wang, Min Lin, and Xin Yang. DEPTHOR++: Robust Depth Enhancement from a Real-World Lightweight dToF and RGB Guidance, 2025.

[76] Jijun Xiang, Xuan Zhu, Xianqi Wang, Yu Wang, Hong Zhang, Fei Guo, and Xin Yang. DEPTHOR: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image. arXiv preprint arXiv:2504.01596, 2025.

[77] Rui Xiang, Feng Zheng, Huapeng Su, and Zhe Zhang. 3dDepthNet: Point Cloud Guided Depth Completion Network for Sparse Depth and Single Color Image. CoRR, abs/2003.09175, 2020.

[78] Zexiao Xie, Xiaoxuan Yu, Xiang Gao, Kunqian Li, and Shuhan Shen. Recent Advances in Conventional and Deep Learning-Based Depth Completion: A Survey. IEEE Transactions on Neural Networks and Learning Systems, 35(3):3395–3415, 2024.

[79] Gangwei Xu, Haotong Lin, Hongcheng Luo, Xianqi Wang, Jingfeng Yao, Lianghui Zhu, Yuechuan Pu, Cheng Chi, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Sida Peng, and Xin Yang. Pixel-perfect depth with semantics-prompted diffusion transformers, 2025.

[80] Zhiqiang Yan, Yuankai Lin, Kun Wang, Yupeng Zheng, Yufei Wang, Zhenyu Zhang, Jun Li, and Jian Yang. Tri-Perspective View Decomposition for Geometry-Aware Depth Completion. pages 4874–4884, 2024.

Showing first 80 references.