pith. machine review for the scientific record.

arxiv: 2603.10584 · v2 · submitted 2026-03-11 · 💻 cs.CV · cs.RO

Recognition: 1 theorem link · Lean Theorem

Need for Speed: Zero-Shot Depth Completion with Single-Step Diffusion

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 13:48 UTC · model grok-4.3

classification 💻 cs.CV cs.RO
keywords depth completion · diffusion models · single-step inference · zero-shot · 3D perception · sparse depth · computer vision

The pith

Marigold-SSD completes depth maps in one diffusion step by finetuning instead of test-time optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Marigold-SSD, a single-step late-fusion framework for depth completion that draws on diffusion priors. It replaces the slow iterative sampling usually required at inference with a single forward pass, enabled by a short finetuning phase rather than test-time optimization. This shift keeps the benefits of strong priors while cutting inference time enough for real-world use. Tests on four indoor and two outdoor benchmarks show competitive accuracy and generalization without further per-scene training. The work also examines how results change as the input depth becomes sparser.

Core claim

Marigold-SSD is a single-step late-fusion depth completion framework that leverages strong diffusion priors while eliminating the costly test-time optimization typically associated with diffusion-based methods. By shifting computational burden from inference to finetuning, the approach enables efficient and robust 3D perception under real-world latency constraints after only 4.5 GPU days of training. Evaluations across indoor and outdoor benchmarks demonstrate strong cross-domain generalization and zero-shot performance, narrowing the efficiency gap with discriminative models.

What carries the argument

Single-step late-fusion depth completion that fuses diffusion priors in one forward pass after finetuning.
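The cost shift this claim rests on can be made concrete with a toy comparison. This is a conceptual sketch, not the paper's implementation: `denoise` is a hypothetical stand-in for one network forward pass, and the step count of 50 is only illustrative.

```python
# Conceptual sketch (not the paper's code): the inference cost profile of
# multi-step diffusion depth completion vs. a single-step finetuned model.
calls = {"multi_step": 0, "single_step": 0}

def denoise(latent, mode):
    # Hypothetical stand-in for one U-Net evaluation.
    calls[mode] += 1
    return 0.9 * latent

def multi_step_completion(latent, num_steps=50):
    # Test-time iterative sampling: inference cost grows with num_steps.
    for _ in range(num_steps):
        latent = denoise(latent, "multi_step")
    return latent

def single_step_completion(latent):
    # After finetuning, one forward pass suffices: constant inference cost.
    return denoise(latent, "single_step")

multi_step_completion(1.0)
single_step_completion(1.0)
print(calls)  # {'multi_step': 50, 'single_step': 1}
```

The point of the design is exactly this ratio: the expensive part (finetuning) is paid once, while every inference call costs one evaluation instead of dozens.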

If this is right

  • Faster inference times make diffusion priors practical for latency-sensitive 3D perception tasks.
  • A one-time finetuning cost of 4.5 GPU days removes the need for repeated test-time optimization.
  • Zero-shot results hold across indoor and outdoor domains without scene-specific retraining.
  • Performance remains stable under varying levels of input depth sparsity.
  • The method closes much of the speed gap between diffusion and discriminative depth completion models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The single-step pattern could be adapted to speed up other diffusion tasks such as semantic segmentation or normal estimation.
  • Lower inference cost opens deployment on mobile or embedded hardware for robotics applications.
  • Rethinking sparsity protocols in evaluation may encourage more realistic benchmarks for depth completion methods.

Load-bearing premise

Finetuning a pre-trained diffusion model for single-step use keeps depth prediction accuracy and robustness across varied scenes.

What would settle it

A benchmark run where the single-step model produces clearly lower accuracy than standard multi-step diffusion depth completion on the same datasets would disprove the claim.
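The settling experiment reduces to a masked-error comparison on shared held-out pixels. A minimal sketch follows; the numbers, the toy predictions, and the `sample_sparse` helper are illustrative, not values from the paper.

```python
# Minimal sketch of the settling check: masked RMSE for two models on the
# same valid pixels, plus the sparsity-sampling protocol used in the figures.
import math
import random

def masked_rmse(pred, gt, mask):
    # RMSE over valid ground-truth pixels only, as in depth benchmarks.
    errs = [(p - g) ** 2 for p, g, m in zip(pred, gt, mask) if m]
    return math.sqrt(sum(errs) / len(errs))

def sample_sparse(gt, k, seed=0):
    # Keep k of the ground-truth depths as the sparse input condition,
    # mimicking the #500 / #1500 / #5000 sparsity levels in the figures.
    chosen = set(random.Random(seed).sample(range(len(gt)), k))
    return [i in chosen for i in range(len(gt))]

gt = [1.0, 2.0, 3.0, 4.0, 5.0]            # toy dense ground-truth depths (m)
pred_single = [1.1, 2.1, 2.9, 4.2, 4.8]   # hypothetical single-step output
pred_multi = [1.2, 2.2, 2.8, 4.3, 4.7]    # hypothetical multi-step output
valid = [True] * len(gt)

print(round(masked_rmse(pred_single, gt, valid), 3))  # 0.148
print(round(masked_rmse(pred_multi, gt, valid), 3))
print(sum(sample_sparse(gt, 3)))                      # 3 sparse samples kept
```

If the single-step RMSE were consistently and clearly higher than the multi-step RMSE under this protocol, the core claim would fail.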

Figures

Figures reproduced from arXiv: 2603.10584 by Jakub Gregorek, Konrad Schindler, Lazaros Nalpantidis, Nando Metzger, Paraskevas Pegios, Theodora Kontogianni.

Figure 1
Figure 1. Performance vs. speed trade-off. Comparison of our method Marigold-SSD with other diffusion-based approaches Marigold-DC [66] and Marigold-E2E [44] + LS (w/o sparse condition) as well as discriminative baselines [50, 88] on the KITTI dataset [18]. Marigold-SSD occupies a unique region in the trade-off space, closing the efficiency gap to discriminative methods while retaining the benefit of the strong diffusion… view at source ↗
Figure 2
Figure 2. Marigold-SSD for zero-shot depth completion. We present a single-step diffusion framework with end-to-end fine-tuning as an efficient alternative to the test-time optimization approach of Marigold-DC [66]. To this end, we introduce a conditional decoder with late fusion to incorporate sparse depth measurements. At inference, our method Marigold-SSD produces high-quality results in a single step, while Mari… view at source ↗
Figure 3
Figure 3. Internal architecture of the conditional decoder. DC consists of the VAE decoder D (top row) and blocks processing the sparse condition C (bottom row), adapted from the VAE encoder E (differing in down-sampling positions). Feature maps are concatenated channel-wise (⊕) at five levels and the fusion blocks use 1×1 convolutions (Eq. 1). Conv denotes standard convolution layers; UP, DOWN, and MID blocks are R… view at source ↗
Figure 4
Figure 4. Qualitative results. Marigold-SSD generally produces smoother depth maps than Marigold-DC [66], which tends to over-refine details that can lead to unrealistic scene structures. The black arrows highlight variations in the estimated depth, while the red and blue colors indicate the nearest and farthest regions. view at source ↗
Figure 5
Figure 5. Qualitative results. Both Marigold-SSD and Marigold-DC tend to underestimate sky depth on KITTI and DDAD, consistent with prior Marigold limitations and limited conditioning information in the sky, while they differ in how they estimate fine scene details. view at source ↗
Figure 6
Figure 6. Evaluation under multiple levels of depth density on NYUv2 and ScanNet. Depth density is denoted by the number of depth samples (#). See the supplementary material for all datasets. [plots: RMSE and MAE vs. sparsity level; series include Interpolation, Marigold-DC, Marigold-SSD*, Marigold-…] view at source ↗
Figure 7
Figure 7. Challenging the models on DDAD. At the commonly used sparsity level of 5000 points, even sophisticated models can be outperformed by trivial Barycentric interpolation. [plots: RMSE vs. sparsity level on ScanNet (indoor) and DDAD (outdoor), panels (A), (B), (C)] view at source ↗
Figure 8
Figure 8. Sampling Density. Models (A), (B) & (C) fine-tuned on different densities. See the supplementary material for all datasets. Marigold-SSD trained on our default broad spectrum of sparsity levels can achieve strong zero-shot performance across domains. The impact of condition sampling density could be less pronounced at higher densities, where much of the target information is already provided and lightweig… view at source ↗
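The late-fusion step that Figure 3 describes (channel-wise concatenation followed by a 1×1 convolution at each level) can be sketched in a few lines. Shapes and weights here are made up for illustration; the actual model applies this at five levels inside a finetuned VAE decoder.

```python
# Sketch (assumed shapes, not the paper's code) of one late-fusion level:
# decoder features and sparse-condition features are concatenated
# channel-wise, then mixed by a 1x1 convolution, which acts per pixel.
import numpy as np

rng = np.random.default_rng(0)

def fuse_1x1(dec_feat, cond_feat, weight):
    # dec_feat: (C1, H, W), cond_feat: (C2, H, W), weight: (C_out, C1 + C2)
    x = np.concatenate([dec_feat, cond_feat], axis=0)  # channel-wise (⊕)
    # A 1x1 convolution is a linear map over channels at every spatial site.
    return np.einsum("oc,chw->ohw", weight, x)

dec = rng.standard_normal((8, 4, 4))    # decoder feature map at one level
cond = rng.standard_normal((8, 4, 4))   # sparse-depth condition features
w = rng.standard_normal((8, 16))        # fusion weights for this level
fused = fuse_1x1(dec, cond, w)
print(fused.shape)  # (8, 4, 4): same resolution, fused channels
```

Because the fusion kernel is 1×1, spatial resolution is untouched; only the channel dimension is mixed, which is what lets the sparse condition steer the decoder without disturbing the pretrained diffusion prior's spatial structure.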
read the original abstract

We introduce Marigold-SSD, a single-step, late-fusion depth completion framework that leverages strong diffusion priors while eliminating the costly test-time optimization typically associated with diffusion-based methods. By shifting computational burden from inference to finetuning, our approach enables efficient and robust 3D perception under real-world latency constraints. Marigold-SSD achieves significantly faster inference with a training cost of only 4.5 GPU days. We evaluate our method across four indoor and two outdoor benchmarks, demonstrating strong cross-domain generalization and zero-shot performance compared to existing depth completion approaches. Our approach significantly narrows the efficiency gap between diffusion-based and discriminative models. Finally, we challenge common evaluation protocols by analyzing performance under varying input sparsity levels. Page: https://dtu-pas.github.io/marigold-ssd/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces Marigold-SSD, a single-step late-fusion depth completion framework that leverages diffusion priors from a base model via targeted finetuning, eliminating test-time optimization to achieve fast inference at a reported training cost of 4.5 GPU days. It evaluates the approach on four indoor and two outdoor benchmarks, claiming strong cross-domain generalization and zero-shot performance relative to existing depth completion methods, while also analyzing robustness under varying input sparsity levels.

Significance. If the quantitative claims hold, the work could meaningfully narrow the efficiency gap between diffusion-based and discriminative depth completion models, enabling practical real-world deployment under latency constraints by shifting compute to a one-time finetuning stage while retaining strong priors.

major comments (2)
  1. Abstract: the central claim of 'strong cross-domain generalization and zero-shot performance' is presented without any quantitative metrics, baseline comparisons, error bars, or ablation details, leaving the efficiency and accuracy assertions unsupported by verifiable evidence in the provided text.
  2. Evaluation section: the analysis of performance under varying input sparsity levels is positioned as challenging common protocols, but without explicit quantitative results or direct comparison to the zero-shot claim, it is unclear whether this supports or qualifies the main generalization assertions.
minor comments (1)
  1. The page link and method name (Marigold-SSD) should be consistently referenced with a brief expansion of the acronym on first use.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below and will revise the manuscript to strengthen the presentation of quantitative evidence.

read point-by-point responses
  1. Referee: Abstract: the central claim of 'strong cross-domain generalization and zero-shot performance' is presented without any quantitative metrics, baseline comparisons, error bars, or ablation details, leaving the efficiency and accuracy assertions unsupported by verifiable evidence in the provided text.

    Authors: We agree that the abstract would benefit from including key quantitative results to make the claims immediately verifiable. In the revised version, we will update the abstract to report specific metrics such as average RMSE and accuracy on the indoor and outdoor benchmarks, the 4.5 GPU-day training cost, inference speed, and direct comparisons to existing depth completion methods. revision: yes

  2. Referee: Evaluation section: the analysis of performance under varying input sparsity levels is positioned as challenging common protocols, but without explicit quantitative results or direct comparison to the zero-shot claim, it is unclear whether this supports or qualifies the main generalization assertions.

    Authors: We will revise the evaluation section to present the sparsity analysis results more explicitly, including tables or figures with quantitative metrics across sparsity levels and direct side-by-side comparisons to the zero-shot performance on the main benchmarks. This will clarify how the robustness analysis supports the cross-domain generalization claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The provided manuscript text and abstract describe an engineering contribution: a single-step late-fusion diffusion model (Marigold-SSD) obtained by targeted finetuning of a pre-trained diffusion prior, with inference cost reduced by removing test-time optimization. No equations, derivation steps, or self-citation chains are exhibited that reduce a claimed prediction or uniqueness result to a fitted parameter or prior author result by construction. The training cost (4.5 GPU days) and benchmark numbers are presented as empirical outcomes of the training procedure rather than quantities forced by the method's own definitions. The central design choice (shifting compute to finetuning) is explicitly stated and does not rely on a self-referential uniqueness theorem or ansatz smuggled via citation. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review limited to abstract; no explicit free parameters, axioms, or invented entities are stated. The approach relies on standard diffusion priors and late fusion without detailing new postulates.

pith-pipeline@v0.9.0 · 5457 in / 1160 out tokens · 49332 ms · 2026-05-15T13:48:49.084174+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

92 extracted references · 92 canonical work pages · 2 internal anchors

  1. [1]

    [Online; accessed 28-October-2025]

    huggingface/diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.https:// github.com/huggingface/diffusers, . [Online; accessed 28-October-2025]. 5

  2. [2]

    https : / / huggingface

    Hugging Face: GonzaloMG/marigold-e2e-ft-depth. https : / / huggingface . co / GonzaloMG / marigold - e2e - ft - depth, . [Online; accessed 28-October-2025]. 5

  3. [3]

    Bridging Depth Estimation and Completion for Mobile Robots Reliable 3D Perception

    Dimitrios Arapis, Milad Jami, and Lazaros Nalpantidis. Bridging Depth Estimation and Completion for Mobile Robots Reliable 3D Perception. InRobot Intelligence Tech- nology and Applications 7, pages 169–179, Cham, 2023. Springer International Publishing. 3

  4. [4]

    Revisiting Depth Completion from a Stereo Matching Perspective for Cross-domain Generaliza- tion

    Luca Bartolomei, Matteo Poggi, Andrea Conti, Fabio Tosi, and Stefano Mattoccia. Revisiting Depth Completion from a Stereo Matching Perspective for Cross-domain Generaliza- tion. In2024 International Conference on 3D Vision (3DV), pages 1360–1370, 2024. ISSN: 2475-7888. 1, 3, 5, 6, 7, 8

  5. [5]

    ZoeDepth: Zero-shot Transfer by Com- bining Relative and Metric Depth, 2023

    Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, and Matthias M¨uller. ZoeDepth: Zero-shot Transfer by Com- bining Relative and Metric Depth, 2023. 2

  6. [6]

    Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

    Aleksei Bochkovskii, Ama ¨el Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R Richter, and Vladlen Koltun. Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.arXiv preprint arXiv:2410.02073,

  7. [7]

    Virtual KITTI 2

    Yohann Cabon, Naila Murray, and Martin Humenberger. Vir- tual KITTI 2.CoRR, abs/2001.10773, 2020. 4, 5

  8. [8]

    Depth Estimation via Affinity Learned with Convolutional Spatial Propagation Network

    Xinjing Cheng, Peng Wang, and Ruigang Yang. Depth Estimation via Affinity Learned with Convolutional Spatial Propagation Network. pages 103–119, 2018. 2

  9. [9]

    Xinjing Cheng, Peng Wang, Chenye Guan, and Ruigang Yang. CSPN++: Learning Context and Resource Aware Convolutional Spatial Propagation Networks for Depth Completion.Proceedings of the AAAI Conference on Arti- ficial Intelligence, 34(07):10615–10622, 2020. Number: 07. 2

  10. [10]

    Diffusion pos- terior sampling for general noisy inverse problems

    Hyungjin Chung, Jeongsol Kim, Michael Thompson Mc- cann, Marc Louis Klasky, and Jong Chul Ye. Diffusion pos- terior sampling for general noisy inverse problems. InThe Eleventh International Conference on Learning Representa- tions, 2023. 4

  11. [11]

    Unsupervised confidence for LiDAR depth maps and applications

    Andrea Conti, Matteo Poggi, Filippo Aleotti, and Stefano Mattoccia. Unsupervised confidence for LiDAR depth maps and applications. In2022 IEEE/RSJ International Confer- ence on Intelligent Robots and Systems (IROS), pages 8352– 8359, 2022. 5

  12. [12]

    Spar- sity Agnostic Depth Completion

    Andrea Conti, Matteo Poggi, and Stefano Mattoccia. Spar- sity Agnostic Depth Completion. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 5871–5880, 2023. 1, 3, 5, 6

  13. [13]

    Chang, Manolis Savva, Maciej Hal- ber, Thomas Funkhouser, and Matthias Niessner

    Angela Dai, Angel X. Chang, Manolis Savva, Maciej Hal- ber, Thomas Funkhouser, and Matthias Niessner. ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 5, 6

  14. [14]

    Diffusion models beat gans on image synthesis.Advances in neural informa- tion processing systems, 34:8780–8794, 2021

    Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis.Advances in neural informa- tion processing systems, 34:8780–8794, 2021. 4

  15. [15]

    DTU Computing Center resources

    DTU Computing Center. DTU Computing Center resources. https : / / doi . org / 10 . 48714 / DTU . HPC . 0001,

  16. [16]

    Geowiz- ard: Unleashing the Diffusion Priors for 3D Geometry Es- timation from a Single Image

    Xiao Fu, Wei Yin, Mu Hu, Kaixuan Wang, Yuexin Ma, Ping Tan, Shaojie Shen, Dahua Lin, and Xiaoxiao Long. Geowiz- ard: Unleashing the Diffusion Priors for 3D Geometry Es- timation from a Single Image. InEuropean Conference on Computer Vision, pages 241–258. Springer, 2024. 1

  17. [17]

    Virtual Worlds as Proxy for Multi-Object Tracking Analysis

    Adrien Gaidon, Qiao Wang, Yohann Cabon, and Eleonora Vig. Virtual Worlds as Proxy for Multi-Object Tracking Analysis. InProceedings of the IEEE conference on Com- puter Vision and Pattern Recognition, pages 4340–4349,

  18. [18]

    Vision meets Robotics: The KITTI Dataset.Inter- national Journal of Robotics Research (IJRR), 2013

    Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. Vision meets Robotics: The KITTI Dataset.Inter- national Journal of Robotics Research (IJRR), 2013. 1, 5, 6

  19. [19]

    SteeredMarigold: Steering Diffusion Towards Depth Completion of Largely Incomplete Depth Maps

    Jakub Gregorek and Lazaros Nalpantidis. SteeredMarigold: Steering Diffusion Towards Depth Completion of Largely Incomplete Depth Maps. In2025 IEEE International Con- ference on Robotics and Automation (ICRA), pages 13304– 13311, 2025. 1, 3

  20. [20]

    DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching

    Ming Gui, Johannes Schusterbauer, Ulrich Prestel, Pingchuan Ma, Dmytro Kotovenko, Olga Grebenkova, Stefan Andreas Baumann, Vincent Tao Hu, and Bj ¨orn Ommer. DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching. InProceedings of the AAAI Conference on Artificial Intelligence, pages 3203–3211,

  21. [21]

    3D Packing for Self-Supervised Monocular Depth Estimation

    Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Allan Raven- tos, and Adrien Gaidon. 3D Packing for Self-Supervised Monocular Depth Estimation. InIEEE Conference on Com- puter Vision and Pattern Recognition (CVPR), 2020. 5

  22. [22]

    Lotus: Diffusion-based visual foundation model for high-quality dense prediction.arXiv preprint arXiv:2409.18124, 2024

    Jing He, Haodong Li, Wei Yin, Yixun Liang, Leheng Li, Kaiqiang Zhou, Hongbo Zhang, Bingbing Liu, and Ying- Cong Chen. Lotus: Diffusion-based visual foundation model for high-quality dense prediction.arXiv preprint arXiv:2409.18124, 2024. 1

  23. [23]

    Deep Residual Learning for Image Recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. InProceed- ings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. 3

  24. [24]

    Denoising Dif- fusion Probabilistic Models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising Dif- fusion Probabilistic Models. InAdvances in Neural Infor- mation Processing Systems, pages 6840–6851. Curran Asso- ciates, Inc., 2020. 3

  25. [25]

    Deep Depth Comple- tion From Extremely Sparse Data: A Survey.IEEE Trans- actions on Pattern Analysis and Machine Intelligence, 45(7): 8244–8264, 2023

    Junjie Hu, Chenyu Bao, Mete Ozay, Chenyou Fan, Qing Gao, Honghai Liu, and Tin Lun Lam. Deep Depth Comple- tion From Extremely Sparse Data: A Survey.IEEE Trans- actions on Pattern Analysis and Machine Intelligence, 45(7): 8244–8264, 2023. 1

  26. [26]

    PENet: Towards Precise and Efficient Image Guided Depth Completion

    Mu Hu, Shuling Wang, Bin Li, Shiyu Ning, Li Fan, and Xi- aojin Gong. PENet: Towards Precise and Efficient Image Guided Depth Completion. In2021 IEEE International Con- ference on Robotics and Automation (ICRA), pages 13656– 13662, 2021. ISSN: 2577-087X. 2

  27. [27]

    Test-Time Prompt Tuning for Zero-Shot Depth Com- pletion

    Chanhwi Jeong, Inhwan Bae, Jin-Hwi Park, and Hae-Gon Jeon. Test-Time Prompt Tuning for Zero-Shot Depth Com- pletion. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9443–9454,

  28. [28]

    Repurpos- ing Diffusion-Based Image Generators for Monocular Depth Estimation

    Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Met- zger, Rodrigo Caye Daudt, and Konrad Schindler. Repurpos- ing Diffusion-Based Image Generators for Monocular Depth Estimation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024. 1, 3, 4, 5, 6

  29. [29]

    Marigold: Affordable Adaptation of Diffusion- Based Image Generators for Image Analysis.IEEE Trans- actions on Pattern Analysis and Machine Intelligence, pages 1–18, 2025

    Bingxin Ke, Kevin Qu, Tianfu Wang, Nando Metzger, Shengyu Huang, Bo Li, Anton Obukhov, and Konrad Schindler. Marigold: Affordable Adaptation of Diffusion- Based Image Generators for Image Analysis.IEEE Trans- actions on Pattern Analysis and Machine Intelligence, pages 1–18, 2025. 3

  30. [30]

    Evaluation of CNN-based Single-Image Depth Estimation Methods

    Tobias Koch, Lukas Liebel, Friedrich Fraundorfer, and Marco Korner. Evaluation of CNN-based Single-Image Depth Estimation Methods. InProceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018. 5, 6

  31. [31]

    Orchid: Image latent diffusion for joint appearance and geometry generation.arXiv preprint arXiv:2501.13087,

    Akshay Krishnan, Xinchen Yan, Vincent Casser, and Abhijit Kundu. Orchid: Image latent diffusion for joint appearance and geometry generation.arXiv preprint arXiv:2501.13087,

  32. [32]

    Waslander

    Jason Ku, Ali Harakeh, and Steven L. Waslander. In Defense of Classical Image Processing: Fast Depth Completion on the CPU. In2018 15th Conference on Computer and Robot Vision (CRV), pages 16–22, 2018. 2

  33. [33]

    Dis- tilling Monocular Foundation Model for Fine-grained Depth Completion

    Yingping Liang, Yutao Hu, Wenqi Shao, and Ying Fu. Dis- tilling Monocular Foundation Model for Fine-grained Depth Completion. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22254–22265, 2025. 3, 5

  34. [34]

    Dis- tilling monocular foundation model for fine-grained depth completion

    Yingping Liang, Yutao Hu, Wenqi Shao, and Ying Fu. Dis- tilling monocular foundation model for fine-grained depth completion. InProceedings of the Computer Vision and Pat- tern Recognition Conference, pages 22254–22265, 2025. 6

  35. [35]

    Chen, Zhenyu Li, Guang Shi, Jiashi Feng, and Bingyi Kang

    Haotong Lin, Sili Chen, Junhao Liew, Donny Y . Chen, Zhenyu Li, Guang Shi, Jiashi Feng, and Bingyi Kang. Depth Anything 3: Recovering the Visual Space from Any Views,

  36. [36]

    Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

    Haotong Lin, Sida Peng, Jingxiao Chen, Songyou Peng, Ji- aming Sun, Minghuan Liu, Hujun Bao, Jiashi Feng, Xiaowei Zhou, and Bingyi Kang. Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17070–17080, 2025. 1, 3, 5

  37. [37]

    Common diffusion noise schedules and sample steps are flawed

    Shanchuan Lin, Bingchen Liu, Jiashi Li, and Xiao Yang. Common diffusion noise schedules and sample steps are flawed. InProceedings of the IEEE/CVF winter conference on applications of computer vision, pages 5404–5411, 2024. 4

  38. [38]

    DySPN: Learning Dynamic Affinity for Image-Guided Depth Completion.IEEE Transactions on Circuits and Systems for Video Technology, 34(6):4596– 4609, 2024

    Yuankai Lin, Hua Yang, Tao Cheng, Wending Zhou, and Zhouping Yin. DySPN: Learning Dynamic Affinity for Image-Guided Depth Completion.IEEE Transactions on Circuits and Systems for Video Technology, 34(6):4596– 4609, 2024. 2

  39. [39]

    Chen, Heli Ben-Hamu, Maximil- ian Nickel, and Matt Le

    Yaron Lipman, Ricky T.Q. Chen, Heli Ben-Hamu, Maximil- ian Nickel, and Matt Le. Flow matching for generative mod- eling. In11th International Conference on Learning Repre- sentations (ICLR), 2023. 3

  40. [40]

    Learning Affinity via Spatial Propagation Networks

    Sifei Liu, Shalini De Mello, Jinwei Gu, Guangyu Zhong, Ming-Hsuan Yang, and Jan Kautz. Learning Affinity via Spatial Propagation Networks. InAdvances in Neural Infor- mation Processing Systems. Curran Associates, Inc., 2017. 2

  41. [41]

    DepthLab: From Partial to Complete, 2024

    Zhiheng Liu, Ka Leong Cheng, Qiuyu Wang, Shuzhe Wang, Hao Ouyang, Bin Tan, Kai Zhu, Yujun Shen, Qifeng Chen, and Ping Luo. DepthLab: From Partial to Complete, 2024. 5, 6

  42. [42]

    Decoupled Weight De- cay Regularization

    Ilya Loshchilov and Frank Hutter. Decoupled Weight De- cay Regularization. InInternational Conference on Learning Representations, 2019. 5

  43. [43]

    Latent Consistency Models: Synthesizing High- Resolution Images with Few-Step Inference, 2023

    Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao. Latent Consistency Models: Synthesizing High- Resolution Images with Few-Step Inference, 2023. 3

  44. [44]

    Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

    Gonzalo Martin Garcia, Karim Abou Zeid, Christian Schmidt, Daan de Geus, Alexander Hermans, and Bastian Leibe. Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025. 1, 3, 4, 5, 6, 7

  45. [45]

    A Large Dataset to Train Convolutional Networks for Dispar- ity, Optical Flow, and Scene Flow Estimation

    Nikolaus Mayer, Eddy Ilg, Philip Hausser, Philipp Fischer, Daniel Cremers, Alexey Dosovitskiy, and Thomas Brox. A Large Dataset to Train Convolutional Networks for Dispar- ity, Optical Flow, and Scene Flow Estimation. InProceed- ings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. 7

  46. [46]

    Indoor Segmentation and Support Inference from RGBD Images

    Pushmeet Kohli Nathan Silberman, Derek Hoiem and Rob Fergus. Indoor Segmentation and Support Inference from RGBD Images. InECCV, 2012. 5, 6

  47. [47]

    SemAttNet: Toward Attention-Based Semantic Aware Guided Depth Comple- tion.IEEE Access, 10:120781–120791, 2022

    Danish Nazir, Alain Pagani, Marcus Liwicki, Didier Stricker, and Muhammad Zeshan Afzal. SemAttNet: Toward Attention-Based Semantic Aware Guided Depth Comple- tion.IEEE Access, 10:120781–120791, 2022. Conference Name: IEEE Access. 2

  48. [48]

    Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion

    Lucas Nunes, Rodrigo Marcuzzi, Benedikt Mersch, Jens Behley, and Cyrill Stachniss. Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition (CVPR), pages 14770–14780, 2024. 2

  49. [49]

    DINOv2: Learning Robust Visual Features without Supervision, 2024

    Maxime Oquab, Timoth ´ee Darcet, Th ´eo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mah- moud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herv ´e Je- gou, Julien Mairal, ...

  50. [50]

    Non-local Spatial Propagation Network for Depth Completion

    Jinsun Park, Kyungdon Joo, Zhe Hu, Chi-Kuei Liu, and In So Kweon. Non-local Spatial Propagation Network for Depth Completion. InComputer Vision – ECCV 2020, pages 120–136, Cham, 2020. Springer International Publishing. 1, 2, 5, 6

  51. [51]

    SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation

    Duc-Hai Pham, Tung Do, Phong Nguyen, Binh-Son Hua, Khoi Nguyen, and Rang Nguyen. SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 17060–17069,

  52. [52]

    UniDepth: Universal Monocular Metric Depth Estimation

    Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis, Mattia Segu, Siyuan Li, Luc Van Gool, and Fisher Yu. UniDepth: Universal Monocular Metric Depth Estimation. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10106–10116, 2024. 2

  53. [53]

    UniDepthV2: Universal Monocular Metric Depth Estima- tion Made Simpler.IEEE Transactions on Pattern Analysis and Machine Intelligence, 48(3):2354–2367, 2026

    Luigi Piccinelli, Christos Sakaridis, Yung-Hsu Yang, Mat- tia Segu, Siyuan Li, Wim Abbeloos, and Luc Van Gool. UniDepthV2: Universal Monocular Metric Depth Estima- tion Made Simpler.IEEE Transactions on Pattern Analysis and Machine Intelligence, 48(3):2354–2367, 2026. 2

  54. [54]

    Ren ´e Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun. Towards Robust Monocu- lar Depth Estimation: Mixing Datasets for Zero-Shot Cross- Dataset Transfer.IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3):1623–1637, 2022. 2, 4

  55. [55]

    Susskind

    Mike Roberts, Jason Ramapuram, Anurag Ranjan, Atulit Kumar, Miguel Angel Bautista, Nathan Paczan, Russ Webb, and Joshua M. Susskind. Hypersim: A Photorealistic Syn- thetic Dataset for Holistic Indoor Scene Understanding. In International Conference on Computer Vision (ICCV) 2021,

  56. [56]

    High-Resolution Image Synthesis with Latent Diffusion Models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-Resolution Image Synthesis with Latent Diffusion Models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, 2022. 1, 3

  57. [57]

    U- net: Convolutional networks for biomedical image segmen- tation

    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U- net: Convolutional networks for biomedical image segmen- tation. InInternational Conference on Medical image com- puting and computer-assisted intervention, pages 234–241. Springer, 2015. 3

  58. [58]

    Progressive distillation for fast sampling of diffusion models

    Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. InInternational Confer- ence on Learning Representations, 2022. 3

  59. [59]

    Shuwei Shao, Zhongcai Pei, Weihai Chen, Peter C. Y . Chen, and Zhengguo Li. NDDepth: Normal-Distance Assisted Monocular Depth Estimation and Completion.IEEE Trans- actions on Pattern Analysis and Machine Intelligence, 46 (12):8883–8899, 2024. 2

[60] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021.

[61] Jie Tang, Fei-Peng Tian, Wei Feng, Jian Li, and Ping Tan. Learning Guided Convolutional Network for Depth Completion. IEEE Transactions on Image Processing, 30:1116–1129, 2021.

[62] Jie Tang, Fei-Peng Tian, Boshi An, Jian Li, and Ping Tan. Bilateral Propagation Network for Depth Completion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9763–9772, 2024.

[63] Jie Tang, Pingping Xie, Jian Li, and Ping Tan. Gaussian belief propagation network for depth completion. arXiv preprint arXiv:2601.21291, 2026.

[64] Jonas Uhrig, Nick Schneider, Lukas Schneider, Uwe Franke, Thomas Brox, and Andreas Geiger. Sparsity Invariant CNNs. In International Conference on 3D Vision (3DV), 2017.

[65] Aaron Van Den Oord, Oriol Vinyals, et al. Neural discrete representation learning. Advances in Neural Information Processing Systems, 30, 2017.

[66] Massimiliano Viola, Kevin Qu, Nando Metzger, Bingxin Ke, Alexander Becker, Konrad Schindler, and Anton Obukhov. Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion, 2024.

[67] Kun Wang, Zhiqiang Yan, Junkai Fan, Jun Li, and Jian Yang. Learning Inverse Laplacian Pyramid for Progressive Depth Completion. arXiv preprint arXiv:2502.07289, 2025.

[68] Ruicheng Wang, Sicheng Xu, Cassie Dai, Jianfeng Xiang, Yu Deng, Xin Tong, and Jiaolong Yang. MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5261–5271, 2025.

[69] Ruicheng Wang, Sicheng Xu, Yue Dong, Yu Deng, Jianfeng Xiang, Zelong Lv, Guangzhong Sun, Xin Tong, and Jiaolong Yang. MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details, 2025.

[70] Yufei Wang, Bo Li, Ge Zhang, Qi Liu, Tao Gao, and Yuchao Dai. LRRU: Long-short Range Recurrent Updating Networks for Depth Completion. pages 9422–9432, 2023.

[71] Yufei Wang, Yuxin Mao, Qi Liu, and Yuchao Dai. Decomposed Guided Dynamic Filters for Efficient RGB-Guided Depth Completion. IEEE Transactions on Circuits and Systems for Video Technology, 34(2):1186–1198, 2024.

[72] Yufei Wang, Ge Zhang, Shaoqian Wang, Bo Li, Qi Liu, Le Hui, and Yuchao Dai. Improving Depth Completion via Depth Feature Upsampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21104–21113, 2024.

[73] Nina Weng, Paraskevas Pegios, Eike Petersen, Aasa Feragen, and Siavash Bigdeli. Fast diffusion-based counterfactuals for shortcut removal and generation. In European Conference on Computer Vision, pages 338–357. Springer, 2024.

[74] Alex Wong, Xiaohan Fei, Stephanie Tsuei, and Stefano Soatto. Unsupervised Depth Completion from Visual Inertial Odometry. IEEE Robotics and Automation Letters, 5(2):1899–1906, 2020.

[75] Jijun Xiang, Longliang Liu, Xuan Zhu, Xianqi Wang, Min Lin, and Xin Yang. DEPTHOR++: Robust Depth Enhancement from a Real-World Lightweight dToF and RGB Guidance, 2025.

[76] Jijun Xiang, Xuan Zhu, Xianqi Wang, Yu Wang, Hong Zhang, Fei Guo, and Xin Yang. DEPTHOR: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image. arXiv preprint arXiv:2504.01596, 2025.

[77] Rui Xiang, Feng Zheng, Huapeng Su, and Zhe Zhang. 3dDepthNet: Point Cloud Guided Depth Completion Network for Sparse Depth and Single Color Image. CoRR, abs/2003.09175, 2020.

[78] Zexiao Xie, Xiaoxuan Yu, Xiang Gao, Kunqian Li, and Shuhan Shen. Recent Advances in Conventional and Deep Learning-Based Depth Completion: A Survey. IEEE Transactions on Neural Networks and Learning Systems, 35(3):3395–3415, 2024.

[79] Gangwei Xu, Haotong Lin, Hongcheng Luo, Xianqi Wang, Jingfeng Yao, Lianghui Zhu, Yuechuan Pu, Cheng Chi, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Sida Peng, and Xin Yang. Pixel-perfect depth with semantics-prompted diffusion transformers, 2025.

[80] Zhiqiang Yan, Yuankai Lin, Kun Wang, Yupeng Zheng, Yufei Wang, Zhenyu Zhang, Jun Li, and Jian Yang. Tri-Perspective View Decomposition for Geometry-Aware Depth Completion. pages 4874–4884, 2024.

Showing first 80 references.