Need for Speed: Zero-Shot Depth Completion with Single-Step Diffusion
Pith reviewed 2026-05-15 13:48 UTC · model grok-4.3
The pith
Marigold-SSD completes depth maps in a single diffusion step, replacing test-time optimization with finetuning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Marigold-SSD is a single-step late-fusion depth completion framework that leverages strong diffusion priors while eliminating the costly test-time optimization typically associated with diffusion-based methods. By shifting the computational burden from inference to a one-time finetuning stage of only 4.5 GPU days, the approach enables efficient and robust 3D perception under real-world latency constraints. Evaluations across indoor and outdoor benchmarks demonstrate strong cross-domain generalization and zero-shot performance, narrowing the efficiency gap with discriminative models.
What carries the argument
Single-step late-fusion depth completion that fuses diffusion priors in one forward pass after finetuning.
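The contrast between standard multi-step diffusion sampling and the single-step inference that finetuning enables can be sketched minimally. This is an illustrative toy, not the paper's network: the `denoiser` stand-in and all names are hypothetical; the point is only the difference in network-evaluation counts.

```python
# Toy sketch (hypothetical interfaces): multi-step diffusion sampling vs.
# the single-step x0-prediction that single-step finetuning enables.

def denoiser(latent, t, cond):
    """Stand-in for the finetuned network; pretends to predict the clean
    depth latent from a noisy latent and conditioning (RGB + sparse depth)."""
    return [0.5 * (l + c) for l, c in zip(latent, cond)]

def multi_step_completion(noise, cond, steps=50):
    """Standard diffusion inference: one denoiser call per step."""
    x = noise
    for t in reversed(range(steps)):
        x0_hat = denoiser(x, t, cond)
        # DDIM-style blend toward the current clean estimate.
        alpha = t / steps
        x = [alpha * xi + (1 - alpha) * x0i for xi, x0i in zip(x, x0_hat)]
    return x, steps  # result and number of network evaluations

def single_step_completion(noise, cond):
    """Finetuned single-step inference: one denoiser call total."""
    return denoiser(noise, 0, cond), 1

noise = [0.0, 0.0, 0.0]
cond = [1.0, 2.0, 3.0]
_, calls_multi = multi_step_completion(noise, cond)
_, calls_single = single_step_completion(noise, cond)
print(calls_multi, calls_single)  # 50 vs. 1 network evaluations
```

Shifting from 50 evaluations to 1 is where the inference-time savings come from; the cost moves into the finetuning stage instead.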
If this is right
- Faster inference times make diffusion priors practical for latency-sensitive 3D perception tasks.
- Training cost limited to 4.5 GPU days removes the need for repeated test-time optimization.
- Zero-shot results hold across indoor and outdoor domains without scene-specific retraining.
- Performance remains stable under varying levels of input depth sparsity.
- The method closes much of the speed gap between diffusion and discriminative depth completion models.
Where Pith is reading between the lines
- The single-step pattern could be adapted to speed up other diffusion tasks such as semantic segmentation or normal estimation.
- Lower inference cost opens deployment on mobile or embedded hardware for robotics applications.
- Rethinking sparsity protocols in evaluation may encourage more realistic benchmarks for depth completion methods.
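The sparsity-protocol point above can be made concrete with a hedged sketch: subsample dense ground truth at several densities and track completion error. The constant-fill "completer" and all names here are placeholders for illustration, not the paper's method or its evaluation code.

```python
import math
import random

# Illustrative sparsity evaluation protocol: keep each ground-truth depth
# pixel with probability `density`, complete the rest, and measure RMSE
# against the dense ground truth. The mean-fill completer is a toy baseline.

def subsample(depth, density, rng):
    """Keep each pixel with probability `density`; None marks missing."""
    return [d if rng.random() < density else None for d in depth]

def complete_constant(sparse):
    """Toy completer: fill missing pixels with the mean of observed ones."""
    obs = [d for d in sparse if d is not None]
    mean = sum(obs) / len(obs)
    return [d if d is not None else mean for d in sparse]

def rmse(pred, gt):
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))

rng = random.Random(0)
gt = [1.0 + 0.01 * i for i in range(1000)]  # synthetic dense depth ramp
for density in (0.5, 0.1, 0.01):
    sparse = subsample(gt, density, rng)
    print(f"density={density:.2f}  RMSE={rmse(complete_constant(sparse), gt):.3f}")
```

Sweeping density like this, rather than fixing a single canonical sparsity level, is the kind of protocol change the review suggests would make benchmarks more realistic.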
Load-bearing premise
Finetuning a pre-trained diffusion model for single-step use keeps depth prediction accuracy and robustness across varied scenes.
What would settle it
A benchmark run in which the single-step model produces clearly lower accuracy than standard multi-step diffusion depth completion on the same datasets would disprove the claim.
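The settling condition can be expressed as a simple check over shared benchmarks. The threshold, dataset names, and all RMSE values below are placeholders chosen for illustration, not reported results from the paper.

```python
# Hypothetical settling experiment: run both models on the same benchmarks
# and test whether the single-step model is clearly worse everywhere.

def clearly_worse(single_step, multi_step, margin=0.10):
    """True if single-step RMSE exceeds multi-step RMSE by more than
    `margin` (10% here, an arbitrary choice) on every shared benchmark --
    the outcome that would disprove the claim."""
    return all(single_step[d] > (1 + margin) * multi_step[d] for d in multi_step)

multi = {"NYUv2": 0.10, "KITTI": 0.90}        # placeholder RMSE values
single_ok = {"NYUv2": 0.11, "KITTI": 0.92}    # comparable: claim survives
single_bad = {"NYUv2": 0.20, "KITTI": 1.50}   # clearly worse: claim fails
print(clearly_worse(single_ok, multi), clearly_worse(single_bad, multi))
```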
Figures
Original abstract
We introduce Marigold-SSD, a single-step, late-fusion depth completion framework that leverages strong diffusion priors while eliminating the costly test-time optimization typically associated with diffusion-based methods. By shifting computational burden from inference to finetuning, our approach enables efficient and robust 3D perception under real-world latency constraints. Marigold-SSD achieves significantly faster inference with a training cost of only 4.5 GPU days. We evaluate our method across four indoor and two outdoor benchmarks, demonstrating strong cross-domain generalization and zero-shot performance compared to existing depth completion approaches. Our approach significantly narrows the efficiency gap between diffusion-based and discriminative models. Finally, we challenge common evaluation protocols by analyzing performance under varying input sparsity levels. Page: https://dtu-pas.github.io/marigold-ssd/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Marigold-SSD, a single-step late-fusion depth completion framework that leverages diffusion priors from a base model via targeted finetuning, eliminating test-time optimization to achieve fast inference at a reported training cost of 4.5 GPU days. It evaluates the approach on four indoor and two outdoor benchmarks, claiming strong cross-domain generalization and zero-shot performance relative to existing depth completion methods, while also analyzing robustness under varying input sparsity levels.
Significance. If the quantitative claims hold, the work could meaningfully narrow the efficiency gap between diffusion-based and discriminative depth completion models, enabling practical real-world deployment under latency constraints by shifting compute to a one-time finetuning stage while retaining strong priors.
Major comments (2)
- Abstract: the central claim of 'strong cross-domain generalization and zero-shot performance' is presented without any quantitative metrics, baseline comparisons, error bars, or ablation details, leaving the efficiency and accuracy assertions unsupported by verifiable evidence in the provided text.
- Evaluation section: the analysis of performance under varying input sparsity levels is positioned as challenging common protocols, but without explicit quantitative results or direct comparison to the zero-shot claim, it is unclear whether this supports or qualifies the main generalization assertions.
Minor comments (1)
- The page link and method name (Marigold-SSD) should be consistently referenced with a brief expansion of the acronym on first use.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the major comments point by point below and will revise the manuscript to strengthen the presentation of quantitative evidence.
Point-by-point responses
- Referee: Abstract: the central claim of 'strong cross-domain generalization and zero-shot performance' is presented without any quantitative metrics, baseline comparisons, error bars, or ablation details, leaving the efficiency and accuracy assertions unsupported by verifiable evidence in the provided text.
  Authors: We agree that the abstract would benefit from including key quantitative results to make the claims immediately verifiable. In the revised version, we will update the abstract to report specific metrics such as average RMSE and accuracy on the indoor and outdoor benchmarks, the 4.5 GPU-day training cost, inference speed, and direct comparisons to existing depth completion methods. Revision: yes.
- Referee: Evaluation section: the analysis of performance under varying input sparsity levels is positioned as challenging common protocols, but without explicit quantitative results or direct comparison to the zero-shot claim, it is unclear whether this supports or qualifies the main generalization assertions.
  Authors: We will revise the evaluation section to present the sparsity analysis results more explicitly, including tables or figures with quantitative metrics across sparsity levels and direct side-by-side comparisons to the zero-shot performance on the main benchmarks. This will clarify how the robustness analysis supports the cross-domain generalization claims. Revision: yes.
Circularity Check
No significant circularity detected
Full rationale
The provided manuscript text and abstract describe an engineering contribution: a single-step late-fusion diffusion model (Marigold-SSD) obtained by targeted finetuning of a pre-trained diffusion prior, with inference cost reduced by removing test-time optimization. No equations, derivation steps, or self-citation chains are exhibited that reduce a claimed prediction or uniqueness result to a fitted parameter or prior author result by construction. The training cost (4.5 GPU days) and benchmark numbers are presented as empirical outcomes of the training procedure rather than quantities forced by the method's own definitions. The central design choice (shifting compute to finetuning) is explicitly stated and does not rely on a self-referential uniqueness theorem or ansatz smuggled via citation. The derivation chain is therefore self-contained against external benchmarks.