Efficient Test-Time Optimization for Depth Completion via Low-Rank Decoder Adaptation

Changick Kim; Jaehyuk Jang; Minseok Seo; Wonjun Lee

arxiv: 2603.01765 · v4 · submitted 2026-03-02 · 💻 cs.CV

Pith reviewed 2026-05-15 18:17 UTC · model grok-4.3

keywords depth completion · test-time adaptation · low-rank adaptation · zero-shot learning · decoder subspace · foundation models · efficient inference

The pith

Depth foundation models concentrate depth information in a low-dimensional decoder subspace, so updating only that subspace during test-time optimization is enough for strong zero-shot depth completion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that existing test-time methods for zero-shot depth completion either run slow diffusion iterations or repeatedly optimize input prompts through the entire frozen network. It shows instead that the models already pack the depth-relevant signals into a small decoder subspace. Updating only that low-rank part with sparse depth measurements produces accurate results at far lower cost. Experiments across five indoor and outdoor datasets confirm the approach reaches state-of-the-art accuracy while cutting inference time substantially.

Core claim

Depth foundation models concentrate depth-relevant information within a low-dimensional decoder subspace; therefore adapting only this subspace with sparse depth supervision suffices for effective test-time optimization and yields a new accuracy-efficiency Pareto frontier.

What carries the argument

Low-rank decoder adaptation that identifies and updates only the low-dimensional subspace holding depth-relevant features.
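
As a rough illustration of this mechanism (not the authors' implementation), the sketch below applies a LoRA-style update to a single hypothetical linear decoder layer in NumPy. The frozen weight `W` stays fixed, only the low-rank factors `A` and `B` are trained, and the loss is evaluated only at pixels that carry a sparse depth measurement. The function name `lora_adapt`, the squared-error loss, plain gradient descent, and all hyperparameters are assumptions made for illustration.

```python
import numpy as np

def lora_adapt(W, feats, sparse_depth, mask, rank=8, lr=1e-2, steps=200, seed=0):
    """Fit a low-rank update B @ A on top of a frozen weight W.

    W:            (d_out, d_in) frozen weight of a hypothetical linear decoder layer
    feats:        (n, d_in) per-pixel decoder features
    sparse_depth: (n, d_out) depth targets, only meaningful where mask is True
    mask:         (n,) boolean, True at pixels with a sparse depth measurement
    """
    rng = np.random.default_rng(seed)
    d_out, d_in = W.shape
    A = rng.normal(scale=0.01, size=(rank, d_in))  # trainable factor
    B = np.zeros((d_out, rank))                    # trainable factor, zero init so B @ A starts at 0
    x, y = feats[mask], sparse_depth[mask]         # supervise only on measured pixels
    m = max(len(x), 1)
    for _ in range(steps):
        pred = x @ (W + B @ A).T                   # (m, d_out)
        err = pred - y
        grad_delta = 2.0 * err.T @ x / m           # gradient of the mean squared error w.r.t. B @ A
        grad_B = grad_delta @ A.T                  # chain rule through the product B @ A
        grad_A = B.T @ grad_delta
        B -= lr * grad_B
        A -= lr * grad_A
    return A, B                                    # W itself is never modified
```

After adaptation, the effective weight is `W + B @ A`. Only `rank * (d_in + d_out)` values are trained instead of `d_in * d_out`, which is where a method of this kind would save parameters and update cost.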

If this is right

The method achieves state-of-the-art performance on five indoor and outdoor depth completion benchmarks.
It reduces the number of forward-backward passes compared with diffusion-based or prompt-optimization baselines.
It establishes a new accuracy-efficiency trade-off curve for test-time adaptation.
The approach enables practical real-time zero-shot depth completion without sensor-specific retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same subspace concentration may appear in other foundation models, allowing similar low-cost adaptation for tasks such as surface normal estimation or semantic segmentation.
If the subspace can be located with even fewer samples, the method could extend to single-image adaptation scenarios.
Hardware implementations could cache the adapted decoder weights for repeated use on similar scenes, further amortizing the one-time optimization cost.

Load-bearing premise

Depth-relevant information is concentrated in a low-dimensional decoder subspace that can be reliably identified and updated using only sparse depth supervision across diverse indoor and outdoor scenes.

What would settle it

A new scene or dataset where updating only the identified decoder subspace produces no accuracy gain over the frozen baseline while full-network or prompt optimization still improves results.

Figures

Figures reproduced from arXiv: 2603.01765 by Changick Kim, Jaehyuk Jang, Minseok Seo, Wonjun Lee.

**Figure 1.** We compare a training-based method (PromptDA [28]) with test-time optimization-based depth completion approaches [20, 43]. PromptDA requires sensor-specific training and achieves real-time inference, but suffers from large reconstruction error. Existing test-time optimization-based methods improve accuracy at the cost of several seconds of inference per image. In contrast, our method establishes a new Pareto front…

**Figure 2.** (a) Training-based depth completion relies on offline training with paired RGB–depth data. (b) Test-time optimization methods adapt either latent variables or visual prompts at inference time, incurring significant computational cost. (c) In contrast, our method adapts only the decoder low-dimensional subspace, which already encodes highly correlated depth structure, enabling efficient and fast test-time …

**Figure 3.** (a) Layer-wise correlation with the final depth output shows low correlation in the encoder and a sharp increase in the decoder. (b) PCA (PC1) visualizations indicate that decoder features already align closely with the final depth map, revealing strong depth information in a low-dimensional decoder subspace.

**Figure 4.** Efficiency and performance comparison of test-time adaptation strategies. Decoder-only LoRA minimizes trainable parameters and adaptation time, while achieving a favorable speed–accuracy trade-off.

**Figure 5.** Energy fraction captured by low-rank components of decoder weight updates. Most layers exhibit strongly low-rank structures, where rank r = 8 explains over 90% of the total energy.

**Figure 6.** Qualitative results on multiple datasets. For each dataset, the top row shows the error maps with respect to the ground truth, and the bottom row shows the corresponding depth predictions. Blue dashed boxes highlight representative regions for easier comparison; readers are encouraged to zoom in for detailed inspection.

**Figure 7.** Error maps and depth predictions over optimization iterations. The top row visualizes the error maps, while the bottom row presents the corresponding predicted depth maps.

**Figure 8.** Qualitative results on multiple datasets. For each dataset, the top row shows the error maps with respect to the ground truth, and the bottom row shows the corresponding depth predictions.

**Figure 1.** Trade-off between training-based and test-time opti…

**Figure 2.** Visual comparison of different adaptation paradigms.

Original abstract

Zero-shot depth completion has gained attention for its ability to generalize across environments without sensor-specific datasets or retraining. However, most existing approaches rely on diffusion-based test-time optimization, which is computationally expensive due to iterative denoising. Recent visual-prompt-based methods reduce training cost but still require repeated forward-backward passes through the full frozen network to optimize input-level prompts, resulting in slow inference. In this work, we show that adapting only the decoder is sufficient for effective test-time optimization, as depth foundation models concentrate depth-relevant information within a low-dimensional decoder subspace. Based on this insight, we propose a lightweight test-time adaptation method that updates only this low-dimensional subspace using sparse depth supervision. Our approach achieves state-of-the-art performance, establishing a new Pareto frontier between accuracy and efficiency for test-time adaptation. Extensive experiments on five indoor and outdoor datasets demonstrate consistent improvements over prior methods, highlighting the practicality of fast zero-shot depth completion.
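
One plausible way to write the objective the abstract describes is given below. It assumes a standard LoRA-style parameterization; the symbols ($W_0$, $A$, $B$, $\Omega$, $\ell$) are introduced here for illustration and the paper's own formulation may differ.

```latex
% Assumed notation (not taken from the paper):
%   I            RGB input image
%   \theta_enc   frozen encoder parameters
%   W_0          frozen decoder weight, W_0 \in \mathbb{R}^{d_{out} \times d_{in}}
%   B, A         trainable low-rank factors of rank r
%   \Omega       set of pixels with a sparse depth measurement d(p)
%   \ell         a per-pixel loss such as L1 or L2
\min_{A,\,B}\;
\frac{1}{|\Omega|} \sum_{p \in \Omega}
\ell\!\Big( f\big(I;\ \theta_{\mathrm{enc}},\ W_0 + BA\big)(p),\ d(p) \Big),
\qquad
B \in \mathbb{R}^{d_{out} \times r},\;
A \in \mathbb{R}^{r \times d_{in}},\;
r \ll \min(d_{in}, d_{out}).
```

Under this reading, the encoder and $W_0$ receive no updates, so each optimization step only needs gradients with respect to $A$ and $B$.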

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an efficient test-time optimization approach for zero-shot depth completion. It argues that depth foundation models concentrate depth-relevant information in a low-dimensional decoder subspace, allowing adaptation of only this subspace via low-rank updates driven by sparse depth supervision. The method is claimed to achieve state-of-the-art results on five indoor and outdoor datasets while establishing a superior accuracy-efficiency Pareto frontier compared to diffusion-based and prompt-based baselines.

Significance. If the core assumption holds, the work would meaningfully advance practical test-time adaptation for depth completion by reducing the computational burden of full-network or iterative denoising methods, enabling faster inference without sacrificing accuracy across diverse scenes.

major comments (2)

[Abstract] The central claim that depth foundation models concentrate depth-relevant information within a low-dimensional decoder subspace is presented without any described supporting analysis (e.g., activation statistics, subspace stability metrics across scenes, or direct ablation of decoder-only vs. encoder+decoder adaptation under identical sparse supervision). This assumption is load-bearing for the method's justification and the reported efficiency gains.
[§3, Method] No details are provided on how the low-rank subspace is identified or selected (e.g., whether it is determined post-hoc from the frozen model, via a fixed rank choice, or through a data-driven process), nor on error-bar controls or statistical significance for the SOTA claims across the five datasets. This leaves the central empirical support unverifiable from the given description.

minor comments (2)

[Abstract] The abstract mentions 'consistent improvements' but does not specify the exact metrics or baselines used for the Pareto frontier comparison.
[§3] Notation for the low-rank adaptation (e.g., definition of the subspace projection or update rule) should be introduced earlier for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We agree that additional supporting analysis and methodological details will strengthen the manuscript. Below we respond point-by-point to the major comments and indicate the revisions we will make.

Referee: [Abstract] The central claim that depth foundation models concentrate depth-relevant information within a low-dimensional decoder subspace is presented without any described supporting analysis (e.g., activation statistics, subspace stability metrics across scenes, or direct ablation of decoder-only vs. encoder+decoder adaptation under identical sparse supervision). This assumption is load-bearing for the method's justification and the reported efficiency gains.

Authors: We acknowledge that the abstract presents the core insight concisely without explicit supporting analysis. In the revised manuscript we will expand the abstract slightly and, more importantly, add a dedicated subsection in §3 (and corresponding figures in the main paper or supplementary material) that reports activation statistics across decoder layers, subspace stability metrics computed over multiple scenes, and a direct ablation comparing decoder-only low-rank adaptation versus full encoder+decoder adaptation under the same sparse supervision budget. These additions will make the load-bearing assumption verifiable and will be referenced from the abstract.
revision: yes

Referee: [§3, Method] No details are provided on how the low-rank subspace is identified or selected (e.g., whether it is determined post-hoc from the frozen model, via a fixed rank choice, or through a data-driven process), nor on error-bar controls or statistical significance for the SOTA claims across the five datasets. This leaves the central empirical support unverifiable from the given description.

Authors: We agree that the current description of subspace identification is insufficient. In the revision we will clarify that the low-rank subspace is identified post-hoc from the frozen decoder weights via a data-driven singular-value analysis performed once on a small calibration set of depth maps; the rank is then chosen to retain 95% of the explained variance in the decoder activations. We will also add error bars (standard deviation over three random seeds) and report p-values for the SOTA comparisons on all five datasets in the experimental tables and text of §4.
revision: yes
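
The rank-selection rule described in this response can be sketched with a singular value decomposition. The snippet below is an illustrative assumption rather than the authors' procedure: it treats "energy" or "explained variance" as the sum of squared singular values of a matrix `M` (for example, a stack of decoder activations or a weight update), and the function names `energy_fraction` and `rank_for_energy` are hypothetical.

```python
import numpy as np

def energy_fraction(M, r):
    """Fraction of squared singular-value energy captured by the top-r components of M."""
    s = np.linalg.svd(M, compute_uv=False)   # singular values, sorted in descending order
    energy = s ** 2
    return float(energy[:r].sum() / energy.sum())

def rank_for_energy(M, threshold=0.95):
    """Smallest rank whose cumulative energy fraction reaches the threshold.

    Assumes M is not all zeros, otherwise the normalization divides by zero.
    """
    s = np.linalg.svd(M, compute_uv=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    idx = int(np.searchsorted(cumulative, threshold))  # first index with cumulative >= threshold
    return min(idx + 1, len(s))                        # clamp in case of floating-point shortfall
```

For instance, calling `energy_fraction(M, 8)` on each layer's matrix would produce the kind of per-layer number that a plot like Figure 5 summarizes, under the same squared-singular-value assumption.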

Circularity Check

0 steps flagged

No significant circularity detected in the derivation chain

The paper presents the concentration of depth-relevant information in a low-dimensional decoder subspace as an empirical insight motivating decoder-only adaptation. No quoted derivation, equation, or self-citation reduces the central claim to fitted inputs, self-definitions, or prior author results by construction. Experiments on five datasets provide external validation of the method's performance, keeping the chain self-contained without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that depth information concentrates in a low-dimensional decoder subspace; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

domain assumption: Depth foundation models concentrate depth-relevant information within a low-dimensional decoder subspace.
This is the load-bearing insight stated directly in the abstract as the basis for the adaptation method.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Rethinking Electro-Optical Vision Foundation Models for Remote Sensing Retrieval: A Controlled Comparison with Generalist VFM
cs.CV 2026-05 unverdicted novelty 5.0

Strong generalist vision foundation models match or outperform electro-optical specific models in remote sensing retrieval with better cross-scene stability.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · cited by 1 Pith paper · 5 internal anchors

[1] Bao, H., Dong, L., Piao, S., Wei, F.: Beit: Bert pre-training of image transformers. In: ICLR (2022)
[2] Bartolomei, L., Poggi, M., Conti, A., Tosi, F., Mattoccia, S.: Revisiting depth completion from a stereo matching perspective for cross-domain generalization. In: 3DV. pp. 1360–1370. IEEE (2024)
[3] Bhat, S.F., Alhashim, I., Wonka, P.: Adabins: Depth estimation using adaptive bins. In: CVPR. pp. 4009–4018 (2021)
[4] Bhat, S.F., Alhashim, I., Wonka, P.: Localbins: Improving depth estimation by learning local distributions. In: ECCV. pp. 480–496. Springer (2022)
[5] Birkl, R., Wofk, D., Müller, M.: Midas v3.1 – a model zoo for robust monocular relative depth estimation. arXiv preprint arXiv:2307.14460 (2023)
[6] Bochkovskii, A., Delaunoy, A., Germain, H., Santos, M., Zhou, Y., Richter, S.R., Koltun, V.: Depth pro: Sharp monocular metric depth in less than a second. arXiv preprint arXiv:2410.02073 (2024)
[7] Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: ICCV. pp. 9650–9660 (2021)
[8] Chen, X., Chen, X., Zha, Z.J.: Structure-aware residual pyramid network for monocular depth estimation. arXiv preprint arXiv:1907.06023 (2019)
[9] Chodosh, N., Wang, C., Lucey, S.: Deep convolutional compressed sensing for lidar depth completion. In: ACCV. pp. 499–513. Springer (2018)
[10] Conti, A., Poggi, M., Mattoccia, S.: Sparsity agnostic depth completion. In: WACV. pp. 5871–5880 (2023)
[11] Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. NeurIPS 27 (2014)
[12] Fu, H., Gong, M., Wang, C., Batmanghelich, K., Tao, D.: Deep ordinal regression network for monocular depth estimation. In: CVPR. pp. 2002–2011 (2018)
[13] Fu, X., Yin, W., Hu, M., Wang, K., Ma, Y., Tan, P., Shen, S., Lin, D., Long, X.: Geowizard: Unleashing the diffusion priors for 3d geometry estimation from a single image. In: ECCV. pp. 241–258. Springer (2024)
[14] Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: The kitti dataset. The international journal of robotics research 32(11), 1231–1237 (2013)
[15] Guizilini, V., Ambrus, R., Pillai, S., Raventos, A., Gaidon, A.: 3d packing for self-supervised monocular depth estimation. In: CVPR. pp. 2485–2494 (2020)
[16] Hao, Z., Li, Y., You, S., Lu, F.: Detail preserving depth estimation from a single image using attention guided networks. In: 3DV. pp. 304–313. IEEE (2018)
[17] He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: CVPR (2022)
[18] Hu, J., Ozay, M., Zhang, Y., Okatani, T.: Revisiting single image depth estimation: Toward higher resolution maps with accurate object boundaries. In: WACV. pp. 1043–1051. IEEE (2019)
[19] Hyoseok, L., Kim, K.S., Byung-Ki, K., Oh, T.H.: Zero-shot depth completion via test-time alignment with affine-invariant depth prior. In: AAAI (2025)
[20] Jeong, C., Bae, I., Park, J.H., Jeon, H.G.: Test-time prompt tuning for zero-shot depth completion. In: ICCV. pp. 9443–9454 (2025)
[21] Jun, J., Lee, J.H., Lee, C., Kim, C.S.: Depth map decomposition for monocular depth estimation. In: ECCV. pp. 18–34. Springer (2022)
[22] Ke, B., Obukhov, A., Huang, S., Metzger, N., Daudt, R.C., Schindler, K.: Repurposing diffusion-based image generators for monocular depth estimation. In: CVPR. pp. 9492–9502 (2024)
[23] Koch, T., Liebel, L., Fraundorfer, F., Korner, M.: Evaluation of cnn-based single-image depth estimation methods. In: Proceedings of the European Conference on Computer Vision Workshop (ECCVW). pp. 0–0 (2018)
[24] Laina, I., Rupprecht, C., Belagiannis, V., Tombari, F., Navab, N.: Deeper depth prediction with fully convolutional residual networks. In: 3DV. pp. 239–248. IEEE (2016)
[25] Lee, J.H., Han, M.K., Ko, D.W., Suh, I.H.: From big to small: Multi-scale local planar guidance for monocular depth estimation. arXiv preprint arXiv:1907.10326 (2019)
[26] Leroy, V., Cabon, Y., Revaud, J.: Grounding image matching in 3d with mast3r. In: ECCV. pp. 71–91. Springer (2024)
[27] Lin, H., Chen, S., Liew, J., Chen, D.Y., Li, Z., Shi, G., Feng, J., Kang, B.: Depth anything 3: Recovering the visual space from any views. arXiv preprint arXiv:2511.10647 (2025)
[28] Lin, H., Peng, S., Chen, J., Peng, S., Sun, J., Liu, M., Bao, H., Feng, J., Zhou, X., Kang, B.: Prompting depth anything for 4k resolution accurate metric depth estimation. In: CVPR. pp. 17070–17080 (2025)
[29] Liu, Z., Cheng, K.L., Wang, Q., Wang, S., Ouyang, H., Tan, B., Zhu, K., Shen, Y., Chen, Q., Luo, P.: Depthlab: From partial to complete. arXiv preprint arXiv:2412.18153 (2024)
[30] Ma, F., Karaman, S.: Sparse-to-dense: Depth prediction from sparse depth samples and a single image. In: ICRA. pp. 4796–4803. IEEE (2018)
[31] Montúfar, G., Pascanu, R., Cho, K., Bengio, Y.: On the number of linear regions of deep neural networks. Advances in neural information processing systems 27 (2014)
[32] Niu, S., Wu, J., Zhang, Y., Wen, Z., Chen, Y., Zhao, P., Tan, M.: Towards stable test-time adaptation in dynamic wild world. arXiv preprint arXiv:2302.12400 (2023)
[33] Oquab, M., Darcet, T., Moutakanni, T., Vo, H.V., Szafraniec, M., Khalidov, V., Fernandez, P., HAZIZA, D., Massa, F., El-Nouby, A., Assran, M., Ballas, N., Galuba, W., Howes, R., Huang, P.Y., Li, S.W., Misra, I., Rabbat, M., Sharma, V., Synnaeve, G., Xu, H., Jegou, H., Mairal, J., Labatut, P., Joulin, A., Bojanowski, P.: DINOv2: Learning robust visual feat... TMLR (2024)
[34] Park, J., Joo, K., Hu, Z., Liu, C.K., So Kweon, I.: Non-local spatial propagation network for depth completion. In: ECCV. pp. 120–136. Springer (2020)
[35] Piccinelli, L., Sakaridis, C., Yang, Y.H., Segu, M., Li, S., Abbeloos, W., Van Gool, L.: Unidepthv2: Universal monocular metric depth estimation made simpler. arXiv preprint arXiv:2502.20110 (2025)
[36] Piccinelli, L., Yang, Y.H., Sakaridis, C., Segu, M., Li, S., Van Gool, L., Yu, F.: Unidepth: Universal monocular metric depth estimation. In: CVPR. pp. 10106–10116 (2024)
[37] Ren, Z., Zhang, Z., Li, W., Liu, Q., Tang, H.: Anydepth: Depth estimation made easy. arXiv preprint arXiv:2601.02760 (2026)
[38] Seo, M., Hamilton, M., Kim, C.: Upsample anything: A simple and hard to beat baseline for feature upsampling. arXiv preprint arXiv:2511.16301 (2025)
[39] Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from rgbd images. In: ECCV. pp. 746–760. Springer (2012)
[40] Siméoni, O., Vo, H.V., Seitzer, M., Baldassarre, F., Oquab, M., Jose, C., Khalidov, V., Szafraniec, M., Yi, S., Ramamonjisoa, M., et al.: Dinov3. arXiv preprint arXiv:2508.10104 (2025)
[41] Tang, J., Tian, F.P., An, B., Li, J., Tan, P.: Bilateral propagation network for depth completion. In: CVPR. pp. 9763–9772 (2024)
[42] Uhrig, J., Schneider, N., Schneider, L., Franke, U., Brox, T., Geiger, A.: Sparsity invariant cnns. In: 3DV. pp. 11–20. IEEE (2017)
[43] Viola, M., Qu, K., Metzger, N., Ke, B., Becker, A., Schindler, K., Obukhov, A.: Marigold-dc: Zero-shot monocular depth completion with guided diffusion. In: ICCV. pp. 5359–5370 (2025)
[44] Wang, J., Chen, M., Karaev, N., Vedaldi, A., Rupprecht, C., Novotny, D.: Vggt: Visual geometry grounded transformer. In: CVPR. pp. 5294–5306 (2025)
[45] Wong, A., Fei, X., Tsuei, S., Soatto, S.: Unsupervised depth completion from visual inertial odometry. IEEE Robotics and Automation Letters (RA-L) 5(2), 1899–1906 (2020)
[46] Yang, L., Kang, B., Huang, Z., Xu, X., Feng, J., Zhao, H.: Depth anything: Unleashing the power of large-scale unlabeled data. In: CVPR. pp. 10371–10381 (2024)
[47] Yang, L., Kang, B., Huang, Z., Zhao, Z., Xu, X., Feng, J., Zhao, H.: Depth anything v2. NeurIPS 37, 21875–21911 (2024)
[48] Yin, W., Liu, Y., Shen, C., Yan, Y.: Enforcing geometric constraints of virtual normal for depth prediction. In: ICCV. pp. 5684–5693 (2019)
[49] Yin, W., Zhang, C., Chen, H., Cai, Z., Yu, G., Wang, K., Chen, X., Shen, C.: Metric3d: Towards zero-shot metric 3d prediction from a single image. In: ICCV. pp. 9043–9053 (2023)
[50] Yuan, W., Gu, X., Dai, Z., Zhu, S., Tan, P.: New crfs: Neural window fully-connected crfs for monocular depth estimation. arXiv preprint arXiv:2203.01502 (2022)
[51] Zhang, Y., Guo, X., Poggi, M., Zhu, Z., Huang, G., Mattoccia, S.: Completionformer: Depth completion with convolutions and vision transformers. In: CVPR. pp. 18527–18536 (2023)
[52] Zhou, J., Wei, C., Wang, H., Shen, W., Xie, C., Yuille, A., Kong, T.: ibot: Image bert pre-training with online tokenizer. In: ICLR (2022)
[53] Zuo, Y., Deng, J.: Ogni-dc: Robust depth completion with optimization-guided neural iterations. In: ECCV. pp. 78–95. Springer (2024)

[1] [1]

In: ICLR (2022)

Bao, H., Dong, L., Piao, S., Wei, F.: Beit: Bert pre-training of image transformers. In: ICLR (2022)

work page 2022

[2] [2]

Bartolomei, L., Poggi, M., Conti, A., Tosi, F., Mattoccia, S.: Revisiting depth completion from a stereo matching perspective for cross-domain generalization. In: 3DV. pp. 1360–1370. IEEE (2024)

work page 2024

[3] [3]

In: CVPR

Bhat, S.F., Alhashim, I., Wonka, P.: Adabins: Depth estimation using adaptive bins. In: CVPR. pp. 4009–4018 (2021)

work page 2021

[4] [4]

In: ECCV

Bhat, S.F., Alhashim, I., Wonka, P.: Localbins: Improving depth estimation by learning local distributions. In: ECCV. pp. 480–496. Springer (2022)

work page 2022

[5] [5]

Midas v3

Birkl, R., Wofk, D., Müller, M.: Midas v3. 1–a model zoo for robust monocular relative depth estimation. arXiv preprint arXiv:2307.14460 (2023)

work page arXiv 2023

[6] [6]

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

Bochkovskii, A., Delaunoy, A., Germain, H., Santos, M., Zhou, Y., Richter, S.R., Koltun, V.: Depth pro: Sharp monocular metric depth in less than a second. arXiv preprint arXiv:2410.02073 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[7] [7]

In: ICCV

Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: ICCV. pp. 9650– 9660 (2021)

work page 2021

[8] [8]

Structure-Aware Residual Pyramid Network for Monocular Depth Estimation

Chen, X., Chen, X., Zha, Z.J.: Structure-aware residual pyramid network for monocular depth estimation. arXiv preprint arXiv:1907.06023 (2019)

work page internal anchor Pith review Pith/arXiv arXiv 1907

[9] [9]

In: ACCV

Chodosh, N., Wang, C., Lucey, S.: Deep convolutional compressed sensing for lidar depth completion. In: ACCV. pp. 499–513. Springer (2018)

work page 2018

[10] [10]

Conti,A.,Poggi,M.,Mattoccia,S.:Sparsityagnosticdepthcompletion.In:WACV. pp. 5871–5880 (2023) Depth in One Rank 25

work page 2023

[11] [11]

NeurIPS27(2014)

Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. NeurIPS27(2014)

work page 2014

[12] [12]

In: CVPR

Fu, H., Gong, M., Wang, C., Batmanghelich, K., Tao, D.: Deep ordinal regression network for monocular depth estimation. In: CVPR. pp. 2002–2011 (2018)

work page 2002

[13] [13]

In: ECCV

Fu, X., Yin, W., Hu, M., Wang, K., Ma, Y., Tan, P., Shen, S., Lin, D., Long, X.: Geowizard: Unleashing the diffusion priors for 3d geometry estimation from a single image. In: ECCV. pp. 241–258. Springer (2024)

work page 2024

[14] [14]

The international journal of robotics research32(11), 1231–1237 (2013)

Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: The kitti dataset. The international journal of robotics research32(11), 1231–1237 (2013)

work page 2013

[15] [15]

In: CVPR

Guizilini, V., Ambrus, R., Pillai, S., Raventos, A., Gaidon, A.: 3d packing for self-supervised monocular depth estimation. In: CVPR. pp. 2485–2494 (2020)

work page 2020

[16] [16]

Hao, Z., Li, Y., You, S., Lu, F.: Detail preserving depth estimation from a single image using attention guided networks. In: 3DV. pp. 304–313. IEEE (2018)

work page 2018

[17] [17]

In: CVPR (2022)

He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: CVPR (2022)

work page 2022

[18] [18]

In: WACV

Hu, J., Ozay, M., Zhang, Y., Okatani, T.: Revisiting single image depth estimation: Toward higher resolution maps with accurate object boundaries. In: WACV. pp. 1043–1051. IEEE (2019)

work page 2019

[19] [19]

In: AAAI (2025)

Hyoseok, L., Kim, K.S., Byung-Ki, K., Oh, T.H.: Zero-shot depth completion via test-time alignment with affine-invariant depth prior. In: AAAI (2025)

work page 2025

[20] [20]

In: ICCV

Jeong, C., Bae, I., Park, J.H., Jeon, H.G.: Test-time prompt tuning for zero-shot depth completion. In: ICCV. pp. 9443–9454 (2025)

work page 2025

[21] [21]

In: ECCV

Jun, J., Lee, J.H., Lee, C., Kim, C.S.: Depth map decomposition for monocular depth estimation. In: ECCV. pp. 18–34. Springer (2022)

work page 2022

[22] [22]

In: CVPR

Ke, B., Obukhov, A., Huang, S., Metzger, N., Daudt, R.C., Schindler, K.: Re- purposing diffusion-based image generators for monocular depth estimation. In: CVPR. pp. 9492–9502 (2024)

work page 2024

[23] [23]

In: Proceedings of the European Conference on Computer Vision Workshop (ECCVW)

Koch, T., Liebel, L., Fraundorfer, F., Korner, M.: Evaluation of cnn-based single- image depth estimation methods. In: Proceedings of the European Conference on Computer Vision Workshop (ECCVW). pp. 0–0 (2018)

work page 2018

[24] [24]

Laina, I., Rupprecht, C., Belagiannis, V., Tombari, F., Navab, N.: Deeper depth prediction with fully convolutional residual networks. In: 3DV. pp. 239–248. IEEE (2016)

work page 2016

[25] [25]

From big to small: Multi-scale local planar guidance for monocular depth estimation,

Lee, J.H., Han, M.K., Ko, D.W., Suh, I.H.: From big to small: Multi-scale local planar guidance for monocular depth estimation. arXiv preprint arXiv:1907.10326 (2019)

work page arXiv 1907

[26] [26]

In: ECCV

Leroy, V., Cabon, Y., Revaud, J.: Grounding image matching in 3d with mast3r. In: ECCV. pp. 71–91. Springer (2024)

work page 2024

[27] [27]

Depth Anything 3: Recovering the Visual Space from Any Views

Lin, H., Chen, S., Liew, J., Chen, D.Y., Li, Z., Shi, G., Feng, J., Kang, B.: Depth anything 3: Recovering the visual space from any views. arXiv preprint arXiv:2511.10647 (2025)

[28] Lin, H., Peng, S., Chen, J., Peng, S., Sun, J., Liu, M., Bao, H., Feng, J., Zhou, X., Kang, B.: Prompting depth anything for 4k resolution accurate metric depth estimation. In: CVPR. pp. 17070–17080 (2025)

[29] Liu, Z., Cheng, K.L., Wang, Q., Wang, S., Ouyang, H., Tan, B., Zhu, K., Shen, Y., Chen, Q., Luo, P.: Depthlab: From partial to complete. arXiv preprint arXiv:2412.18153 (2024)

[30] Ma, F., Karaman, S.: Sparse-to-dense: Depth prediction from sparse depth samples and a single image. In: ICRA. pp. 4796–4803. IEEE (2018)

[31] Montúfar, G., Pascanu, R., Cho, K., Bengio, Y.: On the number of linear regions of deep neural networks. Advances in neural information processing systems 27 (2014)

[32] Niu, S., Wu, J., Zhang, Y., Wen, Z., Chen, Y., Zhao, P., Tan, M.: Towards stable test-time adaptation in dynamic wild world. arXiv preprint arXiv:2302.12400 (2023)

[33] Oquab, M., Darcet, T., Moutakanni, T., Vo, H.V., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., Assran, M., Ballas, N., Galuba, W., Howes, R., Huang, P.Y., Li, S.W., Misra, I., Rabbat, M., Sharma, V., Synnaeve, G., Xu, H., Jegou, H., Mairal, J., Labatut, P., Joulin, A., Bojanowski, P.: DINOv2: Learning robust visual feat... TMLR (2024)

[34] Park, J., Joo, K., Hu, Z., Liu, C.K., So Kweon, I.: Non-local spatial propagation network for depth completion. In: ECCV. pp. 120–136. Springer (2020)

[35] Piccinelli, L., Sakaridis, C., Yang, Y.H., Segu, M., Li, S., Abbeloos, W., Van Gool, L.: Unidepthv2: Universal monocular metric depth estimation made simpler. arXiv preprint arXiv:2502.20110 (2025)

[36] Piccinelli, L., Yang, Y.H., Sakaridis, C., Segu, M., Li, S., Van Gool, L., Yu, F.: Unidepth: Universal monocular metric depth estimation. In: CVPR. pp. 10106–10116 (2024)

[37] Ren, Z., Zhang, Z., Li, W., Liu, Q., Tang, H.: Anydepth: Depth estimation made easy. arXiv preprint arXiv:2601.02760 (2026)

[38] Seo, M., Hamilton, M., Kim, C.: Upsample anything: A simple and hard to beat baseline for feature upsampling. arXiv preprint arXiv:2511.16301 (2025)

[39] Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from rgbd images. In: ECCV. pp. 746–760. Springer (2012)

[40] Siméoni, O., Vo, H.V., Seitzer, M., Baldassarre, F., Oquab, M., Jose, C., Khalidov, V., Szafraniec, M., Yi, S., Ramamonjisoa, M., et al.: Dinov3. arXiv preprint arXiv:2508.10104 (2025)

[41] Tang, J., Tian, F.P., An, B., Li, J., Tan, P.: Bilateral propagation network for depth completion. In: CVPR. pp. 9763–9772 (2024)

[42] Uhrig, J., Schneider, N., Schneider, L., Franke, U., Brox, T., Geiger, A.: Sparsity invariant cnns. In: 3DV. pp. 11–20. IEEE (2017)

[43] Viola, M., Qu, K., Metzger, N., Ke, B., Becker, A., Schindler, K., Obukhov, A.: Marigold-dc: Zero-shot monocular depth completion with guided diffusion. In: ICCV. pp. 5359–5370 (2025)

[44] Wang, J., Chen, M., Karaev, N., Vedaldi, A., Rupprecht, C., Novotny, D.: Vggt: Visual geometry grounded transformer. In: CVPR. pp. 5294–5306 (2025)

[45] Wong, A., Fei, X., Tsuei, S., Soatto, S.: Unsupervised depth completion from visual inertial odometry. IEEE Robotics and Automation Letters (RA-L) 5(2), 1899–1906 (2020)

[46] Yang, L., Kang, B., Huang, Z., Xu, X., Feng, J., Zhao, H.: Depth anything: Unleashing the power of large-scale unlabeled data. In: CVPR. pp. 10371–10381 (2024)

[47] Yang, L., Kang, B., Huang, Z., Zhao, Z., Xu, X., Feng, J., Zhao, H.: Depth anything v2. NeurIPS 37, 21875–21911 (2024)

[48] Yin, W., Liu, Y., Shen, C., Yan, Y.: Enforcing geometric constraints of virtual normal for depth prediction. In: ICCV. pp. 5684–5693 (2019)

[49] Yin, W., Zhang, C., Chen, H., Cai, Z., Yu, G., Wang, K., Chen, X., Shen, C.: Metric3d: Towards zero-shot metric 3d prediction from a single image. In: ICCV. pp. 9043–9053 (2023)

[50] Yuan, W., Gu, X., Dai, Z., Zhu, S., Tan, P.: New crfs: Neural window fully-connected crfs for monocular depth estimation. arXiv preprint arXiv:2203.01502 (2022)

[51] Zhang, Y., Guo, X., Poggi, M., Zhu, Z., Huang, G., Mattoccia, S.: Completionformer: Depth completion with convolutions and vision transformers. In: CVPR. pp. 18527–18536 (2023)

[52] Zhou, J., Wei, C., Wang, H., Shen, W., Xie, C., Yuille, A., Kong, T.: ibot: Image bert pre-training with online tokenizer. In: ICLR (2022)

[53] Zuo, Y., Deng, J.: Ogni-dc: Robust depth completion with optimization-guided neural iterations. In: ECCV. pp. 78–95. Springer (2024)
