arxiv: 2603.01765 · v4 · submitted 2026-03-02 · 💻 cs.CV

Efficient Test-Time Optimization for Depth Completion via Low-Rank Decoder Adaptation

Pith reviewed 2026-05-15 18:17 UTC · model grok-4.3

classification 💻 cs.CV
keywords depth completion · test-time adaptation · low-rank adaptation · zero-shot learning · decoder subspace · foundation models · efficient inference

The pith

Depth foundation models concentrate depth information in a low-dimensional decoder subspace, so updating only that subspace during test-time optimization is enough for strong zero-shot depth completion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that existing test-time methods for zero-shot depth completion either run slow diffusion iterations or repeatedly optimize input prompts through the entire frozen network. It shows instead that the models already pack the depth-relevant signals into a small decoder subspace. Updating only that low-rank part with sparse depth measurements produces accurate results at far lower cost. Experiments across five indoor and outdoor datasets confirm the approach reaches state-of-the-art accuracy while cutting inference time substantially.

Core claim

Depth foundation models concentrate depth-relevant information within a low-dimensional decoder subspace; therefore adapting only this subspace with sparse depth supervision suffices for effective test-time optimization and yields a new accuracy-efficiency Pareto frontier.

What carries the argument

Low-rank decoder adaptation that identifies and updates only the low-dimensional subspace holding depth-relevant features.
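
A minimal sketch of what decoder-only low-rank adaptation could look like, assuming a LoRA-style update W + BA on a single linear decoder layer (Figure 4 names "Decoder-only LoRA"; everything else here is illustrative). The toy decoder, its dimensions, the squared-error loss, the learning rate, and the helper name `adapt_decoder_lowrank` are assumptions, not the paper's implementation. Encoder features are treated as fixed inputs, and only the factors A and B are updated from the sparse depth points.

```python
import numpy as np


def adapt_decoder_lowrank(feats, W, v, sparse_idx, sparse_depth,
                          rank=2, lr=0.05, steps=300, seed=0):
    """Fit low-rank factors (B, A) so that feats @ (W + B @ A).T @ v
    matches the sparse depth samples. W and v stay frozen.
    Hypothetical helper, for illustration only."""
    rng = np.random.default_rng(seed)
    h, d = W.shape
    A = rng.standard_normal((rank, d))   # LoRA-style init: A random ...
    B = np.zeros((h, rank))              # ... B zero, so the update starts at 0
    x = feats[sparse_idx]                # only pixels that have a measurement
    m = len(sparse_idx)
    losses = []
    for _ in range(steps):
        pred = (x @ (W + B @ A).T) @ v
        err = pred - sparse_depth
        losses.append(float(np.mean(err ** 2)))
        # Gradient of the mean squared error w.r.t. the effective weight.
        G = (2.0 / m) * np.outer(v, err @ x)   # shape (h, d)
        grad_B = G @ A.T
        grad_A = B.T @ G
        B -= lr * grad_B
        A -= lr * grad_A
    final = (x @ (W + B @ A).T) @ v - sparse_depth
    losses.append(float(np.mean(final ** 2)))
    return A, B, losses


rng = np.random.default_rng(0)
n, d, h = 400, 16, 8
feats = rng.standard_normal((n, d)) / np.sqrt(d)  # stand-in for frozen encoder features
W = rng.standard_normal((h, d))                   # frozen decoder weight
v = rng.standard_normal(h) / np.sqrt(h)           # frozen output head
W_before = W.copy()

base = (feats @ W.T) @ v                # prediction of the frozen model
true_depth = 1.5 * base + 0.3           # synthetic target with a scale/shift mismatch
sparse_idx = rng.choice(n, size=20, replace=False)

A, B, losses = adapt_decoder_lowrank(feats, W, v, sparse_idx, true_depth[sparse_idx])
print(f"sparse-point MSE: {losses[0]:.4f} -> {losses[-1]:.4f}")
print("trainable params:", A.size + B.size, "vs full layer:", W.size)
```

In this toy setting the frozen weight is never modified, and the adapter holds 48 values against 128 for the full layer; real decoders are far larger, so the gap would be correspondingly wider.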

If this is right

  • The method achieves state-of-the-art performance on five indoor and outdoor depth completion benchmarks.
  • It lowers per-image inference cost relative to diffusion-based baselines, which iterate denoising steps, and prompt-optimization baselines, which repeat forward-backward passes through the full frozen network.
  • It establishes a new accuracy-efficiency trade-off curve for test-time adaptation.
  • The approach makes fast zero-shot depth completion practical without sensor-specific retraining.
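
To make the "accuracy-efficiency trade-off curve" concrete, here is a small sketch of how a Pareto frontier over (inference time, error) pairs can be computed. The method names and numbers are placeholders chosen for illustration; they are not results from the paper.

```python
def pareto_frontier(points):
    """Return the names of methods that are not dominated in (time, error).
    A method is dominated if another one is at least as good on both axes
    and strictly better on at least one. Lower is better on both axes."""
    frontier = []
    for name, t, e in points:
        dominated = any(
            (t2 <= t and e2 <= e) and (t2 < t or e2 < e)
            for _, t2, e2 in points
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Placeholder (seconds per image, error) pairs; NOT taken from the paper.
methods = [
    ("method_a", 0.05, 0.40),
    ("method_b", 4.00, 0.15),
    ("method_c", 6.00, 0.18),
    ("method_d", 0.60, 0.12),
]
front = pareto_frontier(methods)
print(front)
```

A method "establishes a new frontier" in this sense when adding it to the set removes previously non-dominated methods from the result, as `method_d` does to `method_b` and `method_c` here.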

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same subspace concentration may appear in other foundation models, allowing similar low-cost adaptation for tasks such as surface normal estimation or semantic segmentation.
  • If the subspace can be located with even fewer samples, the method could extend to single-image adaptation scenarios.
  • Hardware implementations could cache the adapted decoder weights for repeated use on similar scenes, further amortizing the one-time optimization cost.

Load-bearing premise

Depth-relevant information is concentrated in a low-dimensional decoder subspace that can be reliably identified and updated using only sparse depth supervision across diverse indoor and outdoor scenes.
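
One way such a premise could be probed, loosely following the diagnostic described in the Figure 3 caption (correlation of features with the final depth, and PC1 visualizations), is sketched here. The synthetic "encoder-like" and "decoder-like" features and the helper `pc1_depth_correlation` are assumptions for illustration; the paper's actual measurement procedure is not given on this page.

```python
import numpy as np


def pc1_depth_correlation(features, depth):
    """Absolute Pearson correlation between the first principal component
    of per-pixel features (n_pixels, channels) and a depth vector (n_pixels,)."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = centered @ vt[0]
    return float(abs(np.corrcoef(pc1, depth)[0, 1]))


rng = np.random.default_rng(0)
n, c = 2000, 8
depth = rng.uniform(0.5, 10.0, size=n)

# Toy "encoder-like" features: statistically unrelated to depth.
encoder_like = rng.standard_normal((n, c))
# Toy "decoder-like" features: depth along one direction plus small noise.
direction = rng.standard_normal(c)
decoder_like = np.outer(depth, direction) + 0.1 * rng.standard_normal((n, c))

corr_enc = pc1_depth_correlation(encoder_like, depth)
corr_dec = pc1_depth_correlation(decoder_like, depth)
print(f"encoder-like: {corr_enc:.2f}, decoder-like: {corr_dec:.2f}")
```

A high value only shows that one dominant feature direction tracks depth in the sampled scenes; it does not by itself show that adapting that subspace is sufficient, which is what the premise requires.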

What would settle it

A new scene or dataset where updating only the identified decoder subspace produces no accuracy gain over the frozen baseline while full-network or prompt optimization still improves results.

Figures

Figures reproduced from arXiv: 2603.01765 by Changick Kim, Jaehyuk Jang, Minseok Seo, Wonjun Lee.

Figure 1: We compare a training-based method (PromptDA [28]) with test-time optimization-based depth completion approaches [20, 43]. PromptDA requires sensor-specific training and achieves real-time inference, but suffers from large reconstruction error. Existing test-time optimization-based methods improve accuracy at the cost of several seconds of inference per image. In contrast, our method establishes a new Pareto front…
Figure 2: (a) Training-based depth completion relies on offline training with paired RGB–depth data. (b) Test-time optimization methods adapt either latent variables or visual prompts at inference time, incurring significant computational cost. (c) In contrast, our method adapts only the decoder low-dimensional subspace, which already encodes highly correlated depth structure, enabling efficient and fast test-time …
Figure 3: (a) Layer-wise correlation with the final depth output shows low correlation in the encoder and a sharp increase in the decoder. (b) PCA (PC1) visualizations indicate that decoder features already align closely with the final depth map, revealing strong depth information in a low-dimensional decoder subspace.
side broader efforts [35, 37]. Nevertheless, current models remain fragile under severe domain sh…
Figure 4: Efficiency and performance comparison of test-time adaptation strategies. Decoder-only LoRA minimizes trainable parameters and adaptation time, while achieving a favorable speed–accuracy trade-off.
(DFM) given an RGB input, which is then refined using sparse depth as supervision. However, existing TTO approaches require multiple iterations of full forward passes and parameter updates at inference time, i…
Figure 5: Energy fraction captured by low-rank components of decoder weight updates. Most layers exhibit strongly low-rank structures, where rank r = 8 explains over 90% of the total energy.
Figure 6: Qualitative results on multiple datasets. For each dataset, the top row shows the error maps with respect to the ground truth, and the bottom row shows the corresponding depth predictions. Blue dashed boxes highlight representative regions for easier comparison; readers are encouraged to zoom in for detailed inspection.
Impact of Adaptation Strategies. We conduct an ablation study on the NYUv2 dataset to …
Figure 7: Error maps and depth predictions over optimization iterations. The top row visualizes the error maps, while the bottom row presents the corresponding predicted depth maps.
features while keeping the depth prediction function fixed, our approach operates in the decoder parameter space that governs metric scale and geometric structure. This enables more direct resolution of scale inconsistency and structur…
Figure 8: Qualitative results on multiple datasets. For each dataset, the top row shows the error maps with respect to the ground truth, and the bottom row shows the corresponding depth predictions.
preserve object boundaries more reliably and show fewer severe local artifacts than prior zero-shot baselines. We further observe that Marigold-DC can become unstable on some samples, occasionally producing visibly degr…
Figure 1: Trade-off between training-based and test-time opti…
Figure 2: Visual comparison of different adaptation paradigms.
Original abstract

Zero-shot depth completion has gained attention for its ability to generalize across environments without sensor-specific datasets or retraining. However, most existing approaches rely on diffusion-based test-time optimization, which is computationally expensive due to iterative denoising. Recent visual-prompt-based methods reduce training cost but still require repeated forward-backward passes through the full frozen network to optimize input-level prompts, resulting in slow inference. In this work, we show that adapting only the decoder is sufficient for effective test-time optimization, as depth foundation models concentrate depth-relevant information within a low-dimensional decoder subspace. Based on this insight, we propose a lightweight test-time adaptation method that updates only this low-dimensional subspace using sparse depth supervision. Our approach achieves state-of-the-art performance, establishing a new Pareto frontier between accuracy and efficiency for test-time adaptation. Extensive experiments on five indoor and outdoor datasets demonstrate consistent improvements over prior methods, highlighting the practicality of fast zero-shot depth completion.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an efficient test-time optimization approach for zero-shot depth completion. It argues that depth foundation models concentrate depth-relevant information in a low-dimensional decoder subspace, allowing adaptation of only this subspace via low-rank updates driven by sparse depth supervision. The method is claimed to achieve state-of-the-art results on five indoor and outdoor datasets while establishing a superior accuracy-efficiency Pareto frontier compared to diffusion-based and prompt-based baselines.

Significance. If the core assumption holds, the work would meaningfully advance practical test-time adaptation for depth completion by reducing the computational burden of full-network or iterative denoising methods, enabling faster inference without sacrificing accuracy across diverse scenes.

major comments (2)
  1. [Abstract] The central claim that depth foundation models concentrate depth-relevant information within a low-dimensional decoder subspace is presented without any described supporting analysis (e.g., activation statistics, subspace stability metrics across scenes, or direct ablation of decoder-only vs. encoder+decoder adaptation under identical sparse supervision). This assumption is load-bearing for the method's justification and the reported efficiency gains.
  2. [§3, Method] No details are provided on how the low-rank subspace is identified or selected (e.g., whether it is determined post-hoc from the frozen model, via a fixed rank choice, or through a data-driven process), nor on error-bar controls or statistical significance for the SOTA claims across the five datasets. This leaves the central empirical support unverifiable from the given description.
minor comments (2)
  1. [Abstract] The abstract mentions 'consistent improvements' but does not specify the exact metrics or baselines used for the Pareto frontier comparison.
  2. [§3] Notation for the low-rank adaptation (e.g., definition of the subspace projection or update rule) should be introduced earlier for clarity.
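
For orientation, the standard LoRA parameterization can be written as follows. This is the generic form suggested by the "Decoder-only LoRA" label in the Figure 4 caption; the paper's own notation is not reproduced on this page, so the symbols W, A, B, r, d_in and d_out are assumptions.

```latex
\[
  W' = W + \Delta W, \qquad \Delta W = B A, \qquad
  B \in \mathbb{R}^{d_{\text{out}} \times r},\quad
  A \in \mathbb{R}^{r \times d_{\text{in}}},\quad
  r \ll \min(d_{\text{in}}, d_{\text{out}})
\]
\[
  \text{trainable parameters per layer: } r\,(d_{\text{in}} + d_{\text{out}})
  \quad \text{versus} \quad d_{\text{in}}\, d_{\text{out}} \text{ for the full weight}
\]
```

Here W is the frozen decoder weight and only A and B would be updated at test time.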

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We agree that additional supporting analysis and methodological details will strengthen the manuscript. Below we respond point-by-point to the major comments and indicate the revisions we will make.

Point-by-point responses
  1. Referee: [Abstract] The central claim that depth foundation models concentrate depth-relevant information within a low-dimensional decoder subspace is presented without any described supporting analysis (e.g., activation statistics, subspace stability metrics across scenes, or direct ablation of decoder-only vs. encoder+decoder adaptation under identical sparse supervision). This assumption is load-bearing for the method's justification and the reported efficiency gains.

    Authors: We acknowledge that the abstract presents the core insight concisely without explicit supporting analysis. In the revised manuscript we will expand the abstract slightly and, more importantly, add a dedicated subsection in §3 (and corresponding figures in the main paper or supplementary material) that reports activation statistics across decoder layers, subspace stability metrics computed over multiple scenes, and a direct ablation comparing decoder-only low-rank adaptation versus full encoder+decoder adaptation under the same sparse supervision budget. These additions will make the load-bearing assumption verifiable and will be referenced from the abstract. revision: yes

  2. Referee: [§3, Method] No details are provided on how the low-rank subspace is identified or selected (e.g., whether it is determined post-hoc from the frozen model, via a fixed rank choice, or through a data-driven process), nor on error-bar controls or statistical significance for the SOTA claims across the five datasets. This leaves the central empirical support unverifiable from the given description.

    Authors: We agree that the current description of subspace identification is insufficient. In the revision we will clarify that the low-rank subspace is identified post-hoc from the frozen decoder weights via a data-driven singular-value analysis performed once on a small calibration set of depth maps; the rank is then chosen to retain 95% of the explained variance in the decoder activations. We will also add error bars (standard deviation over three random seeds) and report p-values for the SOTA comparisons on all five datasets in the experimental tables and text of §4. revision: yes
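
The rank-selection rule mentioned in this simulated response (keep enough singular directions to reach a target fraction) can be sketched as below, using the squared-singular-value "energy" that the Figure 5 caption refers to. Whether the paper applies it to weights, weight updates, or activations is not settled here; the function names and the toy matrix are assumptions.

```python
import numpy as np


def energy_fraction(matrix, rank):
    """Fraction of squared singular-value energy captured by the top `rank` components."""
    s = np.linalg.svd(matrix, compute_uv=False)
    energy = s ** 2
    return float(energy[:rank].sum() / energy.sum())


def smallest_rank_for_energy(matrix, target=0.95):
    """Smallest rank whose cumulative energy fraction reaches `target`."""
    s = np.linalg.svd(matrix, compute_uv=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    # searchsorted returns the first index with cumulative >= target.
    rank = int(np.searchsorted(cumulative, target)) + 1
    return min(rank, len(s))


# Toy matrix with known singular values 3, 2, 1, 0.1.
M = np.diag([3.0, 2.0, 1.0, 0.1])
frac2 = energy_fraction(M, 2)               # (9 + 4) / 14.01
rank90 = smallest_rank_for_energy(M, 0.90)
rank95 = smallest_rank_for_energy(M, 0.95)
print(round(frac2, 4), rank90, rank95)
```

On this toy matrix two components hold about 92.8% of the energy, so a 90% target selects rank 2 and a 95% target selects rank 3, which shows how sensitive the chosen rank can be to the threshold.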

Circularity Check

0 steps flagged

No significant circularity detected in the derivation chain

Full rationale

The paper presents the concentration of depth-relevant information in a low-dimensional decoder subspace as an empirical insight motivating decoder-only adaptation. No quoted derivation, equation, or self-citation reduces the central claim to fitted inputs, self-definitions, or prior author results by construction. Experiments on five datasets provide external validation of the method's performance, keeping the chain self-contained without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that depth information concentrates in a low-dimensional decoder subspace; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • Domain assumption: Depth foundation models concentrate depth-relevant information within a low-dimensional decoder subspace.
    This is the load-bearing insight stated directly in the abstract as the basis for the adaptation method.

pith-pipeline@v0.9.0 · 5462 in / 1146 out tokens · 48856 ms · 2026-05-15T18:17:09.803109+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Rethinking Electro-Optical Vision Foundation Models for Remote Sensing Retrieval: A Controlled Comparison with Generalist VFM

    cs.CV 2026-05 unverdicted novelty 5.0

    Strong generalist vision foundation models match or outperform electro-optical specific models in remote sensing retrieval with better cross-scene stability.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · cited by 1 Pith paper · 5 internal anchors

  1. [1]

    In: ICLR (2022)

    Bao, H., Dong, L., Piao, S., Wei, F.: Beit: Bert pre-training of image transformers. In: ICLR (2022)

  2. [2]

    Bartolomei, L., Poggi, M., Conti, A., Tosi, F., Mattoccia, S.: Revisiting depth completion from a stereo matching perspective for cross-domain generalization. In: 3DV. pp. 1360–1370. IEEE (2024)

  3. [3]

    In: CVPR

    Bhat, S.F., Alhashim, I., Wonka, P.: Adabins: Depth estimation using adaptive bins. In: CVPR. pp. 4009–4018 (2021)

  4. [4]

    In: ECCV

    Bhat, S.F., Alhashim, I., Wonka, P.: Localbins: Improving depth estimation by learning local distributions. In: ECCV. pp. 480–496. Springer (2022)

  5. [5]

    Midas v3.1

    Birkl, R., Wofk, D., Müller, M.: Midas v3.1 – a model zoo for robust monocular relative depth estimation. arXiv preprint arXiv:2307.14460 (2023)

  6. [6]

    Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

    Bochkovskii, A., Delaunoy, A., Germain, H., Santos, M., Zhou, Y., Richter, S.R., Koltun, V.: Depth pro: Sharp monocular metric depth in less than a second. arXiv preprint arXiv:2410.02073 (2024)

  7. [7]

    In: ICCV

    Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: ICCV. pp. 9650–9660 (2021)

  8. [8]

    Structure-Aware Residual Pyramid Network for Monocular Depth Estimation

    Chen, X., Chen, X., Zha, Z.J.: Structure-aware residual pyramid network for monocular depth estimation. arXiv preprint arXiv:1907.06023 (2019)

  9. [9]

    In: ACCV

    Chodosh, N., Wang, C., Lucey, S.: Deep convolutional compressed sensing for lidar depth completion. In: ACCV. pp. 499–513. Springer (2018)

  10. [10]

    Conti, A., Poggi, M., Mattoccia, S.: Sparsity agnostic depth completion. In: WACV. pp. 5871–5880 (2023)

  11. [11]

    NeurIPS 27 (2014)

    Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. NeurIPS 27 (2014)

  12. [12]

    In: CVPR

    Fu, H., Gong, M., Wang, C., Batmanghelich, K., Tao, D.: Deep ordinal regression network for monocular depth estimation. In: CVPR. pp. 2002–2011 (2018)

  13. [13]

    In: ECCV

    Fu, X., Yin, W., Hu, M., Wang, K., Ma, Y., Tan, P., Shen, S., Lin, D., Long, X.: Geowizard: Unleashing the diffusion priors for 3d geometry estimation from a single image. In: ECCV. pp. 241–258. Springer (2024)

  14. [14]

    The international journal of robotics research 32(11), 1231–1237 (2013)

    Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: The kitti dataset. The international journal of robotics research 32(11), 1231–1237 (2013)

  15. [15]

    In: CVPR

    Guizilini, V., Ambrus, R., Pillai, S., Raventos, A., Gaidon, A.: 3d packing for self-supervised monocular depth estimation. In: CVPR. pp. 2485–2494 (2020)

  16. [16]

    Hao, Z., Li, Y., You, S., Lu, F.: Detail preserving depth estimation from a single image using attention guided networks. In: 3DV. pp. 304–313. IEEE (2018)

  17. [17]

    In: CVPR (2022)

    He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: CVPR (2022)

  18. [18]

    In: WACV

    Hu, J., Ozay, M., Zhang, Y., Okatani, T.: Revisiting single image depth estimation: Toward higher resolution maps with accurate object boundaries. In: WACV. pp. 1043–1051. IEEE (2019)

  19. [19]

    In: AAAI (2025)

    Hyoseok, L., Kim, K.S., Byung-Ki, K., Oh, T.H.: Zero-shot depth completion via test-time alignment with affine-invariant depth prior. In: AAAI (2025)

  20. [20]

    In: ICCV

    Jeong, C., Bae, I., Park, J.H., Jeon, H.G.: Test-time prompt tuning for zero-shot depth completion. In: ICCV. pp. 9443–9454 (2025)

  21. [21]

    In: ECCV

    Jun, J., Lee, J.H., Lee, C., Kim, C.S.: Depth map decomposition for monocular depth estimation. In: ECCV. pp. 18–34. Springer (2022)

  22. [22]

    In: CVPR

    Ke, B., Obukhov, A., Huang, S., Metzger, N., Daudt, R.C., Schindler, K.: Repurposing diffusion-based image generators for monocular depth estimation. In: CVPR. pp. 9492–9502 (2024)

  23. [23]

    In: Proceedings of the European Conference on Computer Vision Workshop (ECCVW)

    Koch, T., Liebel, L., Fraundorfer, F., Korner, M.: Evaluation of cnn-based single-image depth estimation methods. In: Proceedings of the European Conference on Computer Vision Workshop (ECCVW). pp. 0–0 (2018)

  24. [24]

    Laina, I., Rupprecht, C., Belagiannis, V., Tombari, F., Navab, N.: Deeper depth prediction with fully convolutional residual networks. In: 3DV. pp. 239–248. IEEE (2016)

  25. [25]

    From big to small: Multi-scale local planar guidance for monocular depth estimation

    Lee, J.H., Han, M.K., Ko, D.W., Suh, I.H.: From big to small: Multi-scale local planar guidance for monocular depth estimation. arXiv preprint arXiv:1907.10326 (2019)

  26. [26]

    In: ECCV

    Leroy, V., Cabon, Y., Revaud, J.: Grounding image matching in 3d with mast3r. In: ECCV. pp. 71–91. Springer (2024)

  27. [27]

    Depth Anything 3: Recovering the Visual Space from Any Views

    Lin, H., Chen, S., Liew, J., Chen, D.Y., Li, Z., Shi, G., Feng, J., Kang, B.: Depth anything 3: Recovering the visual space from any views. arXiv preprint arXiv:2511.10647 (2025)

  28. [28]

    In: CVPR

    Lin, H., Peng, S., Chen, J., Peng, S., Sun, J., Liu, M., Bao, H., Feng, J., Zhou, X., Kang, B.: Prompting depth anything for 4k resolution accurate metric depth estimation. In: CVPR. pp. 17070–17080 (2025)

  29. [29]

    Depthlab: From partial to complete

    Liu, Z., Cheng, K.L., Wang, Q., Wang, S., Ouyang, H., Tan, B., Zhu, K., Shen, Y., Chen, Q., Luo, P.: Depthlab: From partial to complete. arXiv preprint arXiv:2412.18153 (2024)

  30. [30]

    In: ICRA

    Ma, F., Karaman, S.: Sparse-to-dense: Depth prediction from sparse depth samples and a single image. In: ICRA. pp. 4796–4803. IEEE (2018)

  31. [31]

    Advances in neural information processing systems 27 (2014)

    Montúfar, G., Pascanu, R., Cho, K., Bengio, Y.: On the number of linear regions of deep neural networks. Advances in neural information processing systems 27 (2014)

  32. [32]

    Towards stable test-time adaptation in dynamic wild world

    Niu, S., Wu, J., Zhang, Y., Wen, Z., Chen, Y., Zhao, P., Tan, M.: Towards stable test-time adaptation in dynamic wild world. arXiv preprint arXiv:2302.12400 (2023)

  33. [33]

    TMLR (2024)

    Oquab, M., Darcet, T., Moutakanni, T., Vo, H.V., Szafraniec, M., Khalidov, V., Fernandez, P., HAZIZA, D., Massa, F., El-Nouby, A., Assran, M., Ballas, N., Galuba, W., Howes, R., Huang, P.Y., Li, S.W., Misra, I., Rabbat, M., Sharma, V., Synnaeve, G., Xu, H., Jegou, H., Mairal, J., Labatut, P., Joulin, A., Bojanowski, P.: DINOv2: Learning robust visual feat...

  34. [34]

    In: ECCV

    Park, J., Joo, K., Hu, Z., Liu, C.K., So Kweon, I.: Non-local spatial propagation network for depth completion. In: ECCV. pp. 120–136. Springer (2020)

  35. [35]

    UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler

    Piccinelli, L., Sakaridis, C., Yang, Y.H., Segu, M., Li, S., Abbeloos, W., Van Gool, L.: Unidepthv2: Universal monocular metric depth estimation made simpler. arXiv preprint arXiv:2502.20110 (2025)

  36. [36]

    In: CVPR

    Piccinelli, L., Yang, Y.H., Sakaridis, C., Segu, M., Li, S., Van Gool, L., Yu, F.: Unidepth: Universal monocular metric depth estimation. In: CVPR. pp. 10106–10116 (2024)

  37. [37]

    arXiv preprint arXiv:2601.02760 (2026)

    Ren, Z., Zhang, Z., Li, W., Liu, Q., Tang, H.: Anydepth: Depth estimation made easy. arXiv preprint arXiv:2601.02760 (2026)

  38. [38]

    arXiv preprint arXiv:2511.16301 (2025)

    Seo, M., Hamilton, M., Kim, C.: Upsample anything: A simple and hard to beat baseline for feature upsampling. arXiv preprint arXiv:2511.16301 (2025)

  39. [39]

    In: ECCV

    Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from rgbd images. In: ECCV. pp. 746–760. Springer (2012)

  40. [40]

    DINOv3

    Siméoni, O., Vo, H.V., Seitzer, M., Baldassarre, F., Oquab, M., Jose, C., Khalidov, V., Szafraniec, M., Yi, S., Ramamonjisoa, M., et al.: Dinov3. arXiv preprint arXiv:2508.10104 (2025)

  41. [41]

    In: CVPR

    Tang, J., Tian, F.P., An, B., Li, J., Tan, P.: Bilateral propagation network for depth completion. In: CVPR. pp. 9763–9772 (2024)

  42. [42]

    Uhrig, J., Schneider, N., Schneider, L., Franke, U., Brox, T., Geiger, A.: Sparsity invariant cnns. In: 3DV. pp. 11–20. IEEE (2017)

  43. [43]

    In: ICCV

    Viola, M., Qu, K., Metzger, N., Ke, B., Becker, A., Schindler, K., Obukhov, A.: Marigold-dc: Zero-shot monocular depth completion with guided diffusion. In: ICCV. pp. 5359–5370 (2025)

  44. [44]

    In: CVPR

    Wang, J., Chen, M., Karaev, N., Vedaldi, A., Rupprecht, C., Novotny, D.: Vggt: Visual geometry grounded transformer. In: CVPR. pp. 5294–5306 (2025)

  45. [45]

    IEEE Robotics and Automation Letters (RA-L) 5(2), 1899–1906 (2020)

    Wong, A., Fei, X., Tsuei, S., Soatto, S.: Unsupervised depth completion from visual inertial odometry. IEEE Robotics and Automation Letters (RA-L) 5(2), 1899–1906 (2020)

  46. [46]

    In: CVPR

    Yang, L., Kang, B., Huang, Z., Xu, X., Feng, J., Zhao, H.: Depth anything: Unleashing the power of large-scale unlabeled data. In: CVPR. pp. 10371–10381 (2024)

  47. [47]

    NeurIPS 37, 21875–21911 (2024)

    Yang, L., Kang, B., Huang, Z., Zhao, Z., Xu, X., Feng, J., Zhao, H.: Depth anything v2. NeurIPS 37, 21875–21911 (2024)

  48. [48]

    In: ICCV

    Yin, W., Liu, Y., Shen, C., Yan, Y.: Enforcing geometric constraints of virtual normal for depth prediction. In: ICCV. pp. 5684–5693 (2019)

  49. [49]

    In: ICCV

    Yin, W., Zhang, C., Chen, H., Cai, Z., Yu, G., Wang, K., Chen, X., Shen, C.: Metric3d: Towards zero-shot metric 3d prediction from a single image. In: ICCV. pp. 9043–9053 (2023)

  50. [50]

    arXiv preprint arXiv:2203.01502 (2022)

    Yuan, W., Gu, X., Dai, Z., Zhu, S., Tan, P.: New crfs: Neural window fully-connected crfs for monocular depth estimation. arXiv preprint arXiv:2203.01502 (2022)

  51. [51]

    In: CVPR

    Zhang, Y., Guo, X., Poggi, M., Zhu, Z., Huang, G., Mattoccia, S.: Completionformer: Depth completion with convolutions and vision transformers. In: CVPR. pp. 18527–18536 (2023)

  52. [52]

    In: ICLR (2022)

    Zhou, J., Wei, C., Wang, H., Shen, W., Xie, C., Yuille, A., Kong, T.: ibot: Image bert pre-training with online tokenizer. In: ICLR (2022)

  53. [53]

    In: ECCV

    Zuo, Y., Deng, J.: Ogni-dc: Robust depth completion with optimization-guided neural iterations. In: ECCV. pp. 78–95. Springer (2024)