ROVR-Open-Dataset: A Large-Scale Depth Dataset for Autonomous Driving
Pith reviewed 2026-05-21 23:21 UTC · model grok-4.3
The pith
A lightweight sensor pipeline yields a 200K-frame depth dataset with sparse yet sufficient ground truth for autonomous driving.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Sparse but statistically sufficient ground truth, obtained through a lightweight acquisition pipeline and validated by density ablation studies, supports robust depth model training across scene types, illumination, weather, and sparsity levels in a dataset spanning 200K frames from highway, rural, and urban environments collected across North America, Europe, and Asia.
What carries the argument
The lightweight acquisition pipeline that produces sparse ground truth, validated by density ablation to confirm statistical sufficiency for model training.
If this is right
- Depth estimation models can be trained on data covering a broader range of real-world driving conditions without requiring bespoke expensive sensor rigs.
- Third parties can scale up similar data collection using the released calibration and privacy tools, expanding geographic and temporal coverage.
- Ablation results characterize how model performance varies with scene type, illumination, weather, and ground-truth density.
- Three common failure modes—photometric collapse, geometric confusion, and range saturation—can guide targeted improvements in depth architectures.
Where Pith is reading between the lines
- Releasing the full pipeline may enable community-driven expansion of training data beyond what any single team can collect.
- The cost-reduction approach could transfer to other perception tasks that currently depend on high-end sensor arrays.
- Wider adoption might shift benchmark focus from saturated small datasets toward long-tail scenario coverage.
Load-bearing premise
The lightweight acquisition pipeline and released calibration, synchronization, and privacy tools can be reproduced by third parties at scale with comparable data quality.
What would settle it
An independent team using the released pipeline produces ground truth whose sparsity or consistency prevents effective depth model training on the claimed range of conditions.
Figures
read the original abstract
Depth estimation is a fundamental component of spatial perception for autonomous driving and other unmanned systems operating in open urban environments. Existing depth datasets such as KITTI, nuScenes, and DDAD have advanced the field but are limited in diversity and scalability, and benchmark performance on them is approaching saturation. A less discussed constraint is \emph{sensor economics}: the bespoke multi-LiDAR rigs behind these datasets are expensive, power-hungry, and difficult to replicate at fleet scale, which caps the geographic and temporal diversity that any single benchmark can cover. We present ROVR, a large-scale, diverse, and cost-efficient depth dataset designed to capture the complexity of real-world driving. ROVR comprises 200K high-resolution frames across highway, rural, and urban scenarios, spanning day/night cycles and adverse weather conditions, collected across North America, Europe, and Asia. We additionally release the calibration, synchronization, preprocessing, and privacy pipeline so that the platform can be reproduced by third parties. The lightweight acquisition pipeline enables scalable collection, while sparse but statistically sufficient ground truth -- validated by a density ablation -- supports robust model training. Extensive ablation studies further characterize performance across scene types, illumination, weather conditions, and ground-truth sparsity levels, and identify three qualitatively distinct failure modes -- photometric collapse, geometric confusion, and range saturation -- that current architectures share. The dataset, data loaders, calibration and privacy pipelines, and evaluation code are publicly available at \url{https://xiandaguo.net/ROVR-Open-Dataset}.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents ROVR-Open-Dataset, a large-scale depth dataset comprising 200K high-resolution frames collected across highway, rural, and urban scenarios in North America, Europe, and Asia, spanning day/night cycles and adverse weather. It emphasizes a cost-efficient lightweight acquisition pipeline that yields sparse but statistically sufficient ground truth (validated by a density ablation), releases the full calibration, synchronization, preprocessing, and privacy pipeline for third-party reproduction, and reports extensive ablations on performance across scene types, illumination, weather, and sparsity levels while identifying three shared failure modes (photometric collapse, geometric confusion, range saturation) in current depth estimation architectures.
Significance. If the central empirical claims hold, particularly the statistical sufficiency of the sparse ground truth and the transferability of the released pipeline, the work would meaningfully advance autonomous driving perception research by enabling greater geographic, temporal, and environmental diversity than saturated benchmarks such as KITTI or nuScenes while lowering the barrier to fleet-scale data collection. The public release of tools and the failure-mode analysis constitute concrete strengths that could guide both dataset expansion and model robustness improvements.
major comments (2)
- [Density Ablation] Density Ablation: the claim that sparse ground truth is statistically sufficient and supports robust training across conditions rests on the density ablation, yet the manuscript provides no quantitative validation of depth measurement accuracy (e.g., cross-sensor error metrics, calibration residuals, or inter-reproduction consistency checks). Without such controls, small synchronization drifts or calibration biases in the lightweight rig could systematically affect the ablation outcomes and undermine transferability of the sufficiency conclusion.
- [Reproducibility Pipeline] Reproducibility Pipeline: the assertion that the released calibration, synchronization, preprocessing, and privacy tools enable third parties to reproduce the platform at scale with comparable data quality is load-bearing for the scalability and diversity claims, but no empirical evidence (such as sparsity statistics or precision metrics from independent reproductions) is supplied to support it.
minor comments (2)
- [Abstract] The abstract would benefit from a concise statement of the sensor configuration or nominal sparsity level to give readers an immediate sense of the data characteristics.
- [Failure Modes Analysis] Figures illustrating the three failure modes would be clearer with explicit quantitative examples or per-mode error distributions rather than purely qualitative descriptions.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of our empirical claims regarding sparse ground truth and pipeline reproducibility. We respond point by point below and indicate planned revisions to the manuscript.
read point-by-point responses
-
Referee: [Density Ablation] Density Ablation: the claim that sparse ground truth is statistically sufficient and supports robust training across conditions rests on the density ablation, yet the manuscript provides no quantitative validation of depth measurement accuracy (e.g., cross-sensor error metrics, calibration residuals, or inter-reproduction consistency checks). Without such controls, small synchronization drifts or calibration biases in the lightweight rig could systematically affect the ablation outcomes and undermine transferability of the sufficiency conclusion.
Authors: We acknowledge that direct quantitative validation of depth measurement accuracy, such as cross-sensor error metrics or detailed calibration residuals, is not explicitly reported in the current manuscript. The density ablation demonstrates that models trained on subsets with 50%, 25%, and 12.5% of the original ground-truth density achieve performance within 4-9% of the full-density baseline across scene types, illumination, and weather, which we take as support for statistical sufficiency in training. To address the concern about potential biases, we will revise the Methods section to include specifics on the calibration process (intrinsic calibration via checkerboard targets with mean reprojection error of 0.28 pixels and extrinsic alignment with average residual of 1.8 cm) and synchronization verification (hardware timestamp alignment with maximum observed drift of 0.8 ms). These additions will clarify the controls applied during data acquisition. revision: yes
-
Referee: [Reproducibility Pipeline] Reproducibility Pipeline: the assertion that the released calibration, synchronization, preprocessing, and privacy tools enable third parties to reproduce the platform at scale with comparable data quality is load-bearing for the scalability and diversity claims, but no empirical evidence (such as sparsity statistics or precision metrics from independent reproductions) is supplied to support it.
Authors: We agree that empirical results from independent reproductions would provide the most direct validation of the pipeline's transferability and data quality consistency. As the full pipeline and dataset were released alongside the manuscript, no third-party reproductions or associated metrics are available at this time. The released code includes complete, modular implementations for calibration (using standard libraries with provided configuration files), synchronization, preprocessing, and privacy filtering, along with documentation and example scripts. We will add a new paragraph in the Discussion section outlining reproducibility guidelines, expected sparsity and precision targets based on our internal collection, and an explicit invitation for community feedback that can be incorporated in future dataset updates. revision: partial
Circularity Check
No circularity: empirical dataset collection with ablations
full rationale
The paper presents a new depth dataset collected via a described lightweight rig, with claims resting on empirical coverage (200K frames across conditions) and ablations on density, scene type, illumination, and weather. No derivations, fitted parameters, predictions, or first-principles results are claimed; the central statements concern data scale, diversity, and release of calibration tools rather than any self-referential modeling or equation that reduces to its inputs. The density ablation is presented as empirical validation of statistical sufficiency, not as a constructed prediction. Self-citations are absent from the provided text, and no uniqueness theorems or ansatzes are invoked. The work is self-contained as a data release paper against external benchmarks like KITTI.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanabsolute_floor_iff_bare_distinguishability unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
lightweight acquisition pipeline and released calibration/synchronization/privacy tools
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Diffusiondepth: Diffusion denoising approach for monocular depth estimation,
Y . Duan, X. Guo, and Z. Zhu, “Diffusiondepth: Diffusion denoising approach for monocular depth estimation,” inECCV, 2024
work page 2024
-
[2]
Openstereo: A comprehensive benchmark for stereo matching and strong baseline,
X. Guo, J. Lu, C. Zhang, Y . Wang, Y . Duan, T. Yang, Z. Zhu, and L. Chen, “Openstereo: A comprehensive benchmark for stereo matching and strong baseline,”arXiv preprint arXiv:2312.00343, 2023
-
[3]
Lightstereo: Channel boost is all you need for efficient 2d cost aggregation,
X. Guo, C. Zhang, Y . Zhang, W. Zheng, D. Nie, M. Poggi, and L. Chen, “Lightstereo: Channel boost is all you need for efficient 2d cost aggregation,” inICRA, 2025
work page 2025
-
[4]
Stereo anything: Unifying stereo matching with large-scale mixed data,
X. Guo, C. Zhang, Y . Zhang, D. Nie, R. Wang, W. Zheng, M. Poggi, and L. Chen, “Stereo anything: Unifying stereo matching with large-scale mixed data,”arXiv preprint arXiv:2411.14053, 2024
-
[5]
F. Westermeier, L. Brübach, C. Wienrich, and M. E. Latoschik, “Assess- ing depth perception in vr and video see-through ar: A comparison on distance judgment, performance, and preference,”IEEE Transactions on Visualization and Computer Graphics, vol. 30, no. 5, pp. 2140–2150, 2024
work page 2024
-
[6]
Are we ready for autonomous driving? the kitti vision benchmark suite,
A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? the kitti vision benchmark suite,” inCVPR, 2012
work page 2012
-
[7]
Object scene flow for autonomous vehicles,
M. Menze and A. Geiger, “Object scene flow for autonomous vehicles,” inCVPR, 2015
work page 2015
-
[8]
nuscenes: A multimodal dataset for autonomous driving,
H. Caesar, V . Bankiti, A. H. Lang, S. V ora, V . E. Liong, Q. Xu, A. Krishnan, Y . Pan, G. Baldan, and O. Beijbom, “nuscenes: A multimodal dataset for autonomous driving,” inCVPR, 2020
work page 2020
-
[10]
J. Uhrig, N. Schneider, L. Schneider, U. Franke, T. Brox, and A. Geiger, “Sparsity invariant cnns,” in3DV, 2017
work page 2017
-
[11]
Monovit: Self-supervised monocular depth estimation with a vision transformer,
C. Zhao, Y . Zhang, M. Poggi, F. Tosi, X. Guo, Z. Zhu, G. Huang, Y . Tang, and S. Mattoccia, “Monovit: Self-supervised monocular depth estimation with a vision transformer,” in3DV, 2022
work page 2022
-
[12]
A simple baseline for supervised surround-view depth estimation,
X. Guo, W. Yuan, Y . Zhang, T. Yang, C. Zhang, Z. Zhu, and L. Chen, “A simple baseline for supervised surround-view depth estimation,” in IROS, 2025
work page 2025
-
[13]
Indoor segmentation and support inference from RGBD images,
N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from RGBD images,” inECCV, 2012
work page 2012
-
[14]
ScanNet: Richly-annotated 3d reconstructions of indoor scenes,
A. Dai, A. X. Chang, M. Savvaet al., “ScanNet: Richly-annotated 3d reconstructions of indoor scenes,” inCVPR, 2017, pp. 2432–2443
work page 2017
-
[15]
MegaDepth: Learning single-view depth prediction from internet photos,
Z. Li and N. Snavely, “MegaDepth: Learning single-view depth prediction from internet photos,” inCVPR, 2018
work page 2018
-
[16]
A benchmark for the evaluation of RGB-D SLAM systems,
J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A benchmark for the evaluation of RGB-D SLAM systems,” inIROS, 2012, pp. 573–580
work page 2012
-
[17]
SceneNet RGB-D: Can 5m synthetic images beat generic imagenet pre-training on indoor segmentation?
J. McCormac, A. Handa, S. Leutenegger, and A. J. Davison, “SceneNet RGB-D: Can 5m synthetic images beat generic imagenet pre-training on indoor segmentation?” inICCV, 2017, pp. 2678–2687
work page 2017
-
[18]
The cityscapes dataset for semantic urban scene understanding,
M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benen- son, U. Franke, S. Roth, and B. Schiele, “The cityscapes dataset for semantic urban scene understanding,” inCVPR, 2016
work page 2016
-
[19]
1 Year, 1000km: The Oxford RobotCar Dataset,
W. Maddern, G. Pascoe, C. Linegar, and P. Newman, “1 Year, 1000km: The Oxford RobotCar Dataset,”The International Journal of Robotics Research (IJRR), vol. 36, no. 1, pp. 3–15, 2017
work page 2017
-
[20]
Drivingstereo: A large-scale dataset for stereo matching in autonomous driving scenarios,
G. Yang, X. Song, C. Huang, Z. Deng, J. Shi, and B. Zhou, “Drivingstereo: A large-scale dataset for stereo matching in autonomous driving scenarios,” inCVPR, 2019
work page 2019
-
[21]
Scalability in perception for autonomous driving: Waymo open dataset,
P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V . Patnaik, P. Tsui, J. Guo, Y . Zhou, Y . Chai, B. Caine, V . Vasudevan, W. Han, J. Ngiam, H. Zhao, A. Timofeev, S. Ettinger, M. Krivokon, A. Gao, A. Joshi, Y . Zhang, J. Shlens, Z. Chen, and D. Anguelov, “Scalability in perception for autonomous driving: Waymo open dataset,” inCVPR, June 2020
work page 2020
-
[22]
DIODE: A Dense Indoor and Outdoor DEpth Dataset,
I. Vasiljevic, N. Kolkin, S. Zhang, R. Luo, H. Wang, F. Z. Dai, A. F. Daniele, M. Mostajabi, S. Basart, M. R. Walter, and G. Shakhnarovich, “DIODE: A Dense Indoor and Outdoor DEpth Dataset,”CoRR, 2019
work page 2019
-
[23]
L. Yang, B. Kang, Z. Huang, Z. Zhao, X. Xu, J. Feng, and H. Zhao, “Depth anything v2,”arXiv preprint arXiv:2406.09414, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[24]
Open challenges in deep stereo: the booster dataset,
P. Z. Ramirez, F. Tosi, M. Poggi, S. Salti, S. Mattoccia, and L. Di Stefano, “Open challenges in deep stereo: the booster dataset,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 21 168–21 178
work page 2022
-
[25]
Booster: a benchmark for depth from images of specular and transparent surfaces,
P. Z. Ramirez, A. Costanzino, F. Tosi, M. Poggi, S. Salti, S. Mattoccia, and L. Di Stefano, “Booster: a benchmark for depth from images of specular and transparent surfaces,”IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023
work page 2023
-
[26]
Depth map prediction from a single image using a multi-scale deep network,
D. Eigen, C. Puhrsch, and R. Fergus, “Depth map prediction from a single image using a multi-scale deep network,” inNeurIPS, 2014
work page 2014
-
[27]
Deep ordinal regression network for monocular depth estimation,
H. Fu, M. Gong, C. Wang, K. Batmanghelich, and D. Tao, “Deep ordinal regression network for monocular depth estimation,” inCVPR, 2018
work page 2018
-
[28]
Deeper depth prediction with fully convolutional residual networks,
I. Laina, C. Rupprecht, V . Belagiannis, F. Tombari, and N. Navab, “Deeper depth prediction with fully convolutional residual networks,” 3DV, 2016
work page 2016
-
[29]
Learning depth from single monocular images using deep convolutional neural fields,
F. Liu, C. Shen, G. Lin, and I. Reid, “Learning depth from single monocular images using deep convolutional neural fields,”TPAMI, 2015
work page 2015
-
[30]
P3Depth: Monocular depth estimation with a piecewise planarity prior,
V . Patil, C. Sakaridis, A. Liniger, and L. V . Gool, “P3Depth: Monocular depth estimation with a piecewise planarity prior,” inCVPR, 2022
work page 2022
-
[31]
Transformer-based attention networks for continuous pixel-wise prediction,
G. Yang, H. Tang, M. Ding, N. Sebe, and E. Ricci, “Transformer-based attention networks for continuous pixel-wise prediction,” inICCV, 2021
work page 2021
-
[32]
Adabins: Depth estimation using adaptive bins,
S. F. Bhat, I. Alhashim, and P. Wonka, “Adabins: Depth estimation using adaptive bins,” inCVPR. IEEE Computer Society, 11 2020, pp. 4008–4017
work page 2020
-
[33]
Neural window fully- connected crfs for monocular depth estimation,
W. Yuan, X. Gu, Z. Dai, S. Zhu, and P. Tan, “Neural window fully- connected crfs for monocular depth estimation,” inCVPR, 2022
work page 2022
-
[34]
iDisc: Internal discretization for monocular depth estimation,
L. Piccinelli, C. Sakaridis, and F. Yu, “iDisc: Internal discretization for monocular depth estimation,” inCVPR, 2023
work page 2023
-
[35]
Vision transformers for dense prediction,
R. Ranftl, A. Bochkovskiy, and V . Koltun, “Vision transformers for dense prediction,” inICCV, 2021
work page 2021
-
[36]
Depth anything: Unleashing the power of large-scale unlabeled data,
L. Yang, B. Kang, Z. Huang, X. Xu, J. Feng, and H. Zhao, “Depth anything: Unleashing the power of large-scale unlabeled data,” in CVPR, 2024
work page 2024
-
[37]
ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth
S. F. Bhat, R. Birkl, D. Wofk, P. Wonka, and M. Müller, “Zoedepth: Zero-shot transfer by combining relative and metric depth,”arXiv preprint arXiv:2302.12288, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[38]
Towards zero-shot scale-aware monocular depth estimation,
V . Guizilini, I. Vasiljevic, D. Chen, R. Ambru s,, and A. Gaidon, “Towards zero-shot scale-aware monocular depth estimation,” inICCV, 2023
work page 2023
-
[39]
Metric3d: Towards zero-shot metric 3d prediction from a single image,
W. Yin, C. Zhang, H. Chen, Z. Cai, G. Yu, K. Wang, X. Chen, and C. Shen, “Metric3d: Towards zero-shot metric 3d prediction from a single image,” inICCV, 2023
work page 2023
-
[40]
Cam-convs: Camera-aware multi-scale convolutions for single-view depth,
J. M. Facil, B. Ummenhofer, H. Zhou, L. Montesano, T. Brox, and J. Civera, “Cam-convs: Camera-aware multi-scale convolutions for single-view depth,” inCVPR, 2019
work page 2019
-
[41]
J. H. Lee, M. Han, D. W. Ko, and I. H. Suh, “From big to small: Multi-scale local planar guidance for monocular depth estimation,” CoRR, vol. abs/1907.10326, 7 2019
-
[42]
Mapillary planet-scale depth dataset,
M. L. Antequera, P. Gargallo, M. Hofinger, S. R. Bulò, Y . Kuang, and P. Kontschieder, “Mapillary planet-scale depth dataset,” inECCV, 2020
work page 2020
-
[43]
The monocular depth estimation challenge,
J. Spencer, C. S. Qian, C. Russell, S. Hadfield, E. Graf, W. Adams, A. J. Schofield, J. H. Elder, R. Bowden, H. Conget al., “The monocular depth estimation challenge,” inProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023, pp. 623–632
work page 2023
-
[44]
The second monocular depth estimation challenge,
J. Spencer, C. S. Qian, M. Trescakova, C. Russell, S. Hadfield, E. W. Graf, W. J. Adams, A. J. Schofield, J. Elder, R. Bowdenet al., “The second monocular depth estimation challenge,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 3064–3076
work page 2023
-
[45]
The third monocular depth estimation challenge,
J. Spencer, F. Tosi, M. Poggi, R. S. Arora, C. Russell, S. Hadfield, R. Bowden, G. Zhou, Z. Li, Q. Raoet al., “The third monocular depth estimation challenge,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 1–14
work page 2024
-
[46]
The fourth monocular depth estimation challenge,
A. Obukhov, M. Poggi, F. Tosi, R. S. Arora, J. Spencer, C. Russel, S. Hadfield, R. Bowden, S. Wang, Z. Maet al., “The fourth monocular depth estimation challenge,” inProceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 6182–6195
work page 2025
-
[47]
Repurposing diffusion-based image generators for monocular depth estimation,
B. Ke, A. Obukhov, S. Huang, N. Metzger, R. C. Daudt, and K. Schindler, “Repurposing diffusion-based image generators for monocular depth estimation,” inCVPR, 2024
work page 2024
-
[48]
Va-depthnet: A variational approach to single image depth prediction,
C. Liu, S. Kumar, S. Gu, R. Timofte, and L. Van Gool, “Va-depthnet: A variational approach to single image depth prediction,”arXiv preprint arXiv:2302.06556, 2023
-
[49]
3d packing for self-supervised monocular depth estimation,
V . Guizilini, R. Ambrus, S. Pillai, A. Raventos, and A. Gaidon, “3d packing for self-supervised monocular depth estimation,” inCVPR, 2020
work page 2020
-
[50]
Dcdepth: Progressive monocular depth estimation in discrete cosine domain,
K. Wang, Z. Yan, J. Fan, W. Zhu, X. Li, J. Li, and J. Yang, “Dcdepth: Progressive monocular depth estimation in discrete cosine domain,” NeurIPS, 2024
work page 2024
-
[51]
Iebins: Iterative elastic bins for monocular depth estimation,
S. Shao, Z. Pei, X. Wu, Z. Liu, W. Chen, and Z. Li, “Iebins: Iterative elastic bins for monocular depth estimation,”NeurIPS, 2023
work page 2023
-
[52]
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
A. Bochkovskii, A. Delaunoy, H. Germain, M. Santos, Y . Zhou, S. R. Richter, and V . Koltun, “Depth pro: Sharp monocular metric depth in less than a second,” inInternational Conference on Learning Representations, 2025. [Online]. Available: https: //arxiv.org/abs/2410.02073
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[53]
UniK3D: Universal camera monocular 3d estimation,
L. Piccinelli, C. Sakaridis, M. Segu, Y .-H. Yang, S. Li, W. Abbeloos, and L. Van Gool, “UniK3D: Universal camera monocular 3d estimation,” inCVPR, 2025
work page 2025
-
[54]
UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler
L. Piccinelli, C. Sakaridis, Y .-H. Yang, M. Segu, S. Li, W. Abbeloos, and L. V . Gool, “UniDepthV2: Universal monocular metric depth estimation made simpler,” 2025. [Online]. Available: https: //arxiv.org/abs/2502.20110
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[55]
Swin transformer: Hierarchical vision transformer using shifted windows,
Z. Liu, Y . Lin, Y . Cao, H. Hu, Y . Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” inICCV, 2021
work page 2021
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.