UAVFF3D: A Geometry-Aware Benchmark for Feed-Forward UAV 3D Reconstruction

Haifeng Li; Xiang Yang; Yongli Wang; Yunsheng Zhang

arxiv: 2605.17942 · v2 · pith:5AZNVZ74new · submitted 2026-05-18 · 💻 cs.CV

Pith reviewed 2026-05-20 12:56 UTC · model grok-4.3

keywords: UAV photogrammetry, 3D reconstruction, domain adaptation, feed-forward models, camera geometry, oblique views, benchmark dataset, HFOV ambiguity

The pith

Domain adaptation on a new geometry-aware UAV benchmark reduces Ray Error by up to 84% and Pose ATE by up to 76% in feed-forward 3D reconstruction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that feed-forward 3D reconstruction models fail on UAV imagery not only because of appearance differences but also because of UAV-specific variations in camera geometry, chiefly oblique viewing angles and the ambiguity between horizontal field of view (HFOV) and flight height. To test and fix this, it presents UAVFF3D, a benchmark with more than 170,000 real drone images and more than 370,000 synthetic ones that systematically vary these parameters, along with a dedicated test subset for the HFOV-height ambiguity. Experiments show that adapting existing models on this data yields large gains in accuracy for both camera pose estimation and 3D scene reconstruction. A reader would care because UAVs are increasingly used for mapping and surveying, yet current feed-forward methods still produce unreliable results under realistic flight conditions.
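The ambiguity between HFOV and flight height can be made concrete with a small sketch. It assumes an idealized pinhole camera looking straight down (nadir) over flat ground, where the ground footprint width is 2 · h · tan(HFOV / 2); the function names and the numeric values are illustrative and are not taken from the paper or its code.

```python
import math

def footprint_width(hfov_deg: float, height_m: float) -> float:
    """Ground width seen by a nadir pinhole camera over flat terrain (assumed model)."""
    return 2.0 * height_m * math.tan(math.radians(hfov_deg) / 2.0)

def height_for_footprint(hfov_deg: float, width_m: float) -> float:
    """Flight height that keeps the ground width fixed for a given HFOV."""
    return width_m / (2.0 * math.tan(math.radians(hfov_deg) / 2.0))

# Illustrative reference configuration: 60 degree HFOV at 100 m.
target = footprint_width(60.0, 100.0)

# Different (HFOV, height) pairs that all cover the same ground width.
pairs = [(hfov, height_for_footprint(hfov, target)) for hfov in (30.0, 60.0, 90.0)]
for hfov, h in pairs:
    print(f"HFOV {hfov:5.1f} deg  height {h:7.2f} m  footprint {footprint_width(hfov, h):.2f} m")
```

Under these assumptions a narrow lens flown high and a wide lens flown low image the same ground extent, so image coverage alone does not reveal which configuration produced it. Real scenes with relief and oblique views break the exact equality, which is why this is only a first-order illustration.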

Core claim

UAVFF3D provides a geometry-aware benchmark with more than 170k real UAV images and 370k synthetic images rendered from textured 3D models, covering diverse HFOVs, altitudes, viewing directions and patterns, plus a controlled HFOV-height test subset. Domain adaptation with this benchmark on existing feed-forward models reduces Ray Error by up to 84.2%, Pose ATE by up to 76.0% and Chamfer Distance by up to 41.1%. It also cuts the rotation gap between oblique and nadir views by up to 90.7% and yields more stable results across HFOV settings, with further gains from adding camera priors.

What carries the argument

The UAVFF3D benchmark dataset together with an evaluation protocol that measures camera-geometry estimation and dense reconstruction under a single shared global alignment.

If this is right

Adaptation closes most of the performance gap between oblique and nadir acquisition patterns.
Performance becomes more consistent when HFOV and height vary together.
Adding explicit camera priors boosts results for typical UAV flight geometries.
The joint evaluation avoids over-optimistic scores from independent alignments.
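
The point about shared versus independent alignment can be sketched on toy data. The code below is a minimal illustration under stated assumptions, not the paper's evaluation code: it fits one similarity transform with the Umeyama method on camera centres (the excerpt does not say how the shared alignment is estimated), takes Pose ATE to be the RMSE of camera-centre distances, and takes Chamfer Distance to be the symmetric mean nearest-neighbour distance. All function names and the synthetic scene are hypothetical, and Ray Error is not covered.

```python
import numpy as np

def umeyama(src, dst):
    """Least-squares similarity (s, R, t) with dst ~ s * R @ src + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:  # avoid returning a reflection
        S[2, 2] = -1.0
    R = U @ S @ Vt
    s = float((D * np.diag(S)).sum() / (xs ** 2).sum(axis=1).mean())
    t = mu_d - s * R @ mu_s
    return s, R, t

def apply_sim(sim, pts):
    s, R, t = sim
    return s * pts @ R.T + t

def pose_ate(pred_centers, gt_centers):
    """RMSE of camera-centre distances (assumed definition)."""
    return float(np.sqrt(((pred_centers - gt_centers) ** 2).sum(axis=1).mean()))

def chamfer(a, b):
    """Symmetric mean nearest-neighbour distance, brute force."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean()))

# Toy ground truth: 20 cameras near 100 m, 200 scene points near the ground.
rng = np.random.default_rng(0)
gt_cams = np.column_stack([rng.uniform(-50, 50, 20), rng.uniform(-50, 50, 20),
                           100 + rng.normal(0, 2, 20)])
gt_pts = np.column_stack([rng.uniform(-50, 50, 200), rng.uniform(-50, 50, 200),
                          rng.uniform(0, 10, 200)])

# Toy prediction: cameras live in an arbitrary similarity frame, while the
# dense points are 20% too large relative to those cameras.
c, s_ = np.cos(np.pi / 6), np.sin(np.pi / 6)
R0 = np.array([[c, -s_, 0.0], [s_, c, 0.0], [0.0, 0.0, 1.0]])
t0 = np.array([1.0, 2.0, 3.0])
pred_cams = 0.5 * gt_cams @ R0.T + t0
pred_pts = 1.2 * (0.5 * gt_pts @ R0.T + t0)

# Shared protocol: one transform, fitted once, applied to cameras and points.
shared = umeyama(pred_cams, gt_cams)
ate = pose_ate(apply_sim(shared, pred_cams), gt_cams)
cd_shared = chamfer(apply_sim(shared, pred_pts), gt_pts)

# Separate protocol: the points get their own transform (index correspondences
# are only available because this is a toy example).
separate = umeyama(pred_pts, gt_pts)
cd_separate = chamfer(apply_sim(separate, pred_pts), gt_pts)

print(f"ATE {ate:.2e}  Chamfer shared {cd_shared:.3f}  Chamfer separate {cd_separate:.2e}")
```

In this toy case the separate alignment absorbs the scale mismatch between cameras and points, so the geometry looks near-perfect, while the shared alignment exposes the mismatch as a non-trivial Chamfer Distance.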

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar geometry-focused benchmarks could help other camera-based 3D tasks like SLAM or novel view synthesis in non-standard capture setups.
Instead of adapting after the fact, future models might be trained from the start with explicit modeling of projection ambiguities.
The controlled ambiguity test set offers a way to measure progress on a specific failure mode that general scene-diverse datasets miss.

Load-bearing premise

That the synthetic images from high-quality 3D models accurately represent the camera-geometry variations found in actual UAV flights.

What would settle it

A test in which models adapted on UAVFF3D show no improvement, or even larger errors, on a held-out set of real UAV images with varied oblique angles and HFOV-height pairs.

Figures

Figures reproduced from arXiv: 2605.17942 by Haifeng Li, Xiang Yang, Yongli Wang, Yunsheng Zhang.

**Figure 2.** Overview of the UAVFF3D dataset construction pipeline and dataset characteristics.

**Figure 3.** Controlled HFOV–height ambiguity in UAVFF3D-FA.

**Figure 4.** Comparison between the commonly used separate-alignment protocol and the shared-alignment protocol of UAVFF3D. We therefore evaluate the predicted cameras and dense geometry as a coupled reconstruction result. Let the predicted dense point set be $\hat{X}$, the GT point set be $X$, and the predicted camera poses be $\{\hat{T}_i\}_{i=1}^{N}$. For each predicted reconstruction, we estimate only one scene-level global similarity…

**Figure 5.** Qualitative effect of UAVFF3D fine-tuning. First row: oblique input images, where fine-tuning improves pose and…

**Figure 6.** Dataset-level effect of fine-tuning. Each cell reports…

**Figure 7.** Controlled HFOV–height diagnosis on UAVFF3D-FA.

**Figure 8.** Geographic coverage of the three LiDAR-supported…

**Figure 9.** LiDAR acquisition results. NanFang mainly contains dense low-rise residential buildings and urban blocks. YangHaiTang contains campus-scale and urban structures with distinct facade and roof variations. XiaoXiang Campus contains buildings, vegetation, roads, and waterfront areas, making it suitable for evaluating visibility and geometric consistency in complex real scenes.

**Figure 10.** Controlled HFOV–height examples in UAVFF3D-FA.

**Figure 11.** Representative scene visualization of UAVFF3D-Real. These examples illustrate the diversity of real UAV acquisition in…

**Figure 12.** Representative scene visualization of UAVFF3D-Syn. The synthetic scenes cover diverse UAV camera trajectories and…

**Figure 13.** Qualitative visualization results on UAVFF3D-FA. The examples show reconstruction behavior under controlled…

**Figure 14.** Qualitative visualization results on UAVFF3D-Real. The examples show reconstruction outputs on real UAV scenes from…

**Figure 15.** Qualitative visualization results on UrbanScene3D. The examples illustrate feed-forward reconstruction under oblique…

**Figure 16.** Qualitative visualization results on UseGeo. The examples illustrate reconstruction performance on nadir-view UAV…

Original abstract

Feed-forward 3D reconstruction has advanced rapidly, but current models remain unreliable in UAV photogrammetric acquisition. We argue that this failure is caused not only by appearance-domain shift, but also by UAV-specific camera-geometry variations, especially oblique views and HFOV-height ambiguity. Existing UAV datasets mainly emphasize scene diversity and provide limited coverage of camera configurations, which restricts robustness evaluation and UAV-domain adaptation. To address this gap, we introduce UAVFF3D, a geometry-aware real-synthetic benchmark for feed-forward UAV 3D reconstruction. UAVFF3D contains more than 170k real UAV images and more than 370k synthetic images rendered from high-quality textured 3D models, covering diverse HFOVs, flight altitudes, viewing directions, and acquisition patterns. It also includes a controlled HFOV-height test subset for diagnosing projection-geometry ambiguity. We further propose an evaluation protocol that jointly assesses camera-geometry estimation and dense scene reconstruction under a shared global alignment, avoiding the bias caused by separate camera and geometry alignments. Experiments on representative feed-forward reconstruction models show that UAVFF3D-based domain adaptation consistently improves camera and geometry estimation, reducing Ray Error by up to 84.2%, Pose ATE by up to 76.0%, and Chamfer Distance by up to 41.1%. In oblique scenes, adaptation reduces the oblique-nadir rotation gap by up to 90.7%. Under HFOV-height ambiguity, it improves robustness across HFOV-height configurations and yields more stable performance across HFOV settings. Incorporating camera priors further improves reconstruction under UAV-specific acquisition geometries. The dataset and evaluation code are available at https://github.com/yanxian-ll/UAVFF3D .

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

UAVFF3D gives a practical benchmark for geometry issues in UAV 3D recon, with controlled HFOV-height tests and joint evaluation, but the synthetic data's match to real failures needs direct checks.

The main takeaway is that this paper supplies a new real-synthetic UAV dataset focused on camera geometry variations that existing collections overlook. It pairs over 170k real images with over 370k rendered ones and adds a controlled HFOV-height subset plus a joint evaluation protocol under shared global alignment. That setup lets them measure both pose and reconstruction without separate alignments introducing bias, and the reported adaptation gains look sizable on paper: Ray Error down by up to 84%, Pose ATE by up to 76%, Chamfer Distance by up to 41%, with big closure of the oblique-nadir gap in some cases. Incorporating camera priors is also shown to help under those geometries.

The work is straightforward about targeting feed-forward models that struggle on drone data for surveying and inspection tasks. The controlled test subset and emphasis on projection ambiguity are the clearest additions relative to prior UAV datasets that mostly chase scene variety. The numbers suggest domain adaptation on this mix can stabilize performance across HFOV settings.

The soft spot is the reliance on synthetic renders from high-quality 3D models. The abstract does not show quantitative alignment checks between synthetic and real parameter distributions in the exact failure regimes, so it remains possible that some gains come from scale or generic appearance rather than the intended geometry factors. Full methods and statistical details on baselines would help confirm the claims.

This is useful for groups working on UAV photogrammetry or domain adaptation for geometry-sensitive reconstruction. A reader who needs a testbed for oblique and ambiguous camera setups will find concrete value in the protocol and splits. The benchmark idea is solid enough to warrant serious referee time, though reviewers will likely press on the synthetic-to-real fidelity and the exact contribution of the geometry controls versus data volume.

Referee Report

2 major / 2 minor

Summary. The paper introduces UAVFF3D, a geometry-aware benchmark for feed-forward UAV 3D reconstruction containing over 170k real UAV images and over 370k synthetic images rendered from high-quality textured 3D models. It covers diverse HFOVs, flight altitudes, viewing directions, and acquisition patterns, including a controlled HFOV-height test subset. The authors propose a joint evaluation protocol for camera-geometry estimation and dense reconstruction under shared global alignment, and show that domain adaptation on UAVFF3D yields consistent improvements on representative feed-forward models, with reductions of up to 84.2% in Ray Error, 76.0% in Pose ATE, and 41.1% in Chamfer Distance, plus up to 90.7% reduction in the oblique-nadir rotation gap.

Significance. If the synthetic renders are shown to faithfully reproduce the real UAV-specific projection ambiguities that cause failures in existing models, the benchmark and adaptation results would provide a valuable resource for improving robustness in UAV photogrammetry. The public release of the dataset and evaluation code is a clear strength that supports reproducibility and further research on geometry-aware domain adaptation.

major comments (2)

[Abstract and Experiments] Abstract and Experiments section: the central claim attributes the reported metric gains specifically to the introduction of UAV camera-geometry variations (oblique views and HFOV-height ambiguity) via the synthetic renders. No quantitative validation is provided that the distributions of HFOV, altitudes, viewing angles, or ray-sampling statistics in the synthetic data match those of real UAV acquisitions in the failure regimes; without such a check (e.g., Kolmogorov-Smirnov tests or overlaid histograms on the controlled HFOV-height subset), the improvements could arise from generic appearance shift or dataset scale rather than the targeted geometry factors.
[Evaluation Protocol] Evaluation protocol description: the joint global-alignment protocol is presented as avoiding bias from separate camera and geometry alignments, yet the manuscript does not report the sensitivity of the reported percentages (84.2% Ray Error, 76.0% Pose ATE, 41.1% Chamfer Distance) to the choice of alignment reference or to the number of runs; a single-run or post-hoc baseline comparison would undermine the cross-model robustness claim.
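
The distribution check requested in the first major comment can be sketched with a two-sample Kolmogorov-Smirnov statistic. This is a minimal standard-library illustration, assuming per-image HFOV values are available as plain lists; the samples below are randomly generated placeholders, not values from UAVFF3D, and `ks_statistic` is a hypothetical helper name.

```python
import bisect
import random

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is attained at one of the sample values.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Placeholder HFOV samples in degrees (synthetic stand-ins, not benchmark data).
random.seed(0)
real_hfov = [random.gauss(70.0, 8.0) for _ in range(500)]
syn_matched = [random.gauss(70.0, 8.0) for _ in range(500)]
syn_shifted = [random.gauss(90.0, 8.0) for _ in range(500)]

d_matched = ks_statistic(real_hfov, syn_matched)
d_shifted = ks_statistic(real_hfov, syn_shifted)
print(f"KS matched: {d_matched:.3f}  KS shifted: {d_shifted:.3f}")
```

A small statistic indicates that the synthetic sample follows the real distribution closely, and a large one flags a mismatch. For p-values, `scipy.stats.ks_2samp` provides the same statistic together with a significance estimate; the same check would apply to altitude and viewing-angle samples.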

minor comments (2)

[Abstract] The abstract and introduction would benefit from an explicit statement of the exact feed-forward models evaluated and the precise train/validation/test splits used for the domain-adaptation experiments.
[Dataset] Figure captions for the HFOV-height subset visualizations should include the exact parameter ranges sampled and the number of images per configuration to allow readers to assess coverage of the ambiguity space.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will incorporate revisions to strengthen the validation of our claims.

Referee: [Abstract and Experiments] Abstract and Experiments section: the central claim attributes the reported metric gains specifically to the introduction of UAV camera-geometry variations (oblique views and HFOV-height ambiguity) via the synthetic renders. No quantitative validation is provided that the distributions of HFOV, altitudes, viewing angles, or ray-sampling statistics in the synthetic data match those of real UAV acquisitions in the failure regimes; without such a check (e.g., Kolmogorov-Smirnov tests or overlaid histograms on the controlled HFOV-height subset), the improvements could arise from generic appearance shift or dataset scale rather than the targeted geometry factors.

Authors: We agree that explicit statistical comparisons between the synthetic and real distributions were not reported. The controlled HFOV-height test subset was introduced precisely to isolate and diagnose projection-geometry ambiguity, and the largest gains appear in oblique and HFOV-ambiguous regimes. The synthetic renders are generated from high-quality textured 3D models to reproduce real UAV acquisition patterns. To directly address the concern, we will add overlaid histograms, basic statistics, and Kolmogorov-Smirnov tests comparing HFOV, altitude, and viewing-angle distributions in the revised Experiments section. revision: yes
Referee: [Evaluation Protocol] Evaluation protocol description: the joint global-alignment protocol is presented as avoiding bias from separate camera and geometry alignments, yet the manuscript does not report the sensitivity of the reported percentages (84.2% Ray Error, 76.0% Pose ATE, 41.1% Chamfer Distance) to the choice of alignment reference or to the number of runs; a single-run or post-hoc baseline comparison would undermine the cross-model robustness claim.

Authors: The joint global-alignment protocol uses a single shared reference to ensure consistent evaluation across camera and geometry metrics. The reported figures reflect this fixed protocol applied uniformly to all models. We acknowledge that sensitivity to alternative alignment references or run-to-run variance was not quantified. We will add a sensitivity analysis, including results under different alignment choices and standard deviations over repeated evaluations, to the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical benchmark and adaptation results are measured against external models

The paper introduces the UAVFF3D dataset (real + synthetic images) and an evaluation protocol, then reports empirical gains from domain adaptation on standard metrics (Ray Error, Pose ATE, Chamfer Distance) versus external feed-forward models. No equations, fitted parameters renamed as predictions, or self-citation chains appear in the derivation of the central claims. All reported improvements are direct experimental comparisons to independent baselines, rendering the work self-contained with no reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on standard computer vision assumptions for 3D reconstruction and photogrammetry without introducing new mathematical derivations, fitted parameters, or postulated entities.

axioms (1)

domain assumption: Existing feed-forward 3D reconstruction models can be meaningfully adapted using additional UAV-specific training data.
The reported gains from domain adaptation presuppose that the models are capable of learning from the new geometry variations.

pith-pipeline@v0.9.0 · 5854 in / 1263 out tokens · 33297 ms · 2026-05-20T12:56:41.770519+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: We propose UAVFF3D, a geometry-aware real–synthetic benchmark... unified representation (RGB, intrinsics, rays, poses, depth, masks)... shared scene-level alignment... Ray Error, Pose ATE, Chamfer Distance

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Paper passage: HFOV–height ambiguity... controlled test set... projection geometry varied while image footprint remains approximately unchanged

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.



IEEE Geoscience and Remote Sensing Magazine , volume=

Unmanned Aerial Vehicle-Based Photogrammetric 3D Mapping: A survey of techniques, applications, and challenges , author=. IEEE Geoscience and Remote Sensing Magazine , volume=. 2021 , publisher=

work page 2021

[10] [10]

Remote Sensing , volume=

UAVs and 3D city modeling to aid urban planning and historic preservation: A systematic review , author=. Remote Sensing , volume=. 2023 , publisher=

work page 2023

[11] [11]

Remote Sensing , volume=

Towards urban digital twins: A workflow for procedural visualization using geospatial data , author=. Remote Sensing , volume=. 2024 , publisher=

work page 2024

[12] [12]

arXiv preprint arXiv:2507.08448 , year=

Review of feed-forward 3d reconstruction: From dust3r to vggt , author=. arXiv preprint arXiv:2507.08448 , year=

work page arXiv

[13] [13]

ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences , volume=

Autonomous UAV 3D Reconstruction using Prediction-Based Next Best View , author=. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences , volume=. 2025 , publisher=

work page 2025

[14] [14]

Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

Structure-from-motion revisited , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

work page

[15] [15]

European conference on computer vision , pages=

Pixelwise view selection for unstructured multi-view stereo , author=. European conference on computer vision , pages=. 2016 , organization=

work page 2016

[16] [16]

2003 , publisher=

Multiple view geometry in computer vision , author=. 2003 , publisher=

work page 2003

[17] [17]

Geomorphology , volume=

Structure-from-Motion photogrammetry: A low-cost, effective tool for geoscience applications , author=. Geomorphology , volume=. 2012 , publisher=

work page 2012

[18] [18]

Applied geomatics , volume=

UAV for 3D mapping applications: A review , author=. Applied geomatics , volume=. 2014 , publisher=

work page 2014

[19] [19]

International Journal of Computer Vision , volume=

Large-scale data for multiple-view stereopsis , author=. International Journal of Computer Vision , volume=. 2016 , publisher=

work page 2016

[20] [20]

Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

A multi-view stereo benchmark with high-resolution images and multi-camera videos , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

work page

[21] [21]

ACM Transactions on Graphics (ToG) , volume=

Tanks and temples: Benchmarking large-scale scene reconstruction , author=. ACM Transactions on Graphics (ToG) , volume=. 2017 , publisher=

work page 2017

[22] [22]

Proceedings of the European conference on computer vision (ECCV) , pages=

Mvsnet: Depth inference for unstructured multi-view stereo , author=. Proceedings of the European conference on computer vision (ECCV) , pages=

work page

[23] [23]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Recurrent mvsnet for high-resolution multi-view stereo depth inference , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page

[24] [24]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Cascade cost volume for high-resolution multi-view stereo and stereo matching , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page

[25] [25]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Patchmatchnet: Learned multi-view patchmatch stereo , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page

[26] [26]

IEEE transactions on pattern analysis and machine intelligence , volume=

Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer , author=. IEEE transactions on pattern analysis and machine intelligence , volume=. 2020 , publisher=

work page 2020

[27] [27]

Proceedings of the IEEE/CVF international conference on computer vision , pages=

Metric3d: Towards zero-shot metric 3d prediction from a single image , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=

work page

[28] [28]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Unidepth: Universal monocular metric depth estimation , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

work page

[29] [29]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Blendedmvs: A large-scale dataset for generalized multi-view stereo networks , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page

[30] [30]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

A novel recurrent encoder-decoder structure for large-scale multi-view stereo reconstruction from an open aerial dataset , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page

[31] [31]

ISPRS Journal of Photogrammetry and Remote Sensing , volume=

Deep learning based multi-view stereo matching and 3D scene reconstruction from oblique aerial images , author=. ISPRS Journal of Photogrammetry and Remote Sensing , volume=. 2023 , publisher=

work page 2023

[32] [32]

European Conference on Computer Vision , pages=

Capturing, reconstructing, and simulating: the urbanscene3d dataset , author=. European Conference on Computer Vision , pages=. 2022 , organization=

work page 2022

[33] [33]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Uavscenes: A multi-modal dataset for uavs , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

work page

[34] [34]

ISPRS Open Journal of Photogrammetry and Remote Sensing , volume=

Depth estimation and 3D reconstruction from UAV-borne imagery: Evaluation on the UseGeo dataset , author=. ISPRS Open Journal of Photogrammetry and Remote Sensing , volume=. 2024 , publisher=

work page 2024

[35] [35]

ISPRS Open Journal of Photogrammetry and Remote Sensing , volume=

UseGeo-A UAV-based multi-sensor dataset for geospatial research , author=. ISPRS Open Journal of Photogrammetry and Remote Sensing , volume=. 2024 , publisher=

work page 2024

[36] [36]

Photogrammetric Engineering & Remote Sensing , volume=

Reliable spatial relationship constrained feature point matching of oblique aerial images , author=. Photogrammetric Engineering & Remote Sensing , volume=. 2015 , publisher=

work page 2015

[37] [37]

Remote Sensing , volume=

The plumb-line matching algorithm for UAV oblique photographic photos , author=. Remote Sensing , volume=. 2023 , publisher=

work page 2023

[38] [38]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

Dust3r: Geometric 3d vision made easy , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page

[39] [39]

European conference on computer vision , pages=

Grounding image matching in 3d with mast3r , author=. European conference on computer vision , pages=. 2024 , organization=

work page 2024

[40] [40]

Yang, Jianing and Sax, Alexander and Liang, Kevin J and Henaff, Mikael and Tang, Hao and Cao, Ang and Chai, Joyce and Meier, Franziska and Feiszli, Matt , booktitle=

work page

[41] [41]

Wang, Qianqian and Zhang, Yifei and Holynski, Aleksander and Efros, Alexei A and Kanazawa, Angjoo , booktitle=

work page

[42] [42]

Wang, Jianyuan and Chen, Minghao and Karaev, Nikita and Vedaldi, Andrea and Rupprecht, Christian and Novotny, David , booktitle=

work page

[43] [43]

MapAnything: Universal Feed-Forward Metric 3D Reconstruction

Keetha, Nikhil and M. arXiv preprint arXiv:2509.13414 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[44] [44]

Liu, Yifan and Min, Zhiyuan and Wang, Zhenwei and Wu, Junta and Wang, Tengfei and Yuan, Yixuan and Luo, Yawei and Guo, Chunchao , journal=

work page

[45] [45]

Lin, Haotong and Chen, Sili and Liew, Junhao and Chen, Donny Y and Li, Zhenyu and Shi, Guang and Feng, Jiashi and Kang, Bingyi , journal=

work page

[46] [46]

ISPRS Journal of Photogrammetry and Remote Sensing , volume=

ENRICH: Multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry , author=. ISPRS Journal of Photogrammetry and Remote Sensing , volume=. 2023 , publisher=

work page 2023

[47] [47]

Wang, Yifan and Zhou, Jianjun and Zhu, Haoyi and Chang, Wenzheng and Zhou, Yang and Li, Zizun and Chen, Junyi and Pang, Jiangmiao and Shen, Chunhua and He, Tong , journal=

work page

[48] [48]

2023 , publisher=

Li, Jiayi and Huang, Xin and Feng, Yujin and Ji, Zhen and Zhang, Shulei and Wen, Dawei , journal=. 2023 , publisher=

work page 2023

[49] [49]

Geo-spatial Information Science , pages=

An evaluation of DUSt3R/MASt3R/VGGT 3D reconstruction on photogrammetric aerial blocks , author=. Geo-spatial Information Science , pages=. 2025 , publisher=

work page 2025

[50] [50]

ISPRS journal of photogrammetry and remote sensing , volume=

UAVid: A semantic segmentation dataset for UAV imagery , author=. ISPRS journal of photogrammetry and remote sensing , volume=. 2020 , publisher=

work page 2020

[51] [51]

2020 , publisher=

Using semantically paired images to improve domain adaptation for the semantic segmentation of aerial images , author=. 2020 , publisher=

work page 2020

[52] [52]

2024 , eprint=

GauU-Scene: A Scene Reconstruction Benchmark on Large Scale 3D Reconstruction Dataset Using Gaussian Splatting , author=. 2024 , eprint=

work page 2024