Recognition: 2 Lean theorem links
Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image
Pith reviewed 2026-05-15 03:11 UTC · model grok-4.3
The pith
A geometry-first training strategy generates accurate street-level 3D scenes directly from single satellite images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Sat3DGen demonstrates that a geometry-first methodology, which augments a standard feed-forward image-to-3D backbone with novel geometric constraints and a perspective-view training regimen, directly mitigates the viewpoint gap and sparse supervision that previously limited both metric accuracy and visual quality in satellite-to-street scene generation.
What carries the argument
The geometry-first methodology that adds explicit geometric constraints to the feed-forward image-to-3D framework and trains under a perspective-view regime.
If this is right
- Geometric RMSE drops from 6.76 m to 5.20 m on the VIGOR-OOD benchmark paired with high-resolution DSM data.
- FID falls from approximately 40 to 19 against the prior leading method, Sat2Density++, without any extra tailored image-quality modules.
- High-quality 3D assets become available for semantic-map-to-3D synthesis, multi-camera video generation, large-scale meshing, and unsupervised single-image DSM estimation.
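For context, both headline metrics are standard. As a reminder (textbook definitions, not notation from this paper, and assuming the RMSE is computed over per-pixel heights against the DSM ground truth, as the benchmark pairing suggests):

\[
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\big(h_i^{\mathrm{pred}} - h_i^{\mathrm{gt}}\big)^{2}},
\qquad
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^{2} + \operatorname{Tr}\!\big(\Sigma_r + \Sigma_g - 2(\Sigma_r\Sigma_g)^{1/2}\big),
\]

where the \(h_i\) are surface heights in metres, and \((\mu_r, \Sigma_r)\), \((\mu_g, \Sigma_g)\) are the mean and covariance of Inception features of real and generated street-view images.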
Where Pith is reading between the lines
- The approach may scale to city-wide 3D reconstruction if satellite coverage density increases.
- Similar constraint-plus-perspective training could be tested on other wide-baseline 3D tasks such as aerial-to-ground fusion.
- The unsupervised DSM recovery path suggests the model learns metric structure without explicit depth labels.
Load-bearing premise
The extreme viewpoint gap and sparse inconsistent supervision between satellite and street views can be overcome by adding geometric constraints and perspective-view training.
What would settle it
If an independent replication of the full Sat3DGen architecture, trained on the VIGOR-OOD benchmark with DSM supervision, still yielded RMSE above 6 m and FID above 30, the claim that geometric constraints plus perspective-view training close the gap would be falsified.
Original abstract
Generating a street-level 3D scene from a single satellite image is a crucial yet challenging task. Current methods present a stark trade-off: geometry-colorization models achieve high geometric fidelity but are typically building-focused and lack semantic diversity. In contrast, proxy-based models use feed-forward image-to-3D frameworks to generate holistic scenes by jointly learning geometry and texture, a process that yields rich content but coarse and unstable geometry. We attribute these geometric failures to the extreme viewpoint gap and sparse, inconsistent supervision inherent in satellite-to-street data. We introduce Sat3DGen to address these fundamental challenges, which embodies a geometry-first methodology. This methodology enhances the feed-forward paradigm by integrating novel geometric constraints with a perspective-view training strategy, explicitly countering the primary sources of geometric error. This geometry-centric strategy yields a dramatic leap in both 3D accuracy and photorealism. For validation, we first constructed a new benchmark by pairing the VIGOR-OOD test set with high-resolution DSM data. On this benchmark, our method improves geometric RMSE from 6.76m to 5.20m. Crucially, this geometric leap also boosts photorealism, reducing the Fréchet Inception Distance (FID) from ~40 to 19 against the leading method, Sat2Density++, despite using no extra tailored image-quality modules. We demonstrate the versatility of our high-quality 3D assets through diverse downstream applications, including semantic-map-to-3D synthesis, multi-camera video generation, large-scale meshing, and unsupervised single-image Digital Surface Model (DSM) estimation. The code has been released on https://github.com/qianmingduowan/Sat3DGen.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Sat3DGen, a geometry-first feed-forward method for generating street-level 3D scenes from a single satellite image. It integrates novel geometric constraints with a perspective-view training strategy to address the extreme viewpoint gap and sparse supervision in satellite-to-street data. On a newly constructed VIGOR-OOD benchmark paired with DSM data, the method reports reducing geometric RMSE from 6.76 m to 5.20 m and FID from ~40 to 19 relative to Sat2Density++, while enabling downstream tasks such as semantic-map-to-3D synthesis and unsupervised DSM estimation. The code is released publicly.
Significance. If the central improvements hold under controlled validation, the work would advance satellite-to-street 3D generation by demonstrating that explicit geometric constraints can simultaneously boost accuracy and photorealism without dedicated image-quality modules. The construction of a DSM-augmented benchmark and public code release are concrete strengths that support reproducibility and downstream research in large-scale meshing and multi-view synthesis.
Major comments (2)
- [Abstract and Experiments] The headline claim that the 1.56 m RMSE reduction and FID halving are driven by the novel geometric constraints plus perspective-view training lacks support from any ablation that removes only those constraints while freezing architecture, data, and other losses. Without this isolation, the measured gains could arise from benchmark construction, training schedule, or capacity changes. (A minimal ablation grid is sketched after this list.)
- [Experiments] No error analysis or per-scene breakdown is provided to show where the geometric constraints specifically mitigate the viewpoint gap versus other factors; this is load-bearing for the assertion that the methodology "explicitly counters the primary sources of geometric error."
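To make the requested isolation concrete, here is a minimal sketch of the ablation grid the referee is asking for. The component names (geometric_constraints, perspective_view_training) and base-config keys are illustrative placeholders, not the paper's actual configuration:

```python
from itertools import product

# Hypothetical ablation grid: toggle only the two contested components
# while everything else (backbone, data, schedule, remaining losses)
# stays frozen across runs.
COMPONENTS = ["geometric_constraints", "perspective_view_training"]

def make_configs(base_config: dict) -> list[dict]:
    """Enumerate the 2^2 = 4 runs needed to isolate each component's effect."""
    configs = []
    for flags in product([False, True], repeat=len(COMPONENTS)):
        cfg = dict(base_config)  # identical architecture, data, other losses
        cfg.update(dict(zip(COMPONENTS, flags)))
        configs.append(cfg)
    return configs

base = {"backbone": "feed-forward-image-to-3d", "epochs": 100, "seed": 0}
for cfg in make_configs(base):
    print(cfg)  # train each, then compare RMSE / FID on the fixed benchmark
```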
Minor comments (2)
- The abstract contains a typesetting artifact: "Fréchet" is rendered with a raw backslash escape ("Fr\'echet"); ensure consistent LaTeX rendering and define DSM on first use in the main text.
- Figure captions and table legends should explicitly state whether reported metrics are computed on the full test set or a subset, and whether the baseline Sat2Density++ was retrained on the new DSM-augmented benchmark.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper to strengthen the experimental validation as suggested.
Point-by-point responses
- Referee: [Abstract and Experiments] The headline claim that the 1.56 m RMSE reduction and FID halving are driven by the novel geometric constraints plus perspective-view training lacks support from any ablation that removes only those constraints while freezing architecture, data, and other losses. Without this isolation, the measured gains could arise from benchmark construction, training schedule, or capacity changes.
Authors: We agree that an ablation isolating only the geometric constraints and perspective-view training (while freezing architecture, data, and remaining losses) would provide stronger evidence for the headline claims. In the revised manuscript we will add this controlled ablation study to demonstrate that the reported RMSE and FID improvements are attributable to the proposed components rather than to other factors such as benchmark construction or training schedule. Revision: yes.
- Referee: [Experiments] No error analysis or per-scene breakdown is provided to show where the geometric constraints specifically mitigate the viewpoint gap versus other factors; this is load-bearing for the assertion that the methodology "explicitly counters the primary sources of geometric error."
Authors: We acknowledge that per-scene error breakdowns and targeted analysis would better illustrate how the geometric constraints address the viewpoint gap. In the revision we will incorporate additional error analysis, including per-scene RMSE statistics, error-map visualizations, and breakdowns by scene characteristics, to show the specific mitigation of viewpoint-related errors. Revision: yes. (A per-scene RMSE breakdown is sketched below.)
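A minimal sketch of the kind of per-scene breakdown being promised; the array names, flat per-pixel layout, and grouping key are illustrative assumptions, not the paper's actual data format:

```python
import numpy as np

def per_scene_rmse(pred: np.ndarray, gt: np.ndarray, scene_ids: np.ndarray) -> dict:
    """Per-scene height RMSE (metres), given flat per-pixel predictions,
    DSM ground truth, and a scene id per pixel. Illustrative only."""
    out = {}
    for sid in np.unique(scene_ids):
        mask = scene_ids == sid
        err = pred[mask] - gt[mask]
        out[int(sid)] = float(np.sqrt(np.mean(err ** 2)))
    return out

# Toy usage: three pixels in scene 0, two in scene 1
pred = np.array([10.0, 12.0, 8.0, 5.0, 6.0])
gt = np.array([9.0, 13.0, 8.5, 5.5, 5.0])
ids = np.array([0, 0, 0, 1, 1])
print(per_scene_rmse(pred, gt, ids))  # e.g. {0: 0.86..., 1: 0.79...}
```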
Circularity Check
No circularity: empirical gains presented as direct benchmark outcomes without self-referential reductions
Full rationale
The paper's central claims rest on measured RMSE (6.76 m to 5.20 m) and FID (~40 to 19) improvements on a newly constructed VIGOR-OOD + DSM benchmark. These are reported as empirical results of the geometry-first pipeline rather than quantities derived from fitted parameters or self-cited uniqueness theorems. No equations appear that define geometric constraints in terms of the target accuracy metrics, no predictions are statistically forced by training-set fits, and no load-bearing premises reduce to prior self-citations. The methodology is described at the level of architectural choices and training strategies whose effects are validated externally on held-out data.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · tag: unclear
Rationale: the relation between the paper passage and the cited Recognition theorem is unclear.
Linked passage: Gravity-based Density Variation Loss ... σ should generally be non-increasing with altitude ... L_grav = E[ReLU(σ(x+δz) − σ(x) − ϵ)] (see the first sketch after this list).
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
Rationale: the relation between the paper passage and the cited Recognition theorem is unclear.
Linked passage: Monocular Relative-Depth Prior ... L_depth = scale-shift-invariant, MiDaS-style loss on satellite-view depth (see the second sketch after this list).
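To make the quoted L_grav term concrete, here is a minimal PyTorch sketch under stated assumptions: density_fn is a hypothetical callable for the learned density field, the z-axis is taken as up, and the shift and margin values are illustrative rather than the paper's settings.

```python
import torch

def gravity_loss(density_fn, pts: torch.Tensor, dz: float = 0.5, eps: float = 0.0) -> torch.Tensor:
    """Penalize density that increases with altitude:
    L_grav = E[ReLU(sigma(x + dz) - sigma(x) - eps)].
    pts: (N, 3) sample points, z-up convention assumed."""
    up = pts.clone()
    up[:, 2] += dz                 # shift samples upward by dz metres
    sigma = density_fn(pts)        # (N,) density at x
    sigma_up = density_fn(up)      # (N,) density at x + dz
    return torch.relu(sigma_up - sigma - eps).mean()

# Toy usage with a dummy field that decays with height (loss ~ 0)
field = lambda p: torch.exp(-p[:, 2])
pts = torch.rand(1024, 3) * 10.0
print(gravity_loss(field, pts).item())
```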
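Similarly, a minimal sketch of a scale-and-shift-invariant depth loss in the spirit of MiDaS. The closed-form least-squares alignment shown here is one common variant (the original MiDaS loss uses robust statistics and operates on disparity), so treat this as illustrative rather than the paper's exact L_depth:

```python
import torch

def ssi_depth_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Scale-and-shift-invariant depth loss: solve min_{s,t} ||s*pred + t - target||^2
    in closed form per image, then score the aligned prediction.
    pred/target: (B, N) flattened per-pixel depths."""
    p_mean = pred.mean(dim=1, keepdim=True)
    t_mean = target.mean(dim=1, keepdim=True)
    p_c, t_c = pred - p_mean, target - t_mean
    # Least-squares scale s = cov(pred, target) / var(pred), shift t follows
    s = (p_c * t_c).sum(dim=1, keepdim=True) / (p_c.pow(2).sum(dim=1, keepdim=True) + 1e-8)
    t = t_mean - s * p_mean
    aligned = s * pred + t
    return (aligned - target).abs().mean()

# Toy usage: prediction equals target up to scale and shift -> loss ~ 0
gt = torch.rand(2, 4096)
pred = 3.0 * gt - 0.7
print(ssi_depth_loss(pred, gt).item())
```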
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.