GeoFormer: A Lightweight Swin Transformer for Joint Building Height and Footprint Estimation from Sentinel Imagery

DaHee Kim; Han Jinzhen; HongSik Yun; JinByeong Lee; JiSung Kim; MinKyung Cho

arxiv: 2602.09932 · v2 · submitted 2026-02-10 · 💻 cs.CV

Pith reviewed 2026-05-16 02:45 UTC · model grok-4.3

keywords building height estimation · footprint extraction · Swin Transformer · Sentinel imagery · multi-task learning · remote sensing · urban morphology · lightweight model

The pith

GeoFormer uses a lightweight Swin Transformer to jointly estimate building height and footprint from Sentinel data with fewer parameters and higher accuracy than CNN baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GeoFormer, a multi-task Swin Transformer model that predicts building heights and footprints at 100 m resolution using only open Sentinel-1 SAR, Sentinel-2 multispectral, and DEM inputs. It reports a building height RMSE of 3.19 m with 0.32 million parameters, beating the strongest CNN baseline by 7.5 percent while confirming that a 5 by 5 context window and DEM inputs are key contributors. A geo-blocked split across 54 cities tests whether the model transfers across continents without retraining. These results matter because consistent global building morphology data remain scarce yet are required for climate modeling, disaster risk assessment, and population mapping.

Core claim

GeoFormer achieves a building height RMSE of 3.19 m and competitive footprint accuracy with only 0.32 M parameters by replacing convolutional layers with windowed local attention in a multi-task framework; this outperforms the best CNN baseline (UNet) by 7.5 percent and maintains sub-3.5 m RMSE in cross-continent transfer tests without region-specific fine-tuning.
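
To make the headline numbers concrete, the sketch below back-computes the UNet RMSE that a 7.5 percent gain would imply. It assumes the percentage is a relative RMSE reduction, (baseline - model) / baseline; the text above does not say how the 7.5 percent is defined, so the implied baseline is an illustration and not a reported figure. The function names are hypothetical.

```python
def relative_improvement(baseline_rmse: float, model_rmse: float) -> float:
    """Relative RMSE reduction of the model with respect to the baseline."""
    return (baseline_rmse - model_rmse) / baseline_rmse


def implied_baseline(model_rmse: float, improvement: float) -> float:
    """Baseline RMSE implied by a model RMSE and a relative improvement."""
    return model_rmse / (1.0 - improvement)


geoformer_rmse = 3.19   # metres, from the text
gain = 0.075            # 7.5 percent, ASSUMED to be a relative RMSE reduction

unet_rmse_implied = implied_baseline(geoformer_rmse, gain)
print(f"implied UNet RMSE: {unet_rmse_implied:.2f} m")
print(f"check: {relative_improvement(unet_rmse_implied, geoformer_rmse):.3f}")
```

Under that assumption the implied baseline is roughly 3.45 m, which is a gap of about a quarter of a metre.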

What carries the argument

A lightweight Swin Transformer backbone with windowed self-attention feeds a multi-task regression head that jointly outputs building height and footprint on a 100 m grid from fused Sentinel and DEM inputs.
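
As a rough picture of what windowed self-attention plus a two-output head means, the sketch below partitions a small map of token vectors into non-overlapping windows, runs scaled dot-product attention inside each window, and maps pooled tokens to a height and a footprint value. It is not the GeoFormer implementation: it uses identity query/key/value projections and omits the shifted windows, learned projections, relative position bias, and patch merging of a real Swin Transformer. The window size, the pooling, the sigmoid on the footprint output, and all names are assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]


def partition_windows(fmap: List[List[Vector]], m: int) -> List[List[Vector]]:
    """Split an H x W map of token vectors into non-overlapping m x m windows."""
    h, w = len(fmap), len(fmap[0])
    if h % m or w % m:
        raise ValueError("map size must be a multiple of the window size")
    return [
        [fmap[r][c] for r in range(r0, r0 + m) for c in range(c0, c0 + m)]
        for r0 in range(0, h, m)
        for c0 in range(0, w, m)
    ]


def softmax(xs: List[float]) -> List[float]:
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def window_attention(tokens: List[Vector]) -> List[Vector]:
    """Self-attention restricted to one window (identity Q/K/V, an assumption)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def two_task_head(tokens: List[Vector], w_height: Vector,
                  w_footprint: Vector) -> Tuple[float, float]:
    """Mean-pool tokens, then a linear height output and a sigmoid footprint output."""
    d = len(tokens[0])
    pooled = [sum(t[i] for t in tokens) / len(tokens) for i in range(d)]
    height = sum(w * x for w, x in zip(w_height, pooled))
    logit = sum(w * x for w, x in zip(w_footprint, pooled))
    return height, 1.0 / (1.0 + math.exp(-logit))


# Toy 4 x 4 map with 2-dimensional tokens, split into four 2 x 2 windows.
fmap = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
windows = partition_windows(fmap, 2)
attended = [window_attention(w) for w in windows]
height, footprint = two_task_head(attended[0], [1.0, 0.5], [0.2, -0.1])
print(len(windows), len(attended[0]), height, footprint)
```

Because attention is computed per window, the cost grows with the number of windows rather than with the square of the full map size, which is the usual argument for windowed attention in small models.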

If this is right

A 5 by 5 (500 m) receptive field proves optimal for scene-level building parameter retrieval.
DEM data is indispensable for height accuracy while multispectral reflectance supplies the dominant signal for footprint prediction.
The model’s low parameter count allows deployment on modest hardware for repeated global mapping updates.
Cross-continent transfer without fine-tuning supports production of consistent worldwide urban morphology layers.
Ablation results indicate that further gains are unlikely from simply enlarging the context window or model capacity.
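
The 5 by 5 context window can be pictured as a patch of 100 m cells centred on the cell being predicted. The sketch below, an illustration rather than the authors' pipeline, extracts such a patch from a 2-D grid and counts how many cells keep a full window when no padding is used. The no-padding rule and the function names are assumptions.

```python
from typing import List, Optional

Grid = List[List[float]]


def extract_window(grid: Grid, row: int, col: int, k: int = 5) -> Optional[Grid]:
    """Return the k x k patch centred on (row, col), or None if it leaves the grid."""
    if k % 2 == 0:
        raise ValueError("k must be odd so the window has a centre cell")
    half = k // 2
    n_rows, n_cols = len(grid), len(grid[0])
    if row - half < 0 or col - half < 0 or row + half >= n_rows or col + half >= n_cols:
        return None
    return [r[col - half: col + half + 1] for r in grid[row - half: row + half + 1]]


def valid_centres(n_rows: int, n_cols: int, k: int = 5) -> int:
    """Number of cells whose full k x k window fits inside the grid."""
    return max(n_rows - k + 1, 0) * max(n_cols - k + 1, 0)


def window_extent_m(k: int = 5, cell_size_m: float = 100.0) -> float:
    """Side length of the context window in metres."""
    return k * cell_size_m


grid = [[float(r * 10 + c) for c in range(10)] for r in range(10)]
patch = extract_window(grid, 4, 4)
print(len(patch), len(patch[0]), window_extent_m())   # 5 x 5 cells, 500 m
print(valid_centres(10, 10, 5), "of", 10 * 10, "cells keep a full window")
```

On a 10 by 10 toy grid only 36 of 100 cells keep a full 5 by 5 window, which shows how quickly a larger static window shrinks the usable sample.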

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same architecture could be adapted to estimate additional urban parameters such as building volume or material type with minimal extra cost.
Public release of the global product enables immediate integration into existing climate and disaster models that currently lack fine-scale building data.
If Sentinel data streams continue, periodic re-runs of the model could track urban expansion and height changes over time at low computational expense.
The efficiency advantage may extend to other remote-sensing regression tasks where labeled data are sparse but multi-modal satellite inputs are abundant.

Load-bearing premise

The geo-blocked split across 54 cities is assumed to deliver strict spatial independence plus enough morphological variety for the model to generalize globally without any further training.
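
One way to picture a geo-blocked split is to assign every grid cell to a coarse spatial block, send whole blocks to either training or testing, and drop cells whose context window would straddle a block boundary. The sketch below does this with square blocks. The figure captions suggest the paper uses a radial sector division per city, so the square blocks, the block size, the diagonal test rule, and every name here are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def block_of(cell: Cell, block_size: int) -> Cell:
    """Index of the square block that contains a grid cell."""
    return (cell[0] // block_size, cell[1] // block_size)


def split_of(block: Cell, test_every: int = 4) -> str:
    """Toy deterministic rule: blocks on every test_every-th diagonal are test."""
    return "test" if (block[0] + block[1]) % test_every == 0 else "train"


def geo_blocked_split(n_rows: int, n_cols: int, block_size: int = 10,
                      k: int = 5, test_every: int = 4) -> Dict[str, List[Cell]]:
    """Assign window centres to train/test so that no window crosses a block edge."""
    if k > block_size:
        raise ValueError("the context window must fit inside one block")
    half = k // 2
    out: Dict[str, List[Cell]] = {"train": [], "test": []}
    for r in range(half, n_rows - half):
        for c in range(half, n_cols - half):
            corners = [(r - half, c - half), (r - half, c + half),
                       (r + half, c - half), (r + half, c + half)]
            blocks = {block_of(p, block_size) for p in corners}
            # All four corners in one block means the whole window is inside it.
            if len(blocks) == 1:
                out[split_of(blocks.pop(), test_every)].append((r, c))
    return out


split = geo_blocked_split(40, 40)
print(len(split["train"]), "train centres,", len(split["test"]), "test centres")
```

Dropping the straddling cells costs samples, but it guarantees that no input pixel is seen by both a training window and a test window.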

What would settle it

Repeating the evaluation on a fresh collection of cities outside the original 54 and finding that GeoFormer’s height RMSE exceeds the retrained UNet baseline by more than 0.2 m.
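
Once predictions on the new cities exist, this test reduces to a simple comparison. A minimal sketch, assuming per-cell height predictions and references as plain lists; the function names and the toy numbers are hypothetical, and only the 0.2 m margin comes from the text.

```python
import math
from typing import Sequence


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root-mean-square error between predictions and reference values."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def claim_refuted(geoformer_rmse: float, unet_rmse: float,
                  margin_m: float = 0.2) -> bool:
    """True if GeoFormer is worse than the retrained baseline by more than the margin."""
    return geoformer_rmse - unet_rmse > margin_m


# Toy values only, not results from the paper.
reference = [10.0, 20.0, 30.0, 40.0]
geoformer_pred = [13.0, 17.0, 33.0, 37.0]
unet_pred = [12.0, 18.0, 32.0, 38.0]

gf = rmse(geoformer_pred, reference)
un = rmse(unet_pred, reference)
print(gf, un, claim_refuted(gf, un))
```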

Figures

Figures reproduced from arXiv: 2602.09932 by DaHee Kim, Han Jinzhen, HongSik Yun, JinByeong Lee, JiSung Kim, MinKyung Cho.

Figure 2: Illustration of Fishnet Analysis: a 100 m grid overlays vector building

Figure 3: Geographic distribution of SHAFTS (v2022.3) reference cities.

Figure 4: Structure of a single city group in the final HDF5 file.

Figure 5: Data leakage from random sampling under dynamic receptive field

Figure 6: Sample reduction under static receptive field expansion.

Figure 7: Radial sector division of New York City used for spatially balanced

Figure 9: Architecture of GeoFormer: a Swin-based multi-task model for predicting building height and footprint.

Figure 10: Illustration of the 8-band multi-source input tensor. From left to right: Sentinel-1 (VV, VH), Sentinel-2 (RGB+NIR), true BH, DEM, and the binary

Figure 11: Comparison of three CNN baseline architectures on BH and

Figure 12: Scatter plots comparing predictions and ground truths for BH and BF under different receptive field configurations.

Figure 13: Stratified error analysis of building height prediction across height

Figure 15: Sample distribution in train vs. test dataset.

Figure 16: Performance before and after removing the top 0.1% residuals across

Figure 17: Train vs. test performance gap for GeoFormer (Full), Enlarged,

Figure 18: Validation combined MAE and training loss curves for GeoFormer

Figure 22: Pre- and post-earthquake Sentinel-2 imagery and predicted BF/BH

Figure 20: Joint distribution of building height and footprint in Suwon.

Figure 21: Spatial distribution of building height (left) and footprint (right) in
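
The fishnet idea named in the Figure 2 caption (a 100 m grid laid over vector buildings) can be sketched with axis-aligned rectangles standing in for building polygons. Here the footprint value of a cell is taken to be the fraction of its area covered by buildings, and the height value is the area-weighted mean building height. Both definitions, the rectangle simplification, the assumption that buildings do not overlap, and all names are illustrative assumptions, not the paper's stated procedure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Building:
    """Axis-aligned rectangle (metres) with a height; a stand-in for a polygon."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    height: float


def overlap_area(b: Building, cx: float, cy: float, size: float) -> float:
    """Area of the building that falls inside the cell with lower-left corner (cx, cy)."""
    w = min(b.xmax, cx + size) - max(b.xmin, cx)
    h = min(b.ymax, cy + size) - max(b.ymin, cy)
    return max(w, 0.0) * max(h, 0.0)


def cell_values(buildings: List[Building], cx: float, cy: float,
                size: float = 100.0) -> Tuple[float, float]:
    """Return (footprint fraction, area-weighted mean height) for one grid cell."""
    parts = [(overlap_area(b, cx, cy, size), b.height) for b in buildings]
    covered = sum(a for a, _ in parts)
    if covered == 0.0:
        return 0.0, 0.0
    footprint = covered / (size * size)
    height = sum(a * h for a, h in parts) / covered
    return footprint, height


buildings = [
    Building(10.0, 10.0, 60.0, 50.0, 12.0),    # fully inside cell (0, 0)
    Building(80.0, 80.0, 120.0, 120.0, 30.0),  # spans four cells
]
print(cell_values(buildings, 0.0, 0.0))
print(cell_values(buildings, 100.0, 100.0))
```

A building that crosses a cell edge contributes only the part that lies inside each cell, which is why the second building counts toward both cells above.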

Original abstract

Building height (BH) and footprint (BF) are fundamental urban morphological parameters required by climate modelling, disaster-risk assessment, and population mapping, yet globally consistent data remain scarce. In this work, we develop GeoFormer, a lightweight Swin Transformer-based multi-task learning framework that jointly estimates BH and BF on a 100 m grid using only open-access Sentinel-1 SAR, Sentinel-2 multispectral, and DEM data. A geo-blocked data-splitting strategy enforces strict spatial independence between training and evaluation regions across 54 morphologically diverse cities. We set representative CNN baselines (ResNet, UNet, SENet) as benchmarks and thoroughly evaluate GeoFormer's prediction accuracy, computational efficiency, and spatial transferability. Results show that GeoFormer achieves a BH RMSE of 3.19 m with only 0.32 M parameters -- outperforming the best CNN baseline (UNet) by 7.5% -- indicating that windowed local attention is more effective than convolution for scene-level building-parameter retrieval. Systematic ablation on context window size, model capacity, and input modality further reveals that a 5x5 (500 m) receptive field is optimal, DEM is indispensable for height estimation, and multispectral reflectance carries the dominant predictive signal. Cross-continent transfer tests confirm BH RMSE below 3.5 m without region-specific fine-tuning. All code, model weights, and the resulting global product are publicly released.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

GeoFormer shows a small Swin model can beat standard CNNs on joint building height and footprint retrieval from Sentinel data, but the geo-blocked split lacks checks that would confirm real cross-region generalization.

The paper puts forward GeoFormer, a lightweight multi-task Swin Transformer that estimates building height and footprint together at 100 m resolution from open Sentinel-1, Sentinel-2, and DEM inputs. The central result is a 3.19 m height RMSE with 0.32 M parameters, a 7.5 % improvement over the best CNN baseline on a test set from 54 cities held out by geographic blocks. The authors also release the code, weights, and a global product, which should make the work easy to reuse downstream.

Referee Report

2 major / 1 minor

Summary. The paper introduces GeoFormer, a lightweight Swin Transformer-based multi-task framework for joint building height (BH) and footprint (BF) estimation on a 100 m grid from Sentinel-1 SAR, Sentinel-2 multispectral, and DEM inputs. It employs a geo-blocked split across 54 cities to enforce spatial independence, reports a BH RMSE of 3.19 m with 0.32 M parameters (7.5 % better than UNet), provides ablations on context window size, capacity, and modalities, and shows cross-continent transfer with RMSE below 3.5 m, while releasing all code, weights, and the global product.

Significance. If the performance and generalization claims hold, the work would be significant for delivering an efficient, publicly available model that improves upon CNN baselines for global-scale urban morphology retrieval using only open satellite data, with direct utility for climate modeling, disaster risk, and population mapping; the ablation results on receptive field and input modalities also provide useful insight into attention mechanisms for remote-sensing regression tasks.

major comments (2)

[Data-splitting section] The claim that the geo-blocked strategy across 54 cities 'enforces strict spatial independence' is load-bearing for the cross-continent transfer results (RMSE < 3.5 m) and for the interpretation that windowed attention enables global generalization; however, no quantitative validation (e.g., Earth-mover distance or nearest-neighbor similarity on morphological histograms of building density/height) is supplied to confirm the absence of leakage.
[Results section, performance table] The reported 7.5 % improvement over UNet and the headline BH RMSE of 3.19 m are presented without error bars, confidence intervals, or statistical significance tests, which are required to substantiate that the gain is robust rather than attributable to run-to-run variance.
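
The first comment asks for a distributional check between training and test regions. For one-dimensional histograms on shared bins, the Earth mover's distance equals the summed absolute difference of the two cumulative distributions multiplied by the bin width, which the sketch below computes. The bin width, the toy counts, and the function name are hypothetical; nothing here reproduces the paper's data.

```python
from typing import Sequence


def emd_1d(counts_a: Sequence[float], counts_b: Sequence[float],
           bin_width: float = 1.0) -> float:
    """Earth mover's distance between two histograms that share the same bins."""
    if len(counts_a) != len(counts_b):
        raise ValueError("histograms must use the same bins")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("histograms must not be empty")
    cum_a = cum_b = 0.0
    distance = 0.0
    for a, b in zip(counts_a, counts_b):
        cum_a += a / total_a          # running CDF of the first histogram
        cum_b += b / total_b          # running CDF of the second histogram
        distance += abs(cum_a - cum_b) * bin_width
    return distance


# Toy building-height histograms with 5 m bins (0-5, 5-10, 10-15, 15-20 m).
train_hist = [40, 30, 20, 10]
test_hist = [10, 20, 30, 40]
print(emd_1d(train_hist, test_hist, bin_width=5.0))   # metres of "mass shift"
print(emd_1d(train_hist, train_hist, bin_width=5.0))  # identical histograms
```

Note that a small distance indicates similar morphology in the two partitions, not the absence of spatial leakage; the two questions would need separate checks.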

minor comments (1)

[Abstract] The joint multi-task architecture (shared backbone vs. separate heads) and the precise definition of the 100 m output grid are not stated explicitly; stating them would aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the manuscript. We address each major point below and will revise the paper accordingly where appropriate.

Referee: [Data-splitting section] The claim that the geo-blocked strategy across 54 cities 'enforces strict spatial independence' is load-bearing for the cross-continent transfer results (RMSE < 3.5 m) and for the interpretation that windowed attention enables global generalization; however, no quantitative validation (e.g., Earth-mover distance or nearest-neighbor similarity on morphological histograms of building density/height) is supplied to confirm the absence of leakage.

Authors: We agree that quantitative validation would further substantiate the spatial independence claim. In the revised manuscript we will add an analysis of morphological feature distributions (building density and height histograms) between the training and test partitions, reporting Earth Mover's Distance and nearest-neighbor similarity scores. The geo-blocked split across 54 cities was constructed to eliminate any spatial overlap, but the additional metrics will provide empirical confirmation of minimal leakage.
Revision: yes

Referee: [Results section, performance table] The reported 7.5 % improvement over UNet and the headline BH RMSE of 3.19 m are presented without error bars, confidence intervals, or statistical significance tests, which are required to substantiate that the gain is robust rather than attributable to run-to-run variance.

Authors: We concur that error bars and statistical tests are necessary to demonstrate robustness. In the revised version we will report standard deviations computed over five independent training runs with different random seeds, add 95% confidence intervals to the performance table, and include paired t-test p-values comparing GeoFormer against the UNet baseline to establish whether the 7.5% improvement is statistically significant.
Revision: yes
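
The statistics promised in this response are simple to compute once per-seed RMSEs exist. The sketch below uses only the standard library and assumes five paired runs, with the same seeds used for both models. The RMSE values are invented toy numbers, not results from the paper. The constant 2.776 is the two-sided 95% Student t critical value for 4 degrees of freedom, so the code is specific to five runs.

```python
import math
import statistics
from typing import Sequence, Tuple

T_CRIT_95_DF4 = 2.776  # two-sided 95% Student t quantile, 4 degrees of freedom


def mean_ci95(values: Sequence[float]) -> Tuple[float, float, float]:
    """Mean and 95% confidence interval for exactly five runs."""
    if len(values) != 5:
        raise ValueError("the hard-coded critical value assumes five runs")
    mean = statistics.mean(values)
    half_width = T_CRIT_95_DF4 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half_width, mean + half_width


def paired_t_statistic(baseline: Sequence[float], model: Sequence[float]) -> float:
    """t statistic of the per-seed differences (baseline minus model)."""
    if len(baseline) != len(model) or len(baseline) < 2:
        raise ValueError("need at least two paired runs")
    diffs = [b - m for b, m in zip(baseline, model)]
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_err


# Toy per-seed RMSEs (metres); placeholders only.
model_runs = [1.0, 1.2, 1.1, 0.9, 1.3]
baseline_runs = [1.5, 1.6, 1.7, 1.4, 1.9]

mean, low, high = mean_ci95(model_runs)
t_stat = paired_t_statistic(baseline_runs, model_runs)
print(f"model RMSE {mean:.2f} m, 95% CI [{low:.2f}, {high:.2f}]")
print(f"paired t = {t_stat:.2f}; significant at 5%: {abs(t_stat) > T_CRIT_95_DF4}")
```

With only five runs the interval is wide relative to the standard deviation, so a gain has to be consistent across seeds to clear the threshold.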

Circularity Check

0 steps flagged

No circularity: empirical training and held-out geographic evaluation are self-contained

The manuscript presents GeoFormer as an empirical multi-task model (Swin-Transformer backbone with standard training on Sentinel-1/2 + DEM inputs). All headline numbers (BH RMSE 3.19 m, 7.5 % gain over UNet, cross-continent transfer < 3.5 m) are obtained by fitting on geo-blocked training folds and measuring on held-out city blocks. No derivation, uniqueness theorem, or ansatz is invoked that reduces the reported performance to fitted parameters by construction. Any citations to the original Swin Transformer paper are to an independent, externally published architecture and do not bear the load of the accuracy claims. The evaluation therefore remains falsifiable against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The paper relies on standard supervised learning assumptions and the sufficiency of Sentinel-1/2 plus DEM for the task; no new physical axioms or invented entities are introduced.

free parameters (2)

context_window_size: 5x5 (500 m) receptive field selected after ablation; treated as a tuned hyperparameter.
model_capacity: lightweight configuration with 0.32 M parameters chosen to balance accuracy and efficiency.

axioms (2)

Domain assumption: Sentinel-1 SAR, Sentinel-2 multispectral, and DEM inputs contain sufficient signal for building height and footprint at 100 m resolution. Invoked throughout the abstract as the basis for using only these open data sources.
Domain assumption: Geo-blocked splitting across 54 cities produces training and test sets that are spatially independent and morphologically representative. Central to the claim of global transferability without fine-tuning.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages

[1]

Changing and differentiated urban land- scape in China: Spatiotemporal patterns and driving forces,

C. Fang, G. Li, and S. Wang, “Changing and differentiated urban land- scape in China: Spatiotemporal patterns and driving forces,”Environ. Sci. Technol., vol. 50, no. 5, pp. 2217–2227, 2016

work page 2016
[2]

A global fingerprint of macro-scale changes in urban structure from 1999 to 2009,

S. Frolking, T. Milliman, K. C. Seto, and M. A. Friedl, “A global fingerprint of macro-scale changes in urban structure from 1999 to 2009,”Environ. Res. Lett., vol. 8, no. 2, p. 024004, 2013

work page 1999
[3]

Global urban structural growth shows a profound shift from spreading out to building up,

S. Frolking, R. Mahtta, T. Milliman, T. Esch, and K. C. Seto, “Global urban structural growth shows a profound shift from spreading out to building up,”Nat. Cities, vol. 1, no. 9, pp. 555–566, 2024

work page 2024
[4]

Impacts of urban-scale building height diversity on urban climates: A case study of Nanjing, China,

C. Xi, C. Ren, J. Wang, Z. Feng, and S.-J. Cao, “Impacts of urban-scale building height diversity on urban climates: A case study of Nanjing, China,”Energy Build., vol. 251, p. 111350, 2021

work page 2021
[5]

Effects of vegetation, urban density, building height, and atmospheric conditions on local temperatures and thermal comfort,

K. Perini and A. Magliocco, “Effects of vegetation, urban density, building height, and atmospheric conditions on local temperatures and thermal comfort,”Urban For. Urban Green., vol. 13, no. 3, pp. 495–506, 2014

work page 2014
[6]

Estimates of exposure to the 100-year floods in the conterminous United States using national building footprints,

X. Huang and C. Wang, “Estimates of exposure to the 100-year floods in the conterminous United States using national building footprints,” Int. J. Disaster Risk Reduct., vol. 50, p. 101731, 2020

work page 2020
[7]

A fire following earthquake spread model considering building height and its application to real-world events,

Y . Tian, M. Lu, Z. Xu, and J. Ren, “A fire following earthquake spread model considering building height and its application to real-world events,”Int. J. Disaster Risk Reduct., p. 105261, 2025

work page 2025
[8]

OpenStreetMap download statistics,

Geofabrik, “OpenStreetMap download statistics,” 2018

work page 2018
[9]

Developing a method to estimate building height from Sentinel-1 data,

X. Li, Y . Zhou, P. Gong, K. C. Seto, and N. Clinton, “Developing a method to estimate building height from Sentinel-1 data,”Remote Sens. Environ., vol. 240, p. 111705, 2020

work page 2020
[10]

Deep learning-based building height mapping using Sentinel-1 and Sentinel-2 data,

B. Cai, Z. Shao, X. Huang, X. Zhou, and S. Fang, “Deep learning-based building height mapping using Sentinel-1 and Sentinel-2 data,”Int. J. Appl. Earth Obs. Geoinformation, vol. 122, p. 103399, 2023

work page 2023
[11]

National-scale mapping of building height using Sentinel-1 and Sentinel-2 time series,

D. Frantz, F. Schug, A. Okujeni, C. Navacchi, W. Wagner, S. van der Linden, and P. Hostert, “National-scale mapping of building height using Sentinel-1 and Sentinel-2 time series,”Remote Sens. Environ., vol. 252, p. 112128, 2021

work page 2021
[12]

Leveraging machine learning to generate a unified and complete building height dataset for Germany,

K. Dabrock, N. Pflugradt, J. M. Weinand, and D. Stolten, “Leveraging machine learning to generate a unified and complete building height dataset for Germany,”Energy AI, vol. 17, p. 100408, 2024

work page 2024
[13]

Deep learning based building footprint extraction from very high resolution true orthophotos and nDSM,

M. Buyukdemircioglu, R. Can, S. Kocaman, and M. Kada, “Deep learning based building footprint extraction from very high resolution true orthophotos and nDSM,”ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci., vol. 2, pp. 211–218, 2022

work page 2022
[14]

A deep learning-based framework for automated extraction of building footprint polygons from very high- resolution aerial imagery,

Z. Li, Q. Xin, Y . Sun, and M. Cao, “A deep learning-based framework for automated extraction of building footprint polygons from very high- resolution aerial imagery,”Remote Sens., vol. 13, no. 18, p. 3630, 2021

work page 2021
[15]

SHAFTS (v2022.3): A deep- learning-based Python package for simultaneous extraction of building height and footprint from Sentinel imagery,

R. Li, T. Sun, F. Tian, and G.-H. Ni, “SHAFTS (v2022.3): A deep- learning-based Python package for simultaneous extraction of building height and footprint from Sentinel imagery,”Geosci. Model Dev., vol. 16, no. 2, pp. 751–778, 2023

work page 2023
[16]

Automatic building foot- print extraction from very high-resolution imagery using deep learning techniques,

K. Rastogi, P. Bodani, and S. A. Sharma, “Automatic building foot- print extraction from very high-resolution imagery using deep learning techniques,”Geocarto Int., vol. 37, no. 5, pp. 1501–1513, 2022

work page 2022
[17]

Creating 3D city models with building footprints and LIDAR point cloud classification: A machine learning approach,

Y . Park and J.-M. Guldmann, “Creating 3D city models with building footprints and LIDAR point cloud classification: A machine learning approach,”Comput. Environ. Urban Syst., vol. 75, pp. 76–89, 2019. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERV ATIONS AND REMOTE SENSING 14

work page 2019
[18]

Large-scale building height estimation from single VHR SAR image using fully convolutional network and GIS building footprints. 2019 Joint Urban Remote Sensing Event, JURSE 2019,

Y . Sun, Y . Hua, L. Mou, and XX. Zhu, “Large-scale building height estimation from single VHR SAR image using fully convolutional network and GIS building footprints. 2019 Joint Urban Remote Sensing Event, JURSE 2019,” 2019

work page 2019
[19]

Automated building height estimation using ice, cloud, and land elevation satellite 2 light detection and ranging data and building footprints,

P. Cai, J. Guo, R. Li, Z. Xiao, H. Fu, T. Guo, X. Zhang, Y . Li, and X. Song, “Automated building height estimation using ice, cloud, and land elevation satellite 2 light detection and ranging data and building footprints,”Remote Sens., vol. 16, no. 2, p. 263, 2024

work page 2024
[20]

A first Chinese building height estimate at 10 m resolution (CNBH-10 m) using multi-source earth observations and machine learning,

W.-B. Wu, J. Ma, E. Banzhaf, M. E. Meadows, Z.-W. Yu, F.-X. Guo, D. Sengupta, X.-X. Cai, and B. Zhao, “A first Chinese building height estimate at 10 m resolution (CNBH-10 m) using multi-source earth observations and machine learning,”Remote Sens. Environ., vol. 291, p. 113578, 2023

work page 2023
[21]

Refining urban morphology: An explainable machine learning method for estimating footprint-level building height,

Y . Chen, W. Sun, L. Yang, X. Yang, X. Zhou, X. Li, S. Li, and G. Tang, “Refining urban morphology: An explainable machine learning method for estimating footprint-level building height,”Sustain. Cities Soc., vol. 112, p. 105635, 2024

work page 2024
[22]

Structure-aware deep learning network for building height estimation,

Y . Chen, J. Zhou, C. Xu, Q. Ma, X. Zhang, Y . Zhou, and Y . Ge, “Structure-aware deep learning network for building height estimation,” Int. J. Appl. Earth Obs. Geoinformation, p. 104443, 2025

work page 2025
[23]

3D-GloBFP: The first global three-dimensional building footprint dataset,

Y . Che, X. Li, X. Liu, Y . Wang, W. Liao, X. Zheng, X. Zhang, X. Xu, Q. Shi, J. Zhuet al., “3D-GloBFP: The first global three-dimensional building footprint dataset,”Earth Syst. Sci. Data Discuss., vol. 2024, pp. 1–28, 2024

work page 2024
[24]

Mf-bhnet: A hybrid multimodal fusion network for building height estimation using sentinel-1 and sentinel-2 imagery,

S. Wang, B. Cai, D. Hou, Q. Ding, J. Wang, and Z. Shao, “Mf-bhnet: A hybrid multimodal fusion network for building height estimation using sentinel-1 and sentinel-2 imagery,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–19, 2024

work page 2024
[25]

Y . Zhenget al., “Estimating individual building heights by integrating spaceborne LiDAR and multisource remote sensing data: A CNN– transformer model and a semi-supervised sample augmentation ap- proach,”IEEE Transactions on Geoscience and Remote Sensing, vol. 63, 2025

work page 2025
[26]

Global building heights for urban studies (ut-globus) for city-and street-scale urban simulations: Development and first applications,

H. G. Kamath, M. Singh, N. Malviya, A. Martilli, L. He, D. Aliaga, C. He, F. Chen, L. A. Magruder, Z.-L. Yanget al., “Global building heights for urban studies (ut-globus) for city-and street-scale urban simulations: Development and first applications,”Scientific Data, vol. 11, no. 1, p. 886, 2024

work page 2024
[27]

GlobalBuildingAtlas: An open global and complete dataset of building polygons, heights and LoD1 3D models,

X. Zhu, S. Chen, F. Zhang, Y . Shi, and Y . Wang, “GlobalBuildingAtlas: An open global and complete dataset of building polygons, heights and LoD1 3D models,”Earth System Science Data, vol. 17, pp. 6647–6670, 2025

work page 2025
[28]

The pixel: A snare and a delusion,

P. Fisher, “The pixel: A snare and a delusion,”Int. J. Remote Sens., vol. 18, no. 3, pp. 679–685, 1997

work page 1997
[29]

Remote sensing of impervious surfaces in the urban areas: Requirements, methods, and trends,

Q. Weng, “Remote sensing of impervious surfaces in the urban areas: Requirements, methods, and trends,”Remote Sens. Environ., vol. 117, pp. 34–49, 2012

work page 2012
[30]

Earthquake damage assess- ment of buildings using VHR optical and SAR imagery,

D. Brunner, G. Lemoine, and L. Bruzzone, “Earthquake damage assess- ment of buildings using VHR optical and SAR imagery,”IEEE Trans. Geosci. Remote Sens., vol. 48, no. 5, pp. 2403–2420, 2010

work page 2010
[31]

Sub-pixel building area mapping based on synthetic training data and regression-based unmixing using Sentinel-1 and-2 data,

F. Schug, D. Frantz, A. Okujeni, and P. Hostert, “Sub-pixel building area mapping based on synthetic training data and regression-based unmixing using Sentinel-1 and-2 data,”Remote Sens. Lett., vol. 13, no. 8, pp. 822– 832, 2022

work page 2022
[32]

Sentinel- 2’s potential for sub-pixel landscape feature detection,

J. Radoux, G. Chom ´e, D. C. Jacques, F. Waldner, N. Bellemans, N. Matton, C. Lamarche, R. d’Andrimont, and P. Defourny, “Sentinel- 2’s potential for sub-pixel landscape feature detection,”Remote Sens., vol. 8, no. 6, p. 488, 2016

work page 2016
[33]

Local climate zones for urban temperature studies,

I. D. Stewart and T. R. Oke, “Local climate zones for urban temperature studies,”Bull. Am. Meteorol. Soc., vol. 93, no. 12, pp. 1879–1900, 2012

work page 1900
[34]

A global map of local climate zones to support earth system modelling and urban scale environmental science,

M. Demuzere, J. Kittner, A. Martilli, G. Mills, C. Moede, I. D. Stewart, J. Van Vliet, and B. Bechtel, “A global map of local climate zones to support earth system modelling and urban scale environmental science,” Earth System Science Data Discussions, vol. 2022, pp. 1–57, 2022

work page 2022
[35]

WUDAPT: An urban weather, climate, and environmental modeling infrastructure for the anthropocene,

J. Ching, G. Mills, B. Bechtel, L. See, J. Feddema, X. Wang, C. Ren, O. Brousse, A. Martilli, M. Neophytouet al., “WUDAPT: An urban weather, climate, and environmental modeling infrastructure for the anthropocene,”Bull. Am. Meteorol. Soc., vol. 99, no. 9, pp. 1907–1924, 2018

work page 1907
[36]

Mapping local climate zones for a worldwide database of the form and function of cities,

B. Bechtel, P. J. Alexander, J. B ¨ohner, J. Ching, O. Conrad, J. Feddema, G. Mills, L. See, and I. Stewart, “Mapping local climate zones for a worldwide database of the form and function of cities,”ISPRS Int. J. Geo-Inf., vol. 4, no. 1, pp. 199–219, 2015

work page 2015
[37]

An urban surface exchange parameterisation for mesoscale models,

A. Martilli, A. Clappier, and M. W. Rotach, “An urban surface exchange parameterisation for mesoscale models,”Bound.-Layer Meteorol., vol. 104, pp. 261–304, 2002

work page 2002
[38]

WorldPop, open data for spatial demography,

A. J. Tatem, “WorldPop, open data for spatial demography,”Sci. Data, vol. 4, p. 170004, 2017

work page 2017
[39]

GHS-POP R2023A – GHS population grid multitemporal (1975–2030),

M. Schiavina, S. Freire, A. Carioli, and K. MacManus, “GHS-POP R2023A – GHS population grid multitemporal (1975–2030),” European Commission, Joint Research Centre (JRC), 2023, available at 100 m resolution

work page 1975
[40]

Urban building energy modeling – a review of a nascent field,

C. F. Reinhart and C. Cerezo Davila, “Urban building energy modeling – a review of a nascent field,”Build. Environ., vol. 97, pp. 196–202, 2016

work page 2016
[41]

Application of GIS tools in the measurement analysis of urban spatial layouts using the square grid method,

Ł. Musiaka and M. Nalej, “Application of GIS tools in the measurement analysis of urban spatial layouts using the square grid method,”ISPRS Int. J. Geo-Inf., vol. 10, no. 8, p. 558, 2021

work page 2021
[42]

Swin transformer: Hierarchical vision transformer using shifted windows,

Z. Liu, Y . Lin, Y . Cao, H. Hu, Y . Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” inProc. IEEECVF Int. Conf. Comput. Vis., 2021, pp. 10 012– 10 022

work page 2021
[43]

Multi-task learning using uncer- tainty to weigh losses for scene geometry and semantics,

A. Kendall, Y . Gal, and R. Cipolla, “Multi-task learning using uncer- tainty to weigh losses for scene geometry and semantics,” inProc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 7482–7491

work page 2018
[44]

Robust estimation of a location parameter,

P. J. Huber, “Robust estimation of a location parameter,” inBreak- throughs in Statistics: Methodology and Distribution. Springer, 1992, pp. 492–518

work page 1992
[45]

Decoupled weight decay regularization,

I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” ArXiv Prepr. ArXiv171105101, 2017

work page 2017
[46]

Sgdr: Stochastic gradient descent with warm restarts,

——, “Sgdr: Stochastic gradient descent with warm restarts,”ArXiv Prepr. ArXiv160803983, 2016

work page 2016
[47]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778

work page 2016
[48]

U-Net: Convolutional net- works for biomedical image segmentation,

O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional net- works for biomedical image segmentation,” inMedical Image Comput- ing and Computer-Assisted Intervention (MICCAI). Springer, 2015, pp. 234–241

work page 2015
[49]

Squeeze-and-excitation networks,

J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” inProc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7132–7141

work page 2018
[50]

Improvement of surface roughness classification criteria reflecting the height and density of building by region,

M.-H. Lee, W.-S. Seo, C.-Y . Park, and C.-H. Choi, “Improvement of surface roughness classification criteria reflecting the height and density of building by region,”J. Korean Inst. Archit. Sustain. Environ. Build. Syst. KIAEBS, vol. 15, no. 5, pp. 513–524, 2021

work page 2021
[51]

Evaluating urban building damage of 2023 Kahramanmaras, Turkey earthquake sequence using SAR change detection,

X. Wang, G. Feng, L. He, Q. An, Z. Xiong, H. Lu, W. Wang, N. Li, Y . Zhao, Y . Wang, and Y . Wang, “Evaluating urban building damage of 2023 Kahramanmaras, Turkey earthquake sequence using SAR change detection,”Sensors, vol. 23, no. 14, p. 6342, 2023

[52]

Intelligent assessment of building damage of 2023 Turkey–Syria earthquake by multiple remote sensing approaches,

X. Yu, X. Hu, Y. Song et al., “Intelligent assessment of building damage of 2023 Turkey–Syria earthquake by multiple remote sensing approaches,” npj Nat. Hazards, vol. 1, p. 3, 2024

