Morphology-Guided Cross-Task Coupling for Joint Building Height and Footprint Estimation

HongSik Yun; JinByeong Lee; Jinzhen Han; JiSung Kim

arxiv: 2605.04731 · v1 · submitted 2026-05-06 · 💻 cs.CV


Pith reviewed 2026-05-08 17:17 UTC · model grok-4.3

classification 💻 cs.CV

keywords building height · building footprint · remote sensing · multi-task learning · morphology consistency · cross-attention · joint estimation · urban morphology


The pith

Explicitly coupling building height and footprint estimation via morphology guidance improves height accuracy over independent or standard multi-task approaches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Building height and footprint describe the vertical and horizontal extent of the built environment and are linked through floor-area-ratio constraints, yet remote-sensing methods usually estimate them separately. The paper reports that two mechanisms enforcing this coupling deliver better height estimates while preserving footprint quality: a decoder that gates height prediction using footprint morphology context, and a loss that supervises a height estimate derived from footprint features against ground-truth heights. This matters for applications such as urban climate modeling, disaster risk assessment, and population mapping, which rely on accurate 3D building data from satellite imagery. Ablations at matched capacity attribute most of the gain over the baseline to the coupling. The design uses a single Swin backbone on Sentinel-1 SAR, Sentinel-2 multispectral, and DEM inputs across 54 cities with a geo-blocked split.
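The geo-blocked split mentioned above can be illustrated with a minimal sketch. This page does not describe the paper's actual GeoSplit procedure, so everything here is an assumption: the function name `block_split`, the block size, the split fractions, and the hash-based assignment are hypothetical. The only idea it shows is that every grid cell inside one spatial block lands in the same split.

```python
import hashlib

def block_split(city: str, row: int, col: int, block_size: int = 32,
                val_frac: float = 0.1, test_frac: float = 0.2) -> str:
    """Assign a grid cell to train/val/test by the spatial block it falls in.

    Hypothetical illustration, not the paper's GeoSplit: all cells in the
    same block share one split, so cells inside a block never straddle the
    train/test boundary.
    """
    block_id = f"{city}:{row // block_size}:{col // block_size}"
    digest = hashlib.sha256(block_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a value in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "val"
    return "train"

# Two cells in the same 32 x 32 block always receive the same label.
same_block = block_split("city_a", 0, 0) == block_split("city_a", 31, 31)
```

A block-level assignment like this does not by itself prevent context windows centred near a block edge from overlapping cells of a neighbouring split; a buffer between blocks would be needed for that.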

Core claim

MorphoFormer encodes cross-task coupling between building height and footprint using a BF-Guided Task Decoder that applies cross-attention from footprint morphology to gate the height branch, plus a Morphology Consistency Loss that trains a height surrogate from footprint features against ground-truth heights. On a 54-city dataset this lowers building height RMSE from 3.39 m to 3.15 m and raises R-squared from 0.62 to 0.67, while footprint R-squared stays at 0.80. Ablations at matched capacity attribute the 0.24 m improvement mainly to the two mechanisms.
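To make the gating idea concrete, here is a minimal sketch of cross-attention used as a gate, written in plain Python. It is not the paper's BGTD: it assumes a single height feature vector as the query, a handful of footprint-derived context tokens as keys and values, identity projections instead of learned ones, and a sigmoid gate. The function and variable names are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention_gate(height_feat, footprint_tokens):
    """Gate a height feature vector with attention over footprint tokens.

    Assumption: no learned projections; query = height feature,
    keys = values = footprint context tokens.
    """
    d = len(height_feat)
    # Scaled dot-product score between the height query and each token.
    scores = [sum(q * k for q, k in zip(height_feat, tok)) / math.sqrt(d)
              for tok in footprint_tokens]
    weights = softmax(scores)
    # Attention-weighted sum of the footprint tokens.
    context = [sum(w * tok[j] for w, tok in zip(weights, footprint_tokens))
               for j in range(d)]
    # Sigmoid turns the context into a per-channel gate in (0, 1).
    gate = [1.0 / (1.0 + math.exp(-c)) for c in context]
    gated = [h * g for h, g in zip(height_feat, gate)]
    return gated, gate, weights

# With an all-zero footprint context the gate sits at 0.5 for every channel.
gated, gate, weights = cross_attention_gate([2.0, -4.0], [[0.0, 0.0], [0.0, 0.0]])
```

In a trained model the query, key, and value projections would be learned, and the gate would typically act on feature maps rather than on a single vector.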

What carries the argument

MorphoFormer framework with BF-Guided Task Decoder (cross-attention gating of height branch by footprint morphology context) and Morphology Consistency Loss (supervising height-from-footprint surrogate).
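The page gives no equation for the Morphology Consistency Loss, so the following is only one plausible form, written out to fix ideas. The surrogate head $g_\phi$, the per-sample penalty $\ell$, and the weight $\alpha$ are assumptions; $\mathbf{f}^{\mathrm{BF}}_i$ stands for the footprint-branch feature of sample $i$ and $h_i$ for its ground-truth height.

```latex
\hat{h}^{\mathrm{sur}}_i = g_\phi\!\left(\mathbf{f}^{\mathrm{BF}}_i\right),
\qquad
\mathcal{L}_{\mathrm{MCL}} = \frac{1}{N}\sum_{i=1}^{N} \ell\!\left(\hat{h}^{\mathrm{sur}}_i,\; h_i\right),
\qquad
\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{BH}} + \mathcal{L}_{\mathrm{BF}} + \alpha\,\mathcal{L}_{\mathrm{MCL}}
```

Under this reading, gradients of $\mathcal{L}_{\mathrm{MCL}}$ flow into the footprint features through $g_\phi$, which is how the loss would push those features to carry height-correlated structure.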

Load-bearing premise

That the proposed consistency loss and guided decoder compel footprint features to capture genuine height-related morphology instead of training-set-specific correlations.

What would settle it

Observing that the height accuracy improvement vanishes when the model is tested on cities whose building morphologies differ substantially from those in the training data.

Figures

Figures reproduced from arXiv: 2605.04731 by HongSik Yun, JinByeong Lee, Jinzhen Han, JiSung Kim.

**Figure 1.** Overview of the proposed MorphoFormer framework. The BGTD module (highlighted) and the Morphology Consistency Loss (L_MCL) jointly operationalize the cross-task coupling described in Section 4.

**Figure 2.** A 90 × 90 input scene from the test split (San Francisco, coastline). Panels left to right: Sentinel-1 VV/VH average, Sentinel-2 RGB, Sentinel-2 NIR, SRTM DEM, and the validity mask. Dashed red lines mark cell boundaries; the regression-target centre cell is outlined in solid red.

**Figure 3.** GeoSplit assignment for three cities of contrasting morphology and extent. Each …

**Figure 4.** The (BH, BF) coupling on the training split. (a) Joint hexbin of …

**Figure 5.** Architecture of MorphoFormer. Encoder pipeline (Section 4.1) on the left; …

**Figure 6.** Predicted-vs-ground-truth hexbin densities on the test split for MorphoFormer …

**Figure 7.** Stratification of test-set BH RMSE by λp bin. (a) BH RMSE per bin for MorphoFormer (full) and the BGTD/MCL ablations; the dense-urban tail (λp > 0.55) is dominated by a small population of high-rise cells with large absolute residuals. (b) Per-bin BH-RMSE increase upon removing each mechanism …

**Figure 8.** BGTD cross-gate activation on the test split. (a) Per-sample mean gate value …

**Figure 9.** Behaviour of the height-from-footprint surrogate on the test split. (a) Hexbin …

**Figure 10.** Error origins for MorphoFormer on the test split, binned over …
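The per-bin stratification described in the Figure 7 caption can be sketched as follows. This is an illustrative helper, not the authors' analysis code: the function name, the bin edges, and the toy inputs are hypothetical, and λp is assumed to be a per-cell density value in [0, 1] (the symbol conventionally denotes plan area fraction in urban morphology, but the excerpt does not define it).

```python
import math
from collections import defaultdict

def rmse_by_bin(lambda_p, y_true, y_pred, edges):
    """Group samples by the lambda_p interval [lo, hi) they fall in
    and compute the RMSE of the predictions inside each interval."""
    sq_err = defaultdict(list)
    for lp, t, p in zip(lambda_p, y_true, y_pred):
        for lo, hi in zip(edges[:-1], edges[1:]):
            if lo <= lp < hi:
                sq_err[(lo, hi)].append((p - t) ** 2)
                break
    return {b: math.sqrt(sum(v) / len(v)) for b, v in sq_err.items()}

# Toy values only (not data from the paper): two sparse cells, one dense cell.
per_bin = rmse_by_bin(
    lambda_p=[0.1, 0.2, 0.6],
    y_true=[10.0, 10.0, 30.0],
    y_pred=[13.0, 14.0, 20.0],
    edges=[0.0, 0.55, 1.01],
)
```

With few samples in the densest bin, a single large residual dominates that bin's RMSE, which is the effect the caption attributes to high-rise cells.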

Original abstract

Building height (BH) and building footprint (BF) jointly describe the vertical and horizontal extent of the built environment and are required inputs for urban climate, disaster-risk, and population-mapping models. The two parameters are coupled through floor-area-ratio (FAR) constraints, yet remote-sensing approaches typically treat them as independent regression targets. We argue that explicitly encoding this cross-task coupling is more impactful than further refining individual encoders, and propose MorphoFormer, a joint BH/BF estimation framework built around two complementary mechanisms: (i) a BF-Guided Task Decoder (BGTD) that gates the height branch via cross-attention on a footprint-derived morphology context, and (ii) a Morphology Consistency Loss (MCL) that supervises a height-from-footprint surrogate against the ground-truth BH, indirectly forcing the BF feature to encode height-correlated structure. The encoder is a single-stage Swin backbone fed by Sentinel-1 SAR, Sentinel-2 multispectral, and DEM inputs, trained and evaluated on a geo-blocked split of 54 cities. Against a Swin-MTL baseline at identical receptive field, MorphoFormer reduces BH test RMSE from 3.39 to 3.15 m (R^2 improves 0.62 -> 0.67) with BF R^2 stable at 0.80. Controlled ablations at identical capacity attribute most of this 0.24 m improvement to the two proposed mechanisms: removing BGTD raises BH RMSE by 0.11 m and removing MCL raises it by 0.11 m, with the residual approximately 0.02 m falling within the noise floor of encoder-side variations. Because both mechanisms act on cross-task representations rather than pixels, the design carries no intrinsic dependence on input resolution.
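The ablation bookkeeping in the abstract can be checked with a few lines of arithmetic. The numbers below are copied from the abstract; treating the two ablation deltas as additive follows the abstract's own framing and is not guaranteed in general, since removing both mechanisms at once need not cost exactly the sum of the individual removals.

```python
# Values reported in the abstract (metres of BH RMSE).
baseline_rmse = 3.39   # Swin-MTL baseline
full_rmse = 3.15       # MorphoFormer (full)
bgtd_delta = 0.11      # RMSE increase when BGTD is removed
mcl_delta = 0.11       # RMSE increase when MCL is removed

# Total improvement and the part not explained by the two mechanisms.
total_gain = round(baseline_rmse - full_rmse, 2)
residual = round(total_gain - (bgtd_delta + mcl_delta), 2)

# Improvement relative to the baseline error.
relative_gain = total_gain / baseline_rmse

print(f"total gain: {total_gain} m, residual: {residual} m, "
      f"relative: {relative_gain:.1%}")
```

The relative figure, roughly 7% of the baseline RMSE, is the number to keep in mind when judging whether the gain matters for a given downstream use.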

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MorphoFormer shows that a footprint-guided cross-attention decoder plus a morphology consistency loss can pull an extra 0.24 m off building height RMSE while keeping footprint accuracy flat, with ablations crediting the two new modules.


The main thing to know is that this paper gets a modest but measurable lift in joint building height and footprint estimation by adding two mechanisms that explicitly link the tasks instead of treating them as separate regressions. On a geo-blocked split of 54 cities using Sentinel-1, Sentinel-2 and DEM inputs, the full model drops height RMSE from 3.39 m to 3.15 m and raises R² from 0.62 to 0.67, with footprint R² staying at 0.80. The ablations at matched capacity attribute roughly 0.11 m of the gain to the BF-Guided Task Decoder and another 0.11 m to the Morphology Consistency Loss, which is clean enough to evaluate the central claim.

The design is straightforward: cross-attention lets footprint morphology context gate the height branch, and the consistency loss trains a height surrogate from the footprint features so that those features are pushed to carry height-relevant structure. Both operate on representations rather than raw pixels, so the approach is not tied to a particular resolution.

The soft spot is the absolute size of the improvement. A 0.24 m reduction on a 3.39 m baseline (about 7%) is incremental for most downstream uses in urban climate or risk modeling. The abstract also gives no error bars or run-to-run variance, so it is unclear how much of the gain, beyond the 0.02 m residual the authors already place within the noise floor, would survive reseeding. The paper also does not test whether the same coupling helps on higher-resolution optical data or different sensor mixes, so the resolution-independence claim rests on the design argument rather than on a measurement.

This work is useful for groups already running multi-task remote-sensing models who want to try adding task coupling without changing the encoder. It deserves peer review because the controlled ablations and spatial split give referees a clear way to check whether the reported attribution holds. I would send it out.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes MorphoFormer, a joint framework for building height (BH) and building footprint (BF) estimation from Sentinel-1 SAR, Sentinel-2 multispectral, and DEM inputs using a single-stage Swin encoder. It introduces two mechanisms to encode floor-area-ratio coupling: a BF-Guided Task Decoder (BGTD) that applies cross-attention gating from footprint-derived morphology to the height branch, and a Morphology Consistency Loss (MCL) that supervises a height surrogate derived from footprint features against ground-truth BH. On a geo-blocked split of 54 cities, MorphoFormer reduces BH test RMSE from 3.39 m to 3.15 m (R² from 0.62 to 0.67) while holding BF R² at 0.80; controlled ablations at fixed capacity attribute most of the 0.24 m gain to BGTD and MCL.

Significance. If the results hold, the work demonstrates that explicitly modeling cross-task physical coupling can improve remote-sensing regression accuracy without increasing encoder capacity or input resolution. The geo-blocked evaluation and the fixed-capacity ablations provide evidence that the gains arise from the morphology-guided mechanisms rather than from spatial leakage or parameter count, strengthening the case for incorporating domain constraints such as FAR into multi-task architectures for urban morphology estimation.

minor comments (3)

The experimental section should report standard deviations or results across multiple random seeds for the RMSE and R² values, as the 0.24 m improvement and the 0.11 m ablation deltas are currently presented as point estimates.
The exact mathematical definition of the Morphology Consistency Loss (including how the height surrogate is computed from BF features) would benefit from an explicit equation in the methods section to facilitate reproduction.
Table or text describing the data split should state the precise number of cities (or samples) allocated to training, validation, and test sets under the geo-blocked protocol.
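The first minor comment asks for variance across seeds. A minimal sketch of what such a report could compute is given below, using only the standard library. The helper names and the two-sigma threshold are assumptions, and the numbers in the usage lines are placeholders chosen for easy arithmetic, not results from the paper.

```python
from statistics import mean, stdev

def summarize(rmse_by_seed):
    """Mean and sample standard deviation of RMSE over random seeds."""
    return mean(rmse_by_seed), stdev(rmse_by_seed)

def gain_exceeds_noise(baseline_runs, model_runs, k=2.0):
    """Crude check: is the mean RMSE gain larger than k times the
    combined seed-to-seed standard deviation of the two models?"""
    b_mean, b_std = summarize(baseline_runs)
    m_mean, m_std = summarize(model_runs)
    gain = b_mean - m_mean
    noise = (b_std ** 2 + m_std ** 2) ** 0.5
    return gain, noise, gain > k * noise

# Placeholder RMSE values for three seeds each (NOT from the paper).
gain, noise, clear = gain_exceeds_noise([10.0, 10.2, 9.8], [9.0, 9.1, 8.9])
```

The same check applied to each ablation would show whether the two 0.11 m deltas are distinguishable from each other and from zero; a paired test over matched seeds would be a stricter alternative.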

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary and significance assessment of MorphoFormer, which correctly highlights the 0.24 m BH RMSE reduction, stable BF performance, geo-blocked 54-city evaluation, and fixed-capacity ablations. The recommendation for minor revision is noted. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity


The paper's derivation consists of proposing two explicit mechanisms (BGTD cross-attention gating and MCL surrogate supervision) whose impact is measured via controlled ablations on a geo-blocked held-out test set of 54 cities. The 0.24 m BH RMSE reduction is reported as an empirical outcome on spatially disjoint test data, not as a quantity obtained by fitting a parameter and then re-predicting a related statistic from the same fit. No equations are presented that equate a claimed prediction to its own input by construction, and no load-bearing premise relies on a self-citation chain or imported uniqueness theorem. The design is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard assumptions that cross-attention can transfer morphology context and that a surrogate height-from-footprint task will regularize features toward real height correlations; no new physical entities or ad-hoc constants are introduced beyond ordinary neural-network training.

axioms (2)

domain assumption: Cross-attention on footprint-derived morphology can usefully gate height features without introducing harmful bias. Invoked in the design of the BF-Guided Task Decoder.
domain assumption: A height-from-footprint surrogate loss will force footprint features to encode height-correlated structure. Core justification for the Morphology Consistency Loss.


Reference graph

Works this paper leans on

31 extracted references · 9 canonical work pages

[1] C. Fang, G. Li, S. Wang, Changing and differentiated urban landscape in China: Spatiotemporal patterns and driving forces, Environ. Sci. Technol. 50 (5) (2016) 2217–2227.

[2] S. Frolking, T. Milliman, K. C. Seto, M. A. Friedl, A global fingerprint of macro-scale changes in urban structure from 1999 to 2009, Environ. Res. Lett. 8 (2) (2013) 024004.

[3] S. Frolking, R. Mahtta, T. Milliman, T. Esch, K. C. Seto, Global urban structural growth shows a profound shift from spreading out to building up, Nat. Cities 1 (9) (2024) 555–566.

[4] C. Xi, C. Ren, J. Wang, Z. Feng, S.-J. Cao, Impacts of urban-scale building height diversity on urban climates: A case study of Nanjing, China, Energy Build. 251 (2021) 111350.

[5] K. Perini, A. Magliocco, Effects of vegetation, urban density, building height, and atmospheric conditions on local temperatures and thermal comfort, Urban For. Urban Green. 13 (3) (2014) 495–506.

[6] X. Huang, C. Wang, Estimates of exposure to the 100-year floods in the conterminous United States using national building footprints, Int. J. Disaster Risk Reduct. 50 (2020) 101731.

[7] Y. Tian, M. Lu, Z. Xu, J. Ren, A fire following earthquake spread model considering building height and its application to real-world events, Int. J. Disaster Risk Reduct. (2025) 105261.

[8] C. F. Reinhart, C. Cerezo Davila, Urban building energy modeling – a review of a nascent field, Build. Environ. 97 (2016) 196–202. doi:10.1016/j.buildenv.2015.12.001

[9] A. J. Tatem, WorldPop, open data for spatial demography, Sci. Data 4 (2017) 170004. doi:10.1038/sdata.2017.4

[10] M. Schiavina, S. Freire, A. Carioli, K. MacManus, GHS-POP R2023A – GHS population grid multitemporal (1975–2030), European Commission, Joint Research Centre (JRC), available at 100m resolution (2023). doi:10.2905/2FF68A52-5B5B-4532-88CB-C5A729C3F5D0

[11] Geofabrik, OpenStreetMap download statistics (2018).

[12] M. Buyukdemircioglu, R. Can, S. Kocaman, M. Kada, Deep learning based building footprint extraction from very high resolution true orthophotos and nDSM, ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2 (2022) 211–218.

[13] Z. Li, Q. Xin, Y. Sun, M. Cao, A deep learning-based framework for automated extraction of building footprint polygons from very high-resolution aerial imagery, Remote Sens. 13 (18) (2021) 3630.

[14] K. Rastogi, P. Bodani, S. A. Sharma, Automatic building footprint extraction from very high-resolution imagery using deep learning techniques, Geocarto Int. 37 (5) (2022) 1501–1513.

[15] Y. Park, J.-M. Guldmann, Creating 3D city models with building footprints and LIDAR point cloud classification: A machine learning approach, Comput. Environ. Urban Syst. 75 (2019) 76–89.

[16] X. Li, Y. Zhou, P. Gong, K. C. Seto, N. Clinton, Developing a method to estimate building height from Sentinel-1 data, Remote Sens. Environ. 240 (2020) 111705.

[17] Y. Sun, Y. Hua, L. Mou, X. X. Zhu, Large-scale building height estimation from single VHR SAR image using fully convolutional network and GIS building footprints, 2019 Joint Urban Remote Sensing Event, JURSE 2019 (2019).

[18] P. Cai, J. Guo, R. Li, Z. Xiao, H. Fu, T. Guo, X. Zhang, Y. Li, X. Song, Automated building height estimation using ice, cloud, and land elevation satellite 2 light detection and ranging data and building footprints, Remote Sens. 16 (2) (2024) 263.

[19] D. Frantz, F. Schug, A. Okujeni, C. Navacchi, W. Wagner, S. van der Linden, P. Hostert, National-scale mapping of building height using Sentinel-1 and Sentinel-2 time series, Remote Sens. Environ. 252 (2021) 112128.

[20] W.-B. Wu, J. Ma, E. Banzhaf, M. E. Meadows, Z.-W. Yu, F.-X. Guo, D. Sengupta, X.-X. Cai, B. Zhao, A first Chinese building height estimate at 10 m resolution (CNBH-10 m) using multi-source earth observations and machine learning, Remote Sens. Environ. 291 (2023) 113578.

[21] B. Cai, Z. Shao, X. Huang, X. Zhou, S. Fang, Deep learning-based building height mapping using Sentinel-1 and Sentinel-2 data, Int. J. Appl. Earth Obs. Geoinformation 122 (2023) 103399.

[22] Y. Chen, W. Sun, L. Yang, X. Yang, X. Zhou, X. Li, S. Li, G. Tang, Refining urban morphology: An explainable machine learning method for estimating footprint-level building height, Sustain. Cities Soc. 112 (2024) 105635.

[23] S. Wang, B. Cai, D. Hou, Q. Ding, J. Wang, Z. Shao, Mf-bhnet: A hybrid multimodal fusion network for building height estimation using sentinel-1 and sentinel-2 imagery, IEEE Transactions on Geoscience and Remote Sensing 62 (2024) 1–19.

[24] Y. Zheng, et al., Estimating individual building heights by integrating spaceborne LiDAR and multisource remote sensing data: A CNN–transformer model and a semi-supervised sample augmentation approach, IEEE Transactions on Geoscience and Remote Sensing 63 (2025). doi:10.1109/TGRS.2025.3601205

[25] H. G. Kamath, M. Singh, N. Malviya, A. Martilli, L. He, D. Aliaga, C. He, F. Chen, L. A. Magruder, Z.-L. Yang, et al., Global building heights for urban studies (ut-globus) for city- and street-scale urban simulations: Development and first applications, Scientific Data 11 (1) (2024) 886.

[26] X. Zhu, S. Chen, F. Zhang, Y. Shi, Y. Wang, GlobalBuildingAtlas: An open global and complete dataset of building polygons, heights and LoD1 3D models, Earth System Science Data 17 (2025) 6647–6670. doi:10.5194/essd-17-6647-2025

[27] I. D. Stewart, T. R. Oke, Local climate zones for urban temperature studies, Bull. Am. Meteorol. Soc. 93 (12) (2012) 1879–1900. doi:10.1175/BAMS-D-11-00019.1

[28] J. Ching, G. Mills, B. Bechtel, L. See, J. Feddema, X. Wang, C. Ren, O. Brousse, A. Martilli, M. Neophytou, et al., WUDAPT: An urban weather, climate, and environmental modeling infrastructure for the anthropocene, Bull. Am. Meteorol. Soc. 99 (9) (2018) 1907–1924. doi:10.1175/BAMS-D-16-0236.1

[29] R. Li, T. Sun, F. Tian, G.-H. Ni, SHAFTS (v2022.3): A deep-learning-based Python package for simultaneous extraction of building height and footprint from Sentinel imagery, Geosci. Model Dev. 16 (2) (2023) 751–778. doi:10.5194/gmd-16-751-2023

[30] Ł. Musiaka, M. Nalej, Application of GIS tools in the measurement analysis of urban spatial layouts using the square grid method, ISPRS Int. J. Geo-Inf. 10 (8) (2021) 558. doi:10.3390/ijgi10080558

[31] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo, Swin transformer: Hierarchical vision transformer using shifted windows, in: Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 10012–10022.
