
arxiv: 2509.10083 · v1 · submitted 2025-09-12 · 💻 cs.CY

The Hierarchical Morphotope Classification: A Theory-Driven Framework for Large-Scale Analysis of Built Form

Pith reviewed 2026-05-18 17:57 UTC · model grok-4.3

classification 💻 cs.CY
keywords morphotope classification · urban morphology · built form · hierarchical classification · open data · regionalisation · morphometric analysis · scalable urban analysis

The pith

HiMoC classifies built form by first delineating morphotopes, the smallest localities with a distinctive character, from open building and street data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents the Hierarchical Morphotope Classification as a method to group urban patterns into contiguous localities called morphotopes and then arrange them in a hierarchy. The approach starts from open data on buildings and streets, applies a regionalisation step to create morphologically distinct units, and builds a taxonomic tree based on dissimilarity in morphometric profiles. A sympathetic reader would care because the resulting framework supports scalable, reproducible classification that works across countries rather than remaining limited to single cities. It offers a theory-grounded alternative that complements land-use maps by focusing directly on physical form.

Core claim

The paper claims that the morphotope concept can be operationalised through the SA3 regionalisation method to produce contiguous, morphologically distinct localities from open data on buildings and streets, after which these units are organised into a hierarchical taxonomic tree that reflects their morphometric dissimilarity and permits flexible, interpretable classification of built fabric at continental scales.

What carries the argument

The morphotope, operationalised as the smallest locality with a distinctive character, carries the argument. It is delineated by the SA3 (Spatial Agglomerative Adaptive Aggregation) regionalisation applied to morphometric profiles of buildings and streets.
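
As a rough illustration of what a contiguity-constrained agglomeration can look like, the sketch below greedily merges adjacent units with the most similar mean profiles until every region holds at least `min_size` units. It is not SA3: the paper's algorithm is not reproduced here, and the function name `regionalise`, the Euclidean distance, the centroid-based similarity, and the stopping rule are all assumptions made for illustration. The adjacency is assumed to be symmetric.

```python
from math import dist


def regionalise(profiles, adjacency, min_size):
    """Greedy contiguity-constrained merging (illustrative stand-in, not SA3).

    profiles  : dict unit_id -> tuple of morphometric values
    adjacency : dict unit_id -> set of neighbouring unit_ids (symmetric)
    min_size  : hypothetical minimum number of units per region
    """
    # Every region starts as a single unit.
    members = {u: {u} for u in profiles}
    neighbours = {u: set(adjacency[u]) for u in profiles}

    def centroid(region):
        pts = [profiles[u] for u in members[region]]
        return tuple(sum(c) / len(pts) for c in zip(*pts))

    while True:
        # Regions that are still too small and have something to merge with.
        small = [r for r in members if len(members[r]) < min_size and neighbours[r]]
        if not small:
            break
        # Pick the (undersized region, neighbour) pair with the closest profiles.
        a, b = min(
            ((r, n) for r in small for n in neighbours[r]),
            key=lambda pair: dist(centroid(pair[0]), centroid(pair[1])),
        )
        # Merge b into a and rewire the adjacency so contiguity is preserved.
        members[a] |= members.pop(b)
        neighbours[a] = (neighbours[a] | neighbours.pop(b)) - {a, b}
        for n in neighbours[a]:
            neighbours[n].discard(b)
            neighbours[n].add(a)
    return list(members.values())
```

Because merges are only ever allowed between neighbours, each resulting region is contiguous by construction, which is the property the review attributes to morphotopes.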

If this is right

  • The method groups over 90 million building footprints into more than 500,000 morphotopes across a subset of Central European countries.
  • Classification becomes applicable beyond a single country because the hierarchy is built from open data and a reproducible algorithm.
  • Users obtain flexible, interpretable categories of built fabric that complement existing land-use products.
  • The framework supports applications in urban planning, environmental analysis, and socio-spatial studies by providing a nuanced view of urban structure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The hierarchy could be linked to socio-economic datasets to test whether morphological similarity predicts patterns of social or economic outcomes.
  • Environmental models of urban heat or air quality might incorporate morphotope boundaries as units for simulation rather than arbitrary grid cells.
  • The same open-data pipeline could be rerun after major construction events to measure morphological change over time.
  • Cross-validation against historical maps might reveal whether the derived morphotopes align with longstanding urban districts.

Load-bearing premise

The premise is that morphometric profiles derived from open building and street data are sufficient to capture the distinctive character of a locality, and that SA3 regionalisation produces contiguous, morphologically distinct morphotopes.

What would settle it

Empirical comparison showing that the morphotopes produced by HiMoC do not correspond to areas identified as morphologically distinct through independent field surveys or expert visual assessment would falsify the central claim.

Original abstract

Built environment, formed of a plethora of patterns of building, streets, and plots, has a profound impact on how cities are perceived and function. While various methods exist to classify urban patterns, they often lack a strong theoretical foundation, are not scalable beyond a local level, or sacrifice detail for broader application. This paper introduces the Hierarchical Morphotope Classification (HiMoC), a novel, theory-driven, and computationally scalable method of classification of built form. HiMoC operationalises the idea of a morphotope - the smallest locality with a distinctive character - using a bespoke regionalisation method SA3 (Spatial Agglomerative Adaptive Aggregation), to delineate contiguous, morphologically distinct localities. These are further organised into a hierarchical taxonomic tree reflecting their dissimilarity based on morphometric profile derived from buildings and streets retrieved from open data, allowing flexible, interpretable classification of built fabric, that can be applied beyond a scale of a single country. The method is tested on a subset of countries of Central Europe, grouping over 90 million building footprints into over 500,000 morphotopes. The method extends the capabilities of available morphometric analyses, while offering a complementary perspective to existing large scale data products, which are focusing primarily on land use or use conceptual definition of urban fabric types. This theory-grounded, reproducible, unsupervised and scalable method facilitates a nuanced understanding of urban structure, with broad applications in urban planning, environmental analysis, and socio-spatial studies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the Hierarchical Morphotope Classification (HiMoC) framework, which operationalizes the morphotope concept—the smallest locality with a distinctive character—via a custom SA3 (Spatial Agglomerative Adaptive Aggregation) regionalisation algorithm applied to morphometric profiles derived from open building and street data. These morphotopes are organized into a hierarchical taxonomic tree reflecting dissimilarity, enabling flexible classification. The approach is demonstrated on Central European data, grouping over 90 million building footprints into over 500,000 morphotopes, and positioned as a scalable, theory-driven, unsupervised complement to land-use-focused urban classification products.

Significance. If the core claims hold, the work provides a notable contribution by delivering a reproducible, computationally scalable, and theory-grounded method for large-scale built-form analysis using open data. Strengths include the hierarchical structure for flexible interpretation and the explicit linkage to morphotope theory, which could support applications in urban planning, environmental analysis, and socio-spatial studies while complementing existing large-scale data products. The unsupervised and scalable design is a clear asset for continental or global extensions.

major comments (2)
  1. [SA3 regionalisation description] The description of the SA3 regionalisation (methods section): the claim that SA3 produces contiguous, morphologically distinct morphotopes—the central operationalisation step—is not supported by any reported cluster quality metrics such as silhouette scores, Davies-Bouldin indices, or external validation against known morphological typologies. Without these, the assertion that the 500,000 morphotopes reflect genuine 'distinctive character' rather than artifacts from OSM coverage gaps remains unsubstantiated and is load-bearing for all downstream hierarchical and application claims.
  2. [Results / Central Europe test] Central Europe test results: no baseline comparisons, sensitivity tests on SA3 agglomeration parameters, or error analysis are provided despite grouping 90 million buildings; this leaves the quantitative performance of the method without demonstrated support and weakens the claim of scalability and nuance over existing approaches.
minor comments (2)
  1. [Abstract and methods] The abstract and methods would benefit from an explicit table listing the morphometric variables (e.g., building density, street network metrics) used to construct profiles, to improve reproducibility and clarity of the input to SA3.
  2. [Hierarchical tree construction] Notation for the hierarchical taxonomic tree levels and dissimilarity measure is introduced without a formal definition or pseudocode; adding this would aid readers in understanding the claimed flexibility.
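
To make the second minor comment concrete, the following sketch shows one way a taxonomic tree over morphotope profiles could be built and then cut at different heights. It assumes Euclidean distance between profiles and average linkage; neither choice is confirmed by the text summarised here, and `build_tree` and `cut_tree` are hypothetical names.

```python
from itertools import combinations
from math import dist


def build_tree(profiles):
    """Average-linkage agglomeration over profiles (assumed linkage).

    Returns the merges in order as (height, left_members, right_members).
    """
    clusters = [frozenset([k]) for k in profiles]
    merges = []

    def linkage(a, b):
        # Mean pairwise distance between members of the two clusters.
        total = sum(dist(profiles[i], profiles[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        merges.append((linkage(a, b), a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


def cut_tree(profiles, merges, height):
    """Apply only the merges at or below `height` and return the classes."""
    classes = {k: frozenset([k]) for k in profiles}
    for h, a, b in merges:
        if h <= height:
            merged = a | b
            for k in merged:
                classes[k] = merged
    return set(classes.values())
```

Cutting the same tree at a low height yields many fine-grained classes and at a high height a few coarse ones, which is the kind of flexibility the hierarchy is said to provide.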

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which have helped us identify areas where the manuscript can be strengthened. We address each major comment point by point below, providing clarifications on the methodological choices and outlining specific revisions.

Point-by-point responses
  1. Referee: [SA3 regionalisation description] The description of the SA3 regionalisation (methods section): the claim that SA3 produces contiguous, morphologically distinct morphotopes—the central operationalisation step—is not supported by any reported cluster quality metrics such as silhouette scores, Davies-Bouldin indices, or external validation against known morphological typologies. Without these, the assertion that the 500,000 morphotopes reflect genuine 'distinctive character' rather than artifacts from OSM coverage gaps remains unsubstantiated and is load-bearing for all downstream hierarchical and application claims.

    Authors: We agree that additional quantitative support would strengthen the presentation. However, SA3 is a spatially constrained regionalisation procedure that enforces contiguity via adaptive aggregation; standard internal validation indices such as silhouette scores or Davies-Bouldin indices assume non-spatial, distance-based clusters and are therefore not directly applicable. We will revise the methods section to clarify this design rationale and to include sensitivity tests on the agglomeration parameters. We will also expand the discussion of OSM data limitations, describing the preprocessing steps taken to mitigate coverage gaps, and add qualitative comparisons against existing morphological studies for selected cities as a form of external reference. revision: partial

  2. Referee: [Results / Central Europe test] Central Europe test results: no baseline comparisons, sensitivity tests on SA3 agglomeration parameters, or error analysis are provided despite grouping 90 million buildings; this leaves the quantitative performance of the method without demonstrated support and weakens the claim of scalability and nuance over existing approaches.

    Authors: We accept that the current results section would benefit from these additions to better demonstrate performance. In the revised manuscript we will insert a dedicated subsection reporting sensitivity tests on the principal SA3 parameters, showing stability of the resulting morphotope counts and profiles. We will also provide a baseline comparison against a non-hierarchical, non-spatial clustering method applied to a representative subsample, and we will expand the error analysis to quantify the influence of data-quality issues. These changes will directly support the scalability and comparative claims. revision: yes
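
For readers unfamiliar with the index debated in the first exchange, here is a minimal pure-Python sketch of the mean silhouette coefficient. It uses Euclidean distance in attribute space only and knows nothing about spatial contiguity, which is the limitation the simulated rebuttal points to. The function name and the convention of scoring singleton clusters as 0 are assumptions of this sketch.

```python
from math import dist


def silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance.

    points : list of coordinate tuples
    labels : list of cluster labels, with at least two distinct values
    Points in singleton clusters score 0 (a common convention).
    """
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = [
            dist(p, q)
            for j, (q, other) in enumerate(zip(points, labels))
            if other == lab and j != i
        ]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        # Mean distance to the nearest other cluster.
        b = min(
            sum(dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) - {lab}
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Values near 1 indicate compact, well-separated clusters; negative values indicate points that sit closer to another cluster than to their own.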

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained operationalisation of prior theory

Full rationale

The paper introduces HiMoC as a new operationalisation of the existing morphotope concept via the bespoke SA3 regionalisation algorithm applied to morphometric profiles derived from open building and street data. No equations or steps reduce outputs to fitted parameters by construction, nor do self-citations form load-bearing justifications for uniqueness or ansatzes. The hierarchical tree and 500k morphotopes emerge from the described clustering process rather than renaming known results or importing unverified self-citations as external facts. The derivation chain remains independent of the target claims.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The central claim rests on the existence of morphotopes as meaningful units and on the validity of SA3 for delineating them; these are introduced without independent empirical grounding or falsifiable tests beyond the descriptive application.

free parameters (1)
  • SA3 agglomeration parameters
    The bespoke regionalisation method SA3 is described as adaptive but no specific thresholds or stopping criteria are stated in the abstract.
axioms (1)
  • domain assumption: A morphotope is the smallest locality with a distinctive character that can be captured by morphometric profiles of buildings and streets.
    This premise is invoked when the paper states it operationalises the morphotope idea using SA3.
invented entities (2)
  • morphotope (no independent evidence)
    purpose: Smallest locality with distinctive character as the base unit of classification
    Core new conceptual unit introduced to ground the classification.
  • SA3 (Spatial Agglomerative Adaptive Aggregation) (no independent evidence)
    purpose: Regionalisation algorithm to delineate contiguous morphotopes
    Bespoke method created for this framework.

pith-pipeline@v0.9.0 · 5813 in / 1328 out tokens · 48770 ms · 2026-05-18T17:57:40.918646+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear

    Relation between the paper passage and the cited Recognition theorem.

    HiMoC operationalises the idea of a morphotope – the smallest locality with a distinctive character – using a bespoke regionalisation method SA3 (Spatial Agglomerative Adaptive Aggregation), to delineate contiguous, morphologically distinct localities. These are further organised into a hierarchical taxonomic tree reflecting their dissimilarity based on morphometric profile derived from buildings and streets retrieved from open data

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem.

    The only parameter required by SA3 is the minimum number of buildings to form a morphotope... we selected a value of 75

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

69 extracted references · 69 canonical work pages

  1. [1]

    On the Discovery of Urban Typologies: Data Mining the Multi-Dimensional Character of Neighbourhoods

    https://doi.org/10.5311/JOSIS.2024.28.319. Fleischmann, Martin, Anastassia Vybornova, James D. Gaboardi, Anna Brázdová, and Daniela Dančejová. 2025. Adaptive Continuity-Preserving Simplification of Street Networks. arXiv:2504.16198. arXiv. https://doi.org/10.48550/arXiv.2504.16198. Gil, Jorge, Nuno Montenegro, J N Beirão, and J P Duarte. 2012. “On the Dis...

  2. [2]

    Beyond Housing Preferences: Urban Structure and Actualisation of Residential Area Preferences

    Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29063-3_18. Hasanzadeh, Kamyar, Marketta Kyttä, and Greg Brown. 2019. “Beyond Housing Preferences: Urban Structure and Actualisation of Residential Area Preferences.” Urban Science 3 (1): 21. https://doi.org/10.3390/urbansci3010021. Hijazi, Ihab, Xin Li, Reinhard Koenig, et al. 2016. “Measu...

  3. [3]

    Classifying Settlement Types from Multi-Scale Spatial Patterns of Building Footprints

    https://doi.org/10.1016/j.landurbplan.2007.02.010. Jochem, Warren C, Douglas R Leasure, Oliver Pannell, Heather R Chamberlain, Patricia Jones, and Andrew J Tatem. 2020. “Classifying Settlement Types from Multi-Scale Spatial Patterns of Building Footprints.” Environment and Planning B: Urban Analytics and City Science, May, 239980832092120. https://doi.or...

  4. [4]

    Clustering Patterns of Urban Built-up Areas with Curves of Fractal Scaling Behaviour

    “Clustering Patterns of Urban Built-up Areas with Curves of Fractal Scaling Behaviour.” Environment and Planning B: Planning and Design 37 (5): 942–54. https://doi.org/10.1068/b36039. Van den Bossche, Joris, Kelsey Jordahl, Martin Fleischmann, et al. 2025. Geopandas/Geopandas: V1.1.1. Version v1.1.1. Zenodo, released June. https://doi.org/10.5281/zenodo....

  5. [5]

    Area of a building is denoted as $a_{blg}$ (eq. 1) and defined as an area covered by a building footprint in m²

  6. [6]

    Perimeter of a building is denoted as $p_{blg}$ (eq. 2) and defined as the sum of lengths of the building exterior walls in m

  7. [7]

    Courtyard area of a building is denoted as $a_{blg_c}$ (eq. 3) and defined as the sum of areas of interior holes in footprint polygons in m²

  8. [8]

    It captures the relation of building footprint shape to its minimal enclosing circle, illustrating the similarity of shape and circle (Dibble et al

    Circular compactness of a building is denoted as $CCo_{blg} = \frac{a_{blg}}{a_{blg_C}}$ (eq. 4), where $a_{blg_C}$ is an area of minimal enclosing circle. It captures the relation of building footprint shape to its minimal enclosing circle, illustrating the similarity of shape and circle (Dibble et al. 2015)

  9. [9]

    It uses only external shape (shapely.geometry.exterior), courtyards are not included

    Corners of a building is denoted as $Cor_{blg} = \sum_{i=1}^{n} c_{blg}$ (eq. 5), where $c_{blg}$ is defined as a vertex of building exterior shape with an angle between adjacent line segments ≤ 170 degrees. It uses only external shape (shapely.geometry.exterior), courtyards are not included. Character is adapted from Steiniger et al. (2008) to exclude non-corner-like vertices

  10. [10]

    Squareness of a building is denoted as $Squ_{blg} = \frac{\sum_{i=1}^{n} D_{c_{blg_i}}}{n}$ (eq. 6), where $D$ is the deviation of angle of corner $c_{blg_i}$ from 90 degrees and $n$ is a number of corners

  11. [11]

    It is a measure of shape complexity identified by Basaraner and Cetinkaya (2017) as the shape characters with the best performance

    Equivalent rectangular index of a building is denoted as $ERI_{blg} = \sqrt{\frac{a_{blg}}{a_{blg_B}}} \times \frac{p_{blg_B}}{p_{blg}}$ (eq. 7), where $a_{blg_B}$ is an area of a minimal rotated bounding rectangle (MBR) of a building footprint and $p_{blg_B}$ is the perimeter of the MBR. It is a measure of shape complexity identified by Basaraner and Cetinkaya (2017) as the shape character with the best performance

  12. [12]

    It captures the ratio of shorter to the longer dimension of MBR to indirectly capture the deviation of the shape from a square (Schirmer and Axhausen 2015)

    Elongation of a building is denoted as $Elo_{blg} = \frac{l_{blg_B}}{w_{blg_B}}$ (eq. 8), where $l_{blg_B}$ is length of MBR and $w_{blg_B}$ is width of MBR. It captures the ratio of shorter to the longer dimension of MBR to indirectly capture the deviation of the shape from a square (Schirmer and Axhausen 2015)

  13. [13]

    The axis itself does not have to be fully within the polygon

    Longest axis length of a tessellation cell is denoted as $LAL_{cell} = d_{cell_C}$ (eq. 9), where $d_{cell_C}$ is a diameter of the minimal circumscribed circle around the tessellation cell polygon. The axis itself does not have to be fully within the polygon. It could be seen as a proxy of plot depth for tessellation-based analysis

  14. [14]

    Area of a tessellation cell is denoted as $a_{cell}$ (eq. 10) and defined as an area covered by a tessellation cell footprint in m²

  15. [15]

    It captures the relation of tessellation cell footprint shape to its minimal enclosing circle, illustrating the similarity of shape and circle

    Circular compactness of a tessellation cell is denoted as $CCo_{cell} = \frac{a_{cell}}{a_{cell_C}}$ (eq. 11), where $a_{cell_C}$ is an area of minimal enclosing circle. It captures the relation of tessellation cell footprint shape to its minimal enclosing circle, illustrating the similarity of shape and circle.

  16. [16]

    It is a measure of shape complexity identified by Basaraner and Cetinkaya (2017) as a shape character of the best performance

    Equivalent rectangular index of a tessellation cell is denoted as $ERI_{cell} = \sqrt{\frac{a_{cell}}{a_{cell_B}}} \times \frac{p_{cell_B}}{p_{cell}}$ (eq. 12), where $a_{cell_B}$ is an area of the minimal rotated bounding rectangle (MBR) of a tessellation cell footprint and $p_{cell_B}$ is the perimeter of the MBR. It is a measure of shape complexity identified by Basaraner and Cetinkaya (2017) as a shape charac...

  17. [17]

    Coverage area ratio (CAR) is one of the commonly used characters capturing intensity of development

    Coverage area ratio of a tessellation cell is denoted as $CAR_{cell} = \frac{a_{blg}}{a_{cell}}$ (eq. 13), where $a_{blg}$ is an area of a building and $a_{cell}$ is an area of related tessellation cell (Schirmer and Axhausen 2015). Coverage area ratio (CAR) is one of the commonly used characters capturing intensity of development. However, the definitions vary based on the spatial unit

  18. [18]

    2015; Gil et al

    Length of a street segment is denoted as $l_{edg}$ (eq. 14) and defined as a length of a LineString geometry in metres (Dibble et al. 2015; Gil et al. 2012)

  19. [19]

    The algorithm generates street sections every 3 meters alongside the street segment, and measures mean value

    Width of a street profile is denoted as $w_{sp} = \frac{1}{n} \left( \sum_{i=1}^{n} w_i \right)$ (eq. 15), where $w_i$ is width of a street section $i$. The algorithm generates street sections every 3 meters alongside the street segment, and measures mean value. In the case of the open-ended street, 50 metres is used as a perception-based proximity limit (Araldi and Fusco 2019)

  20. [20]

    The algorithm generates street sections every 3 meters alongside the street segment

    Openness of a street profile is denoted as $Ope_{sp} = 1 - \frac{\sum hit}{2 \sum sec}$ (eq. 16), where $\sum hit$ is a sum of section lines (left and right sides separately) intersecting buildings and $\sum sec$ total number of street sections. The algorithm generates street sections every 3 meters alongside the street segment

  21. [21]

    The algorithm generates street sections every 3 meters alongside the street segment

    Width deviation of a street profile is denoted as $wDev_{sp} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (w_i - w_{sp})^2}$ (eq. 17), where $w_i$ is width of a street section $i$ and $w_{sp}$ is mean width. The algorithm generates street sections every 3 meters alongside the street segment

  22. [22]

    It captures the deviation of a segment shape from a straight line

    Linearity of a street segment is denoted as $Lin_{edg} = \frac{l_{eucl}}{l_{edg}}$ (eq. 18), where $l_{eucl}$ is Euclidean distance between endpoints of a street segment and $l_{edg}$ is a street segment length. It captures the deviation of a segment shape from a straight line. It is adapted from Araldi and Fusco (2019)

  23. [23]

    It captures the area which is likely served by each segment

    Area covered by a street segment is denoted as $a_{edg} = \sum_{i=1}^{n} a_{cell_i}$ (eq. 19), where $a_{cell_i}$ is an area of tessellation cell $i$ belonging to the street segment. It captures the area which is likely served by each segment

  24. [24]

    It reflects the granularity of development along each segment

    Buildings per meter of a street segment is denoted as $BpM_{edg} = \frac{\sum blg}{l_{edg}}$ (eq. 20), where $\sum blg$ is a number of buildings belonging to a street segment and $l_{edg}$ is a length of a street segment. It reflects the granularity of development along each segment

  25. [25]

    It captures the area which is likely served by each node

    Area covered by a street node is denoted as $a_{node} = \sum_{i=1}^{n} a_{cell_i}$ (eq. 21), where $a_{cell_i}$ is an area of tessellation cell $i$ belonging to the street node. It captures the area which is likely served by each node

  26. [26]

    It captures the amount of wall space facing the open space (Hamaina et al

    Shared walls ratio of adjacent buildings is denoted as $SWR_{blg} = \frac{p_{blg_{shared}}}{p_{blg}}$ (eq. 22), where $p_{blg_{shared}}$ is a length of a perimeter shared with adjacent buildings and $p_{blg}$ is a perimeter of a building. It captures the amount of wall space facing the open space (Hamaina et al. 2012)

  27. [27]

    It is adapted from Hijazi et al

    Mean distance to neighbouring buildings is denoted as $NDi_{blg} = \frac{1}{n} \sum_{i=1}^{n} d_{blg,blg_i}$ (eq. 23), where $d_{blg,blg_i}$ is a distance between building and building $i$ on a neighbouring tessellation cell. It is adapted from Hijazi et al. (2016). It captures the average proximity to other buildings

  28. [28]

    It reflects granularity of morphological tessellation

    Weighted neighbours of a tessellation cell is denoted as $WNe_{cell} = \frac{\sum cell_n}{p_{cell}}$ (eq. 24), where $\sum cell_n$ is a number of cell neighbours and $p_{cell}$ is a perimeter of a cell. It reflects granularity of morphological tessellation

  29. [29]

    It captures the scale of morphological tessellation

    Area covered by neighbouring cells is denoted as $a_{cell_n} = \sum_{i=1}^{n} a_{cell_i}$ (eq. 25), where $a_{cell_i}$ is area of tessellation cell $i$ within topological distance 1. It captures the scale of morphological tessellation

  30. [30]

    It captures an accessible area

    Reached area by neighbouring segments is denoted as $a_{edg_n} = \sum_{i=1}^{n} a_{edg_i}$ (eq. 26), where $a_{edg_i}$ is an area covered by a street segment $i$ within topological distance 1. It captures an accessible area

  31. [31]

    It reflects the basic degree centrality

    Degree of a street node is denoted as $deg_{node_i} = \sum_{j} edg_{ij}$ (eq. 27), where $edg_{ij}$ is an edge of a street network between node $i$ and node $j$. It reflects the basic degree centrality

  32. [32]

    It captures the average proximity to other nodes

    Mean distance to neighbouring nodes from a street node is denoted as $MDi_{node} = \frac{1}{n} \sum_{i=1}^{n} d_{node,node_i}$ (eq. 28), where $d_{node,node_i}$ is a distance between node and node $i$ within topological distance 1. It captures the average proximity to other nodes

  33. [33]

    It captures accessible granularity

    Reached cells by neighbouring nodes is denoted as $RC_{node_n} = \sum_{i=1}^{n} cells_{node_i}$ (eq. 29), where $cells_{node_i}$ is number of tessellation cells on node $i$ within topological distance 1. It captures accessible granularity

  34. [34]

    It captures an accessible area

    Reached area by neighbouring nodes is denoted as $a_{node_n} = \sum_{i=1}^{n} a_{node_i}$ (eq. 30), where $a_{node_i}$ is an area covered by a street node $i$ within topological distance 1. It captures an accessible area

  35. [35]

    Number of courtyards of adjacent buildings is denoted as $NCo_{blg_{adj}}$ (eq. 31), where $NCo_{blg_{adj}}$ is a number of interior rings of a polygon composed of footprints of adjacent buildings (Schirmer and Axhausen 2015)

  36. [36]

    Perimeter wall length of adjacent buildings is denoted as $p_{blg_{adj}}$ (eq. 32), where $p_{blg_{adj}}$ is a length of an exterior ring of a polygon composed of footprints of adjacent buildings

  37. [37]

    It is adapted from Caruso et al

    Mean inter-building distance between neighbouring buildings is denoted as $IBD_{blg} = \frac{1}{n} \sum_{i=1}^{n} d_{blg,blg_i}$ (eq. 33), where $d_{blg,blg_i}$ is a distance between building and building $i$ on a tessellation cell within topological distance 3. It is adapted from Caruso et al. (2017). It captures the average proximity between buildings

  38. [38]

    It is adapted from Vanderhaegen and Canters (2017)

    Building adjacency of neighbouring buildings is denoted as $BuA_{blg} = \frac{\sum blg_{adj}}{\sum blg}$ (eq. 34), where $\sum blg_{adj}$ is a number of joined built-up structures within topological distance three and $\sum blg$ is a number of buildings within topological distance 3. It is adapted from Vanderhaegen and Canters (2017).

  39. [39]

    Weighted reached blocks of neighbouring tessellation cells is denoted as $WRB_{cell} = \frac{\sum blk}{\sum_{i=1}^{n} a_{cell_i}}$ (eq. 35), where $\sum blk$ is a number of blocks within topological distance three and $a_{cell_i}$ is an area of tessellation cell $i$ within topological distance three

  40. [40]

    A subgraph is defined as a network within topological distance five around a node

    Local meshedness of a street network is denoted as $Mes_{node} = \frac{e - v + 1}{2v - 5}$ (eq. 36), where $e$ is a number of edges in a subgraph, and $v$ is the number of nodes in a subgraph (Feliciotti 2018). A subgraph is defined as a network within topological distance five around a node

  41. [41]

    Mean segment length of a street network is denoted as $MSL_{edg} = \frac{1}{n} \sum_{i=1}^{n} l_{edg_i}$ (eq. 37), where $l_{edg_i}$ is a length of a street segment $i$ within a topological distance 3 around a segment

  42. [42]

    Cul-de-sac length of a street network is denoted as $CDL_{node} = \sum_{i=1}^{n} l_{edg_i}, \text{ if } edg_i \text{ is cul-de-sac}$ (eq. 38), where $l_{edg_i}$ is a length of a street segment $i$ within a topological distance 3 around a node

  43. [43]

    It captures accessible granularity

    Reached cells by street network segments is denoted as $RC_{edg} = \sum_{i=1}^{n} cells_{edg_i}$ (eq. 39), where $cells_{edg_i}$ is number of tessellation cells on segment $i$ within topological distance 3. It captures accessible granularity

  44. [44]

    A subgraph is defined as a network within topological distance five around a node

    Node density of a street network is denoted as $D_{node} = \frac{\sum node}{\sum_{i=1}^{n} l_{edg_i}}$ (eq. 40), where $\sum node$ is a number of nodes within a subgraph and $l_{edg_i}$ is a length of a segment $i$ within a subgraph. A subgraph is defined as a network within topological distance five around a node

  45. [45]

    It captures accessible granularity

    Reached cells by street network nodes is denoted as $RC_{node_{net}} = \sum_{i=1}^{n} cells_{node_i}$ (eq. 41), where $cells_{node_i}$ is number of tessellation cells on node $i$ within topological distance 3. It captures accessible granularity

  46. [46]

    It captures an accessible area

    Reached area by street network nodes is denoted as $a_{node_{net}} = \sum_{i=1}^{n} a_{node_i}$ (eq. 42), where $a_{node_i}$ is an area covered by a street node $i$ within topological distance 3. It captures an accessible area

  47. [47]

    Adapted from (Boeing 2017)

    Proportion of cul-de-sacs within a street network is denoted as $pCD_{node} = \frac{\sum_{i=1}^{n} node_i, \text{ if } deg_{node_i} = 1}{\sum_{i=1}^{n} node_i}$ (eq. 43), where $node_i$ is a node within topological distance five around a node. Adapted from (Boeing 2017)

  48. [48]

    Adapted from (Boeing 2017)

    Proportion of 3-way intersections within a street network is denoted as $p3W_{node} = \frac{\sum_{i=1}^{n} node_i, \text{ if } deg_{node_i} = 3}{\sum_{i=1}^{n} node_i}$ (eq. 44), where $node_i$ is a node within topological distance five around a node. Adapted from (Boeing 2017)

  49. [49]

    Adapted from (Boeing 2017)

    Proportion of 4-way intersections within a street network is denoted as $p4W_{node} = \frac{\sum_{i=1}^{n} node_i, \text{ if } deg_{node_i} = 4}{\sum_{i=1}^{n} node_i}$ (eq. 45), where $node_i$ is a node within topological distance five around a node. Adapted from (Boeing 2017)

  50. [50]

    A subgraph is defined as a network within topological distance five around a node

    Weighted node density of a street network is denoted as $wD_{node} = \frac{\sum_{i=1}^{n} deg_{node_i} - 1}{\sum_{i=1}^{n} l_{edg_i}}$ (eq. 46), where $deg_{node_i}$ is a degree of a node $i$ within a subgraph and $l_{edg_i}$ is a length of a segment $i$ within a subgraph. A subgraph is defined as a network within topological distance five around a node

  51. [51]

    A subgraph is defined as a network within topological distance five around a node

    Local closeness centrality of a street network is denoted as $lCC_{node} = \frac{n - 1}{\sum_{v=1}^{n-1} d(v,u)}$ (eq. 47), where $d(v,u)$ is the shortest-path distance between $v$ and $u$, and $n$ is the number of nodes within a subgraph. A subgraph is defined as a network within topological distance five around a node

  52. [52]

    Square clustering of a street network is denoted as (48) $sCl_{node} = \frac{\sum_{u=1}^{k_v} \sum_{w=u+1}^{k_v} q_v(u,w)}{\sum_{u=1}^{k_v} \sum_{w=u+1}^{k_v} [a_v(u,w) + q_v(u,w)]}$, where $q_v(u, w)$ is the number of common neighbours of $u$ and $w$ other than $v$ (i.e. squares), and $a_v(u, w) = (k_u - (1 + q_v(u, w) + \theta_{uv}))(k_w - (1 + q_v(u, w) + \theta_{uw}))$, where $\theta_{uw} = 1$ if $u$ and $w$ are connected and 0 otherwise (Lind et al. 2005)

  53. [53]

    Connected buildings count is denoted as (49) 𝑐𝑏𝑙𝑔 and defined as number of buildings directly adjacent to the target building

  54. [54]

    Connected buildings area is denoted as (50) 𝑎𝑐𝑏𝑙𝑔 and defined as total area of all buildings directly adjacent to the target building

  55. [55]

    Connected buildings perimeter is denoted as (51) 𝑝𝑐𝑏𝑙𝑔 and defined as total perimeter of all buildings directly adjacent to the target building

  56. [56]

    Connected buildings elongation is denoted as (52) 𝑚𝑖𝑏𝐸𝑙𝑜𝑐𝑏𝑙𝑔 = 𝐸𝑙𝑜(𝑐𝑏𝑙𝑔) where 𝑐𝑏𝑙𝑔 are all buildings adjacent to the target building and 𝐸𝑙𝑜 is the elongation formula defined previously

  57. [57]

    Connected buildings equivalent rectangular index is denoted as (53) 𝑚𝑖𝑏𝐸𝑅𝐼𝑐𝑏𝑙𝑔 = 𝐸𝑅𝐼(𝑐𝑏𝑙𝑔) where 𝑐𝑏𝑙𝑔 are all buildings adjacent to the target building and 𝐸𝑅𝐼 is the equivalent rectangular index formula defined previously

  58. [58]

    Connected buildings circular compactness is denoted as (54) 𝑚𝑖𝑏𝐶𝐶𝑜𝑐𝑏𝑙𝑔 = 𝐶𝐶𝑜(𝑐𝑏𝑙𝑔) where 𝑐𝑏𝑙𝑔 are all buildings adjacent to the target building and 𝐶𝐶𝑜 is the circular compactness formula defined previously

  59. [59]

    Connected buildings longest axis length is denoted as (55) 𝑚𝑖𝑏𝐿𝐴𝐿𝑐𝑏𝑙𝑔 = 𝐿𝐴𝐿(𝑐𝑏𝑙𝑔) where 𝑐𝑏𝑙𝑔 are all buildings adjacent to the target building and 𝐿𝐴𝐿 is the longest axis length formula defined previously

  60. [60]

    Connected buildings facade ratio is denoted as (56) $mibFR_{cblg} = \frac{mibAre_{cblg}}{mibPer_{cblg}}$, where $cblg$ are all buildings adjacent to the target building and $mibAre$ and $mibPer$ are the formulas defined previously

  61. [61]

    Connected buildings square compactness is denoted as (57) $mibSCo_{cblg} = \left( \frac{4 \sqrt{mibAre_{cblg}}}{mibPer_{cblg}} \right)^2$, where $cblg$ are all buildings adjacent to the target building and $mibAre$ and $mibPer$ are the formulas defined previously
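
Both the facade ratio and the square compactness depend only on the area and perimeter of the joined structure, so they are easy to check by hand. The sketch below uses hypothetical function names and two toy footprints, and assumes area and perimeter are given in consistent units.

```python
import math


def facade_ratio(area, perimeter):
    """Area of the connected structure divided by its perimeter."""
    return area / perimeter


def square_compactness(area, perimeter):
    """Squared ratio of the perimeter of an equal-area square to the actual perimeter."""
    return (4 * math.sqrt(area) / perimeter) ** 2


# A 10 m x 10 m square footprint: area 100, perimeter 40.
square_fr = facade_ratio(100.0, 40.0)         # 2.5
square_sco = square_compactness(100.0, 40.0)  # 1.0

# A 20 m x 5 m rectangle with the same area: perimeter 50.
rect_sco = square_compactness(100.0, 50.0)    # (40 / 50) ** 2 = 0.64
```

A perfect square scores 1, and any more elongated or indented outline with the same area scores lower.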

  62. [62]

    Deviation of building area in tessellation neighbourhood is denoted as (58) 𝑚𝑖𝑐𝐵𝐴𝐷𝑐𝑒𝑙𝑙 and is defined as the standard deviation in the areas of all buildings within tessellation cells directly adjacent to the target tessellation cell
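
As a small worked example of this deviation, the sketch below takes the building areas of the cells adjacent to a target cell and computes their standard deviation. Two choices here are assumptions, since the text does not settle them: the population form of the standard deviation is used, and the target cell's own building is excluded. All names and values are hypothetical.

```python
from statistics import pstdev


def neighbour_area_deviation(building_area, neighbours, target):
    """Population standard deviation of building areas in cells adjacent to `target`."""
    areas = [building_area[cell] for cell in neighbours[target]]
    if not areas:
        return 0.0  # a cell without neighbours has no spread to measure
    return pstdev(areas)


# Hypothetical tessellation: cell "t" touches four other cells.
building_area = {"t": 150.0, "c1": 100.0, "c2": 120.0, "c3": 80.0, "c4": 100.0}
neighbours = {"t": ["c1", "c2", "c3", "c4"]}

deviation = neighbour_area_deviation(building_area, neighbours, "t")  # sqrt(200)
```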

  63. [63]

    Likely Occupied Area

    Deviation of building area in node-attached buildings is denoted as (59) 𝑚𝑖𝑑𝐵𝐴𝐷𝑛𝑜𝑑𝑒 and is defined as the standard deviation in the areas of all buildings attached to the target node. There are three additional indicator variables calculated per morphotope - “Likely Occupied Area”, “Area of the largest ten conn...

  64. [64]

    First, it generates a full Ward clustering tree based on differences in feature space, and adjacency in geographic space

  65. [65]

    Second, it uses leaf extraction to generate a set of clusters from the dendrogram. The linkage matrix is generated by computing distances between observations based on the Ward formula, subject to the restriction that new connections must be between spatially adjacent enclosed tessellation cells. The leaf extraction algorithm processes the resulting dendrogram as follows:

  66. [66]

    The dendrogram is cut at all possible levels - one for each connection - starting from the lowest to the highest distance value

  67. [67]

    If a cluster has more than N ETCs it is marked for extraction

    At every level, the members of each cluster and of its constituent children are counted. If a cluster has more than N ETCs it is marked for extraction

  68. [68]

    Since the membership of a marked cluster keeps increasing until a merger occurs, typically each extracted cluster has more than N members

    When one cluster marked for extraction merges with another, both are extracted from the dendrogram as separate clusters. Since the membership of a marked cluster keeps increasing until a merger occurs, typically each extracted cluster has more than N members

  69. [69]

    Perimeter of the largest ten connected structures

    All points that are never part of a marked cluster are treated as outliers and marked as noise. Before applying the clustering algorithm, all variables are preprocessed using a Quantile Transformer with a uniform distribution. This data transformation produces a relatively more equal weighting of all features when calculating distances between observat...
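
The leaf extraction steps described above can be sketched as a single pass over a linkage matrix. This is an illustrative reading, not the authors' implementation, and it rests on several assumptions:

- The linkage rows follow the common `[left, right, distance, size]` layout and are already ordered from the lowest to the highest distance.
- The spatial adjacency restriction was applied when the linkage was built.
- "More than N" is taken as strictly greater.
- Points that join a cluster after its children were extracted are left as noise.

The names `extract_leaves` and `min_size` are hypothetical.

```python
def extract_leaves(linkage, n_points, min_size):
    """Label points by leaf clusters cut from a linkage matrix; -1 marks noise."""
    members = {i: [i] for i in range(n_points)}  # cluster id -> point ids
    closed = set()  # clusters whose marked children were already extracted
    labels = [-1] * n_points
    next_label = 0

    def is_marked(cluster_id):
        return cluster_id not in closed and len(members[cluster_id]) > min_size

    def extract(cluster_id):
        nonlocal next_label
        for point in members[cluster_id]:
            labels[point] = next_label
        next_label += 1

    for step, (left, right, _distance, _size) in enumerate(linkage):
        left, right = int(left), int(right)
        parent = n_points + step
        marked = [c for c in (left, right) if is_marked(c)]
        if len(marked) == 2 or left in closed or right in closed:
            # A merger involving marked clusters: extract them separately.
            for cluster_id in marked:
                extract(cluster_id)
            closed.add(parent)
            members[parent] = []
        else:
            # A small cluster or single point is absorbed; the cluster keeps growing.
            members[parent] = members[left] + members[right]

    root = n_points + len(linkage) - 1
    if len(linkage) and is_marked(root):
        extract(root)  # one marked cluster that never met another
    return labels


# Hypothetical dendrogram: two tight groups of three points plus one outlier.
linkage = [
    (0, 1, 1.0, 2),    # forms cluster 7
    (3, 4, 1.0, 2),    # forms cluster 8
    (7, 2, 1.5, 3),    # forms cluster 9, now larger than min_size
    (8, 5, 1.5, 3),    # forms cluster 10, now larger than min_size
    (9, 10, 8.0, 6),   # two marked clusters merge: both are extracted
    (11, 6, 20.0, 7),  # the outlier joins last and stays noise
]
labels = extract_leaves(linkage, n_points=7, min_size=2)
# labels == [0, 0, 0, 1, 1, 1, -1]
```

Keeping the member lists per cluster makes the pass easy to follow at the cost of memory. A large-scale run would more likely track only cluster sizes and resolve membership afterwards.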