RoofNet: A Global Multimodal Dataset for Roof Material Identification from Earth Observation

Benjamin Tarver; Noelle Law; Sasha Getz; Yuki Miura

arxiv: 2505.19358 · v3 · pith:2Y27YKWTnew · submitted 2025-05-25 · 💻 cs.CE

Pith reviewed 2026-05-19 12:50 UTC · model grok-4.3

keywords: roof material classification, satellite imagery, multimodal dataset, vision-language models, natural hazard modeling, global exposure datasets, Earth observation, building vulnerability

The pith

RoofNet supplies over 51,500 satellite images labeled with 14 roof materials from 184 diverse global locations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents RoofNet as a new multimodal dataset that pairs high-resolution satellite imagery with text annotations for classifying roof materials on a global scale. Accurate information on roofing types is currently missing but needed to model how buildings will hold up against earthquakes, floods, wildfires, and hurricanes. The authors built the dataset by selecting images from many different climates and building styles, annotating a portion with experts, and using a vision-language model with special prompts to label the rest while checking results with rules and human review. This resource is meant to make global datasets of building exposure more reliable for disaster planning.

Core claim

The central claim is that RoofNet constitutes the largest and most geographically diverse multimodal dataset for roof material classification, with more than 51,500 samples drawn from 184 sites around the world, each combining Earth Observation imagery and curated annotations for 14 roofing types, created to support vision-language modeling and improve the accuracy of global building exposure data.

What carries the argument

The key mechanism is the process of geographic sampling from climatically and architecturally distinct regions, expert annotation of 6,000 images, application of geographic- and material-aware prompt tuning to a vision-language model, and subsequent rule-based and human-in-the-loop verification to generate labels and additional metadata for the full set of tiles.
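
The verification stage of that workflow can be pictured with a minimal sketch. It assumes that each model prediction carries a confidence score, that low-confidence predictions go to a human reviewer, and that a rule table flags implausible country/material pairs. None of these specifics come from the source: `REVIEW_THRESHOLD`, `IMPLAUSIBLE`, `Prediction`, `route`, and the label strings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical confidence cutoff; the source does not state one.
REVIEW_THRESHOLD = 0.6

# Hypothetical rule table of (country code, label) pairs treated as implausible.
# "XX" is a placeholder code, not a real country.
IMPLAUSIBLE = {("XX", "thatch")}


@dataclass
class Prediction:
    tile_id: str
    label: str
    confidence: float
    country: str


def route(pred, implausible=IMPLAUSIBLE, threshold=REVIEW_THRESHOLD):
    """Return 'accept' or 'human_review' for a single model prediction."""
    # Rule 1: uncertain predictions are never accepted automatically.
    if pred.confidence < threshold:
        return "human_review"
    # Rule 2: confident but implausible labels also go to a reviewer.
    if (pred.country, pred.label) in implausible:
        return "human_review"
    return "accept"


print(route(Prediction("t1", "metal_sheet", 0.91, "YY")))
print(route(Prediction("t2", "metal_sheet", 0.40, "YY")))
print(route(Prediction("t3", "thatch", 0.95, "XX")))
```

Checking confidence first means the rule table only has to cover errors the model makes confidently, which keeps the hypothetical rule set small.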

If this is right

Improved modeling of building vulnerability to natural hazards becomes possible with better roof material data.
Vision-language models can be trained more effectively on global roof classification tasks.
Global exposure datasets gain higher fidelity through the inclusion of this labeled imagery and metadata.
Additional attributes like roof shape, area, solar panel presence, and mixed materials are available for each sample.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such a dataset could allow researchers to track roof material changes over time by comparing images from different years.
Integration into disaster simulation tools might refine estimates of economic losses from future events.
Similar approaches could be applied to classify other building components, such as foundations or walls, using satellite data.

Load-bearing premise

The selection of 184 sites from climatically and architecturally distinct regions creates a representative sample of global roof materials, and the combination of expert annotations with rule-based and human verification produces accurate labels free from major bias.

What would settle it

Ground surveys or independent high-resolution inspections in several of the sampled sites that reveal a substantially different distribution of the 14 roof material types than what the dataset reports.

Figures

Figures reproduced from arXiv: 2505.19358 by Benjamin Tarver, Noelle Law, Sasha Getz, Yuki Miura.

**Figure 1.** Overview of RoofNet and downstream applications. (1) High-resolution EO imagery is annotated with prompts describing roof material and location. Materials are labeled using expert-validated and VLM-assisted classification. (2) A VLM is trained using RoofNet to label a subset of the xBD [9] dataset. (3) RoofNet is hosted online to allow for open access to advance risk modeling frameworks and other downstream…

**Figure 2.** RoofNet dataset construction and classification pipeline. Architecturally and geographically diverse cities are selected to collect geocoded metadata and representative satellite imagery. Roofs are centered using GroundingDINO [26], with a 6k-sample subset manually verified for fine-tuning RemoteCLIP ViT-L/14 [24]. Remaining samples are classified using the model, followed by rule-based and human-in-the-l…

**Figure 3.** RoofNet material classes. RoofNet includes 14 distinct roof material classes to support downstream modeling of vulnerability to natural hazards.

3.1 Data Collection and Preprocessing. Approximately 200 buildings per city were geocoded and matched with high-resolution satellite imagery. To enhance coverage of underrepresented (“long-tail”) material classes (i.e., polycarbonate sheets, wood shingles, and glas…

**Figure 4.** Geographic and categorical distribution of roof material classes in the RoofNet dataset. The four world maps (left) visualize spatial coverage for each major roof material group (i.e., Manufactured Tiles, Sheet Materials, Synthetic/Flat Roofs, and Traditional/Natural) colored by material class and spanning 184 geographically diverse sites across 112 countries. These maps illustrate RoofNet’s architectural,…

**Figure 5.** Prompts were selected to maximize distance in material classifications while remaining within the 77-token limit. The image above shows how class names cause confusion and similar confidence levels in the out-of-the-box RemoteCLIP, while carefully selecting prompts that use simple, common language with diverse descriptions allows for a greater separation between classes. The EO image uploaded is a Concrete Til…

**Figure 6.** Impact of Fine-tuning and Class-Rebalanced Training on Roof Material Classification Accuracy. We compare material classification confusion matrices across four models: (left) CLIP ViT-L/14 [29], (center-left) RemoteCLIP ViT-L/14 [24], (center-right) RemoteCLIP ViT-L/14 fine-tuned on the RoofNet dataset subset, and (right) RemoteCLIP ViT-L/14 fine-tuned using class-rebalanced sampling. While the pretrained …
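
The Figure 6 caption mentions class-rebalanced sampling without saying how it was done. One common way to rebalance is inverse-frequency weighting, sketched below as an assumption rather than as the authors' scheme. The function names and class labels are hypothetical.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each sample by 1 / (number of samples in its class)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def sample_rebalanced(labels, k, seed=0):
    """Draw k sample indices with replacement so every class is equally likely."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=k)


# Toy label list: one frequent class and one long-tail class (hypothetical names).
labels = ["metal_sheet"] * 3 + ["wood_shingle"]
weights = inverse_frequency_weights(labels)
indices = sample_rebalanced(labels, k=8)
print(weights)
print([labels[i] for i in indices])
```

Because the weights within each class sum to 1, each class has the same total probability of being drawn regardless of how many samples it contains.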

Original abstract

Building-level exposure data are critical to natural hazard risk modeling, yet most global inventories describe where buildings are located rather than what they are made of. Roof material is a critical but poorly documented attribute for assessing vulnerability to wildfires, wind hazards, urban heat, floods, and earthquakes. To address this gap, we introduce RoofNet, a global dataset that maps 49,662 georeferenced building instances from 101 countries to 14 key roofing material classes using Earth observation (EO) imagery (redistributed where permitted) and associated geospatial metadata. RoofNet contributes (1) climatographically and architecturally diverse coverage of roof material labels, (2) a scalable annotation pipeline combining SME-guided manual labeling with vision-language model (VLM)-assisted classification, rule-based validation, and human-in-the-loop verification, and (3) a resource for evaluating subtle, geographically variable material-level identification in EO imagery and its implications for material-aware hazard risk modeling. Evaluation on a manually labeled hold-out set shows that zero-shot Remote Contrastive Language-Image Pre-Training (RemoteCLIP) struggles with roof material classification, while fine-tuning with RoofNet improves top-1 accuracy from 4.9% to 47.7%. We use RoofNet in an illustrative hazard case study to demonstrate how material-aware exposure data can change vulnerability estimates relative to material-naive inventories. RoofNet provides a missing material layer for global building attribute mapping and scalable hazard risk assessment.
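
The accuracy figures in the abstract are easier to read next to a baseline. The sketch below computes top-1 accuracy on a toy example and compares the reported 4.9% and 47.7% with the expected accuracy of uniform random guessing over 14 classes, which is 1/14. The function name and the toy labels are illustrative assumptions; only the class count and the two reported accuracies come from the abstract.

```python
def top1_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


NUM_CLASSES = 14                      # from the abstract
uniform_chance = 1 / NUM_CLASSES      # expected accuracy of uniform random guessing
zero_shot, fine_tuned = 0.049, 0.477  # top-1 accuracies reported in the abstract
gain_points = (fine_tuned - zero_shot) * 100

# Toy labels (hypothetical class names) to exercise the function.
toy_true = ["clay_tile", "metal_sheet", "thatch", "slate"]
toy_pred = ["clay_tile", "metal_sheet", "slate", "slate"]

print(f"toy top-1 accuracy: {top1_accuracy(toy_true, toy_pred):.2f}")
print(f"uniform chance: {uniform_chance:.3f}")
print(f"zero-shot below uniform chance: {zero_shot < uniform_chance}")
print(f"gain from fine-tuning: {gain_points:.1f} percentage points")
```

On these numbers the zero-shot model scores below uniform guessing, which is consistent with the abstract's statement that zero-shot RemoteCLIP struggles with this task.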

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

RoofNet claims a large new global dataset for roof material classification to aid hazard modeling, but the data has been removed due to licensing and updated versions are still in progress.

The main takeaway is that this is a dataset paper offering over 51,500 EO image samples from 184 sites with labels for 14 roof types plus metadata on shape, area, solar panels, and mixed materials. The authors fine-tune a vision-language model on 6,000 expert-annotated images and apply it to the rest with rule-based and human verification. They target climatically and architecturally varied regions to support better global exposure maps for disasters like earthquakes and floods.

That scale and the multimodal setup with text annotations are the concrete additions here, and the practical motivation for infrastructure vulnerability data is straightforward and useful in applied remote sensing and risk modeling circles. The geographic spread and extra metadata fields show some care in design beyond simple classification labels. The high-level description of sampling and prompt tuning for class separability is reasonable for a data collection effort.

The central weakness is availability. The abstract states outright that the dataset used in earlier experiments was removed due to licensing constraints on imagery sources, with results to be interpreted cautiously and updated compliant data still in progress. Without the actual paired imagery and labels accessible, it is difficult to assess label accuracy, sampling representativeness, or whether the claimed diversity holds up in practice. The annotation and verification steps are outlined but not detailed enough to judge bias or error rates independently.

This paper is aimed at researchers working on building classification from satellite data or on improving exposure datasets for natural hazards. A reader looking for ideas on large-scale labeling pipelines or metadata for roof analysis could extract some value from the approach, even if the current artifact is not usable. It deserves a serious referee because the gap it targets is real and the authors have been transparent about the licensing setback. Reviewers could focus on the new data version and verification protocols. I would send it to peer review with data availability as a clear requirement for any final acceptance.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces RoofNet as the largest and most geographically diverse multimodal dataset for roof material classification, comprising over 51,500 samples from 184 sites that pair high-resolution Earth Observation imagery with curated text annotations for 14 roofing types. It describes sampling EO tiles from climatically and architecturally distinct regions, expert annotation of a 6,000-image subset, fine-tuning a vision-language model via geographic- and material-aware prompt tuning, and subsequent application with rule-based and human-in-the-loop verification. The work also provides rich metadata including roof shape, footprint area, solar panel presence, and mixed-material indicators. The abstract explicitly states that the dataset used in earlier experiments has been removed due to licensing constraints on imagery sources, with updated experiments using compliant data stated to be in progress.

Significance. If a fully compliant and accessible version of the dataset were released with verified results, RoofNet would offer a substantial contribution to global building exposure datasets for natural hazard modeling. Its scale, geographic diversity across 184 sites, and multimodal design supporting vision-language models could improve fidelity in vulnerability assessments for earthquakes, floods, wildfires, and hurricanes, while the additional metadata would enable more granular analyses.

major comments (2)

[Abstract] The explicit statement that 'The dataset used in earlier experiments has been removed due to licensing constraints related to imagery sources' and that 'Results based on this dataset should be interpreted with caution' with 'Updated experiments using compliant data are in progress' directly undermines the central claim of introducing a usable RoofNet dataset. The core contribution is the paired EO imagery and labels, which are currently unavailable, preventing any verification or use of the claimed scale, diversity, or utility.
[Abstract] The premise that sampling from climatically and architecturally distinct regions yields a representative global distribution of roof materials, combined with expert annotations and rule-based verification producing accurate labels without significant bias, is load-bearing for the dataset's claimed value but receives only high-level description without quantitative validation or bias assessment.

minor comments (1)

[Abstract] The description of the VLM fine-tuning and prompt tuning process would benefit from explicit details on the number of classes, prompt templates, or performance metrics to clarify how class separability was enhanced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and for highlighting important issues regarding dataset availability and validation. We address each major comment below, indicating revisions where appropriate.

Referee: [Abstract] The explicit statement that 'The dataset used in earlier experiments has been removed due to licensing constraints related to imagery sources' and that 'Results based on this dataset should be interpreted with caution' with 'Updated experiments using compliant data are in progress' directly undermines the central claim of introducing a usable RoofNet dataset. The core contribution is the paired EO imagery and labels, which are currently unavailable, preventing any verification or use of the claimed scale, diversity, or utility.

Authors: We agree that the cautionary language in the abstract regarding the removed dataset creates uncertainty about immediate usability and verifiability. This change was necessitated by licensing restrictions discovered after initial experiments. In the revised manuscript we have updated the abstract to state that a fully compliant version of RoofNet, constructed from alternative licensed imagery sources, is now the primary contribution, with the original non-compliant data removed entirely. We have added a dedicated paragraph in the methods section describing the new data sources, acquisition process, and preliminary scale achieved with the compliant imagery. We also include a clear statement that the full dataset and updated results will be released upon completion of verification. This revision directly addresses the concern by shifting emphasis to the compliant resource while acknowledging the transitional status.
revision: partial

Referee: [Abstract] The premise that sampling from climatically and architecturally distinct regions yields a representative global distribution of roof materials, combined with expert annotations and rule-based verification producing accurate labels without significant bias, is load-bearing for the dataset's claimed value but receives only high-level description without quantitative validation or bias assessment.

Authors: We accept that the original manuscript relied on high-level descriptions of the sampling strategy and verification pipeline without sufficient quantitative support. To strengthen this foundation we have added a new subsection on dataset representativeness that includes: (1) a breakdown of roof-material class frequencies stratified by the 184 sites and major Köppen climate zones, (2) inter-annotator agreement statistics (Cohen’s kappa) from the 6,000-image expert subset, and (3) a preliminary bias analysis comparing observed material distributions against publicly available national building statistics for a subset of countries. These additions provide measurable evidence for the sampling and annotation approach and will be expanded with the compliant dataset release.
revision: yes
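
For readers unfamiliar with the agreement statistic named in that response, Cohen's kappa compares the observed agreement between two annotators with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a plain standard-library implementation on toy labels. The annotator lists and class names are hypothetical and do not come from the paper.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Toy annotations (hypothetical class names).
annotator_1 = ["tile", "tile", "metal", "metal"]
annotator_2 = ["tile", "metal", "metal", "metal"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(kappa)
```

In the toy case the raters agree on 3 of 4 items (p_o = 0.75) and chance agreement is 0.5, so kappa is 0.5.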

Circularity Check

0 steps flagged

No circularity: dataset introduction paper with no derivation or fitted predictions

This is a data-collection paper whose central contribution is the creation and description of RoofNet (51,500+ samples, 184 sites, 14 roof classes, VLM fine-tuning plus rule-based verification). The abstract and provided text contain no equations, no parameter fitting presented as out-of-sample prediction, no uniqueness theorems, and no self-citations that bear the load of any claimed result. Sampling from climatically distinct regions and expert annotation are procedural choices, not self-referential definitions or reductions of outputs to inputs. The licensing-removal note affects reproducibility but does not create any circular step in a derivation chain. The work is therefore self-contained as an empirical resource rather than a mathematical or predictive model.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper contributes primarily through data curation and sampling rather than new theoretical constructs or fitted parameters.

axioms (1)

domain assumption: High-resolution Earth observation imagery combined with expert text annotations can reliably identify roof material types across diverse global regions.
Invoked in the description of sampling EO tiles and the annotation workflow.

pith-pipeline@v0.9.0 · 5799 in / 1397 out tokens · 49071 ms · 2026-05-19T12:50:46.558696+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: We sample EO tiles from climatically and architecturally distinct regions... A subset of 6,000 images was annotated... fine-tune a VLM... rule-based and human-in-the-loop verification.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: The dataset used in earlier experiments has been removed due to licensing constraints...

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.