
arxiv: 2605.18101 · v1 · pith:XJUI3ZETnew · submitted 2026-05-18 · 💻 cs.CV · cs.AI

SENSE: Satellite-based ENergy Synthesis for Sustainable Environment

Pith reviewed 2026-05-20 12:10 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI
keywords urban building energy modeling · diffusion models · satellite imagery · synthetic data generation · building energy consumption · sustainable urban planning · generative AI

The pith

SENSE generates urban satellite imagery and aligned building energy maps using a controllable diffusion model conditioned on road networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SENSE as a generative framework that creates realistic satellite images of cities together with matching maps of building energy consumption and heights. It tackles scarce aligned data by conditioning a diffusion model on road networks and urban density metrics to produce annotations in latent space. The synthetic data supplements limited real labels to train stronger prediction models for urban energy use. A reader would care because better energy modeling supports planning efficient cities that advance sustainability goals. Tests across four cities show the outputs meet physical standards and improve real prediction tasks.

Core claim

SENSE is a unified generative UBEM framework that jointly synthesizes realistic urban satellite imagery and aligned high-quality building energy consumption and height maps by conditioning on road networks and urban density metrics, leveraging knowledge from large vision models to generate annotations in the latent space that achieve high visual fidelity and strong physical consistency satisfying the ASHRAE standard metric.

What carries the argument

Controllable diffusion model conditioned on road networks and urban density metrics, which generates urban building energy consumption and height information as annotations in the latent space.
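
For orientation, the standard conditional latent-diffusion training objective (Ho et al., 2020; Rombach et al., 2022) can be written as below. This is a generic sketch, not the paper's confirmed loss. The symbols are notation introduced here for illustration: z_0 is the latent of a satellite image, c is the conditioning signal (assumed to bundle the road network and urban density metrics), and \epsilon_\theta is the denoising network.

```latex
z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

\mathcal{L} = \mathbb{E}_{z_0,\, c,\, \epsilon,\, t}
\left[ \left\lVert \epsilon - \epsilon_\theta(z_t, t, c) \right\rVert_2^2 \right]
```

Under this reading, controllability comes from c entering the denoiser at every step. How SENSE decodes energy and height annotations from the latent is not specified in this summary.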

Load-bearing premise

Conditioning a diffusion model on road networks and urban density metrics will produce building energy consumption values that align visually with the generated imagery and remain physically consistent enough to meet ASHRAE standards while improving downstream predictions.

What would settle it

Apply the model to a fifth city and check whether the generated energy maps still meet ASHRAE consistency while reducing real prediction errors by 3 to 11 percent NMBE compared with prior methods.
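
A check of this kind could be scripted roughly as follows. This is a minimal sketch that assumes the usual ASHRAE Guideline 14 definitions of NMBE and CVRMSE with one adjustable parameter (p = 1). The function names and the default limits (5% NMBE and 15% CVRMSE, the commonly cited Guideline 14 limits for monthly data; 10% and 30% for hourly data) are assumptions, since this page does not state which thresholds the paper applies.

```python
import math

def nmbe(measured, predicted, p=1):
    """Normalized mean bias error in percent (p = adjustable parameters, assumed 1)."""
    n = len(measured)
    mean_m = sum(measured) / n
    bias = sum(m - s for m, s in zip(measured, predicted))
    return 100.0 * bias / ((n - p) * mean_m)

def cvrmse(measured, predicted, p=1):
    """Coefficient of variation of the RMSE in percent."""
    n = len(measured)
    mean_m = sum(measured) / n
    sq_err = sum((m - s) ** 2 for m, s in zip(measured, predicted))
    return 100.0 * math.sqrt(sq_err / (n - p)) / mean_m

def meets_limits(measured, predicted, nmbe_limit=5.0, cvrmse_limit=15.0):
    """True if both metrics fall within the given limits (defaults are assumed)."""
    return (abs(nmbe(measured, predicted)) <= nmbe_limit
            and cvrmse(measured, predicted) <= cvrmse_limit)

# Hypothetical energy values for four buildings, for illustration only.
real = [100, 120, 80, 100]
generated = [98, 123, 79, 101]
print(nmbe(real, generated), cvrmse(real, generated), meets_limits(real, generated))
```

Running the same functions on a held-out fifth city, with both SENSE and a prior method, would give the two quantities the test above compares.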

Figures

Figures reproduced from arXiv: 2605.18101 by Alok Prakash, Baoshen Guo, Can Rong, Heye Huang, Jinhua Zhao, Kailai Sun, Mingyi He, Shenhao Wang.

Figure 1: Proposed GenAI framework for generating satellite image, building height and building energy consumption together.
Figure 2: Normalized confusion matrix for H-Decoder and ...
Figure 3: Qualitative visualization of generated results across ...
Figure 4: Performance comparison of building energy con...
Figure 5: Performance comparison of downstream building energy consumption and height prediction tasks under different ...
Figure 6: Data filtering. We filtered out samples that clearly lacked building energy labels.
Original abstract

Urban Building Energy Modeling plays a critical role in achieving the United Nations' Sustainable Development Goals 7 and 11. Although existing studies based on satellite imagery and deep learning have achieved remarkable progress, many challenges exist: most existing studies are inherently predictive, failing to reflect the generative nature of urban planning; although generative AI and diffusion models have seen explosive growth in satellite imagery, they lack the urban functional generation (e.g., energy layer); third, aligned high-quality high-resolution building energy data with satellite imagery is limited and scarce. Here we propose SENSE (Satellite-based ENergy Synthesis for Sustainable Environment), a unified generative UBEM framework that jointly synthesizes realistic urban satellite imagery and aligned high-quality building energy consumption and height maps. By conditioning on road networks and urban density metrics, SENSE, based on a controllable diffusion model, leverages the knowledge learned by large vision models to generate urban building energy consumption and height information (annotations) in the latent space. Experiments across four cities (New York City, Boston, Lyon, Busan) demonstrate that SENSE achieves high visual fidelity and strong physical consistency, satisfying the ASHRAE standard metric. Experiments demonstrate that SENSE can generate enough annotated synthetic data using less than 20% labeled energy data, boosting downstream prediction performance by 10% IoU. Compared to SOTA urban energy prediction methods, SENSE significantly reduced prediction error (reduced 3%-11% NMBE and 1%-9% CVRMSE). This study offers an energy-efficiency urban planning and physical generation solution for urban science, energy science and building science. The dataset and code: https://huggingface.co/datasets/skl24/MUSE and https://github.com/kailaisun/GenAI4Urban-Energy/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes SENSE, a unified generative UBEM framework based on a controllable diffusion model that jointly synthesizes realistic urban satellite imagery and aligned building energy consumption and height maps. Conditioned on road networks and urban density metrics and leveraging large vision models in latent space, it generates synthetic annotated data from less than 20% labeled energy data. Experiments on four cities (New York City, Boston, Lyon, Busan) report high visual fidelity, physical consistency satisfying ASHRAE standards, a 10% IoU improvement in downstream prediction, and error reductions of 3%-11% NMBE and 1%-9% CVRMSE versus SOTA methods.

Significance. If the central claims hold, the work could meaningfully advance sustainable urban planning and energy modeling by addressing data scarcity through generative synthesis of physically consistent annotations. The open release of the dataset on Hugging Face and code on GitHub is a clear strength that supports reproducibility and community validation in computer vision, urban science, and building energy domains.

major comments (2)
  1. [Abstract] The central claim that generated energy maps satisfy the ASHRAE standard metric (NMBE/CVRMSE) is load-bearing for the physical consistency assertion, yet no specific achieved values, thresholds, or comparison to independent physics-based simulations are provided; post-generation metric checks alone do not establish that the diffusion model has learned causal relationships rather than correlations from the limited labeled set.
  2. [Methods/Experiments] Conditioning solely on road networks and urban density without physics-informed losses, additional channels (e.g., building age, materials, occupancy), or explicit validation against independent energy simulations leaves the reported 3%-11% NMBE and 1%-9% CVRMSE reductions vulnerable to city-specific spurious correlations; this directly affects the generalizability claim across the four cities and the downstream 10% IoU boost.
minor comments (2)
  1. [Abstract] Define all acronyms (UBEM, NMBE, CVRMSE, IoU, ASHRAE) on first use in the abstract and introduction for clarity.
  2. [Abstract] The abstract states 'high visual fidelity' without reporting standard quantitative metrics (e.g., FID, PSNR, or SSIM) for the generated satellite imagery; adding these would strengthen the visual quality claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We appreciate the positive assessment of the work's significance for sustainable urban planning and the value placed on the open release of the dataset and code. We address each major comment below with point-by-point responses, noting revisions to the manuscript where we agree changes are warranted.

Point-by-point responses
  1. Referee: [Abstract] The central claim that generated energy maps satisfy the ASHRAE standard metric (NMBE/CVRMSE) is load-bearing for the physical consistency assertion, yet no specific achieved values, thresholds, or comparison to independent physics-based simulations are provided; post-generation metric checks alone do not establish that the diffusion model has learned causal relationships rather than correlations from the limited labeled set.

    Authors: We agree that the abstract would be strengthened by including the specific numerical values supporting the ASHRAE claim. In the revised manuscript we have updated the abstract to report the achieved reductions of 3%-11% NMBE and 1%-9% CVRMSE and to reference the ASHRAE thresholds that are met. Regarding causal relationships versus correlations, the current validation relies on held-out test performance and downstream task gains across four cities; while this does not constitute explicit causal discovery, it provides evidence of generalization beyond single-city correlations. We have added a clarifying paragraph in the discussion section on this distinction and the role of post-generation metric checks. revision: yes

  2. Referee: [Methods/Experiments] Conditioning solely on road networks and urban density without physics-informed losses, additional channels (e.g., building age, materials, occupancy), or explicit validation against independent energy simulations leaves the reported 3%-11% NMBE and 1%-9% CVRMSE reductions vulnerable to city-specific spurious correlations; this directly affects the generalizability claim across the four cities and the downstream 10% IoU boost.

    Authors: We partially concur that physics-informed losses or extra channels could improve robustness in principle. However, the design deliberately uses only road networks and density metrics because these are the inputs most widely available at scale; requiring building age or material data would limit practical applicability. The reported gains are observed consistently across four cities with distinct urban forms and climates, which we believe reduces the likelihood of city-specific spurious correlations. To directly address the request for independent validation, we have added comparisons against physics-based simulation baselines in the revised experiments section and expanded the generalizability analysis. revision: partial

Circularity Check

0 steps flagged

Generative model trained on real data; downstream gains measured on held-out sets with no definitional reduction.

Full rationale

The paper trains a controllable diffusion model on satellite imagery paired with limited (<20% labeled) energy data, conditions on road networks and urban density, and generates synthetic imagery plus aligned energy/height annotations. These synthetics augment training for a downstream predictor whose performance (IoU, NMBE, CVRMSE) is evaluated on held-out real data across four cities and checked against ASHRAE thresholds post-generation. No equation or claim reduces the generated energy values to the training inputs by construction, nor renames a fitted parameter as a prediction. The ASHRAE satisfaction is an external metric applied after synthesis rather than a self-referential fit. Any self-citations are not load-bearing for the core generative claim, which remains independently testable against real held-out distributions and standard benchmarks.
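
The held-out evaluation described above depends on how IoU is computed. The sketch below shows one common definition, mean intersection-over-union over discretized class maps. It assumes energy (or height) values have been binned into integer classes, which this page does not confirm; the function name and the toy maps are hypothetical.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in either map (2D lists of class ids)."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            if p == t:
                # Pixel lies in both the predicted and the true set of class p.
                inter[p] += 1
                union[p] += 1
            else:
                # Pixel lies in the predicted set of p and the true set of t.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious)

# Toy 2x2 class maps: prediction vs. held-out ground truth (illustrative only).
pred = [[0, 1], [1, 2]]
truth = [[0, 1], [2, 2]]
print(mean_iou(pred, truth, num_classes=3))
```

Because the score is computed only against real held-out labels, the synthetic data affects it solely through the trained predictor, which is the non-circularity point made above.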

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The framework rests on standard assumptions of diffusion models for image generation and the premise that limited labeled energy data plus infrastructure conditioning suffice for physical consistency; no new physical axioms or invented entities are introduced.

pith-pipeline@v0.9.0 · 5869 in / 1242 out tokens · 47143 ms · 2026-05-20T12:10:32.975535+00:00 · methodology


