SENSE: Satellite-based ENergy Synthesis for Sustainable Environment
Pith reviewed 2026-05-20 12:10 UTC · model grok-4.3
The pith
SENSE generates urban satellite imagery and aligned building energy maps using a controllable diffusion model conditioned on road networks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SENSE is a unified generative UBEM (Urban Building Energy Modeling) framework that jointly synthesizes realistic urban satellite imagery and aligned, high-quality building energy consumption and height maps. It conditions on road networks and urban density metrics, and it leverages knowledge from large vision models to generate these annotations in the latent space. The paper claims that the outputs achieve high visual fidelity and strong physical consistency, satisfying the ASHRAE standard metric.
What carries the argument
Controllable diffusion model conditioned on road networks and urban density metrics, which generates urban building energy consumption and height information as annotations in the latent space.
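To make the mechanism concrete, here is a minimal NumPy sketch of the forward noising step from denoising diffusion probabilistic models, followed by one possible way to hand conditioning signals to a denoiser. How SENSE actually injects road networks and density metrics is not described on this page, so the channel concatenation in `denoiser_input` is a stand-in, and the function names, latent shape, and schedule values (the common linear schedule from 1e-4 to 0.02 over 1000 steps) are assumptions for illustration only.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_latent(z0, t, alpha_bar, rng):
    """Sample z_t ~ q(z_t | z_0) = N(sqrt(a_t) z_0, (1 - a_t) I)."""
    eps = rng.standard_normal(z0.shape)
    zt = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return zt, eps

def denoiser_input(zt, road_map, density):
    """Stack the noisy latent with conditioning channels.

    Hypothetical conditioning scheme: the road raster and a constant
    density plane are appended as extra channels.
    """
    road = np.asarray(road_map, dtype=float)
    density_plane = np.full_like(road, density)
    return np.concatenate([zt, road[None], density_plane[None]], axis=0)

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
z0 = np.ones((4, 8, 8))                      # toy latent: 4 channels, 8x8
zt, eps = noise_latent(z0, t=10, alpha_bar=alpha_bar, rng=rng)
x = denoiser_input(zt, road_map=np.zeros((8, 8)), density=0.3)
print(x.shape)
```

A denoiser trained on such inputs would learn to predict `eps` given the noisy latent and the conditions. The concatenation here only shows where the conditioning signal enters the computation.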
Load-bearing premise
Conditioning a diffusion model on road networks and urban density metrics will produce building energy consumption values that align visually with the generated imagery and remain physically consistent enough to meet ASHRAE standards while improving downstream predictions.
What would settle it
Apply the model to a fifth city and check whether the generated energy maps still satisfy the ASHRAE consistency metric while reducing prediction error on real data by 3%-11% NMBE compared with prior methods.
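The check described above comes down to two statistics. The sketch below computes NMBE and CVRMSE in the form commonly attributed to ASHRAE Guideline 14, with `p = 1` adjustable parameter. The sign convention (measured minus simulated), the function names, and the default limits (10% NMBE and 30% CVRMSE, the values usually quoted for hourly data) are assumptions here; this page does not state which granularity or thresholds the paper applies.

```python
import numpy as np

def nmbe(measured, simulated, p=1):
    """Normalized mean bias error, in percent."""
    m = np.asarray(measured, dtype=float)
    s = np.asarray(simulated, dtype=float)
    return float(100.0 * np.sum(m - s) / ((m.size - p) * m.mean()))

def cvrmse(measured, simulated, p=1):
    """Coefficient of variation of the RMSE, in percent."""
    m = np.asarray(measured, dtype=float)
    s = np.asarray(simulated, dtype=float)
    rmse = np.sqrt(np.sum((m - s) ** 2) / (m.size - p))
    return float(100.0 * rmse / m.mean())

def meets_limits(measured, simulated, nmbe_limit=10.0, cvrmse_limit=30.0):
    """True if both statistics fall within the assumed limits."""
    return bool(
        abs(nmbe(measured, simulated)) <= nmbe_limit
        and cvrmse(measured, simulated) <= cvrmse_limit
    )

# Toy values standing in for real and generated building energy use.
real = [100, 200, 300, 400]
generated = [110, 190, 310, 380]
print(nmbe(real, generated), cvrmse(real, generated), meets_limits(real, generated))
```

Because NMBE sums signed errors, over- and under-estimates cancel. A low NMBE paired with a high CVRMSE therefore indicates unbiased but noisy output.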
Original abstract
Urban Building Energy Modeling (UBEM) plays a critical role in achieving the United Nations' Sustainable Development Goals 7 and 11. Although existing studies based on satellite imagery and deep learning have achieved remarkable progress, three challenges remain. First, most existing studies are inherently predictive and fail to reflect the generative nature of urban planning. Second, although generative AI and diffusion models have seen explosive growth in satellite imagery, they lack urban functional generation (e.g., an energy layer). Third, high-quality, high-resolution building energy data aligned with satellite imagery is scarce. Here we propose SENSE (Satellite-based ENergy Synthesis for Sustainable Environment), a unified generative UBEM framework that jointly synthesizes realistic urban satellite imagery and aligned high-quality building energy consumption and height maps. By conditioning on road networks and urban density metrics, SENSE, based on a controllable diffusion model, leverages the knowledge learned by large vision models to generate urban building energy consumption and height information (annotations) in the latent space. Experiments across four cities (New York City, Boston, Lyon, Busan) demonstrate that SENSE achieves high visual fidelity and strong physical consistency, satisfying the ASHRAE standard metric. The experiments also show that SENSE can generate sufficient annotated synthetic data using less than 20% of the labeled energy data, boosting downstream prediction performance by 10% IoU. Compared to SOTA urban energy prediction methods, SENSE significantly reduces prediction error (by 3%-11% NMBE and 1%-9% CVRMSE). This study offers an energy-efficient urban planning and physical generation solution for urban science, energy science, and building science. The dataset and code: https://huggingface.co/datasets/skl24/MUSE and https://github.com/kailaisun/GenAI4Urban-Energy/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SENSE, a unified generative UBEM framework based on a controllable diffusion model that jointly synthesizes realistic urban satellite imagery and aligned building energy consumption and height maps. Conditioned on road networks and urban density metrics and leveraging large vision models in latent space, it generates synthetic annotated data from less than 20% labeled energy data. Experiments on four cities (New York City, Boston, Lyon, Busan) report high visual fidelity, physical consistency satisfying ASHRAE standards, a 10% IoU improvement in downstream prediction, and error reductions of 3%-11% NMBE and 1%-9% CVRMSE versus SOTA methods.
Significance. If the central claims hold, the work could meaningfully advance sustainable urban planning and energy modeling by addressing data scarcity through generative synthesis of physically consistent annotations. The open release of the dataset on Hugging Face and code on GitHub is a clear strength that supports reproducibility and community validation in computer vision, urban science, and building energy domains.
major comments (2)
- [Abstract] The central claim that generated energy maps satisfy the ASHRAE standard metric (NMBE/CVRMSE) is load-bearing for the physical consistency assertion, yet no specific achieved values, thresholds, or comparison to independent physics-based simulations are provided; post-generation metric checks alone do not establish that the diffusion model has learned causal relationships rather than correlations from the limited labeled set.
- [Methods/Experiments] Conditioning solely on road networks and urban density without physics-informed losses, additional channels (e.g., building age, materials, occupancy), or explicit validation against independent energy simulations leaves the reported 3%-11% NMBE and 1%-9% CVRMSE reductions vulnerable to city-specific spurious correlations; this directly affects the generalizability claim across the four cities and the downstream 10% IoU boost.
minor comments (2)
- [Abstract] Define all acronyms (UBEM, NMBE, CVRMSE, IoU, ASHRAE) on first use in the abstract and introduction for clarity.
- [Abstract] The abstract states 'high visual fidelity' without reporting standard quantitative metrics (e.g., FID, PSNR, or SSIM) for the generated satellite imagery; adding these would strengthen the visual quality claim.
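As an illustration of the kind of quantitative fidelity metric the last minor comment asks for, the following is a minimal PSNR sketch. PSNR and SSIM need a paired reference image for each generated image, whereas FID compares feature distributions and needs a pretrained network, so it is not shown. The function name and the 8-bit `max_value` default are assumptions, and nothing here reflects numbers from the paper.

```python
import numpy as np

def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    ref = np.asarray(reference, dtype=np.float64)
    gen = np.asarray(generated, dtype=np.float64)
    mse = np.mean((ref - gen) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

# Toy 4x4 grayscale tiles: every pixel differs by exactly one gray level.
reference_tile = np.zeros((4, 4), dtype=np.uint8)
generated_tile = np.ones((4, 4), dtype=np.uint8)
print(psnr(reference_tile, generated_tile))
```

Casting to float64 before subtracting matters for `uint8` inputs, which would otherwise wrap around on negative differences.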
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We appreciate the positive assessment of the work's significance for sustainable urban planning and the value placed on the open release of the dataset and code. We address each major comment below with point-by-point responses, noting revisions to the manuscript where we agree changes are warranted.
Point-by-point responses
- Referee: [Abstract] The central claim that generated energy maps satisfy the ASHRAE standard metric (NMBE/CVRMSE) is load-bearing for the physical consistency assertion, yet no specific achieved values, thresholds, or comparison to independent physics-based simulations are provided; post-generation metric checks alone do not establish that the diffusion model has learned causal relationships rather than correlations from the limited labeled set.
  Authors: We agree that the abstract would be strengthened by including the specific numerical values supporting the ASHRAE claim. In the revised manuscript we have updated the abstract to report the achieved reductions of 3%-11% NMBE and 1%-9% CVRMSE and to reference the ASHRAE thresholds that are met. Regarding causal relationships versus correlations, the current validation relies on held-out test performance and downstream task gains across four cities; while this does not constitute explicit causal discovery, it provides evidence of generalization beyond single-city correlations. We have added a clarifying paragraph in the discussion section on this distinction and the role of post-generation metric checks.
  Revision: yes
- Referee: [Methods/Experiments] Conditioning solely on road networks and urban density without physics-informed losses, additional channels (e.g., building age, materials, occupancy), or explicit validation against independent energy simulations leaves the reported 3%-11% NMBE and 1%-9% CVRMSE reductions vulnerable to city-specific spurious correlations; this directly affects the generalizability claim across the four cities and the downstream 10% IoU boost.
  Authors: We partially concur that physics-informed losses or extra channels could improve robustness in principle. However, the design deliberately uses only road networks and density metrics because these are the inputs most widely available at scale; requiring building age or material data would limit practical applicability. The reported gains are observed consistently across four cities with distinct urban forms and climates, which we believe reduces the likelihood of city-specific spurious correlations. To directly address the request for independent validation, we have added comparisons against physics-based simulation baselines in the revised experiments section and expanded the generalizability analysis.
  Revision: partial
Circularity Check
Generative model trained on real data; downstream gains measured on held-out sets with no definitional reduction.
Full rationale
The paper trains a controllable diffusion model on satellite imagery paired with limited (<20% labeled) energy data, conditions on road networks and urban density, and generates synthetic imagery plus aligned energy/height annotations. These synthetics augment training for a downstream predictor whose performance (IoU, NMBE, CVRMSE) is evaluated on held-out real data across four cities and checked against ASHRAE thresholds post-generation. No equation or claim reduces the generated energy values to the training inputs by construction, nor renames a fitted parameter as a prediction. The ASHRAE satisfaction is an external metric applied after synthesis rather than a self-referential fit. Any self-citations are not load-bearing for the core generative claim, which remains independently testable against real held-out distributions and standard benchmarks.
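The held-out evaluation described above can be sketched for the IoU part. The passage quoted elsewhere on this page mentions `log1p` and discretization, so this sketch assumes energy values are log-compressed, binned into classes, and scored with mean IoU against held-out real labels. The bin edges, function names, and toy arrays are hypothetical and are not taken from the paper.

```python
import numpy as np

def discretize_energy(energy, bin_edges):
    """Apply log1p compression, then map each value to a class index."""
    return np.digitize(np.log1p(np.asarray(energy, dtype=float)), bin_edges)

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes present in either map."""
    pred = np.asarray(pred)
    target = np.asarray(target)
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps: skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

edges = [2.0, 5.0]                            # hypothetical log-space edges
held_out_real = np.array([[0.0, 10.0], [200.0, 1000.0]])
predicted = np.array([[0.0, 10.0], [100.0, 1000.0]])
score = mean_iou(
    discretize_energy(predicted, edges),
    discretize_energy(held_out_real, edges),
    num_classes=3,
)
print(score)
```

Scoring only against real held-out labels, never against synthetic ones, is what keeps the downstream gain from reducing to a restatement of the generator's own outputs.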
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: controllable diffusion model... Energy Decoder... discretize... log1p... NMBE/CVRMSE... ASHRAE Guideline 14
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: conditioning on road networks and urban density metrics... latent space
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.