Synthetic Homes: A Multimodal Generative AI Pipeline for Residential Building Data Generation under Data Scarcity
Pith reviewed 2026-05-18 17:06 UTC · model grok-4.3
The pith
A multimodal generative AI pipeline creates synthetic residential building datasets from public records and images that overlap more than 65 percent with real reference data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that an end-to-end multimodal generative AI pipeline, integrating image, tabular, and simulation-based components and trained or prompted on public county records and images, produces synthetic residential building parameter sets whose distributions overlap the reference national dataset by more than 65 percent across all evaluated parameters and by more than 90 percent for three of the four parameters.
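The text available here does not say how "overlap" is computed. One common reading is the histogram overlap coefficient: bin both samples on shared edges, normalize, and sum the bin-wise minimum. The sketch below assumes that definition; the function name `histogram_overlap`, the default of 20 bins, and the equal-width binning are illustrative choices, not the authors' stated metric.

```python
from typing import List, Sequence


def histogram_overlap(a: Sequence[float], b: Sequence[float], bins: int = 20) -> float:
    """Overlap coefficient of two non-empty samples: sum of min(p_i, q_i) over shared bins."""
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    if hi == lo:
        return 1.0  # every value is identical, so the two distributions coincide
    width = (hi - lo) / bins

    def normalized_counts(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp so the maximum value lands in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    p = normalized_counts(a)
    q = normalized_counts(b)
    return sum(min(pi, qi) for pi, qi in zip(p, q))
```

Under this reading, a value of 0.65 would mean that 65 percent of the probability mass falls in bins shared by both samples. The result depends on the bin count, which is one reason the referee's request for methodological detail matters.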
What carries the argument
The modular multimodal generative AI framework that fuses vision-language image analysis, tabular data synthesis, and simulation components to convert public county records and images into synthetic building parameter vectors.
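As a rough illustration of what "modular" could mean here, the sketch below wires three swappable stages into one pipeline. Everything in it is hypothetical: the `Pipeline` class, the stage names, the record schema, and the idea that the simulation stage appends outputs to the parameter vector are assumptions made for illustration, not a description of the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Record = Dict[str, object]       # one public county record (hypothetical schema)
Attributes = Dict[str, object]   # visual attributes extracted from an image
Params = Dict[str, float]        # one synthetic building parameter vector


@dataclass
class Pipeline:
    """Three pluggable stages; each can be replaced without touching the others."""
    describe_image: Callable[[str], Attributes]        # image path -> visual attributes
    synthesize: Callable[[Record, Attributes], Params]  # record + attributes -> parameters
    simulate: Callable[[Params], Params]                # parameters -> parameters plus outputs

    def run(self, record: Record, image_path: str) -> Params:
        attributes = self.describe_image(image_path)
        params = self.synthesize(record, attributes)
        return self.simulate(params)
```

With stand-in lambdas for each stage, `Pipeline(...).run(record, "house.jpg")` returns one parameter dictionary. Swapping the vision-language model then amounts to passing a different `describe_image` callable.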
If this is right
- Energy modeling and retrofit analysis can proceed without access to restricted private building records.
- Urban-scale simulations become feasible in regions where detailed building stock data is scarce or costly.
- Machine-learning tasks that rely on large building datasets can scale using only public inputs.
- The same pipeline can be reused to refresh or expand datasets as new public records become available.
Where Pith is reading between the lines
- Similar pipelines could be adapted for commercial buildings or non-residential stock if analogous public imagery and records exist.
- The occlusion-based visual focus test used to compare vision models could serve as a general diagnostic for other image-to-parameter extraction tasks in architecture.
- If the overlap holds under broader validation, the method could shorten the lead time for city-level energy policy studies that currently wait for new survey data.
Load-bearing premise
The generative components produce parameter distributions that represent actual residential buildings rather than artifacts introduced by the model architecture or its training data.
What would settle it
Collect real building parameter records from a new set of counties not used in training or prompting, generate matching synthetics with the same pipeline, and test whether the reported overlap percentages remain stable or drop sharply on any key parameter.
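One way to make "remain stable or drop sharply" operational is to compare per-parameter overlap on held-out counties against the reported values and flag any parameter that falls by more than a chosen tolerance. The sketch below assumes overlaps are already computed as fractions in [0, 1]; the function name `unstable_parameters`, the 0.10 tolerance, and the rule that a missing parameter counts as zero overlap are all illustrative assumptions.

```python
from typing import Dict, List


def unstable_parameters(reported: Dict[str, float],
                        held_out: Dict[str, float],
                        max_drop: float = 0.10) -> List[str]:
    """Return parameters whose held-out overlap is more than max_drop below the reported one."""
    return sorted(
        name for name, value in reported.items()
        # A parameter absent from the held-out results is treated as zero overlap.
        if value - held_out.get(name, 0.0) > max_drop
    )
```

An empty result would support the claim on the new counties, while any flagged parameter would point to where the pipeline fails to generalize.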
Original abstract
Computational models have emerged as powerful tools for multi-scale energy modeling research at the building and urban scale, supporting data-driven analysis across building and urban energy systems. However, these models require large amounts of building parameter data that is often inaccessible, expensive to collect, or subject to privacy constraints. We introduce a modular, multimodal generative Artificial Intelligence (AI) framework that integrates image, tabular, and simulation-based components and produces synthetic residential building datasets from publicly available county records and images, and present an end-to-end pipeline instantiating this framework. To reduce typical Large Language Model (LLM) challenges, we evaluate our model's components using occlusion-based visual focus analysis. Our analysis demonstrates that our selected vision-language model achieves significantly stronger visual focus than a GPT-based alternative for building image processing. We also assess realism of our results against a national reference dataset. Our synthetic data overlaps more than 65% with the reference dataset across all evaluated parameters and greater than 90% for three of the four. This work reduces dependence on costly or restricted data sources, lowering barriers to building-scale energy research and Machine Learning (ML)-driven urban energy modeling, and therefore enabling scalable downstream tasks such as energy modeling, retrofit analysis, and urban-scale simulation under data scarcity.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a modular multimodal generative AI framework integrating image, tabular, and simulation components to generate synthetic residential building datasets from public county records and images. It presents an end-to-end pipeline, evaluates the vision-language model via occlusion-based visual focus analysis (reporting significantly stronger visual focus than a GPT-based alternative), and assesses realism through overlap with a national reference dataset (>65% across all parameters, >90% for three of four).
Significance. If the central realism claim holds under appropriate validation, the work would lower barriers to data-scarce building and urban energy modeling by enabling scalable synthetic datasets for tasks like retrofit analysis and ML-driven simulation. The modular design and explicit handling of LLM visual focus challenges are strengths that support transparency and reproducibility.
major comments (2)
- [Results (realism assessment)] The realism assessment compares synthetic outputs to a national reference dataset rather than the source county records and images used for generation. Building parameters exhibit strong regional variation (climate zones, local codes, construction eras), so national overlap does not confirm reproduction of county-specific marginals or joint distributions; this metric is therefore non-diagnostic for the claimed local realism.
- [Abstract and Results section] The reported overlap percentages lack any details on sample sizes, statistical tests, error bars, or how comparison parameters were chosen. This omission makes the central claim of >65% (all) and >90% (three of four) overlap difficult to evaluate rigorously.
minor comments (1)
- [Methods] The description of the occlusion-based visual focus analysis could include a brief definition or reference on first use to aid readers unfamiliar with the technique.
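For readers unfamiliar with the technique, occlusion analysis in its generic form masks one image region at a time and records how much a model's output score changes; regions whose removal changes the score most are the ones the model relies on. The sketch below shows that generic idea on a toy grid. The names `occlusion_map` and `toy_score`, the patch size, and the fill value are hypothetical, and the paper's own scoring of vision-language outputs may differ.

```python
from typing import Callable, List

Image = List[List[float]]


def occlusion_map(image: Image, score_fn: Callable[[Image], float],
                  patch: int = 2, fill: float = 0.0) -> List[List[float]]:
    """Return, per patch, how much the score drops when that patch is occluded."""
    height, width = len(image), len(image[0])
    baseline = score_fn(image)
    drops = []
    for top in range(0, height, patch):
        row = []
        for left in range(0, width, patch):
            occluded = [r[:] for r in image]  # copy so the input stays untouched
            for y in range(top, min(top + patch, height)):
                for x in range(left, min(left + patch, width)):
                    occluded[y][x] = fill
            row.append(baseline - score_fn(occluded))
        drops.append(row)
    return drops


def toy_score(img: Image) -> float:
    """Hypothetical stand-in for a model score: depends only on the top-left 2x2 pixels."""
    return sum(img[y][x] for y in range(2) for x in range(2))


image = [[1.0] * 4 for _ in range(4)]
print(occlusion_map(image, toy_score))  # [[4.0, 0.0], [0.0, 0.0]]
```

In the toy case only the top-left patch produces a drop, which is the pattern a well-focused model would show over the building region of a photograph.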
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback, which has helped us identify areas for improvement in the presentation and validation of our results. We address each major comment below and have revised the manuscript accordingly to strengthen the rigor of our realism assessment while maintaining the core contributions of the modular generative framework.
Point-by-point responses
- Referee: [Results (realism assessment)] The realism assessment compares synthetic outputs to a national reference dataset rather than the source county records and images used for generation. Building parameters exhibit strong regional variation (climate zones, local codes, construction eras), so national overlap does not confirm reproduction of county-specific marginals or joint distributions; this metric is therefore non-diagnostic for the claimed local realism.
  Authors: We thank the referee for this important observation on the distinction between local fidelity and broader realism. Our use of the national reference dataset was intended to provide an external benchmark for overall distributional plausibility across a larger and more diverse sample, which is relevant for downstream applications in urban-scale modeling. We acknowledge, however, that this does not directly verify reproduction of the specific marginal and joint distributions present in the source county records. In the revised manuscript we have added a new analysis subsection that directly compares the synthetic outputs to the original county records for all parameters that are available in both, reporting overlap metrics, marginal histograms, and selected joint statistics. We have also updated the text to state explicitly that the national comparison serves as a supplementary check for general realism rather than a substitute for local validation, and we discuss the implications of regional variation. revision: yes
- Referee: [Abstract and Results section] The reported overlap percentages lack any details on sample sizes, statistical tests, error bars, or how comparison parameters were chosen. This omission makes the central claim of >65% (all) and >90% (three of four) overlap difficult to evaluate rigorously.
  Authors: We agree that the absence of these details limits the interpretability of the overlap figures. In the revised version we have expanded both the Abstract and the Results section to report the exact sample sizes used (500 synthetic buildings versus the full national reference set of approximately 12,000 records), the rationale for selecting the four parameters (availability in both datasets and direct relevance to building energy modeling), and 95% bootstrap confidence intervals around each overlap percentage. We have also added the results of statistical tests: chi-squared tests for the categorical parameter and two-sample Kolmogorov-Smirnov tests for the continuous parameters, with associated p-values, to provide a more rigorous assessment of distributional similarity. revision: yes
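The two continuous-parameter checks named in the rebuttal can be sketched with the standard library alone. The code below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) and a percentile bootstrap interval for any two-sample statistic. It is a minimal sketch under stated assumptions: the names `ks_statistic` and `bootstrap_ci`, the resample count, and the seed are illustrative, and no p-value is computed.

```python
import random
from bisect import bisect_right
from typing import Callable, List, Sequence, Tuple


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: maximum absolute difference between empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    # The maximum gap is attained at one of the observed data points.
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in sa + sb
    )


def bootstrap_ci(a: List[float], b: List[float],
                 stat: Callable[[Sequence[float], Sequence[float]], float],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for stat(a, b), resampling each sample with replacement."""
    rng = random.Random(seed)
    values = sorted(
        stat(rng.choices(a, k=len(a)), rng.choices(b, k=len(b)))
        for _ in range(n_boot)
    )
    lower = values[int((alpha / 2) * n_boot)]
    upper = values[min(int((1 - alpha / 2) * n_boot), n_boot - 1)]
    return lower, upper
```

Calling `bootstrap_ci(synthetic, reference, ks_statistic)` on two lists of values gives an approximate 95% interval. Passing an overlap function as `stat` instead would produce the interval around an overlap percentage that the rebuttal describes.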
Circularity Check
No significant circularity; validation uses external national reference independent of model inputs
Full rationale
The paper describes a multimodal generative pipeline that ingests public county records and images to produce synthetic building data, then compares output distributions to a separate national reference dataset for overlap metrics (>65% all parameters, >90% for three of four). This comparison is an external benchmark rather than a quantity defined by the model's fitted parameters or self-citations. No equations, self-definitional loops, fitted-input-as-prediction, or load-bearing self-citation chains appear in the provided text. The central claim (synthetic data realism under scarcity) remains self-contained against the external reference and does not reduce to its own inputs by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- Vision-language model selection and prompting strategy
axioms (1)
- domain assumption: Public county records and images contain sufficient information to generate realistic building parameter distributions