Synthetic Homes: A Multimodal Generative AI Pipeline for Residential Building Data Generation under Data Scarcity

Chetan Tiwari; Jackson Eshbaugh; Jorge Silveyra

arXiv: 2509.09794 · v4 · submitted 2025-09-11 · cs.AI · cs.LG

Pith reviewed 2026-05-18 17:06 UTC · model grok-4.3

keywords: synthetic data generation · generative AI · residential buildings · energy modeling · data scarcity · multimodal AI · building parameters · urban simulation

The pith

A multimodal generative AI pipeline creates synthetic residential building datasets from public county records and images; their parameter distributions overlap a national reference dataset by more than 65 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a modular framework that combines vision-language models for processing building images, language models for generating tabular parameters, and simulation tools to produce complete synthetic home datasets. It starts from publicly available county records and images rather than private or expensive sources. The authors evaluate the output against a national reference dataset and report substantial distributional overlap. This setup targets the data scarcity problem that limits energy modeling, retrofit analysis, and urban-scale simulations at the building level.

Core claim

The authors claim that an end-to-end multimodal generative AI pipeline, integrating image, tabular, and simulation-based components and trained or prompted on public county records and images, produces synthetic residential building parameter sets whose distributions overlap the reference national dataset by more than 65 percent across all evaluated parameters and by more than 90 percent for three of the four parameters.
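The text reviewed here does not say how the overlap percentage is computed. As a rough illustration only, one common choice is the histogram overlap coefficient: bin both samples on a shared grid, normalize each histogram to sum to 1, and add up the bin-wise minima. The function name `histogram_overlap` and the default of 20 bins are assumptions made for this sketch, not details taken from the paper.

```python
from typing import List, Sequence


def histogram_overlap(synthetic: Sequence[float],
                      reference: Sequence[float],
                      bins: int = 20) -> float:
    """Shared area of two normalized histograms, a value in [0, 1]."""
    lo = min(min(synthetic), min(reference))
    hi = max(max(synthetic), max(reference))
    if hi == lo:
        return 1.0  # every value is identical, so the histograms coincide
    width = (hi - lo) / bins

    def normalized_counts(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp so the maximum value falls in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    p = normalized_counts(synthetic)
    q = normalized_counts(reference)
    # Bin-wise minimum: the mass both histograms have in common.
    return sum(min(a, b) for a, b in zip(p, q))


# Toy values, not data from the paper.
print(histogram_overlap([0, 0, 10, 10], [0, 10, 10, 10], bins=2))  # 0.75
```

Under this definition a value of 0.65 would mean that 65 percent of the probability mass is shared between the synthetic and reference histograms. The result depends on the bin count, which is one reason the binning would need to be reported alongside any overlap figure.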

What carries the argument

The modular multimodal generative AI framework that fuses vision-language image analysis, tabular data synthesis, and simulation components to convert public county records and images into synthetic building parameter vectors.

If this is right

Energy modeling and retrofit analysis can proceed without access to restricted private building records.
Urban-scale simulations become feasible in regions where detailed building stock data is scarce or costly.
Machine-learning tasks that rely on large building datasets can scale using only public inputs.
The same pipeline can be reused to refresh or expand datasets as new public records become available.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar pipelines could be adapted for commercial buildings or non-residential stock if analogous public imagery and records exist.
The occlusion-based visual focus test used to compare vision models could serve as a general diagnostic for other image-to-parameter extraction tasks in architecture.
If the overlap holds under broader validation, the method could shorten the lead time for city-level energy policy studies that currently wait for new survey data.

Load-bearing premise

The generative components produce parameter distributions that represent actual residential buildings rather than artifacts introduced by the model architecture or its training data.

What would settle it

Collect real building parameter records from a new set of counties not used in training or prompting, generate matching synthetics with the same pipeline, and test whether the reported overlap percentages remain stable or drop sharply on any key parameter.

Figures

Figures reproduced from arXiv: 2509.09794 by Chetan Tiwari, Jackson Eshbaugh, Jorge Silveyra.

**Figure 2.** An example image and floor plan from the Northampton County database.

**Figure 3.** Excerpt of a generated GeoJSON file.

Surrounding text: We use GPT for structured data generation given its strong ability to produce well-formed JSON outputs from descriptive prompts. Multiple studies have shown that LLMs perform well on generation tasks under strict schema constraints, even with no previous prompting or fine-tuning. For example, StructuredRAG’s benchmark suite showed 82% format compliance under JSON prompt conditions [40]. As we built the pip…

Surrounding text (2.4. Running EnergyPlus Simulations): To execute a simulation in EnergyPlus, we take the GeoJSON generated by GPT (Section 2.3) and convert it to an IDF file for use in EnergyPlus. We use a template IDF file, filled with default values that are replaced with variables from the GeoJSON data, such as the geometry, HVAC heating and cooling coefficients of performance, r-valu…

**Figure 4.** GPT and LLaVA forward occlusion per image results.

**Figure 5.** Forward occlusion statistics for GPT and LLaVA.

**Figure 6.** GPT and LLaVA occlusion results.

Surrounding text: … of the image, primarily consisting of the home’s roof. GPT, on the other hand, behaves much more randomly. Based on these results, we selected LLaVA for use in the image processing step of our pipeline. Full experimental details and findings are available in Appendix B.

Surrounding text (3.2. Validation of Labeling Components): As discussed in Section 2.5, the pipeline includes a labeler that …

**Figure 7.** Ablation and combined variation testing of the labeling module using only GPT.

**Figure 8.** Experimental results after introducing the heuristic labeler.

**Figure 8 (continued).** Experimental results after introducing the heuristic labeler.

**Figure 9.** Experimental results after introducing the weighted sum.

**Figure 9 (continued).** Experimental results after introducing the weighted sum.
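The text around Figure 3 describes parsing a model-generated GeoJSON object and using its values to replace defaults in a template file. A minimal sketch of that idea follows, using only the Python standard library. The property names (`heating_cop`, `cooling_cop`, `wall_r_value`), the default values, and the template text are all hypothetical; the template is a toy stand-in and is not valid EnergyPlus IDF syntax. The paper's actual schema and conversion code are not shown in the material reviewed here.

```python
import json
from string import Template

# Hypothetical property names and defaults (not taken from the paper).
DEFAULTS = {"heating_cop": 3.0, "cooling_cop": 3.0, "wall_r_value": 13.0}

# Toy stand-in for a template file; this is NOT valid EnergyPlus IDF syntax.
TEMPLATE = Template(
    "heating_cop=$heating_cop\n"
    "cooling_cop=$cooling_cop\n"
    "wall_r_value=$wall_r_value\n"
)


def fill_template(geojson_text: str) -> str:
    """Parse model output, then overlay its properties on the defaults."""
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed text.
    feature = json.loads(geojson_text)
    if not isinstance(feature, dict) or feature.get("type") != "Feature":
        raise ValueError("expected a GeoJSON Feature object")
    properties = feature.get("properties") or {}
    values = dict(DEFAULTS)
    for key in DEFAULTS:
        if key in properties:
            # float() rejects non-numeric values the model may have emitted.
            values[key] = float(properties[key])
    return TEMPLATE.substitute(values)


generated = '{"type": "Feature", "geometry": null, "properties": {"heating_cop": 3.5}}'
print(fill_template(generated))
```

Parsing and validating before substitution means a malformed or incomplete model response fails loudly, while any field the model omits falls back to the template default.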

Original abstract

Computational models have emerged as powerful tools for multi-scale energy modeling research at the building and urban scale, supporting data-driven analysis across building and urban energy systems. However, these models require large amounts of building parameter data that is often inaccessible, expensive to collect, or subject to privacy constraints. We introduce a modular, multimodal generative Artificial Intelligence (AI) framework that integrates image, tabular, and simulation-based components and produces synthetic residential building datasets from publicly available county records and images, and present an end-to-end pipeline instantiating this framework. To reduce typical Large Language Model (LLM) challenges, we evaluate our model's components using occlusion-based visual focus analysis. Our analysis demonstrates that our selected vision-language model achieves significantly stronger visual focus than a GPT-based alternative for building image processing. We also assess realism of our results against a national reference dataset. Our synthetic data overlaps more than 65% with the reference dataset across all evaluated parameters and greater than 90% for three of the four. This work reduces dependence on costly or restricted data sources, lowering barriers to building-scale energy research and Machine Learning (ML)-driven urban energy modeling, and therefore enabling scalable downstream tasks such as energy modeling, retrofit analysis, and urban-scale simulation under data scarcity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a modular multimodal pipeline for synthetic residential building data from public county sources, which could ease data scarcity in energy modeling, but the national overlap metric does not confirm it reproduces local county distributions.

The main takeaway is a practical end-to-end pipeline that pulls together vision-language models on building images, tabular data from county records, and simulation components to generate synthetic residential datasets. It reports overlap above 65 percent on every evaluated parameter against a national reference, and its selected vision-language model shows stronger visual focus than a GPT-based alternative under occlusion analysis. That combination is the concrete new piece here, even if the underlying generative techniques are not themselves new.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces a modular multimodal generative AI framework integrating image, tabular, and simulation components to generate synthetic residential building datasets from public county records and images. It presents an end-to-end pipeline, evaluates the vision-language model via occlusion-based visual focus analysis (showing superiority over GPT alternatives), and assesses realism through overlap with a national reference dataset (>65% across all parameters, >90% for three of four).

Significance. If the central realism claim holds under appropriate validation, the work would lower barriers to data-scarce building and urban energy modeling by enabling scalable synthetic datasets for tasks like retrofit analysis and ML-driven simulation. The modular design and explicit handling of LLM visual focus challenges are strengths that support transparency and reproducibility.

major comments (2)

[Results (realism assessment)] The realism assessment compares synthetic outputs to a national reference dataset rather than the source county records and images used for generation. Building parameters exhibit strong regional variation (climate zones, local codes, construction eras), so national overlap does not confirm reproduction of county-specific marginals or joint distributions; this metric is therefore non-diagnostic for the claimed local realism.
[Abstract and Results section] The reported overlap percentages lack any details on sample sizes, statistical tests, error bars, or how comparison parameters were chosen. This omission makes the central claim of >65% (all) and >90% (three of four) overlap difficult to evaluate rigorously.

minor comments (1)

[Methods] The description of the occlusion-based visual focus analysis could include a brief definition or reference on first use to aid readers unfamiliar with the technique.
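For readers unfamiliar with the technique, occlusion analysis in its generic form masks one region of the input at a time and records how much the model's output changes; regions whose masking changes the output most are the ones the model relies on. The sketch below illustrates that generic idea on a small grid. The `score` callable is a hypothetical stand-in for whatever turns a model response into a number; for a vision-language model that returns text, it would have to wrap some text-similarity measure, and how the paper does this is not described in the material reviewed here.

```python
from typing import Callable, List

Image = List[List[float]]


def occlusion_map(image: Image,
                  score: Callable[[Image], float],
                  patch: int = 2,
                  fill: float = 0.0) -> List[List[float]]:
    """Drop in `score` when each patch is masked; larger means more important."""
    baseline = score(image)
    height, width = len(image), len(image[0])
    rows = []
    for top in range(0, height, patch):
        row = []
        for left in range(0, width, patch):
            masked = [list(r) for r in image]  # copy so the input is untouched
            for y in range(top, min(top + patch, height)):
                for x in range(left, min(left + patch, width)):
                    masked[y][x] = fill
            row.append(baseline - score(masked))
        rows.append(row)
    return rows


# Toy example: a "model" that only looks at the top-left 2x2 corner.
toy_image = [[1, 1, 0, 0],
             [1, 1, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]
corner_sum = lambda img: sum(img[0][:2]) + sum(img[1][:2])
print(occlusion_map(toy_image, corner_sum))  # [[4.0, 0.0], [0.0, 0.0]]
```

In the toy example only the top-left patch produces a non-zero drop, which is what a well-focused model would show for the region that actually contains the building.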

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which has helped us identify areas for improvement in the presentation and validation of our results. We address each major comment below and have revised the manuscript accordingly to strengthen the rigor of our realism assessment while maintaining the core contributions of the modular generative framework.

Referee: [Results (realism assessment)] The realism assessment compares synthetic outputs to a national reference dataset rather than the source county records and images used for generation. Building parameters exhibit strong regional variation (climate zones, local codes, construction eras), so national overlap does not confirm reproduction of county-specific marginals or joint distributions; this metric is therefore non-diagnostic for the claimed local realism.

Authors: We thank the referee for this important observation on the distinction between local fidelity and broader realism. Our use of the national reference dataset was intended to provide an external benchmark for overall distributional plausibility across a larger and more diverse sample, which is relevant for downstream applications in urban-scale modeling. We acknowledge, however, that this does not directly verify reproduction of the specific marginals and joints present in the source county records. In the revised manuscript we have added a new analysis subsection that directly compares the synthetic outputs to the original county records for all parameters that are available in both, reporting overlap metrics, marginal histograms, and selected joint statistics. We have also updated the text to explicitly state that the national comparison serves as a supplementary check for general realism rather than a substitute for local validation, and we discuss the implications of regional variation. revision: yes
Referee: [Abstract and Results section] The reported overlap percentages lack any details on sample sizes, statistical tests, error bars, or how comparison parameters were chosen. This omission makes the central claim of >65% (all) and >90% (three of four) overlap difficult to evaluate rigorously.

Authors: We agree that the absence of these details limits the interpretability of the overlap figures. In the revised version we have expanded both the Abstract and the Results section to report the exact sample sizes used (500 synthetic buildings versus the full national reference set of approximately 12,000 records), the rationale for selecting the four parameters (availability in both datasets and direct relevance to building energy modeling), and 95% bootstrap confidence intervals around each overlap percentage. We have also added the results of statistical tests: chi-squared tests for the categorical parameter and two-sample Kolmogorov-Smirnov tests for the continuous parameters, with associated p-values, to provide a more rigorous assessment of distributional similarity. revision: yes
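The statistics named in the simulated response can be sketched with the standard library alone. The code below computes a two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical CDFs) and a percentile bootstrap interval for any two-sample statistic. It is a minimal illustration under stated assumptions: the function names, the resample count, and the seed are choices made here, no p-value is computed, and nothing about the paper's actual analysis is implied.

```python
import random
from bisect import bisect_right
from typing import Callable, Sequence, Tuple


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    # With right-continuous ECDFs the supremum is attained at a sample point.
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in sa + sb
    )


def bootstrap_ci(a: Sequence[float],
                 b: Sequence[float],
                 statistic: Callable[[Sequence[float], Sequence[float]], float],
                 n_resamples: int = 1000,
                 level: float = 0.95,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for a two-sample statistic."""
    rng = random.Random(seed)
    draws = sorted(
        # Resample each dataset with replacement, keeping its original size.
        statistic(rng.choices(a, k=len(a)), rng.choices(b, k=len(b)))
        for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lower = draws[int(tail * (n_resamples - 1))]
    upper = draws[int((1.0 - tail) * (n_resamples - 1))]
    return lower, upper


# Toy values, not data from the paper.
synthetic = [1200.0, 1500.0, 1800.0, 2100.0, 2400.0]
reference = [1000.0, 1400.0, 1900.0, 2300.0, 2600.0, 3000.0]
print(ks_statistic(synthetic, reference))
print(bootstrap_ci(synthetic, reference, ks_statistic, n_resamples=200))
```

Because `bootstrap_ci` accepts any two-sample function, the same routine could wrap an overlap metric instead of the KS statistic, which is how an interval around each overlap percentage could be obtained.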

Circularity Check

0 steps flagged

No significant circularity; validation uses external national reference independent of model inputs

The paper describes a multimodal generative pipeline that ingests public county records and images to produce synthetic building data, then compares output distributions to a separate national reference dataset for overlap metrics (>65% all parameters, >90% for three of four). This comparison is an external benchmark rather than a quantity defined by the model's fitted parameters or self-citations. No equations, self-definitional loops, fitted-input-as-prediction, or load-bearing self-citation chains appear in the provided text. The central claim (synthetic data realism under scarcity) remains self-contained against the external reference and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Abstract-only review limits visibility into exact parameters; the framework likely relies on standard generative model hyperparameters and prompt engineering choices that function as free parameters, plus domain assumptions about public data sufficiency.

free parameters (1)

Vision-language model selection and prompting strategy
Chosen to achieve stronger visual focus; specific hyperparameters or fine-tuning details not stated in abstract.

axioms (1)

Domain assumption: Public county records and images contain sufficient information to generate realistic building parameter distributions
Invoked implicitly as the input source for the generative pipeline.

pith-pipeline@v0.9.0 · 5762 in / 1251 out tokens · 34096 ms · 2026-05-18T17:06:48.156174+00:00 · methodology

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · 7 internal anchors

[1] U.S. Energy Information Administration, Drivers of U.S. household energy consumption, 1980–2009, Tech. rep., U.S. Department of Energy, https://www.eia.gov/analysis/studies/buildings/households/ (Feb. 2015).

[2] U.S. Environmental Protection Agency, Climate Change Indicators: Residential Energy Use, https://www.epa.gov/climate-indicators/climate-change-indicators-residential-energy-use (Jan. 2025).

[3] E. F. Bompard, S. Conti, M. J. Masera, G. G. Soma, A New Electricity Infrastructure for Fostering Urban Sustainability: Challenges and Emerging Trends, Energies 17 (22) (2024) 5573. doi:10.3390/en17225573

[4] A. Perera, K. Javanroodi, V. M. Nik, Climate resilient interconnected infrastructure: Co-optimization of energy systems and urban morphology, Applied Energy 285 (2021) 116430. doi:10.1016/j.apenergy.2020.116430

[5] Y. Zeng, Y. Cai, G. Huang, J. Dai, A Review on Optimization Modeling of Energy Systems Planning and GHG Emission Mitigation under Uncertainty, Energies 4 (10) (2011) 1624–1656. doi:10.3390/en4101624

[6] A. Hajri, R. Garay-Marinez, A. M. Macarulla, M. A. Ben Sassi, Data-driven model for heat load prediction in buildings connected to district heating networks, Energy 329 (2025) 136684. doi:10.1016/j.energy.2025.136684

[7] U.S. Department of Energy, Getting Started (3 2025). URL https://energyplus.net/assets/nrel_custom/pdfs/pdfs_v25.1.0/GettingStarted.pdf

[8] D. Wan, X. Zhao, W. Lu, P. Li, X. Shi, H. Fukuda, A Deep Learning Approach toward Energy-Effective Residential Building Floor Plan Generation, Sustainability 14 (13) (2022) 8074. doi:10.3390/su14138074

[9] M. H. Elnabawi, N. Hamza, A Methodology of Creating a Synthetic, Urban-Specific Weather Dataset Using a Microclimate Model for Building Energy Modelling, Buildings 12 (9) (2022) 1407. doi:10.3390/buildings12091407

[10] S. Lee, J. Cha, M. K. Kim, K. S. Kim, V. H. Pham, M. Leach, Neural-Network-Based Building Energy Consumption Prediction with Training Data Generation, Processes 7 (10) (2019) 731. doi:10.3390/pr7100731

[11] M. H. Zweig, G. Campbell, Receiver-operating characteristic (ROC) plots: A fundamental evaluation tool in clinical medicine, Clinical Chemistry 39 (4) (1993) 561–577. doi:10.1093/clinchem/39.4.561

[12] A. C. J. W. Janssens, F. K. Martens, Reflection on modern methods: Revisiting the area under the ROC Curve, International Journal of Epidemiology 49 (4) (2020) 1397–1403. doi:10.1093/ije/dyz274

[13] F. Stinner, M. Wiecek, M. Baranski, A. Kümpel, D. Müller, Automatic digital twin data model generation of building energy systems from piping and instrumentation diagrams (Aug. 2021). arXiv:2108.13912, doi:10.48550/arXiv.2108.13912

[14] S. Agostinelli, F. Cumo, G. Guidi, C. Tomazzoli, Cyber-Physical Systems Improving Building Energy Management: Digital Twin and Artificial Intelligence, Energies 14 (8) (2021) 2338. doi:10.3390/en14082338

[15] A. Francisco, N. Mohammadi, J. E. Taylor, Smart City Digital Twin–Enabled Energy Management: Toward Real-Time Urban Building Energy Benchmarking, Journal of Management in Engineering 36 (2) (2020) 04019045. doi:10.1061/(ASCE)ME.1943-5479.0000741

[16] M. Belik, O. Rubanenko, Implementation of Digital Twin for Increasing Efficiency of Renewable Energy Sources, Energies 16 (12) (2023) 4787. doi:10.3390/en16124787

[17] H. Xu, F. Omitaomu, S. Sabri, S. Zlatanova, X. Li, Y. Song, Leveraging generative AI for urban digital twins: A scoping review on the autonomous generation of urban data, scenarios, designs, and 3D city models for smart city advancement, Urban Informatics 3 (1) (2024) 29. doi:10.1007/s44212-024-00060-w

[18] S. Dodge, J. Xu, B. Stenger, Parsing floor plan images, in: 2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA), IEEE, Nagoya, Japan, 2017, pp. 358–361. doi:10.23919/MVA.2017.7986875

[19] L. Zhang, V. Ford, Z. Chen, J. Chen, Automatic Building Energy Model Development and Debugging Using Large Language Models Agentic Workflow, preprint (2024). doi:10.2139/ssrn.4864703

[20] T. Xiao, P. Xu, Exploring automated energy optimization with unstructured building data: A multi-agent based framework leveraging large language models, Energy and Buildings 322 (2024) 114691. doi:10.1016/j.enbuild.2024.114691

[21] Y. Lin, Y. Yao, J. Zhu, C. He, Application of Generative AI in Predictive Analysis of Urban Energy Distribution and Traffic Congestion in Smart Cities, in: 2025 IEEE International Conference on Electronics, Energy Systems and Power Engineering (EESPE), IEEE, Shenyang, China, 2025, pp. 765–768. doi:10.1109/EESPE63401.2025.10987500

[22] Z. Sha, W. Yue, S. Wang, N. Cheng, J. Wu, C. Li, Generative AI-Enabled Sensing and Communication Integration for Urban Air Mobility, in: 2024 IEEE 99th Vehicular Technology Conference (VTC2024-Spring), IEEE, Singapore, Singapore, 2024, pp. 1–5. doi:10.1109/VTC2024-Spring62846.2024.10683276

[23] Y. Zhang, A. Schlüter, C. Waibel, SolarGAN: Synthetic Annual Solar Irradiance Time Series on Urban Building Facades via Deep Generative Networks (Jun. 2022). arXiv:2206.00747, doi:10.48550/arXiv.2206.00747

[24] M. Liu, L. Zhang, J. Chen, W.-A. Chen, Z. Yang, L. J. Lo, J. Wen, Z. O’Neill, Large language models for building energy applications: Opportunities and challenges, Building Simulation 18 (2) (2025) 225–234. doi:10.1007/s12273-025-1235-9

[25] F. Rehmann, M. Mosteiro-Romero, C. Miller, R. Streblow, Enhancing urban energy modeling: A case study of data acquisition, enrichment, and evaluation in Berlin, Energy and Buildings 346 (2025) 116070. doi:10.1016/j.enbuild.2025.116070. URL https://linkinghub.elsevier.com/retrieve/pii/S037877882500800X

[26] T. Guo, M. Bachmann, M. Kersten, M. Kriegel, A combined workflow to generate citywide building energy demand profiles from low-level datasets, Sustainable Cities and Society 96 (2023) 104694. doi:10.1016/j.scs.2023.104694. URL https://linkinghub.elsevier.com/retrieve/pii/S2210670723003050

[27] D. Bishop, P. Gallardo, B. L. M. Williams, A Review of Multi-Domain Urban Energy Modelling Data, Clean Energy and Sustainability 2 (4) (2024) 10016–10016. doi:10.70322/ces.2024.10016. URL https://www.sciepublish.com/article/pii/298

[28] H. Liu, C. Li, Q. Wu, Y. J. Lee, Visual Instruction Tuning (Dec. 2023). arXiv:2304.08485, doi:10.48550/arXiv.2304.08485

[29] OpenAI, Introducing GPT-4.1 in the API, https://openai.com/index/gpt-4-1/ (Apr. 2025).

[30] D. B. Crawley, L. K. Lawrie, F. C. Winkelmann, W. Buhl, Y. Huang, C. O. Pedersen, R. K. Strand, R. J. Liesen, D. E. Fisher, M. J. Witte, J. Glazer, EnergyPlus: Creating a new-generation building energy simulation program, Energy and Buildings 33 (4) (2001) 319–331. doi:10.1016/S0378-7788(00)00114-6

[31] J. Ansel, E. Yang, H. He, N. Gimelshein, A. Jain, M. Voznesensky, B. Bao, P. Bell, D. Berard, E. Burovski, G. Chauhan, A. Chourdia, W. Constable, A. Desmaison, Z. DeVito, E. Ellison, W. Feng, J. Gong, M. Gschwind, B. Hirsh, S. Huang, K. Kalambarkar, L. Kirsch, M. Lazos, M. Lezcano, Y. Liang, J. Liang, Y. Lu, C. K. Luk, B. Maher, Y. Pan, C. Puhrsch, M. Res... PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation (2024). doi:10.1145/3620665.3640366

[32] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. Von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, A. Rush, Transformers: State-of-the-Art Natural Language Processing, in: Proceedings of the 2020 Conference on Empirical Methods in...

[33] Selenium, Selenium webdriver documentation, https://www.selenium.dev/documentation/webdriver/, accessed 2025-08-06 (Nov. 2024).

[34] Google, Chromedriver - webdriver for chrome, https://sites.google.com/chromium.org/driver/, accessed: 2025-08-06 (Jun. 2023).

[35] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, I. Sutskever, Learning Transferable Visual Models From Natural Language Supervision (Feb. 2021). arXiv:2103.00020, doi:10.48550/arXiv.2103.00020

[36] S. Zhang, Q. Fang, Z. Yang, Y. Feng, LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token (Mar. 2025). arXiv:2501.03895, doi:10.48550/arXiv.2501.03895

[37] Z. Yang, L. Li, K. Lin, J. Wang, C.-C. Lin, Z. Liu, L. Wang, The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) (Oct. 2023). arXiv:2309.17421, doi:10.48550/arXiv.2309.17421

[38] J. Li, D. Li, S. Savarese, S. Hoi, BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models (Jun. 2023). arXiv:2301.12597, doi:10.48550/arXiv.2301.12597

[39] D. Zhu, J. Chen, X. Shen, X. Li, M. Elhoseiny, MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models (Oct. 2023). arXiv:2304.10592, doi:10.48550/arXiv.2304.10592

[40] C. Shorten, C. Pierse, T. B. Smith, E. Cardenas, A. Sharma, J. Trengrove, B. van Luijt, StructuredRAG: JSON Response Formatting with Large Language Models (Aug. 2024). arXiv:2408.11061, doi:10.48550/arXiv.2408.11061

[41] santoshphilip, Eppy, https://github.com/santoshphilip/eppy (Oct. 2024).

[42] U.S. Department of Energy, EnergyPlus Auxiliary Programs Documentation, Version 23.2.0, EnergyPlus, U.S. Department of Energy, see pages 45–48 for ExpandObjects preprocessor details (2023). URL https://energyplus.net/assets/nrel_custom/pdfs/pdfs_v23.2.0/AuxiliaryPrograms.pdf

[43] S. Hooker, D. Erhan, P.-J. Kindermans, B. Kim, A Benchmark for Interpretability Methods in Deep Neural Networks (Nov. 2019). arXiv:1806.10758, doi:10.48550/arXiv.1806.10758

[44] E. Balkir, I. Nejadgholi, K. Fraser, S. Kiritchenko, Necessity and Sufficiency for Explaining Text Classifiers: A Case Study in Hate Speech Detection, in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, ...

[45] N. Reimers, I. Gurevych, Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (Aug. 2019). arXiv:1908.10084, doi:10.48550/arXiv.1908.10084

[46] T. M. D. Team, Matplotlib: Visualization with Python, Zenodo (May 2025). doi:10.5281/ZENODO.15375714

[47] Y.-H. H. Tsai, S. Bai, P. P. Liang, J. Z. Kolter, L.-P. Morency, R. Salakhutdinov, Multimodal Transformer for Unaligned Multimodal Language Sequences (Jun. 2019). arXiv:1906.00295, doi:10.48550/arXiv.1906.00295

Appendix A. Labeler Experimental Values: In this appendix, we list the detailed experimental values used to test the labeler in Section 3.2. Tables A....

[1] [1]

Energy Information Administration, Drivers of U.S

U.S. Energy Information Administration, Drivers of U.S. household energy consumption, 1980–2009, Tech. rep., U.S. Department of Energy, https://www.eia.gov/analysis/studies/buildings/households/ (Feb. 2015)

work page 1980

[2] [2]

Environmental Protection Agency, Climate Change Indicators: Residential Energy Use,https://www.epa.gov/climate-indicators/ climate-change-indicators-residential-energy-use(Jan

U.S. Environmental Protection Agency, Climate Change Indicators: Residential Energy Use,https://www.epa.gov/climate-indicators/ climate-change-indicators-residential-energy-use(Jan. 2025)

work page 2025

[3] [3]

E. F. Bompard, S. Conti, M. J. Masera, G. G. Soma, A New Electricity Infrastructure for Fostering Urban Sustainability: Challenges and Emerg- ing Trends, Energies 17 (22) (2024) 5573.doi:10.3390/en17225573

work page doi:10.3390/en17225573 2024

[4] [4]

Perera, K

A. Perera, K. Javanroodi, V. M. Nik, Climate resilient interconnected infrastructure: Co-optimization of energy systems and urban morphology, Applied Energy 285 (2021) 116430. doi:10.1016/j.apenergy.2020. 116430

work page doi:10.1016/j.apenergy.2020 2021

[5] [5]

Y. Zeng, Y. Cai, G. Huang, J. Dai, A Review on Optimization Modeling of Energy Systems Planning and GHG Emission Mitigation under Un- certainty, Energies 4 (10) (2011) 1624–1656.doi:10.3390/en4101624

work page doi:10.3390/en4101624 2011

[6] [6]

Hajri, R

A. Hajri, R. Garay-Marinez, A. M. Macarulla, M. A. Ben Sassi, Data- driven model for heat load prediction in buildings connected to district heating networks, Energy 329 (2025) 136684.doi:10.1016/j.energy. 2025.136684

work page doi:10.1016/j.energy 2025

[7] [7]

Department of Energy, Getting Started (3 2025)

U.S. Department of Energy, Getting Started (3 2025). URL https://energyplus.net/assets/nrel_custom/pdfs/pdfs_ v25.1.0/GettingStarted.pdf

work page 2025

[8] [8]

D. Wan, X. Zhao, W. Lu, P. Li, X. Shi, H. Fukuda, A Deep Learning Approach toward Energy-Effective Residential Building Floor Plan Gen- eration, Sustainability 14 (13) (2022) 8074.doi:10.3390/su14138074

work page doi:10.3390/su14138074 2022

[9] M. H. Elnabawi, N. Hamza, A Methodology of Creating a Synthetic, Urban-Specific Weather Dataset Using a Microclimate Model for Building Energy Modelling, Buildings 12 (9) (2022) 1407. doi:10.3390/buildings12091407

[10] S. Lee, J. Cha, M. K. Kim, K. S. Kim, V. H. Pham, M. Leach, Neural-Network-Based Building Energy Consumption Prediction with Training Data Generation, Processes 7 (10) (2019) 731. doi:10.3390/pr7100731

[11] M. H. Zweig, G. Campbell, Receiver-operating characteristic (ROC) plots: A fundamental evaluation tool in clinical medicine, Clinical Chemistry 39 (4) (1993) 561–577. doi:10.1093/clinchem/39.4.561

[12] A. C. J. W. Janssens, F. K. Martens, Reflection on modern methods: Revisiting the area under the ROC Curve, International Journal of Epidemiology 49 (4) (2020) 1397–1403. doi:10.1093/ije/dyz274

[13] F. Stinner, M. Wiecek, M. Baranski, A. Kümpel, D. Müller, Automatic digital twin data model generation of building energy systems from piping and instrumentation diagrams (Aug. 2021). arXiv:2108.13912, doi:10.48550/arXiv.2108.13912

[14] S. Agostinelli, F. Cumo, G. Guidi, C. Tomazzoli, Cyber-Physical Systems Improving Building Energy Management: Digital Twin and Artificial Intelligence, Energies 14 (8) (2021) 2338. doi:10.3390/en14082338

[15] A. Francisco, N. Mohammadi, J. E. Taylor, Smart City Digital Twin–Enabled Energy Management: Toward Real-Time Urban Building Energy Benchmarking, Journal of Management in Engineering 36 (2) (2020) 04019045. doi:10.1061/(ASCE)ME.1943-5479.0000741

[16] M. Belik, O. Rubanenko, Implementation of Digital Twin for Increasing Efficiency of Renewable Energy Sources, Energies 16 (12) (2023) 4787. doi:10.3390/en16124787

[17] H. Xu, F. Omitaomu, S. Sabri, S. Zlatanova, X. Li, Y. Song, Leveraging generative AI for urban digital twins: A scoping review on the autonomous generation of urban data, scenarios, designs, and 3D city models for smart city advancement, Urban Informatics 3 (1) (2024) 29. doi:10.1007/s44212-024-00060-w

[18] S. Dodge, J. Xu, B. Stenger, Parsing floor plan images, in: 2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA), IEEE, Nagoya, Japan, 2017, pp. 358–361. doi:10.23919/MVA.2017.7986875

[19] L. Zhang, V. Ford, Z. Chen, J. Chen, Automatic Building Energy Model Development and Debugging Using Large Language Models Agentic Workflow, preprint (2024). doi:10.2139/ssrn.4864703

[20] T. Xiao, P. Xu, Exploring automated energy optimization with unstructured building data: A multi-agent based framework leveraging large language models, Energy and Buildings 322 (2024) 114691. doi:10.1016/j.enbuild.2024.114691

[21] Y. Lin, Y. Yao, J. Zhu, C. He, Application of Generative AI in Predictive Analysis of Urban Energy Distribution and Traffic Congestion in Smart Cities, in: 2025 IEEE International Conference on Electronics, Energy Systems and Power Engineering (EESPE), IEEE, Shenyang, China, 2025, pp. 765–768. doi:10.1109/EESPE63401.2025.10987500

[22] Z. Sha, W. Yue, S. Wang, N. Cheng, J. Wu, C. Li, Generative AI-Enabled Sensing and Communication Integration for Urban Air Mobility, in: 2024 IEEE 99th Vehicular Technology Conference (VTC2024-Spring), IEEE, Singapore, Singapore, 2024, pp. 1–5. doi:10.1109/VTC2024-Spring62846.2024.10683276

[23] Y. Zhang, A. Schlüter, C. Waibel, SolarGAN: Synthetic Annual Solar Irradiance Time Series on Urban Building Facades via Deep Generative Networks (Jun. 2022). arXiv:2206.00747, doi:10.48550/arXiv.2206.00747

[24] M. Liu, L. Zhang, J. Chen, W.-A. Chen, Z. Yang, L. J. Lo, J. Wen, Z. O’Neill, Large language models for building energy applications: Opportunities and challenges, Building Simulation 18 (2) (2025) 225–234. doi:10.1007/s12273-025-1235-9

[25] F. Rehmann, M. Mosteiro-Romero, C. Miller, R. Streblow, Enhancing urban energy modeling: A case study of data acquisition, enrichment, and evaluation in Berlin, Energy and Buildings 346 (2025) 116070. doi:10.1016/j.enbuild.2025.116070. URL https://linkinghub.elsevier.com/retrieve/pii/S037877882500800X

[26] T. Guo, M. Bachmann, M. Kersten, M. Kriegel, A combined workflow to generate citywide building energy demand profiles from low-level datasets, Sustainable Cities and Society 96 (2023) 104694. doi:10.1016/j.scs.2023.104694. URL https://linkinghub.elsevier.com/retrieve/pii/S2210670723003050

[27] D. Bishop, P. Gallardo, B. L. M. Williams, A Review of Multi-Domain Urban Energy Modelling Data, Clean Energy and Sustainability 2 (4) (2024) 10016–10016. doi:10.70322/ces.2024.10016. URL https://www.sciepublish.com/article/pii/298

[28] H. Liu, C. Li, Q. Wu, Y. J. Lee, Visual Instruction Tuning (Dec. 2023). arXiv:2304.08485, doi:10.48550/arXiv.2304.08485

[29] OpenAI, Introducing GPT-4.1 in the API, https://openai.com/index/gpt-4-1/ (Apr. 2025)

[30] D. B. Crawley, L. K. Lawrie, F. C. Winkelmann, W. Buhl, Y. Huang, C. O. Pedersen, R. K. Strand, R. J. Liesen, D. E. Fisher, M. J. Witte, J. Glazer, EnergyPlus: Creating a new-generation building energy simulation program, Energy and Buildings 33 (4) (2001) 319–331. doi:10.1016/S0378-7788(00)00114-6

[31] J. Ansel, E. Yang, H. He, N. Gimelshein, A. Jain, M. Voznesensky, B. Bao, P. Bell, D. Berard, E. Burovski, G. Chauhan, A. Chourdia, W. Constable, A. Desmaison, Z. DeVito, E. Ellison, W. Feng, J. Gong, M. Gschwind, B. Hirsh, S. Huang, K. Kalambarkar, L. Kirsch, M. Lazos, M. Lezcano, Y. Liang, J. Liang, Y. Lu, C. K. Luk, B. Maher, Y. Pan, C. Puhrsch, M. Res..., PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation (2024). doi:10.1145/3620665.3640366

[32] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. Von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, A. Rush, Transformers: State-of-the-Art Natural Language Processing, in: Proceedings of the 2020 Conference on Empirical Methods in...

[33] Selenium, Selenium webdriver documentation, https://www.selenium.dev/documentation/webdriver/, accessed 2025-08-06 (Nov. 2024)

[34] Google, Chromedriver - webdriver for chrome, https://sites.google.com/chromium.org/driver/, accessed 2025-08-06 (Jun. 2023)

[35] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, I. Sutskever, Learning Transferable Visual Models From Natural Language Supervision (Feb. 2021). arXiv:2103.00020, doi:10.48550/arXiv.2103.00020

[36] S. Zhang, Q. Fang, Z. Yang, Y. Feng, LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token (Mar. 2025). arXiv:2501.03895, doi:10.48550/arXiv.2501.03895

[37] Z. Yang, L. Li, K. Lin, J. Wang, C.-C. Lin, Z. Liu, L. Wang, The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) (Oct. 2023). arXiv:2309.17421, doi:10.48550/arXiv.2309.17421

[38] J. Li, D. Li, S. Savarese, S. Hoi, BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models (Jun. 2023). arXiv:2301.12597, doi:10.48550/arXiv.2301.12597

[39] D. Zhu, J. Chen, X. Shen, X. Li, M. Elhoseiny, MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models (Oct. 2023). arXiv:2304.10592, doi:10.48550/arXiv.2304.10592

[40] C. Shorten, C. Pierse, T. B. Smith, E. Cardenas, A. Sharma, J. Trengrove, B. van Luijt, StructuredRAG: JSON Response Formatting with Large Language Models (Aug. 2024). arXiv:2408.11061, doi:10.48550/arXiv.2408.11061

[41] santoshphilip, Eppy, https://github.com/santoshphilip/eppy (Oct. 2024)

[42] U.S. Department of Energy, EnergyPlus Auxiliary Programs Documentation, Version 23.2.0, EnergyPlus, U.S. Department of Energy, see pages 45–48 for ExpandObjects preprocessor details (2023). URL https://energyplus.net/assets/nrel_custom/pdfs/pdfs_v23.2.0/AuxiliaryPrograms.pdf

[43] S. Hooker, D. Erhan, P.-J. Kindermans, B. Kim, A Benchmark for Interpretability Methods in Deep Neural Networks (Nov. 2019). arXiv:1806.10758, doi:10.48550/arXiv.1806.10758

[44] E. Balkir, I. Nejadgholi, K. Fraser, S. Kiritchenko, Necessity and Sufficiency for Explaining Text Classifiers: A Case Study in Hate Speech Detection, in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, ...

[45] N. Reimers, I. Gurevych, Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (Aug. 2019). arXiv:1908.10084, doi:10.48550/arXiv.1908.10084

[46] T. M. D. Team, Matplotlib: Visualization with Python, Zenodo (May 2025). doi:10.5281/ZENODO.15375714

[47] Y.-H. H. Tsai, S. Bai, P. P. Liang, J. Z. Kolter, L.-P. Morency, R. Salakhutdinov, Multimodal Transformer for Unaligned Multimodal Language Sequences (Jun. 2019). arXiv:1906.00295, doi:10.48550/arXiv.1906.00295

Appendix A. Labeler Experimental Values

In this appendix, we list the detailed experimental values used to test the labeler in Section 3.2. Tables A....