A Zero-Shot Multi-Agent Framework for Human-Building Interaction via Programmatic Reasoning

Ali Mehmani; Gulai Shen; Yuqi Wang

arXiv: 2606.11354 · v1 · pith:GI5QFFE3 · submitted 2026-06-09 · 💻 cs.ET


Yuqi Wang, Gulai Shen, Ali Mehmani

Pith reviewed 2026-06-27 10:27 UTC · model grok-4.3

classification 💻 cs.ET

keywords: multi-agent framework · human-building interaction · programmatic reasoning · zero-shot · semantic routing · building analytics · LLM agents


The pith

A hierarchical multi-agent framework uses a Doorman for query decomposition and coding agents that emit executable Python scripts to deliver accurate building analytics from natural language without fine-tuning or RAG.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a zero-shot multi-agent system for human-building interaction that separates intuitive language handling from precise technical calculations in complex building data. A top-level Doorman agent breaks down user questions, then routes them to specialized coding agents that write and run Python scripts for arithmetic and analysis. This is tested on data from more than 200 commercial buildings and produces accurate, contextual answers for users from tenants to managers across multiple building systems. A sympathetic reader would care because building systems hold large, opaque datasets that normally require scarce domain experts, and this setup aims to make those data queryable through ordinary language.

Core claim

The central claim is that semantic routing combined with programmatic reasoning lets LLMs handle human-building interaction reliably in a zero-shot setting. Because coding agents generate executable Python scripts for exact calculations, the system avoids both embedding domain knowledge directly in base models and relying on retrieval-augmented generation, and it produces accurate responses on real data from over 200 buildings for diverse stakeholders and applications.

What carries the argument

The Doorman mechanism for task decomposition together with specialized coding agents that output executable Python scripts for arithmetic and building analytics.
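A minimal sketch of the routing half of this pattern follows. The paper does not publish its Doorman logic, so everything here is an assumption: the agent names and descriptions are hypothetical, and a bag-of-words cosine similarity stands in for whatever LLM or embedding model performs the semantic routing. Query decomposition is omitted; the sketch only shows how a query could be dispatched to one specialist or handed back to a general conversational agent.

```python
import math
import re
from collections import Counter

# Hypothetical specialist registry; names and descriptions are illustrative only.
AGENTS = {
    "energy_agent": "energy consumption electricity kwh meter demand utility bill",
    "hvac_agent": "hvac temperature setpoint air handler cooling heating zone comfort",
    "iaq_agent": "indoor air quality co2 humidity pm2.5 ventilation wellness",
}


def _vectorize(text: str) -> Counter:
    """Lowercased bag-of-words counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def doorman_route(query: str, threshold: float = 0.1) -> str:
    """Return the best-matching specialist, or 'general_chat' if no score clears the threshold."""
    q = _vectorize(query)
    scores = {name: _cosine(q, _vectorize(desc)) for name, desc in AGENTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "general_chat"
```

For example, a question about electricity demand in kWh would land on the hypothetical `energy_agent`, while small talk falls through to `general_chat`. Swapping `_vectorize` for a sentence-embedding model would keep the same control flow.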

If this is right

The system supplies accurate and contextual responses to stakeholders ranging from tenants to building managers.
It supports multiple building system applications on data from more than 200 commercial buildings.
Programmatic reasoning via generated scripts replaces standard RAG for technical precision.
Natural language understanding is decoupled from domain analytics so that no single model needs to hold both.
Zero-shot operation works across varying LLM alignment characteristics without per-domain retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same routing-plus-script pattern could apply to other data-heavy domains that mix natural language with precise calculations, such as energy-grid operations or facility maintenance logs.
Real-time sensor feeds could be wired directly into the script execution step to support live queries rather than static datasets.
Error rates might drop further if the coding agents were allowed to chain multiple short scripts instead of one monolithic output per query.
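The chaining idea can be sketched as a runner that executes a list of small step functions over a shared context, so a failure is attributed to one short step instead of a monolithic script. This is an editorial illustration, not the paper's design; the step names, the temperature readings, and the 24 °C limit are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Step = Callable[[Context], Context]


def run_chain(steps: List[Tuple[str, Step]], context: Context) -> Context:
    """Run short scripts in order; each step reads the shared context and returns new keys."""
    for name, step in steps:
        try:
            context = {**context, **step(context)}
        except Exception as exc:
            # Name the failing step so only that short script needs to be regenerated.
            raise RuntimeError(f"step '{name}' failed: {exc!r}") from exc
    return context


# Hypothetical steps that a coding agent might emit as separate scripts.
def load_readings(ctx: Context) -> Context:
    return {"readings_c": [21.5, 22.0, 23.5, 26.0]}  # illustrative zone temperatures


def count_exceedances(ctx: Context) -> Context:
    return {"hot_count": sum(1 for t in ctx["readings_c"] if t > ctx["limit_c"])}


result = run_chain([("load", load_readings), ("count", count_exceedances)], {"limit_c": 24.0})
```

Here only the 26.0 °C reading exceeds the limit, so `result["hot_count"]` is 1; if `count_exceedances` raised, the error message would name the `count` step rather than the whole query.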

Load-bearing premise

Routing natural language queries to coding agents that emit executable Python scripts will yield reliable technical accuracy without fine-tuning or direct domain-knowledge embedding in the models.

What would settle it

A set of building queries where the generated Python scripts return demonstrably incorrect numerical results or fail to interpret user intent on the 200-building dataset.
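A minimal scoring harness for such a test set might look like the sketch below. It assumes each run records a query type, the number the generated script returned (or `None` if the script failed to execute), and a hand-computed reference value; the 1% relative tolerance and the sample records are hypothetical. Reporting accuracy per query type would also speak to the referee's request for per-application breakdowns.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

Run = Tuple[str, Optional[float], float]  # (query_type, predicted, expected)


def score_runs(runs: Iterable[Run], rel_tol: float = 0.01) -> Dict[str, float]:
    """Fraction of runs per query type whose output matches the reference within rel_tol."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for qtype, predicted, expected in runs:
        totals[qtype] += 1
        # A script that failed to execute (predicted is None) counts as incorrect.
        if predicted is not None and math.isclose(predicted, expected, rel_tol=rel_tol):
            correct[qtype] += 1
    return {qtype: correct[qtype] / totals[qtype] for qtype in totals}


runs = [
    ("energy", 40.9, 40.944),  # within 1%: correct
    ("energy", 55.0, 40.944),  # wrong number
    ("hvac", None, 3.0),       # execution failure
    ("hvac", 3.0, 3.0),        # exact match
]
accuracy = score_runs(runs)
```

Counting execution failures as wrong answers, rather than dropping them, keeps the reported rate honest about how often the code-generation step breaks.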

Original abstract

Large Language Models (LLMs) offer opportunities to enhance Human-Building Interaction (HBI) by enabling more direct interaction with complex building systems through intuitive interfaces. These systems are characterized by vast amounts of data across multiple formats, a lack of nonconfidential and generalizable information, and the need for domain expertise to interpret them. Applying LLMs to domain-specific tasks like HBI presents additional challenges. Limited training data makes traditional fine-tuning impractical. Meanwhile, the opacity of LLM training data requires careful integration of domain knowledge to ensure reliability. Additionally, different LLMs exhibit varying alignment characteristics, suggesting that achieving both natural interaction and technical accuracy requires a multi-agent approach. These challenges highlight the need for innovative approaches that adapt LLMs to specialized domains while maintaining accuracy and user engagement. In this paper, we develop a hierarchical multi-agent framework that uses semantic routing and programmatic reasoning to decouple natural language understanding from building analytics. Instead of standard retrieval-augmented generation (RAG), our system employs a "Doorman" mechanism for task decomposition and specialized coding agents that generate executable Python scripts for precise arithmetic. We validate this framework on a dataset from more than 200 commercial buildings. Results demonstrate its effectiveness in providing accurate and contextual responses for diverse users and stakeholders, from tenants to building managers, across various building system applications.
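To make the "precise arithmetic" point concrete, here is the kind of short, deterministic script a coding agent might emit for a question such as "What was this building's energy use intensity last year?". The function name, the electricity-only assumption, and the input values are hypothetical; 3.412 kBtu/kWh is the standard site-energy conversion factor for electricity. The point is that the summation and division run in Python rather than in the model's token stream.

```python
KWH_TO_KBTU = 3.412  # site-energy conversion factor for electricity


def energy_use_intensity(monthly_kwh: list, floor_area_ft2: float) -> float:
    """Site EUI in kBtu/ft²/yr from 12 monthly electricity readings (electricity-only building assumed)."""
    if len(monthly_kwh) != 12:
        raise ValueError("expected 12 monthly readings")
    if floor_area_ft2 <= 0:
        raise ValueError("floor area must be positive")
    annual_kbtu = sum(monthly_kwh) * KWH_TO_KBTU
    return annual_kbtu / floor_area_ft2


# Hypothetical inputs: 100,000 kWh every month in a 100,000 ft² building.
eui = energy_use_intensity([100_000.0] * 12, 100_000.0)
```

With these inputs the script computes 1,200,000 kWh × 3.412 / 100,000 ft², or about 40.94 kBtu/ft²/yr, and the explicit input checks give the agent a clear error to report when the data are incomplete.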

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper applies a Doorman-plus-coding-agents pattern to human-building interaction queries but supplies no quantitative results to back its accuracy claims.


The main point is that this work takes existing multi-agent LLM patterns and points them at building management queries by routing natural language to agents that emit Python scripts for the actual calculations. That separation is the central design move.

It does a clean job spelling out the practical constraints: limited non-confidential data, the need for exact arithmetic on energy or HVAC numbers, and why standard fine-tuning or plain RAG falls short. The Doorman for task decomposition plus specialized coding agents is a reasonable way to keep the language side flexible while pushing the technical work into executable code. Applying the idea to a dataset spanning more than 200 commercial buildings shows they are thinking about real deployment scale rather than toy examples.

The soft spot is the evaluation. The abstract states that the system provides accurate and contextual responses but gives no error rates, no baseline comparisons, no breakdown by query type, and no failure cases. Without those numbers it is impossible to judge whether the code-generation step actually delivers the claimed reliability or how often the scripts are wrong. The assumption that off-the-shelf models will produce correct building-specific Python without extra domain scaffolding or verification steps remains untested in the visible material.

This is the kind of applied paper that might interest facilities teams or researchers working on domain-specific LLM agents. A reader looking for a worked example of programmatic reasoning in a constrained setting could pull useful architecture details from it. It is not advancing core LLM methods, but the domain application is narrow enough that modest evidence would still be useful.

I would send it to peer review. The idea is coherent and the motivation is sound; the main request to authors would be for concrete metrics and ablation results so the effectiveness claim can be checked.

Referee Report

2 major / 2 minor

Summary. The paper proposes a hierarchical multi-agent LLM framework for Human-Building Interaction that uses a 'Doorman' semantic router for query decomposition and specialized coding agents that emit executable Python scripts for building analytics computations. It claims this zero-shot approach decouples natural language handling from technical accuracy, avoiding fine-tuning and RAG limitations, and validates the system on data from more than 200 commercial buildings to demonstrate accurate, contextual responses for users ranging from tenants to building managers across building system applications.

Significance. If the quantitative results hold, the separation of routing from programmatic reasoning offers a practical route to reliable domain-specific LLM use in data-scarce settings like building management, where direct fine-tuning is impractical. The design choice to generate executable code rather than rely on LLM arithmetic is a clear strength that could generalize to other technical domains.

major comments (2)

[Abstract and §4 (Results)] The effectiveness claim for >200 buildings is stated without any reported metrics (accuracy, error rates, success rates), baselines, statistical analysis, or exclusion criteria. This absence is load-bearing because the central contribution is the framework's reliability; without these numbers the claim cannot be evaluated.
[§3 (Methods)] The description of the coding agents and Python script execution does not specify how domain knowledge (e.g., building metadata schemas, sensor units, or safety constraints) is injected into the generated code or how runtime errors are handled and reported back to the user.
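One hypothetical way to make domain-knowledge injection explicit is a prompt builder that serializes the building's data schema, sensor units, and operating constraints ahead of the user's question. The format, column names, and constraint wording below are assumptions for illustration, not the authors' prompt.

```python
import json
from typing import Dict, List


def build_coding_prompt(question: str, schema: Dict[str, str], units: Dict[str, str],
                        constraints: List[str]) -> str:
    """Assemble a structured prompt for a coding agent (hypothetical format)."""
    lines = [
        "Write one self-contained Python script that answers the question.",
        "Assign the final answer to a variable named `result`.",
        "Data schema (column -> type):",
        json.dumps(schema, indent=2, sort_keys=True, ensure_ascii=False),
        "Units (column -> unit); convert explicitly, never assume:",
        json.dumps(units, indent=2, sort_keys=True, ensure_ascii=False),
        "Constraints:",
        *[f"- {c}" for c in constraints],
        f"Question: {question}",
    ]
    return "\n".join(lines)


prompt = build_coding_prompt(
    "What was the average zone temperature on floor 3 yesterday?",
    schema={"timestamp": "datetime", "floor": "int", "zone_temp": "float"},
    units={"zone_temp": "°C"},
    constraints=["Read-only: never write setpoints or send control commands."],
)
```

Keeping schema and units in machine-readable JSON, rather than free prose, makes it easier to audit exactly what context the agent saw when a generated script is wrong.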

minor comments (2)

[§3] Notation for the Doorman routing logic is introduced without a formal definition or pseudocode; a diagram or algorithm box would improve clarity.
[Abstract and §4] The abstract mentions 'various building system applications' but the results section does not enumerate which applications were tested or provide per-application breakdowns.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below and will revise the manuscript to provide the requested quantitative details and methodological clarifications.


Referee: [Abstract and §4 (Results)] The effectiveness claim for >200 buildings is stated without any reported metrics (accuracy, error rates, success rates), baselines, statistical analysis, or exclusion criteria. This absence is load-bearing because the central contribution is the framework's reliability; without these numbers the claim cannot be evaluated.

Authors: We agree that the absence of quantitative metrics limits evaluability of the reliability claims. The current manuscript states validation on data from more than 200 buildings but does not report accuracy, error rates, baselines, statistical analysis, or exclusion criteria. In the revised version we will expand §4 with these metrics, including success rates across user types and building systems, baseline comparisons, statistical tests, and explicit exclusion criteria. revision: yes
Referee: [§3 (Methods)] The description of the coding agents and Python script execution does not specify how domain knowledge (e.g., building metadata schemas, sensor units, or safety constraints) is injected into the generated code or how runtime errors are handled and reported back to the user.

Authors: We acknowledge the need for greater specificity. The revised §3 will explicitly describe how domain knowledge is injected via structured prompts containing building metadata schemas, standardized sensor units, and safety constraints. It will also detail the runtime error handling process, in which execution errors are captured, returned to the coding agents for correction through iterative prompting, and only then surfaced to the user with explanatory context. revision: yes
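The capture-and-retry loop described in this response could look roughly like the sketch below. The generator is a stand-in for an LLM coding agent (here a hypothetical `fake_agent` that fixes a typo once it sees the error), the three-attempt budget is an assumption, and the convention that a script stores its answer in `result` is illustrative. Plain `exec` is not a sandbox; a deployment would run generated code in an isolated process or container.

```python
import traceback
from typing import Callable, Optional

CodeGenerator = Callable[[str, Optional[str]], str]


def execute_with_repair(generate: CodeGenerator, question: str, max_attempts: int = 3) -> dict:
    """Run generated code; on failure, feed the traceback back to the generator and retry."""
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = generate(question, error)
        namespace: dict = {}
        try:
            # NOTE: exec offers no isolation; untrusted code belongs in a sandbox.
            exec(code, namespace)
            return {"ok": True, "result": namespace.get("result"), "attempts": attempt}
        except Exception:
            error = traceback.format_exc(limit=1)
    # Retries exhausted: surface the last error so the user gets an explanation.
    return {"ok": False, "error": error, "attempts": max_attempts}


def fake_agent(question: str, error: Optional[str]) -> str:
    """Hypothetical agent: the first draft has a NameError, the second draft fixes it."""
    if error is None:
        return "readings = [410, 520, 630]\nresult = sum(readings) / len(readingz)"
    return "readings = [410, 520, 630]\nresult = sum(readings) / len(readings)"


outcome = execute_with_repair(fake_agent, "Average CO2 (ppm) across the three sensors?")
```

In this run the first script fails with a `NameError`, the traceback goes back to the agent, and the second script returns 520.0 on attempt 2; only when every attempt fails does the user see the error itself.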

Circularity Check

0 steps flagged

No significant circularity


The paper presents an architectural description of a hierarchical multi-agent LLM framework (Doorman routing plus specialized coding agents emitting Python scripts) for human-building interaction, validated on a >200-building dataset. No equations, fitted parameters, self-citations, or derivation chains appear in the abstract or described content. The central claim of effectiveness is framed as an empirical outcome of the proposed system rather than a result reduced by construction to its own inputs or prior self-referential work. This is a standard non-circular framework proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no free parameters, axioms, or invented entities are described.

pith-pipeline@v0.9.1-grok · 5767 in / 947 out tokens · 31128 ms · 2026-06-27T10:27:47.554228+00:00 · methodology


Appendix A. Survey Instrument and Response Data

The complete survey instrument (including all questions and answer options) and the anonymized user response dataset are available at the following links: Survey instrume...

[1] [1]

It’s about time: A comparison of canadian and american time–activity patterns

LEECH, J. A., NELSON, W. C., BURNETT, R. T., AARON, S., and RAIZENNE, M. E., 2002. “It’s about time: A comparison of canadian and american time–activity patterns”.Journal of Exposure Science & Environmental Epidemiology,12(6), p. 427–432

2002

[2] [2]

A review of select human-building interfaces and their relationship to human behavior, energy use and occupant comfort

Day, J. K., McIlvennie, C., Brackley, C., Tarantini, M., Piselli, C., Hahn, J., O’Brien, W., Rajus, V . S., De Simone, M., Kjærgaard, M. B., et al., 2020. “A review of select human-building interfaces and their relationship to human behavior, energy use and occupant comfort”.Building and environment,178, p. 106920

2020

[3] [3]

S., Churchill, E

Alavi, H. S., Churchill, E. F., Wiberg, M., Lalanne, D., Dalsgaard, P., Fatah gen Schieck, A., and Rogers, Y ., 2019. Introduction to human-building interaction (hbi) interfac- ing hci with architecture and urban design

2019

[4] [4]

The field of human building interaction for convergent research and innovation for intelligent built environments

Becerik-Gerber, B., Lucas, G., Aryal, A., Awada, M., Berg´es, M., Billington, S., Boric-Lubecke, O., Ghahra- mani, A., Heydarian, A., H ¨oelscher, C., et al., 2022. “The field of human building interaction for convergent research and innovation for intelligent built environments”.Scien- tific Reports,12(1), p. 22092

2022

[5] [5]

I., 2022

Messner, J. I., 2022. The lifecycle of a building project. Accessed: 2024-09-02

2022

[6] [6]

Modeling and simulation of energy-related human-building interac- tion: A systematic review

Norouziasl, S., Jafari, A., and Zhu, Y ., 2021. “Modeling and simulation of energy-related human-building interac- tion: A systematic review”.Journal of Building Engineer- ing,44, p. 102928

2021

[7] [7]

Human-building interaction for indoor environmental con- trol: Evolution of technology and future prospects

Kim, H., Kang, H., Choi, H., Jung, D., and Hong, T., 2023. “Human-building interaction for indoor environmental con- trol: Evolution of technology and future prospects”.Au- tomation in Construction,152, p. 104938

2023

[8] [8]

Ten questions concerning human-building interaction research for improving the quality of life

Becerik-Gerber, B., Lucas, G., Aryal, A., Awada, M., Berg´es, M., Billington, S. L., Boric-Lubecke, O., Ghahra- mani, A., Heydarian, A., Jazizadeh, F., et al., 2022. “Ten questions concerning human-building interaction research for improving the quality of life”.Building and Environ- ment,226, p. 109681

2022

[9] [9]

Bosch building solutions - history of building automation.https: //www.boschbuildingsolutions

Bosch, 2023. Bosch building solutions - history of building automation.https: //www.boschbuildingsolutions. com/xc/en/news-and-stories/ history-of-building-automation/. Accessed: 2023-05-23

2023

[10] [10]

Nantum ai

Nantum AI, 2024. Nantum ai. Accessed: 2024-06-22

2024

[11] [11]

Design and applica- tions of an iot architecture for data-driven smart building operations and experimentation

Malkawi, A., Ervin, S., Han, X., Chen, E. X., Lim, S., Am- panavos, S., and Howard, P., 2023. “Design and applica- tions of an iot architecture for data-driven smart building operations and experimentation”.Energy and Buildings, 295, p. 113291

2023

[12] [12]

The foundation for a smarter home

Apple, 2024. The foundation for a smarter home. Accessed: 2024-09-02

2024

[13] [13]

Indoor envi- ronmental wellness index (iew-index): Towards intelligent building systems automation and optimization

Wang, Y ., Shen, G., and Mehmani, A., 2024. “Indoor envi- ronmental wellness index (iew-index): Towards intelligent building systems automation and optimization”.Building and Environment,247, p. 111039

2024

[14] [14]

Word2Vec

Church, K. W., 2017. “Word2Vec”.Natural Language En- gineering,23(1), Jan., pp. 155–162

2017

[15] [15]

Glove: Global Vectors for Word Representation

Pennington, J., Socher, R., and Manning, C., 2014. “Glove: Global Vectors for Word Representation”. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computa- tional Linguistics, pp. 1532–1543

2014

[16] Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K., 2019. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. arXiv preprint arXiv:1810.04805, May.

[17] Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., et al., 2023. “A survey of large language models”. arXiv preprint arXiv:2303.18223.

[18] Gong, L., Wang, S., Elhoushi, M., and Cheung, A., 2024. “Evaluation of LLMs on syntax-aware code fill-in-the-middle tasks”. In Proceedings of the 41st International Conference on Machine Learning, R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, and F. Berkenkamp, eds., Vol. 235 of Proceedings of Machine Learning Research, PMLR, pp. 15907–15928.

[20] Patil, S. G., Zhang, T., Wang, X., and Gonzalez, J. E., 2023. “Gorilla: Large language model connected with massive APIs”. arXiv preprint arXiv:2305.15334.

[21] Hu, Z., Iscen, A., Jain, A., Kipf, T., Yue, Y., Ross, D. A., Schmid, C., and Fathi, A., 2024. “SceneCraft: An LLM agent for synthesizing 3D scenes as Blender code”. In Proceedings of the 41st International Conference on Machine Learning, R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, and F. Berkenk...

[22] Zhang, W., Wan, C., Zhang, Y., Cheung, Y.-M., Tian, X., Shen, X., and Ye, J., 2024. “Interpreting and improving large language models in arithmetic calculation”. In Proceedings of the 41st International Conference on Machine Learning, R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, and F. Berkenkamp, eds., Vol. 235 of Procee...

[23] Shi, H., Xu, Z., Wang, H., Qin, W., Wang, W., Wang, Y., Wang, Z., Ebrahimi, S., and Wang, H., 2024. “Continual learning of large language models: A comprehensive survey”.

[24] Gururangan, S., Marasović, A., Swayamdipta, S., Lo, K., Beltagy, I., Downey, D., and Smith, N. A., 2020. “Don’t stop pretraining: Adapt language models to domains and tasks”.

[25] Mirzadeh, I., Alizadeh, K., Shahrokhi, H., Tuzel, O., Bengio, S., and Farajtabar, M., 2024. “GSM-Symbolic: Understanding the limitations of mathematical reasoning in large language models”.

[26] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., et al., 2020. “Retrieval-augmented generation for knowledge-intensive NLP tasks”. Advances in Neural Information Processing Systems, 33, pp. 9459–9474.

[27] Li, X., Wang, S., Zeng, S., Wu, Y., and Yang, Y., 2024. “A survey on LLM-based multi-agent systems: workflow, infrastructure, and challenges”. Vicinagearth, 1(1), p. 9.

[28] Zhao, B., Hajishirzi, H., and Cao, Q., 2024. “APT: Adaptive pruning and tuning pretrained language models for efficient training and inference”. In Proceedings of the 41st International Conference on Machine Learning, R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, and F. Berkenkamp, eds., Vol. 235 of Proceedings of Ma...

[29] Li, C., Liang, J., Zeng, A., Chen, X., Hausman, K., Sadigh, D., Levine, S., Fei-Fei, L., Xia, F., and Ichter, B., 2024. “Chain of code: Reasoning with a language model-augmented code emulator”. In Proceedings of the 41st International Conference on Machine Learning, R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, and F. Berkenkamp, eds., Vol. 235 of Proceedings of Machine Learning Research, PMLR, pp. 28259–28277.

[31] Du, Y., Li, S., Torralba, A., Tenenbaum, J. B., and Mordatch, I., 2024. “Improving factuality and reasoning in language models through multiagent debate”. In Proceedings of the 41st International Conference on Machine Learning, ICML’24, JMLR.org.

[32] Crispino, N., Montgomery, K., Zeng, F., Song, D., and Wang, C., 2024. “Agent instructs large language models to be general zero-shot reasoners”. In Proceedings of the 41st International Conference on Machine Learning, ICML’24, JMLR.org.

[33] Amer, F., Jung, Y., and Golparvar-Fard, M., 2021. “Transformer machine learning language model for auto-alignment of long-term and short-term plans in construction”. Automation in Construction, 132, p. 103929.

[34] Prieto, S. A., Mengiste, E. T., and García de Soto, B., 2023. “Investigating the use of ChatGPT for the scheduling of construction projects”. Buildings, 13(4), p. 857.

[35] Saka, A., Taiwo, R., Saka, N., Salami, B. A., Ajayi, S., Akande, K., and Kazemi, H., 2023. “GPT models in construction industry: Opportunities, limitations, and a use case validation”. Developments in the Built Environment, p. 100300.

[36] Uddin, S. J., Albert, A., Ovid, A., and Alsharef, A., 2023. “Leveraging ChatGPT to aid construction hazard recognition and support safety education and training”. Sustainability, 15(9), p. 7121.

[37] Du, C., Esser, S., Nousias, S., and Borrmann, A., 2024. “Text2BIM: Generating building models using a large language model-based multi-agent framework”. arXiv preprint arXiv:2408.08054.

[38] Zheng, Z., Chen, K.-Y., Cao, X.-Y., Lu, X.-Z., and Lin, J.-R., 2023. “LLM-FuncMapper: Function identification for interpreting complex clauses in building codes via LLM”. arXiv preprint arXiv:2308.08728.

[39] Chen, N., Lin, X., Jiang, H., and An, Y., 2024. “Automated building information modeling compliance check through a large language model combined with deep learning and ontology”. Buildings, 14(7), p. 1983.

[40] Zheng, J., and Fischer, M., 2023. “BIM-GPT: A prompt-based virtual assistant framework for BIM information retrieval”. arXiv preprint arXiv:2304.09333.

[41] Su, Y., Wan, C., Sethi, U., Lu, S., Musuvathi, M., and Nath, S., 2023. “HotGPT: How to make software documentation more useful with a large language model?”. In Proceedings of the 19th Workshop on Hot Topics in Operating Systems, pp. 87–93.

[42] Zhang, L., Chen, Z., and Ford, V., 2024. “Advancing building energy modeling with large language models: Exploration and case studies”. arXiv preprint arXiv:2402.09579.

[43] Jiang, G., Ma, Z., Zhang, L., and Chen, J., 2024. “EPlus-LLM: A large language model-based computing platform for automated building energy modeling”. Applied Energy, 367, p. 123431.

[44] Forth, K., and Borrmann, A., 2024. “Semantic enrichment for BIM-based building energy performance simulations using semantic textual similarity and fine-tuning multilingual LLM”. Journal of Building Engineering, 95, p. 110312.

[45] Fuchs, S., Witbrock, M., Dimyadi, J., and Amor, R., 2024. “Using large language models for the interpretation of building regulations”. arXiv preprint arXiv:2407.21060.

[46] Xiao, T., and Xu, P., 2024. “Exploring automated energy optimization with unstructured building data: A multi-agent based framework leveraging large language models”. Energy and Buildings, p. 114691.

[47] Yang, H., Siew, M., and Joe-Wong, C., 2024. “An LLM-based digital twin for optimizing human-in-the-loop systems”. arXiv preprint arXiv:2403.16809.

A Survey Instrument and Response Data

The complete survey instrument (including all questions and answer options) and the anonymized user response dataset are available at the following links: Survey instrume...