pith. machine review for the scientific record.

arxiv: 2605.01698 · v1 · submitted 2026-05-03 · 💻 cs.CL · cs.AI

Recognition: unknown

BIM Information Extraction Through LLM-based Adaptive Exploration

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:53 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords BIM · IFC · LLM · information extraction · adaptive exploration · question answering · benchmark

The pith

An LLM agent that iteratively executes code to explore BIM model structure at runtime extracts information more reliably than static query translation methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

BIM models organize building data in highly variable ways, so methods that assume a fixed structure often fail to retrieve the right details. The paper replaces that assumption with adaptive exploration: an LLM agent writes and runs code on the model itself, discovering the actual data layout during the query. On a new benchmark of 1027 tasks spanning 37 IFC models from 21 projects, the adaptive method beats static query generation in every tested configuration of model size and extra context. This points to handling data heterogeneity by changing the extraction paradigm rather than tuning the old static pipeline.

Core claim

Adaptive exploration lets an LLM-based agent iteratively generate and execute code against a BIM model to discover its runtime structure instead of relying on a pre-assumed data organization. When tested on ifc-bench v2, this approach yields significantly higher accuracy than static query generation across two LLM capability levels and four augmentation strategies.

What carries the argument

The adaptive exploration loop, in which an LLM agent dynamically writes and runs code to probe and extract from the BIM model without presupposing its schema.
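The loop can be sketched in a few lines. This is an illustrative reconstruction, not the authors' implementation: `propose_code` stands in for the LLM, a nested dict stands in for a parsed IFC model (a real agent would call a library such as IfcOpenShell), and the stopping rule is a stub.

```python
# Minimal sketch of an adaptive-exploration loop: the agent alternates
# between proposing a probe and executing it against the live model,
# feeding each observation back into the next step.

def propose_code(question, observations):
    """Stub for the LLM step: pick the next probe from what has been seen.
    A real agent would generate executable code here at each iteration."""
    if not observations:
        return "list(model.keys())"  # first probe: discover the top-level layout
    return "model['IfcWall']['W-01']['FireRating']"  # follow-up: extract the answer

def explore(question, model, max_steps=5):
    observations = []
    for _ in range(max_steps):  # bounded loop: runtime discovery, not a fixed query
        code = propose_code(question, observations)
        result = eval(code, {"model": model})  # execute the probe (a real system would sandbox this)
        observations.append((code, result))
        if not isinstance(result, list):  # stub stopping rule: a scalar is treated as the answer
            return result, observations
    return None, observations

# Toy "model" whose layout the agent does not know in advance.
model = {"IfcWall": {"W-01": {"FireRating": "REI90"}}}
answer, trace = explore("What is the fire rating of wall W-01?", model)
print(answer)  # REI90
```

The contrast with static query generation is the feedback edge: each execution result conditions the next generated probe, so the data layout never has to be assumed up front.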

If this is right

  • Adaptive exploration improves results independently of LLM size or added context strategies.
  • Further tuning of static query methods is unlikely to close the performance gap created by model heterogeneity.
  • BIM information extraction benefits more from runtime discovery than from stronger assumptions about data layout.
  • The accompanying ifc-bench v2 benchmark provides a reusable testbed for comparing extraction paradigms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same iterative code-execution pattern could be applied to other heterogeneous engineering data such as CAD assemblies or GIS city models.
  • Production deployments would need safeguards to guarantee that the agent's code executions remain bounded and safe.
  • If adaptive agents prove robust, they could reduce reliance on rigid IFC schema compliance for downstream analytics tools.
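On the safeguards point, a minimal budget guard might look like the following sketch. `BoundedExecutor` and its limits are hypothetical names, not from the paper; a production deployment would add sandboxing of the executed code on top of such resource caps.

```python
import time

class BudgetExceeded(Exception):
    pass

class BoundedExecutor:
    """Illustrative guard: caps both the number of code executions and the
    total wall-clock time an exploration agent may consume."""
    def __init__(self, max_steps=10, max_seconds=30.0):
        self.max_steps = max_steps
        self.max_seconds = max_seconds
        self.steps = 0
        self.start = time.monotonic()

    def run(self, probe, *args):
        self.steps += 1
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step budget of {self.max_steps} exhausted")
        if time.monotonic() - self.start > self.max_seconds:
            raise BudgetExceeded(f"time budget of {self.max_seconds}s exhausted")
        return probe(*args)  # a real deployment would also sandbox this call

executor = BoundedExecutor(max_steps=3)
for _ in range(3):
    executor.run(len, "IfcWall")  # three probes fit the budget
try:
    executor.run(len, "IfcSlab")  # the fourth exceeds it
except BudgetExceeded as e:
    print(e)  # step budget of 3 exhausted
```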

Load-bearing premise

The ifc-bench v2 benchmark captures enough real-world BIM variation and the agent's code execution step stays reliable and complete across models without introducing errors or truncated searches.

What would settle it

A head-to-head test on a fresh collection of BIM models drawn from projects outside the 37 used in ifc-bench v2 that shows no accuracy gain for adaptive exploration over static query generation would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.01698 by André Borrmann, Stavros Nousias, Stefan Fuchs, Suhyung Jang, Sylvain Hellin.

Figure 1: Conventional approaches translate a natural language query into a single structured query in languages such as SQL or SPARQL [8, 9]. These methods require converting BIM models into alternative representations that typically capture only a subset of the full IFC schema. The limitation is the fixed, design-time assumptions about data structure: these systems support only pre-defined query types with exact n…

Figure 1: Four approaches to BIM information extraction, distinguished by how users …

Figure 2: Adaptive exploration execution flow. The agent iteratively writes and executes …

Figure 3: Automated tool generation pipeline (optional augmentation). Path A (correct …

Figure 4: Representative 3D views of the 21 projects in the ifc-bench corpus, illustrating …
read the original abstract

BIM models provide structured representations of building geometry, semantics, and topology, yet extracting specific information from them remains remarkably difficult. Current approaches translate natural language into structured queries by assuming a fixed data organization (static approach), which BIM heterogeneity eventually invalidates. We address this with a new paradigm, adaptive exploration, where an LLM-based agent iteratively executes code to extract information from a BIM model, discovering its structure at runtime instead of assuming it. We evaluate this approach on ifc-bench v2, an open-source BIM question-answering benchmark introduced alongside this work, comprising 1,027 tasks across 37 IFC models from 21 projects. A factorial ablation across two LLM capability levels and four augmentation strategies shows that adaptive exploration significantly outperforms static query generation across all configurations, regardless of the augmentation strategy. These results indicate that BIM heterogeneity is best addressed at the paradigm level, not by further optimizing static approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes adaptive exploration, an LLM-agent paradigm that iteratively executes code to discover BIM model structures at runtime, as an alternative to static query generation that assumes fixed data organization. It introduces the open ifc-bench v2 benchmark with 1,027 tasks over 37 IFC models from 21 projects and reports a factorial ablation across two LLM capability levels and four augmentation strategies in which adaptive exploration significantly outperforms static baselines in all configurations.

Significance. If the results hold, the work offers a paradigm-level approach to BIM heterogeneity that could reduce reliance on brittle static assumptions. The open release of ifc-bench v2 is a concrete strength that enables reproducible follow-up work and direct comparisons.

major comments (2)
  1. [§4] §4 (Benchmark): ifc-bench v2 is introduced in the same manuscript with no external validation or third-party task curation. Because task selection and ground-truth construction may implicitly target properties whose IFC paths vary across projects, static baselines are placed at a structural disadvantage by design; this directly undermines the claim that uniform superiority demonstrates a paradigm-level advantage rather than a benchmark artifact.
  2. [§5.2] §5.2 (Ablation and Results): the factorial study reports consistent outperformance but provides no statistical significance tests, per-task error analysis, or breakdown by model heterogeneity level. Without these, it is impossible to determine whether the reported gains are robust or driven by a subset of tasks that favor runtime discovery.
minor comments (2)
  1. [Abstract] Abstract: the magnitude of improvement (e.g., absolute accuracy deltas) and the precise success metric are not stated, making it difficult to gauge practical impact.
  2. [§3] §3 (Method): the four augmentation strategies are referenced but not defined until later; an early summary table would improve readability.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and describe the revisions we will make to the manuscript.

read point-by-point responses
  1. Referee: [§4] §4 (Benchmark): ifc-bench v2 is introduced in the same manuscript with no external validation or third-party task curation. Because task selection and ground-truth construction may implicitly target properties whose IFC paths vary across projects, static baselines are placed at a structural disadvantage by design; this directly undermines the claim that uniform superiority demonstrates a paradigm-level advantage rather than a benchmark artifact.

    Authors: We acknowledge that ifc-bench v2 is newly introduced in this work. The benchmark comprises 1,027 tasks drawn from 37 IFC models across 21 distinct real-world projects, with tasks formulated as practical natural-language information extraction questions that professionals would pose. Ground-truth labels were obtained via manual verification on the actual model data rather than by presupposing particular IFC path structures. The consistent superiority of adaptive exploration across all four augmentation strategies and both LLM capability levels indicates that the gains arise from runtime structure discovery rather than benchmark design. To strengthen this claim, we will expand §4 with an explicit description of the task curation protocol, including how questions were generated to reflect domain use cases independent of data organization, and add qualitative examples demonstrating that tasks do not encode assumptions about specific IFC paths. The open release of the benchmark will also enable independent third-party validation. revision: partial

  2. Referee: [§5.2] §5.2 (Ablation and Results): the factorial study reports consistent outperformance but provides no statistical significance tests, per-task error analysis, or breakdown by model heterogeneity level. Without these, it is impossible to determine whether the reported gains are robust or driven by a subset of tasks that favor runtime discovery.

    Authors: We agree that the results section would be strengthened by additional quantitative and qualitative analyses. In the revised manuscript we will add statistical significance tests (paired Wilcoxon signed-rank tests with Bonferroni correction across the factorial conditions) together with 95% bootstrap confidence intervals on the performance deltas. We will also include a per-task error breakdown that classifies failures into categories such as discovery errors, code execution errors, and answer extraction errors, and we will stratify results by model heterogeneity proxies (number of unique IFC entity types and project scale). These additions will demonstrate that the observed advantages hold across heterogeneity levels and are not driven by a small subset of tasks. revision: yes
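The bootstrap confidence intervals promised in this response can be computed with a short stdlib-only routine. A minimal sketch, with synthetic placeholder deltas rather than the paper's numbers:

```python
import random

def paired_bootstrap_ci(deltas, n_boot=10000, alpha=0.05, seed=0):
    """Bootstrap CI on the mean per-configuration accuracy delta
    (adaptive minus static): resample the paired differences with
    replacement and take empirical quantiles of the resampled means."""
    rng = random.Random(seed)
    n = len(deltas)
    means = sorted(
        sum(rng.choice(deltas) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic deltas for the 8 factorial cells (2 LLM levels x 4 augmentations);
# the real values would come from the paper's ablation.
deltas = [0.12, 0.08, 0.15, 0.10, 0.09, 0.14, 0.11, 0.07]
lo, hi = paired_bootstrap_ci(deltas)
print(lo > 0)  # True: the interval excludes zero for these placeholder deltas
```

If the analogous interval on the real deltas excludes zero in every cell, the claimed robustness across configurations would be directly supported.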

Circularity Check

0 steps flagged

No circularity detected in empirical evaluation

full rationale

The paper presents an empirical factorial ablation comparing adaptive exploration against static query generation on the newly introduced ifc-bench v2 benchmark. No mathematical derivations, equations, or parameter-fitting steps are described that reduce by construction to their own inputs. The central claim rests on direct performance measurements across LLM levels and augmentation strategies rather than self-definitional claims, fitted predictions, or load-bearing self-citations. The benchmark introduction does not create circularity because both methods are evaluated on identical tasks with explicit ground-truth construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper is empirical and introduces a methodological paradigm plus benchmark. It rests on the capability of LLMs to generate executable code for IFC data access.

axioms (1)
  • domain assumption LLM agents can reliably generate and execute code to explore and query IFC BIM models without critical errors or safety issues across heterogeneous models.
    The adaptive approach depends on the LLM producing correct, runnable code at each iteration.

pith-pipeline@v0.9.0 · 5459 in / 1220 out tokens · 42321 ms · 2026-05-10T15:53:48.708806+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

55 extracted references · 41 canonical work pages · 5 internal anchors

  1. [1] A. Borrmann, M. König, C. Koch, J. Beetz (Eds.), Building Information Modeling: Technology Foundations and Industry Practice, Springer International Publishing, Cham, 2018. doi:10.1007/978-3-319-92862-3
  2. [2] X. Wang, BIM Handbook: A guide to Building Information Modeling for owners, managers, designers, engineers and contractors, Construction Economics and Building 12 (3) (2012) 101–102. doi:10.5130/AJCEB.v12i3.2749
  3. [3] Y. Wei, X. Li, F. Petzold, Text-to-structure interpretation of user requests in BIM interaction, Automation in Construction 174 (2025) 106119. doi:10.1016/j.autcon.2025.106119
  4. [4] K. Olofsson Hallén, M. Forsman, A. Eriksson, Interactions between Human, Technology and Organization in Building Information Modelling (BIM) – A scoping review of critical factors for the individual user, International Journal of Industrial Ergonomics 97 (2023) 103480. doi:10.1016/j.ergon.2023.103480
  5. [5] Y. Dong, Z. Zhan, Y. Hu, D. M. Doe, Z. Han, AI BIM coordinator for non-expert interaction in building design using LLM-driven multi-agent systems, Automation in Construction 180 (2025) 106563. doi:10.1016/j.autcon.2025.106563
  6. [6] A. Borrmann, J. Beetz, C. Koch, T. Liebich, S. Muhic, Industry Foundation Classes: A Standardized Data Model for the Vendor-Neutral Exchange of Digital Building Models, in: A. Borrmann, M. König, C. Koch, J. Beetz (Eds.), Building Information Modeling: Technology Foundations and Industry Practice, Springer International Publishing, Cham, 2018, pp. 81–1...
  7. [7] Handy Kosasih, BIM Quality Control: Common Challenges and Best Practices (May 2024)
  8. [8] D. Guo, E. Onstein, A. D. L. Rosa, An Approach of Automatic SPARQL Generation for BIM Data Extraction, Applied Sciences 10 (24) (2020). doi:10.3390/app10248794
  9. [9]
  10. [10] D. Liu, X. Zhou, Y. Li, An integrated method for BIM data retrieval using large language model, Architectural Science Review (Aug. 2025). doi:10.1080/00038628.2025.2538505
  11. [11] S. Hellin, S. Nousias, A. Borrmann, Natural Language Information Retrieval from BIM Models: An LLM-Based Agentic Workflow Approach, in: Proceedings of the 2025 European Conference on Computing in Construction, 2025. doi:10.35490/EC3.2025.265
  12. [12] L. Wang, C. Ma, X. Feng, Z. Zhang, H. Yang, J. Zhang, Z. Chen, J. Tang, X. Chen, Y. Lin, W. X. Zhao, Z. Wei, J. Wen, A survey on large language model based autonomous agents, Frontiers of Computer Science 18 (6) (2024) 186345. doi:10.1007/s11704-024-40231-1
  13. [13] Y. Zhu, T. Jin, Y. Pruksachatkun, A. K. Zhang, S. Liu, S. Cui, S. Kapoor, S. Longpre, K. Meng, R. Weiss, F. Barez, R. Gupta, J. Dhamala, J. Merizian, M. Giulianelli, H. Coppock, C. Ududec, A. Kellermann, J. S. Sekhon, J. Steinhardt, S. Schwettmann, A. Narayanan, M. Zaharia, I. Stoica, P. Liang, D. Kang, Establishing best practices in building rigorous a...
  14. [14] G. Austern, M. Schwarz, B. Sternfeld, Comparing different Building representations for readability by Large Language Models, in: CAAD Futures 2025 – Catalytic Interfaces, HKU Data Repository, 2025, pp. 437–452. doi:10.25442/HKU.29350238
  15. [15] S. Zhou, U. Alon, F. F. Xu, Z. Wang, Z. Jiang, G. Neubig, DocPrompting: Generating Code by Retrieving the Docs, in: International Conference on Learning Representations (ICLR), arXiv, 2022. doi:10.48550/ARXIV.2207.05987
  16. [16] Tianle Cai, Xuezhi Wang, Tengyu Ma, Xinyun Chen, Denny Zhou, Large Language Models as Tool Makers, in: The Twelfth International Conference on Learning Representations, 2024
  17. [17] S. Shin, R. R. A. Issa, BIMASR: Framework for Voice-Based BIM Information Retrieval, Journal of Construction Engineering and Management 147 (10) (2021) 04021124. doi:10.1061/(ASCE)CO.1943-7862.0002138
  18. [18] P. Pauwels, W. Terkaj, EXPRESS to OWL for construction industry: Towards a recommendable and usable ifcOWL ontology, Automation in Construction 63 (2016) 100–133. doi:10.1016/j.autcon.2015.12.003
  19. [19] J.-R. Lin, Z.-Z. Hu, J.-P. Zhang, F.-Q. Yu, A Natural-Language-Based Approach to Intelligent Data Retrieval and Representation for Cloud BIM, Computer-Aided Civil and Infrastructure Engineering 31 (1) (2016) 18–33. doi:10.1111/mice.12151
  20. [20] F. Elghaish, J. K. Chauhan, S. Matarneh, F. Pour Rahimian, M. R. Hosseini, Artificial intelligence-based voice assistant for BIM data management, Automation in Construction 140 (2022) 104320. doi:10.1016/j.autcon.2022.104320
  21. [21] J. Wang, X. Gao, X. Zhou, Q. Xie, Multi-scale Information Retrieval for BIM using Hierarchical Structure Modelling and Natural Language Processing, Journal of Information Technology in Construction 26 (2021) 409–426. doi:10.36680/j.itcon.2021.022
  22. [22] N. Wang, R. R. A. Issa, C. J. Anumba, NLP-Based Query-Answering System for Information Extraction from Building Information Models, Journal of Computing in Civil Engineering 36 (3) (2022) 04022004. doi:10.1061/(ASCE)CP.1943-5487.0001019
  23. [23] M. Yin, L. Tang, C. Webster, S. Xu, X. Li, H. Ying, An ontology-aided, natural language-based approach for multi-constraint BIM model querying, Journal of Building Engineering 76 (2023) 107066. doi:10.1016/j.jobe.2023.107066
  24. [24] M. Yin, L. Tang, C. Webster, J. Li, H. Li, Z. Wu, R. C. Cheng, Two-stage Text-to-BIMQL semantic parsing for building information model extraction using graph neural networks, Automation in Construction 152 (2023) 104902. doi:10.1016/j.autcon.2023.104902
  25. [25] P. Guo, H. Xue, J. Ma, J. C. P. Cheng, Advancing BIM information retrieval with an LLM-based query-domain-specific language and library code function alignment system, Automation in Construction 178 (2025) 106374. doi:10.1016/j.autcon.2025.106374
  26. [26] P. T. Koh, H. Xue, J. Ma, J. C. P. Cheng, Cost-effective and minimal-intervention BIM information retrieval via condensed multi-LLM agent code generation, Automation in Construction 181 (2026) 106585. doi:10.1016/j.autcon.2025.106585
  27. [27] H. Gao, T. Hartmann, B. Zhong, K. Lia, H. Luo, Domain-Specific Fine-Tuning and Prompt-Based Learning: A Comparative Study for developing Natural Language-Based BIM Information Retrieval Systems (2025). doi:10.48550/ARXIV.2508.05676
  28. [28] J. Zheng, M. Fischer, Dynamic prompt-based virtual assistant framework for BIM information search, Automation in Construction 155 (2023) 105067. doi:10.1016/j.autcon.2023.105067
  29. [29] M. Li, Z. Wang, BuildingGPT: Query building semantic data using large language models and vector-graph retrieval-augmented generation, Building and Environment 287 (2026) 113855. doi:10.1016/j.buildenv.2025.113855
  30. [30] M. Li, Z. Hu, P. Mohebi, S. Li, Z. Wang, Enhancing LLM-based building data query with chain-of-thought, retrieval-augmented generation, and fine-tuning, Automation in Construction 182 (2026) 106738. doi:10.1016/j.autcon.2025.106738
  31. [31] G. Lee, S. Jang, S. Hyun, A Generalized LLM-Augmented BIM Framework: Application to a Speech-to-BIM system, in: Proceedings of the 41st International Conference of CIB W78, 2024
  32. [32] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, Y. Cao, ReAct: Synergizing Reasoning and Acting in Language Models, in: The Eleventh International Conference on Learning Representations, 2022
  33. [33] Y. Gao, F. Hu, C. Chai, Y. Weng, H. Li, Multi-agent framework for schema-guided reasoning and tool-augmented interaction with IFC models, Automation in Construction 186 (2026) 106888. doi:10.1016/j.autcon.2026.106888
  34. [34] S. Hellin, S. Nousias, A. Borrmann, A Systematic Evaluation Framework for AI-Driven BIM Question Answering Systems
  35. [35] S. Hellin, S. Fuchs, S. Nousias, A. Borrmann, Enabling cross-study comparison: A framework for automated BIM-QA evaluation
  36. [36] X. Wang, Y. Chen, L. Yuan, Y. Zhang, Y. Li, H. Peng, H. Ji, Executable Code Actions Elicit Better LLM Agents, in: ICML'24: Proceedings of the 41st International Conference on Machine Learning, JMLR.org, 2024. doi:10.5555/3692070.3694124
  37. [37] J. Chen, S. Chen, J. Cao, J. Shen, S.-C. Cheung, When LLMs Meet API Documentation: Can Retrieval Augmentation Aid Code Generation Just as It Helps Developers? (2025). doi:10.48550/ARXIV.2503.15231
  38. [38] E. Stengel-Eskin, A. Prasad, M. Bansal, ReGAL: Refactoring programs to discover generalizable abstractions, in: Forty-First International Conference on Machine Learning, 2024
  39. [39] D. Huang, J. M. Zhang, M. Luck, Q. Bu, Y. Qing, H. Cui, AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation (May 2024). arXiv:2312.13010, doi:10.48550/arXiv.2312.13010
  40. [40] Y. Shen, K. Song, X. Tan, D. Li, W. Lu, Y. Zhuang, HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, in: Advances in Neural Information Processing Systems, Curran Associates, Inc., 2023
  41. [41] S. G. Patil, T. Zhang, X. Wang, J. E. Gonzalez, Gorilla: Large Language Model Connected with Massive APIs, in: Advances in Neural Information Processing Systems, Curran Associates, Inc., 2024, pp. 126544–126565. doi:10.52202/079017-4020
  42. [42] T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, L. Zettlemoyer, N. Cancedda, T. Scialom, Toolformer: Language Models Can Teach Themselves to Use Tools, in: Advances in Neural Information Processing Systems, Vol. 36, Curran Associates, Inc., 2023, pp. 68539–68551. arXiv:2302.04761, doi:10.48550/arXiv.2302.04761
  43. [43] Z. Wang, G. Neubig, D. Fried, TroVE: Inducing verifiable and efficient toolboxes for solving programmatic tasks, in: Forty-First International Conference on Machine Learning, 2024
  44. [44] T. Sesterhenn, I. Berlot-Attwell, J. Zenkner, C. Bartelt, A Compute-Matched Re-Evaluation of TroVE on MATH (2025). doi:10.48550/ARXIV.2507.22069
  45. [45] A. Madaan, N. Tandon, P. Gupta, S. Hallinan, L. Gao, S. Wiegreffe, U. Alon, N. Dziri, S. Prabhumoye, Y. Yang, S. Gupta, B. P. Majumder, K. Hermann, S. Welleck, A. Yazdanbakhsh, P. Clark, Self-Refine: Iterative refinement with self-feedback, in: Thirty-Seventh Conference on Neural Information Processing Systems, 2023
  46. [46] N. Shinn, F. Cassano, E. Berman, A. Gopinath, K. Narasimhan, S. Yao, Reflexion: Language Agents with Verbal Reinforcement Learning, in: Thirty-Seventh Conference on Neural Information Processing Systems, arXiv, 2023. doi:10.48550/ARXIV.2303.11366
  47. [47] R. Nogueira, W. Yang, J. Lin, K. Cho, Document Expansion by Query Prediction (2019). doi:10.48550/ARXIV.1904.08375
  48. [48] S. Robertson, H. Zaragoza, The Probabilistic Relevance Framework: BM25 and Beyond, Foundations and Trends® in Information Retrieval 3 (4) (2009) 333–389. doi:10.1561/1500000019
  49. [49] G. V. Cormack, C. L. A. Clarke, S. Buettcher, Reciprocal rank fusion outperforms Condorcet and individual rank learning methods, in: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, Boston MA USA, 2009, pp. 758–759. doi:10.1145/1571941.1572114
  50. [50] J. Gu, X. Jiang, Z. Shi, H. Tan, X. Zhai, C. Xu, W. Li, Y. Shen, S. Ma, H. Liu, Y. Wang, J. Guo, A survey on LLM-as-a-judge, CoRR abs/2411.15594 (2024)
  51. [51] J. Zhang, S. Hu, C. Lu, R. Lange, J. Clune, Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents (May 2025). arXiv:2505.22954, doi:10.48550/arXiv.2505.22954
  52. [52] A. Novikov, N. Vũ, M. Eisenberger, E. Dupont, P.-S. Huang, A. Z. Wagner, S. Shirobokov, B. Kozlovskii, F. J. R. Ruiz, A. Mehrabian, M. P. Kumar, A. See, S. Chaudhuri, G. Holland, A. Davies, S. Nowozin, P. Kohli, M. Balog, AlphaEvolve: A coding agent for scientific and algorithmic discovery (2025). doi:10.48550/ARXIV.2506.13131
  53. [53] Thomas Krijnen, IfcOpenShell (2025). URL: https://github.com/IfcOpenShell/IfcOpenShell
  54. [54] W. Solihin, C. Eastman, Classification of rules for automated BIM rule checking development, Automation in Construction 53 (2015) 69–82. doi:10.1016/j.autcon.2015.03.003
  55. [55] R. Sutton, The Bitter Lesson (Mar. 2019). URL: http://incompleteideas.net/IncIdeas/BitterLesson.html