Tokenizing Buildings: A Transformer for Layout Synthesis

Ardavan Bidgoli; Jinmo Rhee; Manuel Ladron de Guevara; Michael Bergin; Vaidas Razgaitis

arXiv: 2512.04832 · v2 · submitted 2025-12-04 · 💻 cs.CV · cs.GR · cs.LG


Pith reviewed 2026-05-17 01:48 UTC · model grok-4.3

classification: 💻 cs.CV · cs.GR · cs.LG

keywords: building layout synthesis · transformer architecture · BIM · room embeddings · autoregressive prediction · semantic retrieval · generative design · tokenization


The pith

A Transformer model called Small Building Model generates functional building layouts by tokenizing architectural elements into sequences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Small Building Model, a Transformer architecture for synthesizing layouts in Building Information Modeling scenes. It addresses how to convert mixed features of rooms and building elements into ordered sequences that keep their original structure. This tokenization feeds a unified embedding step and then a single Transformer backbone that can either produce room embeddings or predict new room entities step by step. The authors claim the resulting layouts are more usable than those produced by general-purpose language or vision models and by earlier domain-specific methods.

Core claim

Small Building Model unifies heterogeneous architectural features into a sparse attribute-feature matrix, learns joint representations through a unified embedding module, and trains a Transformer in encoder-only mode for high-fidelity room embeddings and in encoder-decoder mode for autoregressive prediction of residential room entities, producing layouts with fewer collisions and boundary violations and with better navigability.
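The abstract does not list the matrix's columns, so the sketch below only illustrates the idea under stated assumptions: each room becomes one row, each attribute one column, and attributes a room lacks stay empty (`None`), which is what makes the matrix sparse. The feature names in `FEATURES` and the ordering key (level, then y, then x) are hypothetical choices, not the paper's.

```python
from typing import Dict, List, Optional

# Hypothetical attribute vocabulary; the paper's actual columns are not given here.
FEATURES = ["room_type", "level", "x", "y", "width", "depth", "door_count", "window_count"]


def to_sparse_matrix(rooms: List[Dict[str, object]]) -> List[List[Optional[object]]]:
    """Build one row per room and one column per feature; missing attributes stay None."""
    # Assumed ordering: sort rooms by level, then y, then x, so every building
    # yields the same sequence regardless of how its rooms were exported.
    ordered = sorted(rooms, key=lambda r: (r.get("level", 0), r.get("y", 0.0), r.get("x", 0.0)))
    matrix = []
    for room in ordered:
        unknown = set(room) - set(FEATURES)
        if unknown:
            raise ValueError(f"unknown features: {sorted(unknown)}")
        matrix.append([room.get(name) for name in FEATURES])
    return matrix


def density(matrix: List[List[Optional[object]]]) -> float:
    """Fraction of filled cells; a low value means a sparse matrix."""
    cells = sum(len(row) for row in matrix)
    filled = sum(value is not None for row in matrix for value in row)
    return filled / cells if cells else 0.0


building = [
    {"room_type": "bath", "level": 0, "x": 5.0, "y": 0.0, "width": 2.0, "depth": 2.5},
    {"room_type": "living", "level": 0, "x": 0.0, "y": 0.0, "width": 5.0,
     "depth": 4.0, "window_count": 2},
]
matrix = to_sparse_matrix(building)
print(matrix[0][0], density(matrix))  # living 0.8125
```

A fixed ordering matters because an autoregressive decoder learns from sequence position; any deterministic key would do for the illustration.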

What carries the argument

The unified embedding module that learns joint representations of categorical and continuous feature groups from the sparse attribute-feature matrix, feeding a Transformer backbone for both embedding extraction and autoregressive entity prediction.
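One way to read "joint representations of categorical and possibly correlated continuous feature groups" is sketched below, under assumptions the paper does not confirm: each categorical feature gets its own lookup table, each continuous group is projected jointly by a single weight matrix (so correlated values such as width and depth share parameters), and a presence flag accompanies every continuous value so that a missing attribute is not confused with 0.0. The class name `UnifiedEmbedding`, the group names, and the random initialization are illustrative; in a real model these weights would be trained.

```python
import random
from typing import Dict, List


class UnifiedEmbedding:
    """Hypothetical sketch: sum of per-feature categorical lookups and one joint
    linear projection per continuous feature group."""

    def __init__(self, vocab: Dict[str, List[str]], groups: Dict[str, List[str]],
                 dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.dim = dim
        self.groups = groups
        # One table per categorical feature, one vector per category value.
        self.tables = {
            feat: {cat: [rng.gauss(0.0, 0.02) for _ in range(dim)] for cat in cats}
            for feat, cats in vocab.items()
        }
        # One (dim x 2k) matrix per group of k continuous features: each feature
        # contributes its value and a presence flag.
        self.weights = {
            name: [[rng.gauss(0.0, 0.02) for _ in range(2 * len(feats))] for _ in range(dim)]
            for name, feats in groups.items()
        }

    def __call__(self, room: Dict[str, object]) -> List[float]:
        out = [0.0] * self.dim
        for feat, table in self.tables.items():
            value = room.get(feat)
            if value is not None:  # absent categorical attributes add nothing
                out = [a + b for a, b in zip(out, table[value])]
        for name, feats in self.groups.items():
            x: List[float] = []
            for feat in feats:
                value = room.get(feat)
                x += [float(value), 1.0] if value is not None else [0.0, 0.0]
            projected = [sum(w * xi for w, xi in zip(row, x)) for row in self.weights[name]]
            out = [a + b for a, b in zip(out, projected)]
        return out


embed = UnifiedEmbedding(
    vocab={"room_type": ["living", "bath", "bed"]},
    groups={"geometry": ["width", "depth"]},  # hypothetical correlated group
    dim=8,
)
vector = embed({"room_type": "bath", "width": 2.0, "depth": 2.5})
```

Summing the parts keeps the output width fixed no matter which attributes a room has, which is one plausible way to make sparse rows comparable; concatenation or attention over feature groups would be equally reasonable alternatives.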

If this is right

The learned room embeddings support strong semantic retrieval because they cluster by room type and topology.
In prediction mode the model produces residential layouts that satisfy functional constraints better than general-purpose or prior domain-specific approaches.
A single architecture handles both retrieval and generative tasks without separate models for each.
The sequence representation allows the model to respect room relationships and boundaries during generation.
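The retrieval use in the first point can be illustrated with a nearest-neighbor search over room embeddings. This sketch assumes cosine similarity and a labeled corpus; the abstract does not say which distance or retrieval metric the paper uses, so `retrieve` and `precision_at_k` are hypothetical helpers. A high precision@k for room-type labels is one simple check that embeddings cluster by type.

```python
import math
from typing import List, Sequence, Tuple

Entry = Tuple[str, List[float]]  # (room-type label, embedding)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: Sequence[float], corpus: List[Entry], k: int = 3) -> List[Entry]:
    """Return the k corpus entries most similar to the query, best first."""
    return sorted(corpus, key=lambda entry: cosine(query, entry[1]), reverse=True)[:k]


def precision_at_k(label: str, query: Sequence[float], corpus: List[Entry], k: int = 3) -> float:
    """Share of the k nearest neighbors that carry the query's label."""
    hits = retrieve(query, corpus, k)
    if not hits:
        return 0.0
    return sum(hit_label == label for hit_label, _ in hits) / len(hits)


corpus = [("bath", [1.0, 0.0]), ("bath", [0.9, 0.1]), ("bed", [0.0, 1.0]), ("bed", [0.1, 0.9])]
print(precision_at_k("bath", [1.0, 0.05], corpus, k=2))  # 1.0
```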

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same tokenization strategy might transfer to other structured spatial domains such as furniture arrangement or urban block design.
Embedding the model inside existing BIM software could provide interactive layout suggestions during the design process.
Scaling the approach to larger commercial buildings would test whether the sequence length and feature unification remain effective.

Load-bearing premise

Unifying heterogeneous feature sets of architectural elements into sequences while preserving compositional structure enables reliable clustering and accurate autoregressive prediction of residential room entities.

What would settle it

Running Small Building Model and the compared baselines on a fresh collection of residential floor plans and measuring collision counts, boundary violations, and navigability scores on the generated layouts.
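The three quantities named above can be computed on axis-aligned room rectangles roughly as follows. The paper's exact definitions are not given in this summary, so these are assumptions: a collision is any pair of rooms with positive overlap area, a boundary violation is a room extending outside the building footprint, and navigability is approximated as the fraction of rooms reachable from an entrance room through shared walls (as if every shared wall had a door).

```python
from collections import deque
from itertools import combinations
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, depth), axis-aligned
EPS = 1e-9


def _overlaps(a: Rect, b: Rect) -> Tuple[float, float]:
    """Signed overlap lengths along x and y (negative means a gap)."""
    ox = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    oy = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return ox, oy


def collision_count(rooms: List[Rect]) -> int:
    count = 0
    for a, b in combinations(rooms, 2):
        ox, oy = _overlaps(a, b)
        if ox > EPS and oy > EPS:  # positive overlap area
            count += 1
    return count


def boundary_violations(rooms: List[Rect], footprint: Rect) -> int:
    fx, fy, fw, fd = footprint
    return sum(
        x < fx - EPS or y < fy - EPS or x + w > fx + fw + EPS or y + d > fy + fd + EPS
        for x, y, w, d in rooms
    )


def _shares_wall(a: Rect, b: Rect) -> bool:
    """Touching along a wall segment of positive length, without overlapping."""
    ox, oy = _overlaps(a, b)
    return (abs(ox) <= EPS and oy > EPS) or (abs(oy) <= EPS and ox > EPS)


def navigability(rooms: List[Rect], entrance: int = 0) -> float:
    """Fraction of rooms reachable from the entrance room by breadth-first search."""
    if not rooms:
        return 0.0
    seen = {entrance}
    queue = deque([entrance])
    while queue:
        i = queue.popleft()
        for j, other in enumerate(rooms):
            if j not in seen and _shares_wall(rooms[i], other):
                seen.add(j)
                queue.append(j)
    return len(seen) / len(rooms)


footprint = (0.0, 0.0, 10.0, 6.0)
layout = [(0.0, 0.0, 5.0, 4.0), (5.0, 0.0, 2.0, 2.5), (4.0, 3.0, 3.0, 3.0), (9.0, 5.0, 2.0, 2.0)]
print(collision_count(layout), boundary_violations(layout, footprint), navigability(layout))
# 1 1 0.5
```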

Figures

Figures reproduced from arXiv: 2512.04832 by Ardavan Bidgoli, Jinmo Rhee, Manuel Ladron de Guevara, Michael Bergin, Vaidas Razgaitis.

**Figure 1.** Small Building Model (SBM) is an encoder-decoder Transformer that generates functionally correct and semantically coherent …

**Figure 2.** Model overview. (a) BIM data extraction and assembly into a discrete set of token bundles. (b) SBM encoder stack processes the …

**Figure 3.** Qualitative comparison of generated layouts across five …

**Figure 4.** UMAP visualization of room embeddings colored by …

Original abstract

We introduce Small Building Model (SBM), a Transformer-based architecture for layout synthesis in Building Information Modeling (BIM) scenes. We address the question of how to tokenize buildings by unifying heterogeneous feature sets of architectural elements into sequences while preserving compositional structure. Such feature sets are represented as a sparse attribute-feature matrix that captures room properties. We then design a unified embedding module that learns joint representations of categorical and possibly correlated continuous feature groups. Lastly, we train a single Transformer backbone in two modes: an encoder-only pathway that yields high-fidelity room embeddings, and an encoder-decoder pipeline for autoregressive prediction of residential room entities, referred to as Data-Driven Entity Prediction (DDEP). Experiments across retrieval and generative layout synthesis show that SBM learns compact room embeddings that reliably cluster by type and topology, enabling strong semantic retrieval. In DDEP mode, SBM produces functionally sound layouts with fewer collisions and boundary violations, and improved navigability, outperforming general-purpose LLM/VLM baselines and recent domain-specific methods.
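Data-Driven Entity Prediction is described as autoregressive: each new room entity is predicted from the encoded context and the entities already emitted. The loop below sketches greedy decoding under that reading. The `model` callable, the `<eos>` stop token, the `max_rooms` cap, and the string-valued entities are hypothetical simplifications; the paper's entities bundle categorical and continuous attributes, and its decoding strategy (greedy, sampling, or beam search) is not stated here.

```python
from typing import Callable, Dict, List

EOS = "<eos>"  # hypothetical end-of-layout token
Model = Callable[[List[str], List[str]], Dict[str, float]]


def decode(model: Model, context: List[str], max_rooms: int = 12) -> List[str]:
    """Greedy autoregressive decoding: every step conditions on the encoded
    context and on all entities generated so far."""
    generated: List[str] = []
    for _ in range(max_rooms):
        scores = model(context, generated)
        best = max(scores, key=scores.get)  # greedy choice of the next entity
        if best == EOS:
            break
        generated.append(best)
    return generated


# Hypothetical stand-in for a trained decoder: it always prefers the next room
# in a fixed plan, then the stop token.
PLAN = ["living", "kitchen", "bed", "bath"]


def toy_model(context: List[str], generated: List[str]) -> Dict[str, float]:
    step = len(generated)
    target = PLAN[step] if step < len(PLAN) else EOS
    return {token: (1.0 if token == target else 0.0) for token in PLAN + [EOS]}


print(decode(toy_model, context=["footprint:10x6"]))  # ['living', 'kitchen', 'bed', 'bath']
```

The `max_rooms` cap guards against a decoder that never emits the stop token.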

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This applies a Transformer to BIM layouts via sparse feature matrix tokenization and dual encoder modes, but the outperformance claims lack the dataset and metric details needed to assess them.


The key point is that this paper takes a standard Transformer setup and adapts it to building layout synthesis. It converts heterogeneous BIM features into sequences through a sparse attribute-feature matrix, then runs the model either in encoder-only mode to obtain room embeddings or in encoder-decoder mode for autoregressive prediction (DDEP). The tokenization aims to keep compositional structure intact while handling both categorical and continuous attributes in a unified embedding layer.

Referee Report

2 major / 1 minor

Summary. The paper introduces the Small Building Model (SBM), a Transformer-based architecture for layout synthesis in Building Information Modeling (BIM). It addresses tokenizing buildings by unifying heterogeneous feature sets of architectural elements into sequences via a sparse attribute-feature matrix that captures room properties. A unified embedding module learns joint representations of categorical and continuous features. The model is trained in encoder-only mode for high-fidelity room embeddings and in encoder-decoder mode for autoregressive Data-Driven Entity Prediction (DDEP) of residential room entities. Experiments are reported to show reliable clustering by type and topology for semantic retrieval, and in DDEP mode, functionally sound layouts with fewer collisions, boundary violations, and improved navigability, outperforming general-purpose LLM/VLM baselines and recent domain-specific methods.

Significance. If the experimental claims hold under rigorous validation, the work could contribute to automated layout synthesis in architecture and BIM by demonstrating how Transformers can handle heterogeneous, compositional data for both retrieval and generative tasks. The dual-mode training (embeddings plus autoregressive prediction) and the attempt to preserve structure in tokenization are constructive ideas that extend sequence modeling techniques to a structured design domain. Credit is due for focusing on practical functional metrics like navigability and collision avoidance rather than purely visual quality.

major comments (2)

[Abstract] The central claim that SBM in DDEP mode produces layouts with fewer collisions and boundary violations and improved navigability, outperforming baselines, is load-bearing but unsupported by any quantitative metrics, dataset descriptions, baseline implementation details, or statistical significance tests. This absence prevents verification of the reported outperformance and leaves open the possibility that results depend on post-hoc choices or unstated evaluation protocols.
[Tokenization and embedding module, as described in the abstract and methods] The unified embedding of the sparse attribute-feature matrix is presented as sufficient to enable accurate autoregressive prediction, but the description does not specify inclusion of explicit inter-room adjacency, pairwise spatial relations, or global layout tokens. Without these, the decoder may generate locally plausible sequences whose assembled geometry violates physical constraints, directly risking the claimed reductions in collisions and boundary violations.
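For the significance testing the referee asks for, one standard option when SBM and a baseline are evaluated on the same prompts or footprints is a paired permutation (sign-flip) test on per-layout metric differences, such as collision counts. This is a generic sketch, not a procedure taken from the paper; `paired_permutation_pvalue`, its defaults, and the example counts are illustrative.

```python
import random
from typing import Sequence


def paired_permutation_pvalue(a: Sequence[float], b: Sequence[float],
                              n_iter: int = 10_000, seed: int = 0) -> float:
    """Two-sided sign-flip test of whether paired metrics a and b differ on average."""
    if len(a) != len(b):
        raise ValueError("a and b must be paired")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_iter):
        # Under the null hypothesis, each paired difference is equally likely to flip sign.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= observed - 1e-12:
            extreme += 1
    # Add-one smoothing keeps the Monte Carlo p-value strictly positive.
    return (extreme + 1) / (n_iter + 1)


# Made-up per-layout collision counts, for illustration only (not results from the paper).
sbm_collisions = [0, 1, 0, 0, 2, 0, 1, 0]
baseline_collisions = [3, 2, 4, 1, 5, 2, 3, 2]
print(paired_permutation_pvalue(sbm_collisions, baseline_collisions))
```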

minor comments (1)

[Abstract] While DDEP is expanded on first use, the abstract would be clearer if it briefly indicated the scale of the residential room entities or the nature of the retrieval task (e.g., nearest-neighbor by embedding distance).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review of our manuscript. We address each major comment point by point below, providing clarifications based on the content of the paper and indicating where we will make revisions to improve clarity and verifiability.


Referee: [Abstract] The central claim that SBM in DDEP mode produces layouts with fewer collisions and boundary violations and improved navigability, outperforming baselines, is load-bearing but unsupported by any quantitative metrics, dataset descriptions, baseline implementation details, or statistical significance tests. This absence prevents verification of the reported outperformance and leaves open the possibility that results depend on post-hoc choices or unstated evaluation protocols.

Authors: We agree that the abstract, as a high-level summary, would be strengthened by incorporating specific quantitative support for the performance claims. The full manuscript contains an Experiments section that describes the dataset of residential BIM layouts, details the baseline implementations (including prompting strategies for general-purpose LLMs/VLMs and configurations for domain-specific methods), reports quantitative metrics for collisions, boundary violations, and navigability, and includes comparative results. We will revise the abstract to reference these results more explicitly and include representative quantitative improvements drawn from the experiments. revision: yes
Referee: [Tokenization and embedding module, as described in the abstract and methods] The unified embedding of the sparse attribute-feature matrix is presented as sufficient to enable accurate autoregressive prediction, but the description does not specify inclusion of explicit inter-room adjacency, pairwise spatial relations, or global layout tokens. Without these, the decoder may generate locally plausible sequences whose assembled geometry violates physical constraints, directly risking the claimed reductions in collisions and boundary violations.

Authors: The referee correctly identifies that the tokenization centers on per-room attributes via the sparse matrix. However, because the autoregressive training uses complete layout sequences from real data, the decoder learns implicit inter-room adjacencies, pairwise relations, and global constraints through attention over the sequence. Post-generation assembly and evaluation explicitly quantify collisions and boundary violations, with results showing reductions relative to baselines. We will add a clarifying paragraph in the Methods section describing how relational structure emerges from the data-driven training and will consider an optional ablation with explicit adjacency tokens. revision: partial
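The optional ablation mentioned in the response could append explicit relation tokens to each layout sequence. A minimal sketch, assuming rooms are axis-aligned rectangles and that "adjacent" means sharing a wall segment of positive length; the `ADJ_i_j` token format and the helper names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, depth), axis-aligned


def shares_wall(a: Rect, b: Rect, eps: float = 1e-9) -> bool:
    """True when two rooms touch along a wall of positive length without overlapping."""
    ox = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    oy = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return (abs(ox) <= eps and oy > eps) or (abs(oy) <= eps and ox > eps)


def adjacency_tokens(rooms: List[Rect]) -> List[str]:
    """One hypothetical relation token per adjacent pair of room indices (i < j)."""
    return [
        f"ADJ_{i}_{j}"
        for i, j in combinations(range(len(rooms)), 2)
        if shares_wall(rooms[i], rooms[j])
    ]


rooms = [(0.0, 0.0, 5.0, 4.0), (5.0, 0.0, 2.0, 2.5), (0.0, 4.0, 5.0, 2.0)]
print(adjacency_tokens(rooms))  # ['ADJ_0_1', 'ADJ_0_2']
```

Rooms 1 and 2 meet only at a corner-free gap here, so no token links them; corner contact alone is deliberately not counted as adjacency.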

Circularity Check

0 steps flagged

No circularity: empirical training on external data yields independent performance claims


The paper presents a standard machine-learning pipeline: heterogeneous architectural features are tokenized into sequences via a sparse attribute-feature matrix, a unified embedding is learned, and a Transformer is trained in encoder-only and encoder-decoder (DDEP) modes. All reported outcomes—room embedding clusters, retrieval accuracy, and layout metrics such as collision count and navigability—are obtained by evaluating the trained model on held-out data against external baselines. No equations, fitted parameters, or self-citations are shown to reduce the central claims to their own inputs by construction. The derivation chain therefore remains self-contained and falsifiable outside the fitted values.
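Held-out evaluation is only independent if no building contributes rooms to both the training and test sets; otherwise near-duplicate rooms can leak across the split. The summary does not say how the paper split its data, so the following hypothetical building-level split only illustrates the check; `split_by_building` and the record format are assumptions.

```python
import random
from typing import Dict, List, Tuple

Record = Tuple[str, Dict[str, object]]  # (building_id, room attributes)


def split_by_building(records: List[Record], test_frac: float = 0.2,
                      seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Assign whole buildings, never individual rooms, to train or test."""
    building_ids = sorted({bid for bid, _ in records})
    random.Random(seed).shuffle(building_ids)
    n_test = max(1, round(len(building_ids) * test_frac))
    test_ids = set(building_ids[:n_test])
    train = [r for r in records if r[0] not in test_ids]
    test = [r for r in records if r[0] in test_ids]
    return train, test


records = [(f"b{i}", {"room_type": t}) for i in range(5) for t in ("living", "bath")]
train, test = split_by_building(records)
# No building id appears on both sides of the split.
assert not {bid for bid, _ in train} & {bid for bid, _ in test}
```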

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The approach rests on standard machine learning assumptions about sequence modeling and domain assumptions about building data structure; no invented physical entities.

free parameters (1)

embedding dimensions and Transformer hyperparameters
Typical learned or chosen parameters of the unified embedding module and backbone training; exact values are not stated in the abstract.

axioms (1)

Domain assumption: heterogeneous room features can be represented as a sparse attribute-feature matrix that preserves compositional structure when tokenized.
Invoked in the tokenization step to unify categorical and continuous features for the Transformer.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: unifying heterogeneous feature sets of architectural elements into sequences while preserving compositional structure... sparse attribute-feature matrix... unified embedding module... encoder-decoder pipeline for autoregressive prediction of residential room entities (DDEP)

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: DDEP produces functionally sound layouts with fewer collisions and boundary violations

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
