Architect-Ant: Editable Automatic Furnishing of Architectural Floor Plans
Pith reviewed 2026-06-27 12:55 UTC · model grok-4.3
The pith
A fine-tuned vision-language model with a coordinate DSL and procedural reasoning traces generates editable, constraint-respecting furniture layouts for architectural floor plans.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Architect-Ant is an editable automatic furnishing framework powered by a fine-tuned vision-language model. Furniture layouts are represented using a compact, coordinate-based domain-specific language that encodes object categories and placements relative to room geometry. Procedural reasoning traces that capture architectural constraints such as wall alignment, door and window clearance, circulation, fixture compatibility, and room-specific inventories are generated to supervise fine-tuning, after which preference optimization further refines placement quality. The DSL output can be rasterized into semantic masks to condition a Flux-based LoRA renderer while remaining directly editable.
What carries the argument
The compact coordinate-based domain-specific language (DSL) that encodes object categories and placements relative to room geometry, together with procedural reasoning traces that encode architectural constraints for model supervision.
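To make the editability argument concrete, here is a minimal sketch of how a coordinate-based layout DSL could be parsed and written back out. The paper's actual grammar is not reproduced on this page, so the line syntax `category(x0,y0,x1,y1)`, the normalization of coordinates to the room's bounding box in [0, 1], and the names `Placement`, `parse_layout`, and `serialize_layout` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """One furniture item; coordinates are assumed normalized to the room's bounding box."""
    category: str
    x0: float
    y0: float
    x1: float
    y1: float


def parse_layout(text: str) -> list[Placement]:
    """Parse lines of the hypothetical form 'category(x0,y0,x1,y1)'."""
    placements = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        name, _, rest = line.partition("(")
        if not name.strip() or not rest.endswith(")"):
            raise ValueError(f"malformed line: {raw!r}")
        coords = [float(v) for v in rest[:-1].split(",")]
        if len(coords) != 4:
            raise ValueError(f"expected 4 coordinates: {raw!r}")
        x0, y0, x1, y1 = coords
        # Reject degenerate boxes and boxes that leave the room.
        if not (0 <= x0 < x1 <= 1 and 0 <= y0 < y1 <= 1):
            raise ValueError(f"box outside room or degenerate: {raw!r}")
        placements.append(Placement(name.strip(), x0, y0, x1, y1))
    return placements


def serialize_layout(placements: list[Placement]) -> str:
    """Write placements back in the same hypothetical syntax."""
    return "\n".join(
        f"{p.category}({p.x0:g},{p.y0:g},{p.x1:g},{p.y1:g})" for p in placements
    )
```

Because each item is a single line of text, a designer or a downstream tool could move a piece of furniture by changing four numbers (for example with `dataclasses.replace`) and serializing again, without touching the model.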
If this is right
- The symbolic DSL layouts remain directly editable after generation, supporting iterative design workflows.
- Rasterization of the DSL into semantic masks allows conditioning of a separate image model to produce realistic blueprint-style furnished plans.
- The approach provides a scalable route to annotate and furnish large existing collections of structure-only floor plans.
- Preference optimization over candidate placements improves functional plausibility beyond the initial supervised outputs.
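The rasterization step in the second point can be sketched as follows. This is an illustration, not the paper's renderer interface: the tuple layout `(category, x0, y0, x1, y1)`, the normalized coordinates, the `class_ids` mapping, and the function name `rasterize` are assumptions, and the mask is a plain nested list rather than an image tensor.

```python
def rasterize(boxes, class_ids, width, height):
    """Paint axis-aligned boxes into a height x width grid of class ids (0 = empty).

    boxes: iterable of (category, x0, y0, x1, y1), coordinates assumed in [0, 1].
    class_ids: dict mapping category name to a positive integer label.
    """
    mask = [[0] * width for _ in range(height)]
    for category, x0, y0, x1, y1 in boxes:
        cid = class_ids[category]
        # Convert normalized coordinates to pixel indices, clamped to the grid.
        c0 = max(0, min(width, round(x0 * width)))
        c1 = max(0, min(width, round(x1 * width)))
        r0 = max(0, min(height, round(y0 * height)))
        r1 = max(0, min(height, round(y1 * height)))
        for r in range(r0, r1):
            for c in range(c0, c1):
                mask[r][c] = cid  # later boxes overwrite earlier ones
    return mask
```

Keeping the mask a pure function of the symbolic layout is what lets the layout stay the editable source of truth: after any edit, the mask can be regenerated and passed to the image model again.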
Where Pith is reading between the lines
- The same trace-generation and preference steps could be adapted to non-residential building types if new room-category inventories are supplied.
- Because the DSL is coordinate-based and relative to walls, it could serve as an intermediate representation for exporting layouts into CAD or 3D modeling tools.
- The separation between the symbolic layout stage and the image renderer stage allows independent improvement of either component without retraining the other.
Load-bearing premise
The procedurally generated reasoning traces accurately encode every relevant architectural constraint without introducing biases absent from real professional designs.
What would settle it
A side-by-side evaluation of Architect-Ant outputs against a held-out collection of human-designed professional floor plans, measuring the rate of violations in wall alignment, circulation clearance, and fixture compatibility.
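A comparison of this kind could be organized as a set of per-layout checks whose violation rates are computed separately for generated and human-designed collections. The sketch below is illustrative only: it assumes rectangular rooms normalized to the unit square, a made-up `WALL_HUGGING` category set, and a tolerance of 0.02, none of which come from the paper. Only a wall-alignment check is shown; circulation and fixture-compatibility checks would be supplied as further entries in `checks`.

```python
# Hypothetical set of categories expected to sit against a wall.
WALL_HUGGING = {"bed", "wardrobe", "sofa"}


def wall_alignment_violation(layout, tol=0.02):
    """True if any wall-hugging item is farther than tol from every wall.

    layout: list of (category, x0, y0, x1, y1) in a room assumed to be the unit square.
    """
    for category, x0, y0, x1, y1 in layout:
        distance_to_nearest_wall = min(x0, y0, 1.0 - x1, 1.0 - y1)
        if category in WALL_HUGGING and distance_to_nearest_wall > tol:
            return True
    return False


def violation_rates(layouts, checks):
    """Fraction of layouts flagged by each check; checks maps name -> predicate."""
    n = len(layouts)
    return {
        name: (sum(1 for layout in layouts if check(layout)) / n if n else 0.0)
        for name, check in checks.items()
    }
```

Running `violation_rates` once on model outputs and once on the held-out professional plans, with the same `checks`, would give the side-by-side numbers the test calls for.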
Original abstract
Furnished floor plans are fundamental to real estate visualization, interior design, and architectural workflows. However, progress in automatic furniture arrangement has been limited by the lack of real, professionally designed floor-plan datasets with object-level furniture annotations. To address this gap, we introduce AntPlan-270, a curated dataset of 270 architectural floor plans with per-room furniture bounding box annotations across ten residential room categories. Building on this dataset, we present Architect-Ant, an editable automatic furnishing framework powered by a fine-tuned vision-language model. Furniture layouts are represented using a compact, coordinate-based domain-specific language (DSL) that encodes object categories and placements relative to the room geometry. To improve spatial reasoning, we generate procedural reasoning traces that capture architectural constraints such as wall alignment, door and window clearance, circulation, fixture compatibility, and room-specific furniture inventories, and use them to supervise fine-tuning of the model. We then apply preference optimization over candidate object placements to further refine layout quality. The generated DSL can be rasterized into semantic masks and used to condition a Flux-based LoRA renderer, producing realistic blueprint-style furnished floor-plan images while preserving the editable symbolic layout. Experiments on layout furnishing show that Architect-Ant produces geometrically valid and functionally plausible layouts, and suggest a scalable path for furnishing larger structure-only floor-plan datasets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces AntPlan-270, a dataset of 270 annotated residential floor plans, and Architect-Ant, a framework that represents layouts via a coordinate-based DSL, generates procedural reasoning traces encoding constraints such as wall alignment and circulation to supervise VLM fine-tuning, applies preference optimization, and renders outputs with a Flux LoRA. It claims that the resulting layouts are geometrically valid and functionally plausible and that the approach scales to larger structure-only datasets.
Significance. If the central claims hold, the work supplies a missing annotated dataset and a practical pipeline for automatic furnishing that preserves editability through the DSL while producing renderable outputs. The procedural-trace supervision and preference-optimization steps are concrete technical contributions that could be adopted by others working on spatial reasoning for floor plans.
major comments (2)
- [Abstract / Experiments] Abstract and Experiments section: the headline claim that Architect-Ant 'produces geometrically valid and functionally plausible layouts' is unsupported by any quantitative metrics, baseline comparisons, or description of how validity/plausibility were measured or scored. Without these, it is impossible to determine whether the outputs improve on prior methods or merely satisfy the authors' own synthetic objective.
- [Method (procedural reasoning traces)] Method section on procedural reasoning traces: the central assumption that the generated traces faithfully encode all relevant architectural constraints (wall alignment, circulation minima, fixture compatibility, room-specific inventories) is not externally validated against real professional designs, code-compliance checkers, or held-out expert layouts. This leaves the functional-plausibility claim dependent on an untested mapping from synthetic supervision to real-world acceptability.
minor comments (2)
- [Dataset] The paper should clarify the exact size and split of AntPlan-270 (training/validation/test) and whether any overlap exists with the procedural-trace generation process.
- [DSL definition] Notation for the DSL should be formalized with a grammar or BNF in an appendix so that reproducibility of the coordinate-based representation is unambiguous.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on evaluation and validation. We address each major comment below and will revise the manuscript accordingly where feasible.
Referee: [Abstract / Experiments] Abstract and Experiments section: the headline claim that Architect-Ant 'produces geometrically valid and functionally plausible layouts' is unsupported by any quantitative metrics, baseline comparisons, or description of how validity/plausibility were measured or scored. Without these, it is impossible to determine whether the outputs improve on prior methods or merely satisfy the authors' own synthetic objective.
Authors: We agree the current manuscript does not report quantitative metrics, baseline comparisons, or explicit scoring procedures for geometric validity and functional plausibility. The experiments section focuses on qualitative demonstration of the DSL outputs and rendering pipeline. In revision we will add a dedicated evaluation subsection with concrete metrics (e.g., wall-alignment error, minimum circulation clearance, object-overlap ratios) computed on held-out AntPlan-270 rooms, plus comparison against a simple rule-based baseline. This will directly support or qualify the headline claim.
Revision: yes
Referee: [Method (procedural reasoning traces)] Method section on procedural reasoning traces: the central assumption that the generated traces faithfully encode all relevant architectural constraints (wall alignment, circulation minima, fixture compatibility, room-specific inventories) is not externally validated against real professional designs, code-compliance checkers, or held-out expert layouts. This leaves the functional-plausibility claim dependent on an untested mapping from synthetic supervision to real-world acceptability.
Authors: The traces are procedurally derived from the AntPlan-270 annotations and standard architectural heuristics (wall proximity, clearance rules, room-type inventories). We will expand the method section to state these sources explicitly and add a limitations paragraph acknowledging the absence of external validation against professional code checkers or expert layouts. Full external validation would require new data collection and is noted as future work rather than a claim of the present study.
Revision: partial
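Two of the metrics the authors propose in their first response, object-overlap ratio and minimum circulation clearance, can be sketched for axis-aligned bounding boxes. The definitions below are one plausible reading, not the authors' stated formulas: boxes are `(x0, y0, x1, y1)` tuples in metres (an assumption), overlap ratio is summed pairwise intersection area divided by total furniture area, and clearance is the smallest gap between any two boxes.

```python
from itertools import combinations
from math import hypot


def area(box):
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def intersection_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def overlap_ratio(boxes):
    """Summed pairwise intersection area divided by total furniture area."""
    total = sum(area(b) for b in boxes)
    if total == 0:
        return 0.0
    return sum(intersection_area(a, b) for a, b in combinations(boxes, 2)) / total


def gap(a, b):
    """Euclidean distance between two axis-aligned boxes (0 if they touch or overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)


def min_clearance(boxes):
    """Smallest gap between any pair of boxes; infinity if fewer than two boxes."""
    return min((gap(a, b) for a, b in combinations(boxes, 2)), default=float("inf"))
```

A fuller circulation metric would also account for walls and door swings; pairwise furniture gaps are only the simplest proxy.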
Circularity Check
No circularity; new dataset and experimental evaluation are self-contained.
The paper introduces AntPlan-270 as a new curated dataset and generates procedural reasoning traces to supervise VLM fine-tuning, followed by preference optimization and rendering. The central claim of geometrically valid and functionally plausible layouts rests on experiments performed on this newly introduced dataset rather than on any fitted parameter renamed as a prediction or on a self-citation chain. No load-bearing step reduces by construction to the authors' prior outputs or definitions; the derivation chain is therefore independent of the patterns that would trigger a positive circularity finding.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: procedural reasoning traces accurately encode architectural constraints such as wall alignment, door clearance, and room-specific furniture inventories.
invented entities (1)
- Coordinate-based domain-specific language (DSL) for furniture layouts: no independent evidence.