ShellMaker: Language-Guided Exterior Completion under Structural Constraints
Pith reviewed 2026-07-01 05:52 UTC · model grok-4.3
The pith
ShellMaker completes building exteriors from scaffolds and style prompts while keeping footprint, walls, and openings fixed.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Given a building scaffold and a text style prompt, ShellMaker generates a complete exterior mesh with PBR materials by combining parametric roof generation, LLM-based part-aware prompt refinement, joint wall-roof material retrieval, and geometry-aware assembly while maintaining structural consistency.
What carries the argument
The integration of parametric roof generation, LLM-based part-aware prompt refinement, joint wall-roof material retrieval, and geometry-aware assembly on a format-agnostic scaffold representation.
If this is right
- The framework generalizes across indoor generators, CityGML, and CAD inputs.
- Structural consistency is preserved: the footprint, wall geometry, and opening semantics of the scaffold stay fixed.
- Architectural coherence improves compared with retrieval-only and unconstrained generative baselines.
- The output includes PBR materials that align with the input style prompt.
Where Pith is reading between the lines
- The method could support automatic population of city-scale 3D models from partial data.
- It might pair directly with interior-only generators to produce complete buildings in one pipeline.
- Testing on highly irregular or historic building shapes would reveal how far the structural constraints hold.
- Real-time applications such as game level design could use the output meshes with little extra cleanup.
Load-bearing premise
The four components (parametric roof generation, LLM-based part-aware prompt refinement, joint wall-roof material retrieval, and geometry-aware assembly) can reliably produce coherent exteriors without violating the fixed footprint, wall geometry, and opening semantics of the input scaffold.
What would settle it
Test cases in which the output mesh extends walls or roofs beyond the scaffold footprint, or moves any opening, would show that the structural constraints are not maintained.
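One way to make this test concrete is to project every output vertex onto the ground plane and compare it with the scaffold footprint. The sketch below is a minimal illustration, not the paper's evaluation code: it assumes a simple (non-self-intersecting) footprint polygon given as (x, y) tuples, mesh vertices given as (x, y, z) with z pointing up, and a hypothetical tolerance `tol`; all function names are invented for this example.

```python
import math
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def point_in_polygon(p: Point2, polygon: Sequence[Point2]) -> bool:
    """Ray-casting test: True if p lies inside the polygon."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def dist_to_segment(p: Point2, a: Point2, b: Point2) -> float:
    """Euclidean distance from p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def footprint_violations(
    vertices: Sequence[Point3], footprint: Sequence[Point2], tol: float = 1e-3
) -> List[Point3]:
    """Vertices whose ground projection lies outside the footprint by more than tol."""
    n = len(footprint)
    bad = []
    for v in vertices:
        p = (v[0], v[1])
        if point_in_polygon(p, footprint):
            continue
        d = min(dist_to_segment(p, footprint[i], footprint[(i + 1) % n]) for i in range(n))
        if d > tol:
            bad.append(v)
    return bad
```

An empty result means no vertex leaves the footprint. Roof eaves that intentionally overhang the walls would need a larger tolerance or a separate rule, which this sketch does not attempt.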
Original abstract
Despite advances in indoor scene generation, synthesizing coherent building exteriors consistent with generated interiors remains largely unexplored. Existing methods can generate floor plans and wall layouts but typically stop at a structural shell, lacking stylistically consistent facades and roofs. Completing these exteriors is challenging because the footprint, wall geometry, and opening semantics must remain fixed, constraints that unconstrained generative models often violate. We introduce ShellMaker, a language-guided exterior completion framework that operates under these structural constraints. Given a building scaffold and a text style prompt, ShellMaker generates a complete exterior mesh with PBR materials by combining parametric roof generation, LLM-based part-aware prompt refinement, joint wall-roof material retrieval, and geometry-aware assembly. Operating on a format-agnostic scaffold representation, ShellMaker generalizes to indoor generators, CityGML, and CAD inputs, while maintaining structural consistency and improving architectural coherence over retrieval and unconstrained generative baselines. The project page is available at https://ruiqixu37.github.io/ShellMaker_web/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ShellMaker, a language-guided exterior completion framework for building scaffolds. Given a scaffold (format-agnostic, supporting indoor generators, CityGML, and CAD) and a text style prompt, it generates a complete exterior mesh with PBR materials via four components: parametric roof generation, LLM-based part-aware prompt refinement, joint wall-roof material retrieval, and geometry-aware assembly. The method claims to enforce structural constraints (fixed footprint, wall geometry, opening semantics) while improving architectural coherence over retrieval and unconstrained generative baselines.
Significance. If validated, the work would address a clear gap in 3D scene generation by enabling stylistically consistent exteriors that respect generated interiors. The format-agnostic scaffold handling and explicit structural constraints are practical strengths for downstream applications in architecture and simulation.
major comments (1)
- [Abstract] Abstract: the central claim of 'improving architectural coherence over retrieval and unconstrained generative baselines' is unsupported by any quantitative results, error analysis, baseline details, or validation data, which is load-bearing for assessing whether the four components reliably enforce constraints without violation.
Simulated Author's Rebuttal
We thank the referee for their review and the identification of this important point regarding the abstract. We address the comment below.
- Referee: [Abstract] Abstract: the central claim of 'improving architectural coherence over retrieval and unconstrained generative baselines' is unsupported by any quantitative results, error analysis, baseline details, or validation data, which is load-bearing for assessing whether the four components reliably enforce constraints without violation.
  Authors: We agree that the abstract's claim would be more robust with explicit quantitative support. The manuscript currently relies on qualitative visual comparisons (Section 5) to illustrate improvements in coherence and constraint adherence, but does not report numerical metrics, baseline implementation details, or error analysis for this specific claim. In the revised manuscript we will add a quantitative evaluation subsection that includes: (1) baseline descriptions with implementation details, (2) metrics such as constraint violation rate (e.g., footprint/wall deviation) and perceptual coherence scores from a user study, and (3) error analysis across the four components. The abstract will be updated to reflect the new results.
  revision: yes
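As an illustration of the kind of metric the rebuttal proposes, the following sketch computes a constraint violation rate from opening displacements. The data layout (openings keyed by an identifier, each with a centre position), the tolerance, and every name below are assumptions made for this example; the manuscript does not specify how the metric will be defined.

```python
import math
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Openings = Dict[str, Vec3]  # hypothetical layout: opening id -> centre position


def max_opening_shift(scaffold: Openings, output: Openings) -> float:
    """Largest displacement of any opening; inf if an opening was added or removed."""
    if scaffold.keys() != output.keys():
        return math.inf
    return max((math.dist(scaffold[k], output[k]) for k in scaffold), default=0.0)


def violation_rate(pairs: List[Tuple[Openings, Openings]], tol: float = 0.01) -> float:
    """Fraction of (scaffold, output) pairs whose openings moved by more than tol."""
    if not pairs:
        return 0.0
    violations = sum(1 for s, o in pairs if max_opening_shift(s, o) > tol)
    return violations / len(pairs)
```

A footprint or wall deviation could be plugged in the same way by replacing `max_opening_shift` with another per-building deviation function.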
Circularity Check
No significant circularity detected
The manuscript presents a descriptive system framework (parametric roof generation + LLM prompt refinement + material retrieval + geometry-aware assembly) operating on a fixed scaffold input. No equations, fitted parameters, uniqueness theorems, or derivation chains appear in the provided text or abstract. All components are presented as independent engineering choices whose outputs are evaluated against external baselines; none reduce to the inputs by construction or via self-citation. The work is therefore self-contained against external benchmarks with no circular steps.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Building scaffolds provide fixed footprint, wall geometry, and opening semantics that must be preserved during exterior generation.
Reference graph
Works this paper leans on
- [1] Bokhovkin, A., Meng, Q., Tulsiani, S., Dai, A.: Scenefactor: Factored latent 3d diffusion for controllable 3d scene generation (2024)
- [2] Chang, K.H., Cheng, C.Y., Luo, J., Murata, S., Nourbakhsh, M., Tsuji, Y.: Building-gan: Graph-conditioned architectural volumetric design generation. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 11956–11965 (2021)
- [3] Chen, X., Chu, F.J., Gleize, P., Liang, K.J., Sax, A., Tang, H., Wang, W., Guo, M., Hardin, T., Li, X., et al.: Sam 3d: 3dfy anything in images. arXiv preprint arXiv:2511.16624 (2025)
- [4] Dong, W., Yang, B., Ma, L., Liu, X., Cui, L., Bao, H., Ma, Y., Cui, Z.: Coin3d: Controllable and interactive 3d assets generation with proxy-guided conditioning. In: ACM SIGGRAPH 2024 Conference Papers. pp. 1–10 (2024)
- [5] Fedele, E., Engelmann, F., Huang, I., Litany, O., Pollefeys, M., Guibas, L.: Spacecontrol: Introducing test-time spatial control to 3d generative modeling. arXiv preprint arXiv:2512.05343 (2025)
- [6] Gu, Z., Cui, Y., Li, Z., Wei, F., Ge, Y., Gu, J., Liu, M.Y., Davis, A., Ding, Y.: Artiscene: Language-driven artistic 3d scene generation through image intermediary. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 2891–2901 (2025)
- [7] Gumin, M., Han, D.H., Yoo, S.J., Ganeshan, A., Jones, R.K., Fu, K., Aguina-Kang, R., Morris, S., Ritchie, D.: Procedural scene programs for open-universe scene generation: Llm-free error correction via program search. In: Proceedings of the SIGGRAPH Asia 2025 Conference Papers. pp. 1–11 (2025)
- [8] Huang, J., Wang, C., Li, L., Huang, C., Dai, Q., Xu, W.: Buildingblock: A hybrid approach for structured building generation. In: Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers. pp. 1–11 (2025)
- [9] Hunyuan3D, T., Yang, S., Yang, M., Feng, Y., Huang, X., Zhang, S., He, Z., Luo, D., Liu, H., Zhao, Y., et al.: Hunyuan3d 2.1: From images to high-fidelity 3d assets with production-ready pbr material. arXiv preprint arXiv:2506.15442 (2025)
- [10] Jin, Z., Feng, A.: Sat-skylines: 3d building generation from satellite imagery and coarse geometric priors. arXiv preprint arXiv:2508.18531 (2025)
- [11] Lin, J., Yang, X., Chen, M., Xu, Y., Yan, D., Wu, L., Xu, X., Xu, L., Zhang, S., Chen, Y.C.: Kiss3dgen: Repurposing image diffusion models for 3d asset generation. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 5870–5880 (2025)
- [12]
- [13] Müller, P., Wonka, P., Haegler, S., Ulmer, A., Van Gool, L.: Procedural modeling of buildings. In: ACM SIGGRAPH 2006 Papers, pp. 614–623 (2006)
- [14] Nishida, G., Bousseau, A., Aliaga, D.G.: Procedural modeling of a building from a single image. In: Computer Graphics Forum. vol. 37, pp. 415–429. Wiley Online Library (2018)
- [15] Qiu, L., Chen, G., Gu, X., Zuo, Q., Xu, M., Wu, Y., Yuan, W., Dong, Z., Bo, L., Han, X.: Richdreamer: A generalizable normal-depth diffusion model for detail richness in text-to-3d. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9914–9925 (2024)
- [16] Raistrick, A., Mei, L., Kayan, K., Yan, D., Zuo, Y., Han, B., Wen, H., Parakh, M., Alexandropoulos, S., Lipson, L., Ma, Z., Deng, J.: Infinigen indoors: Photorealistic indoor scenes using procedural generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 21783–21794 (June 2024)
- [17] Seed, B.: Seed3d 1.0: From images to high-fidelity simulation-ready 3d assets. arXiv (2025)
- [18] Tam, H.I.I., Pun, H.I.D., Wang, A.T., Chang, A.X., Savva, M.: SceneEval: Evaluating semantic coherence in text-conditioned 3D indoor scene synthesis (2025)
- [19] Team, G., Anil, R., Borgeaud, S., Alayrac, J.B., Yu, J., Soricut, R., Schalkwyk, J., Dai, A.M., Hauth, A., Millican, K., et al.: Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805 (2023)
- [20] Vanegas, C.A., Aliaga, D.G., Benes, B.: Building reconstruction using manhattan-world grammars. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. pp. 358–365. IEEE (2010)
- [21] Wang, X., Yeshwanth, C., Nießner, M.: Sceneformer: Indoor scene generation with transformers. arXiv preprint arXiv:2012.09793 (2020)
- [22] Wu, S., Lin, Y., Zhang, F., Zeng, Y., Yang, Y., Bao, Y., Qian, J., Zhu, S., Cao, X., Torr, P., et al.: Direct3d-s2: Gigascale 3d generation made easy with spatial sparse attention. arXiv preprint arXiv:2505.17412 (2025)
- [23] Xiang, J., Chen, X., Xu, S., Wang, R., Lv, Z., Deng, Y., Zhu, H., Dong, Y., Zhao, H., Yuan, N.J., et al.: Native and compact structured latents for 3d generation. arXiv preprint arXiv:2512.14692 (2025)
- [24] Xiang, J., Lv, Z., Xu, S., Deng, Y., Wang, R., Zhang, B., Chen, D., Tong, X., Yang, J.: Structured 3d latents for scalable and versatile 3d generation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 21469–21480 (2025)
- [25] Xie, Y., Xu, C., Rakotosaona, M.J., Rim, P., Tombari, F., Keutzer, K., Tomizuka, M., Zhan, W.: Sparsefusion: Fusing multi-modal sparse representations for multi-sensor 3d object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 17591–17602 (2023)
- [26] Yang, Y., Jia, B., Zhang, S., Huang, S.: Sceneweaver: All-in-one 3d scene synthesis with an extensible and self-reflective agent. In: Advances in Neural Information Processing Systems (NeurIPS) (2025)
- [27] Yang, Y., Sun, F.Y., Weihs, L., VanderBilt, E., Herrasti, A., Han, W., Wu, J., Haber, N., Krishna, R., Liu, L., et al.: Holodeck: Language guided generation of 3d embodied ai environments. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 16227–16237 (2024)
- [28] Yang, Y., Guo, Y.C., Huang, Y., Zou, Z.X., Yu, Z., Li, Y., Cao, Y.P., Liu, X.: Holopart: Generative 3d part amodal segmentation. arXiv preprint arXiv:2504.07943 (2025)
- [29] Ye, C., Wu, Y., Lu, Z., Chang, J., Guo, X., Zhou, J., Zhao, H., Han, X.: Hi3dgen: High-fidelity 3d geometry generation from images via normal bridging. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 25050–25061 (2025)
  R. Xu & D. Aliaga: ShellMaker Supplementary Material. Wall and Roof Textures. We curate a diverse data...
- [30] aged red brick wall
  and $|\Delta s| \in [0,1]$ is the absolute saturation difference, giving $C^{\text{color}}_{ij} \in [0,1]$.
  Frequency Compatibility. For each diffuse map we take a 2D FFT of its luminance image and radially average the power spectrum into a vector $P \in \mathbb{R}^{R}$. $C^{\text{freq}}$ is the cosine similarity of the wall and roof profiles:
  $$C^{\text{freq}}_{ij} = \frac{P_w \cdot P_r}{\lVert P_w \rVert \, \lVert P_r \rVert} \in [0,1], \tag{6}$$
  which lies in $[0,1]$ since the spectrum entrie...
- [33] Choose entrance details that are authentic to the architectural style described above. Do NOT default to generic porch/awning/handrail elements unless they suit the style.
  Window – System Message: You generate concise text prompts for 3D window models. Each prompt must describe a single isolated architectural element: the window only, with NO surrounding walls,...
- [34] Each prompt MUST describe an isolated architectural element with clean background and no surroundings
- [35] Do NOT include walls, buildings, ground, or any environment context
- [36] prompts” key whose value is an array of strings. Example: {
  Choose window details that are authentic to the architectural style described above. Do NOT default to generic muntins/shutters/sill unless they suit the style. [If material is glass: - Glass must be transparent and clear.]
  Roof Ornament – System Message: You generate concise text prompts for 3D roof ornament models. Allowed ornament categories: {cat_...
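The frequency-compatibility score quoted in entry [30] above (a 2D FFT of the luminance image, a radially averaged power spectrum, then a cosine similarity between the wall and roof profiles) can be sketched with NumPy as follows. This is a minimal illustration rather than the authors' implementation: the number of radial bins, the Rec. 709 luminance weights, and all function names are assumptions, since the excerpt is truncated before those details.

```python
import numpy as np


def luminance(rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) RGB image to luminance (Rec. 709 weights, an assumption)."""
    return rgb[..., :3] @ np.array([0.2126, 0.7152, 0.0722])


def radial_power_profile(lum: np.ndarray, num_bins: int = 32) -> np.ndarray:
    """Radially average the 2D power spectrum of a luminance image into num_bins values."""
    h, w = lum.shape
    # Power spectrum with the zero frequency shifted to index (h // 2, w // 2).
    power = np.abs(np.fft.fftshift(np.fft.fft2(lum))) ** 2
    yy, xx = np.indices((h, w))
    radius = np.hypot(yy - h // 2, xx - w // 2)
    # Assign every frequency sample to a radial bin and average the power per bin.
    edges = np.linspace(0.0, radius.max(), num_bins + 1)
    bin_idx = np.clip(np.digitize(radius.ravel(), edges) - 1, 0, num_bins - 1)
    sums = np.bincount(bin_idx, weights=power.ravel(), minlength=num_bins)
    counts = np.bincount(bin_idx, minlength=num_bins)
    return sums / np.maximum(counts, 1)


def freq_compatibility(wall_rgb: np.ndarray, roof_rgb: np.ndarray, num_bins: int = 32) -> float:
    """Cosine similarity of the wall and roof radial power profiles."""
    p_w = radial_power_profile(luminance(wall_rgb), num_bins)
    p_r = radial_power_profile(luminance(roof_rgb), num_bins)
    denom = np.linalg.norm(p_w) * np.linalg.norm(p_r)
    return float(p_w @ p_r / denom) if denom > 0 else 0.0
```

Because every profile entry is a non-negative average of squared magnitudes, the cosine similarity cannot be negative, which matches the stated range of [0, 1]. Binning by a fixed count also lets textures of different resolutions be compared, although that choice is a convenience of this sketch.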