pith. sign in

arxiv: 2606.31680 · v1 · pith:Y6NO2EWHnew · submitted 2026-06-30 · 💻 cs.CV

ShellMaker: Language-Guided Exterior Completion under Structural Constraints

Pith reviewed 2026-07-01 05:52 UTC · model grok-4.3

classification 💻 cs.CV
keywords building exterior completionlanguage-guided generationstructural constraintsparametric roof generationmaterial retrievalarchitectural coherence3D mesh generationPBR materials
0
0 comments X

The pith

ShellMaker completes building exteriors from scaffolds and style prompts while keeping footprint, walls, and openings fixed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ShellMaker as a framework that takes a partial building scaffold and a text style prompt to produce a full exterior mesh including PBR materials. It combines several steps to ensure the result respects the original structure instead of freely altering walls or openings as many other generators do. A sympathetic reader would care because indoor scene tools often leave buildings incomplete at the shell stage, and manual fixes are slow. The method works with different input formats such as indoor generators, CityGML, and CAD files. If the components work together, they deliver exteriors that stay consistent with the given constraints and look more architecturally coherent than simple retrieval or unconstrained generation.

Core claim

Given a building scaffold and a text style prompt, ShellMaker generates a complete exterior mesh with PBR materials by combining parametric roof generation, LLM-based part-aware prompt refinement, joint wall-roof material retrieval, and geometry-aware assembly while maintaining structural consistency.

What carries the argument

The integration of parametric roof generation, LLM-based part-aware prompt refinement, joint wall-roof material retrieval, and geometry-aware assembly on a format-agnostic scaffold representation.

If this is right

  • The framework generalizes across indoor generators, CityGML, and CAD inputs.
  • Structural consistency is preserved with the fixed footprint, wall geometry, and opening semantics.
  • Architectural coherence improves compared with retrieval-only and unconstrained generative baselines.
  • The output includes PBR materials that align with the input style prompt.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could support automatic population of city-scale 3D models from partial data.
  • It might pair directly with interior-only generators to produce complete buildings in one pipeline.
  • Testing on highly irregular or historic building shapes would reveal how far the structural constraints hold.
  • Real-time applications such as game level design could use the output meshes with little extra cleanup.

Load-bearing premise

The components of parametric roof generation, LLM-based prompt refinement, joint material retrieval, and geometry-aware assembly can reliably produce coherent exteriors without violating the fixed footprint, wall geometry, and opening semantics of the input scaffold.

What would settle it

A set of test cases where the output meshes show walls or roofs extending beyond the scaffold footprint or changing the positions of openings would show that structural constraints are not maintained.

Figures

Figures reproduced from arXiv: 2606.31680 by Daniel Aliaga, Ruiqi Xu.

Figure 1
Figure 1. Figure 1: ShellMaker. Given a building scaffold and a style text prompt, our system pro￾duces a complete, textured 3D exterior. The wall geometry and window/door opening placements are preserved exactly across all style variations. paired datasets linking structured building layouts to fully textured exterior meshes are not currently available. Consequently, a gap remains between per￾ceptually realistic generation a… view at source ↗
Figure 2
Figure 2. Figure 2: ShellMaker Pipeline. Given a structured building scaffold (e.g., indoor scene generators, CityGML, or CAD models) and a text style prompt, ShellMaker synthe￾sizes a complete stylized 3D exterior through three stages. The structural stage parses the scaffold and generates parametric roof geometry, while the style stage refines the prompt using an LLM, retrieves compatible wall–roof materials, and generates … view at source ↗
Figure 3
Figure 3. Figure 3: Parametric Roof Generation. Top: canonical roof types supported by Shell￾Maker (flat, hip, gable, pyramid, and half-hip) with example pitch angles θ. Bottom: two roof generation modes for complex footprints. original footprint and naturally supports conformal roof generation for non￾convex layouts. The second strategy enables a wider range of roof compositions by provid￾ing an alternative greedy rectangle … view at source ↗
Figure 4
Figure 4. Figure 4: Part-aware Prompt Refinement. A free-form user prompt is passed to an instruction-tuned LLM, which generates a structured, style-consistent text prompt for each architectural component. The system instruction enforces that each prompt describes a single isolated architectural element with no surrounding walls or environmental context, en￾suring that the generated meshes can be composited cleanly onto the s… view at source ↗
Figure 5
Figure 5. Figure 5: Two-stage part synthesis intermediates. LLM-refined part prompts, Nano Ba￾nana reference images, and reconstructed Trellis-2 meshes across three styles and two element types. Style-specific geometric and decorative cues introduced during prompt refinement carry through to the final textured parts. C mat assigns high scores to commonly co-occurring architectural pairings (e.g. brick walls with clay-tile roo… view at source ↗
Figure 6
Figure 6. Figure 6 [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Style gallery. (Top) Two input scaffolds each rendered under two style prompts, demonstrating preservation of wall geometry and opening placements across variants. (Bottom) Nine additional styles applied to various scaffolds, illustrating the breadth of architectural styles ShellMaker can automatically generate with appropriate roof forms, materials, and detailing [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Representative samples from the curated PBR texture library used by Shell￾Maker. Left column shows facade materials and right column shows roof materials. Each material includes diffuse, normal, and roughness maps [PITH_FULL_IMAGE:figures/full_fig_p018_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Illustrative Failure Cases of ShellMaker [PITH_FULL_IMAGE:figures/full_fig_p019_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: illustrates an example where a row of buildings generated by an indoor scene pipeline (Holodeck) is processed independently by ShellMaker to produce fully textured exterior meshes. Despite being generated per-building, the re￾sulting block exhibits consistent architectural styling and material coherence, demonstrating that ShellMaker can scale naturally to neighborhood- or block￾level scenes without modif… view at source ↗
read the original abstract

Despite advances in indoor scene generation, synthesizing coherent building exteriors consistent with generated interiors remains largely unexplored. Existing methods can generate floor plans and wall layouts but typically stop at a structural shell, lacking stylistically consistent facades and roofs. Completing these exteriors is challenging because the footprint, wall geometry, and opening semantics must remain fixed-constraints that unconstrained generative models often violate. We introduce ShellMaker, a language-guided exterior completion framework that operates under these structural constraints. Given a building scaffold and a text style prompt, ShellMaker generates a complete exterior mesh with PBR materials by combining parametric roof generation, LLM-based part-aware prompt refinement, joint wall-roof material retrieval, and geometry-aware assembly. Operating on a format agnostic scaffold representation, ShellMaker generalizes to indoor generators, CityGML, and CAD inputs, while maintaining structural consistency and improving architectural coherence over retrieval and unconstrained generative baselines. The project page is available at https://ruiqixu37.github.io/ShellMaker_web/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper introduces ShellMaker, a language-guided exterior completion framework for building scaffolds. Given a scaffold (format-agnostic, supporting indoor generators, CityGML, CAD) and text style prompt, it generates a complete exterior mesh with PBR materials via four components: parametric roof generation, LLM-based part-aware prompt refinement, joint wall-roof material retrieval, and geometry-aware assembly. The method claims to enforce structural constraints (fixed footprint, wall geometry, opening semantics) while improving architectural coherence over retrieval and unconstrained generative baselines.

Significance. If validated, the work would address a clear gap in 3D scene generation by enabling stylistically consistent exteriors that respect generated interiors. The format-agnostic scaffold handling and explicit structural constraints are practical strengths for downstream applications in architecture and simulation.

major comments (1)
  1. [Abstract] Abstract: the central claim of 'improving architectural coherence over retrieval and unconstrained generative baselines' is unsupported by any quantitative results, error analysis, baseline details, or validation data, which is load-bearing for assessing whether the four components reliably enforce constraints without violation.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their review and the identification of this important point regarding the abstract. We address the comment below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim of 'improving architectural coherence over retrieval and unconstrained generative baselines' is unsupported by any quantitative results, error analysis, baseline details, or validation data, which is load-bearing for assessing whether the four components reliably enforce constraints without violation.

    Authors: We agree that the abstract's claim would be more robust with explicit quantitative support. The manuscript currently relies on qualitative visual comparisons (Section 5) to illustrate improvements in coherence and constraint adherence, but does not report numerical metrics, baseline implementation details, or error analysis for this specific claim. In the revised manuscript we will add a quantitative evaluation subsection that includes: (1) baseline descriptions with implementation details, (2) metrics such as constraint violation rate (e.g., footprint/wall deviation) and perceptual coherence scores from a user study, and (3) error analysis across the four components. The abstract will be updated to reflect the new results. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The manuscript presents a descriptive system framework (parametric roof generation + LLM prompt refinement + material retrieval + geometry-aware assembly) operating on a fixed scaffold input. No equations, fitted parameters, uniqueness theorems, or derivation chains appear in the provided text or abstract. All components are presented as independent engineering choices whose outputs are evaluated against external baselines; none reduce to the inputs by construction or via self-citation. The work is therefore self-contained against external benchmarks with no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the claim rests on domain assumptions about scaffold constraints and the effectiveness of the listed components; no free parameters or invented entities are identifiable.

axioms (1)
  • domain assumption Building scaffolds provide fixed footprint, wall geometry, and opening semantics that must be preserved during exterior generation.
    The framework is explicitly designed to operate under these constraints as stated in the abstract.

pith-pipeline@v0.9.1-grok · 5696 in / 1332 out tokens · 47180 ms · 2026-07-01T05:52:35.700269+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

34 extracted references · 10 canonical work pages · 4 internal anchors

  1. [1]

    Bokhovkin, A., Meng, Q., Tulsiani, S., Dai, A.: Scenefactor: Factored latent 3d diffusion for controllable 3d scene generation (2024)

  2. [2]

    In: Proceedings of the IEEE/CVF international conference on computer vision

    Chang, K.H., Cheng, C.Y., Luo, J., Murata, S., Nourbakhsh, M., Tsuji, Y.: Building-gan: Graph-conditioned architectural volumetric design generation. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 11956–11965 (2021)

  3. [3]

    SAM 3D: 3Dfy Anything in Images

    Chen, X., Chu, F.J., Gleize, P., Liang, K.J., Sax, A., Tang, H., Wang, W., Guo, M., Hardin, T., Li, X., et al.: Sam 3d: 3dfy anything in images. arXiv preprint arXiv:2511.16624 (2025)

  4. [4]

    In: ACM SIGGRAPH 2024 Conference Papers

    Dong, W., Yang, B., Ma, L., Liu, X., Cui, L., Bao, H., Ma, Y., Cui, Z.: Coin3d: Controllable and interactive 3d assets generation with proxy-guided conditioning. In: ACM SIGGRAPH 2024 Conference Papers. pp. 1–10 (2024)

  5. [5]

    arXiv preprint arXiv:2512.05343 (2025)

    Fedele, E., Engelmann, F., Huang, I., Litany, O., Pollefeys, M., Guibas, L.: Space- control: Introducing test-time spatial control to 3d generative modeling. arXiv preprint arXiv:2512.05343 (2025)

  6. [6]

    In: Proceedings of the Computer Vision and Pattern Recognition Conference

    Gu, Z., Cui, Y., Li, Z., Wei, F., Ge, Y., Gu, J., Liu, M.Y., Davis, A., Ding, Y.: Ar- tiscene: Language-driven artistic 3d scene generation through image intermediary. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 2891–2901 (2025)

  7. [7]

    In: Proceedings of the SIGGRAPH Asia 2025 Conference Papers

    Gumin, M., Han, D.H., Yoo, S.J., Ganeshan, A., Jones, R.K., Fu, K., Aguina- Kang, R., Morris, S., Ritchie, D.: Procedural scene programs for open-universe scene generation: Llm-free error correction via program search. In: Proceedings of the SIGGRAPH Asia 2025 Conference Papers. pp. 1–11 (2025)

  8. [8]

    In: Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers

    Huang, J., Wang, C., Li, L., Huang, C., Dai, Q., Xu, W.: Buildingblock: A hybrid approach for structured building generation. In: Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers. pp. 1–11 (2025)

  9. [9]

    Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material

    Hunyuan3D, T., Yang, S., Yang, M., Feng, Y., Huang, X., Zhang, S., He, Z., Luo, D., Liu, H., Zhao, Y., et al.: Hunyuan3d 2.1: From images to high-fidelity 3d assets with production-ready pbr material. arXiv preprint arXiv:2506.15442 (2025)

  10. [10]

    arXiv preprint arXiv:2508.18531 (2025)

    Jin, Z., Feng, A.: Sat-skylines: 3d building generation from satellite imagery and coarse geometric priors. arXiv preprint arXiv:2508.18531 (2025)

  11. [11]

    In: Proceedings of the Computer Vision and Pattern Recognition Conference

    Lin, J., Yang, X., Chen, M., Xu, Y., Yan, D., Wu, L., Xu, X., Xu, L., Zhang, S., Chen, Y.C.: Kiss3dgen: Repurposing image diffusion models for 3d asset genera- tion. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 5870–5880 (2025)

  12. [12]

    Lin, Y., Lin, C., Pan, P., Yan, H., Feng, Y., Mu, Y., Fragkiadaki, K.: Partcrafter: Structured 3d mesh generation via compositional latent diffusion transformers (2025),https://arxiv.org/abs/2506.05573

  13. [13]

    In: ACM SIGGRAPH 2006 Papers, pp

    Müller, P., Wonka, P., Haegler, S., Ulmer, A., Van Gool, L.: Procedural modeling of buildings. In: ACM SIGGRAPH 2006 Papers, pp. 614–623 (2006)

  14. [14]

    In: Computer Graphics Forum

    Nishida, G., Bousseau, A., Aliaga, D.G.: Procedural modeling of a building from a single image. In: Computer Graphics Forum. vol. 37, pp. 415–429. Wiley Online Library (2018)

  15. [15]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Qiu, L., Chen, G., Gu, X., Zuo, Q., Xu, M., Wu, Y., Yuan, W., Dong, Z., Bo, L., Han, X.: Richdreamer: A generalizable normal-depth diffusion model for detail richness in text-to-3d. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9914–9925 (2024) Title Suppressed Due to Excessive Length 17

  16. [16]

    In: Proceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition (CVPR)

    Raistrick, A., Mei, L., Kayan, K., Yan, D., Zuo, Y., Han, B., Wen, H., Parakh, M., Alexandropoulos, S., Lipson, L., Ma, Z., Deng, J.: Infinigen indoors: Photorealistic indoor scenes using procedural generation. In: Proceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition (CVPR). pp. 21783–21794 (June 2024)

  17. [17]

    arXiv (2025)

    Seed, B.: Seed3d 1.0: From images to high-fidelity simulation-ready 3d assets. arXiv (2025)

  18. [18]

    Tam, H.I.I., Pun, H.I.D., Wang, A.T., Chang, A.X., Savva, M.: SceneEval: Evalu- ating semantic coherence in text-conditioned 3D indoor scene synthesis (2025)

  19. [19]

    Gemini: A Family of Highly Capable Multimodal Models

    Team, G., Anil, R., Borgeaud, S., Alayrac, J.B., Yu, J., Soricut, R., Schalkwyk, J., Dai, A.M., Hauth, A., Millican, K., et al.: Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805 (2023)

  20. [20]

    In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition

    Vanegas, C.A., Aliaga, D.G., Benes, B.: Building reconstruction using manhattan- world grammars. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. pp. 358–365. IEEE (2010)

  21. [21]

    arXiv preprint arXiv:2012.09793 (2020)

    Wang, X., Yeshwanth, C., Nießner, M.: Sceneformer: Indoor scene generation with transformers. arXiv preprint arXiv:2012.09793 (2020)

  22. [22]

    Direct3d-s2: Gigascale 3d generation made easy with spatial sparse attention.arXiv preprint arXiv:2505.17412, 2025

    Wu, S., Lin, Y., Zhang, F., Zeng, Y., Yang, Y., Bao, Y., Qian, J., Zhu, S., Cao, X., Torr, P., et al.: Direct3d-s2: Gigascale 3d generation made easy with spatial sparse attention. arXiv preprint arXiv:2505.17412 (2025)

  23. [23]

    Native and Compact Structured Latents for 3D Generation

    Xiang, J., Chen, X., Xu, S., Wang, R., Lv, Z., Deng, Y., Zhu, H., Dong, Y., Zhao, H., Yuan, N.J., et al.: Native and compact structured latents for 3d generation. arXiv preprint arXiv:2512.14692 (2025)

  24. [24]

    In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

    Xiang, J., Lv, Z., Xu, S., Deng, Y., Wang, R., Zhang, B., Chen, D., Tong, X., Yang, J.: Structured 3d latents for scalable and versatile 3d generation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 21469–21480 (2025)

  25. [25]

    In: Proceedings of the IEEE/CVF International Con- ference on Computer Vision

    Xie, Y., Xu, C., Rakotosaona, M.J., Rim, P., Tombari, F., Keutzer, K., Tomizuka, M., Zhan, W.: Sparsefusion: Fusing multi-modal sparse representations for multi- sensor 3d object detection. In: Proceedings of the IEEE/CVF International Con- ference on Computer Vision. pp. 17591–17602 (2023)

  26. [26]

    In: Advances in Neural Information Processing Systems (NeurIPS) (2025)

    Yang, Y., Jia, B., Zhang, S., Huang, S.: Sceneweaver: All-in-one 3d scene synthesis with an extensible and self-reflective agent. In: Advances in Neural Information Processing Systems (NeurIPS) (2025)

  27. [27]

    In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Yang, Y., Sun, F.Y., Weihs, L., VanderBilt, E., Herrasti, A., Han, W., Wu, J., Haber, N., Krishna, R., Liu, L., et al.: Holodeck: Language guided generation of 3d embodied ai environments. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 16227–16237 (2024)

  28. [28]

    arXiv preprint arXiv:2504.07943 (2025)

    Yang, Y., Guo, Y.C., Huang, Y., Zou, Z.X., Yu, Z., Li, Y., Cao, Y.P., Liu, X.: Holopart: Generative 3d part amodal segmentation. arXiv preprint arXiv:2504.07943 (2025)

  29. [29]

    In: Proceed- ings of the IEEE/CVF International Conference on Computer Vision

    Ye, C., Wu, Y., Lu, Z., Chang, J., Guo, X., Zhou, J., Zhao, H., Han, X.: Hi3dgen: High-fidelity 3d geometry generation from images via normal bridging. In: Proceed- ings of the IEEE/CVF International Conference on Computer Vision. pp. 25050– 25061 (2025) 18 R. Xu & D. Aliaga ShellMaker Supplementary Material Wall and Roof Textures We curate a diverse data...

  30. [30]

    aged red brick wall

    and|∆s| ∈[0,1]is the absolute saturation difference, givingC color ij ∈[0,1]. Frequency Compatibility.Foreachdiffusemapwetakea2DFFTofitsluminance image and radially average the power spectrum into a vectorP∈RR.C freq is the cosine similarity of the wall and roof profiles: Cfreq ij = Pw ·P r ∥Pw∥ ∥Pr∥ ∈[0,1], (6) which lies in[0,1]since the spectrum entrie...

  31. [33]

    Do NOT default to generic porch/awning/handrail ele- ments unless they suit the style

    Choose entrance details that are authentic to the architectural style de- scribed above. Do NOT default to generic porch/awning/handrail ele- ments unless they suit the style. Window – System Message: Yougenerateconcisetextpromptsfor3Dwindowmodels.Eachpromptmust describe a single isolated architectural element — the window only, with NO surrounding walls,...

  32. [34]

    EachpromptMUSTdescribeanisolatedarchitecturalelementwithclean background and no surroundings

  33. [35]

    Do NOT include walls, buildings, ground, or any environment context

  34. [36]

    prompts” key whose value is an array of strings. Example: {

    Choose window details that are authentic to the architectural style de- scribed above. Do NOT default to generic muntins/shutters/sill unless they suit the style. [If material is glass: - Glass must be transparent and clear.] Roof Ornament – System Message: You generate concise text prompts for 3D roof ornament models. Allowed or- nament categories: {cat_...