Home3D 1.0: A High-Fidelity Image-to-3D Asset Generation System for Interior Design
Pith reviewed 2026-06-30 09:47 UTC · model grok-4.3
The pith
Home3D turns one photo of furniture into a watertight 3D mesh with PBR materials that decomposes into editable parts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Given a photograph of a furniture or decor item, the system outputs a mesh with physically-based rendering (PBR) materials, and the mesh can be decomposed into material-specific components through four tightly coupled modules: Geometry reconstructs a watertight mesh through latent SDF modelling with a geometry VAE and a coarse-to-fine flow-matching DiT; Texture predicts multiview albedo observations, reprojects them onto the mesh, and completes unseen surface regions with a 3D texture field; Material uses MatWeaver to obtain component masks through video-based segmentation and UV-space voting, then retrieves and bakes PBR maps from a curated material library through hierarchical multi-modal
What carries the argument
Four tightly coupled modules (Geometry via latent SDF VAE and flow-matching DiT, Texture via multiview albedo reprojection plus 3D field completion, Material via MatWeaver segmentation and library matching, Parts via PartVAE and PartDiT) that together produce a decomposable PBR mesh from one image.
If this is right
- The output is a watertight mesh suitable for rendering and editing in standard 3D software.
- Materials are retrieved from a curated library and baked as PBR maps that support component-level changes.
- The mesh decomposes into material-specific and semantic part meshes generated in a single pass.
- Each of the four modules can be assessed independently using dedicated metrics.
- The system targets direct use in interior design and e-commerce workflows from ordinary photographs.
Where Pith is reading between the lines
- If the coupling works as described, the approach could reduce reliance on multi-view capture rigs for asset creation.
- The modular design suggests it might be possible to swap individual modules for domain-specific improvements, such as better handling of reflective surfaces.
- Automated part decomposition could support downstream tasks like physics simulation or modular furniture reconfiguration in virtual environments.
- Extending the input to include partial depth or lighting estimates might close some of the remaining gaps noted in the module evaluations.
Load-bearing premise
The four modules can be tightly coupled in practice to produce high-fidelity, usable assets without major quality gaps or manual intervention.
What would settle it
Testing the full pipeline on a held-out set of real furniture photographs and finding that the generated meshes contain non-watertight surfaces, mismatched PBR parameters, or part decompositions that do not align with the input image's visible materials.
Figures
read the original abstract
We present Home3D 1.0, a modular image-to-3D generation system that produces high-quality 3D assets from a single reference image, targeting interior design and e-commerce applications. Given a photograph of a furniture or decor item, the system outputs a mesh with physically-based rendering (PBR) materials, and the mesh can be decomposed into material-specific components. The pipeline is organized into four tightly coupled modules: Geometry reconstructs a watertight mesh through latent SDF modelling with a geometry VAE and a coarse-to-fine flow-matching DiT; Texture predicts multiview albedo observations, reprojects them onto the mesh, and completes unseen surface regions with a 3D texture field; Material uses MatWeaver to obtain component masks through video-based segmentation and UV-space voting, then retrieves and bakes PBR maps from a curated material library through hierarchical multi-modal matching; and Parts generates material-editable semantic part meshes with a PartVAE and PartDiT, decoding multi-head part-specific SDF fields in one pass. Each module is evaluated independently with dedicated metrics, highlighting both the current system capability and the remaining gaps toward broader deployment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents Home3D 1.0, a modular image-to-3D generation system for producing high-quality 3D assets with PBR materials and decomposable parts from a single reference image. The pipeline comprises four modules: Geometry (latent SDF modelling with VAE and coarse-to-fine flow-matching DiT for watertight meshes), Texture (multiview albedo prediction, reprojection, and 3D texture field completion), Material (MatWeaver for component masks via video segmentation and UV voting, followed by PBR map retrieval and baking), and Parts (PartVAE and PartDiT for generating material-editable semantic part meshes via multi-head SDF fields). Each module is evaluated independently using dedicated metrics, with the goal of enabling applications in interior design and e-commerce.
Significance. If the integration of the modules succeeds as claimed, the system could offer a substantial advance in automated creation of editable, high-fidelity 3D assets suitable for professional use, addressing a key need in e-commerce and design workflows where manual modeling is costly. The modular structure facilitates independent development and evaluation, which is a positive design choice. However, the absence of system-level validation in the provided description makes it difficult to gauge the practical impact.
major comments (1)
- [Abstract] The central claim that the modules 'can be tightly coupled in practice to produce high-fidelity, usable assets without major quality gaps or manual intervention' is not supported by any described end-to-end evaluation. The abstract explicitly notes that 'each module is evaluated independently with dedicated metrics,' but provides no pipeline-level metrics, error-propagation studies, or checks on interface consistency (e.g., how Geometry outputs feed into Texture reprojection or how Material masks condition Parts generation). This leaves the 'tightly coupled' aspect unverified.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential of the modular design. We address the single major comment below.
read point-by-point responses
-
Referee: [Abstract] The central claim that the modules 'can be tightly coupled in practice to produce high-fidelity, usable assets without major quality gaps or manual intervention' is not supported by any described end-to-end evaluation. The abstract explicitly notes that 'each module is evaluated independently with dedicated metrics,' but provides no pipeline-level metrics, error-propagation studies, or checks on interface consistency (e.g., how Geometry outputs feed into Texture reprojection or how Material masks condition Parts generation). This leaves the 'tightly coupled' aspect unverified.
Authors: We agree with the observation. The current manuscript evaluates each module independently to isolate technical contributions and remaining gaps, as stated in the abstract. The 'tightly coupled' phrasing in the abstract and introduction refers to the explicit data interfaces defined in the methods (Geometry mesh as input to Texture reprojection, Material component masks conditioning Parts generation, etc.), but no quantitative end-to-end metrics, error-propagation analysis, or interface-consistency experiments are reported. We will revise the abstract to remove or qualify the claim of 'without major quality gaps or manual intervention,' add a dedicated subsection presenting qualitative full-pipeline results on held-out images, and discuss observed interface behavior. These changes will be incorporated in the next version. revision: yes
Circularity Check
No derivation chain or equations present; modular system description with independent module evaluations.
full rationale
The paper describes a four-module pipeline (Geometry, Texture, Material, Parts) for image-to-3D asset generation and states that each module is evaluated independently with dedicated metrics. No equations, latent variable derivations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or described content. The central claims concern architectural organization and per-module performance rather than any load-bearing mathematical step that reduces to its own inputs by construction. The integration assumption noted in the skeptic analysis is an empirical claim about system behavior, not a circular derivation.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Dora: Sampling and benchmarking for 3d shape variational auto-encoders
Rui Chen, Jianfeng Zhang, Yixun Liang, Guan Luo, Weiyu Li, Jiarui Liu, Xiu Li, Xiaoxiao Long, Jiashi Feng, and Ping Tan. Dora: Sampling and benchmarking for 3d shape variational auto-encoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16251–16261, 2025
2025
-
[2]
Objaverse-XL: A Universe of 10M+ 3D Objects
Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, Eli VanderBilt, Aniruddha Kembhavi, Carl Vondrick, Georgia Gkioxari, Kiana Ehsani, Ludwig Schmidt, and Ali Farhadi. Objaverse-xl: A universe of 10m+ 3d objects.arXiv preprint arXiv:2307.05663, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[3]
Jiashi Feng, Xiu Li, Jing Lin, Jiahang Liu, Gaohong Liu, Weiqiang Lou, Su Ma, et al. Seed3d 1.0: From images to high-fidelity simulation-ready 3d assets.arXiv preprint arXiv:2510.19944, 2025
-
[4]
3D-FRONT: 3d furnished rooms with layouts and semantics
Huan Fu, Bowen Cai, Lin Gao, Ling-Xiao Zhang, Jiaming Wang, Cao Li, Qixun Zeng, Chengyue Sun, Rongfei Jia, Binqiang Zhao, and Hao Zhang. 3D-FRONT: 3d furnished rooms with layouts and semantics. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10933–10942, October 2021
2021
-
[5]
Seed3D 2.0: Advancing High-Fidelity Simulation-Ready 3D Content Generation
Diandian Gu, Jing Lin, Gaohong Liu, Jiahang Liu, Su Ma, Guang Shi, Jun Wang, Qinlong Wang, Qianyi Wu, Zhongcong Xu, Xuanyu Yi, Zihao Yu, Jianfeng Zhang, Zhuolin Zheng, Yifan Zhu, Rui Chen, Hengkai Guo, Xiaoyang Guo, Mingcong Han, Xu Han, Xiu Li, Yixun Liang, Weiqiang Lou, Junzhe Lu, Guan Luo, Minghan Qin, Shuguang Wang, and Yuang Wang. Seed3d 2.0: Advanci...
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[6]
Lattice: Democratize high-fidelity 3d generation at scale, 2025
Zeqiang Lai, Yunfei Zhao, Zibo Zhao, Haolin Liu, Qingxiang Lin, Jingwei Huang, Chunchao Guo, and Xiangyu Yue. Lattice: Democratize high-fidelity 3d generation at scale, 2025
2025
-
[7]
Hunyuan3d studio: End-to-end ai pipeline for game-ready 3d asset generation
Biwen Lei, Yang Li, Xinhai Liu, Shuhui Yang, Lixin Xu, Jingwei Huang, Ruining Tang, Haohan Weng, Jian Liu, Jing Xu, et al. Hunyuan3d studio: End-to-end ai pipeline for game-ready 3d asset generation. arXiv preprint arXiv:2509.12815, 2025
-
[8]
Yuchen Lin, Chenguo Lin, Panwang Pan, Honglei Yan, Yiqiang Feng, Yadong Mu, and Katerina Fragki- adaki. Partcrafter: Structured 3d mesh generation via compositional latent diffusion transformers.arXiv preprint arXiv:2506.05573, 2025
-
[9]
Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. InInternational Conference on Learning Representations (ICLR), 2023
2023
-
[10]
Lorensen and Harvey E
William E. Lorensen and Harvey E. Cline. Marching cubes: A high resolution 3d surface construction algorithm. InProceedings of SIGGRAPH, volume 21, pages 163–169, 1987. 16 Alibaba Group/Taobao Home3D Technical Report
1987
-
[11]
Dinov2: Learning robust visual features without supervision.Transactions on Machine Learning Research, 2024
Maxime Oquab, Timoth´ ee Darcet, Th´ eo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision.Transactions on Machine Learning Research, 2024
2024
-
[12]
Efficient part-level 3d object generation via dual volume packing
Jiaxiang Tang, Ruijie Lu, Zhaoshuo Li, Zekun Hao, Xuan Li, Fangyin Wei, Shuran Song, Gang Zeng, Ming-Yu Liu, and Tsung-Yi Lin. Efficient part-level 3d object generation via dual volume packing. arXiv preprint arXiv:2506.09980, 2025
-
[13]
Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material
Team Hunyuan3D, Shuhui Yang, Mingxin Yang, Yifei Feng, Xin Huang, Sheng Zhang, Zebin He, Di Luo, et al. Hunyuan3d 2.1: From images to high-fidelity 3d assets with production-ready pbr material.arXiv preprint arXiv:2506.15442, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[14]
TripoSR: Fast 3D Object Reconstruction from a Single Image
Dmitry Tochilkin, David Pankratz, Zexiang Liu, Zixuan Huang, , Adam Letts, Yangguang Li, Ding Liang, Christian Laforte, Varun Jampani, and Yan-Pei Cao. Triposr: Fast 3d object reconstruction from a single image.arXiv preprint arXiv:2403.02151, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[15]
Native and compact structured latents for 3d generation.Tech report, 2025
Jianfeng Xiang, Xiaoxue Chen, Sicheng Xu, Ruicheng Wang, Zelong Lv, Yu Deng, Hongyuan Zhu, Yue Dong, Hao Zhao, Nicholas Jing Yuan, and Jiaolong Yang. Native and compact structured latents for 3d generation.Tech report, 2025
2025
-
[16]
Xinhao Yan, Jiachen Xu, Yang Li, Changfeng Ma, Yunhan Yang, Chunshi Wang, Zibo Zhao, Zeqiang Lai, Yunfei Zhao, Zhuo Chen, and Chunchao Guo. X-part: High fidelity and structure coherent shape decomposition.arXiv preprint arXiv:2509.08643, 2025
-
[17]
Efros, Eli Shechtman, and Oliver Wang
Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 586–595, 2018. 17 Alibaba Group/Taobao Home3D Technical Report A Contributions and Acknowledgments All contr...
2018
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.