DecoRec: Decomposed 3D Scene Reconstruction from Single-View Images via Object-Level Diffusion
Pith reviewed 2026-05-19 21:06 UTC · model grok-4.3
pith:X43KUQV7 Add to your LaTeX paper
What is a Pith Number?\usepackage{pith}
\pithnumber{X43KUQV7}
Prints a linked pith:X43KUQV7 badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more
The pith
DecoRec reconstructs 3D scenes from single-view images by diffusing objects individually then refining their merge.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that by reconstructing each object in the scene separately with diffusion-based single-view methods and then merging them through a refinement pipeline that uses differentiable rendering and diffusion guidance, high-quality 3D scene reconstruction and novel view synthesis become possible from just one image.
What carries the argument
The decomposition of the scene into individual objects reconstructed via diffusion models, followed by a refinement pipeline for merging using differentiable rendering and diffusion-guided adjustments.
Load-bearing premise
That the refinement pipeline can reliably resolve any geometric or appearance inconsistencies introduced when merging the separately reconstructed objects.
What would settle it
Observing a case where individually accurate object reconstructions lead to a merged scene with uncorrectable errors in surface alignment or texture continuity despite the refinement steps.
Figures
read the original abstract
In this paper, we introduce \textit{DecoRec}, a novel system designed to elevate single-view 2D images to a decomposed 3D scene mesh. Current methods for single-view scene reconstruction typically rely on object retrieval or the regression of coarse 3D voxels or surfaces, leading to inaccuracies in capturing the appearance and geometry of the input image. The lack of high-quality large-scale scene-level datasets further complicates direct 3D scene generation from single-view images. To achieve high-quality 3D scene generation from a single-view image, DecoRec takes advantage of recent diffusion-based single-view object reconstruction methods to reconstruct individual objects separately. Subsequently, a refinement pipeline is proposed to effectively merge these reconstructed objects, enhancing appearance and geometry through a differentiable rendering technique and diffusion-guided refinement. Our results demonstrate that DecoRec facilitates high-quality single-view scene reconstruction in both geometry and novel synthesis, offering significant benefits for downstream applications like room interior design.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces DecoRec, a system for single-view 3D scene reconstruction that decomposes the input into individual objects, reconstructs each using existing diffusion-based single-view object methods, estimates poses to place them, and then applies a refinement pipeline with differentiable rendering and diffusion guidance to produce a coherent scene mesh. The central claim is that this yields high-quality geometry and novel-view synthesis superior to direct scene-level approaches, without requiring large scene datasets.
Significance. If validated, the decomposed approach would be a useful contribution to single-view scene reconstruction by leveraging strong object-level priors and a post-merging refinement stage. Credit is given for the modular design that reuses external diffusion models and for proposing an independent merging/refinement stage rather than end-to-end scene generation. The emphasis on both geometry and appearance consistency via differentiable rendering is a reasonable direction for addressing under-constrained single-view problems.
major comments (2)
- [Results / Experiments] The abstract states that results demonstrate high-quality geometry and novel-view synthesis, yet the supplied text contains no quantitative metrics (e.g., Chamfer distance, IoU, or PSNR), ablation studies, or error analysis. This is load-bearing for the central claim; the results section must include controlled comparisons to baselines and targeted failure-case analysis (occlusions, contacts) to substantiate superiority.
- [Method / Refinement pipeline] The refinement pipeline (described after object reconstruction) relies on differentiable rendering plus diffusion guidance to resolve inter-object inconsistencies. Without explicit 3D consistency terms (e.g., contact or penetration losses) or ablation on scenes with touching/occluding objects, it is unclear whether residual geometric drift or appearance seams are reliably eliminated; this directly affects the coherence guarantee.
minor comments (2)
- [Method] Notation for the merging stage (pose estimation, rigid placement) should be formalized with equations or a clear algorithmic outline to improve reproducibility.
- [Abstract] The abstract would benefit from naming the specific object-reconstruction diffusion models used and the scene categories evaluated.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments. We address each major point below and describe the changes planned for the revised manuscript.
read point-by-point responses
-
Referee: [Results / Experiments] The abstract states that results demonstrate high-quality geometry and novel-view synthesis, yet the supplied text contains no quantitative metrics (e.g., Chamfer distance, IoU, or PSNR), ablation studies, or error analysis. This is load-bearing for the central claim; the results section must include controlled comparisons to baselines and targeted failure-case analysis (occlusions, contacts) to substantiate superiority.
Authors: We agree that the absence of quantitative metrics weakens the central claim. The current manuscript emphasizes qualitative results partly because of the scarcity of standardized large-scale scene benchmarks for single-view reconstruction. In the revision we will add controlled quantitative comparisons using Chamfer distance and IoU for geometry as well as PSNR and SSIM for novel-view synthesis, together with ablations and a dedicated failure-case analysis focused on occlusions and object contacts. revision: yes
-
Referee: [Method / Refinement pipeline] The refinement pipeline (described after object reconstruction) relies on differentiable rendering plus diffusion guidance to resolve inter-object inconsistencies. Without explicit 3D consistency terms (e.g., contact or penetration losses) or ablation on scenes with touching/occluding objects, it is unclear whether residual geometric drift or appearance seams are reliably eliminated; this directly affects the coherence guarantee.
Authors: The refinement stage optimizes the merged scene via differentiable rendering under diffusion guidance from the input and generated novel views; the strong object-level priors embedded in the diffusion model are intended to reduce inter-object drift and seams without hand-crafted contact losses. We acknowledge that this mechanism would be clearer with targeted evidence. In the revision we will therefore include ablations on scenes containing touching and occluding objects, reporting both qualitative coherence and quantitative consistency metrics. revision: partial
Circularity Check
No significant circularity; method builds on external priors with independent refinement stage
full rationale
The paper describes a pipeline that applies existing diffusion-based single-view object reconstruction methods to individual objects, followed by a separate refinement stage using differentiable rendering and diffusion guidance to merge them. No equations, fitted parameters renamed as predictions, or self-citation chains that reduce the central claim to its own inputs are present in the abstract or described approach. The derivation is self-contained against external benchmarks (prior diffusion models) and does not rely on self-definitional steps or uniqueness theorems imported from the authors' prior work.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking echoes?
echoesECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
DecoRec ... reconstruct individual objects separately ... refinement pipeline ... differentiable rendering technique and diffusion-guided refinement
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
High- resolution image synthesis with latent diffusion models,
R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High- resolution image synthesis with latent diffusion models,” inCVPR, 2022
work page 2022
-
[2]
Monoscene: Monocular 3d semantic scene completion,
A.-Q. Cao and R. De Charette, “Monoscene: Monocular 3d semantic scene completion,” inCVPR, 2022
work page 2022
-
[3]
Corenet: Coherent 3d scene reconstruction from a single rgb image,
S. Popov, P. Bauszat, and V . Ferrari, “Corenet: Coherent 3d scene reconstruction from a single rgb image,” inECCV, 2020. MANUSCRIPT SUBMITTED TO IEEE TVCG 11
work page 2020
-
[4]
Y . Nie, X. Han, S. Guo, Y . Zheng, J. Chang, and J. J. Zhang, “To- tal3dunderstanding: Joint layout, object pose and mesh reconstruction for indoor scenes from a single image,” inCVPR, 2020
work page 2020
-
[5]
Psdr-room: Single photo to scene using differentiable rendering,
K. Yan, F. Luan, M. Ha ˇsan, T. Groueix, V . Deschaintre, and S. Zhao, “Psdr-room: Single photo to scene using differentiable rendering,” in SIGGRAPH Asia 2023, 2023
work page 2023
-
[6]
Roca: Robust cad model retrieval and alignment from a single image,
C. G ¨umeli, A. Dai, and M. Nießner, “Roca: Robust cad model retrieval and alignment from a single image,” inCVPR, 2022
work page 2022
-
[7]
Patch2cad: Patchwise embedding learning for in-the-wild shape retrieval from a single image,
W. Kuo, A. Angelova, T.-Y . Lin, and A. Dai, “Patch2cad: Patchwise embedding learning for in-the-wild shape retrieval from a single image,” inICCV, 2021
work page 2021
-
[8]
Generalizing single- view 3d shape retrieval to occlusions and unseen objects,
Q. Wu, D. Ritchie, M. Savva, and A. X. Chang, “Generalizing single- view 3d shape retrieval to occlusions and unseen objects,”arXiv preprint arXiv:2401.00405, 2023
-
[9]
DreamFusion: Text-to-3D using 2D Diffusion
B. Poole, A. Jain, J. T. Barron, and B. Mildenhall, “Dreamfusion: Text- to-3d using 2d diffusion,”arXiv preprint arXiv:2209.14988, 2022
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[10]
SyncDreamer: Generating Multiview-consistent Images from a Single-view Image
Y . Liu, C. Lin, Z. Zeng, X. Long, L. Liu, T. Komura, and W. Wang, “Syncdreamer: Generating multiview-consistent images from a single- view image,”arXiv preprint arXiv:2309.03453, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[11]
Zero-1-to-3: Zero-shot one image to 3d object,
R. Liu, R. Wu, B. Van Hoorick, P. Tokmakov, S. Zakharov, and C. V ondrick, “Zero-1-to-3: Zero-shot one image to 3d object,” inCVPR, 2023
work page 2023
-
[12]
Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model
R. Shi, H. Chen, Z. Zhang, M. Liu, C. Xu, X. Wei, L. Chen, C. Zeng, and H. Su, “Zero123++: a single image to consistent multi-view diffusion base model,”arXiv preprint arXiv:2310.15110, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[13]
arXiv preprint arXiv:2310.15008 , year=
X. Long, Y .-C. Guo, C. Lin, Y . Liu, Z. Dou, L. Liu, Y . Ma, S.-H. Zhang, M. Habermann, C. Theobaltet al., “Wonder3d: Single image to 3d using cross-domain diffusion,”arXiv preprint arXiv:2310.15008, 2023
-
[14]
Objaverse: A universe of annotated 3d objects,
M. Deitke, D. Schwenk, J. Salvador, L. Weihs, O. Michel, E. VanderBilt, L. Schmidt, K. Ehsani, A. Kembhavi, and A. Farhadi, “Objaverse: A universe of annotated 3d objects,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 13 142–13 153
work page 2023
-
[15]
A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y . Loet al., “Segment anything,” inCVPR, 2023
work page 2023
-
[16]
Crm: Single image to 3d textured mesh with convolutional reconstruction model,
Z. Wang, Y . Wang, Y . Chen, C. Xiang, S. Chen, D. Yu, C. Li, H. Su, and J. Zhu, “Crm: Single image to 3d textured mesh with convolutional reconstruction model,”arXiv preprint arXiv:2403.05034, 2024
-
[17]
Modular primitives for high-performance differentiable rendering,
S. Laine, J. Hellsten, T. Karras, Y . Seol, J. Lehtinen, and T. Aila, “Modular primitives for high-performance differentiable rendering,” ACM Transactions on Graphics, vol. 39, no. 6, 2020
work page 2020
-
[18]
Instructpix2pix: Learning to follow image editing instructions,
T. Brooks, A. Holynski, and A. A. Efros, “Instructpix2pix: Learning to follow image editing instructions,” inCVPR, 2023
work page 2023
-
[19]
Pixel- wise view selection for unstructured multi-view stereo,
J. L. Sch ¨onberger, E. Zheng, M. Pollefeys, and J.-M. Frahm, “Pixel- wise view selection for unstructured multi-view stereo,” inEuropean Conference on Computer Vision (ECCV), 2016
work page 2016
-
[20]
Kinectfusion: real-time 3d reconstruction and interaction using a moving depth cam- era,
S. Izadi, D. Kim, O. Hilliges, D. Molyneaux, R. Newcombe, P. Kohli, J. Shotton, S. Hodges, D. Freeman, A. Davisonet al., “Kinectfusion: real-time 3d reconstruction and interaction using a moving depth cam- era,” inProceedings of the 24th annual ACM symposium on User interface software and technology, 2011, pp. 559–568
work page 2011
-
[21]
Denoising diffusion probabilistic models,
J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in neural information processing systems, vol. 33, pp. 6840– 6851, 2020
work page 2020
-
[22]
Superprimitive: Scene recon- struction at a primitive level,
K. Mazur, G. Bae, and A. J. Davison, “Superprimitive: Scene recon- struction at a primitive level,”arXiv preprint arXiv:2312.05889, 2023
-
[23]
Panoptic 3d scene reconstruction from a single rgb image,
M. Dahnert, J. Hou, M. Nießner, and A. Dai, “Panoptic 3d scene reconstruction from a single rgb image,”Advances in Neural Information Processing Systems, vol. 34, pp. 8282–8293, 2021
work page 2021
-
[24]
Know your neighbors: Improving single-view reconstruction via spatial vision-language reasoning,
R. Li, T. Fischer, M. Segu, M. Pollefeys, L. Van Gool, and F. Tombari, “Know your neighbors: Improving single-view reconstruction via spatial vision-language reasoning,”arXiv preprint arXiv:2404.03658, 2024
-
[25]
J. Yao and J. Zhang, “Depthssc: Depth-spatial alignment and dynamic voxel resolution for monocular 3d semantic scene completion,”arXiv preprint arXiv:2311.17084, 2023
-
[26]
Ndc- scene: Boost monocular 3d semantic scene completion in normalized device coordinates space,
J. Yao, C. Li, K. Sun, Y . Cai, H. Li, W. Ouyang, and H. Li, “Ndc- scene: Boost monocular 3d semantic scene completion in normalized device coordinates space,” in2023 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE Computer Society, 2023, pp. 9421– 9431
work page 2023
-
[27]
Scenerf: Self-supervised monocular 3d scene reconstruction with radiance fields,
A.-Q. Cao and R. de Charette, “Scenerf: Self-supervised monocular 3d scene reconstruction with radiance fields,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 9387–9398
work page 2023
-
[28]
Rico: Regularizing the unobservable for indoor compositional reconstruction,
Z. Li, X. Lyu, Y . Ding, M. Wang, Y . Liao, and Y . Liu, “Rico: Regularizing the unobservable for indoor compositional reconstruction,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 17 761–17 771
work page 2023
-
[29]
Holistic 3d scene understanding from a single image with implicit representa- tion,
C. Zhang, Z. Cui, Y . Zhang, B. Zeng, M. Pollefeys, and S. Liu, “Holistic 3d scene understanding from a single image with implicit representa- tion,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 8833–8842
work page 2021
-
[30]
Towards high-fidelity single-view holistic reconstruction of indoor scenes,
H. Liu, Y . Zheng, G. Chen, S. Cui, and X. Han, “Towards high-fidelity single-view holistic reconstruction of indoor scenes,” inEuropean Con- ference on Computer Vision. Springer, 2022, pp. 429–446
work page 2022
-
[31]
Single- view 3d scene reconstruction with high-fidelity shape and texture,
Y . Chen, J. Ni, N. Jiang, Y . Zhang, Y . Zhu, and S. Huang, “Single- view 3d scene reconstruction with high-fidelity shape and texture,” in 2024 International Conference on 3D Vision (3DV). IEEE, 2024, pp. 1456–1467
work page 2024
-
[32]
T. Chu, P. Zhang, Q. Liu, and J. Wang, “Buol: A bottom-up framework with occupancy-aware lifting for panoptic 3d scene reconstruction from a single image,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 4937–4946
work page 2023
-
[33]
Uni-3d: A universal model for panoptic 3d scene reconstruction,
X. Zhang, Z. Chen, F. Wei, and Z. Tu, “Uni-3d: A universal model for panoptic 3d scene reconstruction,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 9256–9266
work page 2023
-
[34]
Open: Occlusion-invariant perception network for single image-based 3d shape retrieval,
F. Chu, Y . Cong, and R. Chen, “Open: Occlusion-invariant perception network for single image-based 3d shape retrieval,”IEEE Transactions on Circuits and Systems for Video Technology, 2024
work page 2024
-
[35]
Diffcad: Weakly- supervised probabilistic cad model retrieval and alignment from an rgb image,
D. Gao, D. Rozenberszki, S. Leutenegger, and A. Dai, “Diffcad: Weakly- supervised probabilistic cad model retrieval and alignment from an rgb image,”arXiv preprint arXiv:2311.18610, 2023
-
[36]
Generalizable 3d scene recon- struction via divide and conquer from a single view,
A. Dogaru, M. ¨Ozer, and B. Egger, “Generalizable 3d scene recon- struction via divide and conquer from a single view,”arXiv preprint arXiv:2404.03421, 2024
-
[37]
Comboverse: Compositional 3d assets creation using spatially-aware diffusion guid- ance,
Y . Chen, T. Wang, T. Wu, X. Pan, K. Jia, and Z. Liu, “Comboverse: Compositional 3d assets creation using spatially-aware diffusion guid- ance,”arXiv preprint arXiv:2403.12409, 2024
-
[38]
3d cinemagraphy from a single image,
X. Li, Z. Cao, H. Sun, J. Zhang, K. Xian, and G. Lin, “3d cinemagraphy from a single image,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 4595–4605
work page 2023
-
[39]
Sine: Semantic-driven image-based nerf editing with prior- guided editing field,
C. Bao, Y . Zhang, B. Yang, T. Fan, Z. Yang, H. Bao, G. Zhang, and Z. Cui, “Sine: Semantic-driven image-based nerf editing with prior- guided editing field,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 20 919–20 929
work page 2023
-
[40]
C. Wang, Y .-P. Wang, and D. Manocha, “Lolep: Single-view view syn- thesis with locally-learned planes and self-attention occlusion inference,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 10 841–10 851
work page 2023
-
[41]
Perf: Panoramic neural radiance field from a single panorama,
G. Wang, P. Wang, Z. Chen, W. Wang, C. C. Loy, and Z. Liu, “Perf: Panoramic neural radiance field from a single panorama,”arXiv preprint arXiv:2310.16831, 2023
-
[42]
As-deformable-as- possible single-image-based view synthesis without depth prior,
C. Zhang, C. Lin, K. Liao, L. Nie, and Y . Zhao, “As-deformable-as- possible single-image-based view synthesis without depth prior,”IEEE Transactions on Circuits and Systems for Video Technology, 2023
work page 2023
-
[43]
Sinmpi: Novel view synthesis from a single image with expanded multiplane images,
G. Pu, P.-S. Wang, and Z. Lian, “Sinmpi: Novel view synthesis from a single image with expanded multiplane images,” inSIGGRAPH Asia 2023 Conference Papers, 2023, pp. 1–10
work page 2023
-
[44]
Single-view view synthesis in the wild with learned adaptive multiplane images,
Y . Han, R. Wang, and J. Yang, “Single-view view synthesis in the wild with learned adaptive multiplane images,” inACM SIGGRAPH 2022 Conference Proceedings, 2022, pp. 1–8
work page 2022
-
[45]
Luciddreamer: Domain-free generation of 3d gaussian splatting scenes,
J. Chung, S. Lee, H. Nam, J. Lee, and K. M. Lee, “Luciddreamer: Domain-free generation of 3d gaussian splatting scenes,”arXiv preprint arXiv:2311.13384, 2023
-
[46]
CAT3D: Create Anything in 3D with Multi-View Diffusion Models
R. Gao, A. Holynski, P. Henzler, A. Brussee, R. Martin-Brualla, P. Srini- vasan, J. T. Barron, and B. Poole, “Cat3d: Create anything in 3d with multi-view diffusion models,”arXiv preprint arXiv:2405.10314, 2024
work page internal anchor Pith review arXiv 2024
-
[47]
Realmdreamer: Text-driven 3d scene generation with inpainting and depth diffusion,
J. Shriram, A. Trevithick, L. Liu, and R. Ramamoorthi, “Realmdreamer: Text-driven 3d scene generation with inpainting and depth diffusion,” arXiv preprint arXiv:2404.07199, 2024
-
[48]
Spin-nerf: Multiview segmentation and perceptual inpainting with neural radiance fields,
A. Mirzaei, T. Aumentado-Armstrong, K. G. Derpanis, J. Kelly, M. A. Brubaker, I. Gilitschenski, and A. Levinshtein, “Spin-nerf: Multiview segmentation and perceptual inpainting with neural radiance fields,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 20 669–20 679
work page 2023
-
[49]
Bolt3d: Generating 3d scenes in seconds,
S. Szymanowicz, J. Y . Zhang, P. Srinivasan, R. Gao, A. Brussee, A. Holynski, R. Martin-Brualla, J. T. Barron, and P. Henzler, “Bolt3d: Generating 3d scenes in seconds,”arXiv preprint arXiv:2503.14445, 2025
-
[50]
B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view MANUSCRIPT SUBMITTED TO IEEE TVCG 12 synthesis,”Communications of the ACM, vol. 65, no. 1, pp. 99–106, 2021
work page 2021
-
[51]
3d gaussian splatting for real-time radiance field rendering
B. Kerbl, G. Kopanas, T. Leimk ¨uhler, and G. Drettakis, “3d gaussian splatting for real-time radiance field rendering.”ACM Trans. Graph., vol. 42, no. 4, pp. 139–1, 2023
work page 2023
-
[52]
Diffuscene: Denoising diffusion models for gerative indoor scene synthesis,
J. Tang, Y . Nie, L. Markhasin, A. Dai, J. Thies, and M. Nießner, “Diffuscene: Denoising diffusion models for gerative indoor scene synthesis,” inProceedings of the ieee/cvf conference on computer vision and pattern recognition, 2024
work page 2024
-
[53]
Commonscenes: Generating commonsense 3d indoor scenes with scene graphs,
G. Zhai, E. P. ¨Ornek, S.-C. Wu, Y . Di, F. Tombari, N. Navab, and B. Busam, “Commonscenes: Generating commonsense 3d indoor scenes with scene graphs,”Advances in Neural Information Processing Systems, vol. 36, 2024
work page 2024
-
[54]
C. Lin and Y . Mu, “Instructscene: Instruction-driven 3d indoor scene synthesis with semantic graph prior,”arXiv preprint arXiv:2402.04717, 2024
-
[55]
Furniscene: A large-scale 3d room dataset with intricate furnishing scenes,
G. Zhang, Y . Wang, C. Luo, S. Xu, J. Peng, Z. Zhang, and M. Zhang, “Furniscene: A large-scale 3d room dataset with intricate furnishing scenes,”arXiv preprint arXiv:2401.03470, 2024
-
[56]
Echoscene: Indoor scene generation via information echo over scene graph diffusion,
G. Zhai, E. P. ¨Ornek, D. Z. Chen, R. Liao, Y . Di, N. Navab, F. Tombari, and B. Busam, “Echoscene: Indoor scene generation via information echo over scene graph diffusion,”arXiv preprint arXiv:2405.00915, 2024
-
[57]
Blockfusion: Expandable 3d scene generation using latent tri-plane extrapolation,
Z. Wu, Y . Li, H. Yan, T. Shang, W. Sun, S. Wang, R. Cui, W. Liu, H. Sato, H. Liet al., “Blockfusion: Expandable 3d scene generation using latent tri-plane extrapolation,”arXiv preprint arXiv:2401.17053, 2024
-
[58]
Scenewiz3d: Towards text-guided 3d scene composition,
Q. Zhang, C. Wang, A. Siarohin, P. Zhuang, Y . Xu, C. Yang, D. Lin, B. Zhou, S. Tulyakov, and H.-Y . Lee, “Scenewiz3d: Towards text-guided 3d scene composition,”arXiv preprint arXiv:2312.08885, 2023
-
[59]
Graphdreamer: Compositional 3d scene synthesis from scene graphs,
G. Gao, W. Liu, A. Chen, A. Geiger, and B. Sch ¨olkopf, “Graphdreamer: Compositional 3d scene synthesis from scene graphs,”arXiv preprint arXiv:2312.00093, 2023
-
[60]
Text2scene: Text-driven in- door scene stylization with part-aware details,
I. Hwang, H. Kim, and Y . M. Kim, “Text2scene: Text-driven in- door scene stylization with part-aware details,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1890–1899
work page 2023
-
[61]
Text2nerf: Text-driven 3d scene generation with neural radiance fields,
J. Zhang, X. Li, Z. Wan, C. Wang, and J. Liao, “Text2nerf: Text-driven 3d scene generation with neural radiance fields,”IEEE Transactions on Visualization and Computer Graphics, 2024
work page 2024
-
[62]
Dreamscene360: Unconstrained text-to- 3d scene generation with panoramic gaussian splatting,
S. Zhou, Z. Fan, D. Xu, H. Chang, P. Chari, T. Bharadwaj, S. You, Z. Wang, and A. Kadambi, “Dreamscene360: Unconstrained text-to- 3d scene generation with panoramic gaussian splatting,”arXiv preprint arXiv:2404.06903, 2024
-
[63]
Dreamscene: 3d gaussian-based text-to-3d scene generation via formation pattern sampling,
H. Li, H. Shi, W. Zhang, W. Wu, Y . Liao, L. Wang, L.-h. Lee, and P. Zhou, “Dreamscene: 3d gaussian-based text-to-3d scene generation via formation pattern sampling,”arXiv preprint arXiv:2404.03575, 2024
-
[64]
Text2immersion: Generative immersive scene with 3d gaussians,
H. Ouyang, K. Heal, S. Lombardi, and T. Sun, “Text2immersion: Generative immersive scene with 3d gaussians,”arXiv preprint arXiv:2312.09242, 2023
-
[65]
Controlroom3d: Room generation using semantic proxy rooms,
J. Schult, S. Tsai, L. H ¨ollein, B. Wu, J. Wang, C.-Y . Ma, K. Li, X. Wang, F. Wimbauer, Z. Heet al., “Controlroom3d: Room generation using semantic proxy rooms,”arXiv preprint arXiv:2312.05208, 2023
-
[66]
Ctrl-room: Controllable text- to-3d room meshes generation with layout constraints,
C. Fang, X. Hu, K. Luo, and P. Tan, “Ctrl-room: Controllable text- to-3d room meshes generation with layout constraints,”arXiv preprint arXiv:2310.03602, 2023
-
[67]
3d-scenedreamer: Text-driven 3d-consistent scene generation,
F. Zhang, Y . Zhang, Q. Zheng, R. Ma, W. Hua, H. Bao, W. Xu, and C. Zou, “3d-scenedreamer: Text-driven 3d-consistent scene generation,” arXiv preprint arXiv:2403.09439, 2024
-
[68]
Showroom3d: Text to high-quality 3d room generation using 3d priors,
W. Mao, Y .-P. Cao, J.-W. Liu, Z. Xu, and M. Z. Shou, “Showroom3d: Text to high-quality 3d room generation using 3d priors,”arXiv preprint arXiv:2312.13324, 2023
-
[69]
Fastscene: Text-driven fast 3d indoor scene generation via panoramic gaussian splatting,
Y . Ma, D. Zhan, and Z. Jin, “Fastscene: Text-driven fast 3d indoor scene generation via panoramic gaussian splatting,”arXiv preprint arXiv:2405.05768, 2024
-
[70]
360dvd: Controllable panorama video generation with 360-degree video diffusion model,
Q. Wang, W. Li, C. Mou, X. Cheng, and J. Zhang, “360dvd: Controllable panorama video generation with 360-degree video diffusion model,” arXiv preprint arXiv:2401.06578, 2024
-
[71]
MVDream: Multi-view Diffusion for 3D Generation
Y . Shi, P. Wang, J. Ye, M. Long, K. Li, and X. Yang, “Mvdream: Multi- view diffusion for 3d generation,”arXiv preprint arXiv:2308.16512, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[72]
Imagedream: Image-prompt multi-view diffusion for 3d generation
P. Wang and Y . Shi, “Imagedream: Image-prompt multi-view diffusion for 3d generation,”arXiv preprint arXiv:2312.02201, 2023
-
[73]
Direct2. 5: Diverse text-to-3d generation via multi-view 2.5 d diffusion,
Y . Lu, J. Zhang, S. Li, T. Fang, D. McKinnon, Y . Tsin, L. Quan, X. Cao, and Y . Yao, “Direct2. 5: Diverse text-to-3d generation via multi-view 2.5 d diffusion,”arXiv preprint arXiv:2311.15980, 2023
-
[74]
Shap-E: Generating Conditional 3D Implicit Functions
H. Jun and A. Nichol, “Shap-e: Generating conditional 3d implicit functions,”arXiv preprint arXiv:2305.02463, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[75]
Z.-X. Zou, Z. Yu, Y .-C. Guo, Y . Li, D. Liang, Y .-P. Cao, and S.-H. Zhang, “Triplane meets gaussian splatting: Fast and generalizable single-view 3d reconstruction with transformers,”arXiv preprint arXiv:2312.09147, 2023
-
[76]
Instant-3d: Instant neural radiance field training towards on-device ar/vr 3d reconstruction,
S. Li, C. Li, W. Zhu, B. Yu, Y . Zhao, C. Wan, H. You, H. Shi, and Y . Lin, “Instant-3d: Instant neural radiance field training towards on-device ar/vr 3d reconstruction,” inProceedings of the 50th Annual International Symposium on Computer Architecture, 2023, pp. 1–13
work page 2023
-
[77]
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation
J. Tang, J. Ren, H. Zhou, Z. Liu, and G. Zeng, “Dreamgaussian: Generative gaussian splatting for efficient 3d content creation,”arXiv preprint arXiv:2309.16653, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[78]
Objaverse-xl: A uni- verse of 10m+ 3d objects,
M. Deitke, R. Liu, M. Wallingford, H. Ngo, O. Michel, A. Kusupati, A. Fan, C. Laforte, V . V oleti, S. Y . Gadreet al., “Objaverse-xl: A uni- verse of 10m+ 3d objects,”Advances in Neural Information Processing Systems, vol. 36, 2024
work page 2024
-
[79]
Mvdiffusion: Enabling holistic multi-view image generation with correspondence- aware diffusion,
S. Tang, F. Zhang, J. Chen, P. Wang, and F. Yasutaka, “Mvdiffusion: Enabling holistic multi-view image generation with correspondence- aware diffusion,”arXiv preprint 2307.01097, 2023
-
[80]
Geowizard: Unleashing the diffusion priors for 3d geometry estimation from a single image,
X. Fu, W. Yin, M. Hu, K. Wang, Y . Ma, P. Tan, S. Shen, D. Lin, and X. Long, “Geowizard: Unleashing the diffusion priors for 3d geometry estimation from a single image,”arXiv preprint arXiv:2403.12013, 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.