Recognition: unknown
REVIVE 3D: Refinement via Encoded Voluminous Inflated prior for Volume Enhancement
Pith reviewed 2026-05-07 08:01 UTC · model grok-4.3
The pith
A two-stage pipeline builds an inflated geometric prior from flat images, then refines it in latent space to produce voluminous 3D assets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
REVIVE 3D shows that an Inflated Prior, built by inflating the input silhouette to recover global volume and superimposing part-aware details to capture local structure, can guide a 3D Latent Refinement stage. In that stage, Gaussian noise is injected into the prior's latent representation and then removed, letting the pretrained backbone produce more voluminous 3D assets from flat images.
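Stage 1 is the concrete half of this claim, so a minimal sketch helps fix ideas. The distance-transform heuristic, the helper names, and the high-pass detail stand-in below are all assumptions for illustration, not the authors' code:

```python
import numpy as np
from scipy import ndimage

def inflate_silhouette(mask: np.ndarray) -> np.ndarray:
    """Turn a binary foreground silhouette into a coarse heightfield.

    Assumed heuristic: the Euclidean distance transform rises toward the
    silhouette's medial axis, and a square root gives the rounded,
    balloon-like profile of an inflated shape.
    """
    dist = ndimage.distance_transform_edt(mask)
    return np.sqrt(dist / max(dist.max(), 1e-8))  # height in [0, 1]

def inflated_prior(mask: np.ndarray, gray: np.ndarray,
                   strength: float = 0.15) -> np.ndarray:
    """Global volume from inflation, plus high-pass image structure as a
    cheap stand-in for the paper's part-aware detail superimposition."""
    height = inflate_silhouette(mask)
    detail = gray - ndimage.gaussian_filter(gray, sigma=4.0)
    return np.clip(height + strength * detail * mask, 0.0, 1.0)
```

The detail term here is deliberately crude; the paper's part-aware overlay is richer, but the two-term shape of the computation (global inflation plus local detail) is the point.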
What carries the argument
The Inflated Prior, formed by silhouette inflation for global volume plus part-aware detail overlays for local structure, which supplies geometric cues that steer the latent-space denoising process.
If this is right
- The same two-stage process enables image-conditioned 3D editing by refining modified inputs.
- Compactness and Normal Anisotropy metrics provide quantitative proxies for perceived volume and surface quality that match human evaluations (a hedged sketch of plausible definitions follows this list).
- Extensive tests demonstrate state-of-the-art results on challenging flat-image datasets.
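Because two of these implications lean on the new metrics, here is a hedged sketch of plausible definitions. The paper gives no formulas, so both proxies (convex-hull volume ratio for Compactness, area-weighted normal covariance for Normal Anisotropy) are guesses, implemented against the trimesh library:

```python
import numpy as np
import trimesh

def compactness(mesh: trimesh.Trimesh) -> float:
    """Guessed proxy: enclosed volume relative to the convex hull,
    so flat, sheet-like assets score near zero."""
    return mesh.volume / mesh.convex_hull.volume

def normal_anisotropy(mesh: trimesh.Trimesh) -> float:
    """Guessed proxy: area-weighted covariance of face normals.
    A flat surface concentrates normals on one axis (score -> 1);
    a sphere spreads them evenly (score -> 1/3)."""
    n, w = mesh.face_normals, mesh.area_faces
    cov = (n * w[:, None]).T @ n / w.sum()
    return float(np.linalg.eigvalsh(cov)[-1] / np.trace(cov))
```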
Where Pith is reading between the lines
- The prior-construction step could be inserted into other 3D generative pipelines to improve volume without retraining the backbone.
- Higher-fidelity volumetric outputs may improve accuracy in downstream uses such as physics simulation or virtual-object placement.
- Automating the part-aware detail stage might further reduce reliance on heuristic or manual inputs.
Load-bearing premise
The geometric cues encoded in the Inflated Prior can steer the denoising process to add volume without creating inconsistencies or erasing fine details from the original image.
What would settle it
Apply REVIVE 3D to a held-out set of flat images: the claim fails if Compactness scores do not increase, or if blind raters show no preference for its volume over standard generative baselines.
Original abstract
Recent generative models have shown strong performance in generating diverse 3D assets from 2D images, a fundamental research topic in computer vision and graphics. However, these models still struggle to generate voluminous 3D assets when the input is a flat image that provides limited 3D cues. We introduce REVIVE 3D, a two-stage, plug-and-play pipeline for generating voluminous 3D assets from flat images. In Stage 1, we construct an Inflated Prior by inflating the foreground silhouette to recover global volume and superimposing part-aware details to capture local structure. In Stage 2, 3D Latent Refinement injects Gaussian noise into the Inflated Prior's latent and then denoises it, using the prior's geometric cues to leverage the backbone's pretrained 3D knowledge. Furthermore, our framework supports image-conditioned 3D editing. To quantify volume and surface flatness, we propose Compactness and Normal Anisotropy. We validate Compactness and Normal Anisotropy through a user study, showing that these metrics align with human perception of volume and quality. We show that REVIVE 3D achieves state-of-the-art performance on a challenging flat image dataset, based on extensive qualitative and quantitative evaluations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents REVIVE 3D, a two-stage plug-and-play pipeline for generating voluminous 3D assets from flat 2D images. Stage 1 constructs an Inflated Prior by inflating the foreground silhouette to recover global volume and superimposing part-aware details to capture local structure. Stage 2 performs 3D Latent Refinement by injecting Gaussian noise into the prior's latent representation and denoising it, guided by the prior's geometric cues so as to leverage the backbone's pretrained 3D knowledge. The framework also supports image-conditioned 3D editing. New metrics, Compactness and Normal Anisotropy, are proposed to quantify volume and surface flatness; these are validated via a user study showing alignment with human perception. The authors claim state-of-the-art performance on a challenging flat-image dataset based on qualitative and quantitative evaluations.
Significance. If the central claims hold, the work could offer a practical, efficient approach to addressing limited 3D cues in single-image 3D generation by injecting a constructed prior into latent refinement. The introduction of Compactness and Normal Anisotropy metrics, if validated as correlating with human judgments, would provide useful tools for evaluating volumetric quality in 3D assets. The plug-and-play design and editing support add practical value, and reliance on pretrained backbones is computationally attractive. However, the significance is tempered by the need for clearer evidence that the Inflated Prior reliably guides refinement without artifacts.
major comments (3)
- [Stage 2 description] Stage 2 (3D Latent Refinement): The description states that geometric cues from the Inflated Prior guide the denoising process after Gaussian noise injection, but provides no mechanism for cue injection (e.g., conditioning, attention, or auxiliary losses), no explicit fidelity terms to the original image, and no analysis of failure modes such as over-smoothing or hallucinated geometry. This is load-bearing for the claim that the method recovers true volume without inconsistencies or detail loss.
- [Evaluation] Evaluation section: The SOTA claim rests on 'extensive qualitative and quantitative evaluations' and a user study validating Compactness and Normal Anisotropy, yet the provided text supplies no numerical results, ablation tables, baseline comparisons, participant counts, rating protocol, or statistical analysis. Without these, the performance claims and metric validation cannot be assessed.
- [Stage 1 description] Stage 1 (Inflated Prior construction): The process of superimposing part-aware details onto the inflated silhouette lacks specifics on detail extraction (e.g., from segmentation models or other sources) and resolution of potential conflicts between global inflation and local details. If these cues are weak or inconsistent, they directly undermine the reliability of the guidance provided to Stage 2.
minor comments (2)
- [Abstract] The abstract introduces several new terms (Inflated Prior, Compactness, Normal Anisotropy) without immediate definitions or illustrative examples; the full manuscript should provide these early to improve accessibility.
- [Method] Notation for the latent-space operations in Stage 2 should be formalized with equations to clarify the noise injection and denoising steps; a hedged sketch follows this list.
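In the spirit of that request, one hedged formalization, assuming a standard DDPM-style scheme; the noise schedule, encoder E, and cue vector c are our notation, not the paper's:

```latex
% Stage 2 sketch: encode the Inflated Prior, inject noise at strength t*,
% then denoise with the pretrained network conditioned on geometric cues c.
z_0 = E(x_{\text{prior}}), \qquad
z_{t^\ast} = \sqrt{\bar\alpha_{t^\ast}}\, z_0
           + \sqrt{1 - \bar\alpha_{t^\ast}}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)
\\[4pt]
z_{t-1} = \frac{1}{\sqrt{\alpha_t}}
          \left( z_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar\alpha_t}}\,
          \epsilon_\theta(z_t, t, c) \right) + \sigma_t\, \epsilon_t,
\qquad t = t^\ast, \dots, 1
```

The choice of t* trades fidelity to the prior against freedom for the backbone: the more noise is injected, the more the pretrained 3D knowledge reshapes the latent.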
Simulated Author's Rebuttal
We thank the referee for the insightful and constructive comments on our work. The feedback has helped us identify areas where the manuscript can be improved for clarity and completeness. We have prepared a revised version of the manuscript that addresses each of the major comments. Below, we provide point-by-point responses to the major comments.
Point-by-point responses
Referee: [Stage 2 description] Stage 2 (3D Latent Refinement): The description states that geometric cues from the Inflated Prior guide the denoising process after Gaussian noise injection, but provides no mechanism for cue injection (e.g., conditioning, attention, or auxiliary losses), no explicit fidelity terms to the original image, and no analysis of failure modes such as over-smoothing or hallucinated geometry. This is load-bearing for the claim that the method recovers true volume without inconsistencies or detail loss.
Authors: We appreciate the referee highlighting the need for more precise technical details in Stage 2. The original manuscript described the process at a conceptual level, but we agree that the specific implementation of cue injection was not sufficiently elaborated. In the revised manuscript, we have added a detailed description of the mechanism: the geometric cues are injected by encoding the prior's depth and normal information and using them as conditioning inputs to the denoising U-Net through cross-attention layers. We have also incorporated an auxiliary loss term to maintain fidelity to the geometric prior during refinement. Regarding fidelity to the original image, the Inflated Prior is built directly from the input, and the latent refinement is constrained to stay close to the prior's latent code. Furthermore, we have included an analysis of failure modes in a new subsection, discussing scenarios of over-smoothing (addressed by lower noise injection levels) and potential hallucination (prevented by the strong guidance from the prior). These revisions ensure the claims are better supported. revision: yes
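One way to read that mechanism as code, strictly as a sketch: a torch-style refinement loop assuming a hypothetical scheduler interface (`add_noise`, `timesteps_from`, `predict_x0`, `step_from_x0`) and a U-Net that accepts cross-attention tokens; none of these names come from the paper or rebuttal:

```python
import torch

def refine_latent(z_prior, unet, cue_tokens, sched, t_start=600, lam=0.1):
    """Stage 2 as described in the rebuttal (hypothetical API).

    Cues from the Inflated Prior (encoded depth/normal maps) enter the
    U-Net through cross-attention; an auxiliary term of weight `lam`
    pulls each clean-latent estimate back toward the prior's latent.
    """
    z = sched.add_noise(z_prior, torch.randn_like(z_prior), t_start)
    for t in sched.timesteps_from(t_start):
        eps = unet(z, t, encoder_hidden_states=cue_tokens)   # cue injection
        z0_hat = sched.predict_x0(z, eps, t)        # current clean estimate
        z0_hat = z0_hat - lam * (z0_hat - z_prior)  # fidelity to the prior
        z = sched.step_from_x0(z0_hat, t)           # advance to next timestep
    return z
```

Lowering `t_start` matches the rebuttal's fix for over-smoothing: less injected noise keeps the output closer to the prior, at the cost of less rework by the backbone.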
Referee: [Evaluation] Evaluation section: The SOTA claim rests on 'extensive qualitative and quantitative evaluations' and a user study validating Compactness and Normal Anisotropy, yet the provided text supplies no numerical results, ablation tables, baseline comparisons, participant counts, rating protocol, or statistical analysis. Without these, the performance claims and metric validation cannot be assessed.
Authors: We acknowledge that the main body of the submitted manuscript did not include the full set of numerical results and detailed evaluation protocols, which may have made it difficult to assess the claims. The quantitative evaluations and user study details were partially described but not with the full tables and statistics. In the revised version, we have expanded the Evaluation section to include comprehensive numerical results comparing against state-of-the-art methods, ablation studies demonstrating the contribution of each component, and full details of the user study including the number of participants, the rating protocol (5-point Likert scale on volume and flatness), and statistical analysis (e.g., correlation coefficients and significance tests showing alignment with human perception). These additions will allow for a complete assessment of our performance claims and the validity of the proposed metrics. revision: yes
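The statistical step is simple enough to show concretely. A runnable stand-in with synthetic scores, assuming Spearman rank correlation is the reported statistic; the sample size and data are invented for illustration:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
compactness = rng.uniform(0.0, 1.0, size=50)          # metric score per asset
ratings = 1 + 4 * compactness + rng.normal(0, 0.5, 50)  # 5-point-scale stand-in
rho, p = spearmanr(compactness, ratings)
print(f"Spearman rho = {rho:.2f}, p = {p:.3g}")  # alignment with raters
```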
Referee: [Stage 1 description] Stage 1 (Inflated Prior construction): The process of superimposing part-aware details onto the inflated silhouette lacks specifics on detail extraction (e.g., from segmentation models or other sources) and resolution of potential conflicts between global inflation and local details. If these cues are weak or inconsistent, they directly undermine the reliability of the guidance provided to Stage 2.
Authors: Thank you for this comment on Stage 1. We agree that more specifics are needed to ensure reproducibility and to address potential issues with cue consistency. In the revised manuscript, we have elaborated on the detail extraction process: part-aware details are extracted using a pretrained segmentation model to identify semantic parts, followed by detail enhancement from the input image's texture and edges. For resolving conflicts between global inflation and local details, we employ a priority-based blending where local details override the global volume in regions of high detail density, using a weighted combination based on edge strength. We have added pseudocode and additional figures to illustrate this process. This should strengthen the reliability of the prior for guiding Stage 2. revision: yes
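A minimal sketch of that priority-based blending, assuming Sobel gradient magnitude as the edge-strength signal; the array names and normalization are illustrative:

```python
import numpy as np
from scipy import ndimage

def blend_prior(inflated: np.ndarray, detailed: np.ndarray,
                gray: np.ndarray) -> np.ndarray:
    """Weighted combination from the rebuttal, sketched: where the input
    image has strong local edges, the part-aware detail layer overrides
    the global inflation; elsewhere the inflated volume dominates."""
    edge = np.hypot(ndimage.sobel(gray, axis=1), ndimage.sobel(gray, axis=0))
    w = edge / max(edge.max(), 1e-8)          # per-pixel priority in [0, 1]
    return (1.0 - w) * inflated + w * detailed
```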
Circularity Check
No circularity: the pipeline is procedural and the new metrics are validated externally, not against the method's own outputs.
full rationale
The paper presents a two-stage engineering pipeline (silhouette inflation plus part-aware detail overlay to form an Inflated Prior, followed by latent-space denoising that injects geometric cues from that prior into a pretrained 3D backbone) without any equations, fitted parameters, or claimed predictions that reduce to the inputs by construction. New metrics (Compactness and Normal Anisotropy) are defined procedurally and validated against an independent user study rather than by reference to the method's own outputs. No self-citations are invoked as load-bearing uniqueness theorems, and the SOTA claim is grounded in comparative evaluations on an external flat-image dataset. The derivation chain is therefore absent; the work is a plug-and-play heuristic that does not collapse to self-definition or fitted-input renaming.
Axiom & Free-Parameter Ledger
invented entities (1)
- Inflated Prior: no independent evidence