CORGI: Consistency-Aware 3D Dog Reconstruction from a Single Image in the Wild
Pith reviewed 2026-07-02 15:24 UTC · model grok-4.3
The pith
CORGI reconstructs geometrically accurate and animatable 3D dog models from a single unconstrained image without any 3D supervision.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CORGI eliminates the need for 3D supervision by combining three components. Canonical-Driven Orbital Generation uses specialized Canonical and Orbit LoRAs to normalize arbitrary input poses and synthesize reliable 360-degree video observations. Consistency-aware Deformable 3DGS anchors on a D-SMAL prior and learns vertex displacements through neural deformation fields that capture generative errors. A self-supervised Deformation-Conditioned Generative Repair module then eliminates structural distortions and recovers high-frequency details.
What carries the argument
The three-component CORGI pipeline: Canonical-Driven Orbital Generation (CDOG) for pose normalization and 360-degree synthesis via LoRAs, Consistency-aware Deformable 3DGS (CA-3DGS) for explicit error modeling with neural fields, and Deformation-Conditioned Generative Repair (DCGR) for self-supervised detail recovery.
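For readers unfamiliar with the LoRA mechanism that CDOG relies on, the following is a minimal sketch of a generic low-rank weight update, W' = W + (alpha / rank) * B A, in plain Python. It shows only the standard LoRA formulation. The matrices, the `alpha` and `rank` values, and the helper names `matmul` and `lora_weight` are illustrative assumptions, and nothing here describes how CORGI actually configures or trains its Canonical and Orbit adapters.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    inner = len(y)
    cols = len(y[0])
    return [
        [sum(row[k] * y[k][j] for k in range(inner)) for j in range(cols)]
        for row in x
    ]


def lora_weight(w, a, b, alpha, rank):
    """Return the adapted weight W + (alpha / rank) * (B @ A).

    w: frozen base weight, shape d_out x d_in
    a: trainable factor A, shape rank x d_in
    b: trainable factor B, shape d_out x rank
    The base weight is left untouched; a new matrix is returned.
    """
    delta = matmul(b, a)  # low-rank update, shape d_out x d_in
    scale = alpha / rank
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Hypothetical toy values: a 2x2 base weight with a rank-1 adapter.
base = [[1.0, 0.0], [0.0, 1.0]]
factor_b = [[1.0], [0.0]]
factor_a = [[0.0, 2.0]]
adapted = lora_weight(base, factor_a, factor_b, alpha=1.0, rank=1)
```

Because the base weights stay frozen, two adapters (such as a canonical one and an orbit one) can in principle share one backbone and be swapped by changing only the small `A` and `B` factors.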
If this is right
- The output models are geometrically accurate and fully animatable for downstream applications.
- The method generalizes across diverse dog breeds from unconstrained inputs.
- No 3D supervision is required at any stage of training or inference.
- High-frequency details are recovered while structural distortions are removed.
Where Pith is reading between the lines
- The consistency mechanisms could extend to single-image reconstruction of other articulated animals that lack 3D datasets.
- Explicit modeling of generative errors via deformation fields may reduce the data requirements for similar 3D tasks on non-rigid objects.
- Self-supervised repair after view synthesis might improve outputs in other pipelines that rely on generative multi-view creation.
Load-bearing premise
The Canonical-Driven Orbital Generation strategy with specialized LoRAs can normalize arbitrary input poses and synthesize reliable 360-degree observations without introducing uncorrectable generative inconsistencies.
What would settle it
A test set of single dog images where the generated 360-degree videos contain persistent view inconsistencies that produce visibly distorted vertex positions in the final 3D models after deformation fields and repair are applied.
Original abstract
Reconstructing high-fidelity 3D models of highly articulated animals, such as dogs, from a single in-the-wild image remains a formidable challenge. In this paper, we introduce CORGI, a novel framework for consistency-aware 3D dog reconstruction from a single unconstrained image that completely eliminates the need for 3D supervision. To overcome generative inconsistencies and the lack of multi-view capture, our pipeline introduces three core components. First, we propose a Canonical-Driven Orbital Generation (CDOG) strategy, utilizing specialized Canonical and Orbit LoRAs to normalize arbitrary input poses and synthesize reliable 360-degree video observations. Second, we design a Consistency-aware Deformable 3DGS (CA-3DGS) module that anchors on a D-SMAL prior, explicitly modeling per-view generative errors through dedicated neural deformation fields to learn accurate vertex-level displacements. Finally, to eliminate structural distortions and recover high-frequency details, we introduce a self-supervised Deformation-Conditioned Generative Repair (DCGR) module. Extensive experiments demonstrate that CORGI achieves state-of-the-art performance, generalizing seamlessly across diverse dog breeds to produce geometrically accurate, visually coherent, and fully animatable 3D assets ready for downstream applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CORGI, a framework for 3D dog reconstruction from a single in-the-wild image that eliminates 3D supervision. It proposes three components: Canonical-Driven Orbital Generation (CDOG) using specialized Canonical and Orbit LoRAs to normalize poses and synthesize 360-degree observations; Consistency-aware Deformable 3DGS (CA-3DGS) that anchors on a D-SMAL prior and models per-view generative errors via neural deformation fields for vertex displacements; and self-supervised Deformation-Conditioned Generative Repair (DCGR) to fix structural distortions and recover high-frequency details. The manuscript claims state-of-the-art performance with generalization across dog breeds, yielding geometrically accurate, visually coherent, and fully animatable 3D assets.
Significance. If the central claims hold, the work would be significant for single-view 3D reconstruction of highly articulated animals, as it removes reliance on 3D ground truth while addressing generative inconsistencies through explicit modeling and repair stages. The integration of LoRA-based view synthesis with deformable 3DGS and self-supervised repair offers a practical path to animatable assets from unconstrained images.
major comments (2)
- [CDOG strategy (Abstract and method description)] The no-3D-supervision claim rests on CDOG producing reliable 360° observations that CA-3DGS can anchor to D-SMAL without uncorrectable artifacts. The manuscript provides no quantitative bound on residual view inconsistency after CDOG, nor an ablation isolating CDOG's contribution to final geometric error (see skeptic note on pose-dependent artifacts or texture drift not captured by the neural deformation fields).
- [CA-3DGS module] CA-3DGS models per-view generative errors through dedicated neural deformation fields, but it is unclear how these fields are trained or regularized to ensure the vertex displacements remain consistent across the synthesized views without introducing bias from the generative step.
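One possible way to put a number on the residual view inconsistency raised in the first major comment is a per-view PSNR between each synthesized frame and a re-rendering of the fitted model from the same camera, averaged over the orbit. The sketch below assumes that pairing, images stored as flat lists of floats in [0, 1], and the hypothetical helper names `psnr` and `mean_cross_view_psnr`. The manuscript as summarized here does not define such a metric, so this is an illustration rather than the authors' protocol.

```python
import math


def psnr(image_a, image_b, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are flat lists of floats. Identical images give math.inf.
    """
    if len(image_a) != len(image_b) or not image_a:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((x - y) ** 2 for x, y in zip(image_a, image_b)) / len(image_a)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / mse)


def mean_cross_view_psnr(synthesized_views, rerendered_views):
    """Average PSNR over paired views.

    synthesized_views[v] is the generated frame for camera v (assumed),
    rerendered_views[v] is the fitted model rendered from that camera (assumed).
    A single identical pair makes the average infinite.
    """
    if len(synthesized_views) != len(rerendered_views) or not synthesized_views:
        raise ValueError("need the same non-zero number of views on both sides")
    scores = [psnr(s, r) for s, r in zip(synthesized_views, rerendered_views)]
    return sum(scores) / len(scores)
```

A low average, or a few views with much lower scores than the rest, would point to frames that the reconstruction could not explain, which is the kind of evidence the comment asks for.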
minor comments (2)
- [Abstract] The abstract states that "extensive experiments demonstrate that CORGI achieves state-of-the-art performance" but provides no specific metrics, baselines, or dataset details; these should be summarized with quantitative results in the abstract or introduction.
- [Method sections] Notation for D-SMAL, DCGR, and the neural deformation fields should be defined at first use with explicit equations for the deformation fields and loss terms.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of the work's significance. We address each major comment point by point below, indicating where revisions will be made to strengthen the manuscript.
- Referee: [CDOG strategy (Abstract and method description)] The no-3D-supervision claim rests on CDOG producing reliable 360° observations that CA-3DGS can anchor to D-SMAL without uncorrectable artifacts. The manuscript provides no quantitative bound on residual view inconsistency after CDOG, nor an ablation isolating CDOG's contribution to final geometric error (see skeptic note on pose-dependent artifacts or texture drift not captured by the neural deformation fields).
  Authors: We acknowledge that the current manuscript lacks an explicit quantitative bound on residual view inconsistency after CDOG and does not include an ablation isolating its contribution to geometric error. The overall performance gains and qualitative results support the approach, but to directly address this point we will add both a consistency metric (e.g., average cross-view PSNR on synthesized observations) and a dedicated ablation study measuring the impact of CDOG on final reconstruction error in the revised version.
  Revision: yes
- Referee: [CA-3DGS module] CA-3DGS models per-view generative errors through dedicated neural deformation fields, but it is unclear how these fields are trained or regularized to ensure the vertex displacements remain consistent across the synthesized views without introducing bias from the generative step.
  Authors: The neural deformation fields are optimized via a self-supervised objective combining photometric rendering loss across the CDOG-synthesized views with an explicit cross-view consistency regularizer on the predicted displacements. This procedure is described in Section 3.2, but we agree the description can be made clearer regarding regularization details and bias mitigation. We will expand the method section with the precise loss formulation and training schedule in the revision.
  Revision: yes
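The objective described in the second response can be sketched as a photometric term averaged over views plus a weighted penalty on how much the predicted vertex displacements disagree across views. The code below is one plausible form under stated assumptions: an L1 photometric loss, a variance-style consistency penalty, a hypothetical `weight` of 0.1, and hypothetical function names. The paper's actual loss formulation is not given in this review, so this should not be read as the authors' implementation.

```python
def photometric_l1(rendered, target):
    """Mean absolute pixel difference for one view (flat lists of floats)."""
    if len(rendered) != len(target) or not rendered:
        raise ValueError("views must be non-empty and the same size")
    return sum(abs(r - t) for r, t in zip(rendered, target)) / len(rendered)


def cross_view_consistency(displacements):
    """Variance of per-vertex displacements across views, averaged over vertices.

    displacements[v][i] is the (dx, dy, dz) offset predicted for vertex i
    in view v (assumed layout). The result is 0.0 when all views agree.
    """
    n_views = len(displacements)
    n_verts = len(displacements[0])
    total = 0.0
    for i in range(n_verts):
        for axis in range(3):
            values = [displacements[v][i][axis] for v in range(n_views)]
            mean = sum(values) / n_views
            total += sum((x - mean) ** 2 for x in values) / n_views
    return total / n_verts


def deformation_objective(rendered_views, target_views, displacements, weight=0.1):
    """Photometric loss averaged over views plus the weighted consistency penalty."""
    photo = sum(
        photometric_l1(r, t) for r, t in zip(rendered_views, target_views)
    ) / len(rendered_views)
    return photo + weight * cross_view_consistency(displacements)
```

The weight sets how much cross-view disagreement the fit tolerates. Too small a weight lets each view explain its own generative errors, while too large a weight forces the displacements toward a single compromise, which is the bias the referee's comment is concerned with.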
Circularity Check
No significant circularity; derivation relies on independent generative priors and external anchors
The pipeline introduces CDOG (LoRA-based view synthesis), CA-3DGS (deformation fields anchored to a D-SMAL prior), and DCGR (self-supervised repair) as sequential modules. No equations or self-citations are presented that reduce a claimed prediction or consistency model to a fitted quantity from the same generative step by construction. The no-3D-supervision claim rests on external generative models and a fixed D-SMAL prior rather than internal re-fitting of the target output. This is the common case of a self-contained method paper whose central claims remain independently falsifiable against held-out images or alternative generators.