CORGI: Consistency-Aware 3D Dog Reconstruction from a Single Image in the Wild

Boyi Zhu; Ligang Liu; Weile Li; Youcheng Cai; Yumeng Liu; Yuxiao Wu

arxiv: 2607.00321 · v1 · pith:DWR2R3KEnew · submitted 2026-07-01 · 💻 cs.CV

Yuxiao Wu, Weile Li, Boyi Zhu, Yumeng Liu, Youcheng Cai, Ligang Liu

Pith reviewed 2026-07-02 15:24 UTC · model grok-4.3

classification 💻 cs.CV

keywords: 3D reconstruction, single image, dog modeling, consistency-aware, deformable Gaussian splatting, generative repair, pose normalization

The pith

CORGI reconstructs geometrically accurate and animatable 3D dog models from a single unconstrained image without any 3D supervision.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework called CORGI that builds high-fidelity 3D dog models from one in-the-wild photo. It overcomes the lack of multi-view data and generative inconsistencies through three modules that normalize poses, model per-view errors on a shape prior, and repair distortions in a self-supervised way. A sympathetic reader would care because this removes the need for expensive 3D training data while producing models ready for animation and other uses. The paper claims the approach generalizes across many dog breeds and yields visually coherent results.

Core claim

CORGI eliminates the need for 3D supervision through three components. Canonical-Driven Orbital Generation uses specialized Canonical and Orbit LoRAs to normalize arbitrary input poses and synthesize reliable 360-degree video observations. Consistency-aware Deformable 3DGS anchors on a D-SMAL prior and learns vertex displacements via neural deformation fields that capture generative errors. A self-supervised Deformation-Conditioned Generative Repair module eliminates structural distortions and recovers high-frequency details.

What carries the argument

The three-component CORGI pipeline: Canonical-Driven Orbital Generation (CDOG) for pose normalization and 360-degree synthesis via LoRAs, Consistency-aware Deformable 3DGS (CA-3DGS) for explicit error modeling with neural fields, and Deformation-Conditioned Generative Repair (DCGR) for self-supervised detail recovery.
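
To make the CA-3DGS idea concrete, the following is a minimal sketch of template vertices displaced by neural deformation fields. The abstract does not give the architecture, so everything here is an assumption made for illustration: the split into a shared field plus a per-view residual field, the two-layer MLP, the per-view latent code, and all function names.

```python
import numpy as np


def init_field(in_dim, hidden, rng, scale=0.01):
    """Random weights for a two-layer MLP mapping in_dim inputs to 3D offsets."""
    return (
        rng.normal(0.0, scale, (in_dim, hidden)),
        np.zeros(hidden),
        rng.normal(0.0, scale, (hidden, 3)),
        np.zeros(3),
    )


def field(x, params):
    """Evaluate the MLP: one tanh hidden layer, then a linear 3D output."""
    w1, b1, w2, b2 = params
    return np.tanh(x @ w1 + b1) @ w2 + b2


def deform(template, shared_params, view_params, view_code):
    """Return (canonical, observed) vertex positions for one synthesized view."""
    # Shared displacement: a view-independent shape correction (assumed design).
    canonical = template + field(template, shared_params)
    # Per-view residual, conditioned on a hypothetical per-view latent code,
    # meant to absorb errors that appear only in that generated view.
    codes = np.broadcast_to(view_code, (template.shape[0], view_code.shape[0]))
    residual = field(np.concatenate([template, codes], axis=1), view_params)
    return canonical, canonical + residual


rng = np.random.default_rng(0)
template = rng.normal(size=(5, 3))     # stand-in for template mesh vertices
shared = init_field(3, 16, rng)        # input: vertex position only
per_view = init_field(3 + 4, 16, rng)  # input: vertex position + 4-D view code
canonical, observed = deform(template, shared, per_view, rng.normal(size=4))
```

Under this assumed design, only `canonical` would be kept for the final asset, while `observed` is what gets compared against each generated view during fitting.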

If this is right

The output models are geometrically accurate and fully animatable for downstream applications.
The method generalizes across diverse dog breeds from unconstrained inputs.
No 3D supervision is required at any stage of training or inference.
High-frequency details are recovered while structural distortions are removed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The consistency mechanisms could extend to single-image reconstruction of other articulated animals that lack 3D datasets.
Explicit modeling of generative errors via deformation fields may reduce the data requirements for similar 3D tasks on non-rigid objects.
Self-supervised repair after view synthesis might improve outputs in other pipelines that rely on generative multi-view creation.

Load-bearing premise

The Canonical-Driven Orbital Generation strategy with specialized LoRAs can normalize arbitrary input poses and synthesize reliable 360-degree observations without introducing uncorrectable generative inconsistencies.

What would settle it

A test set of single dog images where the generated 360-degree videos contain persistent view inconsistencies that produce visibly distorted vertex positions in the final 3D models after deformation fields and repair are applied.

Original abstract

Reconstructing high-fidelity 3D models of highly articulated animals, such as dogs, from a single in-the-wild image remains a formidable challenge. In this paper, we introduce CORGI, a novel framework for consistency-aware 3D dog reconstruction from a single unconstrained image that completely eliminates the need for 3D supervision. To overcome generative inconsistencies and the lack of multi-view capture, our pipeline introduces three core components. First, we propose a Canonical-Driven Orbital Generation (CDOG) strategy, utilizing specialized Canonical and Orbit LoRAs to normalize arbitrary input poses and synthesize reliable 360-degree video observations. Second, we design a Consistency-aware Deformable 3DGS (CA-3DGS) module that anchors on a D-SMAL prior, explicitly modeling per-view generative errors through dedicated neural deformation fields to learn accurate vertex-level displacements. Finally, to eliminate structural distortions and recover high-frequency details, we introduce a self-supervised Deformation-Conditioned Generative Repair (DCGR) module. Extensive experiments demonstrate that CORGI achieves state-of-the-art performance, generalizing seamlessly across diverse dog breeds to produce geometrically accurate, visually coherent, and fully animatable 3D assets ready for downstream applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CORGI's pipeline uses dual LoRAs for view synthesis, deformable 3DGS on a dog template, and a repair module to claim single-image reconstruction without 3D labels, but the abstract supplies no numbers or ablations to check the consistency assumptions.

The one thing to know about this paper is that it describes a method called CORGI that reconstructs animatable 3D dog models from a single in-the-wild image by first generating multi-view observations with fine-tuned diffusion models and then fitting a consistency-aware 3D Gaussian splatting model anchored on a dog template.

The new elements are the Canonical-Driven Orbital Generation strategy with specialized Canonical and Orbit LoRAs to handle pose normalization and view synthesis, the Consistency-aware Deformable 3DGS that uses neural deformation fields to capture per-view generative errors and learn vertex displacements, and the self-supervised Deformation-Conditioned Generative Repair module to fix structural issues and add details.

The paper does a good job of breaking down the challenges of articulated animal reconstruction and proposing a modular approach that avoids direct 3D supervision, which is a practical direction given how hard it is to get 3D animal data.

The soft spots are around the soundness of the core claims. The abstract asserts state-of-the-art performance and seamless generalization, but provides no numbers, no ablation studies, and no details on how the components interact or perform. The assumption that the LoRA-based generation produces reliable 360-degree observations without introducing uncorrectable inconsistencies is central, and if that does not hold, the later modules may not fully compensate. The stress-test note correctly points out that without quantitative bounds on residual errors from CDOG, the final geometric accuracy could be affected.

This paper is aimed at researchers in computer vision and graphics working on 3D reconstruction of animals or other articulated objects from limited views. A reader in that area might find the pipeline ideas worth considering for their own work.

It deserves a serious referee because the problem is real and the proposed components are specific enough that reviewers can assess the implementation and results once the full paper and experiments are available.

Referee Report

2 major / 2 minor

Summary. The paper introduces CORGI, a framework for 3D dog reconstruction from a single in-the-wild image that eliminates 3D supervision. It proposes three components: Canonical-Driven Orbital Generation (CDOG) using specialized Canonical and Orbit LoRAs to normalize poses and synthesize 360-degree observations; Consistency-aware Deformable 3DGS (CA-3DGS) that anchors on a D-SMAL prior and models per-view generative errors via neural deformation fields for vertex displacements; and self-supervised Deformation-Conditioned Generative Repair (DCGR) to fix structural distortions and recover high-frequency details. The manuscript claims state-of-the-art performance with generalization across dog breeds, yielding geometrically accurate, visually coherent, and fully animatable 3D assets.

Significance. If the central claims hold, the work would be significant for single-view 3D reconstruction of highly articulated animals, as it removes reliance on 3D ground truth while addressing generative inconsistencies through explicit modeling and repair stages. The integration of LoRA-based view synthesis with deformable 3DGS and self-supervised repair offers a practical path to animatable assets from unconstrained images.

major comments (2)

[CDOG strategy (Abstract and method description)] The no-3D-supervision claim rests on CDOG producing reliable 360° observations that CA-3DGS can anchor to D-SMAL without uncorrectable artifacts. The manuscript provides no quantitative bound on residual view inconsistency after CDOG, nor an ablation isolating CDOG's contribution to final geometric error (see skeptic note on pose-dependent artifacts or texture drift not captured by the neural deformation fields).
[CA-3DGS module] CA-3DGS models per-view generative errors through dedicated neural deformation fields, but it is unclear how these fields are trained or regularized to ensure the vertex displacements remain consistent across the synthesized views without introducing bias from the generative step.

minor comments (2)

[Abstract] The abstract states that extensive experiments demonstrate state-of-the-art performance but provides no specific metrics, baselines, or dataset details; these should be summarized with quantitative results in the abstract or introduction.
[Method sections] Notation for D-SMAL, DCGR, and the neural deformation fields should be defined at first use with explicit equations for the deformation fields and loss terms.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the work's significance. We address each major comment point by point below, indicating where revisions will be made to strengthen the manuscript.

Referee: [CDOG strategy (Abstract and method description)] The no-3D-supervision claim rests on CDOG producing reliable 360° observations that CA-3DGS can anchor to D-SMAL without uncorrectable artifacts. The manuscript provides no quantitative bound on residual view inconsistency after CDOG, nor an ablation isolating CDOG's contribution to final geometric error (see skeptic note on pose-dependent artifacts or texture drift not captured by the neural deformation fields).

Authors: We acknowledge that the current manuscript lacks an explicit quantitative bound on residual view inconsistency after CDOG and does not include an ablation isolating its contribution to geometric error. The overall performance gains and qualitative results support the approach, but to directly address this point we will add both a consistency metric (e.g., average cross-view PSNR on synthesized observations) and a dedicated ablation study measuring the impact of CDOG on final reconstruction error in the revised version. revision: yes
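
The consistency metric mentioned in this response could be computed roughly as follows. This is a sketch under stated assumptions: the rebuttal does not define "cross-view PSNR", so the code assumes image pairs that have already been aligned to the same viewpoint (for example a synthesized frame and a re-rendering of it), with pixel values in [0, 1]. The function names are hypothetical.

```python
import numpy as np


def psnr(a, b, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


def mean_cross_view_psnr(pairs, max_val=1.0):
    """Average PSNR over aligned image pairs, skipping identical pairs."""
    scores = [psnr(a, b, max_val) for a, b in pairs]
    finite = [s for s in scores if np.isfinite(s)]
    return float(np.mean(finite)) if finite else float("inf")


# Toy usage: a black image against a uniform 0.1-gray image.
a = np.zeros((4, 4, 3))
b = np.full((4, 4, 3), 0.1)
score = mean_cross_view_psnr([(a, b)])
```

Higher values indicate closer agreement between the paired views. How the pairs are aligned is the hard part and is left open here.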
Referee: [CA-3DGS module] CA-3DGS models per-view generative errors through dedicated neural deformation fields, but it is unclear how these fields are trained or regularized to ensure the vertex displacements remain consistent across the synthesized views without introducing bias from the generative step.

Authors: The neural deformation fields are optimized via a self-supervised objective combining photometric rendering loss across the CDOG-synthesized views with an explicit cross-view consistency regularizer on the predicted displacements. This procedure is described in Section 3.2, but we agree the description can be made clearer regarding regularization details and bias mitigation. We will expand the method section with the precise loss formulation and training schedule in the revision. revision: yes
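
The objective described in this response, a photometric rendering loss plus a cross-view consistency regularizer on the predicted displacements, might take a form like the sketch below. The precise formulation is not given in the text, so the L1 photometric term, the variance-style regularizer, the weight `lam`, and all names are assumptions for illustration only.

```python
import numpy as np


def photometric_l1(rendered, targets):
    """Mean absolute error between rendered and synthesized views, shape (V, H, W, 3)."""
    return float(np.mean(np.abs(rendered - targets)))


def displacement_consistency(displacements):
    """Penalize per-view vertex displacements, shape (V, N, 3), that stray from their cross-view mean."""
    mean_disp = displacements.mean(axis=0, keepdims=True)
    return float(np.mean(np.sum((displacements - mean_disp) ** 2, axis=-1)))


def total_loss(rendered, targets, displacements, lam=0.1):
    """Photometric term plus lam times the consistency regularizer (assumed weighting)."""
    return photometric_l1(rendered, targets) + lam * displacement_consistency(displacements)


# Toy usage: two views, six vertices.
rendered = np.zeros((2, 4, 4, 3))
targets = np.full((2, 4, 4, 3), 0.5)
displacements = np.stack([np.zeros((6, 3)), np.ones((6, 3))])
loss = total_loss(rendered, targets, displacements)
```

A regularizer of this kind pulls the per-view displacements toward agreement; whether it also limits bias inherited from the generative step, as the referee asks, depends on details the text does not provide.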

Circularity Check

0 steps flagged

No significant circularity; derivation relies on independent generative priors and external anchors

The pipeline introduces CDOG (LoRA-based view synthesis), CA-3DGS (deformation fields anchored to the D-SMAL prior), and DCGR (self-supervised repair) as sequential modules. No equations or self-citations are presented that reduce a claimed prediction or consistency model to a fitted quantity from the same generative step by construction. The no-3D-supervision claim rests on external generative models and a fixed D-SMAL prior rather than internal re-fitting of the target output. This is the common case of a self-contained method paper whose central claims remain independently falsifiable against held-out images or alternative generators.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities can be identified from the provided text. The method implicitly relies on pre-trained generative models and the D-SMAL prior, but details are absent.

Reference graph

Works this paper leans on

71 extracted references · 19 canonical work pages · 8 internal anchors

[1]

Advances and Trends in the 3D Reconstruction of the Shape and Motion of Animals.arXiv preprint arXiv:2508.16062, 2025

Li Z, Amrani A, Rai S, Laga H. Advances and Trends in the 3D Reconstruction of the Shape and Motion of Animals.arXiv preprint arXiv:2508.16062, 2025

work page arXiv 2025
[2]

Bite: Beyond Priors for Improved Three-D Dog Pose Estimation

R¨ uegg N, Tripathi S, Schindler K, Black MJ, Zuffi S. Bite: Beyond Priors for Improved Three-D Dog Pose Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, 8867–8876

2023
[3]

3D Menagerie: Modeling the 3D Shape and Pose of Animals

Zuffi S, Kanazawa A, Jacobs DW, Black MJ. 3D Menagerie: Modeling the 3D Shape and Pose of Animals. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017, 6365–6373

2017
[4]

Animal Avatars: Recon- structing Animatable 3D Animals from Casual Videos

Sabathier R, Mitra NJ, Novotny D. Animal Avatars: Recon- structing Animatable 3D Animals from Casual Videos. In European Conference on Computer Vision, 2024, 270–287

2024
[5]

Learning the 3d Fauna of the Web

Li Z, Litvak D, Li R, Zhang Y, Jakab T, Rupprecht C, Wu S, Vedaldi A, Wu J. Learning the 3d Fauna of the Web. In CORGI:Consistency-Aware 3D DogReconstruction from a SingleImage in the Wild 15 Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, 9752–9762

2024
[6]

Lassie: Learning Articulated Shapes from Sparse Image En- semble via 3D Part Discovery.Advances in Neural Information Processing Systems, 2022, 35: 15296–15308

Yao CH, Hung WC, Li Y, Rubinstein M, Yang MH, Jampani V. Lassie: Learning Articulated Shapes from Sparse Image En- semble via 3D Part Discovery.Advances in Neural Information Processing Systems, 2022, 35: 15296–15308

2022
[7]

Magicpony: Learning Articulated 3D Animals in the Wild

Wu S, Li R, Jakab T, Rupprecht C, Vedaldi A. Magicpony: Learning Articulated 3D Animals in the Wild. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, 8792–8802

2023
[8]

Zero-1-to-3: Zero-shot one Image to 3D Object

Liu R, Wu R, Van Hoorick B, Tokmakov P, Zakharov S, Vondrick C. Zero-1-to-3: Zero-shot one Image to 3D Object. InProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, 9298–9309

2023
[9]

Syncdreamer: Generating Multiview-Consistent Images from a Single-View Image

Liu Y, Lin C, Zeng Z, Long X, Liu L, Komura T, Wang W. Syncdreamer: Generating Multiview-Consistent Images from a Single-View Image. InInternational Conference on Learning Representations, volume 2024, 2024, 27676–27697

2024
[10]

One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion

Liu M, Shi R, Chen L, Zhang Z, Xu C, Wei X, Chen H, Zeng C, Gu J, Su H. One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion. arXiv preprint arXiv:2311.07885, 2023

work page arXiv 2023
[11]

Genfusion: Closing the Loop between Reconstruction and Generation via Videos

Wu S, Xu C, Huang B, Geiger A, Chen A. Genfusion: Closing the Loop between Reconstruction and Generation via Videos. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025, 6078–6088

2025
[12]

Gen3c: 3D-Informed World- Consistent Video Generation with Precise Camera Control

Ren X, Shen T, Huang J, Ling H, Lu Y, Nimier-David M, M¨ uller T, Keller A, Fidler S, Gao J. Gen3c: 3D-Informed World- Consistent Video Generation with Precise Camera Control. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025, 6121–6132

2025
[13]

DogRecon: Canine Prior- Guided Animatable 3D Gaussian Dog Reconstruction From A Single Image.International Journal of Computer Vision, 2025, 133(9): 6332–6346

Cho G, Kang C, Soon D, Joo K. DogRecon: Canine Prior- Guided Animatable 3D Gaussian Dog Reconstruction From A Single Image.International Journal of Computer Vision, 2025, 133(9): 6332–6346

2025
[14]

3D Gaussian Splatting for Real-Time Radiance Field Rendering

Kerbl B, Kopanas G, Leimk¨ uhler T, Drettakis G, et al.. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Transactions on Graphics (TOG), 2023, 42(4): 139–1

2023
[15]

Hsmal: Detailed Horse Shape and Pose Reconstruction for Motion Pattern Recognition.arXiv preprint arXiv:2106.10102, 2021

Li C, Ghorbani N, Broom´e S, Rashid M, Black MJ, Hernlund E, Kjellstr¨om H, Zuffi S. Hsmal: Detailed Horse Shape and Pose Reconstruction for Motion Pattern Recognition.arXiv preprint arXiv:2106.10102, 2021

work page arXiv 2021
[16]

Varen: Very Accurate and Realistic Equine Network

Zuffi S, Mellbin Y, Li C, Hoeschle M, Kjellstr¨om H, Polikovsky S, Hernlund E, Black MJ. Varen: Very Accurate and Realistic Equine Network. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, 5374– 5383

2024
[17]

Lasr: Learning Articulated Shape Reconstruction from a Monocular Video

Yang G, Sun D, Jampani V, Vlasic D, Cole F, Chang H, Ramanan D, Freeman WT, Liu C. Lasr: Learning Articulated Shape Reconstruction from a Monocular Video. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, 15980–15989

2021
[18]

Barc: Learning to Regress 3D Dog Shape from Images by Exploiting Breed Information

Rueegg N, Zuffi S, Schindler K, Black MJ. Barc: Learning to Regress 3D Dog Shape from Images by Exploiting Breed Information. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 3876–3884

2022
[19]

Animer: Animal Pose and shape Estimation using Family Aware Transformer

Lyu J, Zhu T, Gu Y, Lin L, Cheng P, Liu Y, Tang X, An L. Animer: Animal Pose and shape Estimation using Family Aware Transformer. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, 2025, 17486–17496

2025
[20]

Gart: Gaussian Articulated Template Models

Lei J, Wang Y, Pavlakos G, Liu L, Daniilidis K. Gart: Gaussian Articulated Template Models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, 19876–19887

2024
[21]

Banmo: Building Animatable 3D Neural Models from Many Casual Videos

Yang G, Vo M, Neverova N, Ramanan D, Vedaldi A, Joo H. Banmo: Building Animatable 3D Neural Models from Many Casual Videos. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 2863–2873

2022
[22]

Hi-Lassie: High-Fidelity Articulated Shape and Skeleton Discovery from Sparse Image Ensemble

Yao CH, Hung WC, Li Y, Rubinstein M, Yang MH, Jampani V. Hi-Lassie: High-Fidelity Articulated Shape and Skeleton Discovery from Sparse Image Ensemble. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, 4853–4862

2023
[23]

Lepard: Learning Explicit Part Discovery for 3D Articulated Shape Reconstruction.Advances in Neural Information Processing Systems, 2023, 36: 54187–54198

Liu D, Stathopoulos A, Zhangli Q, Gao Y, Metaxas D. Lepard: Learning Explicit Part Discovery for 3D Articulated Shape Reconstruction.Advances in Neural Information Processing Systems, 2023, 36: 54187–54198

2023
[24]

Artic3d: Learning Robust Articulated 3D Ahapes from Noisy Web Image Collections.Advances in Neural Information Processing Systems, 2023, 36: 48173–48184

Yao CH, Raj A, Hung WC, Rubinstein M, Li Y, Yang MH, Jam- pani V. Artic3d: Learning Robust Articulated 3D Ahapes from Noisy Web Image Collections.Advances in Neural Information Processing Systems, 2023, 36: 48173–48184

2023
[25]

Casa: Category- Agnostic Skeletal Animal Reconstruction.Advances in Neural Information Processing Systems, 2022, 35: 28559–28574

Wu Y, Chen Z, Liu S, Ren Z, Wang S. Casa: Category- Agnostic Skeletal Animal Reconstruction.Advances in Neural Information Processing Systems, 2022, 35: 28559–28574

2022
[26]

Dualpm: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction

Kaye B, Jakab T, Wu S, Ruprecht C, Vedaldi A. Dualpm: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025, 6425– 6435

2025
[27]

Diffusion Models for 3D Generation: A Survey.Computational Visual Media, 2025, 11(1): 1–28

Wang C, Peng HY, Liu YT, Gu J, Hu SM. Diffusion Models for 3D Generation: A Survey.Computational Visual Media, 2025, 11(1): 1–28

2025
[28]

What do Single-View 3D Reconstruction Networks Learn? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, 3405–3414

Tatarchenko M, Richter SR, Ranftl R, Li Z, Koltun V, Brox T. What do Single-View 3D Reconstruction Networks Learn? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, 3405–3414

2019
[29]

Single Image 3D Object Re- construction Based on Deep Learning: A Review.Multimedia Tools and Applications, 2021, 80: 463–498

Fu K, Peng J, He Q, Zhang H. Single Image 3D Object Re- construction Based on Deep Learning: A Review.Multimedia Tools and Applications, 2021, 80: 463–498

2021
[30]

Learning view Priors for Single-View 3D Reconstruction

Kato H, Harada T. Learning view Priors for Single-View 3D Reconstruction. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, 9778– 9787

2019
[31]

Self-Supervised Single-View 3D Reconstruction via Semantic 16 Y

Li X, Liu S, Kim K, De Mello S, Jampani V, Yang MH, Kautz J. Self-Supervised Single-View 3D Reconstruction via Semantic 16 Y. Wu, W. Li, B. Zhu, Y. Liu, Y. Cai, L. Liu Consistency. InEuropean Conference on Computer Vision, 2020, 677–693

2020
[32]

Single-View 3D reconstruction: A Survey of Deep Learning Methods.Computers & Graphics, 2021, 94: 164–190

Fahim G, Amin K, Zarif S. Single-View 3D reconstruction: A Survey of Deep Learning Methods.Computers & Graphics, 2021, 94: 164–190

2021
[33]

Graf: Generative Radiance Fields for 3D-Aware Image Synthesis.Advances in Neural Information Processing Systems, 2020, 33: 20154– 20166

Schwarz K, Liao Y, Niemeyer M, Geiger A. Graf: Generative Radiance Fields for 3D-Aware Image Synthesis.Advances in Neural Information Processing Systems, 2020, 33: 20154– 20166

2020
[34]

Giraffe: Representing Scenes as Com- positional Generative Neural Feature Fields

Niemeyer M, Geiger A. Giraffe: Representing Scenes as Com- positional Generative Neural Feature Fields. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, 11453–11464

2021
[35]

Stylenerf: A Style-Based 3D-Aware Generator for High-Resolution Image Synthesis

Gu J, Liu L, Wang P, Theobalt C. Stylenerf: A Style-Based 3D-Aware Generator for High-Resolution Image Synthesis. arXiv preprint arXiv:2110.08985, 2021

work page arXiv 2021
[36]

Efficient Geometry-Aware 3D Generative Adversarial Networks

Chan ER, Lin CZ, Chan MA, Nagano K, Pan B, De Mello S, Gallo O, Guibas LJ, Tremblay J, Khamis S, et al.. Efficient Geometry-Aware 3D Generative Adversarial Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 16123–16133

2022
[37]

DreamFusion: Text-to-3D using 2D Diffusion

Poole B, Jain A, Barron JT, Mildenhall B. Dreamfusion: Text- to-3D using 2D Diffusion.arXiv preprint arXiv:2209.14988, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[38]

Luciddreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching.arXiv preprint arXiv:2311.11284, 2023

Liang Y, Yang X, Lin J, Li H, Xu X, Chen Y. Luciddreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching.arXiv preprint arXiv:2311.11284, 2023

work page arXiv 2023
[39]

Magic123: One Image to High-Quality 3D Object Generation using both 2D and 3D Diffusion Priors

Qian G, Mai J, Hamdi A, Ren J, Siarohin A, Li B, Lee HY, Skorokhodov I, Wonka P, Tulyakov S, et al.. Magic123: One Image to High-Quality 3D Object Generation using both 2D and 3D Diffusion Priors. InInternational Conference on Learning Representations, volume 2024, 2024, 48142–48159

2024
[40]

Prolific- dreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation.Advances in Neural Infor- mation Processing Systems, 2024, 36

Wang Z, Lu C, Wang Y, Bao F, Li C, Su H, Zhu J. Prolific- dreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation.Advances in Neural Infor- mation Processing Systems, 2024, 36

2024
[41]

MVDream: Multi-View Diffusion for 3D Generation

Shi Y, Wang P, Ye J, Mai L, Li K, Yang X. MVDream: Multi-View Diffusion for 3D Generation. InInternational Conference on Learning Representations, volume 2024, 2024, 39838–39859

2024
[42]

Wonder3d: Single Image to 3D using Cross-Domain Diffusion.CVPR, 2024

Long X, Guo YC, Lin C, Liu Y, Dou Z, Liu L, Ma Y, Zhang SH, Habermann M, Theobalt C, et al.. Wonder3d: Single Image to 3D using Cross-Domain Diffusion.CVPR, 2024

2024
[43]

LRM: Large Reconstruction Model for Single Image to 3D

Hong Y, Zhang K, Gu J, Bi S, Zhou Y, Liu D, Liu F, Sunkavalli K, Bui T, Tan H. Lrm: Large Reconstruction Model for Single Image to 3D.arXiv preprint arXiv:2311.04400, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[44]

Pf-Lrm: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction.arXiv preprint arXiv:2311.12024, 2023

Wang P, Tan H, Bi S, Xu Y, Luan F, Sunkavalli K, Wang W, Xu Z, Zhang K. Pf-Lrm: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction.arXiv preprint arXiv:2311.12024, 2023

work page arXiv 2023
[45]

Crm: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model.arXiv preprint arXiv:2403.05034, 2024

Wang Z, Wang Y, Chen Y, Xiang C, Chen S, Yu D, Li C, Su H, Zhu J. Crm: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model.arXiv preprint arXiv:2403.05034, 2024

work page arXiv 2024
[46]

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation.arXiv preprint arXiv:2402.05054, 2024

Tang J, Chen Z, Chen X, Wang T, Zeng G, Liu Z. LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation.arXiv preprint arXiv:2402.05054, 2024

work page arXiv 2024
[47]

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

Xu J, Cheng W, Gao Y, Wang X, Gao S, Shan Y. InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-View Large Reconstruction Models.arXiv preprint arXiv:2404.07191, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[48]

TripoSR: Fast 3D Object Reconstruction from a Single Image

Tochilkin D, Pankratz D, Liu Z, Huang Z, Letts A, Li Y, Liang D, Laforte C, Jampani V, Cao YP. Triposr: Fast 3D Object Reconstruction from a Single Image.arXiv preprint arXiv:2403.02151, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[49]

One-2- 3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization.Advances in Neural Information Pro- cessing Systems, 2024, 36

Liu M, Xu C, Jin H, Chen L, Varma T M, Xu Z, Su H. One-2- 3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization.Advances in Neural Information Pro- cessing Systems, 2024, 36

2024
[50]

Clay: A Controllable Large-Scale Generative Model for Creating High-Quality 3D Assets.ACM Transactions on Graphics (TOG), 2024, 43(4): 1–20

Zhang L, Wang Z, Zhang Q, Qiu Q, Pang A, Jiang H, Yang W, Xu L, Yu J. Clay: A Controllable Large-Scale Generative Model for Creating High-Quality 3D Assets.ACM Transactions on Graphics (TOG), 2024, 43(4): 1–20

2024
[51]

Structured 3D Latents for Scalable and Versatile 3D Generation

Xiang J, Lv Z, Xu S, Deng Y, Wang R, Zhang B, Chen D, Tong X, Yang J. Structured 3D Latents for Scalable and Versatile 3D Generation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025, 21469– 21480

2025
[52]

Native and Compact Structured Latents for 3D Generation

Xiang J, Chen X, Xu S, Wang R, Lv Z, Deng Y, Zhu H, Dong Y, Zhao H, Yuan NJ, et al.. Native and Compact Structured Latents for 3D Generation.arXiv preprint arXiv:2512.14692, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[53]

Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

Zhao Z, Lai Z, Lin Q, Zhao Y, Liu H, Yang S, Feng Y, Yang M, Zhang S, Yang X, et al.. Hunyuan3d 2.0: Scaling Diffusion Models for High Resolution Textured 3d Assets Generation. arXiv preprint arXiv:2501.12202, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[54]

Ewa Volume Splatting

Zwicker M, Pfister H, Van Baar J, Gross M. Ewa Volume Splatting. InProceedings Visualization, 2001. VIS’01., 2001, 29–538

2001
[55]

GSFix3D: Diffusion-Guided Repair of Novel Views in Gaussian Splatting.arXiv preprint arXiv:2508.14717, 2025

Wei J, Leutenegger S, Schaefer S. GSFix3D: Diffusion-Guided Repair of Novel Views in Gaussian Splatting.arXiv preprint arXiv:2508.14717, 2025

work page arXiv 2025
[56]

Difix3d+: Improving 3D Reconstruc- tions with Single-Step Diffusion Models

Wu JZ, Zhang Y, Turki H, Ren X, Gao J, Shou MZ, Fidler S, Gojcic Z, Ling H. Difix3d+: Improving 3D Reconstruc- tions with Single-Step Diffusion Models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025, 26024–26035

2025
[57]

LoRA: Low-Rank Adaptation of Large Language Models, 2021

Hu EJ, Shen Y, Wallis P, Allen-Zhu Z, Li Y, Wang S, Wang L, Chen W. LoRA: Low-Rank Adaptation of Large Language Models, 2021

2021
[58]

Qwen-Image Technical Report

Wu C, Li J, Zhou J, Lin J, Gao K, Yan K, Yin Sm, Bai S, Xu X, Chen Y, et al.. Qwen-Image Technical Report.arXiv preprint arXiv:2508.02324, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[59]

Wan: Open and Advanced Large-Scale Video Generative Models

Wan T, Wang A, Ai B, Wen B, Mao C, Xie CW, Chen D, Yu F, Zhao H, Yang J, Zeng J, Wang J, Zhang J, Zhou J, Wang J, CORGI:Consistency-Aware 3D DogReconstruction from a SingleImage in the Wild 17 Chen J, Zhu K, Zhao K, Yan K, Huang L, Feng M, Zhang N, Li P, Wu P, Chu R, Feng R, Zhang S, Sun S, Fang T, Wang T, Gui T, Weng T, Shen T, Lin W, Wang W, Wang W, Zho...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[60]

DogMo: A Large-Scale Multi-View RGB-D Dataset for 4D Canine Motion Recovery

Wang Z, Chen S, Mo L, Gao X, Shen Y, Ding L, Liang W. DogMo: A Large-Scale Multi-View RGB-D Dataset for 4D Canine Motion Recovery. arXiv preprint arXiv:2510.24117, 2025

2025
[61]

Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning

Liu S, Li T, Chen W, Li H. Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, 7708–7717

2019
[62]

UV Gaussians: Joint Learning of Mesh Deformation and Gaussian Textures for Human Avatar Modeling

Jiang Y, Liao Q, Li X, Ma L, Zhang Q, Zhang C, Lu Z, Shan Y. UV Gaussians: Joint Learning of Mesh Deformation and Gaussian Textures for Human Avatar Modeling. Knowledge-Based Systems, 2025, 320: 113470

2025
[63]

Real-Time Large-Scale Deformation of Gaussian Splatting

Gao L, Yang J, Zhang BT, Sun JM, Yuan YJ, Fu H, Lai YK. Real-Time Large-Scale Deformation of Gaussian Splatting. ACM Transactions on Graphics (TOG), 2024, 43(6): 1–17

2024
[64]

As-Rigid-as-Possible Shape Manipulation

Igarashi T, Moscovich T, Hughes JF. As-Rigid-as-Possible Shape Manipulation. ACM Transactions on Graphics (TOG), 2005, 24(3): 1134–1141

2005
[65]

The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

Zhang R, Isola P, Efros AA, Shechtman E, Wang O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, 586–595

2018
[66]

GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium

Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. Advances in Neural Information Processing Systems, 2017, 30: 1–12

2017
[67]

A Feature-Enriched Completely Blind Image Quality Evaluator

Zhang L, Zhang L, Bovik AC. A Feature-Enriched Completely Blind Image Quality Evaluator. IEEE Transactions on Image Processing, 2015, 24(8): 2579–2591

2015
[68]

DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

Tang J, Ren J, Zhou H, Liu Z, Zeng G. DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation. In International Conference on Learning Representations, volume 2024, 2024, 33879–33896

2024
[69]

Ar-1-to-3: Single Image to Consistent 3D Object via Next-View Prediction

Zhang X, Zhou Y, Wang K, Wang Y, Li Z, Jiao S, Zhou D, Hou Q, Cheng MM. Ar-1-to-3: Single Image to Consistent 3D Object via Next-View Prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, 26273–26283

2025
[70]

Stable Virtual Camera: Generative View Synthesis with Diffusion Models

Zhou J, Gao H, Voleti V, Vasishta A, Yao CH, Boss M, Torr P, Rupprecht C, Jampani V. Stable Virtual Camera: Generative View Synthesis with Diffusion Models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, 12405–12414

2025
[71]

Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation

Huang T, Zheng W, Wang T, Liu Y, Wang Z, Wu J, Jiang J, Li H, Lau RW, Zuo W, Guo C. Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation. arXiv preprint arXiv:2506.04225, 2025

2025
