
arxiv: 2607.00321 · v1 · pith:DWR2R3KEnew · submitted 2026-07-01 · 💻 cs.CV

CORGI: Consistency-Aware 3D Dog Reconstruction from a Single Image in the Wild

Pith reviewed 2026-07-02 15:24 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D reconstruction · single image · dog modeling · consistency-aware · deformable Gaussian splatting · generative repair · pose normalization

The pith

CORGI reconstructs geometrically accurate and animatable 3D dog models from a single unconstrained image without any 3D supervision.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework called CORGI that builds high-fidelity 3D dog models from one in-the-wild photo. It overcomes the lack of multi-view data and generative inconsistencies through three modules that normalize poses, model per-view errors on a shape prior, and repair distortions in a self-supervised way. A sympathetic reader would care because this removes the need for expensive 3D training data while producing models ready for animation and other uses. The work claims the approach works across many dog breeds and yields coherent results.

Core claim

CORGI eliminates the need for 3D supervision by combining three components: Canonical-Driven Orbital Generation, which uses specialized Canonical and Orbit LoRAs to normalize arbitrary input poses and synthesize reliable 360-degree video observations; Consistency-aware Deformable 3DGS, which anchors on a D-SMAL prior and learns vertex displacements via neural deformation fields that capture generative errors; and a self-supervised Deformation-Conditioned Generative Repair module, which eliminates structural distortions and recovers high-frequency details.

What carries the argument

The three-component CORGI pipeline: Canonical-Driven Orbital Generation (CDOG) for pose normalization and 360-degree synthesis via LoRAs, Consistency-aware Deformable 3DGS (CA-3DGS) for explicit error modeling with neural fields, and Deformation-Conditioned Generative Repair (DCGR) for self-supervised detail recovery.
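
As a reading aid, the stage ordering described above can be written down as a small data-flow table. This is only a sketch of how the abstract's three components appear to hand data to each other. The `Stage` class, the `check_chain` helper, and the wording of each input and output are illustrative assumptions, and the input assigned to DCGR in particular is inferred from its name rather than stated in the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    """One pipeline stage: what it takes in and what it hands on (hypothetical labels)."""
    name: str
    consumes: str
    produces: str


# Stage order follows the abstract; the input/output wording is an assumption.
PIPELINE: List[Stage] = [
    Stage("CDOG", "single in-the-wild image", "360-degree video observations"),
    Stage("CA-3DGS", "360-degree video observations",
          "deformed 3D Gaussians anchored on a D-SMAL prior"),
    Stage("DCGR", "deformed 3D Gaussians anchored on a D-SMAL prior",
          "repaired, animatable 3D model"),
]


def check_chain(stages: List[Stage]) -> bool:
    """Return True when every stage consumes exactly what its predecessor produces."""
    return all(a.produces == b.consumes for a, b in zip(stages, stages[1:]))


if __name__ == "__main__":
    for stage in PIPELINE:
        print(f"{stage.name}: {stage.consumes} -> {stage.produces}")
    print("chain is consistent:", check_chain(PIPELINE))
```

Seen this way, any error CDOG introduces is an input to both later stages, which is why the load-bearing premise concerns CDOG.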

If this is right

  • The output models are geometrically accurate and fully animatable for downstream applications.
  • The method generalizes across diverse dog breeds from unconstrained inputs.
  • No 3D supervision is required at any stage of training or inference.
  • High-frequency details are recovered while structural distortions are removed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The consistency mechanisms could extend to single-image reconstruction of other articulated animals that lack 3D datasets.
  • Explicit modeling of generative errors via deformation fields may reduce the data requirements for similar 3D tasks on non-rigid objects.
  • Self-supervised repair after view synthesis might improve outputs in other pipelines that rely on generative multi-view creation.

Load-bearing premise

The Canonical-Driven Orbital Generation strategy with specialized LoRAs can normalize arbitrary input poses and synthesize reliable 360-degree observations without introducing uncorrectable generative inconsistencies.

What would settle it

A test set of single dog images where the generated 360-degree videos contain persistent view inconsistencies that produce visibly distorted vertex positions in the final 3D models after deformation fields and repair are applied.

Original abstract

Reconstructing high-fidelity 3D models of highly articulated animals, such as dogs, from a single in-the-wild image remains a formidable challenge. In this paper, we introduce CORGI, a novel framework for consistency-aware 3D dog reconstruction from a single unconstrained image that completely eliminates the need for 3D supervision. To overcome generative inconsistencies and the lack of multi-view capture, our pipeline introduces three core components. First, we propose a Canonical-Driven Orbital Generation (CDOG) strategy, utilizing specialized Canonical and Orbit LoRAs to normalize arbitrary input poses and synthesize reliable 360-degree video observations. Second, we design a Consistency-aware Deformable 3DGS (CA-3DGS) module that anchors on a D-SMAL prior, explicitly modeling per-view generative errors through dedicated neural deformation fields to learn accurate vertex-level displacements. Finally, to eliminate structural distortions and recover high-frequency details, we introduce a self-supervised Deformation-Conditioned Generative Repair (DCGR) module. Extensive experiments demonstrate that CORGI achieves state-of-the-art performance, generalizing seamlessly across diverse dog breeds to produce geometrically accurate, visually coherent, and fully animatable 3D assets ready for downstream applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces CORGI, a framework for 3D dog reconstruction from a single in-the-wild image that eliminates 3D supervision. It proposes three components: Canonical-Driven Orbital Generation (CDOG) using specialized Canonical and Orbit LoRAs to normalize poses and synthesize 360-degree observations; Consistency-aware Deformable 3DGS (CA-3DGS) that anchors on a D-SMAL prior and models per-view generative errors via neural deformation fields for vertex displacements; and self-supervised Deformation-Conditioned Generative Repair (DCGR) to fix structural distortions and recover high-frequency details. The manuscript claims state-of-the-art performance with generalization across dog breeds, yielding geometrically accurate, visually coherent, and fully animatable 3D assets.

Significance. If the central claims hold, the work would be significant for single-view 3D reconstruction of highly articulated animals, as it removes reliance on 3D ground truth while addressing generative inconsistencies through explicit modeling and repair stages. The integration of LoRA-based view synthesis with deformable 3DGS and self-supervised repair offers a practical path to animatable assets from unconstrained images.

major comments (2)
  1. [CDOG strategy (Abstract and method description)] The no-3D-supervision claim rests on CDOG producing reliable 360° observations that CA-3DGS can anchor to D-SMAL without uncorrectable artifacts. The manuscript provides no quantitative bound on residual view inconsistency after CDOG, nor an ablation isolating CDOG's contribution to final geometric error (see skeptic note on pose-dependent artifacts or texture drift not captured by the neural deformation fields).
  2. [CA-3DGS module] CA-3DGS models per-view generative errors through dedicated neural deformation fields, but it is unclear how these fields are trained or regularized to ensure the vertex displacements remain consistent across the synthesized views without introducing bias from the generative step.
minor comments (2)
  1. [Abstract] The abstract states that 'extensive experiments demonstrate' state-of-the-art performance but provides no specific metrics, baselines, or dataset details; these should be summarized with quantitative results in the abstract or introduction.
  2. [Method sections] Notation for D-SMAL, DCGR, and the neural deformation fields should be defined at first use with explicit equations for the deformation fields and loss terms.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the work's significance. We address each major comment point by point below, indicating where revisions will be made to strengthen the manuscript.

Point-by-point responses
  1. Referee: [CDOG strategy (Abstract and method description)] The no-3D-supervision claim rests on CDOG producing reliable 360° observations that CA-3DGS can anchor to D-SMAL without uncorrectable artifacts. The manuscript provides no quantitative bound on residual view inconsistency after CDOG, nor an ablation isolating CDOG's contribution to final geometric error (see skeptic note on pose-dependent artifacts or texture drift not captured by the neural deformation fields).

    Authors: We acknowledge that the current manuscript lacks an explicit quantitative bound on residual view inconsistency after CDOG and does not include an ablation isolating its contribution to geometric error. The overall performance gains and qualitative results support the approach, but to directly address this point we will add both a consistency metric (e.g., average cross-view PSNR on synthesized observations) and a dedicated ablation study measuring the impact of CDOG on final reconstruction error in the revised version. revision: yes

  2. Referee: [CA-3DGS module] CA-3DGS models per-view generative errors through dedicated neural deformation fields, but it is unclear how these fields are trained or regularized to ensure the vertex displacements remain consistent across the synthesized views without introducing bias from the generative step.

    Authors: The neural deformation fields are optimized via a self-supervised objective combining photometric rendering loss across the CDOG-synthesized views with an explicit cross-view consistency regularizer on the predicted displacements. This procedure is described in Section 3.2, but we agree the description can be made clearer regarding regularization details and bias mitigation. We will expand the method section with the precise loss formulation and training schedule in the revision. revision: yes
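
The two quantities promised in the responses above, a PSNR-based consistency metric and a photometric loss with a cross-view regularizer on displacements, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' formulation. The sketch assumes that each synthesized frame is compared against a rendering of the reconstruction from the same camera, that the photometric term is a mean absolute difference, and that the regularizer penalizes the variance of per-view vertex offsets around their cross-view mean. The function names and the `weight` value are hypothetical.

```python
import math
from typing import Sequence

Image = Sequence[Sequence[float]]    # grayscale image, pixel values in [0, 1]
Offsets = Sequence[Sequence[float]]  # one (dx, dy, dz) row per vertex


def psnr(a: Image, b: Image, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    sq = [(pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    mse = sum(sq) / len(sq)
    return math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)


def mean_psnr(renders: Sequence[Image], targets: Sequence[Image]) -> float:
    """Average PSNR over paired views (assumed pairing: render vs. synthesized frame)."""
    scores = [psnr(r, t) for r, t in zip(renders, targets)]
    return sum(scores) / len(scores)


def photometric_l1(renders: Sequence[Image], targets: Sequence[Image]) -> float:
    """Mean absolute pixel difference over all paired views."""
    diffs = [abs(pr - pt)
             for r, t in zip(renders, targets)
             for rr, rt in zip(r, t)
             for pr, pt in zip(rr, rt)]
    return sum(diffs) / len(diffs)


def displacement_consistency(per_view: Sequence[Offsets]) -> float:
    """Mean squared deviation of each view's vertex offsets from the cross-view mean."""
    n_views = len(per_view)
    total, count = 0.0, 0
    for vertex in range(len(per_view[0])):
        for axis in range(3):
            values = [view[vertex][axis] for view in per_view]
            mean = sum(values) / n_views
            total += sum((x - mean) ** 2 for x in values)
            count += n_views
    return total / count


def total_loss(renders: Sequence[Image], targets: Sequence[Image],
               per_view: Sequence[Offsets], weight: float = 0.1) -> float:
    """Photometric term plus a weighted consistency regularizer (weight is illustrative)."""
    return photometric_l1(renders, targets) + weight * displacement_consistency(per_view)
```

Note that `mean_psnr` returns infinity if any pair is pixel-identical, so a real evaluation would need to cap or report such pairs separately. Whether a variance-style regularizer would remove the bias the referee worries about depends on how the per-view error fields are parameterized, which the abstract does not specify.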

Circularity Check

0 steps flagged

No significant circularity; derivation relies on independent generative priors and external anchors

Full rationale

The pipeline introduces CDOG (LoRA-based view synthesis), CA-3DGS (deformation fields anchored to D-SMAL prior), and DCGR (self-supervised repair) as sequential modules. No equations or self-citations are presented that reduce a claimed prediction or consistency model to a fitted quantity from the same generative step by construction. The no-3D-supervision claim rests on external generative models and a fixed D-SMAL prior rather than internal re-fitting of the target output. This is the common case of a self-contained method paper whose central claims remain independently falsifiable against held-out images or alternative generators.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities can be identified from the provided text. The method implicitly relies on pre-trained generative models and the D-SMAL prior, but details are absent.

pith-pipeline@v0.9.1-grok · 5765 in / 1282 out tokens · 31463 ms · 2026-07-02T15:24:23.271323+00:00 · methodology

