Restore3D: Breathing Life into Broken Objects with Shape and Texture Restoration

Xiaolong Shen; Yi Yang; Zongxin Yang

arxiv: 2607.00522 · v1 · pith:VSW3PIOPnew · submitted 2026-07-01 · 💻 cs.CV

Pith reviewed 2026-07-02 14:39 UTC · model grok-4.3

classification 💻 cs.CV

keywords: 3D object restoration, shape and texture restoration, multi-view images, textured mesh reconstruction, damaged objects, 3D inpainting, cultural heritage, coarse-to-fine reconstruction

The pith

Restore3D restores both shape and texture of broken 3D objects from multi-view images using a mask self-perceiver and data synthesis pipeline.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Restore3D to restore incomplete or damaged 3D objects by recovering both geometry and surface textures at once. Prior methods typically complete only shapes, ignore textures, or fail on complex objects. Restore3D creates paired training data automatically from large 3D datasets and runs a multi-view model whose Mask Self-Perceiver with Depth-Aware Mask Rectifier produces rectified masks. These masks direct image integration that keeps observed patterns while refining new regions for high-resolution and view-consistent results. The refined images feed a coarse-to-fine reconstruction that yields detailed textured meshes, with experiments showing gains over inpainting, completion, and reconstruction baselines.

Core claim

Restore3D is a framework that simultaneously restores shape and texture of broken objects from multi-view images. An automated pipeline synthesizes paired incomplete-complete samples from large-scale 3D datasets to overcome limited training data. Its multi-view model uses a Mask Self-Perceiver module with a Depth-Aware Mask Rectifier whose learned rectified masks guide image integration and enhancement, retaining observed shape and texture while refining generated areas and overcoming low-resolution limits of the base model. Refined multi-view images then support a coarse-to-fine reconstruction that recovers detailed textured 3D meshes, producing higher-quality results than representative baselines.

What carries the argument

Mask Self-Perceiver module with Depth-Aware Mask Rectifier that learns rectified masks to guide image integration and enhancement while retaining observed patterns.

If this is right

Higher-quality multi-view restoration on both synthetic and real broken-object benchmarks.
Improved textured-mesh reconstruction compared with inpainting, completion, and reconstruction baselines.
Better handling of relatively complex and diverse objects.
Direct applicability to cultural heritage preservation, occluded object reconstruction, and artistic design.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The data synthesis pipeline could be adapted for other 3D modalities such as point clouds if similar paired data can be generated.
Adding temporal consistency terms might allow extension to video or dynamic scene restoration.
The mask rectifier could be tested as a plug-in module inside existing multi-view diffusion models for broader use.
Fine-tuning on domain-specific real broken-object photographs might further close the synthetic-to-real gap.

Load-bearing premise

The automated data generation pipeline that synthesizes paired incomplete-complete samples from large-scale 3D datasets produces training data whose distribution matches real-world broken objects sufficiently for the model to generalize.
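To make the premise concrete, here is a minimal Python sketch of how one paired incomplete-complete sample could be synthesized from a complete shape. It is an illustration under stated assumptions, not the paper's pipeline, which this summary does not specify: the random plane cut, the function name `synthesize_pair`, and the `keep_fraction` parameter are all hypothetical.

```python
import random


def synthesize_pair(points, keep_fraction=0.7, seed=0):
    """Toy stand-in for a break simulator: cut a point set with a random plane.

    points: list of (x, y, z) tuples sampled from a complete object.
    Returns (incomplete, complete); `incomplete` keeps roughly
    `keep_fraction` of the points, all lying on one side of the plane.
    """
    rng = random.Random(seed)
    # Random plane normal (hypothetical choice; real fractures are not planar).
    normal = [rng.gauss(0.0, 1.0) for _ in range(3)]
    # Projection of every point onto the normal (a scaled signed distance).
    proj = [sum(n * c for n, c in zip(normal, p)) for p in points]
    # Place the plane so that about keep_fraction of the points survive.
    cutoff_index = max(1, int(len(points) * keep_fraction))
    threshold = sorted(proj)[cutoff_index - 1]
    incomplete = [p for p, d in zip(points, proj) if d <= threshold]
    return incomplete, list(points)


# Hypothetical "complete object": random points in the unit cube.
rng = random.Random(42)
complete = [(rng.random(), rng.random(), rng.random()) for _ in range(1000)]
incomplete, target = synthesize_pair(complete, keep_fraction=0.7)
print(len(incomplete), len(target))
```

A planar cut produces much cleaner break surfaces than real damage, which is exactly the kind of synthetic-to-real gap the premise depends on.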

What would settle it

Apply Restore3D and the baselines to a collection of real broken objects that have known complete ground-truth versions. If Restore3D shows no improvement over the baselines on restoration-quality and mesh-accuracy metrics, the central claim fails.
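One ingredient of such a check is a per-object metric comparison. The sketch below computes PSNR, one of the metrics reported in the paper's tables, and counts how often a method beats a baseline. The pixel values are made up for illustration and the helper `improvement_rate` is a hypothetical name, not something taken from the paper.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def improvement_rate(method_scores, baseline_scores):
    """Fraction of objects on which the method scores higher than the baseline."""
    wins = sum(m > b for m, b in zip(method_scores, baseline_scores))
    return wins / len(method_scores)


# Hypothetical flattened images with values in [0, 1].
gt = [0.0, 0.5, 1.0, 0.5]
restored = [0.0, 0.5, 0.9, 0.5]
baseline = [0.2, 0.5, 0.6, 0.5]
print(psnr(restored, gt), psnr(baseline, gt))
```

If the improvement rate on real broken objects stays near one half, or the mean gain is within noise, the claim of an advantage over the baselines would not be supported.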

Figures

Figures reproduced from arXiv: 2607.00522 by Xiaolong Shen, Yi Yang, Zongxin Yang.

**Figure 1.** Restore3D targets preservation-aware shape-and-texture restoration of broken object-level 3D assets, producing plausible textured meshes for synthetic, simulated, and real broken objects.

X. Shen, Z. Yang, and Y. Yang are with ReLER, CCAI, Zhejiang University, Hangzhou, China (Email: {sxlongcs, yangzongxin, yangyics}@zju.edu.cn). Y. Yang is the corresponding author.

**Figure 2.** The importance of masks. In single-view inpainting, user-provided masks define the regions requiring inpainting. However, in a multi-view context, manually creating consistent masks across all views is impractical. Directly inverting object masks to serve as inpainting masks inevitably causes issues (see Prob. 1 & 3). Moreover, manually adjusting masks based on depth information (see Prob. 2) is labor-inte…

**Figure 3.** Multi-view Image Inpainting. We carefully design a mask self-perceiver based on a multi-view diffusion model that composes the image and text features with a spatial mask predicted by a depth-aware mask rectifier, so the model can automatically perceive the missing part and generate it while preserving the original parts.

**Figure 4.** Image Integration and Enhancement Pipeline using Rectified Masks.

$$f_n = (1 - M_r)\, s_t + M_r\, s_r \tag{4}$$

Conv denotes convolution layers, and CBAM is the Convolutional Block Attention Module [85].

Training objectives. Given training samples, including incomplete images $I$, depth images $D$, incomplete masks $M$, text prompts $P$, and camera embedding $C$, the multi-view inpainting loss can be formulated as follows: $\mathcal{L} = \min_{\theta} \mathbb{E}_{z,\,\epsilon \sim \mathcal{N}(0, I),\,t} \| \epsilon$ …
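Equation (4) in the excerpt above is a per-element convex blend controlled by the rectified mask. The following Python sketch shows that arithmetic on small 2D lists. It assumes `s_t` and `s_r` are two equally sized feature or pixel maps and that the mask takes values in [0, 1]; the excerpt does not define these symbols further, so the function name `blend_with_mask` and the sample values are hypothetical.

```python
def blend_with_mask(s_t, s_r, m_r):
    """Per-element blend f_n = (1 - M_r) * s_t + M_r * s_r, as in Eq. (4).

    s_t, s_r: equally sized 2D lists (assumed to be two feature or pixel maps).
    m_r: rectified mask of the same shape with values in [0, 1].
    """
    return [
        [(1.0 - m) * t + m * r for t, r, m in zip(row_t, row_r, row_m)]
        for row_t, row_r, row_m in zip(s_t, s_r, m_r)
    ]


# Toy 2x2 inputs: the mask selects s_t at 0, s_r at 1, and mixes in between.
s_t = [[1.0, 1.0], [1.0, 1.0]]
s_r = [[0.0, 0.0], [0.0, 0.0]]
m_r = [[0.0, 1.0], [0.5, 0.25]]
f_n = blend_with_mask(s_t, s_r, m_r)
print(f_n)
```

Where the mask is 0 the output copies `s_t` unchanged and where it is 1 it copies `s_r`, so a soft mask gives a smooth transition between the two sources.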

**Figure 5.** Geometry and Texture Refinement. We separately refine the geometry and texture of the coarse results inferred by LRMs [19].

… of ControlNet-Tile to enhance the images. 3. Image harmonizing using ControlNet-Tile with a blending strategy. Directly using ControlNet-Tile will alter the original pattern and destroy the integration step. Inspired by previous works [27], [87], we incorporate a mask blending techni…

**Figure 6.** Visual Comparison with Inpainting Methods.

TABLE 1: Comparison with Inpainting Methods. △ means using Depth-Anything [14] to obtain the depth images. ♣ means using MV-adapter [88]. ♡ means using our model’s predicted masks as inpainting masks.

| Method | PSNR ↑ | LPIPS ↓ | FID ↓ | SSIM ↑ |
| --- | --- | --- | --- | --- |
| Repaint | 10.55 | 0.31 | 69.57 | 0.76 |
| SD | 12.58 | 0.22 | 61.15 | 0.83 |
| ControlNet | 10.66 | 0.30 | 69.91 | 0.76 |
| Pix2gestalt ♣ | 16.43 | 0.21 | 75.08 | 0.86 |
| MVInpa… | … | … | … | … |

**Figure 7.** Visual Comparison with Reconstruction Models.

TABLE 3: Generalization Ability on Real-world Dataset [25].

| Method | PSNR ↑ | LPIPS ↓ | SSIM ↑ |
| --- | --- | --- | --- |
| SD | 12.59 | 0.72 | 0.40 |
| ControlNet | 15.63 | 0.55 | 0.56 |
| Nerfiller | 18.94 | 0.52 | 0.81 |
| Instant3dit | 23.11 | 0.14 | 0.96 |
| Ours | 26.91 | 0.09 | 0.97 |

TABLE 4: Generalization Ability on Physically-simulated Dataset [24].

| Method | PSNR ↑ | LPIPS ↓ | SSIM ↑ |
| --- | --- | --- | --- |
| SD | 12.02 | 0.74 | 0.53 |
| ControlNet | 14.50 | 0.59 | 0.71 |
| NeRFille… | … | … | … |

**Figure 8.** Visualization of Ablation Studies. Panels: Input; a. Bilinear; b. Real-ESRGAN; c. ControlNet-Tile (low strength and high strength); d. Real-ESRGAN + Image integration; e. Real-ESRGAN + Image integration + ControlNet-Tile (w/ mask); f. Real-ESRGAN + Image integration + ControlNet-Tile (w/o mask).

**Figure 9.** The Effects of Image Integration and Enhancement.

TABLE 7: Ablation Studies of Image Integration and Enhancement (256px→1024px). RE: Real-ESRGAN, II: Image Integration, CT: ControlNet-tile, MB: Mask Blending. Variant (e) is our default.

| # | RE | II | CT | MB | PSNR ↑ | LPIPS ↓ | SSIM ↑ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| a | ✗ | ✗ | ✗ | ✗ | 26.83 | 0.10 | 0.97 |
| b | ✓ | ✗ | ✗ | ✗ | 26.59 | 0.08 | 0.97 |
| c | ✗ | ✗ | ✓ | ✗ | 26.56 | 0.08 | 0.96 |
| d | ✓ | ✓ | ✗ | ✗ | 27.13 | 0.06 | 0.97 |
| e | ✓ | ✓ | ✓ | ✓ | 26.94 | 0.06 | 0.97 |
| f | ✓ | ✓ | … | … | … | … | … |

**Figure 11.** Visualization under Different Lighting Conditions.

TABLE 8: Different Lighting Settings.

| Setting | PSNR ↑ | LPIPS ↓ | SSIM ↑ |
| --- | --- | --- | --- |
| Top area light | 25.18 | 0.06 | 0.95 |
| Multiple area lights | 25.50 | 0.06 | 0.95 |
| Environment light | 25.28 | 0.06 | 0.95 |

TABLE 9: More Generated Views.

| Setting | PSNR ↑ | LPIPS ↓ | FID ↓ | SSIM ↑ |
| --- | --- | --- | --- | --- |
| 4-view | 25.50 | 0.06 | 31.82 | 0.95 |
| 6-view | 25.00 | 0.07 | 24.70 | 0.95 |
| 8-view | 25.17 | 0.07 | 20.49 | 0.95 |

TABLE 10: View-consistency Scorin…

**Figure 12.** Different Colors on Broken Planes. Extracted panel labels: Input, SDFusion, Ours.

**Figure 13.** Visual Comparison with SDFusion [1]. Extracted panel labels: Input, Generated Images, 3DEnhancer, Ours.

**Figure 14.** Visual Comparison with 3DEnhancer [37]. Extracted panel labels: Input, DiffEdit, Ours, Input + Ours, GT.

**Figure 15.** Generated Mask Quality.

… verse, OmniObject3D) and unseen datasets (Breaking Bad Dataset, Fantastic Breaks) in …

**Figure 16.** Broken Rate vs. Performance.

… Strength leads to inconsistent or misaligned results that do not match the preserved (visible) regions. In contrast, moderate Strength values (e.g., 0.25) reliably maintain alignment while enhancing detail quality.

6 CONCLUSION. In this paper, we propose a novel framework named Restore3D, consisting of multi-view image inpainting and reconstruction, to simultaneously complete…

Original abstract

Restoring incomplete or damaged 3D objects is crucial for cultural heritage preservation, occluded object reconstruction, and artistic design. Existing methods primarily focus on geometric completion, often neglecting texture restoration and struggling with relatively complex and diverse objects. We introduce Restore3D, a novel framework that simultaneously restores both the shape and texture of broken objects using multi-view images. To address limited training data, we develop an automated data generation pipeline that synthesizes paired incomplete-complete samples from large-scale 3D datasets. Central to Restore3D is a multi-view model, enhanced by a carefully designed Mask Self-Perceiver module with a Depth-Aware Mask Rectifier. The rectified masks learned by the self-perceiver guide an image integration and enhancement phase, helping retain observed shape and texture patterns while refining the generated regions and mitigating the low-resolution limitations of the base model, yielding high-resolution, semantically coherent, and view-consistent multi-view images. A coarse-to-fine reconstruction strategy is then employed to recover detailed textured 3D meshes from refined multi-view images. Experiments on synthetic and real broken-object benchmarks show that Restore3D improves multi-view restoration quality and textured-mesh reconstruction over representative inpainting, completion, and reconstruction baselines in the evaluated settings. Project Page: restore3dx.github.io

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Restore3D adds a Mask Self-Perceiver and synthetic data pipeline to multi-view shape-plus-texture restoration, but the real-benchmark gains hinge on unverified similarity between simulated and actual breaks.

The core contribution is a multi-view pipeline that restores both geometry and texture on broken objects. It uses an automated synthesis step to create training pairs from existing 3D datasets, then a Mask Self-Perceiver with Depth-Aware Mask Rectifier to clean up observed regions before feeding refined images into a coarse-to-fine mesh stage. That combination is new relative to separate inpainting or completion baselines, and the rectifier step looks like a sensible way to keep view consistency while pushing resolution.

The paper does the practical work of showing the full stack on both synthetic and real test sets, which is more than most abstract-only claims deliver. The automated data route is a reasonable workaround for scarce real broken-object scans.

The soft spot is exactly the one the stress-test flags: the transfer from synthetic fractures to real ones. Nothing in the abstract or the described experiments quantifies how close the generated break geometry, edge statistics, or texture discontinuities are to real damage. If the full paper has distribution comparisons or an ablation that swaps real versus synthetic masks, that would tighten the claim; otherwise the real-benchmark improvements rest on an assumption that is not yet shown to hold. Minor issues include the usual lack of error bars or failure-case analysis, but those are secondary.

This is for groups already working on multi-view 3D reconstruction or heritage scanning who need a concrete baseline for joint shape-texture repair. A reader looking for a ready-to-adapt pipeline would get value; someone chasing a new theoretical angle would not.

It deserves peer review. The problem is well-motivated, the architecture is described clearly enough to reproduce, and the experiments are at least on the right benchmarks. A referee can press on the data-fidelity question without the paper being rejected outright.

Referee Report

2 major / 2 minor

Summary. The manuscript presents Restore3D, a novel framework for simultaneous shape and texture restoration of broken 3D objects using multi-view images. It addresses data scarcity with an automated pipeline generating paired incomplete-complete samples from large-scale 3D datasets. The core is a multi-view model with a Mask Self-Perceiver module incorporating a Depth-Aware Mask Rectifier to guide image integration and enhancement for high-resolution, consistent outputs. A coarse-to-fine reconstruction then produces detailed textured meshes. The paper claims superior performance over inpainting, completion, and reconstruction baselines on both synthetic and real broken-object benchmarks.

Significance. If the experimental results are robust, Restore3D could have significant impact in fields requiring 3D object restoration, such as cultural heritage preservation and occluded reconstruction. The approach of jointly handling shape and texture via multi-view consistency is a strength. The automated synthetic data pipeline is a valuable contribution for training such models. The design of the Mask Self-Perceiver and Depth-Aware Mask Rectifier offers a specific mechanism for retaining observed patterns while refining generated regions. These elements, if validated, position the work as an advance over separate inpainting and completion methods.

major comments (2)

Abstract: The abstract states that Restore3D 'improves multi-view restoration quality and textured-mesh reconstruction over representative ... baselines in the evaluated settings' but provides no quantitative metrics, specific numbers, error analysis, or dataset details. This absence makes it difficult to evaluate the magnitude of the claimed improvements, which is load-bearing for the central experimental claim.
Data generation pipeline (methods section): The generalization to real broken-object benchmarks depends on the automated synthetic data pipeline producing break geometries, surface statistics, and texture discontinuities that match real-world distributions. No quantitative validation, such as distribution distances or ablations comparing real vs. synthetic break realism, is mentioned. This is the weakest link in the argument from training to real-benchmark results.
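As an illustration of the kind of distribution distance the second major comment asks for, the sketch below computes a one-dimensional Wasserstein-1 distance between two equal-size samples of a scalar break statistic. The statistic (fraction of surface removed per object) and the sample values are hypothetical; nothing here comes from the paper.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1D empirical samples.

    For equal-size samples with uniform weights this is the mean absolute
    difference between the sorted values.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Hypothetical per-object statistic, e.g. the fraction of surface removed.
synthetic_breaks = [0.10, 0.20, 0.30, 0.40]
real_breaks = [0.15, 0.25, 0.35, 0.45]
print(wasserstein_1d(synthetic_breaks, real_breaks))
```

A small distance on several such statistics (break area, edge roughness, texture discontinuity) would support the transfer assumption; a large one would locate where the synthetic breaks differ from real ones.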

minor comments (2)

Abstract: The project page URL is given but could be formatted as a hyperlink for accessibility.
Throughout: Some module names like 'Mask Self-Perceiver' and 'Depth-Aware Mask Rectifier' could benefit from a dedicated notation table or figure for clarity.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight opportunities to strengthen the presentation of results and the justification for our data pipeline. We address each major comment below.

Referee: Abstract: The abstract states that Restore3D 'improves multi-view restoration quality and textured-mesh reconstruction over representative ... baselines in the evaluated settings' but provides no quantitative metrics, specific numbers, error analysis, or dataset details. This absence makes it difficult to evaluate the magnitude of the claimed improvements, which is load-bearing for the central experimental claim.

Authors: We agree that the abstract would benefit from quantitative support. In the revised version we will insert concise numerical results (e.g., average PSNR/SSIM gains on the synthetic benchmark and Chamfer-distance reductions on mesh reconstruction) together with the names of the primary datasets used, while remaining within the word limit. (Revision: yes.)

Referee: Data generation pipeline (methods section): The generalization to real broken-object benchmarks depends on the automated synthetic data pipeline producing break geometries, surface statistics, and texture discontinuities that match real-world distributions. No quantitative validation, such as distribution distances or ablations comparing real vs. synthetic break realism, is mentioned. This is the weakest link in the argument from training to real-benchmark results.

Authors: We acknowledge that explicit distributional comparisons (e.g., Wasserstein distances on break geometry or texture statistics) between synthetic and real breaks are not reported. The pipeline follows physically motivated fracture rules drawn from prior graphics literature, and the competitive results on the real broken-object benchmark provide indirect evidence of transfer. We will expand the methods section with additional qualitative examples and a brief discussion of the design assumptions; a full statistical validation would require new experiments that lie outside the present study. (Revision: partial.)
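The first response mentions Chamfer-distance reductions for mesh reconstruction. For reference, the sketch below shows one common form of the metric on two small point sets. It uses unsquared nearest-neighbour distances averaged in each direction, which is an assumption: the convention used in the paper is not stated in this summary, and the sample points are hypothetical.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets (lists of tuples).

    Uses the mean nearest-neighbour Euclidean distance in each direction;
    some papers use squared distances, so values are convention-dependent.
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


# Hypothetical points sampled from a ground-truth mesh and a restored mesh.
gt_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)]
print(chamfer_distance(pred_points, gt_points))
```

This brute-force version is quadratic in the number of points, so it suits small samples; larger evaluations typically rely on a spatial index for the nearest-neighbour search.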

Circularity Check

0 steps flagged

No circularity: experimental claims rest on external benchmarks, not self-referential fits or derivations.

The paper presents a new framework (Restore3D) with an automated synthetic data pipeline and a multi-view restoration architecture (Mask Self-Perceiver + Depth-Aware Mask Rectifier + coarse-to-fine reconstruction). All load-bearing claims are validated via comparative experiments on synthetic and real benchmarks against external baselines. No equations, fitted parameters renamed as predictions, self-citation chains, or ansatzes are present in the provided text. The synthetic-to-real transfer is an empirical assumption, not a definitional reduction. This is the standard non-circular case for a methods paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the data generation pipeline and mask rectifier are described at conceptual level only.

pith-pipeline@v0.9.1-grok · 5757 in / 1034 out tokens · 22754 ms · 2026-07-02T14:39:59.070645+00:00 · methodology

Reference graph

Works this paper leans on

97 extracted references · 32 canonical work pages · 13 internal anchors

[1] Y.-C. Cheng, H.-Y. Lee, S. Tulyakov, A. G. Schwing, and L.-Y. Gui, “Sdfusion: Multimodal 3d shape completion, reconstruction, and generation,” in CVPR, 2023.

[2] B. Poole, A. Jain, J. T. Barron, and B. Mildenhall, “Dreamfusion: Text-to-3d using 2d diffusion,” arXiv preprint arXiv:2209.14988, 2022.

[3] C.-H. Lin, J. Gao, L. Tang, T. Takikawa, X. Zeng, X. Huang, K. Kreis, S. Fidler, M.-Y. Liu, and T.-Y. Lin, “Magic3d: High-resolution text-to-3d content creation,” in CVPR, 2023.

[4] J. Li, H. Tan, K. Zhang, Z. Xu, F. Luan, Y. Xu, Y. Hong, K. Sunkavalli, G. Shakhnarovich, and S. Bi, “Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model,” 2023.

[5] J. Tang, Z. Chen, X. Chen, T. Wang, G. Zeng, and Z. Liu, “Lgm: Large multi-view gaussian model for high-resolution 3d content creation,” 2024.

[6] A. Dai, C. Ruizhongtai Qi, and M. Nießner, “Shape completion using 3d-encoder-predictor cnns and shape synthesis,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 5868–5877.

[7] Y. Rao, Y. Nie, and A. Dai, “Patchcomplete: Learning multi-resolution patch priors for 3d shape completion on unseen categories,” 2022.

[8] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su et al., “Shapenet: An information-rich 3d model repository,” arXiv preprint arXiv:1512.03012, 2015.

[9] Y. Hong, K. Zhang, J. Gu, S. Bi, Y. Zhou, D. Liu, F. Liu, K. Sunkavalli, T. Bui, and H. Tan, “Lrm: Large reconstruction model for single image to 3d,” arXiv preprint arXiv:2311.04400, 2023.

[10] Y. Shi, P. Wang, J. Ye, M. Long, K. Li, and X. Yang, “Mvdream: Multi-view diffusion for 3d generation,” arXiv preprint arXiv:2308.16512, 2023.

[11] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” 2022.

[12] M. Oquab, T. Darcet, T. Moutakanni, H. V. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby, R. Howes, P.-Y. Huang, H. Xu, V. Sharma, S.-W. Li, W. Galuba, M. Rabbat, M. Assran, N. Ballas, G. Synnaeve, I. Misra, H. Jegou, J. Mairal, P. Labatut, A. Joulin, and P. Bojanowski, “Dinov2: Learning robust visual features without supervision,” ... 2023.

[13] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo, P. Dollár, and R. Girshick, “Segment anything,” arXiv:2304.02643, 2023.

[14] L. Yang, B. Kang, Z. Huang, Z. Zhao, X. Xu, J. Feng, and H. Zhao, “Depth anything v2,” arXiv:2406.09414, 2024.

[15] L. Zhang, A. Rao, and M. Agrawala, “Adding conditional control to text-to-image diffusion models,” 2023.

[16] H. Ye, J. Zhang, S. Liu, X. Han, and W. Yang, “Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models,” 2023.

[17] C. Mou, X. Wang, L. Xie, Y. Wu, J. Zhang, Z. Qi, Y. Shan, and X. Qie, “T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models,” arXiv preprint arXiv:2302.08453, 2023.

[18] R. Liu, R. Wu, B. Van Hoorick, P. Tokmakov, S. Zakharov, and C. Vondrick, “Zero-1-to-3: Zero-shot one image to 3d object,” in ICCV, 2023.

[19] J. Xu, W. Cheng, Y. Gao, X. Wang, S. Gao, and Y. Shan, “Instantmesh: Efficient 3d mesh generation from a single image with sparse-view large reconstruction models,” arXiv preprint arXiv:2404.07191, 2024.

[20] G. Bae and A. J. Davison, “Rethinking inductive biases for surface normal estimation,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

[21] C. Ye, L. Qiu, X. Gu, Q. Zuo, Y. Wu, Z. Dong, L. Bo, Y. Xiu, and X. Han, “Stablenormal: Reducing diffusion variance for stable and sharp normal,” ACM Transactions on Graphics (TOG), 2024.

[22] M. Deitke, D. Schwenk, J. Salvador, L. Weihs, O. Michel, E. VanderBilt, L. Schmidt, K. Ehsani, A. Kembhavi, and A. Farhadi, “Objaverse: A universe of annotated 3d objects,” in CVPR, 2023.

[23] L. Downs, A. Francis, N. Koenig, B. Kinman, R. Hickman, K. Reymann, T. B. McHugh, and V. Vanhoucke, “Google scanned objects: A high-quality dataset of 3d scanned household items,” [Online]. Available: https://arxiv.org/abs/2204.11918

[25] S. Sellán, Y.-C. Chen, Z. Wu, A. Garg, and A. Jacobson, “Breaking bad: A dataset for geometric fracture and reassembly,” 2022. [Online]. Available: https://arxiv.org/abs/2210.11463

[26] N. Lamb, C. Palmer, B. Molloy, S. Banerjee, and N. K. Banerjee, “Fantastic breaks: A dataset of paired 3d scans of real-world broken objects and their complete counterparts,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023, pp. 4681–4691.

[27] T. Wu, J. Zhang, X. Fu, Y. Wang, L. P. Jiawei Ren, W. Wu, L. Yang, J. Wang, C. Qian, D. Lin, and Z. Liu, “Omniobject3d: Large-vocabulary 3d object dataset for realistic perception, reconstruction and generation,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

[28] A. Lugmayr, M. Danelljan, A. Romero, F. Yu, R. Timofte, and L. V. Gool, “Repaint: Inpainting using denoising diffusion probabilistic models,” 2022. [Online]. Available: https://arxiv.org/abs/2201.09865

[29] E. Weber, A. Holynski, V. Jampani, S. Saxena, N. Snavely, A. Kar, and A. Kanazawa, “Nerfiller: Completing scenes via generative 3d inpainting,” in CVPR, 2024.

[30] A. Barda, M. Gadelha, V. G. Kim, N. Aigerman, A. H. Bermano, and T. Groueix, “Instant3dit: Multiview inpainting for fast editing of 3d objects,” 2025.

[31] Z. He and T. Wang, “Openlrm: Open-source large reconstruction models,” https://github.com/3DTopia/OpenLRM, 2023.

[32] J. Xiang, Z. Lv, S. Xu, Y. Deng, R. Wang, B. Zhang, D. Chen, X. Tong, and J. Yang, “Structured 3d latents for scalable and versatile 3d generation,” arXiv preprint arXiv:2412.01506, 2024.

[33] R. Suvorov, E. Logacheva, A. Mashikhin, A. Remizova, A. Ashukha, A. Silvestrov, N. Kong, H. Goka, K. Park, and V. Lempitsky, “Resolution-robust large mask inpainting with fourier convolutions,” arXiv preprint arXiv:2109.07161, 2021.

[34] A. Mirzaei, T. Aumentado-Armstrong, K. G. Derpanis, J. Kelly, M. A. Brubaker, I. Gilitschenski, and A. Levinshtein, “SPIn-NeRF: Multiview segmentation and perceptual inpainting with neural radiance fields,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

[35] H. Chen, C. C. Loy, and X. Pan, “MVIP-NeRF: Multi-view 3d inpainting on nerf scenes via diffusion prior,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

[36] S.-Y. Huang, Z.-T. Chou, and Y.-C. F. Wang, “3DGIC: 3d gaussian inpainting with depth-guided cross-view consistency,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.

[37] X.-Y. Zheng, H. Pan, Y.-X. Guo, X. Tong, and Y. Liu, “MVD²: Efficient multiview 3d reconstruction for multiview diffusion,” in ACM SIGGRAPH 2024 Conference Papers, 2024.

[38] Y. Luo, S. Zhou, Y. Lan, X. Pan, and C. C. Loy, “3denhancer: Consistent multi-view diffusion for 3d enhancement,” 2025. [Online]. Available: https://arxiv.org/abs/2412.18565

[39] Y. Edelstein, O. Patashnik, D. Cohen-Bar, and L. Wolf, “Sharp-It: A multi-view to multi-view diffusion model for 3d synthesis and manipulation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.

[40] Z. Wang, C. Lu, Y. Wang, F. Bao, C. Li, H. Su, and J. Zhu, “Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation,” arXiv preprint arXiv:2305.16213, 2023.

[41] R. Chen, Y. Chen, N. Jiao, and K. Jia, “Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation,” arXiv preprint arXiv:2303.13873, 2023.

[42] H. Wang, X. Du, J. Li, R. A. Yeh, and G. Shakhnarovich, “Score jacobian chaining: Lifting pretrained 2d diffusion models for 3d generation,” in CVPR, 2023.

[43]

Sparsefusion: Distilling view- conditioned diffusion for 3d reconstruction,

Z. Zhou and S. Tulsiani, “Sparsefusion: Distilling view- conditioned diffusion for 3d reconstruction,” inCVPR, 2023

2023
[44]

S. Tang, F. Zhang, J. Chen, P. Wang, and Y. Furukawa, “Mvdiffusion: Enabling holistic multi-view image generation with correspondence-aware diffusion,” arXiv preprint arXiv:2307.01097, 2023

[45]

S. Szymanowicz, C. Rupprecht, and A. Vedaldi, “Viewset diffusion: (0-)image-conditioned 3d generative models from 2d data,” arXiv preprint arXiv:2306.07881, 2023

[46]

A. Tewari, T. Yin, G. Cazenavette, S. Rezchikov, J. B. Tenenbaum, F. Durand, W. T. Freeman, and V. Sitzmann, “Diffusion with forward models: Solving stochastic inverse problems without direct supervision,” arXiv preprint arXiv:2306.11719, 2023

[47]

Y. Xu, H. Tan, F. Luan, S. Bi, P. Wang, J. Li, Z. Shi, K. Sunkavalli, G. Wetzstein, Z. Xu, and K. Zhang, “Dmv3d: Denoising multi-view diffusion using 3d large reconstruction model,” 2023

[48]

H. Face, “One-2-3-45,” https://huggingface.co/spaces/One-2-3-45/One-2-3-45, 2023

[49]

X. Long, C. Lin, P. Wang, T. Komura, and W. Wang, “Sparseneus: Fast generalizable neural surface reconstruction from sparse views,” in European Conference on Computer Vision. Springer, 2022, pp. 210–227

[50]

X. Long, Y.-C. Guo, C. Lin, Y. Liu, Z. Dou, L. Liu, Y. Ma, S.-H. Zhang, M. Habermann, C. Theobalt, and W. Wang, “Wonder3d: Single image to 3d using cross-domain diffusion,” 2023

[51]

K. Wu, F. Liu, Z. Cai, R. Yan, H. Wang, Y. Hu, Y. Duan, and K. Ma, “Unique3d: High-quality and efficient 3d mesh generation from a single image,” 2024

[52]

Y. Lu, J. Zhang, S. Li, T. Fang, D. McKinnon, Y. Tsin, L. Quan, X. Cao, and Y. Yao, “Direct2.5: Diverse text-to-3d generation via multi-view 2.5d diffusion,” Computer Vision and Pattern Recognition (CVPR), 2024

[53]

A. Nichol, H. Jun, P. Dhariwal, P. Mishkin, and M. Chen, “Point-e: A system for generating 3d point clouds from complex prompts,” arXiv preprint arXiv:2212.08751, 2022

[54]

X. Zeng, A. Vahdat, F. Williams, Z. Gojcic, O. Litany, S. Fidler, and K. Kreis, “Lion: Latent point diffusion models for 3d shape generation,” in NeurIPS, 2022

[55]

S. Luo and W. Hu, “Diffusion probabilistic models for 3d point cloud generation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 2837–2845

[56]

Z. Liu, Y. Feng, M. J. Black, D. Nowrouzezahrai, L. Paull, and W. Liu, “Meshdiffusion: Score-based generative 3d mesh modeling,” in ICLR, 2023

[57]

J. Gao, T. Shen, Z. Wang, W. Chen, K. Yin, D. Li, O. Litany, Z. Gojcic, and S. Fidler, “Get3d: A generative model of high quality 3d textured shapes learned from images,” NeurIPS, 2022

[58]

S. W. Kim, B. Brown, K. Yin, K. Kreis, K. Schwarz, D. Li, R. Rombach, A. Torralba, and S. Fidler, “Neuralfield-ldm: Scene generation with hierarchical latent diffusion models,” in CVPR, 2023

[59]

T. Anciukevičius, Z. Xu, M. Fisher, P. Henderson, H. Bilen, N. J. Mitra, and P. Guerrero, “Renderdiffusion: Image diffusion for 3d reconstruction, inpainting and generation,” in CVPR, 2023

[60]

N. Müller, Y. Siddiqui, L. Porzi, S. R. Bulo, P. Kontschieder, and M. Nießner, “Diffrf: Rendering-guided 3d radiance field diffusion,” in CVPR, 2023

[61]

H. Jun and A. Nichol, “Shap-e: Generating conditional 3d implicit functions,” 2023

[62]

B. Zhang, J. Tang, M. Niessner, and P. Wonka, “3dshape2vecset: A 3d shape representation for neural fields and generative diffusion models,” in SIGGRAPH, 2023

[63]

Z. Erkoç, F. Ma, Q. Shan, M. Nießner, and A. Dai, “Hyperdiffusion: Generating implicit neural fields with weight-space diffusion,” arXiv preprint arXiv:2303.17015, 2023

[64]

H. Chen, J. Gu, A. Chen, W. Tian, Z. Tu, L. Liu, and H. Su, “Single-stage diffusion nerf: A unified approach to 3d generation and reconstruction,” in ICCV, 2023

[65]

Y. Kasten, O. Rahamim, and G. Chechik, “Point-cloud completion with pretrained text-to-image diffusion models,” 2023

[66]

J. Zhang, X. Chen, Z. Cai, L. Pan, H. Zhao, S. Yi, C. K. Yeo, B. Dai, and C. C. Loy, “Unsupervised 3d shape completion through gan inversion,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1768–1777

[67]

A. Dai and M. Nießner, “Scan2mesh: From unstructured range scans to 3d meshes,” 2019

[68]

P. Mittal, Y.-C. Cheng, M. Singh, and S. Tulsiani, “Autosdf: Shape priors for 3d completion, reconstruction and generation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 306–315

[69]

L. Pan, X. Chen, Z. Cai, J. Zhang, H. Zhao, S. Yi, and Z. Liu, “Variational relational point completion network,” 2021

[70]

R. Chu, E. Xie, S. Mo, Z. Li, M. Nießner, C.-W. Fu, and J. Jia, “Diffcomplete: Diffusion-based generative 3d shape completion,” 2023

[71]

E. Richardson, G. Metzer, Y. Alaluf, R. Giryes, and D. Cohen-Or, “Texture: Text-guided texturing of 3d shapes,” arXiv preprint arXiv:2302.01721, 2023

[72]

T. Cao, K. Kreis, S. Fidler, N. Sharp, and K. Yin, “Texfusion: Synthesizing 3d textures with text-guided image diffusion models,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4169–4181

[73]

D. Z. Chen, Y. Siddiqui, H.-Y. Lee, S. Tulyakov, and M. Nießner, “Text2tex: Text-driven texture synthesis via diffusion models,” arXiv preprint arXiv:2303.11396, 2023

[74]

X. Zeng, X. Chen, Z. Qi, W. Liu, Z. Zhao, Z. Wang, B. Fu, Y. Liu, and G. Yu, “Paint3d: Paint anything 3d with lighting-less texture diffusion models,” 2023

[75]

Y. Siddiqui, J. Thies, F. Ma, Q. Shan, M. Nießner, and A. Dai, “Texturify: Generating textures on 3d shape surfaces,” in European Conference on Computer Vision. Springer, 2022, pp. 72–88

[76]

T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 4401–4410

[77]

A. Bokhovkin, S. Tulsiani, and A. Dai, “Mesh2tex: Generating mesh textures from image queries,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 8918–8928

[78]

X. Yu, P. Dai, W. Li, L. Ma, Z. Liu, and X. Qi, “Texture generation on 3d meshes with point-uv diffusion,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4206–4216

[79]

Z. Chen, K. Yin, and S. Fidler, “Auv-net: Learning aligned uv maps for texture transfer and synthesis,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1465–1474

[80]

R. Yu, Y. Dong, P. Peers, and X. Tong, “Learning texture generators for 3d shape collections from internet photo sets,” in British Machine Vision Conference, 2021

