{"total":17,"items":[{"citing_arxiv_id":"2605.18010","ref_index":50,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Functionalization via Structure Completion and Motion Rectification","primary_cat":"cs.CV","submitted_at":"2026-05-18T08:05:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16990","ref_index":43,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DreamEdit3D: Personalization of Multi-View Diffusion Models for 3D Editing","primary_cat":"cs.CV","submitted_at":"2026-05-16T13:21:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"DreamEdit3D learns separate token embeddings for segmented object components via two-phase multi-view optimization to enable text-guided 3D editing with consistent image generation and mesh reconstruction.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16807","ref_index":72,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DecoRec: Decomposed 3D Scene Reconstruction from Single-View Images via Object-Level Diffusion","primary_cat":"cs.CV","submitted_at":"2026-05-16T04:23:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"DecoRec decomposes single-view 3D scene reconstruction into per-object diffusion reconstructions followed by a differentiable rendering and diffusion-guided merging 
pipeline.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13838","ref_index":109,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow","primary_cat":"cs.CV","submitted_at":"2026-05-13T17:58:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"R-DMesh generates high-fidelity 4D meshes aligned to video by disentangling base mesh, motion, and a learned rectification jump offset inside a VAE, then using Triflow Attention and rectified-flow diffusion.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.04527","ref_index":93,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Velox: Learning Representations of 4D Geometry and Appearance","primary_cat":"cs.CV","submitted_at":"2026-05-06T06:12:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Velox compresses dynamic point clouds into latent tokens that support geometry via 4D surface modeling and appearance via 3D Gaussians, showing strong results on video-to-4D generation, tracking, and image-to-4D cloth simulation.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"in Neural Information Processing Systems (NeurIPS), 30, 2017. 3 [91] Bo Wang, Jian Li, Yang Yu, Li Liu, Zhenping Sun, and Dewen Hu. Scenetracker: Long-term scene flow estimation network. arXiv preprint arXiv:2403.19924, 2024. 2, 8 [92] Feng Wang, Sinan Tan, Xinghang Li, Zeyue Tian, and Huaping Liu. Mixed neural voxels for fast multi-view video synthesis. arXiv preprint arXiv:2212.00190, 2022. 2 [93] Peng Wang and Yichun Shi. 
Imagedream: Image-prompt multi-view diffusion for 3d generation. arXiv preprint arXiv:2312.02201, 2023. 8 [94] Qianqian Wang, Yen-Yu Chang, Ruojin Cai, Zhengqi Li, Bharath Hariharan, Aleksander Holynski, and Noah Snavely. Tracking everything everywhere all at once. ICCV, 2023. 2, 8 [95] Qianqian Wang, Yifei Zhang, Aleksander Holynski,"},{"citing_arxiv_id":"2605.00345","ref_index":46,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Pose-Aware Diffusion for 3D Generation","primary_cat":"cs.CV","submitted_at":"2026-05-01T02:05:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"PAD synthesizes 3D geometry in observation space via depth unprojection as anchor to eliminate pose ambiguity in image-to-3D generation.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Driven by the rapid advancement of powerful 2D image synthesis models [33,35], pioneering 3D generative frameworks [3,23,34,41,49] have extensively utilized optimization methods based on differentiable rendering [19,29,30] to lift 2D generative priors into high-fidelity 3D representations. Subsequent research has transitioned toward multi-view diffusion [37,38,46] or video generative models [5,43] to ensure spatial consistency. Concurrently, there is a paradigm shift from iterative optimization [25,26] to feed-forward large reconstruction models [14,40,55,56], which directly generate 3D assets in a single pass. 
Current state-of-the-art frameworks have adopted native 3D generative modeling [17,21,52,53,64], which en-"},{"citing_arxiv_id":"2604.27504","ref_index":52,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"REVIVE 3D: Refinement via Encoded Voluminous Inflated prior for Volume Enhancement","primary_cat":"cs.CV","submitted_at":"2026-04-30T06:54:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"REVIVE 3D generates voluminous 3D assets from flat 2D images via an inflated prior construction followed by latent-space refinement, plus new metrics for volume and flatness validated by user study.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.26917","ref_index":36,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation","primary_cat":"cs.CV","submitted_at":"2026-04-29T17:27:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"AnimateAnyMesh++ animates arbitrary 3D meshes from text using an expanded 300K-identity DyMesh-XL dataset, a power-law topology-aware DyMeshVAE-Flex, and a variable-length rectified-flow generator to produce semantically accurate, temporally coherent animations in seconds.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":", \"Photorealistic text-to-image diffusion models with deep language understanding,\" Advances in neural information processing systems, vol. 35, pp. 36 479-36 494, 2022. [35] R. Liu, R. Wu, B. Van Hoorick, P. Tokmakov, S. Zakharov, and C. Vondrick, \"Zero-1-to-3: Zero-shot one image to 3d object,\" in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 9298-9309. [36] P. Wang and Y. 
Shi, \"Imagedream: Image-prompt multi-view diffusion for 3d generation,\" arXiv preprint arXiv:2312.02201, 2023. [37] Y. Liu, C. Lin, Z. Zeng, X. Long, L. Liu, T. Komura, and W. Wang, \"Syncdreamer: Generating multiview-consistent images from a single-view image,\" arXiv preprint arXiv:2309.03453, 2023. [38] R. Shi, H. Chen, Z. Zhang, M."},{"citing_arxiv_id":"2604.18468","ref_index":41,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation","primary_cat":"cs.CV","submitted_at":"2026-04-20T16:20:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Asset Harvester converts sparse in-the-wild object observations from AV driving logs into complete simulation-ready 3D assets via data curation, geometry-aware preprocessing, and a SparseViewDiT model that couples sparse-view multiview generation with 3D Gaussian lifting.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[39] Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, and Carl Vondrick. Zero-1-to-3: Zero-shot one image to 3d object. arXiv preprint arXiv:2303.11328, 2023. 16 [40] Yichun Shi, Peng Wang, Jianglong Ye, Mai Long, Kejie Li, and Xiao Yang. Mvdream: Multi-view diffusion for 3d generation. arXiv preprint arXiv:2308.16512, 2023. 16 [41] Peng Wang and Yichun Shi. Imagedream: Image-prompt multi-view diffusion for 3d generation. arXiv preprint arXiv:2312.02201, 2023. 16 [42] Shuang Wu, Youtian Lin, Feihu Zhang, Yifei Zeng, Jingxi Xu, Philip Torr, Xun Cao, and Yao Yao. Direct3d: Scalable image-to-3d generation via 3d latent diffusion transformer. arXiv preprint arXiv:2405.14832, 2024. 
16"},{"citing_arxiv_id":"2604.13856","ref_index":42,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Any3DAvatar: Fast and High-Quality Full-Head 3D Avatar Reconstruction from Single Portrait Image","primary_cat":"cs.CV","submitted_at":"2026-04-15T13:24:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Any3DAvatar reconstructs full-head 3D Gaussian avatars from one image via one-step denoising on a Plücker-aware scaffold plus auxiliary view supervision, beating prior single-image methods on fidelity while running substantially faster.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"IEEE, 20697-20709. doi:10.1109/CVPR52733.2024.01956 [41] Yifan Wang, Jianjun Zhou, Haoyi Zhu, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Jiangmiao Pang, Chunhua Shen, and Tong He. 2026. 𝜋3: Scalable Permutation-Equivariant Visual Geometry Learning. In International Conference on Learning Representations (ICLR), 2026. https://doi.org/10.48550/arXiv.2507.13347 [42] Zidu Wang, Xiangyu Zhu, Tianshuo Zhang, Baiqin Wang, and Zhen Lei. 2024. 3D Face Reconstruction with the Geometric Guidance of Facial Part Segmentation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024. IEEE, 1672-1682. 
doi:10.1109/CVPR52733.2024.00165 [43] Kailu Wu, Fangfu Liu, Zhihan Cai, Runjie Yan, Hanyang Wang, Yating Hu,"},{"citing_arxiv_id":"2509.07435","ref_index":77,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation","primary_cat":"cs.CV","submitted_at":"2025-09-09T06:43:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LGAA is a modular adapter framework that lifts multi-view diffusion models to produce 2D Gaussian Splats with PBR channels for high-quality relightable 3D mesh extraction using data-efficient finetuning on 69k instances.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2508.02324","ref_index":28,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Qwen-Image Technical Report","primary_cat":"cs.CV","submitted_at":"2025-08-04T11:49:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Qwen-Image is a foundation model that reaches state-of-the-art results in image generation and editing by combining a large-scale text-focused data pipeline with curriculum learning and dual semantic-reconstructive encoding for editing consistency.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.16504","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details","primary_cat":"cs.CV","submitted_at":"2025-06-19T17:57:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Hunyuan3D 2.5's LATTICE model with 10B parameters generates detailed 3D shapes from images and uses multi-view PBR for 
textures, outperforming prior methods in fidelity and mesh quality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2502.06608","ref_index":188,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models","primary_cat":"cs.CV","submitted_at":"2025-02-10T16:07:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TripoSG generates high-fidelity 3D meshes from input images via a large-scale rectified flow transformer and hybrid-trained 3D VAE on a custom 2-million-sample dataset, claiming state-of-the-art fidelity and generalization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2501.12202","ref_index":94,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation","primary_cat":"cs.CV","submitted_at":"2025-01-21T15:16:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Hunyuan3D 2.0 scales flow-based diffusion transformers and texture synthesis models to generate high-resolution textured 3D assets that outperform prior state-of-the-art in geometry, alignment, and texture quality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2405.10314","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"CAT3D: Create Anything in 3D with Multi-View Diffusion Models","primary_cat":"cs.CV","submitted_at":"2024-05-16T17:59:05+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A multi-view diffusion model generates consistent novel 
views from sparse images to enable fast 3D scene reconstruction.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2404.07191","ref_index":50,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models","primary_cat":"cs.CV","submitted_at":"2024-04-10T17:48:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"InstantMesh produces diverse, high-quality 3D meshes from single images in seconds by combining a multi-view diffusion model with a sparse-view large reconstruction model and optimizing directly on meshes.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Multi-view Diffusion Models. To address the inconsistency among multiple generated views of Zero123, some works [24, 26, 41, 50] try to fine-tune 2D diffusion models to synthesize multiple views for the same object simultaneously. With 3D consistent multi-view images, various techniques can be applied to obtain the 3D object, e.g., SDS optimization [50], neural surface reconstruction meth- 2 FlexiCubes ViT Encoder Multi-view Diffusion Model Triplane Decoder Input Image Multi-view Image Tokens Triplane Mesh render 1283 Grid Sparse-view Large Reconstruction Model Figure 2. The overview of our InstantMesh framework. Given an input image, we first utilize a multi-view diffusion model to synthesize 6"}],"limit":50,"offset":0}