GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction
Pith reviewed 2026-05-25 04:35 UTC · model grok-4.3
The pith
A projection-based conditioning mechanism lifts object-level generative priors to multi-view scene-scale 3D reconstruction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that casting reconstruction as conditional generation over tiled chunks, combined with a projection-based conditioning mechanism, lifts posed multi-view image features into a coherent 3D representation aligned with the generative model, independent of view ordering and spatially anchored to the scene. This enables scaling from object-level priors to scene-scale generation, yielding high-fidelity editable PBR mesh reconstructions of indoor environments that outperform cutting-edge reconstruction methods by 16%.
What carries the argument
The projection-based conditioning mechanism that lifts posed multi-view image features into a coherent 3D representation aligned with the generative model.
If this is right
- Scene reconstruction is performed as conditional 3D generation over overlapping chunks that tile large extents.
- Generated geometry remains multi-view consistent and high-fidelity.
- The output consists of faithful, editable PBR meshes suitable for indoor environments.
- Quantitative performance exceeds cutting-edge reconstruction methods by 16%.
Where Pith is reading between the lines
- The conditioning approach is presented as general and could therefore apply to other generative shape models.
- Chunk-based tiling implies the method can extend to larger environments simply by increasing the number of chunks.
- Editable PBR meshes enable direct use in graphics pipelines for editing or simulation without additional conversion steps.
Load-bearing premise
The projection-based conditioning mechanism lifts posed multi-view image features into a coherent 3D representation aligned with the generative model, independent of view ordering and spatially anchored to the scene.
What would settle it
A quantitative evaluation on standard indoor multi-view benchmarks that shows the generated meshes lack consistency across views or fail to exceed baseline reconstruction accuracy by the reported margin would falsify the central claim.
Figures
read the original abstract
We introduce a new approach to high-fidelity 3D scene reconstruction from multi-view RGB images that tightly couples reconstruction with a strong generative 3D prior. We cast scene reconstruction as conditional 3D generation over a set of spatially-localized, overlapping chunks that together tile the scene, scaling generation to large scene extents. Crucially, we inherit the fidelity and completeness of state-of-the-art generative shape models -- we use Trellis.2 as an example -- which we generalize to the scene level. To this end, we propose a projection-based conditioning mechanism that lifts posed multi-view image features into a coherent 3D representation aligned with the generative model, independent of view ordering and spatially anchored to the scene, yielding high-fidelity, multi-view consistent generated geometry. This enables lifting the strong object-level prior of Trellis.2 to multi-view, scene-scale generation, producing faithful, editable PBR mesh reconstructions of indoor environments. As a result, we obtain high-fidelity results that outperform cutting-edge reconstruction methods by 16%.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces GenRecon, a method for high-fidelity 3D scene reconstruction from multi-view RGB images. It formulates scene reconstruction as conditional 3D generation over spatially-localized overlapping chunks that tile the scene, generalizing the object-level generative prior of Trellis.2 to scene scale. A projection-based conditioning mechanism is proposed to lift posed multi-view image features into a coherent 3D representation aligned with the generative model, independent of view ordering and spatially anchored to the scene, yielding multi-view consistent PBR mesh outputs that outperform cutting-edge reconstruction methods by 16%.
Significance. If the central claims hold, the work would meaningfully advance integration of strong generative 3D priors with multi-view reconstruction, enabling scalable, high-fidelity, and editable reconstructions of indoor scenes. The chunk-tiling strategy and conditioning approach could address limitations in applying object-centric generative models to large environments while preserving consistency.
major comments (2)
- [Abstract] Abstract: The claim that the method 'outperform[s] cutting-edge reconstruction methods by 16%' is stated without reference to any metrics, baselines, datasets, error analysis, or experimental protocol, rendering the central quantitative result unverifiable from the provided description.
- [Abstract] Abstract: The projection-based conditioning is asserted to lift features into a 'coherent 3D representation aligned with the generative model, independent of view ordering and spatially anchored to the scene', but supplies no implementation details on the underlying 3D structure, feature projection, cross-view aggregation, pose encoding, or invariance operation. This mechanism is load-bearing for the claim that Trellis.2's object-level prior can be lifted to overlapping scene chunks while preserving multi-view consistency.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each point below and will revise the abstract to improve clarity and verifiability of the claims.
read point-by-point responses
-
Referee: [Abstract] Abstract: The claim that the method 'outperform[s] cutting-edge reconstruction methods by 16%' is stated without reference to any metrics, baselines, datasets, error analysis, or experimental protocol, rendering the central quantitative result unverifiable from the provided description.
Authors: We agree that the abstract's quantitative claim would benefit from additional context. In the revised manuscript we will update the abstract to name the metric, the main baselines, the dataset, and a pointer to the experimental section (Section 4) for the protocol and analysis, while keeping the abstract concise. revision: yes
-
Referee: [Abstract] Abstract: The projection-based conditioning is asserted to lift features into a 'coherent 3D representation aligned with the generative model, independent of view ordering and spatially anchored to the scene', but supplies no implementation details on the underlying 3D structure, feature projection, cross-view aggregation, pose encoding, or invariance operation. This mechanism is load-bearing for the claim that Trellis.2's object-level prior can be lifted to overlapping scene chunks while preserving multi-view consistency.
Authors: Abstracts are necessarily high-level. The full technical description of the projection-based conditioning (3D chunk structure, feature projection, cross-view aggregation, pose encoding, and order invariance) appears in Section 3.2. We will add one short clarifying phrase to the abstract to better signal these aspects without exceeding length limits. revision: partial
Circularity Check
No circularity: derivation builds on external Trellis.2 prior via proposed conditioning mechanism
full rationale
The paper's central derivation introduces a projection-based conditioning to lift multi-view features into a 3D representation compatible with an external generative model (Trellis.2). This is presented as a novel proposal rather than a self-referential fit, renaming, or self-citation chain. No equations or steps reduce the claimed outputs to the inputs by construction; the 16% improvement is positioned as an empirical outcome of applying the external prior at scene scale. The mechanism is described as independent of view ordering and spatially anchored, but this is asserted as a design property of the proposed method, not derived tautologically from fitted quantities or prior self-work. The derivation chain remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
joint chunk generation... MultiDiffusion-style scheme... z_{t-1}(x)=1/Σ Mc(x) Σ Mc(x) ẑ^(c)_{t-1}(x)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Taylor, Mathias Unberath, Ming-Yu Liu, and Chen-Hsuan Lin
Zhaoshuo Li, Thomas Müller, Alex Evans, Russell H. Taylor, Mathias Unberath, Ming-Yu Liu, and Chen-Hsuan Lin. Neuralangelo: High-fidelity neural surface reconstruction, 2023. URL https://arxiv.org/abs/2306.03092
-
[2]
Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction,
Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, and Wenping Wang. Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction,
-
[3]
URLhttps://arxiv.org/abs/2106.10689
work page internal anchor Pith review Pith/arXiv arXiv
-
[4]
V olume rendering of neural implicit surfaces, 2021
Lior Yariv, Jiatao Gu, Yoni Kasten, and Yaron Lipman. V olume rendering of neural implicit surfaces, 2021. URLhttps://arxiv.org/abs/2106.12052
-
[5]
Monosdf: Exploring monocular geometric cues for neural implicit surface reconstruction, 2022
Zehao Yu, Songyou Peng, Michael Niemeyer, Torsten Sattler, and Andreas Geiger. Monosdf: Exploring monocular geometric cues for neural implicit surface reconstruction, 2022. URL https://arxiv.org/abs/2206.00665
-
[6]
Neusg: Neural implicit surface reconstruction with 3d gaussian splatting guidance, 2025
Hanlin Chen, Chen Li, Yunsong Wang, and Gim Hee Lee. Neusg: Neural implicit surface reconstruction with 3d gaussian splatting guidance, 2025. URL https://arxiv.org/abs/ 2312.00846
-
[7]
3d gaussian splatting for real-time radiance field rendering, 2023
Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering, 2023. URL https://arxiv.org/abs/2308. 04079
2023
-
[8]
2d gaussian splatting for geometrically accurate radiance fields, 2025
Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically accurate radiance fields, 2025. URL https://arxiv.org/abs/ 2403.17888
-
[9]
Pgsr: Planar-based gaussian splatting for efficient and high-fidelity surface reconstruction, 2025
Danpeng Chen, Hai Li, Weicai Ye, Yifan Wang, Weijian Xie, Shangjin Zhai, Nan Wang, Haomin Liu, Hujun Bao, and Guofeng Zhang. Pgsr: Planar-based gaussian splatting for efficient and high-fidelity surface reconstruction, 2025. URLhttps://arxiv.org/abs/2406.06521
-
[10]
Dust3r: Geometric 3d vision made easy, 2024
Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. Dust3r: Geometric 3d vision made easy, 2024. URLhttps://arxiv.org/abs/2312.14132
-
[11]
Grounding image matching in 3d with mast3r, 2024
Vincent Leroy, Yohann Cabon, and Jérôme Revaud. Grounding image matching in 3d with mast3r, 2024. URLhttps://arxiv.org/abs/2406.09756
-
[12]
Vggt: Visual geometry grounded transformer, 2025
Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. Vggt: Visual geometry grounded transformer, 2025. URL https://arxiv.org/ abs/2503.11651. 10
-
[13]
Depth Anything 3: Recovering the Visual Space from Any Views
Haotong Lin, Sili Chen, Junhao Liew, Donny Y . Chen, Zhenyu Li, Guang Shi, Jiashi Feng, and Bingyi Kang. Depth anything 3: Recovering the visual space from any views, 2025. URL https://arxiv.org/abs/2511.10647
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[14]
Neuralrecon: Real- time coherent 3d reconstruction from monocular video, 2021
Jiaming Sun, Yiming Xie, Linghao Chen, Xiaowei Zhou, and Hujun Bao. Neuralrecon: Real- time coherent 3d reconstruction from monocular video, 2021. URL https://arxiv.org/ abs/2104.00681
-
[15]
Finerecon: Depth-aware feed-forward network for detailed 3d reconstruction, 2023
Noah Stier, Anurag Ranjan, Alex Colburn, Yajie Yan, Liang Yang, Fangchang Ma, and Baptiste Angles. Finerecon: Depth-aware feed-forward network for detailed 3d reconstruction, 2023. URLhttps://arxiv.org/abs/2304.01480
-
[16]
pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction, 2024
David Charatan, Sizhe Li, Andrea Tagliasacchi, and Vincent Sitzmann. pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction, 2024. URL https: //arxiv.org/abs/2312.12337
-
[17]
Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images, 2024
Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images, 2024. URLhttps://arxiv.org/abs/2403.14627
-
[18]
Anysplat: Feed-forward 3d gaussian splatting from unconstrained views, 2025
Lihan Jiang, Yucheng Mao, Linning Xu, Tao Lu, Kerui Ren, Yichen Jin, Xudong Xu, Mulin Yu, Jiangmiao Pang, Feng Zhao, Dahua Lin, and Bo Dai. Anysplat: Feed-forward 3d gaussian splatting from unconstrained views, 2025. URLhttps://arxiv.org/abs/2505.23716
-
[19]
Jiale Xu, Weihao Cheng, Yiming Gao, Xintao Wang, Shenghua Gao, and Ying Shan. In- stantmesh: Efficient 3d mesh generation from a single image with sparse-view large reconstruc- tion models, 2024. URLhttps://arxiv.org/abs/2404.07191
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[20]
One-2-3-45: Any single image to 3d mesh in 45 seconds without per-shape optimization, 2023
Minghua Liu, Chao Xu, Haian Jin, Linghao Chen, Mukund Varma T, Zexiang Xu, and Hao Su. One-2-3-45: Any single image to 3d mesh in 45 seconds without per-shape optimization, 2023. URLhttps://arxiv.org/abs/2306.16928
-
[21]
SyncDreamer: Generating Multiview-consistent Images from a Single-view Image
Yuan Liu, Cheng Lin, Zijiao Zeng, Xiaoxiao Long, Lingjie Liu, Taku Komura, and Wenping Wang. Syncdreamer: Generating multiview-consistent images from a single-view image, 2024. URLhttps://arxiv.org/abs/2309.03453
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[22]
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
Zibo Zhao, Zeqiang Lai, Qingxiang Lin, Yunfei Zhao, Haolin Liu, Shuhui Yang, Yifei Feng, Mingxin Yang, Sheng Zhang, Xianghui Yang, Huiwen Shi, Sicong Liu, Junta Wu, Yihang Lian, Fan Yang, Ruining Tang, Zebin He, Xinzhou Wang, Jian Liu, Xuhui Zuo, Zhuo Chen, Biwen Lei, Haohan Weng, Jing Xu, Yiling Zhu, Xinhai Liu, Lixin Xu, Changrong Hu, Shaoxiong Yang, So...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[23]
Waslander, Sara Vicente, Daniyar Turmukhambetov, and Michael Firman
Ziwei Liao, Mohamed Sayed, Steven L. Waslander, Sara Vicente, Daniyar Turmukhambetov, and Michael Firman. Complete gaussian splats from a single image with denoising diffusion models, 2025. URLhttps://arxiv.org/abs/2508.21542
-
[24]
Reconviagen: Towards accurate multi-view 3d object reconstruction via generation, 2025
Jiahao Chang, Chongjie Ye, Yushuang Wu, Yuantao Chen, Yidan Zhang, Zhongjin Luo, Chenghong Li, Yihao Zhi, and Xiaoguang Han. Reconviagen: Towards accurate multi-view 3d object reconstruction via generation, 2025. URLhttps://arxiv.org/abs/2510.23306
-
[25]
Native and Compact Structured Latents for 3D Generation
Jianfeng Xiang, Xiaoxue Chen, Sicheng Xu, Ruicheng Wang, Zelong Lv, Yu Deng, Hongyuan Zhu, Yue Dong, Hao Zhao, Nicholas Jing Yuan, and Jiaolong Yang. Native and compact structured latents for 3d generation, 2025. URLhttps://arxiv.org/abs/2512.14692
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[26]
Schonberger and Jan-Michael Frahm
Johannes L. Schonberger and Jan-Michael Frahm. Structure-from-Motion Revisited. In2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV , USA, June 2016. IEEE. doi: 10.1109/cvpr.2016.445. 11
-
[27]
Srinivasan, Matthew Tancik, Jonathan T
Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis, 2020. URL https://arxiv.org/abs/2003.08934
-
[28]
Meshsplats: Mesh-based rendering with gaussian splatting initialization, 2026
Rafał Tobiasz, Grzegorz Wilczy´nski, Marcin Mazur, Sławomir Tadeja, Weronika Smolak- Dy˙zewska, and Przemysław Spurek. Meshsplats: Mesh-based rendering with gaussian splatting initialization, 2026. URLhttps://arxiv.org/abs/2502.07754
-
[29]
Qianqian Wang, Yifei Zhang, Aleksander Holynski, Alexei A. Efros, and Angjoo Kanazawa. Continuous 3d perception model with persistent state, 2025. URL https://arxiv.org/abs/ 2501.12387
-
[30]
Transformerfusion: Monocular rgb scene reconstruction using transformers, 2021
Aljaž Božiˇc, Pablo Palafox, Justus Thies, Angela Dai, and Matthias Nießner. Transformerfusion: Monocular rgb scene reconstruction using transformers, 2021. URL https://arxiv.org/ abs/2107.02191
-
[31]
Noah Stier, Alexander Rich, Pradeep Sen, and Tobias Höllerer. V ortx: V olumetric 3d reconstruction with transformers for voxelwise view selection and fusion, 2021. URL https://arxiv.org/abs/2112.00236
-
[32]
Visfusion: Visibility-aware online 3d scene recon- struction from videos, 2023
Huiyu Gao, Wei Mao, and Miaomiao Liu. Visfusion: Visibility-aware online 3d scene recon- struction from videos, 2023. URLhttps://arxiv.org/abs/2304.10687
-
[33]
Uforecon: Generalizable sparse-view surface reconstruction from arbitrary and unfavorable sets, 2024
Youngju Na, Woo Jae Kim, Kyu Beom Han, Suhyeon Ha, and Sung eui Yoon. Uforecon: Generalizable sparse-view surface reconstruction from arbitrary and unfavorable sets, 2024. URLhttps://arxiv.org/abs/2403.05086
-
[34]
Simplerecon: 3d reconstruction without 3d convolutions, 2022
Mohamed Sayed, John Gibson, Jamie Watson, Victor Prisacariu, Michael Firman, and Clément Godard. Simplerecon: 3d reconstruction without 3d convolutions, 2022. URL https://arxiv. org/abs/2208.14743
-
[35]
Depthsplat: Connecting gaussian splatting and depth, 2025
Haofei Xu, Songyou Peng, Fangjinhua Wang, Hermann Blum, Daniel Barath, Andreas Geiger, and Marc Pollefeys. Depthsplat: Connecting gaussian splatting and depth, 2025. URL https: //arxiv.org/abs/2410.13862
-
[36]
No pose, no problem: Surprisingly simple 3d gaussian splats from sparse unposed images,
Botao Ye, Sifei Liu, Haofei Xu, Xueting Li, Marc Pollefeys, Ming-Hsuan Yang, and Songyou Peng. No pose, no problem: Surprisingly simple 3d gaussian splats from sparse unposed images,
- [37]
-
[38]
Pf3plat: Pose-free feed-forward 3d gaussian splatting, 2025
Sunghwan Hong, Jaewoo Jung, Heeseong Shin, Jisang Han, Jiaolong Yang, Chong Luo, and Seungryong Kim. Pf3plat: Pose-free feed-forward 3d gaussian splatting, 2025. URL https: //arxiv.org/abs/2410.22128
-
[39]
Yonosplat: You only need one model for feedforward 3d gaussian splatting, 2025
Botao Ye, Boqi Chen, Haofei Xu, Daniel Barath, and Marc Pollefeys. Yonosplat: You only need one model for feedforward 3d gaussian splatting, 2025. URL https://arxiv.org/abs/ 2511.07321
-
[40]
Freesplat++: Generalizable 3d gaussian splatting for efficient indoor scene reconstruction, 2025
Yunsong Wang, Tianxin Huang, Hanlin Chen, and Gim Hee Lee. Freesplat++: Generalizable 3d gaussian splatting for efficient indoor scene reconstruction, 2025. URL https://arxiv. org/abs/2503.22986
-
[41]
CAT3D: Create Anything in 3D with Multi-View Diffusion Models
Ruiqi Gao, Aleksander Holynski, Philipp Henzler, Arthur Brussee, Ricardo Martin-Brualla, Pratul Srinivasan, Jonathan T. Barron, and Ben Poole. Cat3d: Create anything in 3d with multi-view diffusion models, 2024. URLhttps://arxiv.org/abs/2405.10314
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[42]
Srinivasan, Dor Verbin, Jonathan T
Rundi Wu, Ben Mildenhall, Philipp Henzler, Keunhong Park, Ruiqi Gao, Daniel Watson, Pratul P. Srinivasan, Dor Verbin, Jonathan T. Barron, Ben Poole, and Aleksander Holynski. Reconfusion: 3d reconstruction with diffusion priors, 2023. URL https://arxiv.org/abs/2312.02981
-
[43]
Multi-view reconstruction via sfm-guided monocular depth estimation,
Haoyu Guo, He Zhu, Sida Peng, Haotong Lin, Yunzhi Yan, Tao Xie, Wenguan Wang, Xiaowei Zhou, and Hujun Bao. Multi-view reconstruction via sfm-guided monocular depth estimation,
- [44]
-
[45]
Depthcrafter: Generating consistent long depth sequences for open-world videos,
Wenbo Hu, Xiangjun Gao, Xiaoyu Li, Sijie Zhao, Xiaodong Cun, Yong Zhang, Long Quan, and Ying Shan. Depthcrafter: Generating consistent long depth sequences for open-world videos,
- [46]
-
[47]
Geome- trycrafter: Consistent geometry estimation for open-world videos with diffusion priors, 2025
Tian-Xing Xu, Xiangjun Gao, Wenbo Hu, Xiaoyu Li, Song-Hai Zhang, and Ying Shan. Geome- trycrafter: Consistent geometry estimation for open-world videos with diffusion priors, 2025. URLhttps://arxiv.org/abs/2504.01016
-
[48]
Reconx: Reconstruct any scene from sparse views with video diffusion model,
Fangfu Liu, Wenqiang Sun, Hanyang Wang, Yikai Wang, Haowen Sun, Junliang Ye, Jun Zhang, and Yueqi Duan. Reconx: Reconstruct any scene from sparse views with video diffusion model,
- [49]
-
[50]
ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis
Wangbo Yu, Jinbo Xing, Li Yuan, Wenbo Hu, Xiaoyu Li, Zhipeng Huang, Xiangjun Gao, Tien-Tsin Wong, Ying Shan, and Yonghong Tian. Viewcrafter: Taming video diffusion models for high-fidelity novel view synthesis, 2024. URLhttps://arxiv.org/abs/2409.02048
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[51]
MV-SAM3D: Adaptive Multi-View Fusion for Layout-Aware 3D Generation
Baicheng Li, Dong Wu, Jun Li, Shunkai Zhou, Zecui Zeng, Lusong Li, and Hongbin Zha. Mv-sam3d: Adaptive multi-view fusion for layout-aware 3d generation, 2026. URL https: //arxiv.org/abs/2603.11633
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[52]
Pixal3D: Pixel-Aligned 3D Generation from Images
Dong-Yang Li, Wang Zhao, Yuxin Chen, Wenbo Hu, Meng-Hao Guo, Fang-Lue Zhang, Ying Shan, and Shi-Min Hu. Pixal3d: Pixel-aligned 3d generation from images, 2026. URL https://arxiv.org/abs/2605.10922
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[53]
Scenewiz3d: Towards text-guided 3d scene composition, 2023
Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin, Peiye Zhuang, Yinghao Xu, Ceyuan Yang, Dahua Lin, Bolei Zhou, Sergey Tulyakov, and Hsin-Ying Lee. Scenewiz3d: Towards text-guided 3d scene composition, 2023. URLhttps://arxiv.org/abs/2312.08885
-
[54]
Xiaoyu Zhou, Xingjian Ran, Yajiao Xiong, Jinlin He, Zhiwei Lin, Yongtao Wang, Deqing Sun, and Ming-Hsuan Yang. Gala3d: Towards text-to-3d complex scene generation via layout-guided generative gaussian splatting, 2024. URLhttps://arxiv.org/abs/2402.07207
-
[55]
Discene: Object decoupling and interaction modeling for complex scene generation
Xiao-Lei Li, Haodong Li, Hao-Xiang Chen, Tai-Jiang Mu, and Shi-Min Hu. Discene: Object decoupling and interaction modeling for complex scene generation. InSIGGRAPH Asia 2024 Conference Papers, SA ’24, New York, NY , USA, 2024. Association for Computing Machinery. ISBN 9798400711312. doi: 10.1145/3680528.3687589. URL https://doi.org/10.1145/ 3680528.3687589
-
[56]
Comboverse: Compositional 3d assets creation using spatially-aware diffusion guidance, 2024
Yongwei Chen, Tengfei Wang, Tong Wu, Xingang Pan, Kui Jia, and Ziwei Liu. Comboverse: Compositional 3d assets creation using spatially-aware diffusion guidance, 2024. URL https: //arxiv.org/abs/2403.12409
-
[57]
Reparo: Compositional 3d assets generation with differentiable 3d layout alignment, 2025
Haonan Han, Rui Yang, Huan Liao, Jiankai Xing, Zunnan Xu, Xiaoming Yu, Junwei Zha, Xiu Li, and Wanhua Li. Reparo: Compositional 3d assets generation with differentiable 3d layout alignment, 2025. URLhttps://arxiv.org/abs/2405.18525
-
[58]
Cast: Component-aligned 3d scene reconstruction from an rgb image, 2025
Kaixin Yao, Longwen Zhang, Xinhao Yan, Yan Zeng, Qixuan Zhang, Wei Yang, Lan Xu, Jiayuan Gu, and Jingyi Yu. Cast: Component-aligned 3d scene reconstruction from an rgb image, 2025. URLhttps://arxiv.org/abs/2502.12894
-
[59]
Dreamanywhere: Object-centric panoramic 3d scene generation, 2025
Edoardo Alberto Dominici, Jozef Hladky, Floor Verhoeven, Lukas Radl, Thomas Deixelberger, Stefan Ainetter, Philipp Drescher, Stefan Hauswiesner, Arno Coomans, Giacomo Nazzaro, Konstantinos Vardis, and Markus Steinberger. Dreamanywhere: Object-centric panoramic 3d scene generation, 2025. URLhttps://arxiv.org/abs/2506.20367
-
[60]
Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling, 2023. URLhttps://arxiv.org/abs/2210.02747
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[61]
Oriane Siméoni, Huy V . V o, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, Julie...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[62]
Barron, Ricardo Martin-Brualla, Noah Snavely, and Thomas Funkhouser
Qianqian Wang, Zhicheng Wang, Kyle Genova, Pratul Srinivasan, Howard Zhou, Jonathan T. Barron, Ricardo Martin-Brualla, Noah Snavely, and Thomas Funkhouser. Ibrnet: Learning multi-view image-based rendering. InCVPR, 2021
2021
-
[63]
LoRA: Low-Rank Adaptation of Large Language Models
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models, 2021. URL https://arxiv.org/abs/2106.09685
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[64]
Multidiffusion: Fusing diffusion paths for controlled image generation, 2023
Omer Bar-Tal, Lior Yariv, Yaron Lipman, and Tali Dekel. Multidiffusion: Fusing diffusion paths for controlled image generation, 2023. URLhttps://arxiv.org/abs/2302.08113
-
[65]
Sage: Scalable agentic 3d scene generation for embodied ai, 2026
Hongchi Xia, Xuan Li, Zhaoshuo Li, Qianli Ma, Jiashu Xu, Ming-Yu Liu, Yin Cui, Tsung-Yi Lin, Wei-Chiu Ma, Shenlong Wang, Shuran Song, and Fangyin Wei. Sage: Scalable agentic 3d scene generation for embodied ai, 2026. URLhttps://arxiv.org/abs/2602.10116
-
[66]
Structured 3D Latents for Scalable and Versatile 3D Generation
Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, and Jiaolong Yang. Structured 3d latents for scalable and versatile 3d generation. arXiv preprint arXiv:2412.01506, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[67]
3d-front: 3d furnished rooms with layouts and semantics.arXiv preprint arXiv:2011.09127, 2020
Huan Fu, Bowen Cai, Lin Gao, Lingxiao Zhang, Cao Li, Qixun Zeng, Chengyue Sun, Yiyun Fei, Yu Zheng, Ying Li, Yi Liu, Peng Liu, Lin Ma, Le Weng, Xiaohang Hu, Xin Ma, Qian Qian, Rongfei Jia, Binqiang Zhao, and Hao Zhang. 3d-front: 3d furnished rooms with layouts and semantics.arXiv preprint arXiv:2011.09127, 2020
-
[68]
Scannet++: A high-fidelity dataset of 3d indoor scenes
Chandan Yeshwanth, Yueh-Cheng Liu, Matthias Nießner, and Angela Dai. Scannet++: A high-fidelity dataset of 3d indoor scenes. InProceedings of the International Conference on Computer Vision (ICCV), 2023
2023
-
[69]
Decoupled Weight Decay Regularization
Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization, 2019. URL https: //arxiv.org/abs/1711.05101
work page internal anchor Pith review Pith/arXiv arXiv 2019
-
[70]
Classifier-Free Diffusion Guidance
Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance, 2022. URL https:// arxiv.org/abs/2207.12598
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[71]
The unreason- able effectiveness of deep features as a perceptual metric
Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreason- able effectiveness of deep features as a perceptual metric. InCVPR, 2018
2018
-
[72]
Imagenet classification with deep convolutional neural networks
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C.J. Burges, L. Bottou, and K. Weinberger, editors,Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012. URL https://proceedings.neurips.cc/paper_files/paper/2012/file/ c399862d3b9d6...
2012
-
[73]
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agar- wal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision, 2021. URL https://arxiv.org/abs/2103.00020
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[74]
Omnidata: A scalable pipeline for making multi-task mid-level vision datasets from 3d scans
Ainaz Eftekhar, Alexander Sax, Jitendra Malik, and Amir Zamir. Omnidata: A scalable pipeline for making multi-task mid-level vision datasets from 3d scans. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 10786–10796, 2021
2021
-
[75]
Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner
Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. Scannet: Richly-annotated 3d reconstructions of indoor scenes. InProc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2017. 14 A Experimental Setup Implementation Details.We build on Trellis.2 [ 24] at resolution 512. Consequently, for the occupancy g...
2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.