FurnSet: Exploiting Repeats for 3D Scene Reconstruction
Pith reviewed 2026-05-10 01:26 UTC · model grok-4.3
The pith
Exploiting repeated object instances improves single-view 3D scene reconstruction quality.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The framework introduces per-object CLS tokens and a set-aware self-attention mechanism that groups identical instances and aggregates complementary observations across them, enabling joint reconstruction of repeated objects. Combined with scene-level and object-level conditioning and with layout optimization that fits object point clouds under 3D and 2D projection losses, this yields improved scene reconstruction quality.
What carries the argument
per-object CLS tokens and set-aware self-attention mechanism for grouping identical instances and aggregating observations
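The paper does not publish the mechanism's equations, but the described behaviour can be sketched as follows: threshold pairwise similarity of per-object CLS embeddings to form instance groups, then restrict self-attention to within-group tokens. All names here (`group_by_cls_similarity`, `set_aware_mask`, the threshold `tau`) are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def group_by_cls_similarity(cls_tokens: np.ndarray, tau: float = 0.9) -> np.ndarray:
    """Assign a group id to each object by thresholding cosine similarity
    of its CLS embedding (a stand-in for the paper's grouping step)."""
    n = cls_tokens.shape[0]
    normed = cls_tokens / np.linalg.norm(cls_tokens, axis=1, keepdims=True)
    sim = normed @ normed.T
    # Union-find over pairs whose similarity exceeds tau.
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] >= tau:
                parent[find(i)] = find(j)
    return np.array([find(i) for i in range(n)])

def set_aware_mask(groups: np.ndarray) -> np.ndarray:
    """Boolean attention mask: object tokens may attend only within their group."""
    return groups[:, None] == groups[None, :]
```

Under this reading, aggregation of complementary observations happens automatically: masked attention lets each instance's tokens pool features from its duplicates and from no one else.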
If this is right
- Joint reconstruction from grouped instances fills in missing geometric details from complementary views.
- Scene alignment is enhanced by optimizing layouts with point cloud and projection consistency losses.
- Reconstruction performs better in scenes containing repeated furniture objects as shown on 3D-Future and 3D-Front.
- Object geometries are more complete and layouts more consistent than independent per-object methods.
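The layout-optimization claim above (point-cloud alignment under combined 3D and 2D projection losses) can be made concrete with a toy sketch. This is a minimal translation-only illustration under assumed unit-focal-length pinhole projection; the function names, loss weights, and optimizer are hypothetical, not the paper's method.

```python
import numpy as np

def project(points: np.ndarray, f: float = 1.0) -> np.ndarray:
    """Pinhole projection of 3D points (x, y, z) onto the image plane."""
    return f * points[:, :2] / points[:, 2:3]

def optimize_translation(obj_pts, tgt_pts, tgt_px, w2d=0.5, lr=0.1, steps=200):
    """Toy layout optimization: fit a translation that aligns an object point
    cloud to target 3D points while also matching their 2D projections.
    Analytic gradient for the 3D L2 term, finite differences for the 2D term."""
    t = np.zeros(3)
    eps = 1e-4
    for _ in range(steps):
        moved = obj_pts + t
        g3 = 2.0 * np.mean(moved - tgt_pts, axis=0)   # grad of mean ||p+t-q||^2
        base = np.mean((project(moved) - tgt_px) ** 2)
        g2 = np.zeros(3)
        for k in range(3):
            dt = np.zeros(3); dt[k] = eps
            g2[k] = (np.mean((project(moved + dt) - tgt_px) ** 2) - base) / eps
        t -= lr * (g3 + w2d * g2)
    return t
```

The point of combining the two terms is that the 2D loss anchors the layout to image evidence while the 3D loss enforces metric consistency; the paper's version presumably optimizes full poses, not just translations.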
Where Pith is reading between the lines
- This grouping strategy could apply to other repeated elements in 3D scenes beyond furniture.
- Future work might test the method in outdoor scenes with repeated structures like trees or windows.
- Hybrid systems could use this when repeats are detected and switch to standard methods otherwise.
Load-bearing premise
Real-world scenes contain sufficient reliably identifiable repeated instances that the per-object CLS tokens and set-aware self-attention can group correctly without introducing grouping errors.
What would settle it
The central claim would be falsified by observing that, on test scenes with clearly repeated objects, the method produces lower-quality reconstructions or incorrect groupings compared to baseline methods that treat objects independently.
read the original abstract
Single-view 3D scene reconstruction involves inferring both object geometry and spatial layout. Existing methods typically reconstruct objects independently or rely on implicit scene context, failing to exploit the repeated instances commonly present in real-world scenes. We propose FurnSet, a framework that explicitly identifies and leverages repeated object instances to improve reconstruction. Our method introduces per-object CLS tokens and a set-aware self-attention mechanism that groups identical instances and aggregates complementary observations across them, enabling joint reconstruction. We further combine scene-level and object-level conditioning to guide object reconstruction, followed by layout optimization using object point clouds with 3D and 2D projection losses for scene alignment. Experiments on 3D-Future and 3D-Front demonstrate improved scene reconstruction quality, highlighting the effectiveness of exploiting repetition for robust 3D scene reconstruction.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces FurnSet, a framework for single-view 3D scene reconstruction that explicitly identifies repeated object instances in indoor scenes. It proposes per-object CLS tokens combined with a set-aware self-attention mechanism to group identical instances and aggregate complementary observations across them for joint reconstruction. The approach further incorporates scene-level and object-level conditioning to guide reconstruction, followed by layout optimization that uses object point clouds together with 3D and 2D projection losses. Experiments on the 3D-Future and 3D-Front datasets are reported to show improved scene reconstruction quality over prior methods.
Significance. If the grouping and aggregation mechanism proves reliable, the work could meaningfully advance single-view scene reconstruction by exploiting a structural property (repetitions) that is common in real-world indoor environments but ignored by most existing pipelines. The combination of set-aware attention with conditioning and explicit layout optimization is a coherent design choice, and evaluation on standard datasets (3D-Future, 3D-Front) would allow direct comparison with prior art. The absence of quantitative metrics, ablations, or error analysis in the current text, however, prevents a full assessment of whether the claimed gains are attributable to repetition exploitation.
major comments (1)
- [§3.2] §3.2 (set-aware self-attention): The central claim that per-object CLS tokens plus set-aware self-attention correctly group identical instances so that complementary observations can be aggregated is not supported by any validation. In single-view inputs, intra-class feature similarity can produce embeddings that are close without corresponding to identical objects; erroneous groupings would then average incompatible geometry or texture signals. The subsequent scene/object conditioning and layout optimization steps do not retroactively correct such mistakes, so any reported improvement on 3D-Future/3D-Front could be driven by conditioning alone rather than repetition exploitation. The manuscript must supply either qualitative grouping visualizations or a quantitative grouping-accuracy metric to establish that this component functions as required.
minor comments (2)
- The abstract states that experiments demonstrate improved quality but provides no numerical results, baseline comparisons, or ablation tables; these must be added with standard metrics (e.g., object IoU, scene Chamfer distance) and controls that isolate the repetition component.
- Notation for the per-object CLS tokens and the set-aware attention operation should be formalized with equations to clarify how grouping and aggregation are implemented.
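The scene Chamfer distance mentioned above is a standard reconstruction metric; a minimal reference implementation makes the requested evaluation concrete. This is a generic sketch of the metric, not code from the manuscript.

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N,3) and b (M,3):
    mean squared nearest-neighbour distance in both directions."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)  # (N, M) pairwise
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())
```

For an ablation isolating the repetition component, this metric would be reported for the full model and for a variant with set-aware attention disabled, on the same scenes.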
Simulated Author's Rebuttal
Thank you for the opportunity to respond to the referee's comments. We value the feedback on the need for validation of the core grouping component in FurnSet. We address the concern below and commit to revisions that strengthen the manuscript.
read point-by-point responses
-
Referee: [§3.2] §3.2 (set-aware self-attention): The central claim that per-object CLS tokens plus set-aware self-attention correctly group identical instances so that complementary observations can be aggregated is not supported by any validation. In single-view inputs, intra-class feature similarity can produce embeddings that are close without corresponding to identical objects; erroneous groupings would then average incompatible geometry or texture signals. The subsequent scene/object conditioning and layout optimization steps do not retroactively correct such mistakes, so any reported improvement on 3D-Future/3D-Front could be driven by conditioning alone rather than repetition exploitation. The manuscript must supply either qualitative grouping visualizations or a quantitative grouping-accuracy metric to establish that this component functions as required.
Authors: We acknowledge the importance of validating the grouping mechanism to ensure that the observed improvements stem from repetition exploitation. While the manuscript describes the set-aware self-attention and its intended role in grouping identical instances via per-object CLS tokens, we agree that explicit evidence is necessary. In the revised version, we will include qualitative visualizations of the attention maps and grouped instances from the 3D-Future and 3D-Front datasets. These will illustrate cases where repeated objects are correctly identified and their features aggregated. Furthermore, we will introduce a quantitative grouping-accuracy metric, computed by measuring the precision and recall of instance grouping against available ground-truth labels in the datasets. This will be accompanied by an ablation study isolating the contribution of the set-aware attention versus the conditioning components. We believe these additions will substantiate the central claim.
Revision: yes
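The grouping precision/recall the rebuttal promises can be defined over object pairs: a pair is positive when both objects share a group. The sketch below is one plausible instantiation of that metric (the function name and pair-based scoring are assumptions, since the rebuttal does not fix a definition).

```python
from itertools import combinations

def pairwise_grouping_prf(pred, gt):
    """Precision and recall of instance grouping, scored over object pairs:
    a pair is positive when both objects carry the same group label."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(gt)), 2):
        p_same = pred[i] == pred[j]
        g_same = gt[i] == gt[j]
        tp += p_same and g_same      # predicted together, truly identical
        fp += p_same and not g_same  # spurious merge
        fn += g_same and not p_same  # missed repeat
    prec = tp / (tp + fp) if tp + fp else 1.0
    rec = tp / (tp + fn) if tp + fn else 1.0
    return prec, rec
```

Pair-based scoring has the useful property of penalizing exactly the failure mode the referee worries about: a spurious merge of near-identical but distinct objects shows up directly as lost precision.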
Circularity Check
No circularity: novel architecture with independent mechanisms evaluated on external data
full rationale
The paper proposes FurnSet as a new framework that introduces per-object CLS tokens and a set-aware self-attention mechanism to group repeated instances and aggregate observations for joint reconstruction. This is combined with scene/object conditioning and layout optimization using point clouds and projection losses. The description presents these as original contributions, with experiments on the external 3D-Future and 3D-Front datasets. No equations or steps reduce a claimed prediction or result to a fitted input or self-citation by construction. No self-citations are invoked as load-bearing for uniqueness or ansatz. The central claim of improved reconstruction via repetition exploitation rests on the described novel components rather than tautological redefinition of inputs.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] Andreea Ardelean, Mert Özer, and Bernhard Egger. Gen3DSR: Generalizable 3D scene reconstruction via divide and conquer from a single view. In 3DV, pages 616–626. IEEE, 2025.
- [2] Armen Avetisyan, Manuel Dahnert, Angela Dai, Manolis Savva, Angel X. Chang, and Matthias Nießner. Scan2CAD: Learning CAD model alignment in RGB-D scans. In CVPR, pages 2614–2623, 2019.
- [3] Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, et al. ShapeNet: An information-rich 3D model repository. arXiv preprint arXiv:1512.03012, 2015.
- [4] Xingyu Chen, Fu-Jen Chu, Pierre Gleize, Kevin J. Liang, Alexander Sax, Hao Tang, Weiyao Wang, et al. SAM 3D: 3Dfy anything in images. arXiv preprint arXiv:2511.16624, 2025.
- [5] Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In CVPR, pages 5828–5839, 2017.
- [6] Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, et al. Objaverse-XL: A universe of 10M+ 3D objects. In NeurIPS, volume 36, pages 35799–35813, 2023.
- [7] Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A universe of annotated 3D objects. In CVPR, pages 13142–13153, 2023.
- [8] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL, pages 4171–4186, 2019.
- [9] Lihe Ding, Shaocong Dong, Yaokun Li, Chenjian Gao, Xiao Chen, Rui Han, Yihao Kuang, et al. FullPart: Generating each 3D part at full resolution. arXiv preprint arXiv:2510.26140, 2025.
- [10] Shaocong Dong, Lihe Ding, Xiao Chen, Yaokun Li, Yuxin Wang, Yucheng Wang, Qi Wang, Jaehyeok Kim, Chenjian Gao, Zhanpeng Huang, et al. From one to more: Contextual part latents for 3D generation. In ICCV, 2025.
- [11] Huan Fu, Bowen Cai, Lin Gao, Ling-Xiao Zhang, Jiaming Wang, Cao Li, Qixun Zeng, et al. 3D-FRONT: 3D furnished rooms with layouts and semantics. In ICCV, pages 10933–10942, 2021.
- [12] Huan Fu, Rongfei Jia, Lin Gao, Mingming Gong, Binqiang Zhao, Steve Maybank, and Dacheng Tao. 3D-FUTURE: 3D furniture shape with texture. International Journal of Computer Vision, 129(12):3313–3337, 2021.
- [13] Daoyi Gao, Dávid Rozenberszki, Stefan Leutenegger, and Angela Dai. DiffCAD: Weakly-supervised probabilistic CAD model retrieval and alignment from an RGB image. ACM Transactions on Graphics, 43(4):1–15, 2024.
- [14] Ruiqi Gao, Aleksander Holynski, Philipp Henzler, Arthur Brussee, Ricardo Martin-Brualla, Pratul Srinivasan, Jonathan T. Barron, and Ben Poole. CAT3D: Create anything in 3D with multi-view diffusion models. arXiv preprint arXiv:2405.10314, 2024.
- [15] Wei Gao and Russ Tedrake. FilterReg: Robust and efficient probabilistic point-set registration using Gaussian filter and twist parameterization. In CVPR, pages 11095–11104, 2019.
- [16] Can Gümeli, Angela Dai, and Matthias Nießner. ROCA: Robust CAD model retrieval and alignment from a single image. In CVPR, pages 4022–4031, 2022.
- [17] Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022.
- [18] Zehuan Huang, Yuan-Chen Guo, Xingqiao An, Yunhan Yang, Yangguang Li, Zi-Xin Zou, Ding Liang, Xihui Liu, Yan-Pei Cao, and Lu Sheng. MIDI: Multi-instance diffusion for single image to 3D scene generation. In CVPR, pages 23646–23657, 2025.
- [19] Hunyuan3D Team, Shuhui Yang, Mingxin Yang, Yifei Feng, Xin Huang, Sheng Zhang, Zebin He, et al. Hunyuan3D 2.1: From images to high-fidelity 3D assets with production-ready PBR material. arXiv preprint arXiv:2506.15442, 2025.
- [20] Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, and Konrad Schindler. Repurposing diffusion-based image generators for monocular depth estimation. In CVPR, pages 9492–9502, 2024.
- [21] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D Gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), 2023.
- [22] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, et al. Segment Anything. In ICCV, pages 4015–4026, 2023.
- [23] Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, and Sai Bi. Instant3D: Fast text-to-3D with sparse-view generation and large reconstruction model. arXiv preprint arXiv:2311.06214, 2023.
- [24] Yangguang Li, Zi-Xin Zou, Zexiang Liu, Dehu Wang, Yuan Liang, Zhipeng Yu, Xingchao Liu, et al. TripoSG: High-fidelity 3D shape synthesis using large-scale rectified flow models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
- [25] Haotong Lin, Sili Chen, Junhao Liew, Donny Y. Chen, Zhenyu Li, Guang Shi, Jiashi Feng, and Bingyi Kang. Depth Anything 3: Recovering the visual space from any views. arXiv preprint arXiv:2511.10647, 2025.
- [26] Yuchen Lin, Chenguo Lin, Panwang Pan, Honglei Yan, Yiqiang Feng, Yadong Mu, and Katerina Fragkiadaki. PartCrafter: Structured 3D mesh generation via compositional latent diffusion transformers. arXiv preprint arXiv:2506.05573, 2025.
- [27] Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. In ICLR, 2023.
- [28] Yuan Liu, Cheng Lin, Zijiao Zeng, Xiaoxiao Long, Lingjie Liu, Taku Komura, and Wenping Wang. SyncDreamer: Generating multiview-consistent images from a single-view image. arXiv preprint arXiv:2309.03453, 2023.
- [29] Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin, Yuan Liu, Zhiyang Dou, Lingjie Liu, Yuexin Ma, et al. Wonder3D: Single image to 3D using cross-domain diffusion. In CVPR, pages 9970–9980, 2024.
- [30] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
- [31] Yanxu Meng, Haoning Wu, Ya Zhang, and Weidi Xie. SceneGen: Single-image 3D scene generation in one feedforward pass. arXiv preprint arXiv:2508.15769, 2025.
- [32] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
- [33] Yinyu Nie, Xiaoguang Han, Shihui Guo, Yujian Zheng, Jian Chang, and Jian Jun Zhang. Total3DUnderstanding: Joint layout, object pose and mesh reconstruction for indoor scenes from a single image. In CVPR, pages 55–64, 2020.
- [34] William Peebles and Saining Xie. Scalable diffusion models with transformers. In ICCV, pages 4195–4205, 2023.
- [35] Songyou Peng, Michael Niemeyer, Lars Mescheder, Marc Pollefeys, and Andreas Geiger. Convolutional occupancy networks. In ECCV, pages 523–540. Springer, 2020.
- [36] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, pages 10684–10695, 2022.
- [37] Yawar Siddiqui, Justus Thies, Fangchang Ma, Qi Shan, Matthias Nießner, and Angela Dai. RetrievalFuse: Neural 3D scene reconstruction with a database. In ICCV, pages 12568–12577, 2021.
- [38] Fan-Yun Sun, Weiyu Liu, Siyi Gu, Dylan Lim, Goutam Bhat, Federico Tombari, Manling Li, Nick Haber, and Jiajun Wu. LayoutVLM: Differentiable optimization of 3D layout via vision-language models. In CVPR, pages 29469–29478, 2025.
- [39] Xiang Tang, Ruotong Li, and Xiaopeng Fan. Recent advances in 3D object and scene generation: A survey. arXiv preprint arXiv:2504.11734, 2025.
- [40] Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. VGGT: Visual geometry grounded transformer. In CVPR, pages 5294–5306, 2025.
- [41] Qirui Wu, Denys Iliash, Daniel Ritchie, Manolis Savva, and Angel X. Chang. Diorama: Unleashing zero-shot single-view 3D indoor scene modeling. In ICCV, pages 8896–8907, 2025.
- [42] Rundi Wu, Ruoshi Liu, Carl Vondrick, and Changxi Zheng. Sin3DM: Learning a diffusion model from a single 3D textured shape. arXiv preprint arXiv:2305.15399, 2023.
- [43] Shuang Wu, Youtian Lin, Feihu Zhang, Yifei Zeng, Jingxi Xu, Philip Torr, Xun Cao, and Yao Yao. Direct3D: Scalable image-to-3D generation via 3D latent diffusion transformer. In NeurIPS, volume 37, pages 121859–121881, 2024.
- [44] Tianhao Wu, Chuanxia Zheng, Frank Guan, Andrea Vedaldi, and Tat-Jen Cham. Amodal3R: Amodal 3D reconstruction from occluded 2D images. In ICCV, pages 9181–9193, 2025.
- [45] Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, and Jiaolong Yang. Structured 3D latents for scalable and versatile 3D generation. In CVPR, pages 21469–21480, 2025.
- [46] Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth Anything V2. In NeurIPS, 2024.
- [47] Kaixin Yao, Longwen Zhang, Xinhao Yan, Yan Zeng, Qixuan Zhang, Lan Xu, Wei Yang, Jiayuan Gu, and Jingyi Yu. CAST: Component-aligned 3D scene reconstruction from an RGB image. ACM Transactions on Graphics, 44(4), 2025.
- [48] Chandan Yeshwanth, Yueh-Cheng Liu, Matthias Nießner, and Angela Dai. ScanNet++: A high-fidelity dataset of 3D indoor scenes. In ICCV, pages 12–22, 2023.
- [49] Huangyue Yu, Baoxiong Jia, Yixin Chen, Yandan Yang, Puhao Li, Rongpeng Su, Jiaxin Li, et al. MetaScenes: Towards automated replica creation for real-world 3D scans. In CVPR, pages 1667–1679, 2025.
- [50] Guangyao Zhai, Evin Pınar Örnek, Shun-Cheng Wu, Yan Di, Federico Tombari, Nassir Navab, and Benjamin Busam. CommonScenes: Generating commonsense 3D indoor scenes with scene graph diffusion. In NeurIPS, volume 36, pages 30026–30038, 2023.
- [51] Biao Zhang, Jiapeng Tang, Matthias Nießner, and Peter Wonka. 3DShape2VecSet: A 3D shape representation for neural fields and generative diffusion models. ACM Transactions on Graphics, 42(4), 2023.
- [52] Jiahui Zhang, Yuelei Li, Anpei Chen, Muyu Xu, Kunhao Liu, Jianyuan Wang, Xiao-Xiao Long, et al. Advances in feed-forward 3D reconstruction and view synthesis: A survey. arXiv preprint arXiv:2507.14501, 2025.
- [53] Longwen Zhang, Ziyu Wang, Qixuan Zhang, Qiwei Qiu, Anqi Pang, Haoran Jiang, Wei Yang, Lan Xu, and Jingyi Yu. CLAY: A controllable large-scale generative model for creating high-quality 3D assets. ACM Transactions on Graphics, 43(4):1–20, 2024.
- [54] Qingcheng Zhao, Xiang Zhang, Haiyang Xu, Zeyuan Chen, Jianwen Xie, Yuan Gao, and Zhuowen Tu. DepR: Depth-guided single-view scene reconstruction with instance-level diffusion. In ICCV, pages 5722–5733, 2025.
- [55] Junwei Zhou and Yu-Wing Tai. AmodalGen3D: Generative amodal 3D object reconstruction from sparse unposed views. arXiv preprint arXiv:2511.21945, 2025.