Affostruction: 3D Affordance Grounding with Generative Reconstruction
Pith reviewed 2026-05-16 14:48 UTC · model grok-4.3
The pith
A generative model reconstructs full 3D object geometry from partial RGBD views and locates action-specific regions on both visible and hidden surfaces.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Affostruction reconstructs complete object geometry from partial RGBD observations via sparse voxel fusion of multi-view features, then grounds affordances on the full shape, including unobserved regions. A flow-based formulation captures the inherent ambiguity of affordance distributions, and active view selection is guided by the predicted affordances.
What carries the argument
Sparse voxel fusion of multi-view features for generative reconstruction, paired with flow-based modeling of affordance ambiguity and active view selection.
If this is right
- Affordance grounding extends to surfaces not observed in the input RGBD images.
- Reconstruction maintains constant computational cost even as the number of input views grows.
- Active view selection uses initial affordance predictions to choose additional observations that improve final grounding accuracy.
- Flow-based modeling expresses the probabilistic nature of suitable action regions rather than single deterministic maps.
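The flow-based point above presumably follows the flow-matching / rectified-flow recipe: learn a velocity field that transports noise samples to affordance heatmaps, so multiple plausible action regions can be sampled rather than one deterministic map. A minimal numpy sketch of the transport mechanics (the oracle velocity field, point count, and Euler integrator are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def rectified_flow_pair(x0, x1, t):
    """Linear interpolant x_t and the target velocity (x1 - x0) that a
    flow-matching model would regress at time t."""
    xt = (1.0 - t) * x0 + t * x1
    return xt, x1 - x0

def euler_sample(v_field, x0, steps=200):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with Euler steps."""
    x, dt = x0.copy(), 1.0 / steps
    for i in range(steps):
        x = x + dt * v_field(x, i * dt)
    return x

# Toy check with an oracle velocity field aimed at a fixed "heatmap" x1:
# sampling from noise recovers x1, illustrating the transport step.
n_points = 64                        # points on the reconstructed surface
x1 = rng.random(n_points)            # one plausible per-point heatmap
x0 = rng.standard_normal(n_points)   # noise sample
oracle = lambda x, t: (x1 - x) / max(1.0 - t, 1e-3)
x_hat = euler_sample(oracle, x0)     # residual against x1 is near zero
```

In a trained model, the oracle would be replaced by a network conditioned on the reconstructed geometry and the text query; repeated sampling from different noise draws then yields a distribution over affordance regions.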
Where Pith is reading between the lines
- Robotic planning could use these completed shapes to simulate grasps or placements on back sides of objects before physical interaction.
- The same reconstruction pipeline might support incremental mapping when an agent circles an object and collects new views over time.
- Text queries could be refined by large language models to produce more precise affordance distributions within the generated geometry.
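The active-view idea behind these readings can be sketched as a greedy next-best-view rule: prefer the candidate view that newly observes the most predicted affordance mass. The scoring rule below is a plausible reading for illustration, not the paper's stated criterion:

```python
import numpy as np

def next_best_view(candidates, afford_score, observed):
    """Greedy affordance-guided view selection: pick the candidate view
    whose visible voxels carry the most not-yet-observed predicted
    affordance mass (a hypothetical scoring rule, for illustration).
    candidates: list of boolean visibility masks over voxels;
    afford_score: per-voxel predicted affordance; observed: boolean mask."""
    gains = [float(afford_score[vis & ~observed].sum()) for vis in candidates]
    return int(np.argmax(gains))

afford = np.array([0.9, 0.1, 0.8])          # predicted per-voxel affordance
seen = np.array([True, False, False])       # voxel 0 already observed
views = [np.array([True, True, False]),     # view 0 sees voxels {0, 1}
         np.array([False, False, True])]    # view 1 sees voxel {2}
best = next_best_view(views, afford, seen)  # view 1 wins: gain 0.8 > 0.1
```

The loop would alternate this selection with re-running reconstruction and affordance prediction as each new view arrives.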
Load-bearing premise
The generative reconstruction accurately fills in geometry for unseen object parts so that affordance predictions on those parts stay reliable.
What would settle it
Objects with substantial occluded geometry would settle it: if the reconstructed shape differs markedly from ground truth and produces affordance maps that disagree with human annotations on the hidden surfaces, the central claim fails.
Original abstract
This paper addresses the problem of affordance grounding from RGBD images of an object, which aims to localize surface regions corresponding to a text query that describes an action on the object. While existing methods predict affordance regions only on visible surfaces, we propose Affostruction, a generative framework that reconstructs complete object geometry from partial RGBD observations and grounds affordances on the full shape including unobserved regions. Our approach introduces sparse voxel fusion of multi-view features for constant-complexity generative reconstruction, a flow-based formulation that captures the inherent ambiguity of affordance distributions, and an active view selection strategy guided by predicted affordances. Affostruction outperforms existing methods by large margins on challenging benchmarks, achieving 19.1 aIoU on affordance grounding and 32.67 IoU for 3D reconstruction.
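The abstract's constant-complexity claim rests on fusing per-view features into a shared sparse voxel grid, so cost scales with occupied voxels rather than with the number of views. A toy sketch of such fusion by running mean (the voxel size and mean-fusion rule are assumptions; the paper fuses depth-conditioned DINOv2 features, abstracted here to plain vectors):

```python
import numpy as np

class SparseVoxelFuser:
    """Fuse per-view point features into a sparse voxel grid by running
    mean. Memory scales with occupied voxels, not with the view count."""
    def __init__(self, voxel_size=0.05):
        self.voxel_size = voxel_size
        self.sum, self.count = {}, {}

    def add_view(self, points, feats):
        """points: (N, 3) back-projected from one RGBD view; feats: (N, D)."""
        keys = np.floor(points / self.voxel_size).astype(int)
        for key, f in zip(map(tuple, keys), feats):
            self.sum[key] = self.sum.get(key, 0.0) + f
            self.count[key] = self.count.get(key, 0) + 1

    def fused(self):
        """Per-voxel mean feature over all views that touched the voxel."""
        return {k: self.sum[k] / self.count[k] for k in self.sum}

fuser = SparseVoxelFuser()
pts = np.array([[0.01, 0.01, 0.01]])          # same voxel seen by two views
fuser.add_view(pts, np.array([[1.0, 0.0]]))
fuser.add_view(pts, np.array([[0.0, 1.0]]))
feat = fuser.fused()[(0, 0, 0)]               # mean of the two view features
```

Adding a third or tenth view touches the same bounded set of voxels, which is what makes the per-step cost independent of how many views have been fused.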
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Affostruction, a generative framework for 3D affordance grounding from partial RGBD observations. It reconstructs complete object geometry via sparse voxel fusion of multi-view features, models affordance distributions with a flow-based formulation to handle ambiguity, and uses active view selection guided by predicted affordances. The central claim is that this enables reliable affordance grounding on the full shape including unobserved regions, yielding large gains over prior methods (19.1 aIoU on grounding, 32.67 IoU on reconstruction).
Significance. If the reconstruction of unobserved geometry proves sufficiently accurate and the affordance predictions on those regions are validated, the work would advance 3D scene understanding for interaction tasks by moving beyond visible-surface limitations. The constant-complexity sparse fusion and flow-based ambiguity modeling are technically interesting contributions that could influence downstream robotics and AR applications.
major comments (2)
- [Abstract and §4] Abstract and §4 (experiments): the reported 19.1 aIoU improvement is presented as evidence that affordances are successfully grounded on unobserved reconstructed geometry, yet standard benchmarks annotate only visible surfaces from the input RGBD views. No direct GT annotations, per-region (visible vs. hallucinated) breakdown, or uncertainty-aware metrics are described for the unobserved voxels, despite the reconstruction IoU of 32.67 indicating non-negligible error.
- [§3.1] §3.1 (reconstruction module): the sparse voxel fusion is claimed to produce complete geometry suitable for downstream affordance grounding, but the manuscript provides no ablation isolating the effect of reconstruction accuracy on affordance aIoU specifically in unobserved regions, leaving the load-bearing assumption untested.
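For the metric at issue above: aIoU in affordance benchmarks is typically the IoU of the thresholded heatmap against the annotated region, averaged over a threshold sweep. A rough sketch (the specific sweep and averaging are assumptions about the protocol, which the review does not restate):

```python
import numpy as np

def aiou(pred, gt, thresholds=np.linspace(0.05, 0.95, 19)):
    """Average IoU of the binarized heatmap over a threshold sweep.
    pred: per-point scores in [0, 1]; gt: boolean annotation mask.
    The sweep itself is an assumed detail of the benchmark protocol."""
    ious = []
    for t in thresholds:
        p = pred >= t
        union = np.logical_or(p, gt).sum()
        inter = np.logical_and(p, gt).sum()
        ious.append(inter / union if union else 1.0)
    return float(np.mean(ious))

gt = np.array([True, True, False, False])
perfect = aiou(gt.astype(float), gt)   # 1.0 for a perfect prediction
```

Since such a metric is only computable where annotations exist, it cannot by itself certify predictions on hallucinated surfaces, which is the crux of the first major comment.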
minor comments (2)
- [§3.2] Notation for the flow-based affordance model could be clarified with an explicit equation for the conditional density in the methods section.
- Figure captions should explicitly label which surfaces are input-visible versus reconstructed to aid reader interpretation of qualitative results.
Simulated Author's Rebuttal
We thank the referee for the careful review and constructive criticism. The points raised about validating performance on unobserved geometry are important, and we will revise the manuscript to better address them while acknowledging the limitations of available ground-truth data.
Point-by-point responses
Referee: [Abstract and §4] Abstract and §4 (experiments): the reported 19.1 aIoU improvement is presented as evidence that affordances are successfully grounded on unobserved reconstructed geometry, yet standard benchmarks annotate only visible surfaces from the input RGBD views. No direct GT annotations, per-region (visible vs. hallucinated) breakdown, or uncertainty-aware metrics are described for the unobserved voxels, despite the reconstruction IoU of 32.67 indicating non-negligible error.
Authors: We agree that the reported aIoU is computed on visible surfaces as per the benchmarks, and there are no direct GT annotations for unobserved regions. This makes it difficult to directly quantify affordance grounding accuracy on hallucinated geometry. In the revised manuscript, we will clarify this in the abstract and experiments section, add qualitative results showing affordance predictions on reconstructed unobserved surfaces, and include uncertainty-aware analysis using the flow-based model to evaluate prediction confidence in those regions. We will also discuss how the reconstruction IoU impacts the reliability of these predictions. (revision: partial)
Referee: [§3.1] §3.1 (reconstruction module): the sparse voxel fusion is claimed to produce complete geometry suitable for downstream affordance grounding, but the manuscript provides no ablation isolating the effect of reconstruction accuracy on affordance aIoU specifically in unobserved regions, leaving the load-bearing assumption untested.
Authors: We acknowledge that an ablation specifically for unobserved regions would be ideal but is limited by the absence of GT affordance labels there. We will add an ablation study measuring the effect of reconstruction accuracy (e.g., our method vs. ground-truth geometry where possible for the full shape) on the overall affordance aIoU, and analyze the correlation with reconstruction quality to indirectly support the assumption. This will be included in §4. (revision: partial)
- Remaining limitation: quantitative evaluation of affordance grounding specifically on unobserved regions is not possible, given the lack of ground-truth annotations in the benchmarks.
Circularity Check
No circularity detected in derivation chain
Full rationale
The paper introduces Affostruction as a new generative framework combining sparse voxel fusion for reconstruction, flow-based affordance modeling, and active view selection. No equations, definitions, or claims in the abstract or described components reduce by construction to fitted parameters, self-referential inputs, or load-bearing self-citations. Performance metrics (aIoU, IoU) are standard benchmarks applied to the outputs rather than being redefined within the method itself. The derivation chain remains self-contained as an independent technical proposal without the enumerated circular patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Generative reconstruction from partial multi-view RGBD observations can produce accurate geometry for unobserved regions.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear: relation between the paper passage and the cited Recognition theorem)
We extend TRELLIS with two key components: multi-view sparse voxel fusion that aggregates DINOv2 features conditioned on depth, and a flow-based affordance module that generates heatmaps from text queries.
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking (unclear: relation between the paper passage and the cited Recognition theorem)
Affostruction achieves 19.1 aIoU on affordance grounding (40.4% improvement) and 32.67 IoU for 3D reconstruction (67.7% improvement)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.