FunRec: Reconstructing Functional 3D Scenes from Egocentric Interaction Videos
Pith reviewed 2026-05-10 18:40 UTC · model grok-4.3
The pith
FunRec reconstructs interactable 3D digital twins of indoor scenes from ordinary egocentric RGB-D videos of human interactions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FunRec recovers interactable 3D scenes by processing in-the-wild human interaction sequences from egocentric RGB-D videos. It automatically discovers articulated parts, estimates their kinematic parameters, tracks their 3D motion, and reconstructs static and moving geometry in canonical space, yielding simulation-compatible meshes. The method operates without controlled multi-state captures or CAD priors and is evaluated on new real and simulated benchmarks where it reports large gains in segmentation, articulation accuracy, and reconstruction quality.
What carries the argument
Automatic discovery of articulated parts combined with kinematic parameter estimation from observed human-object interactions in egocentric video.
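For the revolute case, the kinematic-parameter step can be sketched as follows; this is a hypothetical minimal estimator from two observed part poses, not the paper's actual method.

```python
import numpy as np

def rotation_axis_angle(R):
    """Axis-angle decomposition of a rotation matrix (angle assumed in (0, pi))."""
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    axis = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]]) / (2.0 * np.sin(angle))
    return axis, angle

def estimate_revolute_joint(T_a, T_b):
    """Joint parameters from two poses of the same rigid part.

    T_rel = T_b @ inv(T_a) is a screw motion; for a pure revolute joint a
    point p on the axis satisfies (I - R) p = t. Since (I - R) is rank 2,
    least squares returns the axis point closest to the origin.
    """
    T_rel = T_b @ np.linalg.inv(T_a)
    R, t = T_rel[:3, :3], T_rel[:3, 3]
    axis, angle = rotation_axis_angle(R)
    p = np.linalg.lstsq(np.eye(3) - R, t, rcond=None)[0]
    return axis, angle, p
```

In practice such two-pose estimates would be aggregated over many frames and tracked points, which is where the interaction video earns its keep.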
If this is right
- Part segmentation improves by up to 50 mIoU compared with prior methods.
- Articulation and pose errors drop by a factor of 5 to 10.
- Overall 3D reconstruction accuracy increases substantially.
- Reconstructed meshes can be exported directly as URDF or USD files for simulation.
- The output supports hand-guided affordance mapping and robot-scene interaction tasks.
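The URDF-export bullet can be made concrete with a hypothetical minimal emitter; the joint name, axis, origin, and limits below are illustrative placeholders, not values produced by FunRec.

```python
def revolute_joint_urdf(name, parent, child, axis, origin_xyz, lower, upper):
    """Emit a URDF <joint> element for one discovered revolute part.

    All arguments are placeholders standing in for a reconstruction's output:
    an axis direction, a pivot point (origin), and estimated motion limits.
    """
    return f"""<joint name="{name}" type="revolute">
  <parent link="{parent}"/>
  <child link="{child}"/>
  <origin xyz="{origin_xyz[0]} {origin_xyz[1]} {origin_xyz[2]}" rpy="0 0 0"/>
  <axis xyz="{axis[0]} {axis[1]} {axis[2]}"/>
  <limit lower="{lower}" upper="{upper}" effort="10" velocity="1"/>
</joint>"""

# Hypothetical cabinet-door joint: hinge about z through a point on the frame.
snippet = revolute_joint_urdf("cabinet_door_joint", "cabinet_body", "cabinet_door",
                              (0, 0, 1), (0.4, 0.0, 0.7), 0.0, 1.57)
```

A full export would additionally emit the `<link>` elements with the reconstructed meshes and inertial parameters.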
Where Pith is reading between the lines
- Reconstruction from single-view interaction footage could reduce reliance on multi-camera rigs for indoor mapping.
- The same interaction signal might help infer object affordances that are invisible in static views.
- If the pipeline scales, it could let robots learn scene geometry by watching people use the environment rather than by direct scanning.
Load-bearing premise
Ordinary egocentric RGB-D interaction videos contain enough information to reliably discover and parameterize articulated object parts without controlled multi-state captures, CAD priors, or additional supervision.
What would settle it
A test sequence in which the reconstructed parts and kinematic parameters produce simulated motions that visibly mismatch the actual object movements recorded in the input video.
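Such a settling test could be operationalized as a replay error, sketched here under assumed inputs (tracked 3D points at two times plus the estimated axis, pivot, and angle):

```python
import numpy as np

def replay_error(points_t0, points_t1, axis, point_on_axis, angle):
    """Mean distance between the points tracked at time t1 and the positions
    predicted by replaying the estimated revolute motion on the t0 points.

    A large value would indicate the reconstructed kinematics visibly
    mismatch the motion recorded in the input video.
    """
    axis = np.asarray(axis, float) / np.linalg.norm(axis)
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    # Rodrigues' formula for rotation by `angle` about `axis`.
    R = np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)
    predicted = (points_t0 - point_on_axis) @ R.T + point_on_axis
    return np.linalg.norm(predicted - points_t1, axis=1).mean()
```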
Figures
Original abstract
We present FunRec, a method for reconstructing functional 3D digital twins of indoor scenes directly from egocentric RGB-D interaction videos. Unlike existing methods on articulated reconstruction, which rely on controlled setups, multi-state captures, or CAD priors, FunRec operates directly on in-the-wild human interaction sequences to recover interactable 3D scenes. It automatically discovers articulated parts, estimates their kinematic parameters, tracks their 3D motion, and reconstructs static and moving geometry in canonical space, yielding simulation-compatible meshes. Across new real and simulated benchmarks, FunRec surpasses prior work by a large margin, achieving up to +50 mIoU improvement in part segmentation, 5-10 times lower articulation and pose errors, and significantly higher reconstruction accuracy. We further demonstrate applications on URDF/USD export for simulation, hand-guided affordance mapping and robot-scene interaction.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces FunRec, a method for reconstructing functional 3D digital twins of indoor scenes from egocentric RGB-D interaction videos. It automatically discovers articulated parts, estimates kinematic parameters, tracks 3D motion, and reconstructs static and moving geometry in canonical space to produce simulation-compatible meshes. The method is claimed to work on in-the-wild sequences without controlled setups, multi-state captures, or CAD priors, and demonstrates large quantitative improvements on new benchmarks along with applications in simulation and robotics.
Significance. If the results hold, this work would be significant for enabling scalable, automatic reconstruction of interactable 3D scenes from casual egocentric videos. The reported gains of up to +50 mIoU in part segmentation and 5-10x lower errors in articulation and pose suggest a substantial advance over prior methods that require more controlled conditions. The provision of URDF/USD export and affordance mapping further enhances its practical utility for simulation and robot interaction.
Major comments (2)
- [Abstract and Evaluation] The abstract reports large quantitative gains (+50 mIoU, 5-10 times lower errors) but provides no method details, error analysis, or ablation studies, making it difficult to assess the validity of the claims without the full evaluation section.
- [Method (likely §3 or §4)] The central assumption that egocentric RGB-D interaction videos contain sufficient information to uniquely determine articulated kinematics without additional priors is load-bearing. The inverse problem is underconstrained: human interactions cover only part of the configuration space, egocentric viewpoints correlate with the motion, and depth ambiguities and occlusions remove constraints. The paper should provide concrete evidence or tests showing how these ambiguities are resolved without implicit priors.
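The underconstrained-motion objection can be quantified in miniature: when an articulated part is observed over only a few degrees of motion, the linear system that localizes the joint axis becomes ill-conditioned, so depth noise is strongly amplified. The sketch below is an illustrative numeric check, not an analysis of FunRec itself.

```python
import numpy as np

def axis_point_noise_gain(angle):
    """Amplification of translation noise into axis-point error for a
    revolute joint observed over `angle` radians of motion.

    The axis point solves (I - R) p = t; the smallest nonzero singular
    value of (I - R) is 2*sin(angle/2), so noise in t is amplified by
    its inverse.
    """
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])
    sv = np.linalg.svd(np.eye(3) - R, compute_uv=False)
    smallest_nonzero = sv[sv > 1e-12].min()
    return 1.0 / smallest_nonzero

# 2 degrees of observed motion amplifies noise roughly 20x more than 40 degrees.
gain_small = axis_point_noise_gain(np.radians(2))
gain_large = axis_point_noise_gain(np.radians(40))
```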
Minor comments (2)
- [Abstract] Clarify the exact nature of the new real and simulated benchmarks used for evaluation.
- [Evaluation] Ensure all quantitative results include standard deviations or statistical significance to support the 'large margin' claims.
Simulated Author's Rebuttal
Thank you for reviewing our manuscript and for the encouraging comments on its potential impact. We provide point-by-point responses to the major comments below.
Point-by-point responses
- Referee [Abstract and Evaluation]: The abstract reports large quantitative gains (+50 mIoU, 5-10 times lower errors) but provides no method details, error analysis, or ablation studies. This makes it difficult to assess the validity of the claims without the full details in the evaluation section.
  Authors: The abstract serves as a high-level overview of the paper's contributions and key results, following standard academic conventions for brevity. Comprehensive method details are presented in Section 3, while Section 5 (Experiments) contains the full evaluation, including quantitative benchmarks on new real and simulated datasets, error breakdowns for articulation and pose, ablation studies on key components such as part discovery and motion tracking, and supporting analysis. These sections substantiate the claims made in the abstract. Revision: no
- Referee [Method (likely §3 or §4)]: The central assumption that egocentric RGB-D interaction videos contain sufficient information to uniquely determine articulated kinematics without additional priors is load-bearing. The inverse problem is underconstrained due to limited configuration-space coverage in human interactions, egocentric viewpoint correlations, depth ambiguities, and occlusions. The paper should provide concrete evidence or tests showing how ambiguities are resolved without implicit priors.
  Authors: We acknowledge that the inverse problem is challenging and potentially underconstrained. Our method addresses this through a joint optimization that leverages temporal motion consistency from interaction sequences, RGB-D observations, and interaction-induced constraints to discover parts and estimate kinematics without relying on CAD models or multi-state captures. Concrete evidence is provided via extensive quantitative results and ablations on diverse in-the-wild sequences demonstrating accurate recovery despite occlusions and limited motions, as well as comparisons showing large gains over prior methods. We have added further discussion in the revised manuscript on ambiguity handling and included additional failure case analysis. Revision: partial
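The rebuttal's appeal to temporal motion consistency can be illustrated with one possible residual (a sketch of the general idea; FunRec's actual objective is not specified in the excerpt): per-frame axis estimates for the same joint should agree with a single shared axis.

```python
import numpy as np

def shared_axis_residual(per_frame_axes):
    """Temporal-consistency residual for a revolute joint.

    Per-frame axis estimates should agree with one shared joint axis,
    taken here as their dominant singular direction. Returns the shared
    axis and, per frame, the sine of the angle to it (0 = consistent).
    """
    A = np.asarray(per_frame_axes, float)
    A /= np.linalg.norm(A, axis=1, keepdims=True)
    # Resolve the +/- sign ambiguity of each axis against the first frame.
    A *= np.where(A @ A[0] < 0, -1.0, 1.0)[:, None]
    _, _, Vt = np.linalg.svd(A)
    shared = Vt[0]
    residuals = np.linalg.norm(np.cross(A, shared), axis=1)
    return shared, residuals
```

In a joint optimization this residual would be minimized together with reprojection and depth terms, which is one way temporal aggregation can tame the per-frame ambiguity the referee raises.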
Circularity Check
No significant circularity; the derivation introduces independent processing steps.
Full rationale
The abstract and claims describe a method that processes in-the-wild egocentric RGB-D videos to discover articulated parts, estimate kinematic parameters, track motion, and reconstruct canonical geometry without relying on controlled captures or CAD priors. The provided text exhibits no equations, fitted parameters renamed as predictions, or self-citation chains that would reduce any central claim to its own inputs by construction. The approach asserts new capabilities for simulation-compatible outputs and reports quantitative improvements over prior work, indicating that the derivation chain is grounded in external benchmarks rather than tautological.