Hoi! - A Multimodal Dataset for Force-Grounded, Cross-View Articulated Manipulation
Pith reviewed 2026-05-17 01:38 UTC · model grok-4.3
The pith
A new dataset records 3048 real interactions with 381 articulated objects across four embodiments, with the tool embodiment adding force and tactile sensing.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central contribution is the Hoi! dataset, which couples synchronized video from multiple viewpoints with end-effector forces and tactile signals collected from four distinct embodiments on 381 articulated objects, thereby enabling direct study of cross-view transfer and force-grounded manipulation.
What carries the argument
The four-embodiment recording setup that provides aligned visual, force, and tactile data for the same physical interactions performed by a human hand, wrist-camera hand, UMI gripper, and custom Hoi! gripper.
If this is right
- Video-based methods can now be directly compared against those that also use force and tactile channels for the same interactions.
- Transfer learning experiments become possible between human demonstrations and robotic execution on identical objects and actions.
- Force prediction tasks can be added to standard manipulation benchmarks using the provided sensor streams.
- Policies for articulated objects can be trained and evaluated with explicit physical grounding rather than vision alone.
Where Pith is reading between the lines
- The dataset structure could support new benchmarks that quantify how much force information reduces the sim-to-real gap in policy learning.
- Extending similar recordings to additional object categories or longer interaction sequences would test whether the observed transfer patterns hold more broadly.
- The aligned multi-embodiment recordings offer a natural testbed for studying embodiment-invariant features in manipulation.
Load-bearing premise
That the force and tactile signals recorded from these specific embodiments accurately represent physical interactions that can transfer to train general robotic policies on articulated objects.
What would settle it
A model trained only on the human-hand embodiment data shows no improvement in force prediction or manipulation success when tested on the custom gripper embodiment with held-out objects.
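The settling experiment above depends on a split in which the test objects never appear in training and the test embodiment differs from the training one. A minimal sketch of such a split follows. The record layout (`object_id`, `embodiment`) and the embodiment labels are hypothetical, chosen only for illustration; they are not taken from the dataset's actual file format.

```python
import random

def cross_embodiment_split(sequences, train_embodiment, test_embodiment,
                           held_out_fraction=0.2, seed=0):
    """Split records so that test objects are never seen during training.

    `sequences` is a list of dicts with hypothetical keys
    "object_id" and "embodiment".
    """
    # Shuffle the unique object ids reproducibly.
    object_ids = sorted({s["object_id"] for s in sequences})
    rng = random.Random(seed)
    rng.shuffle(object_ids)

    # Reserve a fraction of the objects for testing only.
    n_held_out = max(1, round(len(object_ids) * held_out_fraction))
    held_out = set(object_ids[:n_held_out])

    # Train: source embodiment on seen objects.
    train = [s for s in sequences
             if s["embodiment"] == train_embodiment
             and s["object_id"] not in held_out]
    # Test: target embodiment on held-out objects.
    test = [s for s in sequences
            if s["embodiment"] == test_embodiment
            and s["object_id"] in held_out]
    return train, test, held_out

# Toy records: 10 objects, each recorded in two embodiments.
toy_sequences = [
    {"object_id": obj, "embodiment": emb}
    for obj in range(10)
    for emb in ("human_hand", "hoi_gripper")
]
train, test, held_out = cross_embodiment_split(
    toy_sequences, "human_hand", "hoi_gripper"
)
print(len(train), len(test), sorted(held_out))
```

Splitting by object rather than by sequence keeps the comparison honest: any gain on the gripper embodiment then cannot come from having seen the same object during training.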
Original abstract
We present a dataset for force-grounded, cross-view articulated manipulation that couples what is seen with what is done and what is felt during real human interaction. The dataset contains 3048 sequences across 381 articulated objects in 38 environments. Each object is operated in four embodiments - (i) human hand, (ii) human hand with a wrist-mounted camera, (iii) handheld UMI gripper, and (iv) a custom Hoi! gripper, where the tool embodiment provides end-effector forces and tactile sensing. Our dataset offers a holistic view of interaction understanding from video, enabling researchers to evaluate how well methods transfer between human and robotic viewpoints, but also investigate underexplored modalities such as interaction forces. The Project Website can be found at https://timengelbracht.github.io/Hoi-Dataset-Website/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents the Hoi! dataset for force-grounded, cross-view articulated manipulation. It comprises 3048 sequences across 381 articulated objects in 38 environments, captured in four embodiments: human hand, human hand with wrist-mounted camera, handheld UMI gripper, and custom Hoi! gripper. According to the abstract, the tool embodiment provides end-effector force and tactile sensing. The central contribution is the release of this multimodal resource to support evaluation of method transfer between human and robotic viewpoints and investigation of interaction forces.
Significance. If the data collection protocols, sensor calibration, and quality controls are rigorously documented and the dataset is released with reproducible access, this resource could meaningfully advance robotics research on articulated manipulation by filling a gap in paired visual-force-tactile data across embodiments. The cross-view and force-grounded aspects address underexplored areas in current manipulation datasets.
Major comments (1)
- [§3] §3 (Data Collection): The manuscript provides insufficient detail on force/tactile sensor calibration, force range matching between the UMI and Hoi! grippers, and temporal synchronization with video streams. Without these, it is difficult to assess whether the recorded signals capture embodiment-independent interaction physics as needed to support the claimed utility for training general robotic policies.
Minor comments (2)
- The project website URL is given but the manuscript should include a permanent DOI or direct download link for the dataset and code to ensure long-term accessibility.
- [§4] Figure captions and axis labels in the data statistics section could be clarified to explicitly state the number of sequences per embodiment.
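On the per-embodiment counts raised in the last comment, the headline numbers from the abstract allow a quick arithmetic check. The sketch below assumes sequences are spread evenly over objects and embodiments; the abstract does not state this, so the per-embodiment figure is only what an even split would imply.

```python
# Headline counts taken from the abstract.
TOTAL_SEQUENCES = 3048
NUM_OBJECTS = 381
NUM_EMBODIMENTS = 4

# Sequences per object, with the remainder showing whether the division is exact.
sequences_per_object, remainder = divmod(TOTAL_SEQUENCES, NUM_OBJECTS)

# ASSUMPTION: an even split across embodiments (not stated in the abstract).
per_object_per_embodiment = sequences_per_object / NUM_EMBODIMENTS
per_embodiment_if_even = TOTAL_SEQUENCES // NUM_EMBODIMENTS

print(f"sequences per object: {sequences_per_object} (remainder {remainder})")
print(f"per object and embodiment, if even: {per_object_per_embodiment}")
print(f"per embodiment, if even: {per_embodiment_if_even}")
```

The division is exact (8 sequences per object), which is consistent with, but does not prove, two recordings per object in each of the four embodiments.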
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which highlights important aspects of our data collection that require clarification. We address the major comment point-by-point below and will revise the manuscript to incorporate additional technical details.
- Referee: [§3] §3 (Data Collection): The manuscript provides insufficient detail on force/tactile sensor calibration, force range matching between the UMI and Hoi! grippers, and temporal synchronization with video streams. Without these, it is difficult to assess whether the recorded signals capture embodiment-independent interaction physics as needed to support the claimed utility for training general robotic policies.
  Authors: We agree that the current version of §3 provides insufficient detail on these points. In the revised manuscript we will expand the Data Collection section with a dedicated subsection on sensor calibration. This will describe the procedures for both the UMI and Hoi! grippers, including reference load-cell validation, zero-offset correction, and temperature compensation. We will also add explicit force-range information and matching: the UMI gripper uses a sensor with a 0–50 N range while the Hoi! gripper uses a 0–200 N range; we apply per-embodiment min-max normalization followed by a shared scaling factor derived from overlapping calibration trials to enable direct comparison of interaction forces. For temporal synchronization we will document the hardware trigger protocol (shared clock source with <5 ms measured jitter) and the software alignment routine based on event timestamps and cross-correlation of high-frequency force spikes with video frame changes. These additions, together with released calibration scripts, will allow readers to verify that the recorded signals reflect embodiment-independent physics suitable for cross-embodiment policy training.
  Revision: yes
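Two of the steps named in this response, min-max normalization and cross-correlation alignment, can be sketched as follows. This is an illustrative sketch only, not the authors' calibration scripts: the function names, the toy signals, and the 100 Hz rate are hypothetical, and the code assumes both signals have already been resampled to a common sampling rate. The shared scaling factor from calibration trials is not modeled here.

```python
import numpy as np

def minmax_normalize(signal):
    """Scale a 1-D signal to the range [0, 1] (constant signals map to zeros)."""
    x = np.asarray(signal, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return (x - x.min()) / span

def estimate_lag_samples(reference, delayed):
    """Return by how many samples `delayed` lags behind `reference`.

    ASSUMPTION: both signals share the same sampling rate.
    A positive result means events appear later in `delayed`.
    """
    a = np.asarray(delayed, dtype=float)
    b = np.asarray(reference, dtype=float)
    # Remove the mean so a constant offset does not dominate the correlation.
    a = a - a.mean()
    b = b - b.mean()
    corr = np.correlate(a, b, mode="full")
    # In "full" mode, index len(b) - 1 corresponds to zero lag.
    return int(np.argmax(corr)) - (len(b) - 1)

# Toy example: a force spike and a video frame-change spike 7 samples later.
RATE_HZ = 100.0  # hypothetical common sampling rate
force_magnitude = np.zeros(100)
force_magnitude[20] = 35.0          # spike in newtons (made-up value)
frame_change = np.zeros(100)
frame_change[27] = 1.0              # spike in frame-difference energy

lag = estimate_lag_samples(minmax_normalize(force_magnitude), frame_change)
print(f"video lags force by {lag} samples ({lag / RATE_HZ:.3f} s)")
```

Normalizing before correlating does not change the location of the correlation peak; it only keeps the two signals on comparable scales, which makes the peak height easier to threshold when deciding whether an alignment is trustworthy.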
Circularity Check
Dataset release paper with no derivations or predictions
The paper presents a new multimodal dataset for force-grounded articulated manipulation, describing data collection across 3048 sequences on 381 objects in four embodiments without any claimed mathematical derivations, model predictions, fitted parameters, or equations. The central contribution is the dataset itself and its release for enabling future research on cross-view and force modalities. No load-bearing steps reduce by construction to self-definitions, fitted inputs renamed as predictions, or self-citation chains; the work is self-contained as an empirical data resource independent of prior results.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: "We present a dataset for force-grounded, cross-view articulated manipulation that couples what is seen with what is done and what is felt during real human interaction."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: "The dataset contains 3048 sequences across 381 articulated objects... end-effector forces and tactile sensing."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 3 Pith papers
- Tactile-based Multimodal Fusion in Embodied Intelligence: A Survey of Vision, Language, and Contact-Driven Paradigms
  A survey proposing a hierarchical taxonomy for multimodal tactile fusion datasets and methods across perception, generation, and interaction in embodied intelligence.
- World Action Models: The Next Frontier in Embodied AI
  The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
- World Model for Robot Learning: A Comprehensive Survey
  A comprehensive survey that organizes the literature on world models in robot learning, their roles in policy learning, planning, simulation, and video-based generation, with connections to navigation, driving, datase...