Web2Grasp: Learning Functional Grasps from Web Images of Hand-Object Interactions

Arthur Jakobsson; Homanga Bharadhwaj; Hongyi Chen; Jeffrey Ichnowski; Jiashun Wang; Ruihan Zhao; Shubham Tulsiani; Yufei Ye; Yunchao Yao; Zackory Erickson


Functional grasping is essential for enabling dexterous multi-finger robot hands to manipulate objects effectively. Prior work largely focuses on power grasps, which only involve holding an object, or relies on in-domain demonstrations for specific objects. We propose leveraging human grasp information extracted from web images, which capture natural and functional hand-object interactions (HOI). Using a pretrained 3D reconstruction model, we recover 3D human HOI meshes from RGB images. To train on this noisy HOI data, we propose to use: (1) an interaction-centric model to learn the functional interaction pattern between hand and object, and (2) geometry-based filtering to remove infeasible grasps, together with physical simulation to retain grasps that can resist disturbances. In Isaac Gym simulation, our model trained on reconstructed HOI grasps achieves a 75.8% success rate on objects from the web dataset and generalizes to unseen objects, outperforming baseline methods in both grasp success and functional quality. In real-world experiments with the LEAP hand and Inspire hand, it attains a 77.5% success rate across 12 objects, including challenging ones such as a syringe, spray bottle, knife, and tongs. The project website is at https://web2grasp.github.io/.
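
As a rough illustration of what a geometry-based filter of this kind might look like, the Python sketch below rejects a grasp when hand points penetrate the object too deeply or when too few points are near its surface. It is not the authors' implementation: the sphere signed-distance function stands in for a reconstructed object mesh, and the function names and thresholds (`max_penetration`, `contact_dist`, `min_contacts`) are hypothetical values chosen only for the example.

```python
import numpy as np


def sphere_sdf(points, center, radius):
    """Signed distance to a sphere: negative inside, positive outside.

    Assumption: a sphere is used as a stand-in for a reconstructed object mesh.
    """
    return np.linalg.norm(points - center, axis=-1) - radius


def is_geometrically_feasible(hand_points, sdf,
                              max_penetration=0.005,  # metres, hypothetical
                              contact_dist=0.01,      # metres, hypothetical
                              min_contacts=2):        # hypothetical
    """Return True if the grasp passes both geometric checks."""
    d = sdf(hand_points)
    # Deepest penetration of any hand point into the object (0 if none).
    penetration = max(0.0, float(-d.min()))
    # Number of hand points lying close to the object surface.
    n_contacts = int(np.sum(np.abs(d) <= contact_dist))
    return penetration <= max_penetration and n_contacts >= min_contacts


# Toy object: a sphere of radius 5 cm at the origin.
def obj_sdf(p):
    return sphere_sdf(p, center=np.zeros(3), radius=0.05)


# Three toy "grasps", each a small set of hand points (metres).
good = np.array([[0.05, 0.0, 0.0], [-0.05, 0.0, 0.0], [0.0, 0.05, 0.0]])
deep = np.array([[0.02, 0.0, 0.0], [-0.05, 0.0, 0.0], [0.0, 0.05, 0.0]])
far = np.array([[0.2, 0.0, 0.0], [-0.2, 0.0, 0.0], [0.0, 0.2, 0.0]])

results = [is_geometrically_feasible(g, obj_sdf) for g in (good, deep, far)]
print(results)
```

In this toy setup only the first grasp is kept: the second has a point 3 cm inside the object, and the third never comes near the surface. A physics-based check for resistance to disturbances, as described in the abstract, would be a separate later stage and is not sketched here.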

