
arXiv: 2505.05517 · v3 · pith:ZP7ZGCLHnew · submitted 2025-05-07 · cs.CV · cs.LG · cs.RO

Web2Grasp: Learning Functional Grasps from Web Images of Hand-Object Interactions

classification: cs.CV · cs.LG · cs.RO
keywords: functional · grasps · objects · hand · images · model · success · grasp

Functional grasping is essential for enabling dexterous multi-finger robot hands to manipulate objects effectively. Prior work largely focuses on power grasps, which only involve holding an object, or relies on in-domain demonstrations for specific objects. We propose leveraging human grasp information extracted from web images, which capture natural and functional hand-object interactions (HOI). Using a pretrained 3D reconstruction model, we recover 3D human HOI meshes from RGB images. To train on these noisy HOI data, we use: (1) an interaction-centric model to learn the functional interaction pattern between hand and object, and (2) geometry-based filtering to remove infeasible grasps, followed by physical simulation to retain grasps that can resist disturbances. In Isaac Gym simulation, our model trained on reconstructed HOI grasps achieves a 75.8% success rate on objects from the web dataset and generalizes to unseen objects, outperforming baseline methods in both grasp success and functional quality. In real-world experiments with the LEAP hand and Inspire hand, it attains a 77.5% success rate across 12 objects, including challenging ones such as a syringe, spray bottle, knife, and tongs. The project website is at https://web2grasp.github.io/.
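The abstract does not spell out the geometry-based filtering criteria, so the following is only an illustrative sketch of what such a filter could look like, not the authors' method. It assumes a grasp is a set of 3D hand surface points, that the object is available as a signed distance function (a sphere stands in for a reconstructed mesh here), and that a grasp is rejected if the hand penetrates the object too deeply or touches it at too few points. The names `sphere_sdf`, `is_feasible`, `filter_grasps`, and all threshold values are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]
# Signed distance function: negative inside the object, positive outside.
SDF = Callable[[Point], float]


def sphere_sdf(center: Point, radius: float) -> SDF:
    """Signed distance to a sphere; a stand-in for a real object mesh."""
    def sdf(p: Point) -> float:
        return math.dist(p, center) - radius
    return sdf


def is_feasible(
    hand_points: Sequence[Point],
    sdf: SDF,
    max_penetration: float = 0.005,  # assumed threshold, in meters
    contact_tol: float = 0.003,      # assumed threshold, in meters
    min_contacts: int = 2,           # assumed minimum number of contacts
) -> bool:
    """Reject grasps that penetrate too deeply or barely touch the object."""
    distances = [sdf(p) for p in hand_points]
    if not distances:
        return False
    # Deepest penetration is the most negative signed distance.
    if -min(distances) > max_penetration:
        return False
    # Count hand points lying close to the object surface.
    contacts = sum(1 for d in distances if abs(d) <= contact_tol)
    return contacts >= min_contacts


def filter_grasps(grasps: Sequence[Sequence[Point]], sdf: SDF) -> List[Sequence[Point]]:
    """Keep only the grasps that pass the geometric feasibility check."""
    return [g for g in grasps if is_feasible(g, sdf)]


# Toy example: a 5 cm sphere and three candidate "grasps".
obj = sphere_sdf((0.0, 0.0, 0.0), 0.05)
good = [(0.05, 0.0, 0.0), (-0.05, 0.0, 0.0), (0.0, 0.08, 0.0)]
penetrating = [(0.02, 0.0, 0.0), (-0.05, 0.0, 0.0), (0.0, 0.05, 0.0)]
floating = [(0.1, 0.0, 0.0), (-0.1, 0.0, 0.0)]
kept = filter_grasps([good, penetrating, floating], obj)
```

In this toy setup only `good` survives: `penetrating` has a point 3 cm inside the sphere, and `floating` never touches it. In a pipeline like the one described, a check of this kind would run before the costlier physical simulation, so that only geometrically plausible grasps are tested for disturbance resistance.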
