arxiv: 2603.20850 · v2 · pith:MN644MRRnew · submitted 2026-03-21 · 💻 cs.CV · cs.RO

Glove2Hand: Synthesizing Natural Hand-Object Interaction from Multi-Modal Sensing Gloves

classification 💻 cs.CV cs.RO
keywords hand glove2hand hand-object interaction multi-modal videos contact handsense

Understanding hand-object interaction (HOI) is fundamental to computer vision, robotics, and AR/VR. However, conventional hand videos often lack essential physical information such as contact forces and motion signals, and are prone to frequent occlusions. To address these challenges, we present Glove2Hand, a framework that translates multi-modal sensing glove HOI videos into photorealistic bare hands, while faithfully preserving the underlying physical interaction dynamics. We introduce a novel 3D Gaussian hand model that ensures temporal rendering consistency. The rendered hand is seamlessly integrated into the scene using a diffusion-based hand restorer, which effectively handles complex hand-object interactions and non-rigid deformations. Leveraging Glove2Hand, we create HandSense, the first multi-modal HOI dataset featuring glove-to-hand videos with synchronized tactile and IMU signals. We demonstrate that HandSense significantly enhances downstream bare-hand applications, including video-based contact estimation and hand tracking under severe occlusion.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. AVI-HT: Adaptive Vision-IMU Fusion for 3D Hand Tracking

    cs.CV · 2026-05 · unverdicted · novelty 5.0

    AVI-HT adaptively fuses vision and IMU data via attention to cut 3D hand keypoint error by 16.1% (24.2% wrist-aligned) on a new 100K+ sample DexGloveHOI dataset in occluded hand-object scenarios.