Glove2Hand: Synthesizing Natural Hand-Object Interaction from Multi-Modal Sensing Gloves

Abdeslam Boularias; Ankit Kumar; Chuan Qin; Ergys Ristani; Kun He; Lele Chen; Li Guan; Mia Huang; Xinyu Zhang; Ziyi Kou

arxiv: 2603.20850 · v2 · pith:MN644MRRnew · submitted 2026-03-21 · 💻 cs.CV · cs.RO

Xinyu Zhang, Ziyi Kou, Chuan Qin, Mia Huang, Ergys Ristani, Ankit Kumar, Lele Chen, Kun He, Abdeslam Boularias, Li Guan

keywords: hand, glove2hand, hand-object, interaction, multi-modal, videos, contact, handsense

Understanding hand-object interaction (HOI) is fundamental to computer vision, robotics, and AR/VR. However, conventional hand videos often lack essential physical information such as contact forces and motion signals, and are prone to frequent occlusions. To address these challenges, we present Glove2Hand, a framework that translates HOI videos recorded with multi-modal sensing gloves into photorealistic bare-hand videos, while faithfully preserving the underlying physical interaction dynamics. We introduce a novel 3D Gaussian hand model that ensures temporal rendering consistency. The rendered hand is seamlessly integrated into the scene using a diffusion-based hand restorer, which effectively handles complex hand-object interactions and non-rigid deformations. Leveraging Glove2Hand, we create HandSense, the first multi-modal HOI dataset featuring glove-to-hand videos with synchronized tactile and IMU signals. We demonstrate that HandSense significantly enhances downstream bare-hand applications, including video-based contact estimation and hand tracking under severe occlusion.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AVI-HT: Adaptive Vision-IMU Fusion for 3D Hand Tracking
cs.CV · 2026-05 · unverdicted · novelty 5.0

AVI-HT adaptively fuses vision and IMU data via attention to cut 3D hand keypoint error by 16.1% (24.2% wrist-aligned) on a new 100K+ sample DexGloveHOI dataset in occluded hand-object scenarios.