
arxiv: 2503.23300 · v2 · pith:G55TYSIMnew · submitted 2025-03-30 · 💻 cs.CV · cs.RO

Learning Predictive Visuomotor Coordination

keywords visuomotor, coordination, modeling, across, egocentric, human, kinematic, learning

Understanding and predicting human visuomotor coordination is crucial for applications in robotics, human-computer interaction, and assistive technologies. This work introduces a forecasting-based task for visuomotor modeling, where the goal is to predict head pose, gaze, and upper-body motion from egocentric visual and kinematic observations. We propose a Visuomotor Coordination Representation (VCR) that learns structured temporal dependencies across these multimodal signals. We extend a diffusion-based motion modeling framework that integrates egocentric vision and kinematic sequences, enabling temporally coherent and accurate visuomotor predictions. Our approach is evaluated on the large-scale EgoExo4D dataset, demonstrating strong generalization across diverse real-world activities. Our results highlight the importance of multimodal integration in understanding visuomotor coordination, contributing to research in visuomotor learning and human behavior modeling. Project Page: https://vjwq.github.io/VCR/.
