Flowing from reasoning to motion: Learning 3D hand trajectory prediction from egocentric human interaction videos

Mingfei Chen, Yifan Wang, Zhengqin Li, Homanga Bharadhwaj, Yujin Chen, Chuan Qin, Ziyi Kou, Yuan Tian, Eric Whitmire, Rajinder Sodhi, et al. · 2025 · arXiv 2512.16907

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

BORA: Bridging Offline Reinforcement Learning and Online Residual Adaptation for Real-World Dexterous VLA Models

cs.RO · 2026-05-28 · unverdicted · novelty 5.0

BORA combines offline RL critic training with online chunk-wise residual adaptation to raise average success rates of real-world dexterous VLA policies by 33% and up to 43% on unseen objects across five tasks.

Learning Human-Intention Priors from Large-Scale Human Demonstrations for Robotic Manipulation

cs.RO · 2026-04-27 · unverdicted · novelty 5.0 · 2 refs

MoT-HRA learns embodiment-agnostic human-intention priors from a curated 2.2M-episode human video dataset via a three-expert hierarchical vision-language-action model to improve robotic manipulation under distribution shift.

Unified Video-Action Joint Denoising for Dexterous Action and Data Generation

cs.CV · 2026-06-02 · unverdicted · novelty 4.0

Donk is a unified video-action denoising model that generates dexterous hand trajectories and videos under language, image, and state conditioning while also serving as a text-conditioned data engine.

citing papers explorer

Showing 3 of 3 citing papers.

BORA: Bridging Offline Reinforcement Learning and Online Residual Adaptation for Real-World Dexterous VLA Models
cs.RO · 2026-05-28 · unverdicted · none · ref 13
BORA combines offline RL critic training with online chunk-wise residual adaptation to raise average success rates of real-world dexterous VLA policies by 33% and up to 43% on unseen objects across five tasks.

Learning Human-Intention Priors from Large-Scale Human Demonstrations for Robotic Manipulation
cs.RO · 2026-04-27 · unverdicted · none · ref 14 · 2 links
MoT-HRA learns embodiment-agnostic human-intention priors from a curated 2.2M-episode human video dataset via a three-expert hierarchical vision-language-action model to improve robotic manipulation under distribution shift.

Unified Video-Action Joint Denoising for Dexterous Action and Data Generation
cs.CV · 2026-06-02 · unverdicted · none · ref 18
Donk is a unified video-action denoising model that generates dexterous hand trajectories and videos under language, image, and state conditioning while also serving as a text-conditioned data engine.
