TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3
citing papers explorer
- Human Universal Grasping
  HUG trains a flow-matching model on a new 1M-frame egocentric human grasp dataset to generate retargetable grasps from single RGB-D images, beating baselines by 23-34% on a new 90-object benchmark.
- Turning Video Models into Generalist Robot Policies
  Decouples action-free video world models from embodiment-specific IDMs using Jacobian-based translation to achieve zero-shot cross-embodiment robot policies.
- Intuitive Surgical SurgToolLoc and SurgVU Challenges Results: 2022-2025
  The paper summarizes results from the SurgToolLoc and SurgVU challenges held at MICCAI conferences from 2022 to 2025.