Dexumi: Using human hand as the universal manipulation interface for dexterous manipulation

2025 · arXiv 2505.21864

Canonical reference. 100% of citing Pith papers cite this work as background.

25 Pith papers citing it

Background 100% of classified citations

citation-role summary

background 7

citation-polarity summary

background 7

representative citing papers

Human Universal Grasping

cs.RO · 2026-06-15 · unverdicted · novelty 7.0

HUG trains a flow-matching model on a new 1M-frame egocentric human grasp dataset to generate retargetable grasps from single RGB-D images, beating baselines by 23-34% on a new 90-object benchmark.

FTP-1: A Generalist Foundation Tactile Policy Across Tactile Sensors for Contact-Rich Manipulation

cs.RO · 2026-06-11 · unverdicted · novelty 7.0

FTP-1 is the first foundation tactile policy pretrained on ~3000 hours of data from 26 sources across 21 sensors that improves performance on seen setups by 17.2% and transfers to unseen sensors with 31% success rate gain.

EgoEngine: From Egocentric Human Videos to High-Fidelity Dexterous Robot Demonstrations

cs.RO · 2026-06-10 · unverdicted · novelty 7.0

EgoEngine transforms egocentric human videos into high-fidelity robot data enabling zero-shot visuomotor dexterous policy learning without real-robot demonstrations.

RIO: Flexible Real-Time Robot I/O for Cross-Embodiment Robot Learning

cs.RO · 2026-05-12 · unverdicted · novelty 7.0

RIO introduces a lightweight open-source framework that abstracts real-time robot I/O to support easy switching between embodiments and platforms for collecting data and deploying VLAs.

Being-H0.7: A Latent World-Action Model from Egocentric Videos

cs.RO · 2026-04-30 · unverdicted · novelty 7.0

Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.

TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance

cs.RO · 2026-01-28 · unverdicted · novelty 7.0

TouchGuide improves contact-rich robot manipulation by steering diffusion or flow-matching visuomotor policies with tactile feasibility scores from a contrastively trained Contact Physical Model.

Translation as a Bridging Action: Transferring Manipulation Skills from Humans to Robots

cs.RO · 2026-06-26 · unverdicted · novelty 6.0

A relative wrist translation bridging action with a vision-language-action model using interleaved tokens and attention masking transfers human manipulation skills to robots more effectively than 6DoF actions.

Imitation from Heterogeneous Demonstrations using Grounded Latent-Action World Models

cs.RO · 2026-06-19 · unverdicted · novelty 6.0

GLAM learns a shared latent action space grounded in consistent future observation prediction across heterogeneous data sources to train improved behavioral cloning policies for robot manipulation tasks.

Transferring Contact, Not Just Motion: Compliant Grasping Across Dexterous Hands

cs.RO · 2026-06-14 · unverdicted · novelty 6.0

A cross-embodiment force-position interface with system-identified torque calibration enables a flow-matching policy to perform transferable compliant grasping on heterogeneous dexterous hands.

EmbodiSteer: Steering Embodiment-Agnostic Visuomotor Policies with Joint-Space Guidance for Zero-Shot Cross-Embodiment Deployment

cs.RO · 2026-06-11 · unverdicted · novelty 6.0

EmbodiSteer steers embodiment-agnostic Cartesian diffusion policies into joint space with Jacobian-based collision guidance after each denoising step for zero-shot cross-embodiment deployment.

HARP-VLA: Human-Robot Aligned Representation Learning for Vision-Language-Action Model

cs.RO · 2026-05-29 · unverdicted · novelty 6.0

HARP aligns human-robot visual and latent action representations via paired bridges and unpaired dynamics supervision to boost VLA policy performance on manipulation tasks.

DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo

cs.RO · 2026-05-15 · conditional · novelty 6.0

DexJoCo is a benchmark and toolkit with 11 functionally grounded tasks, 1.1K trajectories, and empirical benchmarks for task-oriented dexterous manipulation on MuJoCo.

FingerViP: Learning Real-World Dexterous Manipulation with Fingertip Visual Perception

cs.RO · 2026-04-23 · conditional · novelty 6.0

FingerViP equips each finger with a miniature camera and trains a multi-view diffusion policy that achieves 80.8% success on real-world dexterous tasks previously limited by wrist-camera occlusion.

UMI-3D: Extending Universal Manipulation Interface from Vision-Limited to 3D Spatial Perception

cs.RO · 2026-04-15 · unverdicted · novelty 6.0

UMI-3D integrates LiDAR into the UMI hardware for robust multimodal 3D perception in manipulation demonstrations, yielding higher policy success rates and enabling previously infeasible tasks like deformable object handling.

A Mechanistic Analysis of Sim-and-Real Co-Training in Generative Robot Policies

cs.RO · 2026-04-15 · unverdicted · novelty 6.0

Sim-and-real co-training for robot policies is driven primarily by balanced cross-domain representation alignment and secondarily by domain-dependent action reweighting.

XRZero-G0: Pushing the Frontier of Dexterous Robotic Manipulation with Interfaces, Quality and Ratios

cs.RO · 2026-04-14 · unverdicted · novelty 6.0

XRZero-G0 enables 2000-hour robot-free datasets that, when mixed 10:1 with real-robot data, match full real-robot performance at 1/20th the cost and support zero-shot transfer.

ActiveGlasses: Learning Manipulation with Active Vision from Ego-centric Human Demonstration

cs.RO · 2026-04-09 · unverdicted · novelty 6.0

ActiveGlasses learns robot manipulation from ego-centric human demos captured with active vision via smart glasses, achieving zero-shot transfer using object-centric point-cloud policies.

One Hand to Rule Them All: Canonical Representations for Unified Dexterous Manipulation

cs.RO · 2026-02-18 · unverdicted · novelty 6.0

A unified parameter space and canonical URDF enable cross-embodiment dexterous grasping policies with 81.9% zero-shot success on unseen hands like the 3-finger LEAP Hand.

Play2Perfect: What Matters in Dexterous Play Pretraining for Precise Assembly?

cs.RO · 2026-06-24 · unverdicted · novelty 5.0

Play2Perfect uses task-agnostic RL play pretraining on diverse objects to build reusable manipulation priors, then fine-tunes for assembly, yielding 33x sample efficiency gains and 60% success on 0.5mm-clearance insertions in sim-to-real transfer.

KITE: Decoupling Kinematics and Interaction for Zero-Shot Cross-Embodiment Manipulation

cs.RO · 2026-06-20 · unverdicted · novelty 5.0

KITE decouples task reasoning from embodiment-specific control via learned latent interaction intents to enable zero-shot transfer across structurally different robots.

ConTrack: Constrained Hand Motion Tracking with Adaptive Trade-off Control

cs.RO · 2026-06-02 · unverdicted · novelty 5.0

ConTrack introduces a constrained RL method with online dual-variable adaptation and adaptive resets for improved long-horizon hand tracking in simulation and on real robots.

OmniUMI: Towards Physically Grounded Robot Learning via Human-Aligned Multimodal Interaction

cs.RO · 2026-04-12 · unverdicted · novelty 5.0

OmniUMI introduces a multimodal handheld interface that synchronously records RGB, depth, trajectory, tactile, internal grasp force, and external wrench data for training diffusion policies on contact-rich robot manipulation.

World Action Models: The Next Frontier in Embodied AI

cs.RO · 2026-05-12 · unverdicted · novelty 4.0

The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

Robotic Affection -- Opportunities of AI-based haptic interactions to improve social robotic touch through a multi-deep-learning approach

cs.HC · 2026-05-04 · unverdicted · novelty 4.0

A position paper proposes decomposing affective robotic touch into multiple specialized deep learning models for better social human-robot interaction.

citing papers explorer

Showing 22 of 22 citing papers after filters.

Human Universal Grasping cs.RO · 2026-06-15 · unverdicted · none · ref 37
HUG trains a flow-matching model on a new 1M-frame egocentric human grasp dataset to generate retargetable grasps from single RGB-D images, beating baselines by 23-34% on a new 90-object benchmark.
FTP-1: A Generalist Foundation Tactile Policy Across Tactile Sensors for Contact-Rich Manipulation cs.RO · 2026-06-11 · unverdicted · none · ref 91
FTP-1 is the first foundation tactile policy pretrained on ~3000 hours of data from 26 sources across 21 sensors that improves performance on seen setups by 17.2% and transfers to unseen sensors with 31% success rate gain.
EgoEngine: From Egocentric Human Videos to High-Fidelity Dexterous Robot Demonstrations cs.RO · 2026-06-10 · unverdicted · none · ref 4
EgoEngine transforms egocentric human videos into high-fidelity robot data enabling zero-shot visuomotor dexterous policy learning without real-robot demonstrations.
RIO: Flexible Real-Time Robot I/O for Cross-Embodiment Robot Learning cs.RO · 2026-05-12 · unverdicted · none · ref 50
RIO introduces a lightweight open-source framework that abstracts real-time robot I/O to support easy switching between embodiments and platforms for collecting data and deploying VLAs.
Being-H0.7: A Latent World-Action Model from Egocentric Videos cs.RO · 2026-04-30 · unverdicted · none · ref 72
Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.
TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance cs.RO · 2026-01-28 · unverdicted · none · ref 63
TouchGuide improves contact-rich robot manipulation by steering diffusion or flow-matching visuomotor policies with tactile feasibility scores from a contrastively trained Contact Physical Model.
Translation as a Bridging Action: Transferring Manipulation Skills from Humans to Robots cs.RO · 2026-06-26 · unverdicted · none · ref 60
A relative wrist translation bridging action with a vision-language-action model using interleaved tokens and attention masking transfers human manipulation skills to robots more effectively than 6DoF actions.
Imitation from Heterogeneous Demonstrations using Grounded Latent-Action World Models cs.RO · 2026-06-19 · unverdicted · none · ref 10
GLAM learns a shared latent action space grounded in consistent future observation prediction across heterogeneous data sources to train improved behavioral cloning policies for robot manipulation tasks.
Transferring Contact, Not Just Motion: Compliant Grasping Across Dexterous Hands cs.RO · 2026-06-14 · unverdicted · none · ref 16
A cross-embodiment force-position interface with system-identified torque calibration enables a flow-matching policy to perform transferable compliant grasping on heterogeneous dexterous hands.
EmbodiSteer: Steering Embodiment-Agnostic Visuomotor Policies with Joint-Space Guidance for Zero-Shot Cross-Embodiment Deployment cs.RO · 2026-06-11 · unverdicted · none · ref 17
EmbodiSteer steers embodiment-agnostic Cartesian diffusion policies into joint space with Jacobian-based collision guidance after each denoising step for zero-shot cross-embodiment deployment.
HARP-VLA: Human-Robot Aligned Representation Learning for Vision-Language-Action Model cs.RO · 2026-05-29 · unverdicted · none · ref 7
HARP aligns human-robot visual and latent action representations via paired bridges and unpaired dynamics supervision to boost VLA policy performance on manipulation tasks.
UMI-3D: Extending Universal Manipulation Interface from Vision-Limited to 3D Spatial Perception cs.RO · 2026-04-15 · unverdicted · none · ref 29
UMI-3D integrates LiDAR into the UMI hardware for robust multimodal 3D perception in manipulation demonstrations, yielding higher policy success rates and enabling previously infeasible tasks like deformable object handling.
A Mechanistic Analysis of Sim-and-Real Co-Training in Generative Robot Policies cs.RO · 2026-04-15 · unverdicted · none · ref 28
Sim-and-real co-training for robot policies is driven primarily by balanced cross-domain representation alignment and secondarily by domain-dependent action reweighting.
XRZero-G0: Pushing the Frontier of Dexterous Robotic Manipulation with Interfaces, Quality and Ratios cs.RO · 2026-04-14 · unverdicted · none · ref 18
XRZero-G0 enables 2000-hour robot-free datasets that, when mixed 10:1 with real-robot data, match full real-robot performance at 1/20th the cost and support zero-shot transfer.
ActiveGlasses: Learning Manipulation with Active Vision from Ego-centric Human Demonstration cs.RO · 2026-04-09 · unverdicted · none · ref 16
ActiveGlasses learns robot manipulation from ego-centric human demos captured with active vision via smart glasses, achieving zero-shot transfer using object-centric point-cloud policies.
One Hand to Rule Them All: Canonical Representations for Unified Dexterous Manipulation cs.RO · 2026-02-18 · unverdicted · none · ref 33
A unified parameter space and canonical URDF enable cross-embodiment dexterous grasping policies with 81.9% zero-shot success on unseen hands like the 3-finger LEAP Hand.
Play2Perfect: What Matters in Dexterous Play Pretraining for Precise Assembly? cs.RO · 2026-06-24 · unverdicted · none · ref 34
Play2Perfect uses task-agnostic RL play pretraining on diverse objects to build reusable manipulation priors, then fine-tunes for assembly, yielding 33x sample efficiency gains and 60% success on 0.5mm-clearance insertions in sim-to-real transfer.
KITE: Decoupling Kinematics and Interaction for Zero-Shot Cross-Embodiment Manipulation cs.RO · 2026-06-20 · unverdicted · none · ref 36
KITE decouples task reasoning from embodiment-specific control via learned latent interaction intents to enable zero-shot transfer across structurally different robots.
ConTrack: Constrained Hand Motion Tracking with Adaptive Trade-off Control cs.RO · 2026-06-02 · unverdicted · none · ref 48
ConTrack introduces a constrained RL method with online dual-variable adaptation and adaptive resets for improved long-horizon hand tracking in simulation and on real robots.
OmniUMI: Towards Physically Grounded Robot Learning via Human-Aligned Multimodal Interaction cs.RO · 2026-04-12 · unverdicted · none · ref 27
OmniUMI introduces a multimodal handheld interface that synchronously records RGB, depth, trajectory, tactile, internal grasp force, and external wrench data for training diffusion policies on contact-rich robot manipulation.
World Action Models: The Next Frontier in Embodied AI cs.RO · 2026-05-12 · unverdicted · none · ref 159
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
Robotic Affection -- Opportunities of AI-based haptic interactions to improve social robotic touch through a multi-deep-learning approach cs.HC · 2026-05-04 · unverdicted · none · ref 41
A position paper proposes decomposing affective robotic touch into multiple specialized deep learning models for better social human-robot interaction.
