HUG trains a flow-matching model on a new 1M-frame egocentric human grasp dataset to generate retargetable grasps from single RGB-D images, beating baselines by 23-34% on a new 90-object benchmark.
hub Canonical reference
Dexumi: Using human hand as the universal manipulation in- terface for dexterous manipulation
Canonical reference. 100% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
years
2026 25roles
background 7polarities
background 7representative citing papers
FTP-1 is the first foundation tactile policy pretrained on ~3000 hours of data from 26 sources across 21 sensors that improves performance on seen setups by 17.2% and transfers to unseen sensors with 31% success rate gain.
EgoEngine transforms egocentric human videos into high-fidelity robot data enabling zero-shot visuomotor dexterous policy learning without real-robot demonstrations.
RIO introduces a lightweight open-source framework that abstracts real-time robot I/O to support easy switching between embodiments and platforms for collecting data and deploying VLAs.
Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.
TouchGuide improves contact-rich robot manipulation by steering diffusion or flow-matching visuomotor policies with tactile feasibility scores from a contrastively trained Contact Physical Model.
A relative wrist translation bridging action with a vision-language-action model using interleaved tokens and attention masking transfers human manipulation skills to robots more effectively than 6DoF actions.
GLAM learns a shared latent action space grounded in consistent future observation prediction across heterogeneous data sources to train improved behavioral cloning policies for robot manipulation tasks.
A cross-embodiment force-position interface with system-identified torque calibration enables a flow-matching policy to perform transferable compliant grasping on heterogeneous dexterous hands.
EmbodiSteer steers embodiment-agnostic Cartesian diffusion policies into joint space with Jacobian-based collision guidance after each denoising step for zero-shot cross-embodiment deployment.
HARP aligns human-robot visual and latent action representations via paired bridges and unpaired dynamics supervision to boost VLA policy performance on manipulation tasks.
DexJoCo is a benchmark and toolkit with 11 functionally grounded tasks, 1.1K trajectories, and empirical benchmarks for task-oriented dexterous manipulation on MuJoCo.
FingerViP equips each finger with a miniature camera and trains a multi-view diffusion policy that achieves 80.8% success on real-world dexterous tasks previously limited by wrist-camera occlusion.
UMI-3D integrates LiDAR into the UMI hardware for robust multimodal 3D perception in manipulation demonstrations, yielding higher policy success rates and enabling previously infeasible tasks like deformable object handling.
Sim-and-real co-training for robot policies is driven primarily by balanced cross-domain representation alignment and secondarily by domain-dependent action reweighting.
XRZero-G0 enables 2000-hour robot-free datasets that, when mixed 10:1 with real-robot data, match full real-robot performance at 1/20th the cost and support zero-shot transfer.
ActiveGlasses learns robot manipulation from ego-centric human demos captured with active vision via smart glasses, achieving zero-shot transfer using object-centric point-cloud policies.
A unified parameter space and canonical URDF enable cross-embodiment dexterous grasping policies with 81.9% zero-shot success on unseen hands like the 3-finger LEAP Hand.
Play2Perfect uses task-agnostic RL play pretraining on diverse objects to build reusable manipulation priors, then fine-tunes for assembly, yielding 33x sample efficiency gains and 60% success on 0.5mm-clearance insertions in sim-to-real transfer.
KITE decouples task reasoning from embodiment-specific control via learned latent interaction intents to enable zero-shot transfer across structurally different robots.
ConTrack introduces a constrained RL method with online dual-variable adaptation and adaptive resets for improved long-horizon hand tracking in simulation and on real robots.
OmniUMI introduces a multimodal handheld interface that synchronously records RGB, depth, trajectory, tactile, internal grasp force, and external wrench data for training diffusion policies on contact-rich robot manipulation.
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
A position paper proposes decomposing affective robotic touch into multiple specialized deep learning models for better social human-robot interaction.
citing papers explorer
-
Human Universal Grasping
HUG trains a flow-matching model on a new 1M-frame egocentric human grasp dataset to generate retargetable grasps from single RGB-D images, beating baselines by 23-34% on a new 90-object benchmark.
-
FTP-1: A Generalist Foundation Tactile Policy Across Tactile Sensors for Contact-Rich Manipulation
FTP-1 is the first foundation tactile policy pretrained on ~3000 hours of data from 26 sources across 21 sensors that improves performance on seen setups by 17.2% and transfers to unseen sensors with 31% success rate gain.
-
EgoEngine: From Egocentric Human Videos to High-Fidelity Dexterous Robot Demonstrations
EgoEngine transforms egocentric human videos into high-fidelity robot data enabling zero-shot visuomotor dexterous policy learning without real-robot demonstrations.
-
RIO: Flexible Real-Time Robot I/O for Cross-Embodiment Robot Learning
RIO introduces a lightweight open-source framework that abstracts real-time robot I/O to support easy switching between embodiments and platforms for collecting data and deploying VLAs.
-
Being-H0.7: A Latent World-Action Model from Egocentric Videos
Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.
-
TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance
TouchGuide improves contact-rich robot manipulation by steering diffusion or flow-matching visuomotor policies with tactile feasibility scores from a contrastively trained Contact Physical Model.
-
Translation as a Bridging Action: Transferring Manipulation Skills from Humans to Robots
A relative wrist translation bridging action with a vision-language-action model using interleaved tokens and attention masking transfers human manipulation skills to robots more effectively than 6DoF actions.
-
Imitation from Heterogeneous Demonstrations using Grounded Latent-Action World Models
GLAM learns a shared latent action space grounded in consistent future observation prediction across heterogeneous data sources to train improved behavioral cloning policies for robot manipulation tasks.
-
Transferring Contact, Not Just Motion: Compliant Grasping Across Dexterous Hands
A cross-embodiment force-position interface with system-identified torque calibration enables a flow-matching policy to perform transferable compliant grasping on heterogeneous dexterous hands.
-
EmbodiSteer: Steering Embodiment-Agnostic Visuomotor Policies with Joint-Space Guidance for Zero-Shot Cross-Embodiment Deployment
EmbodiSteer steers embodiment-agnostic Cartesian diffusion policies into joint space with Jacobian-based collision guidance after each denoising step for zero-shot cross-embodiment deployment.
-
HARP-VLA: Human-Robot Aligned Representation Learning for Vision-Language-Action Model
HARP aligns human-robot visual and latent action representations via paired bridges and unpaired dynamics supervision to boost VLA policy performance on manipulation tasks.
-
UMI-3D: Extending Universal Manipulation Interface from Vision-Limited to 3D Spatial Perception
UMI-3D integrates LiDAR into the UMI hardware for robust multimodal 3D perception in manipulation demonstrations, yielding higher policy success rates and enabling previously infeasible tasks like deformable object handling.
-
A Mechanistic Analysis of Sim-and-Real Co-Training in Generative Robot Policies
Sim-and-real co-training for robot policies is driven primarily by balanced cross-domain representation alignment and secondarily by domain-dependent action reweighting.
-
XRZero-G0: Pushing the Frontier of Dexterous Robotic Manipulation with Interfaces, Quality and Ratios
XRZero-G0 enables 2000-hour robot-free datasets that, when mixed 10:1 with real-robot data, match full real-robot performance at 1/20th the cost and support zero-shot transfer.
-
ActiveGlasses: Learning Manipulation with Active Vision from Ego-centric Human Demonstration
ActiveGlasses learns robot manipulation from ego-centric human demos captured with active vision via smart glasses, achieving zero-shot transfer using object-centric point-cloud policies.
-
One Hand to Rule Them All: Canonical Representations for Unified Dexterous Manipulation
A unified parameter space and canonical URDF enable cross-embodiment dexterous grasping policies with 81.9% zero-shot success on unseen hands like the 3-finger LEAP Hand.
-
Play2Perfect: What Matters in Dexterous Play Pretraining for Precise Assembly?
Play2Perfect uses task-agnostic RL play pretraining on diverse objects to build reusable manipulation priors, then fine-tunes for assembly, yielding 33x sample efficiency gains and 60% success on 0.5mm-clearance insertions in sim-to-real transfer.
-
KITE: Decoupling Kinematics and Interaction for Zero-Shot Cross-Embodiment Manipulation
KITE decouples task reasoning from embodiment-specific control via learned latent interaction intents to enable zero-shot transfer across structurally different robots.
-
ConTrack: Constrained Hand Motion Tracking with Adaptive Trade-off Control
ConTrack introduces a constrained RL method with online dual-variable adaptation and adaptive resets for improved long-horizon hand tracking in simulation and on real robots.
-
OmniUMI: Towards Physically Grounded Robot Learning via Human-Aligned Multimodal Interaction
OmniUMI introduces a multimodal handheld interface that synchronously records RGB, depth, trajectory, tactile, internal grasp force, and external wrench data for training diffusion policies on contact-rich robot manipulation.
-
World Action Models: The Next Frontier in Embodied AI
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
-
Robotic Affection -- Opportunities of AI-based haptic interactions to improve social robotic touch through a multi-deep-learning approach
A position paper proposes decomposing affective robotic touch into multiple specialized deep learning models for better social human-robot interaction.