RIO introduces a lightweight open-source framework that abstracts real-time robot I/O to support easy switching between embodiments and platforms for collecting data and deploying VLAs.
hub Canonical reference
Dexumi: Using human hand as the universal manipulation in- terface for dexterous manipulation
Canonical reference. 100% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
years
2026 14roles
background 7polarities
background 7representative citing papers
Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.
TouchGuide improves contact-rich robot manipulation by steering diffusion or flow-matching visuomotor policies with tactile feasibility scores from a contrastively trained Contact Physical Model.
DexJoCo is a benchmark and toolkit with 11 functionally grounded tasks, 1.1K trajectories, and empirical benchmarks for task-oriented dexterous manipulation on MuJoCo.
FingerViP equips each finger with a miniature camera and trains a multi-view diffusion policy that achieves 80.8% success on real-world dexterous tasks previously limited by wrist-camera occlusion.
UMI-3D integrates LiDAR into the UMI hardware for robust multimodal 3D perception in manipulation demonstrations, yielding higher policy success rates and enabling previously infeasible tasks like deformable object handling.
Sim-and-real co-training for robot policies is driven primarily by balanced cross-domain representation alignment and secondarily by domain-dependent action reweighting.
XRZero-G0 enables 2000-hour robot-free datasets that, when mixed 10:1 with real-robot data, match full real-robot performance at 1/20th the cost and support zero-shot transfer.
ActiveGlasses learns robot manipulation from ego-centric human demos captured with active vision via smart glasses, achieving zero-shot transfer using object-centric point-cloud policies.
PTLD distills real privileged tactile data into a state estimator to boost sim-to-real performance of proprioceptive dexterous manipulation policies, yielding 182% improvement on in-hand rotation and 57% on reorientation tasks.
A unified parameter space and canonical URDF enable cross-embodiment dexterous grasping policies with 81.9% zero-shot success on unseen hands like the 3-finger LEAP Hand.
OmniUMI introduces a multimodal handheld interface that synchronously records RGB, depth, trajectory, tactile, internal grasp force, and external wrench data for training diffusion policies on contact-rich robot manipulation.
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
A position paper proposes decomposing affective robotic touch into multiple specialized deep learning models for better social human-robot interaction.
citing papers explorer
-
RIO: Flexible Real-Time Robot I/O for Cross-Embodiment Robot Learning
RIO introduces a lightweight open-source framework that abstracts real-time robot I/O to support easy switching between embodiments and platforms for collecting data and deploying VLAs.
-
Being-H0.7: A Latent World-Action Model from Egocentric Videos
Being-H0.7 adds future-aware latent reasoning to direct VLA policies via dual-branch alignment on latent queries, matching world-model benefits at VLA efficiency.
-
TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance
TouchGuide improves contact-rich robot manipulation by steering diffusion or flow-matching visuomotor policies with tactile feasibility scores from a contrastively trained Contact Physical Model.
-
DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo
DexJoCo is a benchmark and toolkit with 11 functionally grounded tasks, 1.1K trajectories, and empirical benchmarks for task-oriented dexterous manipulation on MuJoCo.
-
FingerViP: Learning Real-World Dexterous Manipulation with Fingertip Visual Perception
FingerViP equips each finger with a miniature camera and trains a multi-view diffusion policy that achieves 80.8% success on real-world dexterous tasks previously limited by wrist-camera occlusion.
-
UMI-3D: Extending Universal Manipulation Interface from Vision-Limited to 3D Spatial Perception
UMI-3D integrates LiDAR into the UMI hardware for robust multimodal 3D perception in manipulation demonstrations, yielding higher policy success rates and enabling previously infeasible tasks like deformable object handling.
-
A Mechanistic Analysis of Sim-and-Real Co-Training in Generative Robot Policies
Sim-and-real co-training for robot policies is driven primarily by balanced cross-domain representation alignment and secondarily by domain-dependent action reweighting.
-
XRZero-G0: Pushing the Frontier of Dexterous Robotic Manipulation with Interfaces, Quality and Ratios
XRZero-G0 enables 2000-hour robot-free datasets that, when mixed 10:1 with real-robot data, match full real-robot performance at 1/20th the cost and support zero-shot transfer.
-
ActiveGlasses: Learning Manipulation with Active Vision from Ego-centric Human Demonstration
ActiveGlasses learns robot manipulation from ego-centric human demos captured with active vision via smart glasses, achieving zero-shot transfer using object-centric point-cloud policies.
-
PTLD: Sim-to-real Privileged Tactile Latent Distillation for Dexterous Manipulation
PTLD distills real privileged tactile data into a state estimator to boost sim-to-real performance of proprioceptive dexterous manipulation policies, yielding 182% improvement on in-hand rotation and 57% on reorientation tasks.
-
One Hand to Rule Them All: Canonical Representations for Unified Dexterous Manipulation
A unified parameter space and canonical URDF enable cross-embodiment dexterous grasping policies with 81.9% zero-shot success on unseen hands like the 3-finger LEAP Hand.
-
OmniUMI: Towards Physically Grounded Robot Learning via Human-Aligned Multimodal Interaction
OmniUMI introduces a multimodal handheld interface that synchronously records RGB, depth, trajectory, tactile, internal grasp force, and external wrench data for training diffusion policies on contact-rich robot manipulation.
-
World Action Models: The Next Frontier in Embodied AI
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
-
Robotic Affection -- Opportunities of AI-based haptic interactions to improve social robotic touch through a multi-deep-learning approach
A position paper proposes decomposing affective robotic touch into multiple specialized deep learning models for better social human-robot interaction.