Learning Versatile Humanoid Manipulation with Touch Dreaming
4 Pith papers cite this work.
abstract
Humanoid robots promise general-purpose assistance, yet real-world humanoid loco-manipulation remains challenging because it requires whole-body stability, end-effector dexterity, and contact-aware interaction under frequent contact changes. In this work, we study dexterous, contact-rich humanoid loco-manipulation. We first develop an RL-based lower-body controller that serves as the stability backbone for whole-body execution during complex manipulation. Built on this controller, we develop a VR-based whole-body humanoid data collection system that integrates dexterous hands and tactile sensing for contact-rich manipulation. We then propose Humanoid Transformer with Touch Dreaming (HTD), a multimodal encoder-decoder Transformer that models touch as a core modality alongside multi-view vision and proprioception. HTD is trained in a single stage with behavioral cloning augmented by touch dreaming: in addition to predicting action chunks, the policy predicts future hand-joint forces and future tactile latents, with tactile-latent targets provided by an exponential moving average target encoder without requiring a separate tactile pretraining stage. This encourages the policy to learn contact-aware representations for dexterous manipulation. Across five real-world contact-rich tasks, HTD achieves a 90.9% relative improvement in average success rate over the stronger baseline. Ablation results further show that latent-space tactile prediction is more effective than raw tactile prediction, yielding a 30% relative gain in success rate. These results demonstrate that our touch-dreaming-enhanced learning system enables versatile, high-dexterity humanoid manipulation in the real world. More information and open-source materials are available at: humanoid-touch-dream.github.io.
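The training objective described in the abstract (behavioral cloning plus predictions of future hand-joint forces and future tactile latents, with latent targets from an exponential moving average target encoder) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of mean squared error for every term, the loss weights `w_force` and `w_latent`, and the decay value are all assumptions made for illustration, and parameters are represented as plain lists of floats rather than network weights.

```python
# Illustrative sketch only. Names, loss choices, weights, and the decay value
# are assumptions for this example, not details taken from the paper.

def ema_update(target_params, online_params, decay=0.99):
    """Move each target-encoder parameter a small step toward the online encoder."""
    return [decay * t + (1.0 - decay) * o
            for t, o in zip(target_params, online_params)]


def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def touch_dreaming_loss(action_pred, action_demo,
                        force_pred, force_future,
                        latent_pred, latent_target,
                        w_force=1.0, w_latent=1.0):
    """Behavioral-cloning loss plus two auxiliary prediction terms.

    latent_target is assumed to be produced by the EMA target encoder and
    treated as a fixed target (no gradient would flow into it).
    """
    bc_term = mse(action_pred, action_demo)        # imitate demonstrated action chunk
    force_term = mse(force_pred, force_future)     # predict future hand-joint forces
    latent_term = mse(latent_pred, latent_target)  # predict future tactile latents
    return bc_term + w_force * force_term + w_latent * latent_term
```

In a setup like this, `ema_update` would typically be called once after each optimizer step so that the target encoder trails the online encoder slowly. That slow-moving target is a common way to provide latent targets without a separate pretraining stage.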
citation-role summary
citation-polarity summary
fields
cs.RO 4
years
2026 4
verdicts
UNVERDICTED 4
roles
background 1
polarities
background 1
representative citing papers
VibeAct bridges real vibro-acoustic sensing and sim-based RL via a shared contact/slip representation, outperforming proprioception baselines on contact-rich dexterous tasks with successful real-world transfer.
HumanoidUMI is a robot-free data collection framework that uses VR and UMI-inspired grippers to gather human demonstrations for training policies that enable humanoid whole-body manipulation.
BifrostUMI enables robot-free human demonstration capture via VR and wrist cameras to train visuomotor policies that predict keypoint trajectories for transfer to humanoid whole-body control through retargeting.
citing papers explorer
- FTP-1: A Generalist Foundation Tactile Policy Across Tactile Sensors for Contact-Rich Manipulation
  FTP-1 is the first foundation tactile policy pretrained on ~3000 hours of data from 26 sources across 21 sensors that improves performance on seen setups by 17.2% and transfers to unseen sensors with 31% success rate gain.
- VibeAct: Vibration to Actions for Contact-Rich Reactive Robot Dexterity
  VibeAct bridges real vibro-acoustic sensing and sim-based RL via a shared contact/slip representation, outperforming proprioception baselines on contact-rich dexterous tasks with successful real-world transfer.
- HumanoidUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation
  HumanoidUMI is a robot-free data collection framework that uses VR and UMI-inspired grippers to gather human demonstrations for training policies that enable humanoid whole-body manipulation.
- BifrostUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation
  BifrostUMI enables robot-free human demonstration capture via VR and wrist cameras to train visuomotor policies that predict keypoint trajectories for transfer to humanoid whole-body control through retargeting.