Learning Versatile Humanoid Manipulation with Touch Dreaming

· 2026 · cs.RO · arXiv 2604.13015

4 Pith papers cite this work. Polarity classification is still indexing.

abstract

Humanoid robots promise general-purpose assistance, yet real-world humanoid loco-manipulation remains challenging because it requires whole-body stability, end-effector dexterity, and contact-aware interaction under frequent contact changes. In this work, we study dexterous, contact-rich humanoid loco-manipulation. We first develop an RL-based lower-body controller that serves as the stability backbone for whole-body execution during complex manipulation. Built on this controller, we develop a VR-based whole-body humanoid data collection system that integrates dexterous hands and tactile sensing for contact-rich manipulation. We then propose Humanoid Transformer with Touch Dreaming (HTD), a multimodal encoder-decoder Transformer that models touch as a core modality alongside multi-view vision and proprioception. HTD is trained in a single stage with behavioral cloning augmented by touch dreaming: in addition to predicting action chunks, the policy predicts future hand-joint forces and future tactile latents, with tactile-latent targets provided by an exponential moving average target encoder without requiring a separate tactile pretraining stage. This encourages the policy to learn contact-aware representations for dexterous manipulation. Across five real-world contact-rich tasks, HTD achieves a 90.9% relative improvement in average success rate over the stronger baseline. Ablation results further show that latent-space tactile prediction is more effective than raw tactile prediction, yielding a 30% relative gain in success rate. These results demonstrate that our touch-dreaming-enhanced learning system enables versatile, high-dexterity humanoid manipulation in the real world. More information and open-source materials are available at: humanoid-touch-dream.github.io.
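
The training objective described in the abstract (behavioral cloning plus predicted future hand-joint forces and future tactile latents, with latent targets from an exponential moving average target encoder) might be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the mean-squared-error losses, the loss weights, the decay value, and all function and parameter names are assumptions that the abstract does not specify.

```python
from typing import List

Vector = List[float]


def ema_update(target: Vector, online: Vector, decay: float = 0.99) -> Vector:
    """Move target-encoder parameters toward the online encoder's parameters.

    The decay value is an assumption; the abstract only says the target
    encoder is an exponential moving average.
    """
    return [decay * t + (1.0 - decay) * o for t, o in zip(target, online)]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def touch_dreaming_loss(
    pred_actions: Vector,
    demo_actions: Vector,
    pred_forces: Vector,
    future_forces: Vector,
    pred_latents: Vector,
    target_latents: Vector,
    force_weight: float = 1.0,   # hypothetical weight
    latent_weight: float = 1.0,  # hypothetical weight
) -> float:
    """Behavioral-cloning loss plus two auxiliary 'dreaming' terms.

    target_latents stand in for the EMA target encoder's output on future
    tactile observations; in a real framework they would be treated as
    constants (no gradient flows into the target encoder).
    """
    action_loss = mse(pred_actions, demo_actions)    # action-chunk imitation
    force_loss = mse(pred_forces, future_forces)     # future hand-joint forces
    latent_loss = mse(pred_latents, target_latents)  # future tactile latents
    return action_loss + force_weight * force_loss + latent_weight * latent_loss
```

In this sketch, `ema_update` would be called after each optimizer step on the online tactile encoder, so the latent targets evolve slowly alongside the policy rather than coming from a separately pretrained tactile model.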

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

FTP-1: A Generalist Foundation Tactile Policy Across Tactile Sensors for Contact-Rich Manipulation

cs.RO · 2026-06-11 · unverdicted · novelty 7.0

FTP-1 is the first foundation tactile policy pretrained on ~3000 hours of data from 26 sources across 21 sensors; it improves performance on seen setups by 17.2% and transfers to unseen sensors with a 31% success-rate gain.

VibeAct: Vibration to Actions for Contact-Rich Reactive Robot Dexterity

cs.RO · 2026-06-25 · unverdicted · novelty 6.0

VibeAct bridges real vibro-acoustic sensing and sim-based RL via a shared contact/slip representation, outperforming proprioception baselines on contact-rich dexterous tasks with successful real-world transfer.

HumanoidUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation

cs.RO · 2026-06-25 · unverdicted · novelty 6.0

HumanoidUMI is a robot-free data collection framework that uses VR and UMI-inspired grippers to gather human demonstrations for training policies that enable humanoid whole-body manipulation.

BifrostUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation

cs.RO · 2026-05-05 · unverdicted · novelty 6.0

BifrostUMI enables robot-free human demonstration capture via VR and wrist cameras to train visuomotor policies that predict keypoint trajectories for transfer to humanoid whole-body control through retargeting.

citing papers explorer

FTP-1: A Generalist Foundation Tactile Policy Across Tactile Sensors for Contact-Rich Manipulation

cs.RO · 2026-06-11 · unverdicted · none · ref 14 · internal anchor

VibeAct: Vibration to Actions for Contact-Rich Reactive Robot Dexterity

cs.RO · 2026-06-25 · unverdicted · none · ref 11 · internal anchor

HumanoidUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation

cs.RO · 2026-06-25 · unverdicted · none · ref 6 · internal anchor

BifrostUMI: Bridging Robot-Free Demonstrations and Humanoid Whole-Body Manipulation

cs.RO · 2026-05-05 · unverdicted · none · ref 7 · internal anchor
