Touch in the wild: Learning fine-grained manipulation with a portable visuo-tactile gripper

Xinyue Zhu, Binghao Huang, Yunzhu Li · 2025 · arXiv 2507.15062

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

Mobile UMI: Cross-View Diffusion Policy with Decoupled Kinematics for Mobile Manipulation

cs.RO · 2026-05-20 · conditional · novelty 7.0

A hardware-free dual-camera capture framework with ChArUco spatial unification and receding-horizon state alignment enables decoupled SE(3) manipulation and SE(2) base trajectories for diffusion policies, yielding 83.8% average success on four long-horizon household tasks.

AT-VLA: Adaptive Tactile Injection for Enhanced Feedback Reaction in Vision-Language-Action Models

cs.RO · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

AT-VLA proposes adaptive tactile injection and a dual-stream tactile reaction mechanism to enhance VLA models for contact-rich robotic manipulation with real-time responses.

TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance

cs.RO · 2026-01-28 · unverdicted · novelty 7.0

TouchGuide improves contact-rich robot manipulation by steering diffusion or flow-matching visuomotor policies with tactile feasibility scores from a contrastively trained Contact Physical Model.

Multi-Modal Manipulation via Multi-Modal Policy Consensus

cs.RO · 2025-09-27 · unverdicted · novelty 7.0

A policy that factorizes into modality-specific diffusion models combined by a learned router network for adaptive multi-modal robotic manipulation.

FingerViP: Learning Real-World Dexterous Manipulation with Fingertip Visual Perception

cs.RO · 2026-04-23 · conditional · novelty 6.0

FingerViP equips each finger with a miniature camera and trains a multi-view diffusion policy that achieves 80.8% success on real-world dexterous tasks previously limited by wrist-camera occlusion.

HoMMI: Learning Whole-Body Mobile Manipulation from Human Demonstrations

cs.RO · 2026-03-03 · unverdicted · novelty 6.0

HoMMI learns whole-body mobile manipulation policies from robot-free human demonstrations by augmenting UMI with egocentric sensing and bridging the embodiment gap through an agnostic visual representation, relaxed head actions, and a whole-body controller.

Learning Tactile-Aware Quadrupedal Loco-Manipulation Policies

cs.RO · 2026-04-29 · unverdicted · novelty 5.0 · 2 refs

A hierarchical tactile-aware policy combines human-demonstration training for contact cue prediction with sim-to-real reinforcement learning to improve quadrupedal loco-manipulation performance by 28.54% over vision baselines on contact-rich tasks.

Learning Versatile Humanoid Manipulation with Touch Dreaming

cs.RO · 2026-04-14 · conditional · novelty 5.0

HTD, a multimodal transformer policy trained with behavioral cloning and touch dreaming to predict future tactile latents, achieves a 90.9% relative success rate improvement over baselines on five real-world contact-rich humanoid loco-manipulation tasks.

Tactile-based Multimodal Fusion in Embodied Intelligence: A Survey of Vision, Language, and Contact-Driven Paradigms

cs.RO · 2026-05-17 · unverdicted · novelty 4.0

A survey proposing a hierarchical taxonomy for multimodal tactile fusion datasets and methods across perception, generation, and interaction in embodied intelligence.

citing papers explorer

Showing 9 of 9 citing papers.

Mobile UMI: Cross-View Diffusion Policy with Decoupled Kinematics for Mobile Manipulation

cs.RO · 2026-05-20 · conditional · none · ref 32

AT-VLA: Adaptive Tactile Injection for Enhanced Feedback Reaction in Vision-Language-Action Models

cs.RO · 2026-05-08 · unverdicted · none · ref 48 · 2 links

TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance

cs.RO · 2026-01-28 · unverdicted · none · ref 79

Multi-Modal Manipulation via Multi-Modal Policy Consensus

cs.RO · 2025-09-27 · unverdicted · none · ref 39

FingerViP: Learning Real-World Dexterous Manipulation with Fingertip Visual Perception

cs.RO · 2026-04-23 · conditional · none · ref 81

HoMMI: Learning Whole-Body Mobile Manipulation from Human Demonstrations

cs.RO · 2026-03-03 · unverdicted · none · ref 50

Learning Tactile-Aware Quadrupedal Loco-Manipulation Policies

cs.RO · 2026-04-29 · unverdicted · none · ref 8 · 2 links

Learning Versatile Humanoid Manipulation with Touch Dreaming

cs.RO · 2026-04-14 · conditional · none · ref 44

Tactile-based Multimodal Fusion in Embodied Intelligence: A Survey of Vision, Language, and Contact-Driven Paradigms

cs.RO · 2026-05-17 · unverdicted · none · ref 49
