Seeing Touch from Motion: A Unified Modality-Aware Visuo-Tactile Policy with Tactile Motion Correlation
Pith reviewed 2026-06-30 05:53 UTC · model grok-4.3
The pith
The correlation between transient and cumulative tactile motion distinguishes fine-grained contact states that raw images and motion fields cannot.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that the correlation between transient and cumulative motion explicitly distinguishes fine-grained contact states, and that a unified modality-aware policy built on the Mixture-of-Transformers architecture can capture cross-modal complementarity while preserving modality-specific properties.
What carries the argument
Tactile Motion Correlation: the per-pixel relationship between short-term motion vectors and the accumulated deformation field, used as the explicit representation of contact dynamics.
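A minimal sketch of what such a per-pixel relationship could look like, assuming cosine similarity as the correlation measure and a plain running sum as the cumulative field. Neither choice is confirmed by the page, and the function name `motion_correlation` and the "loading"/"unloading" states are hypothetical illustrations, not the authors' definitions.

```python
import numpy as np

def motion_correlation(transient, cumulative, eps=1e-8):
    """Per-pixel cosine similarity between two (H, W, 2) motion fields.

    Assumption: cosine similarity stands in for the paper's correlation,
    whose exact form is not given on this page.
    """
    dot = np.sum(transient * cumulative, axis=-1)
    norms = np.linalg.norm(transient, axis=-1) * np.linalg.norm(cumulative, axis=-1)
    return dot / (norms + eps)

# Synthetic toy sequence: a uniform 0.5 px shift in x per frame.
step = np.zeros((4, 4, 2))
step[..., 0] = 0.5

# Three frames of pressing followed by one frame of release.
transient_seq = np.stack([step, step, step, -step])
# Assumption: the cumulative field is the running sum of transient fields.
cumulative_seq = np.cumsum(transient_seq, axis=0)

# Frames 1 and 3 share the same cumulative deformation...
same_cumulative = np.allclose(cumulative_seq[1], cumulative_seq[3])
# ...but the transient motion points with it (loading) or against it (unloading).
loading = motion_correlation(transient_seq[1], cumulative_seq[1])
unloading = motion_correlation(transient_seq[3], cumulative_seq[3])
```

In this toy case the cumulative field alone cannot tell the two frames apart, while the sign of the correlation can, which is the kind of ambiguity the representation is meant to resolve.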
If this is right
- Contact states that appear identical in raw tactile images or cumulative fields become separable during policy execution.
- The Mixture-of-Transformers fusion lets the policy model interactions between vision and touch without discarding modality-specific information.
- The motion-aware representation supplies dynamic priors that reduce perception ambiguity in contact-rich tasks.
- The policy architecture supports simultaneous cross-modal and modality-specific processing in a single network.
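The fusion idea in the list above can be sketched as a single layer in which attention runs over all tokens while projection and feed-forward weights are selected per modality. This is a toy NumPy illustration of the general Mixture-of-Transformers pattern, not the paper's policy; the function `mot_layer`, the modality ids, and all shapes are assumptions, and layer normalization and multi-head attention are omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mot_layer(tokens, modality_ids, params):
    """tokens: (N, D); modality_ids: (N,) ints; params: modality id -> weight dict.

    Every modality id present in modality_ids must have an entry in params.
    """
    n, d = tokens.shape
    q, k, v = np.empty((n, d)), np.empty((n, d)), np.empty((n, d))
    # Modality-specific projections: each token uses its own modality's weights.
    for m, p in params.items():
        mask = modality_ids == m
        q[mask] = tokens[mask] @ p["wq"]
        k[mask] = tokens[mask] @ p["wk"]
        v[mask] = tokens[mask] @ p["wv"]
    # Global self-attention: vision and touch tokens attend to each other.
    attn = softmax(q @ k.T / np.sqrt(d))
    h = tokens + attn @ v
    # Modality-specific feed-forward blocks with a residual connection.
    out = np.empty_like(h)
    for m, p in params.items():
        mask = modality_ids == m
        out[mask] = h[mask] + np.maximum(h[mask] @ p["w1"], 0.0) @ p["w2"]
    return out

rng = np.random.default_rng(0)
D = 8
shapes = {"wq": (D, D), "wk": (D, D), "wv": (D, D), "w1": (D, 4 * D), "w2": (4 * D, D)}

def init_params():
    return {name: rng.normal(scale=0.1, size=shape) for name, shape in shapes.items()}

params = {0: init_params(), 1: init_params()}   # hypothetical ids: 0 = vision, 1 = touch
tokens = rng.normal(size=(6, D))
modality_ids = np.array([0, 0, 0, 0, 1, 1])
out = mot_layer(tokens, modality_ids, params)
```

The shared attention step is where cross-modal interaction enters; the per-modality weights are what keep vision and touch processing distinct within the same network.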
Where Pith is reading between the lines
- If the correlation signature proves stable, the same representation could be applied to other elastic sensors whose deformation is imaged over time.
- Policies using this representation might tolerate moderate changes in speed or illumination without additional training data.
- The approach suggests that explicit dynamic priors extracted from motion can substitute for some hand-crafted filtering steps in tactile processing pipelines.
Load-bearing premise
The correlation patterns between transient and cumulative motion remain reliable across different gel materials, lighting conditions, and contact velocities without per-setup recalibration.
What would settle it
An experiment that records the same physical contact state under changed lighting or gel type and finds the transient-cumulative correlation values become indistinguishable from those of a different contact state.
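One way such an experiment could be scored is with a simple separability index between the correlation values of two contact states, recomputed under each condition. The sketch below uses a d'-style index (mean gap over pooled standard deviation); the metric, the function name `separability`, and every number are illustrative assumptions with synthetic placeholder values, not measurements from the paper.

```python
import numpy as np

def separability(corr_a, corr_b, eps=1e-12):
    """d'-style index: absolute mean gap divided by the pooled standard deviation."""
    a = np.asarray(corr_a, dtype=float)
    b = np.asarray(corr_b, dtype=float)
    pooled = np.sqrt(0.5 * (a.var(ddof=1) + b.var(ddof=1)))
    return abs(a.mean() - b.mean()) / (pooled + eps)

# Synthetic placeholder values, for illustration only.
state_a_baseline = [0.90, 0.80, 0.85, 0.95]      # state A, original setup
state_b_baseline = [-0.90, -0.80, -0.85, -0.95]  # state B, original setup
state_a_changed = [-0.70, -0.90, -0.80, -1.00]   # state A after a lighting or gel change

index_baseline = separability(state_a_baseline, state_b_baseline)
index_changed = separability(state_a_changed, state_b_baseline)
```

A collapse of the index under the changed condition, as in this constructed example, is the outcome that would count against the premise; an index that stays high would support it.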
Visuo-Tactile policies leveraging optical tactile sensors have shown great promise in contact-rich manipulation. These sensors achieve high spatial resolution and multi-dimensional force sensing by utilizing an internal camera to monitor the deformation of their elastic gel surface, thereby indirectly inferring tactile cues. Despite their advantages, extracting fine-grained contact states necessary for contact-rich manipulation remains an open challenge. Existing methods typically use either raw images or cumulative motion fields to represent tactile cues. However, both are prone to perception ambiguity. Raw tactile images mainly capture appearance changes, while cumulative motion fields only reflect the aggregate gel deformation. Consequently, distinct fine-grained contact states can exhibit highly similar patterns, making it difficult to explicitly distinguish subtle contact variations. To address this issue, we explore the dynamic priors of tactile motion and discover that the correlation between transient and cumulative motion can explicitly distinguish fine-grained contact states. Based on this insight, we propose a motion-aware tactile representation to facilitate contact-rich manipulation. Beyond tactile representation, effective fusion of tactile and visual modalities is also critical. Most existing fusion methods either directly concatenate features from each modality or train modality-specific networks separately and fuse their outputs. However, these strategies struggle to simultaneously model cross-modal interactions and preserve modality-specific characteristics. In this work, we take advantage of the Mixture-of-Transformers architecture and propose a unified modality-aware visuo-tactile policy that captures cross-modal complementarity while maintaining modality-specific properties.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that the correlation between transient and cumulative motion in optical tactile sensors can explicitly distinguish fine-grained contact states that raw images and cumulative motion fields cannot. It proposes a motion-aware tactile representation based on this insight and a unified modality-aware visuo-tactile policy using the Mixture-of-Transformers architecture to capture cross-modal complementarity while maintaining modality-specific properties for contact-rich manipulation.
Significance. If the correlation provides stable new information across setups, this could enhance perception of subtle contact variations in visuo-tactile robotic policies, addressing ambiguity in existing representations. The Mixture-of-Transformers fusion choice is a standard way to balance interactions and specificity, but its advantage here would need demonstration.
Major comments (2)
- [Abstract] The abstract states the discovery and architectural choice but supplies no quantitative results, ablation studies, or error analysis; without these it is impossible to verify whether the claimed distinction actually holds or whether the fusion preserves modality-specific properties.
- [Experiments] The central claim requires that the correlation signature remains reliable across gel materials, lighting conditions, and contact velocities without per-setup recalibration. No cross-material, cross-lighting, or cross-velocity experiments are described, so the patterns could be artifacts of a single sensor configuration rather than a general dynamic prior.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the abstract and the generalizability of the tactile motion correlation. We address the major comments point by point below.
- Referee: [Abstract] The abstract states the discovery and architectural choice but supplies no quantitative results, ablation studies, or error analysis; without these it is impossible to verify whether the claimed distinction actually holds or whether the fusion preserves modality-specific properties.
  Authors: We agree that the abstract, as a concise summary, would benefit from including key quantitative indicators to allow readers to assess the claims immediately. In the revised version, we will update the abstract to reference specific results from the experiments, including contact state distinction accuracy improvements and policy success rates, while pointing to the ablation studies on the motion-aware representation and Mixture-of-Transformers fusion detailed in the main text.
  Revision: yes
- Referee: [Experiments] The central claim requires that the correlation signature remains reliable across gel materials, lighting conditions, and contact velocities without per-setup recalibration. No cross-material, cross-lighting, or cross-velocity experiments are described, so the patterns could be artifacts of a single sensor configuration rather than a general dynamic prior.
  Authors: The referee is correct that the manuscript does not include explicit cross-material, cross-lighting, or cross-velocity experiments. Our current evaluations use a standard sensor setup to demonstrate the method. The correlation is derived from the physics of transient versus cumulative gel deformation, which we argue is a general dynamic prior for elastic surfaces. In revision, we will add experiments varying lighting and contact velocities on the existing sensor and expand the discussion section with analysis of why the signature should generalize across gel materials without recalibration.
  Revision: partial
Circularity Check
No significant circularity detected
The paper presents an empirical observation that transient-cumulative motion correlation distinguishes contact states, followed by an architectural proposal using Mixture-of-Transformers for fusion. No equations, fitted parameters renamed as predictions, or self-citations as load-bearing uniqueness theorems appear in the abstract or described chain. The central claim is framed as a discovery from data patterns rather than a derivation that reduces to its own inputs by construction, making the work self-contained.