{"paper":{"title":"TouchAnything: A Dataset and Framework for Bimanual Tactile Estimation from Egocentric Video","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Tactile pressure maps can be predicted from egocentric video of bimanual interactions by incorporating optional wrist views.","cross_cats":[],"primary_cat":"cs.RO","authors_text":"Chuqiao Lyu, Feiyang Hong, Guannan Zhang, Haotian Wu, Jianyi Zhou, Ruichen Zhen, Shuo Yang, Weisheng Dai, Wenbo Ding, Xushi Wang, Yinian Mao, Yuxiang Jiang, Zirui Liu, Ziteng Gao","submitted_at":"2026-05-13T06:54:36Z","abstract_excerpt":"Egocentric human video data, which captures rich human-environment interactions and can be collected at scale, has become a key driver of embodied intelligence research. However, existing egocentric datasets typically lack tactile sensing, a critical modality that provides direct cues about contact, force, and pressure in human-object interaction. Without such signals, models struggle to learn physically grounded representations of real-world interaction dynamics. While tactile sensors provide these cues, deploying high-quality tactile hardware at scale remains expensive and cumbersome. This r"},
"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Experiments show that incorporating wrist-mounted views generally improves tactile prediction over egocentric-only input, achieving up to 5.0% relative improvement in Contact IoU and 6.1% relative improvement in Volumetric IoU.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the wearable tactile sensors deliver accurate, dense, and synchronized ground-truth pressure maps suitable for supervising vision-based prediction models across diverse tasks and environments.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"EgoTouch is a new multi-view egocentric dataset with dense bimanual tactile supervision, and TouchAnything is a baseline framework showing that wrist views improve vision-based tactile prediction over egocentric input alone.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Tactile pressure maps can be predicted from egocentric video of bimanual interactions by incorporating optional wrist views.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"e30c55da95eac0dd1a057509506aac8d507bc00c90aa44ad85996d3077a69721"},"source":{"id":"2605.13083","kind":"arxiv","version":1},
"verdict":{"id":"727c95f3-fe39-49f7-8084-b6bbd8c84b52","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T18:58:02.368438Z","strongest_claim":"Experiments show that incorporating wrist-mounted views generally improves tactile prediction over egocentric-only input, achieving up to 5.0% relative improvement in Contact IoU and 6.1% relative improvement in Volumetric IoU.","one_line_summary":"EgoTouch is a new multi-view egocentric dataset with dense bimanual tactile supervision, and TouchAnything is a baseline framework showing that wrist views improve vision-based tactile prediction over egocentric input alone.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the wearable tactile sensors deliver accurate, dense, and synchronized ground-truth pressure maps suitable for supervising vision-based prediction models across diverse tasks and environments.","pith_extraction_headline":"Tactile pressure maps can be predicted from egocentric video of bimanual interactions by incorporating optional wrist views."},
"references":{"count":42,"sample":[{"doi":"10.1109/access.2025.3648171","year":2026,"title":"Dawadi, and Sushant Chalise","work_id":"52393e47-5d0d-499f-88f4-d8ffa13aaa3f","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2019,"title":"ContactDB: Analyzing and Predicting Grasp Contact via Thermal Imaging","work_id":"0a774f49-ec3a-4b89-a951-c3b249accb9f","ref_index":3,"cited_arxiv_id":"1904.06830","is_internal_anchor":true},{"doi":"","year":2021,"title":"Narang, Karl Van Wyk, Umar Iqbal, Stan Birchfield, Jan Kautz, and Dieter Fox","work_id":"49bf2ea7-48ba-4bc5-9904-a62a40e73cfc","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"Scaling egocentric vision: The epic-kitchens dataset","work_id":"a3f122c9-fbd3-4e1d-a842-7bd2c9e20d6b","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"Actionsense: A multimodal dataset and recording framework for human activities using wearable sensors in a kitchen environment","work_id":"5fe6e5d2-e4be-44bd-a324-3ff2a65a30e1","ref_index":6,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":42,"snapshot_sha256":"4082b4739056cf841d90cc14defc0b66c7b41d776ed7b95575a75d8e9f12340b","internal_anchors":4},"formal_canon":{"evidence_count":2,"snapshot_sha256":"b75cf3d589c8b0d14464fa6347ec9f853c51075506abd06114b423472fea50c8"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}