TacO: Benchmarking Tactile Sensors for Object Manipulation

Anya Zorin, Zilin Si, Myungsun Park, Junsung Park, Alexiy Buynitsky, Sachin Bhadang, Taejun Park, Sohee John Yoon, Yong-Lae Park, Oliver Kroemer, Zeynep Temel, Michael T. Tolley, Sha Yi, Xiaolong Wang

arxiv: 2605.21976 · v1 · pith:F2DRCKSOnew · submitted 2026-05-21 · 💻 cs.RO

Pith reviewed 2026-05-22 05:43 UTC · model grok-4.3

classification 💻 cs.RO

keywords tactile sensors, robot manipulation, sensor benchmarking, policy learning, contact-rich tasks, pick and place, object reorientation, plug insertion


The pith

Tactile sensor usefulness for robot manipulation depends on modality, material, and task rather than applying uniformly.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares four tactile sensor modalities by training separate manipulation policies for each and testing them on three contact-rich tasks. It finds that properties such as spatial resolution, shear sensing, and tactile representation interact with object friction and task demands to produce different success rates. This evaluation framework supplies concrete data for matching sensors to applications instead of assuming tactile input always helps equally. A sympathetic reader would care because it replaces broad claims about tactile sensing with task-specific performance evidence that can inform hardware choices in robotics.

Core claim

Policies trained independently for visual, acoustic, magnetic, and resistive tactile sensors exhibit distinct performance profiles on pick-and-place with unknown mass, object reorientation, and plug insertion; the usefulness of the resulting tactile information is modulated by each sensor's spatial resolution, shear-sensing capability, data representation, and the friction characteristics of the manipulated materials.

What carries the argument

Task-driven benchmarking framework that trains modality-specific policies and measures manipulation success to compare sensor suitability.

If this is right

Sensor selection for a given manipulation problem can be guided by matching modality strengths to task requirements such as precision or force feedback.
Object material friction must be accounted for when predicting how well a particular tactile representation will support stable grasping or insertion.
Analysis of spatial resolution and shear sensing explains why certain modalities outperform others on specific tasks like reorientation versus insertion.
Public release of the sensors, code, data, and setups allows direct replication and extension of the modality comparisons.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Robots might achieve more consistent performance by switching between sensor modalities mid-task depending on the current phase.
The same evaluation approach could rank new sensor designs against the four modalities tested here on additional tasks.
Future benchmarks would gain reliability by standardizing the policy training procedure before attributing differences to the sensors themselves.

Load-bearing premise

That training separate policies for each sensor modality produces performance differences that cleanly reflect sensor properties rather than differences in training dynamics or optimization.

What would settle it

Repeating the full set of experiments while enforcing identical policy architectures, training hyperparameters, and compute budgets across all four sensor modalities; persistent performance gaps would support the claim while large changes in relative rankings would indicate that the benchmark does not isolate sensor effects.
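One way to make "large changes in relative rankings" concrete is a rank correlation between the modality ordering in the original runs and the ordering in the controlled rerun. The sketch below is illustrative only: the success rates are hypothetical placeholders, not numbers from the paper, and `kendall_tau` is a small helper written for this example (Kendall tau-a, no tie correction).

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall tau-a between two score tables keyed by modality name.

    Returns 1.0 when both tables order the modalities identically and
    -1.0 when the order is fully reversed. Ties count as neither
    concordant nor discordant.
    """
    names = sorted(scores_a)
    concordant = discordant = 0
    for x, y in combinations(names, 2):
        sign = (scores_a[x] - scores_a[y]) * (scores_b[x] - scores_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(names) * (len(names) - 1) // 2
    return (concordant - discordant) / n_pairs


# HYPOTHETICAL success rates for one task; placeholders, not paper results.
original = {"visual": 0.80, "acoustic": 0.55, "magnetic": 0.70, "resistive": 0.60}
controlled = {"visual": 0.75, "acoustic": 0.50, "magnetic": 0.72, "resistive": 0.58}

tau = kendall_tau(original, controlled)
print(f"rank agreement (Kendall tau-a): {tau:+.2f}")
```

A value near +1 would indicate that the sensor ranking survives the controlled rerun; a value near 0 or below would suggest the original ranking depended on training differences.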

Figures

Figures reproduced from arXiv: 2605.21976 by Alexiy Buynitsky, Anya Zorin, Junsung Park, Michael T. Tolley, Myungsun Park, Oliver Kroemer, Sachin Bhadang, Sha Yi, Sohee John Yoon, Taejun Park, Xiaolong Wang, Yong-Lae Park, Zeynep Temel, Zilin Si.

**Figure 2.** TacO imitation learning pipeline with tactile, camera, and proprioceptive observations as inputs.

**Figure 3.** a), b), and c) are the success of the three manipulation tasks and d) sensor repeatability test setup.

**Figure 4.** The average sensor reading across episodes during the durability test.

Original abstract

Vision-based learning from demonstrations has achieved remarkable success in enabling robots to perform manipulation tasks and high-level semantic reasoning, yet it remains insufficient for complex, contact-rich manipulation. While there is broad agreement that tactile sensing improves manipulation, there is no empirical guidance on which tactile sensors are best suited for which manipulation tasks. In this paper, we provide a systematic, task-driven evaluation of tactile sensors for robot manipulation and propose a framework for selecting and evaluating sensors based on manipulation policy performance. Separate manipulation policies are trained for tactile sensors of four distinct modalities: visual, acoustic, magnetic, and resistive, across three tasks: pick-and-place with unknown mass, object reorientation, and plug insertion. For each task, an analysis of how sensor properties such as spatial resolution, shear sensing, and tactile representation, and the inherent material friction affect task performances is done. Rather than tactile sensing being universally beneficial in the same way, our results show that the usefulness of tactile information depends strongly on sensor modality, material properties, and the specific manipulation tasks. All of the tactile sensors, code, data, and hardware setup will be publicly available on the project website.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces TacO, a benchmark framework for evaluating tactile sensors across four modalities (visual, acoustic, magnetic, resistive) on three contact-rich manipulation tasks: pick-and-place with unknown mass, object reorientation, and plug insertion. Separate policies are trained for each sensor modality; performance differences are then attributed to sensor properties such as spatial resolution, shear sensing, and interaction with material friction. The central claim is that tactile information is not universally beneficial but its utility depends strongly on modality, material, and task. The authors commit to releasing all sensors, code, data, and hardware.

Significance. If the performance comparisons are shown to isolate sensor effects, the work supplies the first systematic, task-driven empirical guidance on modality selection for contact-rich manipulation, where vision alone is known to be insufficient. The public release of hardware, code, and data would strengthen reproducibility and enable follow-on studies. The result challenges the assumption of uniform tactile benefit and could inform both sensor hardware design and policy architectures in robotics.

major comments (1)

[Abstract and §4 (Experimental Setup)] The central claim—that performance gaps reflect sensor modality rather than training artifacts—rests on the assumption that policies trained separately for each modality are comparable. The manuscript states only that 'separate manipulation policies are trained' (Abstract and likely §4 Experimental Setup) without specifying whether network topology, observation encoding, reward formulation, optimizer schedule, or hyper-parameters are held fixed across modalities. If these elements are allowed to vary (even unintentionally), measured differences could arise from optimization ease rather than spatial resolution or shear sensing, directly undermining the modality-dependent utility conclusion.

minor comments (2)

[Abstract] The abstract provides no quantitative success rates, error bars, or statistical tests; readers must reach the results section to evaluate the strength of the modality-dependent claim.
[§3 (Sensor Modalities)] Notation for sensor properties (e.g., 'tactile representation') is introduced without a precise definition or reference to a table/figure that quantifies each property for the four modalities.
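The error bars requested in the first minor comment can be computed directly from trial counts. The sketch below uses the Wilson score interval for a binomial proportion, which is a common choice for the small trial counts typical of real-robot evaluation; it is a suggestion for how such bars could be reported, not the paper's method, and the counts in the usage line are hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a success rate (z=1.96 gives about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# HYPOTHETICAL counts: 15 successes in 20 rollouts.
low, high = wilson_interval(15, 20)
print(f"success rate 0.75, interval [{low:.2f}, {high:.2f}]")
```

With only a few dozen rollouts per condition, intervals like this are wide, which is why overlapping bars between two modalities should temper any claim that one sensor outperforms another.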

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thoughtful and constructive review. We address the single major comment below by clarifying the experimental controls and committing to explicit documentation in the revision.


Referee: The central claim—that performance gaps reflect sensor modality rather than training artifacts—rests on the assumption that policies trained separately for each modality are comparable. The manuscript states only that 'separate manipulation policies are trained' (Abstract and likely §4 Experimental Setup) without specifying whether network topology, observation encoding, reward formulation, optimizer schedule, or hyper-parameters are held fixed across modalities. If these elements are allowed to vary (even unintentionally), measured differences could arise from optimization ease rather than spatial resolution or shear sensing, directly undermining the modality-dependent utility conclusion.

Authors: We agree that the current description is insufficiently explicit and could leave readers uncertain whether observed differences truly isolate sensor properties. In the experiments, we used a common actor-critic architecture for all modalities, with only the first layer dimension adjusted to match each sensor’s output size; reward functions, optimizer (Adam), learning-rate schedule, batch size, and all other hyperparameters were identical across modalities and tasks. Observation preprocessing was limited to modality-specific normalization or resizing, without altering the policy or value networks. We will revise §4 to include a dedicated paragraph and a summary table listing the fixed components, thereby removing any ambiguity about training artifacts.

revision: yes
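The split between fixed and per-modality settings described in this simulated response can be made checkable in code. The sketch below is a hypothetical illustration: the class name, field values, and input dimensions are all assumptions made for the example and do not come from the paper or its code release. The idea is to build every run configuration from one shared object and then assert that the input dimension is the only key that differs.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SharedTrainingConfig:
    """Settings held fixed across modalities (all values are placeholders)."""
    optimizer: str = "adam"
    learning_rate: float = 1e-4
    batch_size: int = 64
    hidden_dim: int = 256


# HYPOTHETICAL flattened input sizes per sensor modality.
SENSOR_INPUT_DIM = {
    "visual": 3 * 64 * 64,
    "acoustic": 128,
    "magnetic": 15,
    "resistive": 16,
}


def build_run_configs(shared, input_dims):
    """Create one config per modality; only `input_dim` is modality-specific."""
    return {name: {**asdict(shared), "input_dim": dim} for name, dim in input_dims.items()}


def differing_keys(configs):
    """Return the sorted list of keys whose values differ across modalities."""
    keys = set().union(*(cfg.keys() for cfg in configs.values()))
    return sorted(
        key for key in keys
        if len({repr(cfg.get(key)) for cfg in configs.values()}) > 1
    )


configs = build_run_configs(SharedTrainingConfig(), SENSOR_INPUT_DIM)
print(differing_keys(configs))
```

A check like `differing_keys` could back the promised summary table: anything it reports beyond the input dimension is a candidate confound.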

Circularity Check

0 steps flagged

No circularity in empirical benchmarking paper


This is an experimental benchmarking study with no mathematical derivations, fitted parameters, or self-referential equations. Central claims rest on direct comparisons of task success rates across sensor modalities after separate policy training. No load-bearing steps reduce to inputs by construction, self-citation chains, or ansatz smuggling. The paper is self-contained via external experimental benchmarks and does not invoke uniqueness theorems or rename known results as novel derivations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical benchmarking study; no mathematical derivations or new physical entities. Claims rest on experimental comparisons and standard policy-learning assumptions.

pith-pipeline@v0.9.0 · 5783 in / 1093 out tokens · 39620 ms · 2026-05-22T05:43:00.443561+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

Separate manipulation policies are trained for tactile sensors of four distinct modalities: visual, acoustic, magnetic, and resistive, across three tasks: pick-and-place with unknown mass, object reorientation, and plug insertion. ... the usefulness of tactile information depends strongly on sensor modality, material properties, and the specific manipulation tasks.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 7 internal anchors

[1] R. Calandra, A. Owens, D. Jayaraman, J. Lin, W. Yuan, J. Malik, E. H. Adelson, and S. Levine. More than a feeling: Learning to grasp and regrasp using vision and touch. IEEE Robotics and Automation Letters, 3(4):3300–3307, 2018.
[2] H. Qi, B. Yi, S. Suresh, M. Lambeta, Y. Ma, R. Calandra, and J. Malik. General in-hand object rotation with vision and touch. In Conference on Robot Learning, pages 2549–2564. PMLR, 2023.
[3] B. Huang, J. Xu, I. Akinola, W. Yang, B. Sundaralingam, R. O’Flaherty, D. Fox, X. Wang, A. Mousavian, Y.-W. Chao, et al. Vt-refine: Learning bimanual assembly with visuo-tactile feedback via simulation fine-tuning. arXiv preprint arXiv:2510.14930, 2025.
[4] R. Gao, Z. Si, Y.-Y. Chang, S. Clarke, J. Bohg, L. Fei-Fei, W. Yuan, and J. Wu. Objectfolder 2.0: A multisensory object dataset for sim2real transfer. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10598–10608, 2022.
[5] Q. K. Luu, P. Zhou, Z. Xu, Z. Zhang, Q. Qiu, and Y. She. Manifeel: Benchmarking and understanding visuotactile manipulation policy learning. arXiv preprint arXiv:2505.18472, 2025.
[6] T. Schneider, G. Duret, C. de Farias, R. Calandra, L. Chen, and J. Peters. Tactile mnist: Benchmarking active tactile perception. arXiv preprint arXiv:2506.06361, 2025.
[7] T. Z. Zhao, V. Kumar, S. Levine, and C. Finn. Learning fine-grained bimanual manipulation with low-cost hardware. arXiv preprint arXiv:2304.13705, 2023.
[8] R.-Z. Qiu, S. Yang, X. Cheng, C. Chawla, J. Li, T. He, G. Yan, D. J. Yoon, R. Hoque, L. Paulsen, et al. Humanoid policy ~ human policy. Conference on Robot Learning (CoRL), 2025.
[9] R. Gao, Y. Dou, H. Li, T. Agarwal, J. Bohg, Y. Li, L. Fei-Fei, and J. Wu. The objectfolder benchmark: Multisensory learning with neural and real objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17276–17286, 2023.
[10] Q. Liu, Y. Cui, Z. Sun, G. Li, J. Chen, and Q. Ye. Vtdexmanip: A dataset and benchmark for visual-tactile pretraining and dexterous manipulation with reinforcement learning. In The Thirteenth International Conference on Learning Representations.
[11] C. Chi, Z. Xu, S. Feng, E. Cousineau, Y. Du, B. Burchfiel, R. Tedrake, and S. Song. Diffusion policy: Visuomotor policy learning via action diffusion. The International Journal of Robotics Research, page 02783649241273668, 2023.
[12] Z. Fu, T. Z. Zhao, and C. Finn. Mobile aloha: Learning bimanual mobile manipulation with low-cost whole-body teleoperation. arXiv preprint arXiv:2401.02117, 2024.
[13] S. Yang, M. Liu, Y. Qin, R. Ding, J. Li, X. Cheng, R. Yang, S. Yi, and X. Wang. Ace: A cross-platform visual-exoskeletons system for low-cost dexterous teleoperation. Conference on Robot Learning (CoRL), 2024.
[14] X. Cheng, J. Li, S. Yang, G. Yang, and X. Wang. Open-television: Teleoperation with immersive active visual feedback. Conference on Robot Learning (CoRL), 2024.
[15] P. Wu, Y. Shentu, Z. Yi, X. Lin, and P. Abbeel. Gello: A general, low-cost, and intuitive teleoperation framework for robot manipulators. arXiv preprint arXiv:2309.13037, 2023.
[16] K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, L. Groom, K. Hausman, B. Ichter, et al. $\pi_0$: A vision-language-action flow model for general robot control. arXiv preprint arXiv:2410.24164, 2024.
[17] J. Bjorck, F. Castañeda, N. Cherniadev, X. Da, R. Ding, L. Fan, Y. Fang, D. Fox, F. Hu, S. Huang, et al. Gr00t n1: An open foundation model for generalist humanoid robots. arXiv preprint arXiv:2503.14734, 2025.
[18] J. Lee, J. Duan, H. Fang, Y. Deng, S. Liu, B. Li, B. Fang, J. Zhang, Y. R. Wang, S. Lee, et al. Molmoact: Action reasoning models that can reason in space. arXiv preprint arXiv:2508.07917, 2025.
[19] Z. Kappassov, J.-A. Corrales, and V. Perdereau. Tactile sensing in dexterous robot hands. Robotics and Autonomous Systems, 74:195–220, 2015.
[20] M. R. Tremblay and M. R. Cutkosky. Estimating friction using incipient slip sensing during a manipulation task. In [1993] Proceedings IEEE International Conference on Robotics and Automation, pages 429–434. IEEE, 1993.
[21] W. Yuan, S. Dong, and E. H. Adelson. Gelsight: High-resolution robot tactile sensors for estimating geometry and force. Sensors, 17(12):2762, 2017.
[22] M. Lambeta, P.-W. Chou, S. Tian, B. Yang, B. Maloon, V. R. Most, D. Stroud, R. Santos, A. Byagowi, G. Kammerer, et al. Digit: A novel design for a low-cost compact high-resolution tactile sensor with application to in-hand manipulation. IEEE Robotics and Automation Letters, 5(3):3838–3845, 2020.
[23] W. Li, J. Konstantinova, Y. Noh, Z. Ma, A. Alomainy, and K. Althoefer. An elastomer-based flexible optical force and tactile sensor. In 2019 2nd IEEE International Conference on Soft Robotics (RoboSoft), pages 361–366. IEEE, 2019.
[24] C. Sferrazza and R. D’Andrea. Design, motivation and evaluation of a full-resolution optical tactile sensor. Sensors, 19(4):928, 2019.
[25] M. Lambeta, T. Wu, A. Sengul, V. R. Most, N. Black, K. Sawyer, R. Mercado, H. Qi, A. Sohn, B. Taylor, N. Tydingco, G. Kammerer, D. Stroud, J. Khatha, K. Jenkins, K. Most, N. Stein, R. Chavira, T. Craven-Bartle, E. Sanchez, Y. Ding, J. Malik, and R. Calandra. Digitizing touch with an artificial multimodal fingertip. In arXiv, 2024.
[26] T. P. Tomo, M. Regoli, A. Schmitz, L. Natale, H. Kristanto, S. Somlor, L. Jamone, G. Metta, and S. Sugano. A new silicone structure for uskin - a soft, distributed, digital 3-axis skin sensor and its integration on the humanoid robot icub. IEEE Robotics and Automation Letters, 3(3):2584–2591, 2018.
[27] T. Hellebrekers, O. Kroemer, and C. Majidi. Soft magnetic skin for continuous deformation sensing. Advanced Intelligent Systems, 1(4):1900025, 2019.
[28] K. Dai, X. Wang, A. M. Rojas, E. Harber, Y. Tian, N. Paiva, J. Gnehm, E. Schindewolf, H. Choset, V. A. Webster-Wood, et al. Design of a biomimetic tactile sensor for material classification. In 2022 International Conference on Robotics and Automation (ICRA), pages 10774–10780. IEEE, 2022.
[29] R. Bhirangi, V. Pattabiraman, E. Erciyes, Y. Cao, T. Hellebrekers, and L. Pinto. Anyskin: Plug-and-play skin sensing for robotic touch. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 16563–16570. IEEE, 2025.
[30] V. Pattabiraman, Z. Huang, D. Panozzo, D. Zorin, L. Pinto, and R. Bhirangi. eflesh: Highly customizable magnetic touch sensing using cut-cell microstructures. arXiv preprint arXiv:2506.09994, 2025.
[31] Y.-L. Park, B.-R. Chen, and R. J. Wood. Design and fabrication of soft artificial skin using embedded microchannels and liquid conductors. IEEE Sensors Journal, 12(8):2711–2718, 2012.
[32] T. Bhattacharjee, A. Jain, S. Vaish, M. D. Killpack, and C. C. Kemp. Tactile sensing over articulated joints with stretchable sensors. In 2013 World Haptics Conference (WHC), pages 103–108. IEEE, 2013.
[33] Z.-H. Yin, B. Huang, Y. Qin, Q. Chen, and X. Wang. Rotating without seeing: Towards in-hand dexterity through touch. arXiv preprint arXiv:2303.10880, 2023.
[34] A. Gupta, D. Park, S. Bashar, C. Girerd, N. Bhat, S. Mundhra, T. K. Morimoto, and D. Bharadia. Forcesticker: Wireless, batteryless, thin & flexible force sensors. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 7(1):1–32, 2023.
[35] B. Huang, Y. Wang, X. Yang, Y. Luo, and Y. Li. 3d-vitac: Learning fine-grained manipulation with visuo-tactile sensing. arXiv preprint arXiv:2410.24091, 2024.
[36] S. Yoshimura, K. Kawaharazuka, and K. Okada. M3d-skin: Multi-material 3d-printed tactile sensor with hierarchical infill structures for pressure sensing. arXiv preprint arXiv:2510.12419, 2025.
[37] R. Arandjelovic and A. Zisserman. Look, listen and learn. In Proceedings of the IEEE international conference on computer vision, pages 609–617, 2017.
[38] K. Zhang, M. Sharma, M. Veloso, and O. Kroemer. Leveraging multimodal haptic sensory data for robust cutting. In 2019 IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids), pages 409–416. IEEE, 2019.
[39] J. Mejia, V. Dean, T. Hellebrekers, and A. Gupta. Hearing touch: Audio-visual pretraining for contact-rich manipulation. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 6912–6919. IEEE, 2024.
[40] J. Aderibigbe, M. Li, J. Lee, and H. S. Stuart. Milli-scale acoustac sensing using soft helmholtz resonators. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 16585–16590. IEEE, 2025.
[41] Z. Xu, Z. Si, K. Zhang, O. Kroemer, and Z. Temel. A multi-modal tactile fingertip design for robotic hands to enhance dexterous manipulation. arXiv preprint arXiv:2510.05382, 2025.
[42] Z. Si and W. Yuan. Taxim: An example-based simulation model for gelsight tactile sensors. IEEE Robotics and Automation Letters, 7(2):2361–2368, 2022.
[43] Y. R. Song, J. Li, R. Fu, D. Murphy, K. Zhou, R. Shiv, Y. Li, H. Xiong, C. E. Owens, Y. Du, et al. Opentouch: Bringing full-hand touch to real-world interaction. arXiv preprint arXiv:2512.16842, 2025.
[44] C. Higuera, A. Sharma, T. Fan, C. K. Bodduluri, B. Boots, M. Kaess, M. Lambeta, T. Wu, Z. Liu, F. R. Hogan, et al. Tactile beyond pixels: Multisensory touch representations for robot manipulation. arXiv preprint arXiv:2506.14754, 2025.
[45] J. J. Liu, Y. Li, K. Shaw, T. Tao, R. Salakhutdinov, and D. Pathak. Factr: Force-attending curriculum training for contact-rich policy learning. arXiv preprint arXiv:2502.17432, 2025.
[46] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
[47] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. IEEE, 2009.
[48] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.
[49] I. Loshchilov and F. Hutter. Decoupled weight decay regularization, 2019. URL https://arxiv.org/abs/1711.05101
[50] C. Chi, Z. Xu, C. Pan, E. Cousineau, B. Burchfiel, S. Feng, R. Tedrake, and S. Song. Universal manipulation interface: In-the-wild robot teaching without in-the-wild robots. 2024 Robotics, Science and Systems, 2024. URL https://arxiv.org/abs/2402.10329
[51] T. Schneider. franky: High-Level Control Library for Franka Robots. URL https://github.com/TimSchneider42/franky

8 Appendix

8.1 Additional Tactile Encoders

As sensor data differ fundamentally in dimensionality and spatial structure, a single shared encoder can’t be used. Our encoders used in the main paper experiments are matched to each sensor’s sign...

[1] [1]

Calandra, A

R. Calandra, A. Owens, D. Jayaraman, J. Lin, W. Yuan, J. Malik, E. H. Adelson, and S. Levine. More than a feeling: Learning to grasp and regrasp using vision and touch.IEEE Robotics and Automation Letters, 3(4):3300–3307, 2018

work page 2018

[2] [2]

H. Qi, B. Yi, S. Suresh, M. Lambeta, Y . Ma, R. Calandra, and J. Malik. General in-hand object rotation with vision and touch. InConference on Robot Learning, pages 2549–2564. PMLR, 2023

work page 2023

[3] [3]

Huang, J

B. Huang, J. Xu, I. Akinola, W. Yang, B. Sundaralingam, R. O’Flaherty, D. Fox, X. Wang, A. Mousavian, Y .-W. Chao, et al. Vt-refine: Learning bimanual assembly with visuo-tactile feedback via simulation fine-tuning.arXiv preprint arXiv:2510.14930, 2025

work page arXiv 2025

[4] [4]

R. Gao, Z. Si, Y .-Y . Chang, S. Clarke, J. Bohg, L. Fei-Fei, W. Yuan, and J. Wu. Objectfolder 2.0: A multisensory object dataset for sim2real transfer. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10598–10608, 2022

work page 2022

[5] [5]

Q. K. Luu, P. Zhou, Z. Xu, Z. Zhang, Q. Qiu, and Y . She. Manifeel: Benchmarking and understanding visuotactile manipulation policy learning.arXiv preprint arXiv:2505.18472, 2025

work page arXiv 2025

[6] [6]

Tactile mnist: Benchmarking active tactile perception.arXiv preprint arXiv:2506.06361,

T. Schneider, G. Duret, C. de Farias, R. Calandra, L. Chen, and J. Peters. Tactile mnist: Benchmarking active tactile perception.arXiv preprint arXiv:2506.06361, 2025

work page arXiv 2025

[7] [7]

T. Z. Zhao, V . Kumar, S. Levine, and C. Finn. Learning fine-grained bimanual manipulation with low-cost hardware.arXiv preprint arXiv:2304.13705, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[8] [8]

R.-Z. Qiu, S. Yang, X. Cheng, C. Chawla, J. Li, T. He, G. Yan, D. J. Yoon, R. Hoque, L. Paulsen, et al. Humanoid policy˜ human policy.Conference on Robot Learning (CoRL), 2025

work page 2025

[9] [9]

R. Gao, Y . Dou, H. Li, T. Agarwal, J. Bohg, Y . Li, L. Fei-Fei, and J. Wu. The object- folder benchmark: Multisensory learning with neural and real objects. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17276–17286, 2023

work page 2023

[10] [10]

Q. Liu, Y . Cui, Z. Sun, G. Li, J. Chen, and Q. Ye. Vtdexmanip: A dataset and benchmark for visual-tactile pretraining and dexterous manipulation with reinforcement learning. InThe Thirteenth International Conference on Learning Representations

work page

[11] [11]

C. Chi, Z. Xu, S. Feng, E. Cousineau, Y . Du, B. Burchfiel, R. Tedrake, and S. Song. Diffusion policy: Visuomotor policy learning via action diffusion.The International Journal of Robotics Research, page 02783649241273668, 2023

work page 2023

[12] [12]

Z. Fu, T. Z. Zhao, and C. Finn. Mobile aloha: Learning bimanual mobile manipulation with low-cost whole-body teleoperation.arXiv preprint arXiv:2401.02117, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[13] [13]

S. Yang, M. Liu, Y . Qin, R. Ding, J. Li, X. Cheng, R. Yang, S. Yi, and X. Wang. Ace: A cross-platform visual-exoskeletons system for low-cost dexterous teleoperation.Conference on Robot Learning (CoRL), 2024

work page 2024

[14] [14]

Cheng, J

X. Cheng, J. Li, S. Yang, G. Yang, and X. Wang. Open-television: Teleoperation with immer- sive active visual feedback.Conference on Robot Learning (CoRL), 2024. 10

work page 2024

[15] [15]

P. Wu, Y . Shentu, Z. Yi, X. Lin, and P. Abbeel. Gello: A general, low-cost, and intuitive teleoperation framework for robot manipulators.arXiv preprint arXiv:2309.13037, 2023

work page arXiv 2023

[16] [16]

$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control

K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, N. Fusai, L. Groom, K. Hausman, B. Ichter, et al.π 0: A vision-language-action flow model for general robot control.arXiv preprint arXiv:2410.24164, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[17] [17]

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

J. Bjorck, F. Casta ˜neda, N. Cherniadev, X. Da, R. Ding, L. Fan, Y . Fang, D. Fox, F. Hu, S. Huang, et al. Gr00t n1: An open foundation model for generalist humanoid robots.arXiv preprint arXiv:2503.14734, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[18] [18]

J. Lee, J. Duan, H. Fang, Y . Deng, S. Liu, B. Li, B. Fang, J. Zhang, Y . R. Wang, S. Lee, et al. Molmoact: Action reasoning models that can reason in space.arXiv preprint arXiv:2508.07917, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[19] [19]

Kappassov, J.-A

Z. Kappassov, J.-A. Corrales, and V . Perdereau. Tactile sensing in dexterous robot hands. Robotics and Autonomous Systems, 74:195–220, 2015

work page 2015

[20] [20]

M. R. Tremblay and M. R. Cutkosky. Estimating friction using incipient slip sensing during a manipulation task. In[1993] Proceedings IEEE International Conference on Robotics and Automation, pages 429–434. IEEE, 1993

work page 1993

[21] [21]

W. Yuan, S. Dong, and E. H. Adelson. Gelsight: High-resolution robot tactile sensors for estimating geometry and force.Sensors, 17(12):2762, 2017

work page 2017

[22] [22]

Lambeta, P.-W

M. Lambeta, P.-W. Chou, S. Tian, B. Yang, B. Maloon, V . R. Most, D. Stroud, R. Santos, A. Byagowi, G. Kammerer, et al. Digit: A novel design for a low-cost compact high-resolution tactile sensor with application to in-hand manipulation.IEEE Robotics and Automation Letters, 5(3):3838–3845, 2020

work page 2020

[23] [23]

W. Li, J. Konstantinova, Y . Noh, Z. Ma, A. Alomainy, and K. Althoefer. An elastomer-based flexible optical force and tactile sensor. In2019 2nd IEEE International Conference on Soft Robotics (RoboSoft), pages 361–366. IEEE, 2019

work page 2019

[24] [24]

Sferrazza and R

C. Sferrazza and R. D’Andrea. Design, motivation and evaluation of a full-resolution optical tactile sensor.Sensors, 19(4):928, 2019

work page 2019

[25] M. Lambeta, T. Wu, A. Sengul, V. R. Most, N. Black, K. Sawyer, R. Mercado, H. Qi, A. Sohn, B. Taylor, N. Tydingco, G. Kammerer, D. Stroud, J. Khatha, K. Jenkins, K. Most, N. Stein, R. Chavira, T. Craven-Bartle, E. Sanchez, Y. Ding, J. Malik, and R. Calandra. Digitizing touch with an artificial multimodal fingertip. In arXiv, 2024.

[26] T. P. Tomo, M. Regoli, A. Schmitz, L. Natale, H. Kristanto, S. Somlor, L. Jamone, G. Metta, and S. Sugano. A new silicone structure for uskin—a soft, distributed, digital 3-axis skin sensor and its integration on the humanoid robot icub. IEEE Robotics and Automation Letters, 3(3):2584–2591, 2018.

[27] T. Hellebrekers, O. Kroemer, and C. Majidi. Soft magnetic skin for continuous deformation sensing. Advanced Intelligent Systems, 1(4):1900025, 2019.

[28] K. Dai, X. Wang, A. M. Rojas, E. Harber, Y. Tian, N. Paiva, J. Gnehm, E. Schindewolf, H. Choset, V. A. Webster-Wood, et al. Design of a biomimetic tactile sensor for material classification. In 2022 International Conference on Robotics and Automation (ICRA), pages 10774–10780. IEEE, 2022.

[29] R. Bhirangi, V. Pattabiraman, E. Erciyes, Y. Cao, T. Hellebrekers, and L. Pinto. Anyskin: Plug-and-play skin sensing for robotic touch. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 16563–16570. IEEE, 2025.

[30] V. Pattabiraman, Z. Huang, D. Panozzo, D. Zorin, L. Pinto, and R. Bhirangi. eflesh: Highly customizable magnetic touch sensing using cut-cell microstructures. arXiv preprint arXiv:2506.09994, 2025.

[31] Y.-L. Park, B.-R. Chen, and R. J. Wood. Design and fabrication of soft artificial skin using embedded microchannels and liquid conductors. IEEE Sensors Journal, 12(8):2711–2718, 2012.

[32] T. Bhattacharjee, A. Jain, S. Vaish, M. D. Killpack, and C. C. Kemp. Tactile sensing over articulated joints with stretchable sensors. In 2013 World Haptics Conference (WHC), pages 103–108. IEEE, 2013.

[33] Z.-H. Yin, B. Huang, Y. Qin, Q. Chen, and X. Wang. Rotating without seeing: Towards in-hand dexterity through touch. arXiv preprint arXiv:2303.10880, 2023.

[34] A. Gupta, D. Park, S. Bashar, C. Girerd, N. Bhat, S. Mundhra, T. K. Morimoto, and D. Bharadia. Forcesticker: Wireless, batteryless, thin & flexible force sensors. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 7(1):1–32, 2023.

[35] B. Huang, Y. Wang, X. Yang, Y. Luo, and Y. Li. 3d-vitac: Learning fine-grained manipulation with visuo-tactile sensing. arXiv preprint arXiv:2410.24091, 2024.

[36] S. Yoshimura, K. Kawaharazuka, and K. Okada. M3d-skin: Multi-material 3d-printed tactile sensor with hierarchical infill structures for pressure sensing. arXiv preprint arXiv:2510.12419, 2025.

[37] R. Arandjelovic and A. Zisserman. Look, listen and learn. In Proceedings of the IEEE international conference on computer vision, pages 609–617, 2017.

[38] K. Zhang, M. Sharma, M. Veloso, and O. Kroemer. Leveraging multimodal haptic sensory data for robust cutting. In 2019 IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids), pages 409–416. IEEE, 2019.

[39] J. Mejia, V. Dean, T. Hellebrekers, and A. Gupta. Hearing touch: Audio-visual pretraining for contact-rich manipulation. In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 6912–6919. IEEE, 2024.

[40] J. Aderibigbe, M. Li, J. Lee, and H. S. Stuart. Milli-scale acoustac sensing using soft helmholtz resonators. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 16585–16590. IEEE, 2025.

[41] Z. Xu, Z. Si, K. Zhang, O. Kroemer, and Z. Temel. A multi-modal tactile fingertip design for robotic hands to enhance dexterous manipulation. arXiv preprint arXiv:2510.05382, 2025.

[42] Z. Si and W. Yuan. Taxim: An example-based simulation model for gelsight tactile sensors. IEEE Robotics and Automation Letters, 7(2):2361–2368, 2022.

[43] Y. R. Song, J. Li, R. Fu, D. Murphy, K. Zhou, R. Shiv, Y. Li, H. Xiong, C. E. Owens, Y. Du, et al. Opentouch: Bringing full-hand touch to real-world interaction. arXiv preprint arXiv:2512.16842, 2025.

[44] C. Higuera, A. Sharma, T. Fan, C. K. Bodduluri, B. Boots, M. Kaess, M. Lambeta, T. Wu, Z. Liu, F. R. Hogan, et al. Tactile beyond pixels: Multisensory touch representations for robot manipulation. arXiv preprint arXiv:2506.14754, 2025.

[45] J. J. Liu, Y. Li, K. Shaw, T. Tao, R. Salakhutdinov, and D. Pathak. Factr: Force-attending curriculum training for contact-rich policy learning. arXiv preprint arXiv:2502.17432, 2025.

[46] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

[47] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. IEEE, 2009.

[48] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.

[49] I. Loshchilov and F. Hutter. Decoupled weight decay regularization, 2019. URL https://arxiv.org/abs/1711.05101.

[50] C. Chi, Z. Xu, C. Pan, E. Cousineau, B. Burchfiel, S. Feng, R. Tedrake, and S. Song. Universal manipulation interface: In-the-wild robot teaching without in-the-wild robots. 2024 Robotics, Science and Systems, 2024. URL https://arxiv.org/abs/2402.10329.

[51] T. Schneider. franky: High-Level Control Library for Franka Robots. URL https://github.com/TimSchneider42/franky.

8 Appendix

8.1 Additional Tactile Encoders

As sensor data differ fundamentally in dimensionality and spatial structure, a single shared encoder cannot be used. Our encoders used in the main paper experiments are matched to each sensor’s sign...