Advances in Compliance Detection: Novel Models Using Vision-Based Tactile Sensors

Ilana Nisky; Malte Kuhlmann; Nicolás Navarro-Guerrero; Ziteng Li

arXiv: 2506.14980 · v1 · submitted 2025-06-17 · 💻 cs.CV · cs.RO


Ziteng Li, Malte Kuhlmann, Ilana Nisky, Nicolás Navarro-Guerrero

Pith reviewed 2026-05-19 08:45 UTC · model grok-4.3

classification 💻 cs.CV cs.RO

keywords: compliance estimation · vision-based tactile sensing · GelSight sensor · LRCN · Transformer · robotic perception · material property prediction


The pith

Two neural network models using GelSight RGB tactile images estimate object compliance more accurately than baseline methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops LRCN and Transformer models that take sequences of RGB images from a GelSight vision-based tactile sensor and predict how compliant an object is. Traditional compliance measurement relies on bulky or expensive equipment that does not suit robots, while earlier neural approaches fell short on accuracy. The new models show clear gains across standard metrics on held-out test data. The work also reports that objects stiffer than the sensor gel itself are harder to estimate accurately.

Core claim

LRCN and Transformer architectures applied to RGB tactile images and auxiliary data from the GelSight sensor deliver significant improvements in compliance prediction accuracy over baseline networks, as measured by multiple performance metrics. The same experiments reveal a correlation in which objects harder than the sensor material prove more difficult to estimate accurately.

What carries the argument

LRCN and Transformer networks that process sequences of RGB tactile images captured by the GelSight sensor to regress compliance values.

If this is right

Robotic systems gain a practical way to assess material softness without dedicated force sensors.
Compliance estimation becomes feasible in portable or field settings where traditional instruments are impractical.
Estimation difficulty increases when the target object is stiffer than the sensor gel, suggesting a hardness-mismatch limit on performance.
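The hardness-mismatch limit has a plausible physical reading under linear-elastic Hertz contact, offered here as an editorial assumption rather than the paper's own analysis. Object and gel combine into an effective contact modulus E* through 1/E* = (1 − ν_o²)/E_o + (1 − ν_g²)/E_g, so once the object is much stiffer than the gel, E* saturates at the gel's value and the imprint carries little information about the object. The sketch below computes E* and the fraction of contact compliance contributed by the object (which equals d ln E* / d ln E_o); the gel modulus and both Poisson ratios are hypothetical values chosen for illustration.

```python
"""Hertz-contact sketch of why stiff objects are hard to tell apart.

Assumptions (illustrative, not taken from the paper): linear-elastic Hertz
contact, moduli in MPa, and hypothetical Poisson ratios for object and gel.
"""


def effective_modulus(e_object: float, e_gel: float,
                      nu_object: float = 0.45, nu_gel: float = 0.49) -> float:
    """Effective contact modulus E* from 1/E* = (1-nu_o^2)/E_o + (1-nu_g^2)/E_g."""
    c_object = (1.0 - nu_object ** 2) / e_object
    c_gel = (1.0 - nu_gel ** 2) / e_gel
    return 1.0 / (c_object + c_gel)


def object_share(e_object: float, e_gel: float,
                 nu_object: float = 0.45, nu_gel: float = 0.49) -> float:
    """Fraction of the contact compliance that comes from the object.

    This equals d ln(E*) / d ln(E_object): close to 1 means the contact is
    sensitive to the object's stiffness, close to 0 means the gel dominates.
    """
    c_object = (1.0 - nu_object ** 2) / e_object
    c_gel = (1.0 - nu_gel ** 2) / e_gel
    return c_object / (c_object + c_gel)


if __name__ == "__main__":
    E_GEL = 0.5  # hypothetical gel modulus (MPa)
    for e_obj in (0.05, 0.5, 5.0, 50.0):
        print(f"E_object={e_obj:7.2f} MPa  "
              f"E*={effective_modulus(e_obj, E_GEL):.3f} MPa  "
              f"object share={object_share(e_obj, E_GEL):.3f}")
```

As the object's modulus grows past the gel's, the object share falls toward zero, so a fixed amount of image noise maps to a much larger uncertainty in the inferred object modulus.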

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The models could be deployed on robot hands for online adjustment of grasp force during manipulation of unknown soft items.
Similar image-sequence architectures might transfer to other vision-based tactile sensors if the underlying image-to-compliance mapping proves sensor-agnostic.
Adding sensing modalities such as shear or temperature alongside the tactile images could reduce errors on hard objects.

Load-bearing premise

The RGB images from the GelSight sensor contain enough information about compliance to generalize beyond the particular training objects and sensor instance used.

What would settle it

Train the models on one set of objects and materials, then test them on a fresh collection of objects with substantially different stiffnesses or surface properties and check whether the reported accuracy advantage over baselines vanishes.
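One way to run that test is to hold out whole objects at the stiff end of the range, so that the test set lies outside the training stiffness range. Below is a minimal sketch, assuming each object has a known Young's modulus; the function name, object names, and hold-out fraction are hypothetical. The proposed models and the baseline would then be trained on the `train` objects and compared on the `test` objects.

```python
"""Sketch: a stiffness-extrapolation split that holds out the stiffest objects.

Hypothetical setup: `moduli` maps an object identifier to its Young's modulus.
All samples of an object go to the same side of the split.
"""
from typing import Dict, List, Tuple


def stiffness_extrapolation_split(
    moduli: Dict[str, float], holdout_fraction: float = 0.25
) -> Tuple[List[str], List[str]]:
    """Return (train_objects, test_objects); the test objects are the stiffest."""
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be strictly between 0 and 1")
    if len(moduli) < 2:
        raise ValueError("need at least two objects to form a split")
    ranked = sorted(moduli, key=lambda name: moduli[name])  # softest first
    n_test = min(len(ranked) - 1, max(1, round(len(ranked) * holdout_fraction)))
    return ranked[:-n_test], ranked[-n_test:]


if __name__ == "__main__":
    # Toy moduli in MPa, for illustration only.
    toy_moduli = {"foam": 0.05, "gel_a": 0.3, "rubber": 2.0, "pla": 3500.0}
    train, test = stiffness_extrapolation_split(toy_moduli)
    print("train:", train, "test:", test)
```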

Figures

Figures reproduced from arXiv: 2506.14980 by Ilana Nisky, Malte Kuhlmann, Nicolás Navarro-Guerrero, Ziteng Li.

**Figure 2.** VGG-LSTM Architecture. … and training strategy differ from the experimental strategy proposed in the previous Subsection, we retrained it to verify its performance under our proposed experimental strategy. We also designed two new models: an LRCN-based and a Transformer-based model. The motivation for our proposed models is to exploit the time-series information in the data more effectively. We provide detai…

**Figure 3.** Res-Tf Architecture. … aggregated using learnable weighted averaging to produce the final prediction. 3) Transformer: We design a model named Res-Tf based on Residual Networks (ResNet) [37] with Transformer [38]. We incorporate a Transformer encoder due to its proven effectiveness in various time-series tasks [39]…
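The Figure 3 excerpt says outputs are aggregated using learnable weighted averaging to produce the final prediction, but it is truncated and does not say what is averaged or how the weights are parameterized. A minimal sketch, assuming one compliance prediction per time step combined with softmax-normalized learnable logits (one common choice, not necessarily the authors'), including the gradient that lets the weights be learned by ordinary gradient descent:

```python
"""Sketch of learnable weighted averaging over per-time-step predictions.

Assumption: one scalar prediction per frame, weights = softmax(logits), where
the logits are trainable parameters. Illustration only, not the paper's code.
"""
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_average(predictions: Sequence[float], logits: Sequence[float]) -> float:
    """Combine per-frame predictions with softmax-normalized weights."""
    if len(predictions) != len(logits):
        raise ValueError("need one logit per prediction")
    return sum(w * p for w, p in zip(softmax(logits), predictions))


def squared_error_grad(predictions: Sequence[float], logits: Sequence[float],
                       target: float) -> List[float]:
    """Gradient of (output - target)^2 with respect to each logit.

    Uses d(output)/d(logit_i) = w_i * (p_i - output).
    """
    weights = softmax(logits)
    out = sum(w * p for w, p in zip(weights, predictions))
    return [2.0 * (out - target) * w * (p - out) for w, p in zip(weights, predictions)]


def train_logits(predictions: Sequence[float], logits: Sequence[float],
                 target: float, lr: float = 0.5, steps: int = 50) -> List[float]:
    """Plain gradient descent on the logits for a single (predictions, target) pair."""
    z = list(logits)
    for _ in range(steps):
        grad = squared_error_grad(predictions, z, target)
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z
```

With equal logits the aggregation reduces to a plain mean; training shifts weight toward the frames whose predictions are closest to the target.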

**Figure 4.** Comparison of Young’s modulus prediction performance for the three models with two input modalities on the Seen-Object condition and Balanced…

**Figure 6.** N-MSE under different Shapes based on the Random sampling strategy.

**Figure 7.** Comparison of Young’s modulus estimation performance across all three models with Image only on Seen-Object condition for the new dataset.
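Figure 6 reports N-MSE, and reference [33] argues for R² in regression evaluation. This page does not define the normalization, so the sketch below assumes the common convention of dividing the MSE by the population variance of the ground-truth values. Under that assumption N-MSE equals 1 − R², so the two metrics rank models identically on the same test set.

```python
"""Sketch of N-MSE and R^2 for compliance regression.

Assumption: N-MSE = MSE / Var(y_true) with the population variance; the paper
may normalize differently (e.g., by range or by mean square).
"""
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def n_mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MSE divided by the population variance of the ground truth."""
    error = mse(y_true, y_pred)
    mean_t = sum(y_true) / len(y_true)
    var_t = sum((t - mean_t) ** 2 for t in y_true) / len(y_true)
    if var_t == 0.0:
        raise ValueError("ground truth is constant; N-MSE is undefined")
    return error / var_t


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A model that always predicts the mean of the test targets scores N-MSE = 1 (R² = 0) under this convention, which gives a natural reference line for plots like Figure 6.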

Original abstract

Compliance is a critical parameter for describing objects in engineering, agriculture, and biomedical applications. Traditional compliance detection methods are limited by their lack of portability and scalability, rely on specialized, often expensive equipment, and are unsuitable for robotic applications. Moreover, existing neural network-based approaches using vision-based tactile sensors still suffer from insufficient prediction accuracy. In this paper, we propose two models based on Long-term Recurrent Convolutional Networks (LRCNs) and Transformer architectures that leverage RGB tactile images and other information captured by the vision-based sensor GelSight to predict compliance metrics accurately. We validate the performance of these models using multiple metrics and demonstrate their effectiveness in accurately estimating compliance. The proposed models exhibit significant performance improvement over the baseline. Additionally, we investigated the correlation between sensor compliance and object compliance estimation, which revealed that objects that are harder than the sensor are more challenging to estimate.
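A minimal way to probe the reported hardness effect is to split test errors by whether each object's modulus exceeds the sensor gel's. The sketch assumes Young's modulus as the regression target, uses absolute log10 error because moduli can span orders of magnitude (an editorial choice), and treats the sensor modulus as a known constant; the function name and the toy values in the test are hypothetical.

```python
"""Sketch: compare estimation error for objects softer vs. stiffer than the sensor.

Assumptions: targets and predictions are Young's moduli (same units as
`e_sensor`), all strictly positive; error is measured in log10 space.
"""
import math
from typing import Dict, List, Sequence


def error_by_hardness(e_true: Sequence[float], e_pred: Sequence[float],
                      e_sensor: float) -> Dict[str, float]:
    """Mean absolute log10 error per group (groups with no samples are omitted)."""
    if len(e_true) != len(e_pred):
        raise ValueError("e_true and e_pred must have equal length")
    groups: Dict[str, List[float]] = {"softer_than_sensor": [], "stiffer_than_sensor": []}
    for t, p in zip(e_true, e_pred):
        if t <= 0 or p <= 0:
            raise ValueError("moduli must be positive for a log-scale error")
        key = "stiffer_than_sensor" if t > e_sensor else "softer_than_sensor"
        groups[key].append(abs(math.log10(p) - math.log10(t)))
    return {k: sum(v) / len(v) for k, v in groups.items() if v}
```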

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

LRCN and Transformer models on GelSight images improve compliance estimates over baseline but rest on limited evidence for generalization beyond the training objects.


The core of this paper is straightforward: the authors train LRCN and Transformer networks on RGB images from a GelSight sensor to predict object compliance and report better results than a baseline. They also examine how the sensor's own compliance affects accuracy and find that objects stiffer than the sensor are harder to estimate correctly. That observation is useful because it flags a practical limit instead of claiming the method works everywhere.

The work is new only in the narrow sense of bringing these particular sequence models to GelSight-based compliance data, where earlier neural attempts apparently fell short on accuracy. The framing of the portability problem with traditional testers is clear and ties directly to robotics needs. The correlation check between sensor and object properties adds a small but honest piece of analysis that most pure performance papers skip.

The main weakness is the lack of concrete numbers. The abstract mentions performance gains and multiple metrics but gives no error values, dataset size, split details, or cross-validation scheme. Without those, it is difficult to tell whether the improvement is reliable or tied to the specific objects and sensor unit used in training. The concern that the models may memorize object-specific deformation patterns rather than learn general compliance cues is consistent with the paper's own finding on harder objects. If the authors did not run hold-out tests on new materials or different GelSight hardware, the gains may remain dataset-specific.

This paper is mainly for people already working on vision-based tactile sensing who want a concrete example of applying recurrent or attention models to compliance estimation. A reader looking for portable measurement tools in agriculture or soft robotics might scan it for implementation ideas. It is not broad enough to change how the field thinks about tactile sensing overall. The idea is grounded enough, and the limitation acknowledged clearly enough, that it should go to peer review rather than be desk-rejected. Referees will need to see the actual tables and any cross-object results before the claims can be taken as solid.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes two neural network models (LRCN and Transformer) that take RGB tactile images from the GelSight vision-based sensor, along with auxiliary information, to predict object compliance. It reports significant performance gains relative to a baseline and presents an empirical finding that objects harder than the sensor are more difficult to estimate accurately.

Significance. If the reported gains prove robust under proper generalization testing, the work would supply a portable, vision-based alternative to traditional compliance measurement hardware, with potential utility in robotics, agriculture, and biomedical settings. The approach is a straightforward application of established sequence and attention architectures to tactile imagery; its value therefore rests on whether the learned features capture compliance in a way that transfers beyond the training objects and the specific sensor unit, rather than encoding dataset-specific patterns.

major comments (2)

[§4] §4 (Experimental Setup and Results): the evaluation protocol uses a single train/test split on the collected objects without cross-object hold-out, cross-material validation, or tests on a different GelSight unit. This directly bears on the central claim that the models extract generalizable compliance information from RGB images, especially given the paper's own observation that objects harder than the sensor are harder to estimate.
[Abstract and §4.3] Abstract and §4.3 (Quantitative Results): the asserted 'significant performance improvement' is stated without accompanying numerical values, dataset cardinality, number of distinct objects/materials, validation-split details, or error bars. These omissions prevent assessment of whether the gains are statistically meaningful or merely reflect memorization of the training distribution.

minor comments (2)

[§2 and §3] Notation for compliance metrics (e.g., Young's modulus versus stiffness) is used inconsistently between the abstract and the methods section; a single consistent definition should be adopted.
[Sample-image figure] The figure showing sample GelSight images would benefit from an explicit scale bar and an indication of the contact force applied during capture. (Figure 3 shows the Res-Tf architecture, not sample images.)
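The notation point in the first minor comment can be stated compactly. Young's modulus E is a material property, whereas stiffness k also depends on geometry. For the simplest case, assumed here for illustration, a prismatic specimen of cross-sectional area A and length L under small uniaxial strain, the standard linear-elastic relations are:

```latex
\sigma = E\,\varepsilon, \qquad
k = \frac{F}{\delta} = \frac{E A}{L}, \qquad
C = \frac{1}{k} = \frac{L}{E A}
```

Here F is the applied force, δ the resulting displacement, and C the compliance in the structural sense. Because "compliance" is also sometimes used for 1/E, a single stated definition avoids ambiguity.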

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.


Referee: [§4] §4 (Experimental Setup and Results): the evaluation protocol uses a single train/test split on the collected objects without cross-object hold-out, cross-material validation, or tests on a different GelSight unit. This directly bears on the central claim that the models extract generalizable compliance information from RGB images, especially given the paper's own observation that objects harder than the sensor are harder to estimate.

Authors: We agree that reliance on a single train/test split limits the strength of claims about generalization. The current protocol was selected to maximize training data given the size of the collected dataset. In the revision we will add leave-one-object-out and cross-material validation results to §4, along with a clearer discussion of how the observed difficulty with objects harder than the sensor relates to generalization. Testing on an additional GelSight unit is not feasible with the hardware available for this study; we will therefore note this explicitly as a limitation and a suggested direction for future work rather than claiming broader hardware invariance. revision: partial
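The leave-one-object-out and cross-material protocols promised in this response are both instances of leave-one-group-out cross-validation: every sample carries a group label (object ID or material), and each fold holds out all samples of one group. A dependency-free sketch follows; scikit-learn's `LeaveOneGroupOut` in `sklearn.model_selection` provides the same splitting if that library is already in use. The labels in the usage example are hypothetical.

```python
"""Leave-one-group-out splits for cross-object or cross-material validation."""
from typing import Hashable, Iterator, List, Sequence, Tuple


def leave_one_group_out(
    groups: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_group, train_indices, test_indices), one fold per group.

    Pass object IDs for leave-one-object-out, or material labels for
    cross-material validation. No group ever appears on both sides of a fold.
    """
    for held_out in dict.fromkeys(groups):  # unique labels in first-seen order
        test = [i for i, g in enumerate(groups) if g == held_out]
        train = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train, test


if __name__ == "__main__":
    # Hypothetical per-sample object labels (e.g., several presses per object).
    sample_objects = ["sponge", "sponge", "eraser", "ball", "eraser"]
    for obj, train_idx, test_idx in leave_one_group_out(sample_objects):
        print(f"hold out {obj!r}: train={train_idx} test={test_idx}")
```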
Referee: [Abstract and §4.3] Abstract and §4.3 (Quantitative Results): the asserted 'significant performance improvement' is stated without accompanying numerical values, dataset cardinality, number of distinct objects/materials, validation-split details, or error bars. These omissions prevent assessment of whether the gains are statistically meaningful or merely reflect memorization of the training distribution.

Authors: We accept that the abstract and §4.3 should contain the concrete numbers needed to evaluate the reported gains. The revised version will report the specific improvement in each metric, the total number of objects and distinct materials, the exact train/validation/test split ratios, and error bars obtained from repeated runs. These additions will allow readers to judge whether the improvements exceed what would be expected from memorization of the training distribution. revision: yes
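For the promised error bars, a simple option is to report the mean and spread of each metric over repeated runs (for example, different random seeds), and to compare model and baseline on paired runs so that shared run-to-run variation cancels. A sketch under those assumptions; the values in the usage example are invented for illustration, not results from the paper.

```python
"""Sketch: error bars over repeated runs and a paired model-vs-baseline gap."""
import math
import statistics
from typing import Sequence, Tuple


def mean_sd_se(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean, sample standard deviation, standard error of the mean)."""
    if len(values) < 2:
        raise ValueError("need at least two runs for a spread estimate")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # n - 1 denominator
    return mean, sd, sd / math.sqrt(len(values))


def paired_gap(baseline: Sequence[float],
               model: Sequence[float]) -> Tuple[float, float, float]:
    """Mean/SD/SE of (baseline - model) over runs paired by seed or split.

    For error metrics such as N-MSE, a positive mean gap favors the model.
    """
    if len(baseline) != len(model):
        raise ValueError("runs must be paired one-to-one")
    return mean_sd_se([b - m for b, m in zip(baseline, model)])


if __name__ == "__main__":
    # Invented N-MSE values for three seeds (illustration only).
    baseline_runs = [0.50, 0.60, 0.55]
    model_runs = [0.40, 0.45, 0.50]
    print("model:", mean_sd_se(model_runs))
    print("gap:", paired_gap(baseline_runs, model_runs))
```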

Circularity Check

0 steps flagged

No circularity in empirical ML compliance estimation


The paper trains LRCN and Transformer models on GelSight RGB tactile images to predict object compliance, then validates performance on held-out data with reported gains over baseline. No derivation chain, equations, or first-principles results are presented that reduce to inputs by construction. The correlation analysis between sensor and object compliance is an empirical observation, not a self-referential fit or prediction. No self-citations serve as load-bearing uniqueness claims, and no ansatz or renaming of known results occurs. This is a standard data-driven supervised learning approach whose claims stand or fall on empirical evaluation, not on a derivation that could be circular.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that tactile image features are predictive of compliance and that standard supervised training will generalize; no new physical entities or ad-hoc constants are introduced beyond typical neural network weights.

free parameters (1)

neural network weights and hyperparameters
All model parameters are fitted to the collected tactile image dataset during training.

axioms (2)

Domain assumption: Tactile RGB images from GelSight contain extractable features correlated with object compliance.
Invoked when the models are trained to map images to compliance values.
Standard math: Standard supervised learning assumptions hold (i.i.d. samples, appropriate loss, no severe distribution shift).
Implicit in any neural network training for regression.

pith-pipeline@v0.9.0 · 5688 in / 1244 out tokens · 31337 ms · 2026-05-19T08:45:07.334608+00:00 · methodology


Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 1 internal anchor

[1] Z. Kappassov, J.-A. Corrales, and V. Perdereau, “Tactile Sensing in Dexterous Robot Hands – Review,” Robotics and Autonomous Systems, vol. 74, Part A, pp. 195–220, Dec. 2015.

[2] D. R. H. Jones and M. F. Ashby, Engineering Materials 1: An Introduction to Properties, Applications and Design, 5th ed. Oxford, United Kingdom: Butterworth-Heinemann, 2019, vol. 1.

[3] A. J. Spiers, M. V. Liarokapis, B. Calli, and A. M. Dollar, “Single-Grasp Object Classification and Feature Extraction with Simple Robot Hands and Tactile Sensors,” IEEE Transactions on Haptics, vol. 9, no. 2, pp. 207–220, 2016.

[4] S. Toprak, N. Navarro-Guerrero, and S. Wermter, “Evaluating Integration Strategies for Visuo-Haptic Object Recognition,” Cognitive Computation, vol. 10, no. 3, pp. 408–425, Jun. 2018.

[5] C. Xu, H. He, S. C. Hauser, and G. J. Gerling, “Tactile Exploration Strategies With Natural Compliant Objects Elicit Virtual Stiffness Cues,” IEEE Transactions on Haptics, vol. 13, no. 1, pp. 4–10, Jan. 2020.

[6] J. T. Iivarinen, R. K. Korhonen, P. Julkunen, and J. S. Jurvelin, “Experimental and Computational Analysis of Soft Tissue Stiffness in Forearm Using a Manual Indentation Device,” Medical Engineering & Physics, vol. 33, no. 10, pp. 1245–1253, Dec. 2011.

[7] M. Cianchetti, C. Laschi, A. Menciassi, and P. Dario, “Biomedical Applications of Soft Robotics,” Nature Reviews Materials, vol. 3, no. 6, pp. 143–153, Jun. 2018.

[8] I. Nisky, F. Huang, A. Milstein, C. M. Pugh, F. A. Mussa-Ivaldi, and A. Karniel, “Perception of Stiffness in Laparoscopy – the Fulcrum Effect,” Studies in Health Technology and Informatics, vol. 173, pp. 313–319, 2012.

[9] I. Nisky, A. Pressman, C. M. Pugh, F. A. Mussa-Ivaldi, and A. Karniel, “Perception and Action in Teleoperated Needle Insertion,” IEEE Transactions on Haptics, vol. 4, no. 3, pp. 155–166, Jul. 2011.

[10] D. Gao, J. P. Lee, J. Chen, L. S. Tay, Y. Xin, K. Parida, M. W. M. Tan, P. Huang, K. H. Kong, and P. S. Lee, “A Wearable Pneumatic-Piezoelectric System for Quantitative Assessment of Skeletomuscular Biomechanics,” Device, vol. 2, no. 3, p. 100288, Mar. 2024.

[11] K. Inoue, S. Okamoto, Y. Akiyama, and Y. Yamada, “Effect of Material Hardness on Friction Between a Bare Finger and Dry and Lubricated Artificial Skin,” IEEE Transactions on Haptics, vol. 13, no. 1, pp. 123–129, Jan. 2020.

[12] M. Britton, E. Parle, and T. J. Vaughan, “An Investigation on the Effects of in Vitro Induced Advanced Glycation End-Products on Cortical Bone Fracture Mechanics at Fall-Related Loading Rates,” Journal of the Mechanical Behavior of Biomedical Materials, vol. 138, p. 105619, Feb. 2023.

[13] S. Tian and H. Xu, “Mechanical-Based and Optical-Based Methods for Nondestructive Evaluation of Fruit Firmness,” Food Reviews International, vol. 39, no. 7, pp. 4009–4039, Aug. 2023.

[14] R. Lu and Y. Peng, “Hyperspectral Scattering for Assessing Peach Fruit Firmness,” Biosystems Engineering, vol. 93, no. 2, pp. 161–171, Feb. 2006.

[15] W. Yuan, S. Dong, and E. H. Adelson, “GelSight: High-Resolution Robot Tactile Sensors for Estimating Geometry and Force,” Sensors, vol. 17, no. 12, p. 2762, Dec. 2017.

[16] N. Navarro-Guerrero, S. Toprak, J. Josifovski, and L. Jamone, “Visuo-Haptic Object Perception for Robots: An Overview,” Autonomous Robots, vol. 47, no. 4, pp. 377–403, Apr. 2023.

[17] M. Lippi, M. C. Welle, M. K. Wozniak, A. Gasparri, and D. Kragic, “Low-Cost Teleoperation with Haptic Feedback through Vision-based Tactile Sensors for Rigid and Soft Object Manipulation,” arXiv, Tech. Rep. arXiv:2403.16764, Mar. 2024.

[18] W. Yuan, C. Zhu, A. Owens, M. A. Srinivasan, and E. H. Adelson, “Shape-Independent Hardness Estimation Using Deep Learning and a GelSight Tactile Sensor,” in IEEE International Conference on Robotics and Automation (ICRA). Singapore: IEEE, May 2017, pp. 951–958.

[19] M. Burgess, J. Zhao, and L. Willemet, “Learning Object Compliance via Young’s Modulus from Single Grasps using Camera-Based Tactile Sensors,” arXiv, Tech. Rep. arXiv:2406.15304, 2025.

[20] M. Kuhlmann, Z. Li, and N. Navarro-Guerrero, “Toward Vision-Based Object Compliance Estimation,” in German Robotics Conference (GRC), ser. 1st, Nuremberg, Germany, Mar. 2025, pp. 1–3.

[21] A. N. Gent, “On the Relation between Indentation Hardness and Young’s Modulus,” Rubber Chemistry and Technology, vol. 31, no. 4, pp. 896–906, Sep. 1958.

[22] M. Mittal, C. Yu, Q. Yu, J. Liu, N. Rudin, D. Hoeller, J. L. Yuan, R. Singh, Y. Guo, H. Mazhar, A. Mandlekar, B. Babich, G. State, M. Hutter, and A. Garg, “Orbit: A Unified Simulation Framework for Interactive Robot Learning Environments,” IEEE Robotics and Automation Letters, vol. 8, no. 6, pp. 3740–3747, Jun. 2023.

[23] R. de Borst, M. A. Crisfield, J. J. C. Remmers, and C. V. Verhoosel, Non-Linear Finite Element Analysis of Solids and Structures, 2nd ed., ser. Wiley Series in Computational Mechanics. Chichester, West Sussex, United Kingdom: Wiley, 2012.

[24] I. Huang, Y. Narang, C. Eppner, B. Sundaralingam, M. Macklin, R. Bajcsy, T. Hermans, and D. Fox, “DefGraspSim: Physics-Based Simulation of Grasp Outcomes for 3D Deformable Objects,” IEEE Robotics and Automation Letters, vol. 7, no. 3, pp. 6274–6281, 2022.

[25] MatWeb, “Material property data,” 2024. [Online]. Available: https://www.matweb.com

[26] K. Larson, “Can You Estimate Modulus from Durometer Hardness for Silicones? Yes, but Only Roughly . . . and You Must Choose Your Modulus Carefully!” Dow Chemical Company, White Paper, 2017.

[27] A. C. Fischer-Cripps, “The Hertzian Contact Surface,” Journal of Materials Science, vol. 34, no. 1, pp. 129–137, Jan. 1999.

[28] J. Rychlewski, “On Hooke’s law,” Journal of Applied Mathematics and Mechanics, vol. 48, no. 3, pp. 303–314, Jan. 1984.

[29] E. Dintwa, E. Tijskens, and H. Ramon, “On the Accuracy of the Hertz Model to Describe the Normal Contact of Soft Elastic Spheres,” Granular Matter, vol. 10, no. 3, pp. 209–221, Mar. 2008.

[30] D. E. Shier, “Well Log Normalization: Methods and Guidelines,” Petrophysics - The SPWLA Journal, vol. 45, no. 03, May 2004.

[31] L. Fu, G. Datta, H. Huang, W. C.-H. Panitch, J. Drake, J. Ortiz, M. Mukadam, M. Lambeta, R. Calandra, and K. Goldberg, “A Touch, Vision, and Language Dataset for Multimodal Alignment,” in International Conference on Machine Learning (ICML), ser. ICML’24, vol. 235. Vienna, Austria: JMLR.org, Jul. 2024, pp. 14080–14101.

[32] A. Parag, E. H. Adelson, and E. Misimi, “Learning Incipient Slip with GelSight Sensors: Attention Classification with Video Vision Transformers,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2024, pp. 13960–13966.

[33] D. Chicco, M. J. Warrens, and G. Jurman, “The Coefficient of Determination R-Squared Is More Informative Than SMAPE, MAE, MAPE, MSE and RMSE in Regression Analysis Evaluation,” PeerJ Computer Science, vol. 7, p. e623, Jul. 2021.

[34] K. O’Shea and R. Nash, “An Introduction to Convolutional Neural Networks,” arXiv, Tech. Rep. arXiv:1511.08458, Dec. 2015.

[35] J. Donahue, L. A. Hendricks, M. Rohrbach, S. Venugopalan, S. Guadarrama, K. Saenko, and T. Darrell, “Long-Term Recurrent Convolutional Networks for Visual Recognition and Description,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 04, pp. 677–691, Apr. 2017.

[36] Y. Yu, X. Si, C. Hu, and J. Zhang, “A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures,” Neural Computation, vol. 31, no. 7, pp. 1235–1270, Jul. 2019.

[37] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 770–778.

[38] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention Is All You Need,” in Advances in Neural Information Processing Systems (NIPS), vol. 30, Long Beach, CA, USA, 2017, p. 11.

[39] A. Arnab, M. Dehghani, G. Heigold, C. Sun, M. Lučić, and C. Schmid, “ViViT: A Video Vision Transformer,” in IEEE/CVF International Conference on Computer Vision (ICCV), Oct. 2021, pp. 6816–6826.

[40] L. Yuan, Y. Chen, T. Wang, W. Yu, Y. Shi, Z. Jiang, F. E. H. Tay, J. Feng, and S. Yan, “Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet,” in IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 2021, pp. 538–547.

[41] W. Zai El Amri, M. Kuhlmann, and N. Navarro-Guerrero, “Transferring Tactile Data Across Sensors,” in 40th Anniversary of the IEEE Conference on Robotics and Automation (ICRA@40), Rotterdam, The Netherlands, Sep. 2024, pp. 1540–1542.

[42] ——, “ACROSS: A Deformation-Based Cross-Modal Representation for Robotic Tactile Perception,” in IEEE International Conference on Robotics and Automation (ICRA), Atlanta, GA, USA, 2025, pp. 1–8.

[1] [1]

Tactile Sensing in Dexterous Robot Hands – Review,

Z. Kappassov, J.-A. Corrales, and V . Perdereau, “Tactile Sensing in Dexterous Robot Hands – Review,” Robotics and Autonomous Systems , vol. 74, Part A, pp. 195–220, Dec. 2015

work page 2015

[2] [2]

D. R. H. Jones and M. F. Ashby, Engineering Materials 1: An Introduction to Properties, Applications and Design , 5th ed. Oxford, United Kingdom: Butterworth-Heinemann, 2019, vol. 1

work page 2019

[3] [3]

Single- Grasp Object Classification and Feature Extraction with Simple Robot Hands and Tactile Sensors,

A. J. Spiers, M. V . Liarokapis, B. Calli, and A. M. Dollar, “Single- Grasp Object Classification and Feature Extraction with Simple Robot Hands and Tactile Sensors,” IEEE Transactions on Haptics , vol. 9, no. 2, pp. 207–220, 2016

work page 2016

[4] [4]

Evaluating Inte- gration Strategies for Visuo-Haptic Object Recognition,

S. Toprak, N. Navarro-Guerrero, and S. Wermter, “Evaluating Inte- gration Strategies for Visuo-Haptic Object Recognition,” Cognitive Computation, vol. 10, no. 3, pp. 408–425, Jun. 2018

work page 2018

[5] [5]

Tactile Exploration Strategies With Natural Compliant Objects Elicit Virtual Stiffness Cues,

C. Xu, H. He, S. C. Hauser, and G. J. Gerling, “Tactile Exploration Strategies With Natural Compliant Objects Elicit Virtual Stiffness Cues,” IEEE Transactions on Haptics , vol. 13, no. 1, pp. 4–10, Jan. 2020

work page 2020

[6] [6]

Experimental and Computational Analysis of Soft Tissue Stiffness in Forearm Using a Manual Indentation Device,

J. T. Iivarinen, R. K. Korhonen, P. Julkunen, and J. S. Jurvelin, “Experimental and Computational Analysis of Soft Tissue Stiffness in Forearm Using a Manual Indentation Device,” Medical Engineering & Physics, vol. 33, no. 10, pp. 1245–1253, Dec. 2011

work page 2011

[7] [7]

Biomedical Applications of Soft Robotics,

M. Cianchetti, C. Laschi, A. Menciassi, and P. Dario, “Biomedical Applications of Soft Robotics,” Nature Reviews Materials , vol. 3, no. 6, pp. 143–153, Jun. 2018

work page 2018

[8] [8]

Perception of Stiffness in Laparoscopy – the Fulcrum Effect,

I. Nisky, F. Huang, A. Milstein, C. M. Pugh, F. A. Mussa-ivaldi, and A. Karniel, “Perception of Stiffness in Laparoscopy – the Fulcrum Effect,” Studies in health technology and informatics , vol. 173, pp. 313–319, 2012

work page 2012

[9] [9]

Perception and Action in Teleoperated Needle Insertion,

I. Nisky, A. Pressman, C. M. Pugh, F. A. Mussa-Ivaldi, and A. Karniel, “Perception and Action in Teleoperated Needle Insertion,” IEEE Transactions on Haptics , vol. 4, no. 3, pp. 155–166, Jul. 2011

work page 2011

[10] [10]

A Wearable Pneumatic- Piezoelectric System for Quantitative Assessment of Skeletomuscular Biomechanics,

D. Gao, J. P. Lee, J. Chen, L. S. Tay, Y . Xin, K. Parida, M. W. M. Tan, P. Huang, K. H. Kong, and P. S. Lee, “A Wearable Pneumatic- Piezoelectric System for Quantitative Assessment of Skeletomuscular Biomechanics,” Device, vol. 2, no. 3, p. 100288, Mar. 2024

work page 2024

[11] [11]

Effect of Material Hardness on Friction Between a Bare Finger and Dry and Lubricated Artificial Skin,

K. Inoue, S. Okamoto, Y . Akiyama, and Y . Yamada, “Effect of Material Hardness on Friction Between a Bare Finger and Dry and Lubricated Artificial Skin,” IEEE Transactions on Haptics , vol. 13, no. 1, pp. 123–129, Jan. 2020

work page 2020

[12] [12]

An Investigation on the Effects of in Vitro Induced Advanced Glycation End-Products on Cortical Bone Fracture Mechanics at Fall-Related Loading Rates,

M. Britton, E. Parle, and T. J. Vaughan, “An Investigation on the Effects of in Vitro Induced Advanced Glycation End-Products on Cortical Bone Fracture Mechanics at Fall-Related Loading Rates,” Journal of the Mechanical Behavior of Biomedical Materials , vol. 138, p. 105619, Feb. 2023

work page 2023

[13] [13]

Mechanical-Based and Optical-Based Methods for Nondestructive Evaluation of Fruit Firmness,

S. Tian and H. Xu, “Mechanical-Based and Optical-Based Methods for Nondestructive Evaluation of Fruit Firmness,” F ood Reviews International, vol. 39, no. 7, pp. 4009–4039, Aug. 2023

work page 2023

[14] R. Lu and Y. Peng, “Hyperspectral Scattering for Assessing Peach Fruit Firmness,” Biosystems Engineering, vol. 93, no. 2, pp. 161–171, Feb. 2006

[15] W. Yuan, S. Dong, and E. H. Adelson, “GelSight: High-Resolution Robot Tactile Sensors for Estimating Geometry and Force,” Sensors, vol. 17, no. 12, p. 2762, Dec. 2017

[16] N. Navarro-Guerrero, S. Toprak, J. Josifovski, and L. Jamone, “Visuo-Haptic Object Perception for Robots: An Overview,” Autonomous Robots, vol. 47, no. 4, pp. 377–403, Apr. 2023

[17] M. Lippi, M. C. Welle, M. K. Wozniak, A. Gasparri, and D. Kragic, “Low-Cost Teleoperation with Haptic Feedback through Vision-based Tactile Sensors for Rigid and Soft Object Manipulation,” arXiv, Tech. Rep. arXiv:2403.16764, Mar. 2024

[18] W. Yuan, C. Zhu, A. Owens, M. A. Srinivasan, and E. H. Adelson, “Shape-Independent Hardness Estimation Using Deep Learning and a Gelsight Tactile Sensor,” in IEEE International Conference on Robotics and Automation (ICRA). Singapore: IEEE, May 2017, pp. 951–958

[19] M. Burgess, J. Zhao, and L. Willemet, “Learning Object Compliance via Young’s Modulus from Single Grasps using Camera-Based Tactile Sensors,” arXiv, Tech. Rep. arXiv:2406.15304, 2025

[20] M. Kuhlmann, Z. Li, and N. Navarro-Guerrero, “Toward Vision-Based Object Compliance Estimation,” in German Robotics Conference (GRC), ser. 1st, Nuremberg, Germany, Mar. 2025, pp. 1–3

[21] A. N. Gent, “On the Relation between Indentation Hardness and Young’s Modulus,” Rubber Chemistry and Technology, vol. 31, no. 4, pp. 896–906, Sep. 1958

[22] M. Mittal, C. Yu, Q. Yu, J. Liu, N. Rudin, D. Hoeller, J. L. Yuan, R. Singh, Y. Guo, H. Mazhar, A. Mandlekar, B. Babich, G. State, M. Hutter, and A. Garg, “Orbit: A Unified Simulation Framework for Interactive Robot Learning Environments,” IEEE Robotics and Automation Letters, vol. 8, no. 6, pp. 3740–3747, Jun. 2023

[23] R. de Borst, M. A. Crisfield, J. J. C. Remmers, and C. V. Verhoosel, Non-Linear Finite Element Analysis of Solids and Structures, 2nd ed., ser. Wiley Series in Computational Mechanics. Chichester, West Sussex, United Kingdom: Wiley, 2012

[24] I. Huang, Y. Narang, C. Eppner, B. Sundaralingam, M. Macklin, R. Bajcsy, T. Hermans, and D. Fox, “DefGraspSim: Physics-Based Simulation of Grasp Outcomes for 3D Deformable Objects,” IEEE Robotics and Automation Letters, vol. 7, no. 3, pp. 6274–6281, 2022

[25] MatWeb, “Material property data,” 2024. [Online]. Available: https://www.matweb.com

[26] K. Larson, “Can You Estimate Modulus from Durometer Hardness for Silicones? Yes, but Only Roughly . . . and You Must Choose Your Modulus Carefully!” Dow Chemical Company, White Paper, 2017

[27] A. C. Fischer-Cripps, “The Hertzian Contact Surface,” Journal of Materials Science, vol. 34, no. 1, pp. 129–137, Jan. 1999

[28] J. Rychlewski, “On Hooke’s law,” Journal of Applied Mathematics and Mechanics, vol. 48, no. 3, pp. 303–314, Jan. 1984

[29] E. Dintwa, E. Tijskens, and H. Ramon, “On the Accuracy of the Hertz Model to Describe the Normal Contact of Soft Elastic Spheres,” Granular Matter, vol. 10, no. 3, pp. 209–221, Mar. 2008

[30] D. E. Shier, “Well Log Normalization: Methods and Guidelines,” Petrophysics - The SPWLA Journal, vol. 45, no. 03, May 2004

[31] L. Fu, G. Datta, H. Huang, W. C.-H. Panitch, J. Drake, J. Ortiz, M. Mukadam, M. Lambeta, R. Calandra, and K. Goldberg, “A Touch, Vision, and Language Dataset for Multimodal Alignment,” in International Conference on Machine Learning (ICML), ser. ICML’24, vol. 235. Vienna, Austria: JMLR.org, Jul. 2024, pp. 14080–14101

[32] A. Parag, E. H. Adelson, and E. Misimi, “Learning Incipient Slip with Gelsight Sensors: Attention Classification with Video Vision Transformers,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2024, pp. 13960–13966

[33] D. Chicco, M. J. Warrens, and G. Jurman, “The Coefficient of Determination R-Squared Is More Informative Than SMAPE, MAE, MAPE, MSE and RMSE in Regression Analysis Evaluation,” PeerJ Computer Science, vol. 7, p. e623, Jul. 2021

[34] K. O’Shea and R. Nash, “An Introduction to Convolutional Neural Networks,” arXiv, Tech. Rep. arXiv:1511.08458, Dec. 2015

[35] J. Donahue, L. A. Hendricks, M. Rohrbach, S. Venugopalan, S. Guadarrama, K. Saenko, and T. Darrell, “Long-Term Recurrent Convolutional Networks for Visual Recognition and Description,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 04, pp. 677–691, Apr. 2017

[36] Y. Yu, X. Si, C. Hu, and J. Zhang, “A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures,” Neural Computation, vol. 31, no. 7, pp. 1235–1270, Jul. 2019

[37] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 770–778

[38] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention Is All You Need,” in Advances in Neural Information Processing Systems (NIPS), vol. 30, Long Beach, CA, USA, 2017, p. 11

[39] A. Arnab, M. Dehghani, G. Heigold, C. Sun, M. Lučić, and C. Schmid, “ViViT: A Video Vision Transformer,” in IEEE/CVF International Conference on Computer Vision (ICCV), Oct. 2021, pp. 6816–6826

[40] L. Yuan, Y. Chen, T. Wang, W. Yu, Y. Shi, Z. Jiang, F. E. H. Tay, J. Feng, and S. Yan, “Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet,” in IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 2021, pp. 538–547

[41] W. Zai El Amri, M. Kuhlmann, and N. Navarro-Guerrero, “Transferring Tactile Data Across Sensors,” in 40th Anniversary of the IEEE Conference on Robotics and Automation (ICRA@40), Rotterdam, The Netherlands, Sep. 2024, pp. 1540–1542

[42] ——, “ACROSS: A Deformation-Based Cross-Modal Representation for Robotic Tactile Perception,” in IEEE International Conference on Robotics and Automation (ICRA), Atlanta, GA, USA, 2025, pp. 1–8