TacSE3: Equivariant SE(3) Motion Estimation from Low-Texture Visuotactile Images for In-Gripper Tracking and Compensation

Fei Meng; Haobo Liang; Jun Ma; Junzhe Wang; Michael Yu Wang; Qingyang Liu; Yi Cai; Zhenmin Huang; Zhongyuan Liao

arxiv: 2605.17929 · v1 · pith:WFEKLE64new · submitted 2026-05-18 · 💻 cs.RO

TacSE3: Equivariant SE(3) Motion Estimation from Low-Texture Visuotactile Images for In-Gripper Tracking and Compensation

Zhongyuan Liao , Junzhe Wang , Qingyang Liu , Zhenmin Huang , Jun Ma , Yi Cai , Fei Meng , Haobo Liang

show 1 more author

Michael Yu Wang

This is my paper

Pith reviewed 2026-05-20 10:30 UTC · model grok-4.3

classification 💻 cs.RO

keywords visuotactile sensingSE(3) motion estimationin-gripper trackingtactile force fieldrobotic in-hand manipulationcontact centroidshear responsedisturbance compensation

0 comments

The pith

TacSE3 converts low-texture visuotactile images into a decoupled 3D force field to estimate incremental SE(3) rigid-body motion for in-gripper tracking and compensation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Robotic in-hand manipulation often loses visual access to objects inside the gripper. Low-texture visuotactile images supply few reliable features for standard matching methods. TacSE3 turns these images into a decoupled three-dimensional force field. Planar translation is read from contact-centroid motion while rotation comes mainly from shear responses in the tactile data. Dual sensors cut down translation-rotation confusion and supply a usable compensation signal that improves disturbance handling without retraining the base policy.

Core claim

TacSE3 is a tactile motion-estimation pipeline that converts low-texture visuotactile observations into a decoupled three-dimensional force field and estimates incremental rigid-body motion on SE(3). The method derives planar translation from contact-centroid motion and estimates rotation primarily from shear-related tactile responses, yielding a physically interpretable signal for in-gripper tracking and compensation. Experiments with paired DM-Tac fingertip sensors show that dual-sensor sensing reduces translation-rotation ambiguity and supports rotation tracking across axes and object geometries.

What carries the argument

The decoupled three-dimensional force field derived from paired visuotactile images, which separates planar translation (via contact-centroid motion) from rotation (via shear-related responses) to produce incremental SE(3) estimates.

Load-bearing premise

Low-texture visuotactile observations can be reliably converted into a decoupled three-dimensional force field from which incremental rigid-body motion on SE(3) can be estimated without significant ambiguity or sensor-specific calibration issues that would invalidate the tracking for varied object geometries.

What would settle it

Ground-truth comparison showing large discrepancies between estimated and actual object trajectories when using single sensors or when testing objects with substantially different contact geometries and textures.

Figures

Figures reproduced from arXiv: 2605.17929 by Fei Meng, Haobo Liang, Jun Ma, Junzhe Wang, Michael Yu Wang, Qingyang Liu, Yi Cai, Zhenmin Huang, Zhongyuan Liao.

**Figure 2.** Figure 2: Decoupled tangential and normal responses derived from tactile [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 4.** Figure 4: Pose tracking on the SE(3) manifold. Small twists ξt ∈ se(3) are mapped to SE(3) via the exponential map and sequentially integrated to form a continuous pose trajectory T0 → T1 → T2 → T3. is required, the integrated tactile pose is further aligned to the ground-truth pose frame through a calibration mapping on SE(3), so that the estimated local contact motion can be consistently compared with the real obj… view at source ↗

**Figure 5.** Figure 5: Refined tactile-geometric adjustment in a residual control framework. [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Representative video screenshots of the grasped object rotating about the three principal axes, [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗

**Figure 7.** Figure 7: Comparison between single-sensor and dual-sensor configurations. Each subfigure shows the tactile image of the decomposed three-dimensional [PITH_FULL_IMAGE:figures/full_fig_p010_7.png] view at source ↗

**Figure 8.** Figure 8: Representative objects used in the multi-object evaluation. Most [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗

**Figure 9.** Figure 9: Representative screenshots of the rotation process for the eight objects in the multi-object evaluation. The visualization interface shows the tracked [PITH_FULL_IMAGE:figures/full_fig_p012_9.png] view at source ↗

**Figure 10.** Figure 10: Experimental setup for the policy-level disturbance-recovery study. [PITH_FULL_IMAGE:figures/full_fig_p012_10.png] view at source ↗

**Figure 11.** Figure 11: Experimental process for the three policy-level tasks: Drawing, Gear Insertion, and Peg-in-Hole. Each task is illustrated as a sequence of four stages: [PITH_FULL_IMAGE:figures/full_fig_p013_11.png] view at source ↗

read the original abstract

Robotic in-hand manipulation requires reliable object-motion tracking under frequent visual occlusion, yet low-texture visuotactile images provide few stable correspondences for conventional image- or geometry-matching methods. This paper presents TacSE3, a tactile motion-estimation pipeline that converts low-texture visuotactile observations into a decoupled three-dimensional force field and estimates incremental rigid-body motion on SE(3). The method derives planar translation from contact-centroid motion and estimates rotation primarily from shear-related tactile responses, yielding a physically interpretable signal for in-gripper tracking and compensation. Experiments with paired DM-Tac fingertip sensors show that dual-sensor sensing reduces translation-rotation ambiguity, supports rotation tracking across axes and object geometries, and provides a lightweight compensation signal that improves disturbance tolerance in downstream manipulation tasks without retraining the base policy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

TacSE3 shows a workable split of translation from contact centroids and rotation from shear in visuotactile data for in-gripper SE(3) tracking, but the decoupling may not stay clean on irregular contacts.

read the letter

The core of this paper is a pipeline that maps low-texture visuotactile images to a decoupled 3D force field, then pulls planar translation from centroid motion and rotation from shear responses inside an equivariant SE(3) setup. The experiments with paired DM-Tac sensors indicate that the dual view cuts translation-rotation confusion and supplies a compensation signal that helps downstream manipulation hold up better under disturbance, all without retraining the base policy. That combination is the main practical step forward for occlusion-heavy in-hand tasks.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces TacSE3, a tactile motion-estimation pipeline that maps low-texture visuotactile images from paired DM-Tac fingertip sensors to a decoupled three-dimensional force field. Planar translation is derived from contact-centroid motion while rotation is estimated primarily from shear-related responses, enabling incremental SE(3) rigid-body tracking and compensation for in-gripper manipulation under visual occlusion. Experiments claim that dual-sensor sensing reduces translation-rotation ambiguity and supports tracking across axes and object geometries without retraining base policies.

Significance. If the decoupling and physical interpretability hold, the work provides a lightweight, sensor-driven alternative to geometry- or texture-matching methods for occluded in-hand tracking. The emphasis on deriving motion from centroid and shear signals without heavy learning components could aid robustness in manipulation, though the absence of detailed quantitative validation limits evaluation of its practical advantage over existing visuotactile approaches.

major comments (3)

[Method / central derivation] The central claim that low-texture visuotactile observations can be converted into a decoupled 3D force field (from which SE(3) increments are estimated without significant ambiguity) is load-bearing but unsupported by any equations, sensor model details, or derivation steps in the provided description. This makes it impossible to verify independence of translation and rotation components for non-convex geometries or partial-slip cases.
[Experiments] The abstract asserts that experiments with paired DM-Tac sensors show reduced ambiguity, rotation tracking across axes/geometries, and improved disturbance tolerance, yet no quantitative results, error metrics, data exclusion criteria, or baseline comparisons are supplied. This undermines substantiation of the cross-geometry and dual-sensor claims.
[Method / force-field construction] The decoupling premise—that centroid motion isolates planar translation while shear isolates rotation—requires explicit validation against coupling that may arise for irregular contact patches; without this, the SE(3) increment assumption risks violation for varied object shapes.

minor comments (2)

The title references 'Equivariant SE(3)' but the abstract does not specify how equivariance is implemented or enforced in the pipeline; adding a brief statement on this would clarify the contribution relative to standard rigid-motion estimation.
Notation for the force-field components and contact centroid should be defined consistently at first use to aid readability for readers unfamiliar with DM-Tac sensor outputs.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each of the major comments below and outline the revisions we plan to make to strengthen the presentation of the method and experiments.

read point-by-point responses

Referee: [Method / central derivation] The central claim that low-texture visuotactile observations can be converted into a decoupled 3D force field (from which SE(3) increments are estimated without significant ambiguity) is load-bearing but unsupported by any equations, sensor model details, or derivation steps in the provided description. This makes it impossible to verify independence of translation and rotation components for non-convex geometries or partial-slip cases.

Authors: We appreciate this point and agree that the derivation should be more explicit to allow verification. The full manuscript includes a sensor model in Section III and the force field construction in Section IV, where planar translation is derived from the shift in contact centroid and rotation from integrated shear responses. However, to address the concern, we will expand the method section with detailed equations for the 3D force field mapping and the SE(3) pose increment computation. We will also add a discussion on the assumptions of decoupling, including potential issues with non-convex geometries and partial slip, and how the dual-sensor setup mitigates ambiguity. revision: yes
Referee: [Experiments] The abstract asserts that experiments with paired DM-Tac sensors show reduced ambiguity, rotation tracking across axes/geometries, and improved disturbance tolerance, yet no quantitative results, error metrics, data exclusion criteria, or baseline comparisons are supplied. This undermines substantiation of the cross-geometry and dual-sensor claims.

Authors: The experiments section of the manuscript does include quantitative evaluations, such as mean translation and rotation errors across different objects and axes, as well as comparisons to single-sensor and vision-based baselines. Data collection involved multiple trials with criteria for excluding failed contacts. To better highlight these results and address the comment, we will add a summary table of key metrics, explicitly state the data exclusion criteria, and include additional baseline comparisons in the revised manuscript. revision: yes
Referee: [Method / force-field construction] The decoupling premise—that centroid motion isolates planar translation while shear isolates rotation—requires explicit validation against coupling that may arise for irregular contact patches; without this, the SE(3) increment assumption risks violation for varied object shapes.

Authors: This is a valid concern. While our experiments test the method on objects with varying geometries to show robustness, we did not provide a dedicated analysis of coupling effects for irregular patches. In the revision, we will include additional experiments or simulations validating the decoupling for irregular contact patches and discuss cases where the assumption may be violated, such as in partial slip scenarios. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on independent physical contact models

full rationale

The paper derives planar translation from contact-centroid motion and rotation from shear-related tactile responses within a visuotactile-to-decoupled-3D-force-field pipeline. This chain is presented as grounded in sensor physics and dual DM-Tac fingertip observations rather than any self-definitional loop, fitted-parameter renaming, or load-bearing self-citation. The abstract and description contain no equations that reduce the SE(3) increment output to the input observations by construction; the decoupling assumption is an external modeling choice subject to experimental validation, not an internal tautology. The central claim therefore remains self-contained and does not trigger any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the central claim rests on the domain assumption that visuotactile images yield a usable decoupled 3D force field. No free parameters, invented entities, or additional axioms are explicitly stated or quantifiable from the given text.

axioms (1)

domain assumption Low-texture visuotactile observations can be converted into a decoupled three-dimensional force field suitable for SE(3) motion estimation
This conversion is presented as the starting point for deriving translation from centroids and rotation from shear responses.

pith-pipeline@v0.9.0 · 5704 in / 1513 out tokens · 45371 ms · 2026-05-20T10:30:56.171198+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

converts low-texture visuotactile observations into a decoupled three-dimensional force field and estimates incremental rigid-body motion on SE(3). The method derives planar translation from contact-centroid motion and estimates rotation primarily from shear-related tactile responses

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · 1 internal anchor

[1]

Hand movements: A window into haptic object recognition,

S. J. Lederman and R. L. Klatzky, “Hand movements: A window into haptic object recognition,”Cognitive Psychology, vol. 19, no. 3, pp. 342–368, 1987

work page 1987
[2]

Gelsight: High-resolution robot tactile sensors for estimating geometry and force,

W. Yuan, S. Dong, and E. H. Adelson, “Gelsight: High-resolution robot tactile sensors for estimating geometry and force,”Sensors, vol. 17, no. 12, p. 2762, 2017

work page 2017
[3]

Deltact: A vision-based tactile sensor using a dense color pattern,

G. Zhang, Y . Du, H. Yu, and M. Y . Wang, “Deltact: A vision-based tactile sensor using a dense color pattern,”IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 10 778–10 785, 2022. 14

work page 2022
[4]

Digit: A novel design for a low-cost compact high- resolution tactile sensor with application to in-hand manipulation,

M. Lambeta, P.-W. Chou, S. Tian, B. Yang, B. Maloon, V . R. Most, D. Stroud, R. Santos, A. Byagowi, G. Kammerer, D. Jayaraman, and R. Calandra, “Digit: A novel design for a low-cost compact high- resolution tactile sensor with application to in-hand manipulation,”IEEE Robotics and Automation Letters, vol. 5, no. 3, pp. 3838–3845, 2020

work page 2020
[5]

In-hand object pose estimation using covariance-based tactile to geometry matching,

J. Bimbo, S. Luo, K. Althoefer, and H. Liu, “In-hand object pose estimation using covariance-based tactile to geometry matching,”IEEE Robotics and Automation Letters, vol. 1, no. 1, pp. 570–577, 2016

work page 2016
[6]

Normalflow: Fast, robust, and accurate contact-based object 6dof pose tracking with vision-based tactile sensors,

H.-J. Huang, M. Kaess, and W. Yuan, “Normalflow: Fast, robust, and accurate contact-based object 6dof pose tracking with vision-based tactile sensors,”IEEE Robotics and Automation Letters, 2025

work page 2025
[7]

Autonomous robotic la- paroscopic surgery for intestinal anastomosis.Science Robotics, 7(62):eabj2908, 2022

S. Suresh, H. Qi, T. Wu, T. Fan, L. Pineda, M. Lambeta, J. Malik, M. Kalakrishnan, R. Calandra, M. Kaess, J. Ortiz, and M. Mukadam, “Neuralfeels with neural fields: Visuotactile perception for in-hand manipulation,”Science Robotics, vol. 9, no. 96, p. eadl0628, 2024. [Online]. Available: https://www.science.org/doi/abs/10.1126/scirobotics. adl0628

work page doi:10.1126/scirobotics 2024
[8]

V-hop: Visuo-haptic 6d object pose tracking,

H. Li, M. Jia, M. T. Akbulut, Y . Xiang, G. Konidaris, and S. Sridhar, “V-hop: Visuo-haptic 6d object pose tracking,” inProceedings of Robotics: Science and Systems, Los Angeles, CA, USA, June 2025

work page 2025
[9]

Patchgraph: In- hand tactile tracking with learned surface normals,

P. Sodhi, M. Kaess, M. Mukadanr, and S. Anderson, “Patchgraph: In- hand tactile tracking with learned surface normals,” in2022 International Conference on Robotics and Automation (ICRA), 2022, pp. 2164–2170

work page 2022
[10]

3D Shape Perception from Monocular Vision, Touch, and Shape Priors,

S. Wang, J. Wu, X. Sun, W. Yuan, W. T. Freeman, J. B. Tenenbaum, and E. H. Adelson, “3D Shape Perception from Monocular Vision, Touch, and Shape Priors,” inIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, pp. 1606–1613

work page 2018
[11]

Kinectfusion: Real-time dense surface mapping and tracking,

R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohi, J. Shotton, S. Hodges, and A. Fitzgibbon, “Kinectfusion: Real-time dense surface mapping and tracking,” in2011 10th IEEE International Symposium on Mixed and Augmented Reality, 2011, pp. 127–136

work page 2011
[12]

Neuralangelo: High-fidelity neural surface reconstruction,

Z. Li, T. M ¨uller, A. Evans, R. H. Taylor, M. Unberath, M.-Y . Liu, and C.-H. Lin, “Neuralangelo: High-fidelity neural surface reconstruction,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 8456–8465

work page 2023
[13]

Classification of vision-based tactile sensors: A review,

H. Li, Y . Lin, C. Lu, M. Yang, E. Psomopoulou, and N. F. Lepora, “Classification of vision-based tactile sensors: A review,”IEEE Sensors Journal, 2025

work page 2025
[14]

A survey of vision-based tactile sensors: Hardware, algorithm, application and future direction,

K. He, “A survey of vision-based tactile sensors: Hardware, algorithm, application and future direction,”IEEE Transactions on Instrumentation and Measurement, 2025

work page 2025
[15]

End-to-end pixelwise surface normal estimation with convolutional neural networks and shape reconstruction using gelsight sensor,

J. Li, S. Dong, and E. H. Adelson, “End-to-end pixelwise surface normal estimation with convolutional neural networks and shape reconstruction using gelsight sensor,” in2018 IEEE International Conference on Robotics and Biomimetics (ROBIO). IEEE, 2018, pp. 1292–1297

work page 2018
[16]

Tac2pose: Tactile object pose estimation from the first touch,

M. Bauza, A. Bronars, and A. Rodriguez, “Tac2pose: Tactile object pose estimation from the first touch,”The International Journal of Robotics Research, vol. 42, no. 13, pp. 1185–1209, 2023

work page 2023
[17]

Visuotactile 6d pose estimation of an in-hand object using vision and tactile sensor data,

S. Dikhale, K. Patel, D. Dhingra, I. Naramura, A. Hayashi, S. Iba, and N. Jamali, “Visuotactile 6d pose estimation of an in-hand object using vision and tactile sensor data,”IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 2148–2155, 2022

work page 2022
[18]

Hanging a t-shirt: A step towards deformable peg-in-hole manipulation with multimodal tactile feedback,

Y . Du, S. Aslam, M. Y . Wang, and B. E. Shi, “Hanging a t-shirt: A step towards deformable peg-in-hole manipulation with multimodal tactile feedback,” in2024 IEEE International Conference on Robotics and Biomimetics (ROBIO). IEEE, 2024, pp. 2074–2081

work page 2024
[19]

Quantitative hardness assessment with vision-based tactile sensing for fruit classification and grasping,

Z. Liao, Y . Du, J. Duan, H. Liang, and M. Y . Wang, “Quantitative hardness assessment with vision-based tactile sensing for fruit classification and grasping,”arXiv preprint arXiv:2505.05725, 2025

work page arXiv 2025
[20]

MidasTouch: Monte-Carlo inference over distributions across sliding touch,

S. Suresh, Z. Si, S. Anderson, M. Kaess, and M. Mukadam, “MidasTouch: Monte-Carlo inference over distributions across sliding touch,” in Proceedings of The 6th Conference on Robot Learning, Auckland, NZ, Dec. 2022

work page 2022
[21]

Gelsight wedge: Measuring high-resolution 3d contact geometry with a compact robot finger,

S. Wang, Y . She, B. Romero, and E. H. Adelson, “Gelsight wedge: Measuring high-resolution 3d contact geometry with a compact robot finger,” in2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021

work page 2021
[22]

Object modeling by registration of multiple range images,

Y . Chen and G. Medioni, “Object modeling by registration of multiple range images,” inProceedings. 1991 IEEE International Conference on Robotics and Automation, 1991, pp. 2724–2729 vol.3

work page 1991
[23]

Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds

N. Thomas, T. Smidt, S. Kearnes, L. Yang, L. Li, K. Kohlhoff, and P. Riley, “Tensor field networks: Rotation-and translation-equivariant neural networks for 3d point clouds,”arXiv preprint arXiv:1802.08219, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[24]

Diffusion-edfs: Bi-equivariant denoising generative modeling on se (3) for visual robotic manipulation,

H. Ryu, J. Kim, H. An, J. Chang, J. Seo, T. Kim, Y . Kim, C. Hwang, J. Choi, and R. Horowitz, “Diffusion-edfs: Bi-equivariant denoising generative modeling on se (3) for visual robotic manipulation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 18 007–18 018

work page 2024
[25]

Raven: End-to-end equivariant robot learning with rgb cameras,

D. Klee, B. Hu, A. Cole, H. Tian, D. Wang, R. Platt, and R. Walters, “Raven: End-to-end equivariant robot learning with rgb cameras,” inThe Fourteenth International Conference on Learning Representations

work page
[26]

Equicontact: A hierarchical se(3) vision-to-force equivariant policy for spatially generalizable contact-rich tasks,

J. Seo, A. Kruthiventy, S. Lee, M. Teng, S. Choi, J. Choi, and R. Horowitz, “Equicontact: A hierarchical se(3) vision-to-force equivariant policy for spatially generalizable contact-rich tasks,”arXiv:2507.10961, 2025

work page arXiv 2025
[27]

Equact: An se (3)- equivariant multi-task transformer for 3d robotic manipulation,

X. Zhu, Y . Qi, Y . Zhu, R. Walters, and R. Platt, “Equact: An se (3)- equivariant multi-task transformer for 3d robotic manipulation,” inThe Fourteenth International Conference on Learning Representations

work page
[28]

Residual rotation correction using tactile equivariance,

Y . Zhu, Z. Ye, B. Hu, H. Zhao, Y . Qi, D. Wang, and R. Platt, “Residual rotation correction using tactile equivariance,”arXiv:2511.07381, 2025

work page arXiv 2025
[29]

Riemann: Near real-time se (3)-equivariant robot manipulation without point cloud segmentation,

C. Gao, Z. Xue, S. Deng, T. Liang, S. Yang, L. Shao, and H. Xu, “Riemann: Near real-time se (3)-equivariant robot manipulation without point cloud segmentation,” in8th Annual Conference on Robot Learning

work page
[30]

Simshear: Sim-to-real shear- based tactile servoing,

K. Freud, Y . Lin, and N. F. Lepora, “Simshear: Sim-to-real shear- based tactile servoing,” inProceedings of The 9th Conference on Robot Learning, vol. 305, 2025, pp. 3401–3412

work page 2025
[31]

3d-vitac: Learning fine-grained manipulation with visuo-tactile sensing,

B. Huang, Y . Wang, X. Yang, Y . Luo, and Y . Li, “3d-vitac: Learning fine-grained manipulation with visuo-tactile sensing,” inProceedings of The 8th Conference on Robot Learning, ser. Proceedings of Machine Learning Research, vol. 270, 2025, pp. 2557–2578

work page 2025
[32]

Mimictouch: Leveraging multi-modal human tactile demonstrations for contact-rich manipulation,

K. Yu, Y . Han, Q. Wang, V . Saxena, D. Xu, and Y . Zhao, “Mimictouch: Leveraging multi-modal human tactile demonstrations for contact-rich manipulation,” inProceedings of The 8th Conference on Robot Learning, ser. Proceedings of Machine Learning Research, vol. 270, 2025, pp. 4844–4865

work page 2025
[33]

Text2touch: Tactile in-hand manipulation with llm-designed reward functions,

H. Field, M. Yang, Y . Lin, E. Psomopoulou, D. A. Barton, and N. F. Lepora, “Text2touch: Tactile in-hand manipulation with llm-designed reward functions,” inProceedings of The 9th Conference on Robot Learning, ser. Proceedings of Machine Learning Research, vol. 305, 2025, pp. 2847–2887

work page 2025
[34]

Anyrotate: Gravity-invariant in-hand object rotation with sim-to-real touch,

M. Yang, C. Lu, A. Church, Y . Lin, C. J. Ford, H. Li, E. Psomopoulou, D. A. Barton, and N. F. Lepora, “Anyrotate: Gravity-invariant in-hand object rotation with sim-to-real touch,” inProceedings of The 8th Conference on Robot Learning, ser. Proceedings of Machine Learning Research, vol. 270, 2025, pp. 4727–4747

work page 2025
[35]

Learning visuotactile estimation and control for non-prehensile manipulation under occlusions,

J. Del Aguila Ferrandis, J. Moura, and S. Vijayakumar, “Learning visuotactile estimation and control for non-prehensile manipulation under occlusions,” inProceedings of The 8th Conference on Robot Learning, ser. Proceedings of Machine Learning Research, vol. 270, 2025, pp. 1501–1515

work page 2025
[36]

Tacumi: A multi-modal universal manipulation interface for contact-rich tasks,

T. Cheng, K. Chen, L. Chen, L. Zhang, Y . Zhang, Y . Ling, M. Hamad, Z. Bing, F. Wu, K. Sharmaet al., “Tacumi: A multi-modal uni- versal manipulation interface for contact-rich tasks,”arXiv preprint arXiv:2601.14550, 2026

work page arXiv 2026
[37]

exumi: Extensible robot teaching system with action-aware task-agnostic tactile representation,

Y . Xu, L. Wei, P. An, Q. Zhang, and Y .-L. Li, “exumi: Extensible robot teaching system with action-aware task-agnostic tactile representation,” in Proceedings of The 9th Conference on Robot Learning, ser. Proceedings of Machine Learning Research, vol. 305, 2025, pp. 2536–2554

work page 2025
[38]

Kinedex: Learning tactile-informed visuomotor policies via kinesthetic teaching for dexterous manipulation,

D. Zhang, C. Yuan, C. Wen, H. Zhang, J. Zhao, and Y . Gao, “Kinedex: Learning tactile-informed visuomotor policies via kinesthetic teaching for dexterous manipulation,” inProceedings of The 9th Conference on Robot Learning, ser. Proceedings of Machine Learning Research, vol. 305, 2025, pp. 4123–4138

work page 2025
[39]

Tactile beyond pixels: Multisensory touch representations for robot manipulation,

C. Higuera, A. Sharma, T. Fan, C. K. Bodduluri, B. Boots, M. Kaess, M. Lambeta, T. Wu, Z. Liu, F. R. Hogan, and M. Mukadam, “Tactile beyond pixels: Multisensory touch representations for robot manipulation,” inProceedings of The 9th Conference on Robot Learning, ser. Proceedings of Machine Learning Research, vol. 305, 2025, pp. 105–123

work page 2025
[40]

Dexskin: High-coverage conformable robotic skin for learning contact-rich manipulation,

S. Wistreich, B. Shi, S. Tian, S. Clarke, M. Nath, C. Xu, Z. Bao, and J. Wu, “Dexskin: High-coverage conformable robotic skin for learning contact-rich manipulation,” inProceedings of The 9th Conference on Robot Learning, ser. Proceedings of Machine Learning Research, vol. 305, 2025, pp. 769–793

work page 2025
[41]

3d contact point cloud reconstruction from vision-based tactile flow,

Y . Du, G. Zhang, and M. Y . Wang, “3d contact point cloud reconstruction from vision-based tactile flow,”IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 12 177–12 184, 2022

work page 2022

[1] [1]

Hand movements: A window into haptic object recognition,

S. J. Lederman and R. L. Klatzky, “Hand movements: A window into haptic object recognition,”Cognitive Psychology, vol. 19, no. 3, pp. 342–368, 1987

work page 1987

[2] [2]

Gelsight: High-resolution robot tactile sensors for estimating geometry and force,

W. Yuan, S. Dong, and E. H. Adelson, “Gelsight: High-resolution robot tactile sensors for estimating geometry and force,”Sensors, vol. 17, no. 12, p. 2762, 2017

work page 2017

[3] [3]

Deltact: A vision-based tactile sensor using a dense color pattern,

G. Zhang, Y . Du, H. Yu, and M. Y . Wang, “Deltact: A vision-based tactile sensor using a dense color pattern,”IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 10 778–10 785, 2022. 14

work page 2022

[4] [4]

Digit: A novel design for a low-cost compact high- resolution tactile sensor with application to in-hand manipulation,

M. Lambeta, P.-W. Chou, S. Tian, B. Yang, B. Maloon, V . R. Most, D. Stroud, R. Santos, A. Byagowi, G. Kammerer, D. Jayaraman, and R. Calandra, “Digit: A novel design for a low-cost compact high- resolution tactile sensor with application to in-hand manipulation,”IEEE Robotics and Automation Letters, vol. 5, no. 3, pp. 3838–3845, 2020

work page 2020

[5] [5]

In-hand object pose estimation using covariance-based tactile to geometry matching,

J. Bimbo, S. Luo, K. Althoefer, and H. Liu, “In-hand object pose estimation using covariance-based tactile to geometry matching,”IEEE Robotics and Automation Letters, vol. 1, no. 1, pp. 570–577, 2016

work page 2016

[6] [6]

Normalflow: Fast, robust, and accurate contact-based object 6dof pose tracking with vision-based tactile sensors,

H.-J. Huang, M. Kaess, and W. Yuan, “Normalflow: Fast, robust, and accurate contact-based object 6dof pose tracking with vision-based tactile sensors,”IEEE Robotics and Automation Letters, 2025

work page 2025

[7] [7]

Autonomous robotic la- paroscopic surgery for intestinal anastomosis.Science Robotics, 7(62):eabj2908, 2022

S. Suresh, H. Qi, T. Wu, T. Fan, L. Pineda, M. Lambeta, J. Malik, M. Kalakrishnan, R. Calandra, M. Kaess, J. Ortiz, and M. Mukadam, “Neuralfeels with neural fields: Visuotactile perception for in-hand manipulation,”Science Robotics, vol. 9, no. 96, p. eadl0628, 2024. [Online]. Available: https://www.science.org/doi/abs/10.1126/scirobotics. adl0628

work page doi:10.1126/scirobotics 2024

[8] [8]

V-hop: Visuo-haptic 6d object pose tracking,

H. Li, M. Jia, M. T. Akbulut, Y . Xiang, G. Konidaris, and S. Sridhar, “V-hop: Visuo-haptic 6d object pose tracking,” inProceedings of Robotics: Science and Systems, Los Angeles, CA, USA, June 2025

work page 2025

[9] [9]

Patchgraph: In- hand tactile tracking with learned surface normals,

P. Sodhi, M. Kaess, M. Mukadanr, and S. Anderson, “Patchgraph: In- hand tactile tracking with learned surface normals,” in2022 International Conference on Robotics and Automation (ICRA), 2022, pp. 2164–2170

work page 2022

[10] [10]

3D Shape Perception from Monocular Vision, Touch, and Shape Priors,

S. Wang, J. Wu, X. Sun, W. Yuan, W. T. Freeman, J. B. Tenenbaum, and E. H. Adelson, “3D Shape Perception from Monocular Vision, Touch, and Shape Priors,” inIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, pp. 1606–1613

work page 2018

[11] [11]

Kinectfusion: Real-time dense surface mapping and tracking,

R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohi, J. Shotton, S. Hodges, and A. Fitzgibbon, “Kinectfusion: Real-time dense surface mapping and tracking,” in2011 10th IEEE International Symposium on Mixed and Augmented Reality, 2011, pp. 127–136

work page 2011

[12] [12]

Neuralangelo: High-fidelity neural surface reconstruction,

Z. Li, T. M ¨uller, A. Evans, R. H. Taylor, M. Unberath, M.-Y . Liu, and C.-H. Lin, “Neuralangelo: High-fidelity neural surface reconstruction,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 8456–8465

work page 2023

[13] [13]

Classification of vision-based tactile sensors: A review,

H. Li, Y . Lin, C. Lu, M. Yang, E. Psomopoulou, and N. F. Lepora, “Classification of vision-based tactile sensors: A review,”IEEE Sensors Journal, 2025

work page 2025

[14] [14]

A survey of vision-based tactile sensors: Hardware, algorithm, application and future direction,

K. He, “A survey of vision-based tactile sensors: Hardware, algorithm, application and future direction,”IEEE Transactions on Instrumentation and Measurement, 2025

work page 2025

[15] [15]

End-to-end pixelwise surface normal estimation with convolutional neural networks and shape reconstruction using gelsight sensor,

J. Li, S. Dong, and E. H. Adelson, “End-to-end pixelwise surface normal estimation with convolutional neural networks and shape reconstruction using gelsight sensor,” in2018 IEEE International Conference on Robotics and Biomimetics (ROBIO). IEEE, 2018, pp. 1292–1297

work page 2018

[16] [16]

Tac2pose: Tactile object pose estimation from the first touch,

M. Bauza, A. Bronars, and A. Rodriguez, “Tac2pose: Tactile object pose estimation from the first touch,”The International Journal of Robotics Research, vol. 42, no. 13, pp. 1185–1209, 2023

work page 2023

[17] [17]

Visuotactile 6d pose estimation of an in-hand object using vision and tactile sensor data,

S. Dikhale, K. Patel, D. Dhingra, I. Naramura, A. Hayashi, S. Iba, and N. Jamali, “Visuotactile 6d pose estimation of an in-hand object using vision and tactile sensor data,”IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 2148–2155, 2022

work page 2022

[18] [18]

Hanging a t-shirt: A step towards deformable peg-in-hole manipulation with multimodal tactile feedback,

Y . Du, S. Aslam, M. Y . Wang, and B. E. Shi, “Hanging a t-shirt: A step towards deformable peg-in-hole manipulation with multimodal tactile feedback,” in2024 IEEE International Conference on Robotics and Biomimetics (ROBIO). IEEE, 2024, pp. 2074–2081

work page 2024

[19] [19]

Quantitative hardness assessment with vision-based tactile sensing for fruit classification and grasping,

Z. Liao, Y . Du, J. Duan, H. Liang, and M. Y . Wang, “Quantitative hardness assessment with vision-based tactile sensing for fruit classification and grasping,”arXiv preprint arXiv:2505.05725, 2025

work page arXiv 2025

[20] [20]

MidasTouch: Monte-Carlo inference over distributions across sliding touch,

S. Suresh, Z. Si, S. Anderson, M. Kaess, and M. Mukadam, “MidasTouch: Monte-Carlo inference over distributions across sliding touch,” in Proceedings of The 6th Conference on Robot Learning, Auckland, NZ, Dec. 2022

work page 2022

[21] [21]

Gelsight wedge: Measuring high-resolution 3d contact geometry with a compact robot finger,

S. Wang, Y . She, B. Romero, and E. H. Adelson, “Gelsight wedge: Measuring high-resolution 3d contact geometry with a compact robot finger,” in2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021

work page 2021

[22] [22]

Object modeling by registration of multiple range images,

Y . Chen and G. Medioni, “Object modeling by registration of multiple range images,” inProceedings. 1991 IEEE International Conference on Robotics and Automation, 1991, pp. 2724–2729 vol.3

work page 1991

[23] [23]

Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds

N. Thomas, T. Smidt, S. Kearnes, L. Yang, L. Li, K. Kohlhoff, and P. Riley, “Tensor field networks: Rotation-and translation-equivariant neural networks for 3d point clouds,”arXiv preprint arXiv:1802.08219, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[24] [24]

Diffusion-edfs: Bi-equivariant denoising generative modeling on se (3) for visual robotic manipulation,

H. Ryu, J. Kim, H. An, J. Chang, J. Seo, T. Kim, Y . Kim, C. Hwang, J. Choi, and R. Horowitz, “Diffusion-edfs: Bi-equivariant denoising generative modeling on se (3) for visual robotic manipulation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 18 007–18 018

work page 2024

[25] [25]

Raven: End-to-end equivariant robot learning with rgb cameras,

D. Klee, B. Hu, A. Cole, H. Tian, D. Wang, R. Platt, and R. Walters, “Raven: End-to-end equivariant robot learning with rgb cameras,” inThe Fourteenth International Conference on Learning Representations

work page

[26] [26]

Equicontact: A hierarchical se(3) vision-to-force equivariant policy for spatially generalizable contact-rich tasks,

J. Seo, A. Kruthiventy, S. Lee, M. Teng, S. Choi, J. Choi, and R. Horowitz, “Equicontact: A hierarchical se(3) vision-to-force equivariant policy for spatially generalizable contact-rich tasks,”arXiv:2507.10961, 2025

work page arXiv 2025

[27] [27]

Equact: An se (3)- equivariant multi-task transformer for 3d robotic manipulation,

X. Zhu, Y . Qi, Y . Zhu, R. Walters, and R. Platt, “Equact: An se (3)- equivariant multi-task transformer for 3d robotic manipulation,” inThe Fourteenth International Conference on Learning Representations

work page

[28] [28]

Residual rotation correction using tactile equivariance,

Y . Zhu, Z. Ye, B. Hu, H. Zhao, Y . Qi, D. Wang, and R. Platt, “Residual rotation correction using tactile equivariance,”arXiv:2511.07381, 2025

work page arXiv 2025

[29] [29]

Riemann: Near real-time se (3)-equivariant robot manipulation without point cloud segmentation,

C. Gao, Z. Xue, S. Deng, T. Liang, S. Yang, L. Shao, and H. Xu, “Riemann: Near real-time se (3)-equivariant robot manipulation without point cloud segmentation,” in8th Annual Conference on Robot Learning

work page

[30] [30]

Simshear: Sim-to-real shear- based tactile servoing,

K. Freud, Y . Lin, and N. F. Lepora, “Simshear: Sim-to-real shear- based tactile servoing,” inProceedings of The 9th Conference on Robot Learning, vol. 305, 2025, pp. 3401–3412

work page 2025

[31] [31]

3d-vitac: Learning fine-grained manipulation with visuo-tactile sensing,

B. Huang, Y . Wang, X. Yang, Y . Luo, and Y . Li, “3d-vitac: Learning fine-grained manipulation with visuo-tactile sensing,” inProceedings of The 8th Conference on Robot Learning, ser. Proceedings of Machine Learning Research, vol. 270, 2025, pp. 2557–2578

work page 2025

[32] [32]

Mimictouch: Leveraging multi-modal human tactile demonstrations for contact-rich manipulation,

K. Yu, Y . Han, Q. Wang, V . Saxena, D. Xu, and Y . Zhao, “Mimictouch: Leveraging multi-modal human tactile demonstrations for contact-rich manipulation,” inProceedings of The 8th Conference on Robot Learning, ser. Proceedings of Machine Learning Research, vol. 270, 2025, pp. 4844–4865

work page 2025

[33] [33]

Text2touch: Tactile in-hand manipulation with llm-designed reward functions,

H. Field, M. Yang, Y . Lin, E. Psomopoulou, D. A. Barton, and N. F. Lepora, “Text2touch: Tactile in-hand manipulation with llm-designed reward functions,” inProceedings of The 9th Conference on Robot Learning, ser. Proceedings of Machine Learning Research, vol. 305, 2025, pp. 2847–2887

work page 2025

[34] [34]

Anyrotate: Gravity-invariant in-hand object rotation with sim-to-real touch,

M. Yang, C. Lu, A. Church, Y . Lin, C. J. Ford, H. Li, E. Psomopoulou, D. A. Barton, and N. F. Lepora, “Anyrotate: Gravity-invariant in-hand object rotation with sim-to-real touch,” inProceedings of The 8th Conference on Robot Learning, ser. Proceedings of Machine Learning Research, vol. 270, 2025, pp. 4727–4747

work page 2025

[35] [35]

Learning visuotactile estimation and control for non-prehensile manipulation under occlusions,

J. Del Aguila Ferrandis, J. Moura, and S. Vijayakumar, “Learning visuotactile estimation and control for non-prehensile manipulation under occlusions,” inProceedings of The 8th Conference on Robot Learning, ser. Proceedings of Machine Learning Research, vol. 270, 2025, pp. 1501–1515

work page 2025

[36] [36]

Tacumi: A multi-modal universal manipulation interface for contact-rich tasks,

T. Cheng, K. Chen, L. Chen, L. Zhang, Y . Zhang, Y . Ling, M. Hamad, Z. Bing, F. Wu, K. Sharmaet al., “Tacumi: A multi-modal uni- versal manipulation interface for contact-rich tasks,”arXiv preprint arXiv:2601.14550, 2026

work page arXiv 2026

[37] [37]

exumi: Extensible robot teaching system with action-aware task-agnostic tactile representation,

Y . Xu, L. Wei, P. An, Q. Zhang, and Y .-L. Li, “exumi: Extensible robot teaching system with action-aware task-agnostic tactile representation,” in Proceedings of The 9th Conference on Robot Learning, ser. Proceedings of Machine Learning Research, vol. 305, 2025, pp. 2536–2554

work page 2025

[38] [38]

Kinedex: Learning tactile-informed visuomotor policies via kinesthetic teaching for dexterous manipulation,

D. Zhang, C. Yuan, C. Wen, H. Zhang, J. Zhao, and Y . Gao, “Kinedex: Learning tactile-informed visuomotor policies via kinesthetic teaching for dexterous manipulation,” inProceedings of The 9th Conference on Robot Learning, ser. Proceedings of Machine Learning Research, vol. 305, 2025, pp. 4123–4138

work page 2025

[39] [39]

Tactile beyond pixels: Multisensory touch representations for robot manipulation,

C. Higuera, A. Sharma, T. Fan, C. K. Bodduluri, B. Boots, M. Kaess, M. Lambeta, T. Wu, Z. Liu, F. R. Hogan, and M. Mukadam, “Tactile beyond pixels: Multisensory touch representations for robot manipulation,” inProceedings of The 9th Conference on Robot Learning, ser. Proceedings of Machine Learning Research, vol. 305, 2025, pp. 105–123

work page 2025

[40] [40]

Dexskin: High-coverage conformable robotic skin for learning contact-rich manipulation,

S. Wistreich, B. Shi, S. Tian, S. Clarke, M. Nath, C. Xu, Z. Bao, and J. Wu, “Dexskin: High-coverage conformable robotic skin for learning contact-rich manipulation,” inProceedings of The 9th Conference on Robot Learning, ser. Proceedings of Machine Learning Research, vol. 305, 2025, pp. 769–793

work page 2025

[41] [41]

3d contact point cloud reconstruction from vision-based tactile flow,

Y . Du, G. Zhang, and M. Y . Wang, “3d contact point cloud reconstruction from vision-based tactile flow,”IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 12 177–12 184, 2022

work page 2022