
arxiv: 2605.17929 · v1 · pith:WFEKLE64new · submitted 2026-05-18 · 💻 cs.RO

TacSE3: Equivariant SE(3) Motion Estimation from Low-Texture Visuotactile Images for In-Gripper Tracking and Compensation

Pith reviewed 2026-05-20 10:30 UTC · model grok-4.3

classification 💻 cs.RO
keywords: visuotactile sensing · SE(3) motion estimation · in-gripper tracking · tactile force field · robotic in-hand manipulation · contact centroid · shear response · disturbance compensation

The pith

TacSE3 converts low-texture visuotactile images into a decoupled 3D force field to estimate incremental SE(3) rigid-body motion for in-gripper tracking and compensation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Robotic in-hand manipulation often loses visual access to objects inside the gripper. Low-texture visuotactile images supply few reliable features for standard matching methods. TacSE3 turns these images into a decoupled three-dimensional force field. Planar translation is read from contact-centroid motion while rotation comes mainly from shear responses in the tactile data. Dual sensors cut down translation-rotation confusion and supply a usable compensation signal that improves disturbance handling without retraining the base policy.

Core claim

TacSE3 is a tactile motion-estimation pipeline that converts low-texture visuotactile observations into a decoupled three-dimensional force field and estimates incremental rigid-body motion on SE(3). The method derives planar translation from contact-centroid motion and estimates rotation primarily from shear-related tactile responses, yielding a physically interpretable signal for in-gripper tracking and compensation. Experiments with paired DM-Tac fingertip sensors show that dual-sensor sensing reduces translation-rotation ambiguity and supports rotation tracking across axes and object geometries.

What carries the argument

The decoupled three-dimensional force field derived from paired visuotactile images, which separates planar translation (via contact-centroid motion) from rotation (via shear-related responses) to produce incremental SE(3) estimates.
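
As a rough illustration of the decoupling described above, the following Python sketch reads planar translation from the shift of an intensity-weighted contact centroid and fits a small in-plane rotation to a shear displacement field by least squares. It is not the authors' implementation: the function names, the use of a normal-response map as centroid weights, and the small-angle rotation model are all assumptions made for illustration.

```python
import numpy as np


def contact_centroid(normal_map):
    """Intensity-weighted centroid (x, y) of a non-negative normal-response map.

    Assumption: larger values mean stronger contact; units are pixels.
    """
    h, w = normal_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    total = normal_map.sum()
    if total <= 0:
        raise ValueError("no contact detected in the normal-response map")
    return np.array([(xs * normal_map).sum() / total,
                     (ys * normal_map).sum() / total])


def planar_translation(normal_prev, normal_curr):
    """Planar translation estimate (dx, dy) as the centroid shift between frames."""
    return contact_centroid(normal_curr) - contact_centroid(normal_prev)


def in_plane_rotation(points, shear, centroid):
    """Least-squares small-angle rotation (radians) about the contact centroid.

    For a small rotation theta about c, the displacement at p is approximately
    theta * perp(p - c) with perp(x, y) = (-y, x). Minimising the squared
    residual gives theta = sum(perp(r_i) . d_i) / sum(|r_i|^2).
    """
    r = np.asarray(points, dtype=float) - np.asarray(centroid, dtype=float)
    perp = np.stack([-r[:, 1], r[:, 0]], axis=1)
    denom = (r ** 2).sum()
    if denom == 0:
        raise ValueError("all sample points coincide with the centroid")
    return float((perp * np.asarray(shear, dtype=float)).sum() / denom)
```

Separating the two estimates this way mirrors the summary's description (centroid for translation, shear for rotation), but it only covers in-plane motion on a single sensor; how the paper combines two sensors into a full SE(3) increment is not specified in the text above.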

Load-bearing premise

Low-texture visuotactile observations can be reliably converted into a decoupled three-dimensional force field from which incremental rigid-body motion on SE(3) can be estimated without significant ambiguity or sensor-specific calibration issues that would invalidate the tracking for varied object geometries.

What would settle it

Ground-truth comparison showing large discrepancies between estimated and actual object trajectories when using single sensors or when testing objects with substantially different contact geometries and textures.

Figures

Figures reproduced from arXiv: 2605.17929 by Fei Meng, Haobo Liang, Jun Ma, Junzhe Wang, Michael Yu Wang, Qingyang Liu, Yi Cai, Zhenmin Huang, Zhongyuan Liao.

Figure 1: Overview of the proposed tactile-image-to-…
Figure 2: Decoupled tangential and normal responses derived from tactile…
Figure 4: Pose tracking on the SE(3) manifold. Small twists ξ_t ∈ se(3) are mapped to SE(3) via the exponential map and sequentially integrated to form a continuous pose trajectory T_0 → T_1 → T_2 → T_3.
    […] is required, the integrated tactile pose is further aligned to the ground-truth pose frame through a calibration mapping on SE(3), so that the estimated local contact motion can be consistently compared with the real obj…
Figure 5: Refined tactile-geometric adjustment in a residual control framework.
Figure 6: Representative video screenshots of the grasped object rotating about the three principal axes,…
Figure 7: Comparison between single-sensor and dual-sensor configurations. Each subfigure shows the tactile image of the decomposed three-dimensional…
Figure 8: Representative objects used in the multi-object evaluation. Most…
Figure 9: Representative screenshots of the rotation process for the eight objects in the multi-object evaluation. The visualization interface shows the tracked…
Figure 10: Experimental setup for the policy-level disturbance-recovery study.
Figure 11: Experimental process for the three policy-level tasks: Drawing, Gear Insertion, and Peg-in-Hole. Each task is illustrated as a sequence of four stages:…
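
The pose integration described in the Figure 4 caption (twists in se(3) mapped through the exponential map and chained into a trajectory) can be sketched in Python as follows. This is a generic textbook construction, not code from the paper. The twist ordering (translation part first, rotation part second) and the right-multiplied, body-frame composition are assumptions; the paper's own conventions are not given in the text above.

```python
import numpy as np


def hat(w):
    """Skew-symmetric matrix such that hat(w) @ x == np.cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def exp_se3(xi):
    """Exponential map se(3) -> SE(3) for xi = (v, w), returned as a 4x4 matrix."""
    xi = np.asarray(xi, dtype=float)
    v, w = xi[:3], xi[3:]
    theta = np.linalg.norm(w)
    W = hat(w)
    if theta < 1e-8:
        # Second-order Taylor expansion avoids dividing by a tiny angle.
        R = np.eye(3) + W + 0.5 * W @ W
        V = np.eye(3) + 0.5 * W + W @ W / 6.0
    else:
        a = np.sin(theta) / theta
        b = (1.0 - np.cos(theta)) / theta ** 2
        c = (theta - np.sin(theta)) / theta ** 3
        R = np.eye(3) + a * W + b * W @ W      # Rodrigues formula
        V = np.eye(3) + b * W + c * W @ W      # left Jacobian of SO(3)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T


def integrate_twists(twists, T0=None):
    """Chain incremental twists into a pose trajectory [T_0, T_1, ..., T_n].

    Assumption: each twist is expressed in the current body frame, so the
    increment is applied on the right: T_{k+1} = T_k @ exp(xi_k).
    """
    T = np.eye(4) if T0 is None else np.array(T0, dtype=float)
    trajectory = [T.copy()]
    for xi in twists:
        T = T @ exp_se3(xi)
        trajectory.append(T.copy())
    return trajectory
```

Because each step multiplies proper rigid transforms, the integrated pose stays on SE(3) up to floating-point error. Drift from noisy increments still accumulates over the trajectory, which is one reason a calibration or alignment step against ground truth may be needed before comparison.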
Original abstract

Robotic in-hand manipulation requires reliable object-motion tracking under frequent visual occlusion, yet low-texture visuotactile images provide few stable correspondences for conventional image- or geometry-matching methods. This paper presents TacSE3, a tactile motion-estimation pipeline that converts low-texture visuotactile observations into a decoupled three-dimensional force field and estimates incremental rigid-body motion on SE(3). The method derives planar translation from contact-centroid motion and estimates rotation primarily from shear-related tactile responses, yielding a physically interpretable signal for in-gripper tracking and compensation. Experiments with paired DM-Tac fingertip sensors show that dual-sensor sensing reduces translation-rotation ambiguity, supports rotation tracking across axes and object geometries, and provides a lightweight compensation signal that improves disturbance tolerance in downstream manipulation tasks without retraining the base policy.
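
One plausible reading of the "lightweight compensation signal" is a residual correction added to the output of a frozen base policy. The Python sketch below illustrates that idea only. The function names, the 6-vector action layout (translation then rotation), the proportional gain, the clipping limits, and the choice of the grasp-time pose as reference are all hypothetical and are not taken from the paper.

```python
import numpy as np


def tactile_residual(T_ref, T_est, gain=0.5, max_trans=0.01, max_rot=0.05):
    """Proportional correction opposing the estimated in-gripper pose deviation.

    T_ref: 4x4 in-gripper pose at grasp time (assumed reference).
    T_est: 4x4 current pose from tactile tracking.
    Returns a 6-vector (dx, dy, dz, rx, ry, rz); units assumed metres and radians.
    """
    T_err = np.linalg.inv(T_ref) @ T_est
    dp = T_err[:3, 3]
    R = T_err[:3, :3]
    # Small-angle rotation vector from the skew-symmetric part of R.
    dw = 0.5 * np.array([R[2, 1] - R[1, 2],
                         R[0, 2] - R[2, 0],
                         R[1, 0] - R[0, 1]])
    trans = np.clip(-gain * dp, -max_trans, max_trans)
    rot = np.clip(-gain * dw, -max_rot, max_rot)
    return np.concatenate([trans, rot])


def compensated_action(base_action, T_ref, T_est, **kwargs):
    """Add the tactile residual to the base policy's action; the policy is untouched."""
    return np.asarray(base_action, dtype=float) + tactile_residual(T_ref, T_est, **kwargs)
```

The base policy is never modified here, which matches the abstract's "without retraining" claim in spirit. Frame conventions between the sensor, gripper, and policy action space would need to be resolved before anything like this could run on hardware.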

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces TacSE3, a tactile motion-estimation pipeline that maps low-texture visuotactile images from paired DM-Tac fingertip sensors to a decoupled three-dimensional force field. Planar translation is derived from contact-centroid motion while rotation is estimated primarily from shear-related responses, enabling incremental SE(3) rigid-body tracking and compensation for in-gripper manipulation under visual occlusion. Experiments claim that dual-sensor sensing reduces translation-rotation ambiguity and supports tracking across axes and object geometries without retraining base policies.

Significance. If the decoupling and physical interpretability hold, the work provides a lightweight, sensor-driven alternative to geometry- or texture-matching methods for occluded in-hand tracking. The emphasis on deriving motion from centroid and shear signals without heavy learning components could aid robustness in manipulation, though the absence of detailed quantitative validation limits evaluation of its practical advantage over existing visuotactile approaches.

major comments (3)
  1. [Method / central derivation] The central claim that low-texture visuotactile observations can be converted into a decoupled 3D force field (from which SE(3) increments are estimated without significant ambiguity) is load-bearing but unsupported by any equations, sensor model details, or derivation steps in the provided description. This makes it impossible to verify independence of translation and rotation components for non-convex geometries or partial-slip cases.
  2. [Experiments] The abstract asserts that experiments with paired DM-Tac sensors show reduced ambiguity, rotation tracking across axes/geometries, and improved disturbance tolerance, yet no quantitative results, error metrics, data exclusion criteria, or baseline comparisons are supplied. This undermines substantiation of the cross-geometry and dual-sensor claims.
  3. [Method / force-field construction] The decoupling premise—that centroid motion isolates planar translation while shear isolates rotation—requires explicit validation against coupling that may arise for irregular contact patches; without this, the SE(3) increment assumption risks violation for varied object shapes.
minor comments (2)
  1. The title references 'Equivariant SE(3)' but the abstract does not specify how equivariance is implemented or enforced in the pipeline; adding a brief statement on this would clarify the contribution relative to standard rigid-motion estimation.
  2. Notation for the force-field components and contact centroid should be defined consistently at first use to aid readability for readers unfamiliar with DM-Tac sensor outputs.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each of the major comments below and outline the revisions we plan to make to strengthen the presentation of the method and experiments.

Point-by-point responses
  1. Referee: [Method / central derivation] The central claim that low-texture visuotactile observations can be converted into a decoupled 3D force field (from which SE(3) increments are estimated without significant ambiguity) is load-bearing but unsupported by any equations, sensor model details, or derivation steps in the provided description. This makes it impossible to verify independence of translation and rotation components for non-convex geometries or partial-slip cases.

    Authors: We appreciate this point and agree that the derivation should be more explicit to allow verification. The full manuscript includes a sensor model in Section III and the force field construction in Section IV, where planar translation is derived from the shift in contact centroid and rotation from integrated shear responses. However, to address the concern, we will expand the method section with detailed equations for the 3D force field mapping and the SE(3) pose increment computation. We will also add a discussion on the assumptions of decoupling, including potential issues with non-convex geometries and partial slip, and how the dual-sensor setup mitigates ambiguity. revision: yes

  2. Referee: [Experiments] The abstract asserts that experiments with paired DM-Tac sensors show reduced ambiguity, rotation tracking across axes/geometries, and improved disturbance tolerance, yet no quantitative results, error metrics, data exclusion criteria, or baseline comparisons are supplied. This undermines substantiation of the cross-geometry and dual-sensor claims.

    Authors: The experiments section of the manuscript does include quantitative evaluations, such as mean translation and rotation errors across different objects and axes, as well as comparisons to single-sensor and vision-based baselines. Data collection involved multiple trials with criteria for excluding failed contacts. To better highlight these results and address the comment, we will add a summary table of key metrics, explicitly state the data exclusion criteria, and include additional baseline comparisons in the revised manuscript. revision: yes

  3. Referee: [Method / force-field construction] The decoupling premise—that centroid motion isolates planar translation while shear isolates rotation—requires explicit validation against coupling that may arise for irregular contact patches; without this, the SE(3) increment assumption risks violation for varied object shapes.

    Authors: This is a valid concern. While our experiments test the method on objects with varying geometries to show robustness, we did not provide a dedicated analysis of coupling effects for irregular patches. In the revision, we will include additional experiments or simulations validating the decoupling for irregular contact patches and discuss cases where the assumption may be violated, such as in partial slip scenarios. revision: yes
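
The coupling concern in major comment 3 can be made concrete with a synthetic planar example. The Python sketch below generates the shear field of a pure rotation about a pivot that does not coincide with the contact-patch centroid, then fits translation and rotation about the centroid. It is an illustrative toy model under a rigid, no-slip, small-angle assumption, with hypothetical function names; it says nothing about how the paper's estimator actually behaves.

```python
import numpy as np


def rigid_shear_field(points, theta, pivot, translation=(0.0, 0.0)):
    """Small-angle planar displacement field: t + theta * perp(p - pivot)."""
    r = np.asarray(points, dtype=float) - np.asarray(pivot, dtype=float)
    perp = np.stack([-r[:, 1], r[:, 0]], axis=1)
    return theta * perp + np.asarray(translation, dtype=float)


def fit_planar_twist(points, shear):
    """Least-squares (t, theta) with the rotation taken about the patch centroid.

    Because the offsets from the centroid sum to zero, the fit separates:
    t is the mean shear and theta is fitted to the mean-removed field.
    """
    points = np.asarray(points, dtype=float)
    shear = np.asarray(shear, dtype=float)
    r = points - points.mean(axis=0)
    t = shear.mean(axis=0)
    perp = np.stack([-r[:, 1], r[:, 0]], axis=1)
    theta = float((perp * (shear - t)).sum() / (r ** 2).sum())
    return t, theta


def l_shaped_patch(size=6, cut=3):
    """Sample points of an irregular (L-shaped) contact patch on a pixel grid."""
    ys, xs = np.mgrid[0:size, 0:size]
    mask = ~((xs >= cut) & (ys >= cut))
    return np.stack([xs[mask], ys[mask]], axis=1).astype(float)
```

In this toy model the rotation angle is recovered correctly, but a pure rotation about an off-centroid pivot q also produces an apparent translation of theta * perp(c - q), where c is the patch centroid. That residual is the kind of translation-rotation ambiguity a second sensor, or an explicit pivot estimate, would have to resolve.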

Circularity Check

0 steps flagged

No significant circularity; derivation relies on independent physical contact models

full rationale

The paper derives planar translation from contact-centroid motion and rotation from shear-related tactile responses within a visuotactile-to-decoupled-3D-force-field pipeline. This chain is presented as grounded in sensor physics and dual DM-Tac fingertip observations rather than any self-definitional loop, fitted-parameter renaming, or load-bearing self-citation. The abstract and description contain no equations that reduce the SE(3) increment output to the input observations by construction; the decoupling assumption is an external modeling choice subject to experimental validation, not an internal tautology. The central claim therefore remains self-contained and does not trigger any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, the central claim rests on the domain assumption that visuotactile images yield a usable decoupled 3D force field. No free parameters, invented entities, or additional axioms are explicitly stated or quantifiable from the given text.

axioms (1)
  • domain assumption: Low-texture visuotactile observations can be converted into a decoupled three-dimensional force field suitable for SE(3) motion estimation.
    This conversion is presented as the starting point for deriving translation from centroids and rotation from shear responses.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking
    Relation between the paper passage and the cited Recognition theorem: unclear
    Passage: "converts low-texture visuotactile observations into a decoupled three-dimensional force field and estimates incremental rigid-body motion on SE(3). The method derives planar translation from contact-centroid motion and estimates rotation primarily from shear-related tactile responses"

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
