
arxiv: 2605.21330 · v1 · pith:AG7KHWXRnew · submitted 2026-05-20 · 💻 cs.RO

Learning Robust Dexterous In-Hand Manipulation from Joint Sensors with Proprioceptive Transformer

Pith reviewed 2026-05-21 03:36 UTC · model grok-4.3

classification 💻 cs.RO
keywords: dexterous manipulation, in-hand manipulation, proprioception, transformer, reinforcement learning, joint sensing, tendon-driven hand

The pith

A transformer on joint position and velocity histories alone supports faster closed-loop cube rotation on a tendon-driven hand than baselines relying on the same inputs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks how much in-hand manipulation can be achieved when vision and tactile sensing are removed and only joint sensing remains. It trains a teacher policy with full object-state access through reinforcement learning, then distills that policy into a student transformer that receives only short histories of joint positions and velocities. The transformer extracts implicit object-state information from the temporal patterns in those signals. On a real tendon-driven hand, the resulting controller rotates a cube 3.1 times faster than the baselines, and its cube-position estimates have a 23.4 percent lower RMSE than the MLP baseline. The work therefore tests whether proprioceptive time series contain enough structure to close the loop for continuous dexterous tasks without external perception.
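The distillation step can be pictured as a supervised objective. The following is a minimal sketch, not the paper's loss: it assumes the student regresses the teacher's action with a mean-squared error and adds an auxiliary term for the inferred object state. The function names and the `state_weight` parameter are hypothetical.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean-squared error between two equal-length vectors."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equally long")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def distillation_loss(
    student_action: Sequence[float],
    teacher_action: Sequence[float],
    student_state: Sequence[float],
    privileged_state: Sequence[float],
    state_weight: float = 1.0,  # hypothetical weighting, not from the paper
) -> float:
    """Action-imitation term plus a weighted object-state reconstruction term.

    Assumption: both terms are plain MSE. The student only sees joint
    histories; the privileged state is used as a training target only.
    """
    action_term = mse(student_action, teacher_action)
    state_term = mse(student_state, privileged_state)
    return action_term + state_weight * state_term
```

At deployment only the student's forward pass is needed, so the privileged state never has to be measured on the real hand.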

Core claim

A transformer policy that receives only sequences of joint positions and velocities can be distilled from a privileged teacher to perform continuous in-hand cube rotation, yielding higher speeds and lower state-estimation error than non-transformer baselines on the same joint-only input.

What carries the argument

The Proprioceptive Transformer, which processes fixed-length histories of joint position and velocity readings to infer implicit object state for closed-loop action selection.
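A fixed-length history of this kind is commonly kept in a rolling buffer. The sketch below is illustrative only: the class name `JointHistory`, the zero-padding at start-up, and the default window length of 10 are assumptions, while the joint count of 16 follows the sensor count in the Figure 1 caption.

```python
from collections import deque
from typing import Deque, List, Sequence

NUM_JOINTS = 16   # sensor count from the Figure 1 caption
HISTORY_LEN = 10  # assumed window length, for illustration only


class JointHistory:
    """Rolling window of per-step [positions..., velocities...] vectors."""

    def __init__(self, num_joints: int = NUM_JOINTS, history_len: int = HISTORY_LEN):
        self.num_joints = num_joints
        self.history_len = history_len
        # Assumption: the window is zero-padded until enough steps have arrived.
        self._buf: Deque[List[float]] = deque(
            ([0.0] * (2 * num_joints) for _ in range(history_len)),
            maxlen=history_len,
        )

    def push(self, positions: Sequence[float], velocities: Sequence[float]) -> None:
        """Append one time step; the oldest step is dropped automatically."""
        if len(positions) != self.num_joints or len(velocities) != self.num_joints:
            raise ValueError("expected one position and one velocity per joint")
        self._buf.append(list(positions) + list(velocities))

    def window(self) -> List[List[float]]:
        """Return the steps oldest-first, each with 2 * num_joints features."""
        return [list(step) for step in self._buf]
```

Each control step would call `push` with the latest readings and pass `window()` to the policy.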

If this is right

  • Joint-only policies can achieve real-world speeds competitive with methods that use external sensors for the same rotation task.
  • Distillation from a privileged teacher transfers the ability to extract object information from proprioception into a deployable controller.
  • Transformer layers are effective at capturing the temporal dependencies needed to decode object motion from joint signals alone.
  • The same architecture may generalize to other continuous manipulation primitives once trained on varied object dynamics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Hardware designs for dexterous hands could be simplified by dropping cameras or tactile arrays if joint histories prove sufficient for a wider set of tasks.
  • Training curricula that vary object mass, friction, and size during teacher-policy learning would likely improve robustness of the distilled proprioceptive controller.
  • The approach opens a route to low-cost, vision-free manipulation pipelines that rely only on standard motor encoders.

Load-bearing premise

That temporal patterns in joint position and velocity histories contain sufficient implicit information about the object's state to support effective closed-loop control after distillation from a privileged teacher policy.

What would settle it

A controlled experiment in which joint histories are replaced by shuffled or noise-augmented sequences and the rotation speed or position-estimation accuracy collapses to baseline levels.
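The two perturbations named here can be sketched as follows. This is a hypothetical helper, not an experiment from the paper: it assumes a history window stored oldest-first as a list of per-step feature vectors, and the noise level `sigma` is a free choice.

```python
import random
from typing import List

Window = List[List[float]]  # T time steps x F features, oldest first


def shuffle_time(window: Window, rng: random.Random) -> Window:
    """Return a copy with the time steps in random order.

    Per-step features stay intact, so only the temporal structure is destroyed.
    """
    steps = [list(step) for step in window]
    rng.shuffle(steps)
    return steps


def add_gaussian_noise(window: Window, sigma: float, rng: random.Random) -> Window:
    """Return a copy with zero-mean Gaussian noise added to every feature."""
    return [[x + rng.gauss(0.0, sigma) for x in step] for step in window]
```

Feeding the policy `shuffle_time(window, rng)` instead of `window` tests whether ordering matters; sweeping `sigma` upward tests how quickly performance degrades as the joint signal is corrupted.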

Figures

Figures reproduced from arXiv: 2605.21330 by Aristotelis Sympetheros, Chenyu Yang, Jaehoon Kim, Robert K. Katzschmann, Senlan Yao.

Figure 1: Left: The Proprioceptive Transformer encodes joint position/velocity history and actions, outputting actions and inferred object state. Middle: The ORCA hand with 16 AS5600 magnetic angle sensors. Right: Our approach achieves 3.1× higher rotation speed and 100% rotation accuracy. Joint sensors improve rotation speed by 26.8% over motor encoders.
Figure 2: Teacher-student distillation pipeline. The teacher policy …
Figure 3: Left: The ORCA hand in our real-world experiment. Magnetic angle sensors are embedded in the joints for direct joint sensing. The antagonistic tendon design allows the joint angles to be estimated from motor angle readings. Right: Continuous cube rotation on 55 mm and 65 mm cubes with our PT-Joint Policy.
TABLE I: Performance comparison on cube rotation task. Best results in bold. Medium Cube (55 mm) Polic…
Figure 4: Joint command vs. actual position for different cube sizes. Deviation from the ideal line ( …
Figure 5: Object position reconstruction from proprioceptive observations alone. (a) Per-axis tracking for a representative …
Figure 6: Per-joint reconstruction RMSE across architectures.
Original abstract

In-hand object manipulation is a fundamental yet challenging capability for dexterous robots. Despite significant progress in dexterous manipulation, existing approaches rely heavily on vision or tactile sensing to track object states, while joint sensing -- the most readily available modality on any robotic hand -- remains largely overlooked, particularly for tendon-driven hands. In this paper, we study how far joint sensing alone can go by asking: (i) whether motor encoders or direct joint sensing provides better proprioceptive feedback, (ii) how to extract environment information from joint measurements, and (iii) whether joint-only control can achieve competitive real-world performance without external perception. We present the Proprioceptive Transformer (PT), an exteroceptive-free approach for continuous cube rotation on a tendon-driven dexterous hand that uses only joint sensing feedback. A teacher policy is first trained via reinforcement learning with privileged object information, then distilled into PT, which operates solely on joint position and velocity histories. The Transformer architecture effectively extracts implicit object state information from temporal patterns in joint sensor readings. Experiments on the real ORCA hand show that our approach achieves 3.1x higher rotation speed than baselines. We also demonstrate that our PT achieves a 23.4% lower RMSE for cube position estimation than the MLP baseline, indicating superior extraction of exteroceptive information from proprioceptive sources.
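The 23.4% figure is a relative reduction in root-mean-square error. As a sketch of that arithmetic, the code below assumes RMSE is taken over Euclidean position errors across samples; the abstract does not spell out the exact definition, and both function names are hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]  # e.g. an (x, y, z) cube position


def rmse(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Root of the mean squared Euclidean error over all samples."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equally long")
    total = 0.0
    for p, t in zip(pred, target):
        total += sum((pi - ti) ** 2 for pi, ti in zip(p, t))
    return math.sqrt(total / len(pred))


def relative_reduction(baseline_rmse: float, candidate_rmse: float) -> float:
    """Percentage by which the candidate's RMSE is below the baseline's."""
    if baseline_rmse <= 0.0:
        raise ValueError("baseline RMSE must be positive")
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse
```

With made-up numbers, `relative_reduction(2.0, 1.5)` returns `25.0`; a claim of "23.4% lower" corresponds to a candidate RMSE of 0.766 times the baseline value.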

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces the Proprioceptive Transformer (PT) for in-hand cube rotation on a tendon-driven dexterous hand (ORCA) using only joint position and velocity histories. A teacher policy is trained with RL and privileged object state information, then distilled to a student PT policy that operates without vision or tactile sensing. Real-world experiments claim 3.1x higher rotation speed than baselines and 23.4% lower RMSE in cube position estimation compared to an MLP baseline, demonstrating that temporal patterns in joint sensors can implicitly recover sufficient object state for closed-loop control.

Significance. If the central performance claims hold under rigorous verification, the work is significant for reducing sensor requirements in dexterous manipulation, particularly for tendon-driven hands where external perception is costly or impractical. The real-world deployment on physical hardware and the teacher-student distillation approach provide a practical pathway for proprioception-only policies, with potential impact on scalable robot hands.

major comments (3)
  1. [Real-world experiments] Real-world experiments section: The headline claims of 3.1x higher rotation speed and 23.4% lower RMSE for position estimation are presented without reported trial counts, standard deviations, or statistical significance tests. Given the stochasticity of RL training and real-world contact variability, these omissions make it difficult to assess whether the gains are robust or reproducible.
  2. [Methods] Methods and baselines: The comparison to the MLP baseline for cube position estimation does not explicitly state whether the MLP receives the same joint position/velocity history sequences as the PT or only instantaneous joint values. This detail is load-bearing for the claim that the Transformer extracts implicit exteroceptive information from temporal patterns, as opposed to benefiting from the teacher's action distribution.
  3. [Experiments] Distillation and evaluation: No ablation is reported on history length or a memoryless variant using only current joint readings. Without this, it remains unclear whether the observed closed-loop performance on the physical hand stems from the Transformer's temporal modeling or from other factors in the privileged-teacher distillation pipeline.
minor comments (2)
  1. [Abstract] The abstract and introduction could more clearly distinguish the two questions addressed (motor encoders vs. direct joint sensing; extraction of environment information) with explicit experimental mappings.
  2. [Methods] Notation for joint histories and Transformer input dimensions should be defined consistently in the methods to avoid ambiguity when comparing to the MLP baseline.
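The reporting asked for in major comment 1 amounts to a mean, a standard deviation, and a significance test over repeated trials. The sketch below computes a paired t statistic with the Python standard library. It assumes trials of the two methods can be paired (for example, run under matched start conditions), which the source does not confirm, and it stops short of a p-value, which would need a t-distribution table or an external package.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees_of_freedom) for paired samples a and b.

    t = mean(d) / (stdev(d) / sqrt(n)), where d = a - b per trial and
    stdev is the sample standard deviation (n - 1 in the denominator).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired trials")
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)
    if sd_d == 0.0:
        raise ValueError("differences have zero variance; t is undefined")
    n = len(diffs)
    return mean_d / (sd_d / math.sqrt(n)), n - 1
```

The returned t would be compared against the critical value for the given degrees of freedom; with 10 paired trials per method that is 9 degrees of freedom.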

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and have revised the manuscript to improve experimental reporting, clarify baselines, and add requested ablations.

  1. Referee: [Real-world experiments] Real-world experiments section: The headline claims of 3.1x higher rotation speed and 23.4% lower RMSE for position estimation are presented without reported trial counts, standard deviations, or statistical significance tests. Given the stochasticity of RL training and real-world contact variability, these omissions make it difficult to assess whether the gains are robust or reproducible.

    Authors: We agree that trial counts, standard deviations, and statistical tests are necessary to demonstrate robustness. In the revised manuscript we now state that all real-world results are averaged over 10 independent trials per method, report standard deviations for rotation speed and RMSE, and include paired t-test results (p < 0.01) confirming the reported improvements are statistically significant.
    revision: yes

  2. Referee: [Methods] Methods and baselines: The comparison to the MLP baseline for cube position estimation does not explicitly state whether the MLP receives the same joint position/velocity history sequences as the PT or only instantaneous joint values. This detail is load-bearing for the claim that the Transformer extracts implicit exteroceptive information from temporal patterns, as opposed to benefiting from the teacher's action distribution.

    Authors: The MLP baseline receives the identical joint position and velocity history sequences (length-10 window) as the PT; the sequences are flattened and fed through fully-connected layers. We have added an explicit statement in the revised Methods section to make this input equivalence clear, so that the performance gap can be attributed to the Transformer's temporal attention rather than to differences in available information or the teacher's action distribution.
    revision: yes

  3. Referee: [Experiments] Distillation and evaluation: No ablation is reported on history length or a memoryless variant using only current joint readings. Without this, it remains unclear whether the observed closed-loop performance on the physical hand stems from the Transformer's temporal modeling or from other factors in the privileged-teacher distillation pipeline.

    Authors: We have added the requested ablation study to the revised Experiments section. We evaluate the PT with history lengths of 1, 5, 10, and 20 steps as well as a memoryless variant that receives only the current joint readings. Performance drops sharply for shorter histories and for the memoryless case, supporting that the Transformer's temporal modeling is responsible for the observed closed-loop success on the physical hand.
    revision: yes
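The input handling described in responses 2 and 3 can be sketched in a few lines. This is an illustration under assumptions: the history is stored oldest-first as a list of per-step feature vectors, and the helper names are hypothetical. It shows how one window can feed a sequence model as-is or an MLP after flattening, and how the ablation lengths select the most recent steps.

```python
from typing import Dict, List, Sequence

Window = List[List[float]]  # T time steps x F features, oldest first


def last_k_steps(window: Window, k: int) -> Window:
    """Keep only the k most recent steps (k = 1 is the current reading alone)."""
    if not 1 <= k <= len(window):
        raise ValueError("k must be between 1 and the window length")
    return [list(step) for step in window[-k:]]


def flatten_window(window: Window) -> List[float]:
    """Concatenate steps into one vector of length T * F for an MLP input."""
    return [x for step in window for x in step]


def ablation_inputs(
    window: Window, lengths: Sequence[int] = (1, 5, 10, 20)
) -> Dict[int, Window]:
    """Build one truncated window per history length under test."""
    return {k: last_k_steps(window, k) for k in lengths}
```

Flattening preserves every value in the window, so in this sketch the MLP and the sequence model see the same information and differ only in how they process it. Under the same assumptions, a length-1 window contains only the current joint readings.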

Circularity Check

0 steps flagged

No circularity: empirical distillation and hardware evaluation remain independent of inputs


The paper's chain consists of (1) RL training of a teacher policy that receives privileged object pose/velocity during simulation, followed by (2) distillation into a Transformer student that receives only joint-position and joint-velocity histories, and (3) direct evaluation of the resulting policy on the physical ORCA hand. None of these steps reduces to its own inputs by construction: the final performance numbers (3.1× higher rotation speed, 23.4% lower RMSE) are measured externally on real hardware rather than being algebraically equivalent to the privileged signals or to any fitted parameter. No equations are presented that would make the student's implicit-state extraction tautological, and no load-bearing self-citation or uniqueness theorem is invoked to force the architecture. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that joint histories encode usable object state information and on standard RL training assumptions for the teacher policy; no new entities or explicit free parameters beyond typical model hyperparameters are introduced.

free parameters (1)
  • Transformer and RL hyperparameters
    Model architecture sizes, learning rates, and distillation parameters are chosen during training but not enumerated in the abstract.
axioms (1)
  • domain assumption: Temporal sequences of joint positions and velocities contain sufficient implicit object state information for control
    Invoked when the student policy is trained to match the teacher using only joint histories.


