arxiv: 2510.26067 · v2 · submitted 2025-10-30 · 💻 cs.RO

Morphology-Aware Graph Reinforcement Learning for Tensegrity Robot Locomotion

Pith reviewed 2026-05-18 03:49 UTC · model grok-4.3

classification 💻 cs.RO
keywords tensegrity robot · graph neural network · reinforcement learning · sim-to-real transfer · locomotion control · morphology-aware policy · soft actor-critic · underactuated dynamics

The pith

Encoding a tensegrity robot's connections as a graph in its control policy enables direct transfer of locomotion skills from simulation to real hardware.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a reinforcement learning approach for tensegrity robots by embedding a graph neural network (GNN) inside the Soft Actor-Critic (SAC) algorithm. The graph represents the robot's rods and cables as connected elements so the policy can learn how movements and forces propagate through the structure. According to the paper, the resulting policies train with fewer samples, tolerate sensor noise and stiffness changes, track paths more accurately, and transfer to physical hardware without fine-tuning. A reader would care because tensegrity robots offer high resilience and deployability, yet their underactuated, highly coupled dynamics make locomotion control difficult. Validation on a physical 3-bar robot across three locomotion primitives (straight-line tracking and bidirectional turning) shows gains over multilayer perceptron (MLP) policies.

Core claim

The authors state that representing the tensegrity robot's physical topology as a graph inside a graph neural network policy, integrated with Soft Actor-Critic, captures coupling among components. This yields faster and more stable learning than multilayer perceptron policies, along with higher sample efficiency, robustness to noise and stiffness variations, better trajectory accuracy, and direct transfer from simulation to hardware that produces stable real-world locomotion on three primitives for a physical 3-bar tensegrity robot.

What carries the argument

A graph neural network embedded in the policy that encodes the robot's rod-cable topology as nodes and edges to model dynamic couplings between components during reinforcement learning.
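
As a rough illustration of this idea, the sketch below builds one plausible graph for a 3-bar tensegrity and runs a hand-written message-passing step over it. It assumes the common 3-bar prism layout (six end caps as nodes, three rods and nine cables as edges). The node numbering, the `RODS`, `FACE_CABLES`, and `CROSS_CABLES` lists, the fixed `self_weight` blend, and all function names are hypothetical. A real GNN policy would use learned update functions, and the paper's actual graph construction and features are not reproduced here.

```python
# Hypothetical numbering: end caps 0-2 sit on one triangular face, 3-5 on the other.
RODS = [(0, 4), (1, 5), (2, 3)]            # rigid rods, twisted across the prism
FACE_CABLES = [(0, 1), (1, 2), (2, 0),     # cables around the first face
               (3, 4), (4, 5), (5, 3)]     # cables around the second face
CROSS_CABLES = [(0, 3), (1, 4), (2, 5)]    # cables linking the two faces
EDGES = RODS + FACE_CABLES + CROSS_CABLES
NUM_NODES = 6


def build_neighbors(num_nodes, edges):
    """Undirected adjacency list: every rod or cable links two end caps."""
    neighbors = [[] for _ in range(num_nodes)]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    return neighbors


def message_passing_step(features, neighbors, self_weight=0.5):
    """One round of mean aggregation (a fixed, non-learned stand-in for a GNN layer).

    Each node's new feature vector blends its own features with the
    mean of its neighbors' features.
    """
    updated = []
    for i, own in enumerate(features):
        nbrs = neighbors[i]
        mean = [sum(features[j][d] for j in nbrs) / len(nbrs)
                for d in range(len(own))]
        updated.append([self_weight * own[d] + (1.0 - self_weight) * mean[d]
                        for d in range(len(own))])
    return updated


NEIGHBORS = build_neighbors(NUM_NODES, EDGES)

# Seed a scalar signal on end cap 0 and propagate it for two rounds.
h0 = [[1.0]] + [[0.0] for _ in range(NUM_NODES - 1)]
h1 = message_passing_step(h0, NEIGHBORS)
h2 = message_passing_step(h1, NEIGHBORS)
print(h1)
print(h2)
```

In this toy layout every end cap has four neighbors (one rod and three cables). A signal seeded on one end cap reaches its direct neighbors after one round and the single remaining end cap only after two, which is loosely the intuition behind stacking several message-passing layers.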

If this is right

  • The graph-based policies require fewer training samples to achieve effective locomotion control than multilayer perceptron baselines.
  • Performance holds up under sensor noise and changes in cable stiffness.
  • Trajectory accuracy improves for straight-line tracking and bidirectional turning in both simulation and on hardware.
  • Policies trained in simulation produce stable real-world locomotion on the physical robot without any fine-tuning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same graph encoding of structure could support control of other robots that combine rigid and elastic parts.
  • Strong structural priors in the policy might allow simpler physics models in simulation for related underactuated systems.
  • Testing the method on tensegrity robots with more bars or varied cable arrangements would show whether the benefits scale.

Load-bearing premise

Modeling the robot's connections as a graph in the control policy is sufficient to handle the important interactions for movement without needing explicit models of cable stretch or ground contact.

What would settle it

Showing that a standard multilayer perceptron policy achieves equivalent sample efficiency, robustness, trajectory accuracy, and direct sim-to-real transfer for stable locomotion on the same physical 3-bar tensegrity robot would undermine the advantage of the graph representation.

Figures

Figures reproduced from arXiv: 2510.26067 by Chi Zhang, Mingrui Li, Wenzhe Tong, Xiaonan Huang.

Figure 1
Morphology-aware graph reinforcement learning for tensegrity locomotion. The robot's states (end-cap positions and velocities) are encoded as node features in a graph-based policy, which propagates information along the robot's structural connections. The network outputs tendon length commands to actuate the tensegrity robot to roll forward in physical experiments.
Figure 2
Physical 3-bar tensegrity robot platform and reference coordinate definitions. φ indicates the waypoint angle between forward direction and tracking direction. The 3-bar tensegrity robot consists of three rigid rods connected by a network of elastic tendons, forming a twisted triangular prism with two triangular faces defined as the left and right sides.
Figure 3
Overview of the proposed morphology-aware GNN-SAC framework for tensegrity robot locomotion. The Soft Actor-Critic (SAC) algorithm integrates a graph neural network (GNN)-based policy that encodes the robot's topology via message passing among end-cap nodes. The actor generates tendon length commands based on structured observations, enabling morphology-aware learning in both simulation and real-world envi…
Figure 4
Benchmark of learning performance across algorithms and network depths. The proposed GNN-SAC consistently outperforms MLP-based SAC (M-SAC), PPO, and TD3 in terms of training reward and sample efficiency for all three locomotion primitives. Subplots (a,c,e) compare algorithms, while (b,d,f) analyze the effect of GNN encoder depth, showing improved performance with multi-layer message passing.
Figure 5
Simulation evaluation of learned motion primitives between Graph-based SAC (G-SAC) and MLP-based SAC (M-SAC): (a) Straight-line tracking error for different waypoint yaw angles; (b) Yaw rate and stability in bidirectional turning tasks. To demonstrate motion composability, the three primitives were combined for waypoint-based trajectory following. A high-level planner sequentially selected primitives accor…
Figure 6
Composed trajectory tracking using learned motion primitives. The robot follows an infinity-shaped (∞) waypoint sequence by sequentially combining straight-line and turning primitives. The resulting CoM trajectory (red) closely aligns with target waypoints (orange), confirming effective motion composition and trajectory accuracy.
Figure 8
Comparison between simulation and real-world performance. The proposed GNN-SAC achieves close agreement between simulated and physical results in (a) CoM trajectories, (b) forward tracking, and (c,d) bidirectional turning, validating robust zero-shot sim-to-real transfer. TABLE I. Comparison between simulated and real-world performance over motion primitives. Columns: Motion primitive, Simulation, Real, Relative Error …
Figure 9
Real-world rollout sequences of learned locomotion primitives. The tensegrity robot executes (a) clockwise turning, (b) counterclockwise turning, and (c) straight-line tracking using zero-shot transferred GNN-SAC policies. The sequences show coordinated rolling and stable motion across all tasks. … of 2.76°/s with a maximum orientation error below 0.59° …

Original abstract

Tensegrity robots combine rigid rods and elastic cables, offering high resilience and deployability but at the same time posing major challenges for locomotion control due to their underactuated and highly coupled dynamics. This paper introduces a morphology-aware reinforcement learning framework that integrates a graph neural network (GNN) into the Soft Actor-Critic (SAC) algorithm. By representing the robot's physical topology as a graph, the proposed GNN-based policy captures coupling among components, enabling faster and more stable learning than conventional multilayer perceptron (MLP) policies. The method is validated on a physical 3-bar tensegrity robot across three locomotion primitives, including straight-line tracking and bidirectional turning. It shows superior sample efficiency, robustness to noise and stiffness variations, and improved trajectory accuracy. Additionally, the learned policies transfer directly from simulation to hardware without fine-tuning, achieving stable real-world locomotion. These results demonstrate the advantages of incorporating structural priors into reinforcement learning for tensegrity robot control.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces a morphology-aware RL method that embeds a graph neural network into the Soft Actor-Critic algorithm to control a 3-bar tensegrity robot. The robot is represented as a fixed graph of rods and cables so that the policy can exploit structural couplings; the authors report faster learning than MLP baselines, robustness to noise and stiffness changes, and direct sim-to-real transfer without fine-tuning for straight-line tracking and bidirectional turning primitives on physical hardware.

Significance. If the empirical claims hold, the work supplies concrete evidence that structural priors encoded via GNNs can improve sample efficiency and zero-shot hardware transfer for underactuated tensegrity systems. Such results would be useful for the broader robotics community working on compliant, high-dimensional platforms where explicit dynamics modeling is difficult.

major comments (1)
  1. [§4 and §5] §4 (Method) and §5 (Experiments): the central sim-to-real claim requires that the static morphology graph alone encodes the dominant dynamic couplings, including intermittent ground contacts and cable elasticity. The manuscript describes a fixed graph topology but provides no indication of dynamic graph updates or explicit contact-force modeling; without ablation on contact-rich versus contact-free regimes or quantitative metrics (e.g., success rate, trajectory RMSE with error bars) showing invariance to these effects, it remains unclear whether the reported transfer exploits simulation-specific artifacts rather than invariant dynamics.
minor comments (2)
  1. [Abstract] The abstract asserts 'superior sample efficiency, robustness, and trajectory accuracy' yet supplies no numerical values, statistical tests, or baseline comparisons; these quantitative details should be added to the abstract and highlighted in the results section.
  2. [§3] Notation for the GNN message-passing update and the SAC actor-critic losses should be unified across §3 and §4 to avoid ambiguity when readers compare the morphology-aware policy to the MLP baseline.
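
The trajectory RMSE with error bars requested in the major comment could be computed along the following lines. This is a minimal sketch, not the paper's evaluation code: the function names, the planar (x, y) trajectory format, and the numbers in the usage lines are made up for illustration.

```python
import math
import statistics


def trajectory_rmse(actual, reference):
    """RMSE of the planar position error between two equal-length (x, y) trajectories."""
    if not actual or len(actual) != len(reference):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [(ax - rx) ** 2 + (ay - ry) ** 2
                      for (ax, ay), (rx, ry) in zip(actual, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def summarize_trials(rmse_values):
    """Mean and sample standard deviation of per-trial RMSE values (the error bar)."""
    mean = statistics.mean(rmse_values)
    std = statistics.stdev(rmse_values) if len(rmse_values) > 1 else 0.0
    return mean, std


# Synthetic example: a straight reference path and one rollout offset by 0.5 in y.
reference_path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
rollout = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]
single_rmse = trajectory_rmse(rollout, reference_path)

# Synthetic per-trial RMSE values, standing in for repeated hardware runs.
mean_rmse, std_rmse = summarize_trials([1.0, 2.0, 3.0])
print(single_rmse, mean_rmse, std_rmse)
```

Reporting the mean and spread over repeated trials, separately for simulation and hardware, would make the sim-to-real agreement directly comparable across primitives.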

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their valuable feedback. We respond to the major comment as follows and will make corresponding revisions to strengthen the paper.

point-by-point responses
  1. Referee: [§4 and §5] §4 (Method) and §5 (Experiments): the central sim-to-real claim requires that the static morphology graph alone encodes the dominant dynamic couplings, including intermittent ground contacts and cable elasticity. The manuscript describes a fixed graph topology but provides no indication of dynamic graph updates or explicit contact-force modeling; without ablation on contact-rich versus contact-free regimes or quantitative metrics (e.g., success rate, trajectory RMSE with error bars) showing invariance to these effects, it remains unclear whether the reported transfer exploits simulation-specific artifacts rather than invariant dynamics.

    Authors: We agree with the observation that the morphology graph is static and does not include dynamic updates or explicit contact modeling. The fixed graph captures the invariant physical topology of the tensegrity structure, enabling the GNN to learn message-passing rules that reflect how forces and states propagate through rods and cables. Intermittent contacts and elasticity are accounted for in the physics simulator used for training, and the policy is optimized to produce robust actions under these conditions. Our reported robustness to stiffness changes indirectly supports handling of elasticity variations. To provide stronger evidence against simulation artifacts, we will revise the manuscript to include: (i) an ablation study comparing locomotion performance in contact-rich environments versus contact-free regimes (e.g., by disabling ground contacts in simulation), and (ii) quantitative results with error bars, including success rates and trajectory RMSE for the sim-to-real transfers. These additions will demonstrate the contribution of the morphology-aware encoding to the observed zero-shot transfer. We thank the referee for highlighting this point, which will improve the clarity of our empirical validation.

    revision: partial

Circularity Check

0 steps flagged

No circularity: empirical RL method validated on hardware

full rationale

The paper describes an algorithmic framework that embeds a fixed-topology GNN into SAC for policy learning on a tensegrity robot, with performance measured via simulation training curves, baseline comparisons, and direct sim-to-real transfer on physical hardware. No derivation chain, uniqueness theorem, or fitted parameter is presented that reduces to its own inputs by construction. Claims rest on external empirical benchmarks rather than self-referential definitions or self-citation loops. The graph encoding is a design choice whose value is assessed against independent metrics, not asserted as tautological.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard RL assumptions plus the modeling choice that a graph representation of rods and cables is an adequate structural prior. No new physical constants or invented entities are introduced.

free parameters (2)
  • GNN architecture hyperparameters
    Layer count, hidden dimension, and message-passing steps are chosen to fit the 3-bar topology; their values are not reported in the abstract.
  • SAC reward and entropy coefficients
    Standard SAC hyperparameters that balance task reward against policy entropy; tuned for the locomotion primitives.
axioms (2)
  • domain assumption: The robot's dynamics can be adequately captured by a graph whose nodes are the rod end caps and whose edges encode the physical connections (rods and elastic cables) between them.
    Invoked when the GNN policy is introduced to capture coupling among components.
  • domain assumption: Simulation dynamics are sufficiently accurate that policies trained in simulation transfer to hardware without fine-tuning.
    Required for the zero-shot sim-to-real claim.
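
For context on the entropy coefficient listed among the free parameters, the standard maximum-entropy objective that SAC optimizes can be written as below. This is the general form from the SAC literature, with temperature α weighting the policy entropy against the task reward; the paper's specific reward terms and coefficient values are not given here.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\!\left( \pi(\cdot \mid s_t) \right) \right]
```

A larger α favors more exploratory, higher-entropy policies, while α approaching zero recovers the conventional expected-return objective.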

pith-pipeline@v0.9.0 · 5701 in / 1575 out tokens · 27260 ms · 2026-05-18T03:49:44.029554+00:00 · methodology
