Optimal Gait Control for a Tendon-driven Soft Quadruped Robot by Model-based Reinforcement Learning
Pith reviewed 2026-05-23 23:49 UTC · model grok-4.3
The pith
Model-based reinforcement learning with post-training produces more efficient and robust gait policies for a tendon-driven soft quadruped than benchmark methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The proposed MBRL algorithm, combined with post-training, significantly improves the efficiency and performance of gait control policies. The developed policy is both robust and adaptable to the robot's deformable morphology.
What carries the argument
The data-driven dynamics model trained after state-space restriction, used for model-based planning inside the reinforcement learning algorithm.
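A minimal sketch of how a learned model can carry planning, under stated assumptions. The summary does not specify the paper's planner or model architecture, so random-shooting planning is swapped in purely for illustration. `toy_model` is a hand-written stand-in for the learned dynamics model, and the `LOW`/`HIGH` bounds, the two-component state, and the forward-progress objective are all hypothetical.

```python
import random

# Hypothetical bounds for the restricted state space: [position, velocity].
LOW = [-5.0, -1.0]
HIGH = [5.0, 1.0]


def restrict(state):
    """Clip each state component into the restricted range."""
    return [min(max(s, lo), hi) for s, lo, hi in zip(state, LOW, HIGH)]


def toy_model(state, action):
    """Stand-in for a learned dynamics model (not the paper's model)."""
    position, velocity = state
    velocity = 0.9 * velocity + 0.1 * action
    return [position + velocity, velocity]


def plan(model, state, horizon=10, candidates=200, seed=0):
    """Random shooting: roll random action sequences through the model and
    return (best predicted forward progress, first action of that sequence)."""
    rng = random.Random(seed)
    best_progress, best_action = float("-inf"), 0.0
    for _ in range(candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s = list(state)
        for a in actions:
            s = restrict(model(s, a))
        progress = s[0] - state[0]
        if progress > best_progress:
            best_progress, best_action = progress, actions[0]
    return best_progress, best_action


progress, action = plan(toy_model, [0.0, 0.0])
print(f"predicted progress {progress:.3f}, first action {action:.3f}")
```

Because every candidate is scored inside the model, any systematic model error flows directly into the chosen action, which is why the accuracy of the learned model matters so much to the argument.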
If this is right
- Gait control policies reach higher efficiency and performance than the benchmark methods tested.
- The same policy remains effective when the robot's body deforms during locomotion.
- The controller demonstrates practical use on real hardware after the post-training stage.
- The multi-stage process of state restriction, model learning, and MBRL planning produces policies that transfer without major additional tuning.
Where Pith is reading between the lines
- The same state-space restriction step could shorten training time for controllers on other soft robots whose actuators have similar compressible dynamics.
- If the model stays accurate across different payloads or surfaces, the approach would support deployment in varied outdoor settings without retraining from scratch.
- Robots that must change shape mid-task, such as for navigation through narrow gaps, might inherit the same adaptability shown here.
Load-bearing premise
The learned data-driven model accurately captures how the tendon-driven actuators and the robot's changing shape behave so that plans made in the model still work when transferred to the physical robot.
What would settle it
Deploy the final policy on the physical robot and measure whether forward speed, stability, and success rate match the values predicted by the model; a large drop would falsify the claim that the model supports transferable optimal control.
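One way to make this check concrete is a per-metric relative gap between model-predicted and measured values. The sketch below is an assumption-laden illustration: the metric names, the numbers, and the 20% tolerance are placeholders, not values from the paper.

```python
def relative_gap(predicted, measured):
    """Signed relative gap: (measured - predicted) / |predicted|."""
    if predicted == 0:
        raise ValueError("predicted value must be non-zero")
    return (measured - predicted) / abs(predicted)


def transfer_report(predicted, measured, tolerance=0.2):
    """Return per-metric gaps and whether every gap is within the tolerance."""
    gaps = {name: relative_gap(predicted[name], measured[name])
            for name in predicted}
    within = all(abs(gap) <= tolerance for gap in gaps.values())
    return gaps, within


# Placeholder numbers for illustration only (not results from the paper).
predicted = {"forward_speed_m_s": 0.10, "success_rate": 0.90}
measured = {"forward_speed_m_s": 0.09, "success_rate": 0.85}

gaps, within = transfer_report(predicted, measured)
for name, gap in gaps.items():
    print(f"{name}: {gap:+.1%}")
print("within tolerance:", within)
```

What counts as a "large drop" is a judgment call; the tolerance should be fixed before the hardware trial rather than chosen afterwards.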
Original abstract
This study presents an innovative approach to optimal gait control for a soft quadruped robot enabled by four Compressible Tendon-driven Soft Actuators (CTSAs). Improving our previous studies of using model-free reinforcement learning for gait control, we employ model-based reinforcement learning (MBRL) to further enhance the performance of the gait controller. Compared to rigid robots, the proposed soft quadruped robot has better safety, less weight, and a simpler mechanism for fabrication and control. However, the primary challenge lies in developing sophisticated control algorithms to attain optimal gait control for fast and stable locomotion. The research employs a multi-stage methodology, including state space restriction, data-driven model training, and reinforcement learning algorithm development. Compared to benchmark methods, the proposed MBRL algorithm, combined with post-training, significantly improves the efficiency and performance of gait control policies. The developed policy is both robust and adaptable to the robot's deformable morphology. The study concludes by highlighting the practical applicability of these findings in real-world scenarios.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a multi-stage methodology of state-space restriction, data-driven model training, and model-based reinforcement learning (MBRL) with post-training to achieve optimal gait control for a tendon-driven soft quadruped robot using four compressible tendon-driven soft actuators (CTSAs). It claims that this approach, relative to benchmark methods, significantly improves efficiency and performance of gait policies while yielding robustness and adaptability to the robot's deformable morphology, with practical real-world applicability.
Significance. If the empirical claims hold with proper validation, the work could advance control methods for soft robots by showing how MBRL combined with dimensionality reduction can address nonlinear actuator dynamics, offering advantages in safety and fabrication simplicity over rigid platforms. The focus on sim-to-real transfer and morphology robustness addresses a key practical gap in the field.
major comments (3)
- [Abstract] Abstract: The central claim that the MBRL algorithm 'significantly improves the efficiency and performance of gait control policies' and produces a 'robust and adaptable' policy supplies no quantitative metrics, error bars, comparison tables, or validation procedures. This absence makes it impossible to judge whether the data support the stated improvements over benchmarks.
- [Data-driven model training] Data-driven model training section: No multi-step prediction RMSE, N-step error, or other quantitative fidelity metrics are reported for the learned model on held-out physical trajectories after state-space restriction. This is load-bearing for the claim of successful sim-to-real transfer, as model error growth over the planning horizon would invalidate the reported efficiency gains and robustness.
- [Results] Results and evaluation: The manuscript provides no sim-to-real gap measurements, hardware performance numbers, or statistical comparisons to benchmarks. Without these, the assertions of improved locomotion efficiency and adaptability to deformable morphology cannot be evaluated.
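The multi-step fidelity metric requested in the second major comment can be sketched as an open-loop rollout error. This is an illustrative assumption about how such a metric might be computed, not the authors' evaluation code: `n_step_rmse`, the list-of-floats state format, and the one-dimensional toy system are all hypothetical.

```python
import math


def n_step_rmse(model, states, actions, n):
    """Open-loop n-step RMSE over every valid start index of one trajectory.

    `states` holds len(actions) + 1 recorded states (lists of floats).
    The model is fed its own predictions for n steps, then compared with
    the recorded state n steps ahead.
    """
    squared_error, count = 0.0, 0
    for start in range(len(actions) - n + 1):
        prediction = list(states[start])
        for k in range(n):
            prediction = model(prediction, actions[start + k])
        target = states[start + n]
        squared_error += sum((p - t) ** 2 for p, t in zip(prediction, target))
        count += len(target)
    if count == 0:
        raise ValueError("trajectory is shorter than the requested horizon")
    return math.sqrt(squared_error / count)


# Toy held-out trajectory from the true system x' = x + a.
actions = [1.0, 1.0, 1.0, 1.0]
states = [[0.0], [1.0], [2.0], [3.0], [4.0]]


def biased_model(state, action):
    """Model with a constant bias of 0.1 per step."""
    return [state[0] + action + 0.1]


for horizon in (1, 2, 3):
    print(horizon, round(n_step_rmse(biased_model, states, actions, horizon), 6))
```

In this toy case the error grows linearly with the horizon (about 0.1 per step), which is the kind of error growth over the planning horizon the referee asks the authors to quantify.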
minor comments (1)
- [Abstract] The abstract would be clearer if it included at least one key quantitative result (e.g., percentage improvement or success rate) to ground the significance claims.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments correctly identify areas where quantitative support can be strengthened. We address each major comment below.
- Referee: [Abstract] Abstract: The central claim that the MBRL algorithm 'significantly improves the efficiency and performance of gait control policies' and produces a 'robust and adaptable' policy supplies no quantitative metrics, error bars, comparison tables, or validation procedures. This absence makes it impossible to judge whether the data support the stated improvements over benchmarks.
Authors: We agree that the abstract would be strengthened by quantitative support. In the revised manuscript we will update the abstract to include key quantitative results from the evaluations, such as specific performance improvements over benchmarks, along with references to error bars and validation procedures.
Revision: yes
- Referee: [Data-driven model training] Data-driven model training section: No multi-step prediction RMSE, N-step error, or other quantitative fidelity metrics are reported for the learned model on held-out physical trajectories after state-space restriction. This is load-bearing for the claim of successful sim-to-real transfer, as model error growth over the planning horizon would invalidate the reported efficiency gains and robustness.
Authors: This observation is correct. Although the training procedure is described, explicit multi-step prediction metrics were not reported. We will add multi-step RMSE, N-step errors, and other fidelity metrics evaluated on held-out trajectories in the revised data-driven model training section.
Revision: yes
- Referee: [Results] Results and evaluation: The manuscript provides no sim-to-real gap measurements, hardware performance numbers, or statistical comparisons to benchmarks. Without these, the assertions of improved locomotion efficiency and adaptability to deformable morphology cannot be evaluated.
Authors: The results section contains simulation-based comparisons to benchmarks. We will add statistical comparisons (including error bars) in revision. However, the work is simulation-based and does not include physical robot experiments, so we cannot supply hardware performance numbers or measured sim-to-real gaps.
Revision: partial
Unresolved: hardware performance numbers and measured sim-to-real gap values, because the study reports simulation results only.
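A statistical comparison with error bars, of the kind the rebuttal promises, could take a form like the sketch below. It is an assumption, not the authors' analysis: the per-seed speeds are placeholder numbers, and Welch's t statistic is chosen here only as one common option for comparing two groups with unequal variances.

```python
import math
import statistics


def mean_and_sem(samples):
    """Sample mean and standard error of the mean (needs >= 2 samples)."""
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, sem


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    mean_a, sem_a = mean_and_sem(a)
    mean_b, sem_b = mean_and_sem(b)
    return (mean_a - mean_b) / math.sqrt(sem_a ** 2 + sem_b ** 2)


# Placeholder per-seed forward speeds (made up for illustration).
proposed = [0.12, 0.13, 0.11, 0.14, 0.12]
benchmark = [0.10, 0.09, 0.11, 0.10, 0.10]

for label, runs in (("proposed", proposed), ("benchmark", benchmark)):
    mean, sem = mean_and_sem(runs)
    print(f"{label}: {mean:.3f} +/- {sem:.3f} (mean +/- SEM, n={len(runs)})")
print(f"Welch t = {welch_t(proposed, benchmark):.2f}")
```

Reporting the number of seeds alongside the mean and error bar lets readers judge whether a claimed improvement exceeds run-to-run variation.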
Circularity Check
No circularity: empirical benchmark comparison on physical hardware
full rationale
The paper presents a multi-stage empirical pipeline (state-space restriction, data-driven model training, MBRL policy optimization, post-training) whose performance claims are evaluated by direct comparison against benchmark methods on the physical soft quadruped. No equations, fitted parameters, or self-citations are shown to reduce the reported efficiency gains or robustness claims to quantities defined by the authors' own inputs by construction. The derivation chain is therefore self-contained against external benchmarks.