Optimal Gait Control for a Tendon-driven Soft Quadruped Robot by Model-based Reinforcement Learning

Kaige Tan; Lei Feng; Xuezhi Niu

arXiv: 2406.07069 · v1 · submitted 2024-06-11 · 💻 cs.RO · cs.SY · eess.SY

Pith reviewed 2026-05-23 23:49 UTC · model grok-4.3

keywords: soft quadruped robot · gait control · model-based reinforcement learning · tendon-driven actuators · deformable morphology · locomotion control

The pith

Model-based reinforcement learning with post-training produces more efficient and robust gait policies for a tendon-driven soft quadruped than benchmark methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to control the walking gait of a soft quadruped robot built from four compressible tendon-driven actuators by switching from model-free reinforcement learning to a model-based version. The method first restricts the state space, trains a data-driven dynamics model on collected data, then uses that model inside the reinforcement learning loop for planning, followed by a post-training step. A sympathetic reader would care because soft robots are lighter and safer around people than rigid ones, yet their changing shape makes precise control difficult; a working model-based approach would let the robot move faster and more stably without constant retuning. The authors report that the resulting policies outperform several standard methods on efficiency and performance metrics while remaining adaptable when the robot deforms during motion.

Core claim

The proposed MBRL algorithm, combined with post-training, significantly improves the efficiency and performance of gait control policies. The developed policy is both robust and adaptable to the robot's deformable morphology.

What carries the argument

The data-driven dynamics model trained after state-space restriction, used for model-based planning inside the reinforcement learning algorithm.
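
To make the mechanism concrete, here is a minimal sketch of planning inside a learned dynamics model. It is an illustration only, not the authors' algorithm: the scalar stand-in system `true_step`, the linear model class, the search over constant action sequences, and every function name are assumptions introduced here. The paper's actual state, model architecture, and RL algorithm are not reproduced.

```python
import random

# Hypothetical scalar stand-in for the robot: s' = 0.9*s + 0.5*u.
def true_step(s, u):
    return 0.9 * s + 0.5 * u

def collect_transitions(n, seed=0):
    """Drive the stand-in system with random actions and log (s, u, s_next)."""
    rng = random.Random(seed)
    s, data = 0.0, []
    for _ in range(n):
        u = rng.uniform(-1.0, 1.0)
        s_next = true_step(s, u)
        data.append((s, u, s_next))
        s = s_next
    return data

def fit_linear_model(data):
    """Least-squares fit of s_next ~ a*s + b*u via the 2x2 normal equations."""
    sss = sum(s * s for s, u, n in data)
    ssu = sum(s * u for s, u, n in data)
    suu = sum(u * u for s, u, n in data)
    ssn = sum(s * n for s, u, n in data)
    sun = sum(u * n for s, u, n in data)
    det = sss * suu - ssu * ssu
    a = (ssn * suu - sun * ssu) / det
    b = (sss * sun - ssu * ssn) / det
    return a, b

def plan_first_action(model, state, target, horizon=5):
    """Evaluate constant action sequences in the learned model only
    and return the action with the lowest accumulated tracking cost."""
    a, b = model
    best_cost, best_u = float("inf"), 0.0
    for i in range(-10, 11):          # candidate actions -1.0, -0.9, ..., 1.0
        u = i / 10.0
        s, cost = state, 0.0
        for _ in range(horizon):
            s = a * s + b * u         # imagined step, no real interaction
            cost += (s - target) ** 2
        if cost < best_cost:
            best_cost, best_u = cost, u
    return best_u

model = fit_linear_model(collect_transitions(200))
action = plan_first_action(model, state=0.0, target=1.0)
print(model, action)
```

With noiseless data the fit recovers the stand-in coefficients, and the planner selects a positive action that drives the modeled state toward the target. Any error in the fitted model would carry straight into the chosen action, which is why the accuracy of the learned model matters so much for this kind of controller.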

If this is right

Gait control policies reach higher efficiency and performance than the benchmark methods tested.
The same policy remains effective when the robot's body deforms during locomotion.
The controller demonstrates practical use on real hardware after the post-training stage.
The multi-stage process of state restriction, model learning, and MBRL planning produces policies that transfer without major additional tuning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same state-space restriction step could shorten training time for controllers on other soft robots whose actuators have similar compressible dynamics.
If the model stays accurate across different payloads or surfaces, the approach would support deployment in varied outdoor settings without retraining from scratch.
Robots that must change shape mid-task, such as for navigation through narrow gaps, might inherit the same adaptability shown here.

Load-bearing premise

The learned data-driven model accurately captures how the tendon-driven actuators and the robot's changing shape behave so that plans made in the model still work when transferred to the physical robot.

What would settle it

Deploy the final policy on the physical robot and measure whether forward speed, stability, and success rate match the values predicted by the model; a large drop would falsify the claim that the model supports transferable optimal control.
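
A check of this kind could be scored as sketched below. The function names, the tolerance, and the speed values are placeholders chosen for illustration; none of them are taken from the paper.

```python
def relative_gap(predicted, measured):
    """Relative deviation of a hardware measurement from the model prediction."""
    if predicted == 0:
        raise ValueError("predicted value must be non-zero")
    return abs(measured - predicted) / abs(predicted)

def transfer_report(trials, tolerance):
    """Summarize (predicted, measured) pairs against an acceptance tolerance."""
    gaps = [relative_gap(p, m) for p, m in trials]
    return {
        "mean_gap": sum(gaps) / len(gaps),
        "worst_gap": max(gaps),
        "within_tolerance": max(gaps) <= tolerance,
    }

# Placeholder (predicted, measured) forward speeds in m/s; not data from the paper.
example_trials = [(0.20, 0.18), (0.20, 0.15), (0.10, 0.09)]
report = transfer_report(example_trials, tolerance=0.2)
print(report)
```

Judging by the worst trial rather than the mean is a deliberately strict choice here: a single large drop in speed is enough to question whether the model supports transfer.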

Figures

Figures reproduced from arXiv: 2406.07069 by Kaige Tan, Lei Feng, Xuezhi Niu.

**Figure 2.** Overview of SoftQ and CTSA: (a) Rendered robot with key …

**Figure 3.** Expert gait design, solid lines for FL and RR pairs, dashed …

**Figure 4.** Evaluation of the surrogate model accuracy with varying …

**Figure 6.** Resultant forward walking speed in simulation for expert …

**Figure 5.** The training results in 0.2 m/s reference speed. (a) Cumulative reward with training episodes. Variations in (b) entropy and (c) temperature during the training process.

**Figure 8.** Control architecture.

**Figure 9.** Field test results captured in video frames.

**Figure 10.** Comparison of speeds in real test and simulation.

Abstract

This study presents an innovative approach to optimal gait control for a soft quadruped robot enabled by four Compressible Tendon-driven Soft Actuators (CTSAs). Improving our previous studies of using model-free reinforcement learning for gait control, we employ model-based reinforcement learning (MBRL) to further enhance the performance of the gait controller. Compared to rigid robots, the proposed soft quadruped robot has better safety, less weight, and a simpler mechanism for fabrication and control. However, the primary challenge lies in developing sophisticated control algorithms to attain optimal gait control for fast and stable locomotion. The research employs a multi-stage methodology, including state space restriction, data-driven model training, and reinforcement learning algorithm development. Compared to benchmark methods, the proposed MBRL algorithm, combined with post-training, significantly improves the efficiency and performance of gait control policies. The developed policy is both robust and adaptable to the robot's deformable morphology. The study concludes by highlighting the practical applicability of these findings in real-world scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper proposes a multi-stage methodology of state-space restriction, data-driven model training, and model-based reinforcement learning (MBRL) with post-training to achieve optimal gait control for a tendon-driven soft quadruped robot using four compressible tendon-driven soft actuators (CTSAs). It claims that this approach, relative to benchmark methods, significantly improves efficiency and performance of gait policies while yielding robustness and adaptability to the robot's deformable morphology, with practical real-world applicability.

Significance. If the empirical claims hold with proper validation, the work could advance control methods for soft robots by showing how MBRL combined with dimensionality reduction can address nonlinear actuator dynamics, offering advantages in safety and fabrication simplicity over rigid platforms. The focus on sim-to-real transfer and morphology robustness addresses a key practical gap in the field.

major comments (3)

[Abstract] The central claim that the MBRL algorithm 'significantly improves the efficiency and performance of gait control policies' and produces a 'robust and adaptable' policy is stated without quantitative metrics, error bars, comparison tables, or validation procedures. This absence makes it impossible to judge whether the data support the stated improvements over benchmarks.
[Data-driven model training] No multi-step prediction RMSE, N-step error, or other quantitative fidelity metrics are reported for the learned model on held-out physical trajectories after state-space restriction. This is load-bearing for the claim of successful sim-to-real transfer, as model error growth over the planning horizon would invalidate the reported efficiency gains and robustness.
[Results and evaluation] The manuscript provides no sim-to-real gap measurements, hardware performance numbers, or statistical comparisons to benchmarks. Without these, the assertions of improved locomotion efficiency and adaptability to deformable morphology cannot be evaluated.
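
The multi-step fidelity metric requested in the second major comment could be computed roughly as follows. This is a sketch under stated assumptions: `n_step_rmse`, the scalar reference system, and the deliberately biased model are hypothetical stand-ins, not the paper's model or data.

```python
import math

def n_step_rmse(model_step, states, actions, horizon):
    """RMSE of the model's open-loop prediction `horizon` steps ahead.

    `states` holds len(actions) + 1 logged states. From every valid start t
    the model is rolled forward with the logged actions, without correction,
    and its final prediction is compared with states[t + horizon].
    """
    sq_errors = []
    for t in range(len(actions) - horizon + 1):
        s = states[t]
        for k in range(horizon):
            s = model_step(s, actions[t + k])
        sq_errors.append((s - states[t + horizon]) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

# Hypothetical "ground truth" used to generate a held-out trajectory.
def reference_step(s, u):
    return 0.9 * s + 0.5 * u

# Hypothetical learned model with a small error in the state coefficient.
def biased_model(s, u):
    return 0.8 * s + 0.5 * u

actions = [1.0] * 10
states = [0.0]
for u in actions:
    states.append(reference_step(states[-1], u))

rmse_1 = n_step_rmse(biased_model, states, actions, horizon=1)
rmse_3 = n_step_rmse(biased_model, states, actions, horizon=3)
rmse_exact = n_step_rmse(reference_step, states, actions, horizon=3)
print(rmse_1, rmse_3, rmse_exact)
```

In this toy case the one-step error looks modest while the three-step error is clearly larger, which is the error growth over the planning horizon that the comment is concerned about.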

minor comments (1)

[Abstract] The abstract would be clearer if it included at least one key quantitative result (e.g., percentage improvement or success rate) to ground the significance claims.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive feedback. The comments correctly identify areas where quantitative support can be strengthened. We address each major comment below.

Referee: [Abstract] The central claim that the MBRL algorithm 'significantly improves the efficiency and performance of gait control policies' and produces a 'robust and adaptable' policy is stated without quantitative metrics, error bars, comparison tables, or validation procedures. This absence makes it impossible to judge whether the data support the stated improvements over benchmarks.

Authors: We agree that the abstract would be strengthened by quantitative support. In the revised manuscript we will update the abstract to include key quantitative results from the evaluations, such as specific performance improvements over benchmarks, along with references to error bars and validation procedures.

revision: yes

Referee: [Data-driven model training] No multi-step prediction RMSE, N-step error, or other quantitative fidelity metrics are reported for the learned model on held-out physical trajectories after state-space restriction. This is load-bearing for the claim of successful sim-to-real transfer, as model error growth over the planning horizon would invalidate the reported efficiency gains and robustness.

Authors: This observation is correct. Although the training procedure is described, explicit multi-step prediction metrics were not reported. We will add multi-step RMSE, N-step errors, and other fidelity metrics evaluated on held-out trajectories in the revised data-driven model training section.

revision: yes

Referee: [Results and evaluation] The manuscript provides no sim-to-real gap measurements, hardware performance numbers, or statistical comparisons to benchmarks. Without these, the assertions of improved locomotion efficiency and adaptability to deformable morphology cannot be evaluated.

Authors: The results section contains simulation-based comparisons to benchmarks. We will add statistical comparisons (including error bars) in revision. However, the work is simulation-based and does not include physical robot experiments, so we cannot supply hardware performance numbers or measured sim-to-real gaps.

revision: partial

standing simulated objections not resolved

Hardware performance numbers and measured sim-to-real gap values, because the study reports simulation results only.

Circularity Check

0 steps flagged

No circularity: empirical benchmark comparison on physical hardware

full rationale

The paper presents a multi-stage empirical pipeline (state-space restriction, data-driven model training, MBRL policy optimization, post-training) whose performance claims are evaluated by direct comparison against benchmark methods on the physical soft quadruped. No equations, fitted parameters, or self-citations are shown to reduce the reported efficiency gains or robustness claims to quantities defined by the authors' own inputs by construction. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract supplies no explicit free parameters, mathematical axioms, or newly postulated entities. The central claim rests on standard reinforcement-learning concepts and the unstated premise that a learned forward model will be sufficiently accurate for planning.

[1] [1]

S. Choi, G. Ji, J. Park, H. Kim, J. Mun, J. H. Lee, J. Hwangbo, Learning quadrupedal locomotion on de- formable terrain, Science Robotics 8 (74) (2023) eade2256. doi:10.1126/scirobotics.ade2256. URL https://www.science.org/doi/abs/10.1126/ scirobotics.ade2256

work page doi:10.1126/scirobotics.ade2256 2023

[2] [2]

Bledt, M

G. Bledt, M. J. Powell, B. Katz, J. Di Carlo, P. M. Wensing, S. Kim, MIT Cheetah 3: Design and Control of a Robust, Dy- namic Quadruped Robot, in: 2018 IEEE/RSJ International Con- ference on Intelligent Robots and Systems (IROS), 2018, pp. 2245–2252. doi:10.1109/IROS.2018.8593885

work page doi:10.1109/iros.2018.8593885 2018

[3] [3]

Hutter, C

M. Hutter, C. Gehring, D. Jud, A. Lauber, C. D. Bellicoso, V . Tsounis, J. Hwangbo, K. Bodie, P. Fankhauser, M. Bloesch, R. Diethelm, S. Bachmann, A. Melzer, M. Hoepflinger, ANY- mal - a highly mobile and dynamic quadrupedal robot, in: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016, pp. 38–44.doi:10.1109/IROS.2016. 7...

work page doi:10.1109/iros.2016 2016

[4] [4]

Taheri, N

H. Taheri, N. Mozayani, A study on quadruped mobile robots, Mechanism and Machine Theory 190 (2023) 105448. doi:10.1016/j.mechmachtheory.2023.105448. URL https://www.sciencedirect.com/science/ article/pii/S0094114X23002197

work page doi:10.1016/j.mechmachtheory.2023.105448 2023

[5] [5]

O. Yasa, Y . Toshimitsu, M. Y . Michelis, L. S. Jones, M. Fil- ippi, T. Buchner, R. K. Katzschmann, An Overview of Soft Robotics, Annual Review of Control, Robotics, and Autonomous Systems 6 (V olume 6, 2023) (2023) 1–29. doi:10.1146/annurev-control-062322-100607 . URL https://www.annualreviews.org/content/ journals/10.1146/annurev-control-062322-100607

work page doi:10.1146/annurev-control-062322-100607 2023

[6] [6]

Drotman, S

D. Drotman, S. Jadhav, M. Karimi, P. de Zonia, M. T. Tolley, 3D printed soft actuators for a legged robot capable of navigating unstructured terrain, in: 2017 IEEE International Conference on Robotics and Automation (ICRA), 2017, pp. 5532–5538. doi: 10.1109/ICRA.2017.7989652

work page doi:10.1109/icra.2017.7989652 2017

[7] [7]

Q. Ji, S. Fu, L. Feng, G. Andrikopoulos, X. V . Wang, L. Wang, Omnidirectional walking of a quadruped robot enabled by com- pressible tendon-driven soft actuators, in: 2022 IEEE/RSJ Inter- national Conference on Intelligent Robots and Systems (IROS), 2022, pp. 11015–11022. doi:10.1109/IROS47612.2022. 9981314

work page doi:10.1109/iros47612.2022 2022

[8] [8]

J. Wang, A. Chortos, Control Strategies for Soft Robot Sys- tems, Advanced Intelligent Systems 4 (5) (2022) 2100165.doi: 10.1002/aisy.202100165. URL https://onlinelibrary.wiley.com/doi/abs/10. 1002/aisy.202100165

work page doi:10.1002/aisy.202100165 2022

[9] [9]

D. Rus, M. T. Tolley, Design, fabrication and control of soft robots, Nature 521 (7553) (2015) 467–475. doi:10.1038/ nature14543. URL https://www.nature.com/articles/nature14543

work page 2015

[10] [10]

Fahmi, C

S. Fahmi, C. Mastalli, M. Focchi, C. Semini, Pas- sive Whole-Body Control for Quadruped Robots: Ex- perimental Validation Over Challenging Terrain, IEEE Robotics and Automation Letters 4 (3) (2019) 2553–2560. doi:10.1109/LRA.2019.2908502. URL https://ieeexplore.ieee.org/abstract/ document/8678400/authors#authors

work page doi:10.1109/lra.2019.2908502 2019

[11] [11]

TartanAir: A dataset to push the limits of visual SLAM,

T. Dudzik, M. Chignoli, G. Bledt, B. Lim, A. Miller, D. Kim, S. Kim, Robust Autonomous Navigation of a Small-Scale Quadruped Robot in Real-World Environments, in: 2020 IEEE /RSJ International Conference on Intelli- gent Robots and Systems (IROS), 2020, pp. 3664–3671. doi:10.1109/IROS45743.2020.9340701. URL https://ieeexplore.ieee.org/abstract/ document/9340701

work page doi:10.1109/iros45743.2020.9340701 2020

[12] [12]

Sleiman, F

J.-P. Sleiman, F. Farshidian, M. V . Minniti, M. Hutter, A Unified MPC Framework for Whole-Body Dynamic Locomotion and Manipulation, IEEE Robotics and Automation Letters 6 (3) (2021) 4688–4695. doi:10.1109/LRA.2021.3068908. URL https://ieeexplore.ieee.org/abstract/ document/9387121

work page doi:10.1109/lra.2021.3068908 2021

[13] [13]

Di Carlo, P

J. Di Carlo, P. M. Wensing, B. Katz, G. Bledt, S. Kim, Dynamic Locomotion in the MIT Cheetah 3 Through Convex Model- Predictive Control, in: 2018 IEEE /RSJ International Confer- ence on Intelligent Robots and Systems (IROS), 2018, pp. 1–9. doi:10.1109/IROS.2018.8594448

work page doi:10.1109/iros.2018.8594448 2018

[14] [14]

Ponton, M

B. Ponton, M. Khadiv, A. Meduri, L. Righetti, E fficient Multicontact Pattern Generation With Sequential Con- vex Approximations of the Centroidal Dynamics, IEEE Transactions on Robotics 37 (5) (2021) 1661–1679. doi:10.1109/TRO.2020.3048125. URL https://ieeexplore.ieee.org/abstract/ document/9350175

work page doi:10.1109/tro.2020.3048125 2021

[15] [15]

T. Miki, J. Lee, J. Hwangbo, L. Wellhausen, V . Koltun, M. Hut- ter, Learning robust perceptive locomotion for quadrupedal robots in the wild, Science Robotics 7 (62) (2022) eabk2822. doi:10.1126/scirobotics.abk2822. URL https://www.science.org/doi/10.1126/ scirobotics.abk2822

work page doi:10.1126/scirobotics.abk2822 2022

[16]

V. Tsounis, M. Alge, J. Lee, F. Farshidian, M. Hutter, DeepGait: Planning and Control of Quadrupedal Gaits Using Deep Reinforcement Learning, IEEE Robotics and Automation Letters 5 (2) (2020) 3699–3706. doi:10.1109/LRA.2020.2979660. URL https://ieeexplore.ieee.org/abstract/document/9028188

[17]

R. Morimoto, S. Nishikawa, R. Niiyama, Y. Kuniyoshi, Model-Free Reinforcement Learning with Ensemble for a Soft Continuum Robot Arm, in: 2021 IEEE 4th International Conference on Soft Robotics (RoboSoft), 2021, pp. 141–148. doi:10.1109/RoboSoft51838.2021.9479340. URL https://ieeexplore.ieee.org/abstract/document/9479340

[18]

Q. Ji, S. Fu, K. Tan, S. Thorapalli Muralidharan, K. Lagrelius, D. Danelia, G. Andrikopoulos, X. V. Wang, L. Wang, L. Feng, Synthesizing the optimal gait of a quadruped robot with soft actuators using deep reinforcement learning, Robotics and Computer-Integrated Manufacturing 78 (2022) 102382. doi:10.1016/j.rcim.2022.102382. URL https://www.sciencedirect...

[19]

T. G. Thuruthel, E. Falotico, F. Renda, C. Laschi, Model-Based Reinforcement Learning for Closed-Loop Dynamic Control of Soft Robotic Manipulators, IEEE Transactions on Robotics 35 (1) (2019) 124–134. doi:10.1109/TRO.2018.2878318. URL https://ieeexplore.ieee.org/abstract/document/8531756

[20]

M. Pei, H. An, B. Liu, C. Wang, An improved dyna-q algorithm for mobile robot path planning in unknown dynamic environment, IEEE Transactions on Systems, Man, and Cybernetics: Systems 52 (7) (2021) 4415–4425

[21]

D. Yu, W. Zou, Y. Yang, H. Ma, S. E. Li, Y. Yin, J. Chen, J. Duan, Safe model-based reinforcement learning with an uncertainty-aware reachability certificate, IEEE Transactions on Automation Science and Engineering

[22]

Z. Bing, L. Knak, L. Cheng, F. O. Morin, K. Huang, A. Knoll, Meta-reinforcement learning in nonstationary and nonparametric environments, IEEE Transactions on Neural Networks and Learning Systems

[23]

A. Ballou, X. Alameda-Pineda, C. Reinke, Variational meta reinforcement learning for social robotics, Applied Intelligence 53 (22) (2023) 27249–27268

[24]

L. Zhu, P. Peng, Z. Lu, Y. Tian, Metavim: Meta variationally intrinsic motivated reinforcement learning for decentralized traffic signal control, IEEE Transactions on Knowledge and Data Engineering 35 (11) (2023) 11570–11584

[25]

J. Li, X. Shi, J. Li, X. Zhang, J. Wang, Random curiosity-driven exploration in deep reinforcement learning, Neurocomputing 418 (2020) 139–147

[26]

Y. Jiang, J. Z. Kolter, R. Raileanu, On the importance of exploration for generalization in reinforcement learning, Advances in Neural Information Processing Systems 36

[27]

J. Ibarz, J. Tan, C. Finn, M. Kalakrishnan, P. Pastor, S. Levine, How to train your robot with deep reinforcement learning: lessons we have learned, The International Journal of Robotics Research 40 (4-5) (2021) 698–721. doi:10.1177/0278364920987859. URL https://doi.org/10.1177/0278364920987859

[28]

J. Xie, H. Dong, X. Zhao, Data-driven torque and pitch control of wind turbines via reinforcement learning, Renewable Energy 215 (2023) 118893. doi:10.1016/j.renene.2023.06.014. URL https://www.sciencedirect.com/science/article/pii/S0960148123007905

[29]

M. Cao, R. Wang, N. Chen, J. Wang, A learning-based vehicle trajectory-tracking approach for autonomous vehicles with lidar failure under various lighting conditions, IEEE/ASME Transactions on Mechatronics 27 (2) (2021) 1011–1022. doi:10.1109/TMECH.2021.3077388

[30]

B. J. Claessens, P. Vrancx, F. Ruelens, Convolutional neural networks for automatic state-time feature extraction in reinforcement learning applied to residential load control, IEEE Transactions on Smart Grid 9 (4) (2016) 3259–3269. doi:10.1109/TSG.2016.2629450

[31]

Z. C. Lipton, J. Berkowitz, C. Elkan, A critical review of recurrent neural networks for sequence learning, arXiv preprint arXiv:1506.00019. URL http://arxiv.org/abs/1506.00019

[32]

A. Sherstinsky, Fundamentals of recurrent neural network (rnn) and long short-term memory (lstm) network, Physica D: Nonlinear Phenomena 404 (2020) 132306. doi:10.1016/j.physd.2019.132306

[33]

Y. Huang, K. Xie, H. Bharadhwaj, F. Shkurti, Continual Model-Based Reinforcement Learning with Hypernetworks, in: 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 799–805. doi:10.1109/ICRA48506.2021.9560793. URL https://ieeexplore.ieee.org/abstract/document/9560793

[34]

T. Haarnoja, A. Zhou, P. Abbeel, S. Levine, Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor, in: J. Dy, A. Krause (Eds.), Proceedings of the 35th International Conference on Machine Learning, Vol. 80 of Proceedings of Machine Learning Research, PMLR, 2018, pp. 1861–1870. URL https://proceedings.mlr.press/...

[35]

T. Haarnoja, A. Zhou, K. Hartikainen, G. Tucker, S. Ha, J. Tan, V. Kumar, H. Zhu, A. Gupta, P. Abbeel, S. Levine, Soft actor-critic algorithms and applications, CoRR abs/1812.05905. arXiv:1812.05905. URL http://arxiv.org/abs/1812.05905

[36]

T. Haarnoja, V. Pong, A. Zhou, M. Dalal, P. Abbeel, S. Levine, Composable Deep Reinforcement Learning for Robotic Manipulation, in: 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 6244–6251. doi:10.1109/ICRA.2018.8460756. URL https://ieeexplore.ieee.org/abstract/document/8460756

[37]

X. Niu, Optimal Gait Control of Soft Quadruped Robot by Model-based Reinforcement Learning, Master’s thesis, Dept. Engineering Design, KTH Royal Institute of Technology, Stockholm, Sweden (2023). URL https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-339056

[38]

P. Biswal, P. K. Mohanty, Development of quadruped walking robots: A review, Ain Shams Engineering Journal 12 (2) (2021) 2017–2031. doi:10.1016/j.asej.2020.11.005. URL https://www.sciencedirect.com/science/article/pii/S2090447920302501

[39]

M. Vukobratović, J. Stepanenko, On the stability of anthropomorphic systems, Mathematical Biosciences 15 (1) (1972) 1–37. doi:10.1016/0025-5564(72)90061-2. URL https://www.sciencedirect.com/science/article/pii/0025556472900612

[40]

D. J. Hyun, S. Seok, J. Lee, S. Kim, High speed trot-running: Implementation of a hierarchical controller using proprioceptive impedance control on the MIT Cheetah, The International Journal of Robotics Research 33 (11) (2014) 1417–1445. doi:10.1177/0278364914532150. URL https://doi.org/10.1177/0278364914532150

[41]

D. Bertsekas, Reinforcement Learning and Optimal Control, Athena Scientific, Belmont, Massachusetts, 2019

[42]

Y. Shao, Y. Jin, X. Liu, W. He, H. Wang, W. Yang, Learning Free Gait Transition for Quadruped Robots Via Phase-Guided Controller, IEEE Robotics and Automation Letters 7 (2) (2022) 1230–1237. doi:10.1109/LRA.2021.3136645.