
arxiv: 2606.18625 · v1 · pith:WNWEG7RJnew · submitted 2026-06-17 · 💻 cs.RO

SRL: Combining SLIP Model and Reinforcement Learning for Agile Robotic Jumping

Pith reviewed 2026-06-26 21:13 UTC · model grok-4.3

classification 💻 cs.RO
keywords robotic jumping · SLIP model · reinforcement learning · hybrid control · bipedal locomotion · quadrupedal locomotion · sim-to-real transfer

The pith

A hybrid controller fuses the SLIP spring-mass model with reinforcement learning to produce stable robotic jumps on irregular terrain after far less training than pure RL.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Spring-loaded Reinforcement Learning (SRL), in which the SLIP model supplies a feedforward trajectory and an RL policy provides real-time correction. Because the SLIP component encodes biologically plausible hopping dynamics, the RL agent explores a narrower space and converges faster, while still adapting when SLIP's contact or joint assumptions break down on stairs or uneven ground. Bipedal and quadrupedal simulations, together with sim-to-sim and sim-to-real transfers, show an average position tracking error below 0.1 m and velocity tracking errors within ±3% of target values. This matters because embedding simplified physics inside learning loops may cut the data hunger that currently limits agile locomotion in search-and-rescue or logistics robots.

Core claim

SRL integrates SLIP-based feedforward control signals with RL-driven real-time feedback, enabling continuous optimization of robotic jumping that yields more stable performance with substantially reduced training time relative to baseline RL methods.

What carries the argument

The SRL hybrid adds SLIP-derived feedforward signals to an RL policy, so the policy learns only the residual corrections needed on real terrain.
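The feedforward-plus-residual structure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's controller: the linear spring law, the parameter values (`rest_length`, `stiffness`, `residual_limit`), and the names `slip_leg_force`, `srl_command`, and `zero_policy` are all hypothetical, and the learned policy is stood in for by an arbitrary callable.

```python
from typing import Callable, Sequence


def slip_leg_force(leg_length: float,
                   rest_length: float = 0.5,
                   stiffness: float = 2000.0) -> float:
    """Hypothetical SLIP stance force: a linear spring that only pushes when compressed."""
    compression = rest_length - leg_length
    return stiffness * max(compression, 0.0)


def srl_command(leg_length: float,
                observation: Sequence[float],
                residual_policy: Callable[[Sequence[float]], float],
                residual_limit: float = 50.0) -> float:
    """Sum the SLIP feedforward term and a bounded learned correction."""
    feedforward = slip_leg_force(leg_length)
    residual = residual_policy(observation)
    # Clip the correction so the learned term cannot swamp the template (assumed design choice).
    residual = max(-residual_limit, min(residual_limit, residual))
    return feedforward + residual


def zero_policy(observation: Sequence[float]) -> float:
    """Placeholder for an untrained policy: no correction at all."""
    return 0.0


if __name__ == "__main__":
    # With a zero residual the command reduces to the pure SLIP spring force.
    print(srl_command(0.45, [0.0], zero_policy))
```

Bounding the residual is one way to keep the template dominant early in training; whether the paper clips, scales, or leaves the correction unconstrained is not stated on this page.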

If this is right

  • SRL produces stable jumps on stairs and uneven ground where pure SLIP fails and pure RL trains slowly.
  • Position and velocity tracking remain within the stated error bounds across bipedal and quadrupedal morphologies.
  • Sim-to-real transfer succeeds without additional retraining beyond the reported protocol.
  • The same hybrid pattern could shorten training for other periodic locomotion tasks once a suitable template model exists.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method may extend to running or bounding gaits if a suitable template model replaces SLIP.
  • Hardware implementations could test whether the feedforward term still helps when actuator delays or sensor noise exceed simulation levels.
  • If the SLIP template is replaced by a different low-dimensional model, the same training-time reduction might appear in other domains such as manipulation.

Load-bearing premise

The idealized SLIP contact and joint assumptions stay close enough to real robot dynamics that the feedforward signal does not systematically mislead the RL policy on irregular terrain.

What would settle it

Real-robot trials on highly irregular surfaces that produce either an average position tracking error above 0.1 m or training times comparable to unguided RL would falsify the performance claim.
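A check of logged trajectories against the two stated bounds might look like the sketch below. The 0.1 m and ±3% thresholds come from the text; everything else is an assumption made for illustration, including the function names and the choice to average errors over time steps, since the page does not say how the paper aggregates them.

```python
from statistics import fmean
from typing import Sequence

POSITION_BOUND_M = 0.1     # average position tracking error bound quoted in the text
VELOCITY_BOUND_REL = 0.03  # +/-3% velocity tracking error bound quoted in the text


def mean_abs_error(target: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute difference between two equally long, non-empty series."""
    if len(target) != len(actual) or len(target) == 0:
        raise ValueError("series must be non-empty and of equal length")
    return fmean([abs(t - a) for t, a in zip(target, actual)])


def mean_relative_error(target: Sequence[float], actual: Sequence[float]) -> float:
    """Mean of |actual - target| / |target|; targets must be non-zero."""
    if len(target) != len(actual) or len(target) == 0:
        raise ValueError("series must be non-empty and of equal length")
    if any(t == 0 for t in target):
        raise ValueError("relative error is undefined for a zero target")
    return fmean([abs(a - t) / abs(t) for t, a in zip(target, actual)])


def within_reported_bounds(pos_target: Sequence[float], pos_actual: Sequence[float],
                           vel_target: Sequence[float], vel_actual: Sequence[float]) -> bool:
    """True when both averaged errors stay inside the quoted bounds."""
    return (mean_abs_error(pos_target, pos_actual) < POSITION_BOUND_M
            and mean_relative_error(vel_target, vel_actual) <= VELOCITY_BOUND_REL)
```

A trial for which `within_reported_bounds` returns `False` would count against the tracking claim under this particular aggregation; a per-step maximum would be a stricter alternative.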

Figures

Figures reproduced from arXiv: 2606.18625 by Chenyue Shao, Linqi Ye, Qingdu Li, Rankun Li, Xiaowen Hu, Yan Peng, Yudi Zhu.

Figure 1. Overview of the SRL control framework for robot jumping tasks, integrating the SLIP model, RL, and simulation environment. SRL combines the physically grounded motion dynamics of the SLIP model with the adaptability of RL to optimize jumping performance in complex environments such as flat ground with varying disturbances, stairs, and boxes.
Figure 2. A six-state FSM constructed based on the SLIP model, where each state represents a key stage in the jump cycle.
Figure 3. Left: Reward evolution for the biped robot’s random-distance jump; Right: Learning efficiency comparison of RL-only and SRL in fixed-distance jumping.
Figure 4. Performance of the biped robot during fixed-distance jumping, showing trunk velocity, absolute tracking errors of the trunk position, and relative tracking errors of the ankle.
Figure 5. Performance of the quadruped robot during fixed-distance and random-distance jumping tasks, showing trunk velocity, absolute tracking errors of the trunk position, and relative tracking errors of the foot.
Figure 6. Fixed-distance jumping simulations of the biped and quadruped robots. Orange and green denote the target and actual states, respectively.
Figure 7. Simulation of the biped (left) and quadruped (right) robot in random-distance jumping tasks, illustrating the motion trajectories of the body and ankle/foot.
Figure 8. Performance of the quadruped robot during the box-jumping experiment. The left shows absolute body position errors and relative foot errors, while the right shows the height of the body and feet.
Figure 9. Simulation of the quadruped robot during the box-jumping experiment.
Figure 10. The X02-lite humanoid robot (Shanghai Droid Robotics) used in the experiment, featuring a height of 168 cm and a total mass of 28 kg. (a) Real robot. (b) Mujoco model. (c) Simplified link model (focusing on the 10 DOF legs).
Figure 11. Performance comparison of the training policy in simulation (Isaac Gym and Mujoco) and real-world experiments, demonstrating successful transfer. The robot achieved a maximum jump height of 15 cm, distance of 20 cm, and peak velocity of 2 m/s.
Figure 12. Average body height comparison over one gait cycle: reference vs. simulation vs. real-world.
Original abstract

Robotic jumping is pivotal in applications such as search and rescue and logistics, where crossing obstacles and enhancing mobility efficiency are critical. The Spring-Loaded Inverted Pendulum (SLIP) model leverages simplified spring-mass dynamics that naturally encode biologically plausible hopping motions, yet its performance degrades on irregular terrain due to idealized assumptions regarding contact and joint dynamics. Meanwhile, Reinforcement Learning (RL) can adapt to diverse and complex environments but often requires extensive data from unguided exploration. The complementary strengths of SLIP's physically grounded baseline and RL's adaptive capabilities motivate a hybrid framework that overcomes these individual limitations. We therefore propose Spring-loaded Reinforcement Learning (SRL), which integrates SLIP-based feedforward control signals with RL-driven real-time feedback, enabling continuous optimization of robotic jumping. Experimental results demonstrate that SRL can achieve more stable jumps with much less training time than the baseline method, maintaining an average position tracking error below 0.1 m and velocity tracking errors within +/-3% of the target values. Through bipedal and quadrupedal simulations of ground and stair jumping, as well as sim-to-sim and sim-to-real validations, SRL exhibits robust adaptability to various task requirements and environmental complexities, underscoring its potential for real-world deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes SRL, a hybrid controller that augments RL policies with feedforward signals derived from the SLIP model for bipedal and quadrupedal jumping tasks. It claims that the combination yields more stable jumps, substantially shorter training times than pure RL baselines, position tracking error below 0.1 m, and velocity tracking errors within ±3 % of targets, supported by ground/stair simulations plus sim-to-sim and sim-to-real transfer.

Significance. If the performance gap is reproducible and the SLIP feedforward remains beneficial rather than harmful under terrain mismatch, the work would provide concrete evidence that model-based priors can reduce sample complexity in agile locomotion without sacrificing adaptability. The sim-to-real results would be a useful data point for hybrid control in robotics.

major comments (3)
  1. [§4, §5] §4 (Method) and §5 (Experiments): the central claim that SRL reduces training time while improving tracking rests on the assumption that SLIP-derived feedforward remains a net-positive signal on irregular terrain. The manuscript notes SLIP degradation on irregular surfaces yet provides no quantitative ablation measuring how large a mismatch between SLIP contact/joint assumptions and robot dynamics can be tolerated before the hybrid policy underperforms the pure-RL baseline.
  2. [Table 2, Figure 7] Table 2 and Figure 7: the reported position error <0.1 m and velocity error ±3 % are given without error bars, number of random seeds, or statistical tests against the baseline. It is therefore impossible to determine whether the observed gap is statistically reliable or sensitive to hyper-parameter choices.
  3. [§5.3] §5.3 (Sim-to-real): the sim-to-real validation uses only a single terrain type and a limited set of initial conditions. No systematic stress test (e.g., added sensor noise, mass variation, or stair height outside the training distribution) is reported to probe whether the SLIP prior introduces persistent bias when dynamics deviate.
minor comments (2)
  1. [Abstract, §4] The abstract states quantitative results but the methods section does not specify the exact RL algorithm, network architecture, or reward weights used for the baseline comparison.
  2. [§3] Notation for the SLIP feedforward torque and the RL policy output is introduced without an explicit equation linking the two signals (e.g., total torque = τ_SLIP + π_RL(s)).

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the constructive feedback. We address each major comment below and indicate the changes we will make to strengthen the manuscript.

  1. Referee: [§4, §5] §4 (Method) and §5 (Experiments): the central claim that SRL reduces training time while improving tracking rests on the assumption that SLIP-derived feedforward remains a net-positive signal on irregular terrain. The manuscript notes SLIP degradation on irregular surfaces yet provides no quantitative ablation measuring how large a mismatch between SLIP contact/joint assumptions and robot dynamics can be tolerated before the hybrid policy underperforms the pure-RL baseline.

    Authors: We agree that a quantitative ablation on mismatch tolerance would strengthen the central claim. The revised manuscript will add an ablation study that varies terrain irregularity and reports the SRL vs. pure-RL performance gap as a function of mismatch severity, identifying the point at which the SLIP prior ceases to be net-positive. revision: yes

  2. Referee: [Table 2, Figure 7] Table 2 and Figure 7: the reported position error <0.1 m and velocity error ±3 % are given without error bars, number of random seeds, or statistical tests against the baseline. It is therefore impossible to determine whether the observed gap is statistically reliable or sensitive to hyper-parameter choices.

    Authors: We will rerun the experiments with at least five random seeds, add error bars to Table 2 and Figure 7, and include statistical tests (e.g., paired t-tests) against the baseline in the revised version. revision: yes

  3. Referee: [§5.3] §5.3 (Sim-to-real): the sim-to-real validation uses only a single terrain type and a limited set of initial conditions. No systematic stress test (e.g., added sensor noise, mass variation, or stair height outside the training distribution) is reported to probe whether the SLIP prior introduces persistent bias when dynamics deviate.

    Authors: The existing sim-to-real results were intended as an initial proof of concept on representative hardware. In revision we will expand the section with additional simulation-based stress tests that include sensor noise, mass variation, and out-of-distribution stair heights to assess potential bias from the SLIP prior. revision: partial
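The paired comparison mentioned in the second response can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name `paired_t_statistic` and the per-seed numbers are hypothetical placeholders, and only the t statistic and degrees of freedom are computed. A p-value would additionally need the t distribution, which `scipy.stats.ttest_rel` provides.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(srl: Sequence[float],
                       baseline: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for scores paired by random seed."""
    if len(srl) != len(baseline) or len(srl) < 2:
        raise ValueError("need at least two paired seeds of equal count")
    diffs = [s - b for s, b in zip(srl, baseline)]
    spread = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if spread == 0.0:
        raise ValueError("all paired differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (spread / sqrt(n)), n - 1


if __name__ == "__main__":
    # Hypothetical per-seed position errors in metres, NOT values from the paper.
    srl_errors = [0.08, 0.07, 0.09, 0.06, 0.08]
    rl_errors = [0.12, 0.10, 0.15, 0.09, 0.13]
    t, dof = paired_t_statistic(srl_errors, rl_errors)
    print(f"t = {t:.2f} with {dof} degrees of freedom")
```

Pairing by seed assumes that each SRL run and its baseline share the same seed and evaluation conditions; if the runs are independent, an unpaired test would be the more appropriate choice.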

Circularity Check

0 steps flagged

No circularity: empirical hybrid method with independent validation


The paper proposes SRL as an integration of the standard SLIP model (external, not derived here) for feedforward with RL for feedback, then reports simulation and real-robot experimental outcomes on tracking error and training time. No equations, parameters, or uniqueness claims are presented that reduce by construction to fitted inputs, self-definitions, or self-citation chains; the performance claims rest on direct empirical comparison to a pure-RL baseline rather than any internal derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract supplies no explicit free parameters, axioms, or invented entities; the hybrid rests on the unstated premise that SLIP dynamics remain useful as a feedforward prior.


