SRL: Combining SLIP Model and Reinforcement Learning for Agile Robotic Jumping
Pith reviewed 2026-06-26 21:13 UTC · model grok-4.3
The pith
A hybrid controller fuses the SLIP spring-mass model with reinforcement learning to produce stable robotic jumps on irregular terrain after far less training than pure RL.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SRL integrates SLIP-based feedforward control signals with RL-driven real-time feedback, enabling continuous optimization of robotic jumping that yields more stable performance with substantially reduced training time relative to baseline RL methods.
What carries the argument
The SRL hybrid that adds SLIP-derived feedforward signals to an RL policy so the policy learns only the residual corrections needed on real terrain.
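The review does not say how the paper combines the two signals. Below is a minimal sketch of residual learning on top of a model-based prior. It assumes a purely additive, per-joint combination with a clip on the learned term. `compose_command`, `residual_limit`, and `toy_policy` are hypothetical names, and the clip is an illustrative design choice, not a documented detail of SRL.

```python
from typing import Callable, List, Sequence


def compose_command(
    tau_slip: Sequence[float],
    residual_policy: Callable[[Sequence[float]], Sequence[float]],
    state: Sequence[float],
    residual_limit: float,
) -> List[float]:
    """Return feedforward torque plus a bounded learned residual, joint by joint.

    Assumption (not from the paper): the combination is additive and the
    residual is clipped to [-residual_limit, residual_limit].
    """
    residual = list(residual_policy(state))
    if len(residual) != len(tau_slip):
        raise ValueError("policy output and feedforward must have the same length")
    # Clipping keeps the learned correction from overwhelming the SLIP prior.
    clipped = [max(-residual_limit, min(residual_limit, r)) for r in residual]
    return [ff + r for ff, r in zip(tau_slip, clipped)]


def toy_policy(state: Sequence[float]) -> List[float]:
    # Hypothetical stand-in for a trained RL policy: a fixed linear map.
    return [0.5 * s for s in state]


print(compose_command([10.0, -4.0], toy_policy, [1.0, 20.0], residual_limit=2.0))
# → [10.5, -2.0]
```

When the residual is near zero, the command reduces to the SLIP feedforward. This is the usual intuition for why a prior like this can cut down unguided exploration.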
If this is right
- SRL produces stable jumps on stairs and uneven ground where pure SLIP fails and pure RL trains slowly.
- Position and velocity tracking stay within the reported bounds (average position error below 0.1 m, velocity error within ±3% of the target) across bipedal and quadrupedal morphologies.
- Sim-to-real transfer succeeds without additional retraining beyond the reported protocol.
- The same hybrid pattern could shorten training for other periodic locomotion tasks once a suitable template model exists.
Where Pith is reading between the lines
- The method may extend to running or bounding gaits if a suitable template model replaces SLIP.
- Hardware implementations could test whether the feedforward term still helps when actuator delays or sensor noise exceed simulation levels.
- If the SLIP template is replaced by a different low-dimensional model, the same training-time reduction might appear in other domains such as manipulation.
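The delay and noise test in the second point above can be prototyped in simulation before moving to hardware by wrapping the controller's output and input. This is a sketch under two assumptions: actuator delay is a whole number of control steps, and sensor noise is zero-mean Gaussian. The class and its parameters are hypothetical and are not part of the paper.

```python
import random
from collections import deque
from typing import List, Sequence


class PerturbedActuation:
    """Hypothetical test harness: fixed actuator latency plus Gaussian sensor noise.

    Assumptions: latency is an integer number of control steps and noise is
    zero-mean and independent per observation entry.
    """

    def __init__(self, delay_steps: int, obs_noise_std: float, n_joints: int, seed: int = 0):
        if delay_steps < 0:
            raise ValueError("delay_steps must be non-negative")
        self.rng = random.Random(seed)
        self.obs_noise_std = obs_noise_std
        # Pre-fill with zero commands so the first `delay_steps` outputs are zero torque.
        self.queue = deque([0.0] * n_joints for _ in range(delay_steps))

    def actuate(self, command: Sequence[float]) -> List[float]:
        """Queue the newest command and return the one issued `delay_steps` steps earlier."""
        self.queue.append(list(command))
        return self.queue.popleft()

    def observe(self, true_obs: Sequence[float]) -> List[float]:
        """Return the observation corrupted by additive Gaussian noise."""
        return [x + self.rng.gauss(0.0, self.obs_noise_std) for x in true_obs]


harness = PerturbedActuation(delay_steps=2, obs_noise_std=0.01, n_joints=2)
for step in range(4):
    applied = harness.actuate([step + 1.0, -(step + 1.0)])
    print(step, applied, harness.observe([0.0, 1.0]))
```

One possible experiment: increase `delay_steps` and `obs_noise_std` step by step and compare SRL with and without the feedforward term. That would show whether the prior stays helpful as the gap grows.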
Load-bearing premise
The idealized SLIP contact and joint assumptions stay close enough to real robot dynamics that the feedforward signal does not systematically mislead the RL policy on irregular terrain.
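The sketch below makes the idealization concrete. It integrates only the vertical stance phase of the textbook spring-mass hopper: point mass, massless linear leg spring, point contact, no damping. It is not the paper's controller, and the parameter values in the usage line are arbitrary. Nothing in this model dissipates energy, so lift-off speed matches touchdown speed up to integration error. On irregular terrain, contact damping, joint friction, or foot slip would break that equality, and that is the mismatch this premise is about.

```python
def vertical_slip_stance(m: float, k: float, l0: float, v_td: float,
                         g: float = 9.81, dt: float = 1e-5):
    """Integrate the stance phase of an ideal vertical spring-mass hopper.

    Idealizations: point mass, massless linear leg spring, fixed point contact,
    no damping or joint losses. Returns (lift-off velocity, minimum leg length,
    stance duration) in SI units.
    """
    z, v, t = l0, -abs(v_td), 0.0  # touchdown at rest length, moving downward
    z_min = z
    while True:
        a = k * (l0 - z) / m - g   # leg spring force k*(l0 - z) minus gravity
        v += a * dt                # semi-implicit (symplectic) Euler: velocity first
        z += v * dt
        t += dt
        z_min = min(z_min, z)
        if z >= l0 and v > 0.0:    # leg back at rest length and moving up: lift-off
            return v, z_min, t


v_lo, z_min, t_stance = vertical_slip_stance(m=30.0, k=5000.0, l0=0.5, v_td=2.0)
print(f"lift-off {v_lo:.3f} m/s, min leg length {z_min:.3f} m, stance {t_stance:.3f} s")
```

For these values, the lift-off velocity should come out very close to the 2.0 m/s touchdown speed.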
What would settle it
Real-robot trials on highly irregular surfaces that produce either an average position tracking error above 0.1 m or training times comparable to those of unguided RL would falsify the performance claim.
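This falsification criterion can be written as a simple decision rule. The sketch makes two assumptions. First, the 0.1 m bound applies to the mean error across trials, as in the abstract. Second, "comparable" training time is treated as a ratio threshold, where the 0.8 default is a hypothetical choice, not a value from the review. Training times only need to share a unit.

```python
from statistics import mean
from typing import Sequence


def performance_claim_falsified(
    position_errors_m: Sequence[float],
    srl_train_time: float,
    rl_train_time: float,
    error_bound_m: float = 0.1,
    comparable_ratio: float = 0.8,
) -> bool:
    """Apply the falsification criterion to a batch of real-robot trials.

    Assumptions: the 0.1 m bound applies to the mean error over trials, and
    'comparable' training time means SRL needs at least `comparable_ratio`
    of the unguided-RL time (0.8 is a hypothetical threshold).
    """
    too_inaccurate = mean(position_errors_m) > error_bound_m
    no_speedup = srl_train_time >= comparable_ratio * rl_train_time
    return too_inaccurate or no_speedup


print(performance_claim_falsified([0.05, 0.07, 0.06], srl_train_time=1.0, rl_train_time=5.0))
# → False
```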
Original abstract
Robotic jumping is pivotal in applications such as search and rescue and logistics, where crossing obstacles and enhancing mobility efficiency are critical. The Spring-Loaded Inverted Pendulum (SLIP) model leverages simplified spring-mass dynamics that naturally encode biologically plausible hopping motions, yet its performance degrades on irregular terrain due to idealized assumptions regarding contact and joint dynamics. Meanwhile, Reinforcement Learning (RL) can adapt to diverse and complex environments but often requires extensive data from unguided exploration. The complementary strengths of SLIP's physically grounded baseline and RL's adaptive capabilities motivate a hybrid framework that overcomes these individual limitations. We therefore propose Spring-loaded Reinforcement Learning (SRL), which integrates SLIP-based feedforward control signals with RL-driven real-time feedback, enabling continuous optimization of robotic jumping. Experimental results demonstrate that SRL can achieve more stable jumps with much less training time than the baseline method, maintaining an average position tracking error below 0.1 m and velocity tracking errors within +/-3% of the target values. Through bipedal and quadrupedal simulations of ground and stair jumping, as well as sim-to-sim and sim-to-real validations, SRL exhibits robust adaptability to various task requirements and environmental complexities, underscoring its potential for real-world deployment.
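The abstract does not spell out how the two metrics are computed. The sketch assumes one plausible reading, which is not taken from the paper. Position error is the Euclidean distance between achieved and target positions, averaged over jumps. Velocity error is the signed deviation expressed as a percentage of the target. Function names and sample numbers are illustrative only.

```python
import math
from typing import Sequence, Tuple


def mean_position_error(actual: Sequence[Tuple[float, float]],
                        target: Sequence[Tuple[float, float]]) -> float:
    """Average Euclidean distance (m) between achieved and target positions."""
    if len(actual) != len(target) or not actual:
        raise ValueError("need equally long, non-empty sequences")
    return sum(math.dist(a, t) for a, t in zip(actual, target)) / len(actual)


def velocity_error_percent(actual: float, target: float) -> float:
    """Signed velocity tracking error relative to the (non-zero) target, in percent."""
    return 100.0 * (actual - target) / target


pos_err = mean_position_error([(1.00, 0.00), (2.00, 0.10)], [(1.00, 0.06), (2.00, 0.00)])
vel_err = velocity_error_percent(actual=1.97, target=2.0)
print(f"{pos_err:.3f} m, {vel_err:+.1f} %")   # → 0.080 m, -1.5 %
print(pos_err < 0.1 and abs(vel_err) <= 3.0)  # → True
```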
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SRL, a hybrid controller that augments RL policies with feedforward signals derived from the SLIP model for bipedal and quadrupedal jumping tasks. It claims that the combination yields more stable jumps, substantially shorter training times than pure RL baselines, position tracking error below 0.1 m, and velocity tracking errors within ±3 % of targets, supported by ground/stair simulations plus sim-to-sim and sim-to-real transfer.
Significance. If the performance gap is reproducible and the SLIP feedforward remains beneficial rather than harmful under terrain mismatch, the work would provide concrete evidence that model-based priors can reduce sample complexity in agile locomotion without sacrificing adaptability. The sim-to-real results would be a useful data point for hybrid control in robotics.
major comments (3)
- [§4, §5] §4 (Method) and §5 (Experiments): the central claim that SRL reduces training time while improving tracking rests on the assumption that SLIP-derived feedforward remains a net-positive signal on irregular terrain. The manuscript notes SLIP degradation on irregular surfaces yet provides no quantitative ablation measuring how large a mismatch between SLIP contact/joint assumptions and robot dynamics can be tolerated before the hybrid policy underperforms the pure-RL baseline.
- [Table 2, Figure 7] Table 2 and Figure 7: the reported position error <0.1 m and velocity error ±3 % are given without error bars, number of random seeds, or statistical tests against the baseline. It is therefore impossible to determine whether the observed gap is statistically reliable or sensitive to hyper-parameter choices.
- [§5.3] §5.3 (Sim-to-real): the sim-to-real validation uses only a single terrain type and a limited set of initial conditions. No systematic stress test (e.g., added sensor noise, mass variation, or stair height outside the training distribution) is reported to probe whether the SLIP prior introduces persistent bias when dynamics deviate.
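The first major comment asks where the SLIP prior stops being net-positive. Suppose an ablation sweep has produced a tracking error for each controller at each mismatch severity. The break-even point can then be estimated by linear interpolation between sweep points. In this sketch, the severity scale, the choice of error metric, and the numbers in the usage line are hypothetical.

```python
from typing import Optional, Sequence


def crossover_severity(severities: Sequence[float],
                       srl_errors: Sequence[float],
                       rl_errors: Sequence[float]) -> Optional[float]:
    """Estimate the mismatch severity at which SRL stops beating pure RL.

    Errors are lower-is-better, so the gap rl - srl is positive while the SLIP
    prior helps. Returns the linearly interpolated severity where the gap first
    reaches zero, or None if SRL stays ahead across the whole sweep.
    """
    if not (len(severities) == len(srl_errors) == len(rl_errors)) or not severities:
        raise ValueError("sweep arrays must be non-empty and equally long")
    gaps = [rl - srl for srl, rl in zip(srl_errors, rl_errors)]
    if gaps[0] <= 0.0:
        return severities[0]
    for i in range(1, len(gaps)):
        if gaps[i] <= 0.0:
            s0, s1, g0, g1 = severities[i - 1], severities[i], gaps[i - 1], gaps[i]
            return s0 + (s1 - s0) * g0 / (g0 - g1)
    return None


# Made-up sweep values for illustration only; not results from the paper.
print(crossover_severity([0.0, 1.0, 2.0, 3.0],
                         srl_errors=[0.04, 0.05, 0.08, 0.12],
                         rl_errors=[0.10, 0.10, 0.10, 0.10]))  # ≈ 2.5
```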
minor comments (2)
- [Abstract, §4] The abstract states quantitative results but the methods section does not specify the exact RL algorithm, network architecture, or reward weights used for the baseline comparison.
- [§3] Notation for the SLIP feedforward torque and the RL policy output is introduced without an explicit equation linking the two signals (e.g., total torque = τ_SLIP + π_RL(s)).
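Below is one way the missing equation could be written. It assumes, without confirmation from the review, two things: the SLIP term is the radial force of a linear leg spring mapped to joint torques through the foot Jacobian, and the policy output is added directly in torque space.

```latex
\tau \;=\; \underbrace{J(q)^{\top} F_{\mathrm{SLIP}}}_{\tau_{\mathrm{SLIP}}} \;+\; \pi_{\mathrm{RL}}(s),
\qquad
F_{\mathrm{SLIP}} \;=\; k\,(l_0 - l)\,\hat{u}
```

Here q is the joint configuration and J(q) the Jacobian of the foot position with respect to q. The leg spring has stiffness k, rest length l_0, and current length l. û is the unit vector from hip to foot, and F_SLIP is the force the foot exerts on the ground. The paper may instead apply the residual to joint position targets or to the SLIP parameters; the review does not say.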
Simulated Author's Rebuttal
Thank you for the constructive feedback. We address each major comment below and indicate the changes we will make to strengthen the manuscript.
Point-by-point responses
-
Referee: [§4, §5] §4 (Method) and §5 (Experiments): the central claim that SRL reduces training time while improving tracking rests on the assumption that SLIP-derived feedforward remains a net-positive signal on irregular terrain. The manuscript notes SLIP degradation on irregular surfaces yet provides no quantitative ablation measuring how large a mismatch between SLIP contact/joint assumptions and robot dynamics can be tolerated before the hybrid policy underperforms the pure-RL baseline.
Authors: We agree that a quantitative ablation on mismatch tolerance would strengthen the central claim. The revised manuscript will add an ablation study that varies terrain irregularity and reports the SRL vs. pure-RL performance gap as a function of mismatch severity, identifying the point at which the SLIP prior ceases to be net-positive. revision: yes
-
Referee: [Table 2, Figure 7] Table 2 and Figure 7: the reported position error <0.1 m and velocity error ±3 % are given without error bars, number of random seeds, or statistical tests against the baseline. It is therefore impossible to determine whether the observed gap is statistically reliable or sensitive to hyper-parameter choices.
Authors: We will rerun the experiments with at least five random seeds, add error bars to Table 2 and Figure 7, and include statistical tests (e.g., paired t-tests) against the baseline in the revised version. revision: yes
-
Referee: [§5.3] §5.3 (Sim-to-real): the sim-to-real validation uses only a single terrain type and a limited set of initial conditions. No systematic stress test (e.g., added sensor noise, mass variation, or stair height outside the training distribution) is reported to probe whether the SLIP prior introduces persistent bias when dynamics deviate.
Authors: The existing sim-to-real results were intended as an initial proof of concept on representative hardware. In revision we will expand the section with additional simulation-based stress tests that include sensor noise, mass variation, and out-of-distribution stair heights to assess potential bias from the SLIP prior. revision: partial
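The promised comparison pairs SRL and baseline runs by random seed, so a paired t-test on the per-seed differences is a natural check. This standard-library sketch assumes one lower-is-better metric value per seed for each controller and a two-sided test at the 5% level. Critical values are hard-coded for small degrees of freedom; `scipy.stats.ttest_rel` would return an exact p-value instead. The input numbers are invented.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple

# Two-sided 95% critical values of Student's t for df = 1..9.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def paired_t(srl: Sequence[float], baseline: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic on per-seed differences (baseline minus SRL) and its df."""
    if len(srl) != len(baseline) or len(srl) < 2:
        raise ValueError("need at least two seeds, paired one-to-one")
    diffs = [b - s for s, b in zip(srl, baseline)]
    sd = stdev(diffs)
    if sd == 0.0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs))), len(diffs) - 1


def significant_at_5pct(t: float, df: int) -> bool:
    """Two-sided test using the hard-coded table (df must be in 1..9)."""
    return abs(t) > T_CRIT_95[df]


# Invented per-seed position errors (m) for five seeds; not results from the paper.
t, df = paired_t(srl=[0.06, 0.07, 0.05, 0.08, 0.06],
                 baseline=[0.09, 0.10, 0.09, 0.11, 0.08])
print(f"t = {t:.2f}, df = {df}, significant: {significant_at_5pct(t, df)}")
```

With five seeds there are four degrees of freedom, so the threshold is 2.776. For the invented numbers above, t is about 9.5.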
Circularity Check
No circularity: empirical hybrid method with independent validation
Full rationale
The paper proposes SRL as an integration of the standard SLIP model (external, not derived here) for feedforward with RL for feedback, then reports simulation and real-robot experimental outcomes on tracking error and training time. No equations, parameters, or uniqueness claims are presented that reduce by construction to fitted inputs, self-definitions, or self-citation chains; the performance claims rest on direct empirical comparison to a pure-RL baseline rather than any internal derivation.
Reference graph
Works this paper leans on
-
[1]
X. Mo, W. Ge, M. Miraglia, F. Inglese, D. Zhao, C. Stefanini, D. Romano, Jumping locomotion strategies: From animals to bioinspired robots, Applied Sciences 10 (23) (2020) 8607.doi:10.3390/app10238607
-
[2]
J.-S. Koh, E. Yang, G.-P. Jung, S.-P. Jung, J. H. Son, S.-I. Lee, P. G. Jablonski, R. J. Wood, H.-Y. Kim, K.-J. Cho, Jumping on water: Surface tension–dominated jumping of water striders and robotic insects, Science 349 (6247) (2015) 517–521.doi:10.1126/science.aab1637
-
[3]
C. Yi, X. Chen, Y. Zhang, Z. Yu, H. Qi, Y. Liu, Q. Huang, Simulating the grf of humanoid robot vertical jumping using a simplified model with a foot structure for foot design, Journal of Bionic Engineering 21 (1) (2024) 112–125.doi:10.1007/s42235-023-00429-8
-
[4]
X. Wang, W. Guo, Z. He, R. Li, F. Zha, L. Sun, Bionic jumping of humanoid robot via online centroid trajectory optimization and high dynamic motion controller, Journal of Bionic Engineering 21 (6) (2024) 2759–2778.doi:10.1007/s42235-024-00586-4
-
[5]
Z. Zhao, S. Sun, H. Huang, Q. Gao, W. Xu, Design and control of continuous jumping gaits for humanoid robots based on motion function and reinforcement learning, Procedia Computer Science 250 (2024) 51–57.doi:10.1016/j.procs.2024.11.008
-
[6]
Y. Liu, X. Chen, Z. Yu, H. Qi, C. Yi, Single sequential trajectory optimization with centroidal dynamics and whole-body kinematics for vertical jump of humanoid robot, Biomimetics 9 (5) (2024) 274.doi:10.3390/biomimetics9050274
-
[7]
G. Ribak, Insect-inspired jumping robots: challenges and solutions to jump stability, Current Opinion in Insect Science 42 (2020) 32–38. doi:10.1016/j.cois.2020.09.001
-
[8]
K. Y. Su, J. Z. Gul, K. H. Choi, A biomimetic jumping locomotion of functionally graded frog soft robot, in: 2017 14th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), IEEE, 2017, pp. 675–676.doi:10.1109/URAI.2017.7992792
-
[9]
M. Afschrift, E. Van Asseldonk, M. Van Mierlo, C. Bayon, A. Keemink, L. D’Hondt, H. Van Der Kooij, F. De Groote, Assisting walking balance using a bio-inspired exoskeleton controller, Journal of Neuroengineering and Rehabilitation 20 (1) (2023) 82.doi:10.1186/s12984-023-01205-9
-
[10]
D. Ezekiel, R. Samikannu, O. Matsebe, Bio-inspired jumping spider optimization for controller tuning/parameter estimation of an uncertain aerodynamic mimo system, Chaos Theory and Applications 6 (3) (2024) 205–217.doi:10.51537/chaos.1396823
-
[11]
H. Elliott, X. An, M. Wang, A bio-inspired jumping robot: Design, modelling and experimental tests, in: Annual Conference Towards Autonomous Robotic Systems, Springer, 2024, pp. 164–170.doi:10.1007/978-3-031-72062-8_15
-
[12]
M. Kabir, A. Anand, P. Sundaravadivel, Hop-bot: a bio-inspired approach to locomotion and stability in modular robotics, in: 2024 IEEE International Conference on Electro Information Technology (EIT), IEEE, 2024, pp. 285–290.doi:10.1109/eIT60633.2024.10609916
-
[13]
J. Hong, C. Yeo, S. Bae, J. Hong, S. Oh, Slip embodied robust quadruped robot control, in: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2024, pp. 14219–14224.doi:10.1109/IROS58592.2024.10802545
-
[14]
G. Piovan, K. Byl, Reachability-based control for the active slip model, The International Journal of Robotics Research 34 (3) (2015) 270–287. doi:10.1177/0278364914552112
-
[15]
G. Piovan, K. Byl, Approximation and control of the slip model dynamics via partial feedback linearization and two-element leg actuation strategy, IEEE Transactions on Robotics 32 (2) (2016) 399–412.doi:10.1109/TRO.2016.2529649
-
[16]
H. Hamzaçebi, I. Uyanik, Ö. Morgül, On the analysis and control of a bipedal legged locomotion model via partial feedback linearization, Bioinspiration & Biomimetics 19 (5) (2024) 056004.doi:10.1088/1748-3190/ad5cb6
-
[17]
H.-W. Park, P. M. Wensing, S. Kim, Jumping over obstacles with mit cheetah 2, Robotics and Autonomous Systems 136 (2021) 103703. doi:10.1016/j.robot.2020.103703
-
[18]
D. Ahn, B.-K. Cho, Online jumping motion generation via model predictive control, IEEE Transactions on Industrial Electronics 69 (5) (2021) 4957–4965.doi:10.1109/TIE.2021.3078396
-
[19]
G. Ji, J. Mun, H. Kim, J. Hwangbo, Concurrent training of a control policy and a state estimator for dynamic and robust legged locomotion, IEEE Robotics and Automation Letters 7 (2) (2022) 4630–4637.doi:10.1109/LRA.2022.3151396
-
[20]
Z. He, J. Wu, J. Zhang, S. Zhang, Y. Shi, H. Liu, L. Sun, Y. Su, X. Leng, Cdm-mpc: An integrated dynamic planning and control framework for bipedal robots jumping, IEEE Robotics and Automation Letters 9 (7) (2024) 6672–6679.doi:10.1109/LRA.2024.3408487
-
[21]
Z. Xu, J. Xie, K. Hashimoto, Human-inspired gait and jumping motion generation for bipedal robots using model predictive control, Biomimetics 10 (1) (2025) 17.doi:10.3390/biomimetics10010017
-
[22]
Z. Fu, Z. Yu, X. Chen, L. Han, P. Gergondet, J. Zhang, Q. Huang, Continuous bipedal jumping via sliding-mode regularized predictive control, IEEE/ASME Transactions on Mechatronics (2024).doi:10.1109/TMECH.2024.3515151
-
[23]
J. Kober, J. A. Bagnell, J. Peters, Reinforcement learning in robotics: A survey, The International Journal of Robotics Research 32 (11) (2013) 1238–1274.doi:10.1177/0278364913495721
-
[24]
C. Tao, M. Li, F. Cao, Z. Gao, Z. Zhang, A multiobjective collaborative deep reinforcement learning algorithm for jumping optimization of bipedal robot, Advanced Intelligent Systems 6 (1) (2024) 2300352.doi:10.1002/aisy.202300352
-
[25]
M. Hutter, C. D. Remy, M. A. Höpflinger, R. Siegwart, Slip running with an articulated robotic leg, in: 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, 2010, pp. 4934–4939.doi:10.1109/IROS.2010.5651461
-
[26]
X. He, X. Li, X. Wang, F. Meng, X. Guan, Z. Jiang, L. Yuan, K. Ba, G. Ma, B. Yu, Running gait and control of quadruped robot based on slip model, Biomimetics 9 (1) (2024).doi:10.3390/biomimetics9010024
-
[27]
P. M. Wensing, D. E. Orin, Control of humanoid hopping based on a slip model, Advances in Mechanisms, Robotics and Design Education and Research (2013) 265–274.doi:10.1007/978-3-319-00398-6_21
-
[28]
J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms, arXiv preprint arXiv:1707.06347 (2017).doi:10.48550/arXiv.1707.06347
-
[29]
C. Yu, A. Velu, E. Vinitsky, J. Gao, Y. Wang, A. Bayen, Y. Wu, The surprising effectiveness of ppo in cooperative multi-agent games, Advances in Neural Information Processing Systems 35 (2022) 24611–24624
-
[30]
Y. Zhao, T. Wu, Y. Zhu, X. Lu, J. Wang, H. Bou-Ammar, X. Zhang, P. Du, Zsl-rppo: Zero-shot learning for quadrupedal locomotion in challenging terrains using recurrent proximal policy optimization, arXiv preprint arXiv:2403.01928 (2024).doi:10.48550/arXiv.2403.01928
-
[31]
Z. Zhang, J. Zhao, H. Chen, D. Chen, A survey of bioinspired jumping robot: takeoff, air posture adjustment, and landing buffer, Applied Bionics and Biomechanics 2017 (1) (2017) 4780160.doi:10.1155/2017/4780160
-
[32]
C. Zhang, W. Zou, L. Ma, Z. Wang, Biologically inspired jumping robots: A comprehensive review, Robotics and Autonomous Systems 124 (2020) 103362.doi:10.1016/j.robot.2019.103362
-
[33]
G. Garofalo, C. Ott, A. Albu-Schäffer, Walking control of fully actuated robots based on the bipedal slip model, in: 2012 IEEE International Conference on Robotics and Automation, IEEE, 2012, pp. 1456–1463.doi:10.1109/ICRA.2012.6225272
-
[34]
M. Shahbazi, R. Babuška, G. A. Lopes, Unified modeling and control of walking and running on the spring-loaded inverted pendulum, IEEE Transactions on Robotics 32 (5) (2016) 1178–1195.doi:10.1109/TRO.2016.2593483
-
[35]
J. Rummel, Y. Blum, A. Seyfarth, Robust and efficient walking with spring-like legs, Bioinspiration & Biomimetics 5 (4) (2010) 046004. doi:10.1088/1748-3182/5/4/046004
-
[36]
S. Xie, X. Li, H. Zhong, C. Hu, L. Gao, Compliant bipedal walking based on variable spring-loaded inverted pendulum model with finite-sized foot, in: 2021 6th IEEE International Conference on Advanced Robotics and Mechatronics (ICARM), IEEE, 2021, pp. 667–672. doi:10.1109/ICARM52023.2021.9536096
-
[37]
H. Sang, S. Wang, Lunar leap robot: 3m architecture–enhanced deep reinforcement learning method for quadruped robot jumping in low-gravity environment, Journal of Aerospace Engineering 37 (6) (2024) 04024076.doi:10.1061/JAEEEZ.ASENG-5619
-
[38]
G. Bellegarda, C. Nguyen, Q. Nguyen, Robust quadruped jumping via deep reinforcement learning, Robotics and Autonomous Systems 182 (2024) 104799.doi:10.1016/j.robot.2024.104799
-
[39]
G. Bellegarda, M. Shafiee, M. E. Özberk, A. Ijspeert, Quadruped-frog: Rapid online optimization of continuous quadruped jumping, in: 2024 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2024, pp. 1443–1450.doi:10.1109/ICRA57147.2024.10610141
-
[40]
G. Bellegarda, A. Ijspeert, Cpg-rl: Learning central pattern generators for quadruped locomotion, IEEE Robotics and Automation Letters 7 (4) (2022) 12547–12554.doi:10.1109/LRA.2022.3218167
-
[41]
X. B. Peng, M. Andrychowicz, W. Zaremba, P. Abbeel, Sim-to-real transfer of robotic control with dynamics randomization, in: 2018 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2018, pp. 3803–3810.doi:10.1109/ICRA.2018.8460528
-
[42]
Q. Zhou, G. Li, R. Tang, Y. Xu, H. Wen, Q. Shi, Stable jumping control based on deep reinforcement learning for a locust-inspired robot, Biomimetics 9 (9) (2024) 548.doi:10.3390/biomimetics9090548
-
[43]
R. J. Full, D. E. Koditschek, Templates and anchors: neuromechanical hypotheses of legged locomotion on land, Journal of Experimental Biology 202 (23) (1999) 3325–3332.doi:10.1242/jeb.202.23.3325
-
[44]
H. Geyer, U. Saranli, Gait based on the spring-loaded inverted pendulum, in: A. Goswami, P. Vadakkepat (Eds.), Humanoid Robotics: A Reference, Springer, Dordrecht, 2019, pp. 923–947.doi:10.1007/978-94-007-6046-2_43
-
[45]
L. Ye, Y. Cheng, J. Li, X. Wang, B. Liang, Y. Peng, From knowing to doing: learning diverse motor skills through instruction learning, Biomimetic Intelligence and Robotics (2026) 100286
-
[46]
R. Hartley, M. Ghaffari, R. M. Eustice, J. W. Grizzle, Contact-aided invariant extended kalman filtering for robot state estimation, The International Journal of Robotics Research 39 (4) (2020) 402–430.doi:10.1177/0278364919894385
-
[47]
X. Gu, Y.-J. Wang, J. Chen, Humanoid-gym: Reinforcement learning for humanoid robot with zero-shot sim2real transfer, arXiv preprint arXiv:2404.05695 (2024).doi:10.48550/arXiv.2404.05695
Preprint submitted to Elsevier