Guided Reinforcement Learning for Omnidirectional 3D Jumping in Quadruped Robots

Claudio Semini; Giulio Turrisi; Luigi Palopoli; Michele Focchi; Riccardo Bussola

arxiv: 2507.16481 · v3 · pith:EZMQJG74new · submitted 2025-07-22 · 💻 cs.RO · cs.SY · eess.SY

Riccardo Bussola, Michele Focchi, Giulio Turrisi, Claudio Semini, Luigi Palopoli

Pith reviewed 2026-05-22 00:10 UTC · model grok-4.3

classification 💻 cs.RO · cs.SY · eess.SY

keywords quadruped robots · reinforcement learning · jumping control · Bézier curves · omnidirectional motion · guided RL · physical models · 3D locomotion

The pith

Guided reinforcement learning combines Bézier curves and accelerated motion models for efficient omnidirectional 3D jumping in quadruped robots.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method that guides reinforcement learning for quadruped robots to perform jumps in any direction in three dimensions by incorporating physical models. It uses Bézier curves to shape the jumping trajectories and a uniformly accelerated rectilinear motion model to describe the dynamics. This guidance aims to reduce the number of training samples needed and make the learned behaviors more predictable and safer than those from standard reinforcement learning approaches. Traditional methods either require detailed knowledge of the robot and environment or suffer from high sample complexity and lack of explainability. A sympathetic reader would care because successful jumping is key for robots operating in challenging terrains, and this could lead to more reliable real-world performance.

Core claim

By combining Bézier curves with a Uniformly Accelerated Rectilinear Motion (UARM) model to guide the reinforcement learning process, the approach achieves more efficient training and more predictable jumping motions for quadruped robots, as shown through simulations and real experiments that outperform existing methods.

What carries the argument

The guided reinforcement learning framework that integrates Bézier curve trajectory planning with the UARM motion model to inject physical intuition into the learning process.
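
As a rough illustration of the trajectory-shaping ingredient, the sketch below evaluates a Bézier curve and its derivative in Bernstein form. It is a generic textbook construction, not the paper's parametrization: the function names `bezier_point` and `bezier_derivative`, the cubic degree, and the control-point values are assumptions chosen only for illustration.

```python
from math import comb


def bezier_point(control_points, s):
    """Evaluate a Bézier curve at s in [0, 1] using the Bernstein basis."""
    n = len(control_points) - 1          # curve degree
    dim = len(control_points[0])         # spatial dimension
    point = [0.0] * dim
    for i, cp in enumerate(control_points):
        # Bernstein polynomial B_{i,n}(s)
        weight = comb(n, i) * (s ** i) * ((1.0 - s) ** (n - i))
        for d in range(dim):
            point[d] += weight * cp[d]
    return point


def bezier_derivative(control_points, s):
    """Derivative w.r.t. s: a degree n-1 Bézier curve over scaled differences."""
    n = len(control_points) - 1
    diffs = [
        [n * (b - a) for a, b in zip(p, q)]
        for p, q in zip(control_points, control_points[1:])
    ]
    return bezier_point(diffs, s)


# Hypothetical control points for a centre-of-mass path (metres).
ctrl = [(0.0, 0.0, 0.25), (0.0, 0.0, 0.20), (0.1, 0.0, 0.30), (0.2, 0.0, 0.40)]
start = bezier_point(ctrl, 0.0)              # equals the first control point
end = bezier_point(ctrl, 1.0)                # equals the last control point
liftoff_tangent = bezier_derivative(ctrl, 1.0)  # 3 * (P3 - P2) for a cubic
```

Because a Bézier curve starts at its first control point and ends at its last, boundary conditions on the start and end of a motion segment can be imposed directly on the control points, which is one reason such curves are convenient for shaping trajectories.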

If this is right

Lower sample complexity for training jumping policies compared to end-to-end RL.
Greater predictability in the final jumping motions, aiding safety certification.
Superior performance in both simulation and hardware experiments over alternative approaches.
Reduced need for extensive robot and terrain parameter knowledge in controller design.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This guidance technique could be adapted to other agile locomotion tasks such as running or vaulting.
Similar physical model integration might help bridge the gap between simulation and real-world robot deployment.
The method opens possibilities for certifying safety in dynamic robot behaviors more systematically.

Load-bearing premise

The physical models of Bézier curves and uniformly accelerated motion provide accurate enough guidance to improve RL without adding harmful biases or requiring detailed parameter knowledge.

What would settle it

A direct comparison in which the guided approach needs as many or more training episodes than standard RL, or yields jumping motions that are no easier to predict or certify, would falsify the central claim.

Original abstract

Jumping poses a significant challenge for quadruped robots, despite being crucial for many operational scenarios. While optimisation methods exist for controlling such motions, they are often time-consuming and demand extensive knowledge of robot and terrain parameters, making them less robust in real-world scenarios. Reinforcement learning (RL) is emerging as a viable alternative, yet conventional end-to-end approaches lack efficiency in terms of sample complexity, requiring extensive training in simulations, and predictability of the final motion, which makes it difficult to certify the safety of the final motion. To overcome these limitations, this paper introduces a novel guided reinforcement learning approach that leverages physical intuition for efficient and explainable jumping, by combining Bézier curves with a Uniformly Accelerated Rectilinear Motion (UARM) model. Extensive simulation and experimental results clearly demonstrate the advantages of our approach over existing alternatives.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper combines Bézier curves and a simple UARM model to guide RL for quadruped 3D jumping. This is a reasonable practical step, but the mismatch between the UARM model and real jump dynamics looks like a real issue worth checking in the results.

The main takeaway is that this work adds a guided layer to RL for omnidirectional jumping on quadrupeds by parametrizing trajectories with Bézier curves and using a Uniformly Accelerated Rectilinear Motion model to shape the policy. That combination is the concrete new piece, aimed at cutting sample complexity and making the output motion more predictable than pure end-to-end RL. The abstract frames the usual problems clearly: optimization needs too much parameter knowledge, while standard RL trains slowly and produces hard-to-certify behaviors. The guided approach tries to inject physical intuition to fix both.

On the execution side, the method appears straightforward to implement once the guidance is set up, and the choice to target real-robot experiments alongside simulation is the right direction for this subfield.

The soft spot is the UARM model itself. It assumes constant acceleration along a straight line, but quadruped jumps involve stance, liftoff, parabolic flight under gravity, and landing with changing contact forces. Omnidirectional control makes the paths even less rectilinear. If the guidance deviates from actual dynamics, it could bias the learned policy or fail to deliver the claimed efficiency gains. The abstract asserts extensive results back the advantages, yet the lack of visible quantitative comparisons, baselines, or ablation numbers in the high-level description leaves the central claim plausible but not yet strongly supported.

This paper is for researchers working on dynamic legged locomotion who want hybrid RL-classical methods rather than pure learning or pure optimization. A reader focused on sample-efficient control for real hardware would find usable ideas here. It deserves a serious referee because the problem is relevant and the method is specific enough to evaluate, even if the validation section will likely need tightening.
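
To make the modelling concern concrete, the sketch below pairs a constant-acceleration, straight-line thrust phase with point-mass projectile flight. It is a simplified reading of the description in the letter, not the paper's implementation: the function names, the choice to start the thrust from rest, the thrust distance, and all numeric values are assumptions, and air drag and body rotation are ignored.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def liftoff_velocity_for_target(p_lo, p_target, t_flight):
    """Liftoff velocity that carries a point mass from p_lo to p_target
    in t_flight seconds of ballistic (gravity-only) flight."""
    vx = (p_target[0] - p_lo[0]) / t_flight
    vy = (p_target[1] - p_lo[1]) / t_flight
    vz = (p_target[2] - p_lo[2]) / t_flight + 0.5 * G * t_flight
    return (vx, vy, vz)


def ballistic_position(p_lo, v_lo, t):
    """Point-mass position t seconds after liftoff."""
    return (
        p_lo[0] + v_lo[0] * t,
        p_lo[1] + v_lo[1] * t,
        p_lo[2] + v_lo[2] * t - 0.5 * G * t * t,
    )


def uarm_thrust(p_lo, v_lo, distance):
    """Straight-line, constant-acceleration thrust that starts at rest and
    reaches velocity v_lo at p_lo after travelling `distance` metres.
    Returns (start position, net acceleration magnitude, thrust duration)."""
    speed = math.hypot(*v_lo)
    direction = tuple(c / speed for c in v_lo)
    p_start = tuple(p - distance * u for p, u in zip(p_lo, direction))
    accel = speed ** 2 / (2.0 * distance)   # from v^2 = 2 a d
    duration = 2.0 * distance / speed       # from d = v T / 2
    return p_start, accel, duration


# Hypothetical jump: 0.6 m forward and 0.3 m sideways, landing at liftoff height.
p_lo = (0.0, 0.0, 0.4)
p_target = (0.6, 0.3, 0.4)
v_lo = liftoff_velocity_for_target(p_lo, p_target, t_flight=0.5)
landing = ballistic_position(p_lo, v_lo, 0.5)
p_start, accel, duration = uarm_thrust(p_lo, v_lo, distance=0.15)
speed = math.hypot(*v_lo)
```

In this sketch the thrust path is a straight segment aligned with the liftoff velocity, while the flight path is a parabola. A measured jump that departs from either idealization is where any guidance bias would be expected to show up.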

Referee Report

2 major / 2 minor

Summary. The paper proposes a guided reinforcement learning framework for omnidirectional 3D jumping in quadruped robots. It combines Bézier curves for trajectory planning with a Uniformly Accelerated Rectilinear Motion (UARM) model to inject physical intuition, with the goal of achieving lower sample complexity and more predictable, explainable motions than end-to-end RL or parameter-heavy optimization methods. The abstract states that extensive simulation and experimental results demonstrate clear advantages over existing alternatives.

Significance. If the quantitative results and ablation studies hold, the work could offer a practical middle ground between model-based control and pure learning for dynamic locomotion, potentially improving training efficiency and safety certification for jumping behaviors in real-world quadruped deployments.

major comments (2)

[Method / UARM model definition] The central claim that the Bézier + UARM guidance supplies accurate, low-bias priors that reduce sample complexity without extensive robot/terrain parameter knowledge rests on the fidelity of the UARM model. The manuscript should explicitly compare the UARM-predicted trajectories against the actual stance-to-flight transitions and gravity-dominated parabolic arcs observed in the robot's dynamics (e.g., in the results or dynamics section); without such validation, the guidance risks introducing systematic bias rather than improving predictability.
[Abstract and Results] Abstract claims 'extensive simulation and experimental results clearly demonstrate the advantages' yet the provided description contains no quantitative metrics, baseline comparisons (e.g., sample efficiency curves, success rates, or energy metrics), or error analysis. The results section must include these to substantiate the efficiency and explainability claims; otherwise the central advantage over end-to-end RL remains unverified.

minor comments (2)

[Method] Clarify how the Bézier curve parameters are chosen or adapted online versus fixed from the UARM model, and whether any additional robot-specific parameters are still required.
[Figures] Ensure all figures showing trajectories or learned policies include direct overlays of the UARM reference and measured robot motion for visual assessment of guidance fidelity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which helps clarify the presentation of our guided RL framework. We address each major comment below and commit to revisions that strengthen the validation and quantitative support without altering the core contributions.

Referee: [Method / UARM model definition] The central claim that the Bézier + UARM guidance supplies accurate, low-bias priors that reduce sample complexity without extensive robot/terrain parameter knowledge rests on the fidelity of the UARM model. The manuscript should explicitly compare the UARM-predicted trajectories against the actual stance-to-flight transitions and gravity-dominated parabolic arcs observed in the robot's dynamics (e.g., in the results or dynamics section); without such validation, the guidance risks introducing systematic bias rather than improving predictability.

Authors: We agree that explicit validation of the UARM approximation is necessary to substantiate the low-bias claim. The UARM model is specifically chosen to capture the dominant vertical acceleration under gravity during flight, while Bézier curves handle the horizontal and transition phases. In the revised manuscript we will add a dedicated comparison subsection (in Results) that overlays UARM-predicted vertical and horizontal trajectories against both simulation data and hardware recordings of stance-to-flight transitions. This will include quantitative error metrics (e.g., RMSE) to demonstrate fidelity and any residual bias.
revision: yes
Referee: [Abstract and Results] Abstract claims 'extensive simulation and experimental results clearly demonstrate the advantages' yet the provided description contains no quantitative metrics, baseline comparisons (e.g., sample efficiency curves, success rates, or energy metrics), or error analysis. The results section must include these to substantiate the efficiency and explainability claims; otherwise the central advantage over end-to-end RL remains unverified.

Authors: The full results section already reports quantitative metrics, including sample-efficiency curves, success rates across omnidirectional jumps, and energy comparisons versus end-to-end RL and optimization baselines, together with ablation studies on the guidance components. To address the concern directly, we will (i) revise the abstract to include one or two key quantitative highlights and (ii) expand the results section with additional error analysis and clearer baseline tables if any gaps exist in the current presentation.
revision: partial
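
The RMSE check mentioned in the first response could look roughly like the sketch below, which compares a reference trajectory with a measured one sample by sample. The function name `trajectory_rmse` and the sample values are hypothetical; no data from the paper is used, and the two trajectories are assumed to be sampled at the same time instants.

```python
import math


def trajectory_rmse(reference, measured):
    """Root-mean-square Euclidean error between two equally sampled trajectories."""
    if len(reference) != len(measured):
        raise ValueError("trajectories must have the same number of samples")
    if not reference:
        raise ValueError("trajectories must not be empty")
    squared_errors = [
        sum((r - m) ** 2 for r, m in zip(ref_pt, meas_pt))
        for ref_pt, meas_pt in zip(reference, measured)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical samples (metres): the measured path sits 1 cm above the reference.
reference = [(0.0, 0.0, 0.30), (0.1, 0.0, 0.40), (0.2, 0.0, 0.45)]
measured = [(x, y, z + 0.01) for x, y, z in reference]
rmse = trajectory_rmse(reference, measured)
```

A single scalar like this hides where along the jump the mismatch occurs, so a per-phase breakdown (thrust versus flight) would likely be more informative for the referee's concern.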

Circularity Check

0 steps flagged

Guided RL framework relies on independent physical models with no circular reduction

full rationale

The paper presents a guided reinforcement learning method that combines Bézier curves for trajectory planning with a Uniformly Accelerated Rectilinear Motion (UARM) model to supply physical intuition, thereby reducing sample complexity and improving explainability compared to end-to-end RL. No derivation step in the abstract or described approach reduces a claimed prediction or result to a quantity defined by the paper's own fitted parameters, self-citations, or ansatz smuggled in via prior work. The physical models are invoked as external guidance inputs applied to the RL process rather than being derived from or equivalent to the learned policy outputs. The central claims rest on simulation and experimental validation against alternatives, rendering the framework self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard domain assumptions in robotics and RL; no explicit free parameters, new entities, or ad-hoc axioms are detailed in the abstract beyond the core guidance premise.

axioms (1)

Domain assumption: Physical models such as Bézier curves and UARM can effectively guide RL to achieve lower sample complexity and higher explainability for jumping motions.
Invoked as the foundation for the novel guided approach in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
combining Bézier curves with a Uniformly Accelerated Rectilinear Motion (UARM) model... ballistic trajectory lies within the plane... equations (1) for projectile motion... safety filter... single-stage learning process

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking
Relation between the paper passage and the cited Recognition theorem: unclear
omnidirectional 3D jumping... thrust phase... flight phase governed by conservation of momentum

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LineRides: Line-Guided Reinforcement Learning for Bicycle Robot Stunts
cs.RO 2026-05 unverdicted novelty 6.0

LineRides enables a bicycle robot to learn five commandable stunts from spatial guidelines and key orientations via RL without demonstrations or timing.
LineRides: Line-Guided Reinforcement Learning for Bicycle Robot Stunts
cs.RO 2026-05 unverdicted novelty 6.0

LineRides enables commandable bicycle robot stunts via line-guided RL that uses spatial guidelines, a tracking margin for feasibility, distance-based progress, and sparse key-orientations.


[1] [1]

Journal of Field Robotics41(6), 1829–1842 (2024)

Amatucci, L., Turrisi, G., Bratta, A., Barasuol, V., Semini, C.: Vero: A vacuum- cleaner-equipped quadruped robot for efficient litter removal. Journal of Field Robotics41(6), 1829–1842 (2024)

work page 2024

[2] [2]

Ain Shams Engineering Journal12(2), 2017–2031 (2021)

Biswal, P., Mohanty, P.K.: Development of quadruped walking robots: A review. Ain Shams Engineering Journal12(2), 2017–2031 (2021)

work page 2017

[3] [3]

IEEE Transactions on Robotics38(6), 3395–3413 (2022) https://doi.org/10.1109/TRO.2022.3186804

Jenelten, F., Grandia, R., Farshidian, F., Hutter, M.: Tamols: Terrain-aware motion optimization for legged systems. IEEE Transactions on Robotics38(6), 3395–3413 (2022) https://doi.org/10.1109/TRO.2022.3186804

work page doi:10.1109/tro.2022.3186804 2022

[4] [4]

IEEE Robotics and Automation Letters8(11), 7210–7217 (2023) https://doi.org/10.1109/LRA.2023.3313919

Roscia, F., Focchi, M., Prete, A.D., Caldwell, D.G., Semini, C.: Reactive landing controller for quadruped robots. IEEE Robotics and Automation Letters8(11), 7210–7217 (2023) https://doi.org/10.1109/LRA.2023.3313919

work page doi:10.1109/lra.2023.3313919 2023

[5] Park, H.-W., Wensing, P.M., Kim, S.: High-speed bounding with the mit cheetah 2: Control design and experiments. The International Journal of Robotics Research 36(2), 167–192 (2017) https://doi.org/10.1177/0278364917694244

[6] Yim, J.K., Singh, B.R.P., Wang, E.K., Featherstone, R., Fearing, R.S.: Precision robotic leaping and landing using stance-phase balance. IEEE Robotics and Automation Letters 5(2), 3422–3429 (2020) https://doi.org/10.1109/LRA.2020.2976597

[7] Nguyen, C., Nguyen, Q.: Contact-timing and trajectory optimization for 3d jumping on quadruped robots. In: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 11994–11999 (2022). https://doi.org/10.1109/IROS47612.2022.9981284

[8] Katz, B., Di Carlo, J., Kim, S.: Mini cheetah: A platform for pushing the limits of dynamic quadruped control. In: 2019 International Conference on Robotics and Automation (ICRA), pp. 6295–6301 (2019). IEEE

[9] Chignoli, M., Kim, S.: Online trajectory optimization for dynamic aerial motions of a quadruped robot. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 7693–7699 (2021). IEEE

[10] García, G., Griffin, R., Pratt, J.: Time-varying model predictive control for highly dynamic motions of quadrupedal robots. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 7344–7349 (2021). https://doi.org/10.1109/ICRA48506.2021.9561913

[11] Chignoli, M., Morozov, S., Kim, S.: Rapid and reliable quadruped motion planning with omnidirectional jumping. In: 2022 International Conference on Robotics and Automation (ICRA), pp. 6621–6627 (2022). https://doi.org/10.1109/ICRA46639.2022.9812088

[12] Song, Z., Yue, L., Sun, G., Ling, Y., Wei, H., Gui, L., Liu, Y.-H.: An optimal motion planning framework for quadruped jumping. In: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 11366–11373 (2022). https://doi.org/10.1109/IROS47612.2022.9981642

[13] Li, H., Wensing, P.M.: Cafe-mpc: A cascaded-fidelity model predictive control framework with tuning-free whole-body control. IEEE Transactions on Robotics 41, 837–856 (2025) https://doi.org/10.1109/TRO.2024.3504132

[14] Ding, J., Atanassov, V., Panichi, E., Kober, J., Della Santina, C.: Robust quadrupedal jumping with impact-aware landing: Exploiting parallel elasticity. IEEE Transactions on Robotics (2024)

[15] Mastalli, C., Merkt, W., Xin, G., Shim, J., Mistry, M., Havoutis, I., Vijayakumar, S.: Agile maneuvers in legged robots: a predictive control approach. ArXiv (2022)

[16] Li, H., Wensing, P.M.: Cafe-Mpc: A Cascaded-Fidelity Model Predictive Control Framework with Tuning-Free Whole-Body Control (2024)

[17] Bellegarda, G., Shafiee, M., Özberk, M.E., Ijspeert, A.: Quadruped-frog: Rapid online optimization of continuous quadruped jumping. arXiv preprint arXiv:2403.06954 (2024)

[18] Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N.M.O., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. CoRR abs/1509.02971 (2015)

[19] Gehring, C., Coros, S., Hutter, M., Bellicoso, C.D., Heijnen, H., Diethelm, R., Bloesch, M., Fankhauser, P., Hwangbo, J., Hoepflinger, M., et al.: Practice makes perfect: An optimization-based approach to controlling agile motions for a quadruped robot. IEEE Robotics & Automation Magazine 23(1), 34–43 (2016)

[20] Hwangbo, J., Lee, J., Dosovitskiy, A., Bellicoso, D., Tsounis, V., Koltun, V., Hutter, M.: Learning agile and dynamic motor skills for legged robots. Science Robotics 4(26), 5872 (2019)

[21] Peng, X., Coumans, E., Zhang, T., Lee, T.-W., Tan, J., Levine, S.: Learning agile robotic locomotion skills by imitating animals. Robotics: Science and Systems (2020) https://doi.org/10.15607/RSS.2020.XVI.064

[22] Ji, G., Mun, J., Kim, H., Hwangbo, J.: Concurrent training of a control policy and a state estimator for dynamic and robust legged locomotion. IEEE Robotics and Automation Letters 7(2), 4630–4637 (2022)

[23] Rudin, N., Hoeller, D., Reist, P., Hutter, M.: Learning to walk in minutes using massively parallel deep reinforcement learning. In: Conference on Robot Learning, pp. 91–100 (2022). PMLR

[24] Fankhauser, P., Hutter, M., Gehring, C., Bloesch, M., Hoepflinger, M.A., Siegwart, R.: Reinforcement learning of single legged locomotion. In: 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 188–193 (2013). IEEE

[25] Hoeller, D., Rudin, N., Sako, D., Hutter, M.: Anymal parkour: Learning agile navigation for quadrupedal robots. Science Robotics 9(88), 7566 (2024)

[26] OpenAI, I.: Benchmarks for Spinning Up Implementations. https://spinningup.openai.com/en/latest/spinningup/bench.html#benchmarks-for-spinning-up-implementations [Accessed: 26/02/2023] (2022)

[27] Yang, Y., Meng, X., Yu, W., Zhang, T., Tan, J., Boots, B.: Continuous versatile jumping using learned action residuals. In: Matni, N., Morari, M., Pappas, G.J. (eds.) Proceedings of The 5th Annual Learning for Dynamics and Control Conference. Proceedings of Machine Learning Research, vol. 211, pp. 770–782. PMLR (2023)

[28] Majid, A.Y., Saaybi, S., Rietbergen, T., François-Lavet, V., Prasad, R.V., Verhoeven, C.: Deep reinforcement learning versus evolution strategies: A comparative survey. ArXiv abs/2110.01411 (2021)

[29] Bogdanovic, M., Khadiv, M., Righetti, L.: Model-free reinforcement learning for robust locomotion using demonstrations from trajectory optimization. Frontiers in Robotics and AI 9, 854212 (2022)

[30] Bellegarda, G., Nguyen, C., Nguyen, Q.: Robust Quadruped Jumping via Deep Reinforcement Learning (2023)

[31] Grandesso, G., Alboni, E., Papini, G.P.R., Wensing, P.M., Del Prete, A.: Cacto: Continuous actor-critic with trajectory optimization—towards global optimality. IEEE Robotics and Automation Letters 8(6), 3318–3325 (2023)

[32] Peng, X.B., Panne, M.: Learning locomotion skills using deeprl: Does the choice of action space matter? In: Proc. ACM SIGGRAPH / Eurographics Symposium on Computer Animation (2017)

[33] Bellegarda, G., Byl, K.: Training in task space to speed up and guide reinforcement learning. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2693–2699 (2019). https://doi.org/10.1109/IROS40897.2019.8967995

[34] Chen, S., Zhang, B., Mueller, M.W., Rai, A., Sreenath, K.: Learning torque control for quadrupedal locomotion. In: 2023 IEEE-RAS 22nd International Conference on Humanoid Robots (Humanoids), pp. 1–8 (2023). IEEE

[35] Aractingi, M., Léziart, P.-A., Flayols, T., Perez, J., Silander, T., Souères, P.: Controlling the solo12 quadruped robot with deep reinforcement learning. Scientific Reports 13(1), 11945 (2023)

[36] Shafiee, M., Bellegarda, G., Ijspeert, A.: Manyquadrupeds: Learning a single locomotion policy for diverse quadruped robots. In: 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 3471–3477 (2024). IEEE

[37] Yang, Y., Meng, X., Yu, W., Zhang, T., Tan, J., Boots, B.: Continuous versatile jumping using learned action residuals. In: Matni, N., Morari, M., Pappas, G.J. (eds.) Proceedings of The 5th Annual Learning for Dynamics and Control Conference. Proceedings of Machine Learning Research, vol. 211, pp. 770–782. PMLR (2023). https://proceedings.mlr.press...

[38] Vezzi, F., Ding, J., Raffin, A., Kober, J., Della Santina, C.: Two-stage learning of highly dynamic motions with rigid and articulated soft quadrupeds. In: 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 9720–9726 (2024). IEEE

[39] Atanassov, V., Ding, J., Kober, J., Havoutis, I., Santina, C.D.: Curriculum-based reinforcement learning for quadrupedal jumping: A reference-free design. IEEE Robotics & Automation Magazine, 2–15 (2024) https://doi.org/10.1109/MRA.2024.3487325

[40] Eßer, J., Bach, N., Jestel, C., Urbann, O., Kerner, S.: Guided reinforcement learning: A review and evaluation for efficient and effective real-world robotics [survey]. IEEE Robotics & Automation Magazine 30(2), 67–85 (2022)

[41] Bussola, R., Focchi, M., Del Prete, A., Fontanelli, D., Palopoli, L.: Efficient reinforcement learning for 3d jumping monopods. Sensors 24(15), 4981 (2024)

[42] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal Policy Optimization Algorithms (2017)

[43] Fujita, Y., Maeda, S.-i.: Clipped action policy gradient. In: International Conference on Machine Learning, pp. 1597–1606 (2018). PMLR

[44] Mittal, M., Yu, C., Yu, Q., Liu, J., Rudin, N., Hoeller, D., Yuan, J.L., Singh, R., Guo, Y., Mazhar, H., et al.: Orbit: A unified simulation framework for interactive robot learning environments. IEEE Robotics and Automation Letters 8(6), 3740–3747 (2023)

[45] Focchi, M., Roscia, F., Semini, C.: Locosim: an open-source cross-platform robotics framework. In: Climbing and Walking Robots Conference, pp. 395–406 (2023). Springer

[46] Feng, G., Zhang, H., Li, Z., Peng, X.B., Basireddy, B., Yue, L., Song, Z., Yang, L., Liu, Y., Sreenath, K., Levine, S.: Genloco: Generalized locomotion controllers for quadrupedal robots. In: Liu, K., Kulic, D., Ichnowski, J. (eds.) Proceedings of The 6th Conference on Robot Learning. Proceedings of Machine Learning Research, vol. 205, pp. 1893–1903. PMLR...
