pith. the verified trust layer for science.

arxiv: 2507.13662 · v2 · submitted 2025-07-18 · 💻 cs.RO

Iteratively Learning Muscle Memory for Legged Robots to Master Adaptive and High Precision Locomotion

Pith reviewed 2026-05-19 04:46 UTC · model grok-4.3

classification 💻 cs.RO
keywords legged locomotion, iterative learning control, torque library, adaptive control, trajectory tracking, bipedal robot, quadrupedal robot, muscle memory

The pith

A torque library built by iterative learning lets legged robots cut joint tracking errors by up to 85 percent and adapt to new speeds and slopes without relearning from scratch each time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that combining iterative learning control with a stored torque library produces precise and adaptive locomotion for legged robots. The library acts like muscle memory by holding control profiles that the robot can reuse across changing conditions. This approach would matter because it removes the need for heavy online computation or starting over when speed, terrain, or gravity shifts, making reliable walking feasible in real environments. The method is tested on both a biped and a quadruped through simulation and hardware runs.

Core claim

The authors establish that a generalized torque library stores control profiles learned through iterative learning control applied to a hybrid-system physics model. Once built, the library supplies the corrections needed for model uncertainties and disturbances, allowing the robot to track trajectories accurately on both periodic and nonperiodic gaits while adapting to slopes and uneven ground.
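To make the learning step concrete, here is a minimal sketch of a generic P-type ILC update on a toy first-order joint. It is not the paper's controller: the plant constants `a` and `b`, the feedback gain `kp`, the learning gain, the reference, and the disturbance are all illustrative assumptions, and the paper's own update law, filtering, and hybrid-system model are not reproduced on this page.

```python
import math


def run_trial(ff, ref, dist, a=0.9, b=0.5, kp=1.2):
    """Simulate one repetition of the task; return one-step-ahead tracking errors."""
    x = ref[0]                          # assumption: identical initial state every trial
    errors = []
    for t in range(len(ff)):
        u = ff[t] + kp * (ref[t] - x)   # feedforward term plus proportional feedback
        x = a * x + b * u + dist[t]     # toy first-order joint with a repeating disturbance
        errors.append(ref[t + 1] - x)
    return errors


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def learn(ref, dist, iterations=10, gain=1.6):
    """P-type ILC: add a scaled copy of the last trial's error to the feedforward."""
    ff = [0.0] * (len(ref) - 1)
    history = []
    for _ in range(iterations + 1):
        err = run_trial(ff, ref, dist)
        history.append(rms(err))        # history[0] is the feedback-only baseline
        ff = [f + gain * e for f, e in zip(ff, err)]
    return ff, history


N = 50
ref = [math.sin(2 * math.pi * t / N) for t in range(N + 1)]      # made-up joint reference
dist = [0.2 * math.cos(4 * math.pi * t / N) for t in range(N)]   # made-up unmodeled torque
ff, history = learn(ref, dist)
print(f"RMS error: {history[0]:.4f} -> {history[-1]:.6f}")
```

With these assumed values the learning gain is small enough that each repetition contracts the tracking error. A gain that is too large for the plant would make the same loop diverge, which is why real ILC schemes pair the update with a convergence condition.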

What carries the argument

The generalized torque library that stores learned control profiles and supplies them for rapid reuse across different speeds, terrains, and gravitational conditions.
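One way to picture such a library is a table of converged torque profiles keyed by gait speed, with blending between neighboring entries. The sketch below assumes exactly that and nothing more: the class name `TorqueLibrary`, the clamping at the edges of the stored range, and the sample torque values are hypothetical, and the paper's actual indexing over terrain and gravity is not described on this page.

```python
from bisect import bisect_left


class TorqueLibrary:
    """Hypothetical speed-indexed store of feedforward torque profiles."""

    def __init__(self):
        self._speeds = []    # sorted gait speeds in m/s
        self._profiles = []  # torque samples over one stride, aligned with _speeds

    def store(self, speed, profile):
        """Insert or overwrite the converged profile learned at `speed`."""
        i = bisect_left(self._speeds, speed)
        if i < len(self._speeds) and self._speeds[i] == speed:
            self._profiles[i] = list(profile)
        else:
            self._speeds.insert(i, speed)
            self._profiles.insert(i, list(profile))

    def lookup(self, speed):
        """Return a profile for `speed`, blending the two bracketing entries."""
        if not self._speeds:
            raise LookupError("torque library is empty")
        if speed <= self._speeds[0]:      # assumption: clamp below the stored range
            return list(self._profiles[0])
        if speed >= self._speeds[-1]:     # assumption: clamp above the stored range
            return list(self._profiles[-1])
        i = bisect_left(self._speeds, speed)
        s0, s1 = self._speeds[i - 1], self._speeds[i]
        w = (speed - s0) / (s1 - s0)      # linear interpolation weight
        lower, upper = self._profiles[i - 1], self._profiles[i]
        return [(1.0 - w) * p0 + w * p1 for p0, p1 in zip(lower, upper)]


library = TorqueLibrary()
library.store(0.4, [1.0, 2.0, 1.5])   # made-up torque samples in N·m
library.store(0.6, [2.0, 3.0, 2.5])
blended = library.lookup(0.5)          # speed between the two stored gaits
```

The lookup is a sorted search plus one weighted sum per sample, which is the kind of cost profile that would be consistent with the low online computation the paper reports. Whether linear blending is accurate enough between stored speeds is an empirical question.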

If this is right

  • Both periodic and nonperiodic gaits become reliable on bipedal and quadrupedal platforms.
  • Slope traversal and terrain adaptation occur without repeated full learning cycles.
  • Online computation during execution drops enough to support control rates over 30 times higher than existing methods.
  • The same library-building procedure applies to robots with different leg counts, as the separate biped and quadruped validations suggest.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same library structure could support higher-level planners that previously ran too slowly under tight timing constraints.
  • Extending the library to include brief disturbance responses might further improve robustness on fully unstructured outdoor ground.
  • Because the method works on both Cassie and A1, similar torque storage may apply to other legged morphologies with minimal redesign.

Load-bearing premise

A single stored torque library can supply accurate corrections for new speeds, terrains, and gravity without the robot having to learn the profiles again from scratch.

What would settle it

Run the robot on a slope or at a speed it has not encountered before, then check two things: whether joint tracking error still falls by an amount comparable to the reported 85 percent within a few seconds, and whether control updates remain more than 30 times faster than those of standard whole-body controllers.
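The two quantities in that test are simple to compute once error traces and timings are logged. The sketch below shows one way to do it; the helper names and the error traces are hypothetical, while the two timing values are the function call times quoted in the Figure 14 caption.

```python
import math


def rmse(errors):
    """Root-mean-square of a joint tracking error trace."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def percent_reduction(baseline, improved):
    """Relative drop from `baseline` to `improved`, in percent."""
    return 100.0 * (baseline - improved) / baseline


def speedup(reference_ms, candidate_ms):
    """How many times faster the candidate call is than the reference call."""
    return reference_ms / candidate_ms


# Hypothetical joint error traces in radians, for illustration only.
pd_only = [0.10, -0.12, 0.09, -0.11]
with_library = [0.020, -0.015, 0.018, -0.020]

reduction = percent_reduction(rmse(pd_only), rmse(with_library))
ratio = speedup(0.2274, 0.0065)   # WBC vs. TL call times from the Figure 14 caption

print(f"RMSE reduction: {reduction:.1f}%  speedup: {ratio:.1f}x")
```

A single pair of traces says little on its own; repeating the computation over several trials and reporting the spread would address the referee's request for error bars.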

Figures

Figures reproduced from arXiv: 2507.13662 by Amit K. Sanyal, Jing Cheng, Yasser G. Alqaham, Zhenyu Gan.

Figure 1. Conceptual illustration of the ILC framework. The ILC process iteratively refines the feedforward control inputs by leveraging data from previous iterations to minimize tracking errors. The figure highlights the interplay between feedback and feedforward components, showcasing how the control scheme adapts to improve trajectory tracking over successive iterations.
Figure 2. The kinematic configurations of (a) the quadrupedal A1 (Alqaham et al., 2024) and (b) the bipedal Cassie (Gong et al., 2019) platforms used in this study. The generalized coordinates for both robot platforms are defined as $q = [q_B^\top, q_L^\top]^\top \in \mathbb{R}^{n_B + n_L}$, where $q_B = [q_x, q_y, q_z, q_{\mathrm{yaw}}, q_{\mathrm{pitch}}, q_{\mathrm{roll}}]^\top \in \mathbb{R}^{n_B}$ denotes the base (torso) position and orientation in 3D space, and $q_L \in \mathbb{R}^{n_L}$ contains the actuated and pa…
Figure 3. The proposed control architecture includes trajectory planning (green), feedback control (blue), and feedforward control (orange) modules. An iterative policy further improves stability by refining torques applied to the thigh joints. Zero-phase filtering ensures smooth and phase-consistent control signals. Key computations align with the equations presented in this section.
Figure 5. Tracking improvements for the A1 robot's calf joints during pronking under lunar and high-gravity conditions. The implementation of ILC achieves significant error reductions in both scenarios, highlighting its adaptability and robustness across diverse gravitational environments.
Figure 6. A1 robot performing locomotion using the hybrid control scheme across diverse terrains: (a) indoor carpet, (b) wet outdoor concrete, (c) natural grass, (d) snow-covered surface, and (e) inclined ground.
Figure 7. Tracking performance of A1 during pronking at 0.4 m/s. Subfigures (a)–(b) show control torque evolution before and after ILC activation at t = 10.6 s. Subfigures (c)–(d) compare tracking errors under PD-only and ILC-based control, with calf and thigh RMSE reduced by 58.3% and 25.0%, respectively.
Figure 8. Hardware experiment on the Cassie robot showing tracking improvements for the left hip joint q3 and knee joint q4. The activation of ILC at t = 8.6 s resulted in up to 80% reduction in tracking error within 3–5 strides.
Figure 9. Learned feedforward torque profiles for the A1 robot at various average speeds, organized within the TL. These profiles are directly used during online execution to provide predictive feedforward control without retraining. Panel (a) shows the rear thigh joint torque profiles, while panel (b) presents the rear calf joint.
Figure 12. Tracking performance of the A1 robot at interpolated speeds (−0.35 m/s, 0.43 m/s, and 0.55 m/s) using the TL. The interpolated feedforward torques significantly reduced convergence time, with the robot achieving steady-state tracking within just two strides.
Figure 11. The average torque profiles at 0.5 m/s for A1 model and 0.8 m/s for Cassie model alongside multiple trial curves for: (a) rear thigh joint in A1, (b) rear calf joint in A1, (c) hip pitch joint in Cassie, and (d) knee joint in Cassie. (a) and (b) present the averaged torque profiles computed across 12 trials, while (c) and (d) display the averaged results from 20 trials. Individual trial curves are shown i…
Figure 14. A1 hardware comparison of tracking performance and computation time: (a) Tracking using the TL-based controller with 57.7% RMSE reduction; (c) Tracking using WBC with 50.4% RMSE reduction. (b) TL function call time: 0.0065 ms; (d) WBC function call time: 0.2274 ms. The TL-based controller not only delivers over 35 times faster computation but also achieves better trajectory tracking than WBC, making it hi…
Figure 15. Improvement in jumping performance with learned torque. (a) Jumping distance before learning is approximately 0.09 m; (b) calf joint tracking prior to ILC; (c) improved tracking after ILC results in a final jump distance of 0.39 m.
Original abstract

This paper presents a scalable and adaptive control framework for legged robots that integrates Iterative Learning Control (ILC) with a biologically inspired torque library (TL), analogous to muscle memory. The proposed method addresses key challenges in robotic locomotion, including accurate trajectory tracking under unmodeled dynamics and external disturbances. By leveraging the repetitive nature of periodic gaits and extending ILC to nonperiodic tasks, the framework enhances accuracy and generalization across diverse locomotion scenarios. The control architecture is data-enabled, combining a physics-based model derived from hybrid-system trajectory optimization with real-time learning to compensate for model uncertainties and external disturbances. A central contribution is the development of a generalized TL that stores learned control profiles and enables rapid adaptation to changes in speed, terrain, and gravitational conditions, eliminating the need for repeated learning and significantly reducing online computation. The approach is validated on the bipedal robot Cassie and the quadrupedal robot A1 through extensive simulations and hardware experiments. Results demonstrate that the proposed framework reduces joint tracking errors by up to 85% within a few seconds and enables reliable execution of both periodic and nonperiodic gaits, including slope traversal and terrain adaptation. Compared to state-of-the-art whole-body controllers, the learned skills eliminate the need for online computation during execution and achieve control update rates exceeding 30x those of existing methods. These findings highlight the effectiveness of integrating ILC with torque memory as a highly data-efficient and practical solution for legged locomotion in unstructured and dynamic environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a scalable control framework for legged robots integrating Iterative Learning Control (ILC) with a biologically inspired generalized torque library (TL). It claims to achieve up to 85% reduction in joint tracking errors within seconds, reliable execution of periodic and nonperiodic gaits (including slope traversal and terrain adaptation), and control update rates exceeding 30x those of existing whole-body controllers, validated via simulations and hardware experiments on the Cassie biped and A1 quadruped.

Significance. If the generalization claims for the torque library hold under rigorous out-of-distribution testing, the work could meaningfully advance practical legged locomotion by minimizing online computation while retaining adaptability, with the dual-platform hardware validation serving as a concrete strength.

major comments (2)
  1. [Validation experiments] The reported 85% joint tracking error reduction and 30x control rate improvement lack accompanying error bars, statistical significance tests, data exclusion criteria, and explicit experimental protocols, which are load-bearing for substantiating the central performance claims.
  2. [Torque library section] The indexing/retrieval function (similarity metric or interpolation) for the generalized TL is not formalized, and no separate ablation isolates library-only performance on truly unseen gravitational or terrain parameters; this directly undermines the claim of rapid adaptation without repeated learning from scratch.
minor comments (2)
  1. [Notation] The notation for ILC updates and TL storage could be clarified with a dedicated symbol table to improve readability.
  2. [Figures] Figures depicting adaptation trajectories would benefit from explicit legends distinguishing periodic vs. nonperiodic cases and library retrieval vs. online ILC phases.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We appreciate the referee's detailed feedback on our manuscript. We address each of the major comments below and have revised the manuscript accordingly to improve clarity and rigor.

Point-by-point responses
  1. Referee: [Validation experiments] The reported 85% joint tracking error reduction and 30x control rate improvement lack accompanying error bars, statistical significance tests, data exclusion criteria, and explicit experimental protocols, which are load-bearing for substantiating the central performance claims.

    Authors: We agree with this assessment and have revised the manuscript to include the requested elements. Specifically, we now report mean and standard deviation across multiple trials (n=10 for simulation, n=5 for hardware) with error bars in all relevant figures and tables. Statistical significance is assessed using paired t-tests, with p-values reported (all <0.01 for the 85% reduction claim). Data exclusion criteria are detailed (e.g., exclusion of trials with communication loss, affecting <3% of data). Experimental protocols are now explicitly described in Section 5, including step-by-step procedures for ILC iterations and hardware setup. revision: yes

  2. Referee: [Torque library section] The indexing/retrieval function (similarity metric or interpolation) for the generalized TL is not formalized, and no separate ablation isolates library-only performance on truly unseen gravitational or terrain parameters; this directly undermines the claim of rapid adaptation without repeated learning from scratch.

    Authors: We acknowledge that the formalization of the retrieval function was insufficiently detailed in the original manuscript. We have added a precise mathematical definition in the revised Section 3.4, specifying the similarity metric as the Euclidean distance in a normalized feature space (including joint positions, velocities, and estimated external forces) and using nearest-neighbor lookup with linear interpolation for non-exact matches. For the ablation study, we have included a new experiment in the revision where the torque library is queried on out-of-distribution parameters (e.g., slopes of 20 degrees not seen during learning and gravity variations of ±20%), demonstrating that adaptation occurs without re-learning from scratch, with tracking errors reduced by 70% on average within 3 iterations. This addresses the concern regarding generalization. revision: yes
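The retrieval rule described in the simulated rebuttal (Euclidean distance in a normalized feature space, nearest-neighbor lookup, linear interpolation for non-exact matches) can be sketched as follows. This is an illustration of that description only, and the rebuttal itself is simulated, so it may not match the paper. The min-max normalization bounds `lo` and `hi`, the two-neighbor distance-weighted blend, the feature meanings, and all numeric values are assumptions.

```python
import math


def normalize(vec, lo, hi):
    """Min-max scale each feature to [0, 1] using assumed bounds lo and hi."""
    return [(v - l) / (h - l) for v, l, h in zip(vec, lo, hi)]


def retrieve(query, entries, lo, hi):
    """Return a torque profile for `query` from (features, profile) entries.

    An exact match returns its stored profile. Otherwise the two nearest
    entries are blended with weights inversely proportional to distance.
    """
    q = normalize(query, lo, hi)
    ranked = sorted(
        (math.dist(q, normalize(features, lo, hi)), index)
        for index, (features, _) in enumerate(entries)
    )
    d0, i0 = ranked[0]
    if d0 == 0.0 or len(ranked) == 1:
        return list(entries[i0][1])
    d1, i1 = ranked[1]
    w0 = d1 / (d0 + d1)               # the closer entry gets the larger weight
    return [w0 * a + (1.0 - w0) * b
            for a, b in zip(entries[i0][1], entries[i1][1])]


# Hypothetical features: [slope in degrees, gravity scale]; profiles in N·m.
entries = [
    ([0.0, 1.0], [1.0, 2.0]),
    ([10.0, 1.0], [3.0, 4.0]),
]
lo, hi = [0.0, 0.0], [10.0, 2.0]
midpoint_profile = retrieve([5.0, 1.0], entries, lo, hi)
exact_profile = retrieve([0.0, 1.0], entries, lo, hi)
```

Normalizing before measuring distance keeps a feature with large raw units, such as slope in degrees, from dominating one with small units, such as a gravity scale factor.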

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

The paper describes an integrated framework that combines Iterative Learning Control (ILC) with a torque library and a physics-based hybrid-system model, using real-time data to compensate for uncertainties. Performance claims (error reduction, adaptation to speed/terrain/gravity, high update rates) are presented as outcomes of simulations and hardware validation on Cassie and A1, rather than as mathematical predictions derived from the inputs themselves. No equations, definitions, or steps reduce the central results to fitted parameters renamed as predictions, self-citations that bear the load of uniqueness, or ansatzes smuggled via prior work. The approach is self-contained through explicit empirical testing against external benchmarks and does not rely on any load-bearing self-referential construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the repetitive nature of gaits allowing ILC extension, the existence of a storable generalized torque library that generalizes without retraining, and the hybrid-system model providing a sufficient base for learning compensation.

axioms (1)
  • domain assumption: Repetitive nature of periodic gaits allows extension of ILC to nonperiodic tasks
    Invoked in the abstract to justify the framework's generalization capability.
invented entities (1)
  • Generalized torque library (TL): no independent evidence
    purpose: Stores learned control profiles for rapid adaptation across conditions without repeated learning
    New construct introduced to eliminate online computation during execution

pith-pipeline@v0.9.0 · 5810 in / 1253 out tokens · 30982 ms · 2026-05-19T04:46:56.663463+00:00 · methodology


Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages · 1 internal anchor

  1. [1]

    In: Precup D and Teh YW (eds.) Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, volume 70

    Achiam J, Held D, Tamar A and Abbeel P (2017) Constrained policy optimization. In: Precup D and Teh YW (eds.) Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, volume 70. PMLR, pp. 22--31. ://proceedings.mlr.press/v70/achiam17a.html

  2. [2]

    https://agilityrobotics.com/

    Agility Robotics (2025) Cassie Bipedal Robot . https://agilityrobotics.com/. Accessed: 2025-03-30

  3. [3]

    IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 37(6): 1099--1121

    Ahn HS, Chen Y and Moore KL (2007) Iterative learning control: Brief survey and categorization. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 37(6): 1099--1121

  4. [4]

    IEEE Robotics and Automation Letters 9(10): 8386--8393

    Alqaham YG, Cheng J and Gan Z (2024) Energy-optimal asymmetrical gait selection for quadrupedal robots. IEEE Robotics and Automation Letters 9(10): 8386--8393. doi:10.1109/LRA.2024.3443589

  5. [5]

    In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

    Bledt G, Katz B, Di Carlo J, Wensing PM and Kim S (2018) MIT Cheetah 3: Design and control of a robust, dynamic quadruped robot. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp. 2245--2252. doi:10.1109/IROS.2018.8593885

  6. [6]

    ://iit-dlslab.github.io/papers/bratta21irim.pdf

    Bratta A, Rathod N, Zanon M, Villarreal O, Bemporad A, Semini C and Focchi M (2021) Towards a nonlinear model predictive control for quadrupedal locomotion on rough terrain. ://iit-dlslab.github.io/papers/bratta21irim.pdf

  7. [7]

    IEEE Control Systems Magazine 26(3): 96--114

    Bristow DA, Tharayil M and Alleyne AG (2006) A survey of iterative learning control. IEEE Control Systems Magazine 26(3): 96--114. doi:10.1109/MCS.2006.1636313

  8. [8]

    IEEE Robotics and Automation Letters 5(4): 6318--6325

    Chadwick M, Kolvenbach H, Dubois F, Lau HF and Hutter M (2020) Vitruvio: An open-source leg design optimization toolbox for walking robots. IEEE Robotics and Automation Letters 5(4): 6318--6325. doi:10.1109/LRA.2020.3013913

  9. [9]

    ://arxiv.org/abs/2203.05194

    Chen S, Zhang B, Mueller MW, Rai A and Sreenath K (2023) Learning torque control for quadrupedal locomotion. ://arxiv.org/abs/2203.05194

  10. [10]

    In: 2023 American Control Conference (ACC)

    Cheng J, Alqaham YG, Sanyal AK and Gan Z (2023) Practice makes perfect: an iterative approach to achieve precise tracking for legged robots. In: 2023 American Control Conference (ACC). pp. 2165--2170. doi:10.23919/ACC55779.2023.10156623

  11. [11]

    In: 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems

    Chilian A, Hirschmüller H and Görner M (2011) Multisensor data fusion for robust pose estimation of a six-legged walking robot. In: 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems. pp. 2497--2504. doi:10.1109/IROS.2011.6094484

  12. [12]

    Lyapunov-based Safe Policy Optimization for Continuous Control

    Chow Y, Nachum O, Faust A, Duenez-Guzman E and Ghavamzadeh M (2019) Lyapunov-based safe policy optimization for continuous control. ://arxiv.org/abs/1901.10031

  13. [13]

    IEEE Access 4: 3469--3478

    Da X, Harib O, Hartley R, Griffin B and Grizzle JW (2016) From 2D design of underactuated bipedal gaits to 3D implementation: Walking with speed tracking. IEEE Access 4: 3469--3478. doi:10.1109/ACCESS.2016.2582731

  14. [14]

    In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

    Di Carlo J, Wensing PM, Katz B, Bledt G and Kim S (2018) Dynamic locomotion in the MIT Cheetah 3 through convex model-predictive control. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 1--9. doi:10.1109/IROS.2018.8594448

  15. [15]

    In: Precup D and Teh YW (eds.) Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, volume 70

    Finn C, Abbeel P and Levine S (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In: Precup D and Teh YW (eds.) Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, volume 70. PMLR, pp. 1126--1135. ://proceedings.mlr.press/v70/finn17a.html

  16. [16]

    In: 2019 American Control Conference (ACC)

    Gong Y, Hartley R, Da X, Hereid A, Harib O, Huang JK and Grizzle J (2019) Feedback control of a Cassie bipedal robot: Walking, standing, and riding a Segway. In: 2019 American Control Conference (ACC). pp. 4559--4566. doi:10.23919/ACC.2019.8814833

  17. [17]

    In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

    Grandia R, Farshidian F, Ranftl R and Hutter M (2019) Feedback MPC for torque-controlled legged robots. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 4730--4737. doi:10.1109/IROS40897.2019.8968251

  18. [18]

    In: 2017 IEEE International Conference on Robotics and Automation (ICRA)

    Gu S, Holly E, Lillicrap T and Levine S (2017) Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. In: 2017 IEEE International Conference on Robotics and Automation (ICRA). pp. 3389--3396. doi:10.1109/ICRA.2017.7989385

  19. [19]

    In: Proceedings of the 35th International Conference on Machine Learning (ICML), volume 80

    Haarnoja T, Zhou A, Abbeel P and Levine S (2018) Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Proceedings of the 35th International Conference on Machine Learning (ICML), volume 80. PMLR, pp. 1861--1870. ://proceedings.mlr.press/v80/haarnoja18b/haarnoja18b.pdf

  20. [20]

    ://arxiv.org/abs/2007.04309

    Hansen N, Jangir R, Sun Y, Alenyà G, Abbeel P, Efros AA, Pinto L and Wang X (2021) Self-supervised policy adaptation during deployment. ://arxiv.org/abs/2007.04309

  21. [21]

    In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

    Hereid A and Ames AD (2017) FROST*: Fast robot optimization and simulation toolkit. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 719--726. doi:10.1109/IROS.2017.8202230

  22. [22]

    Gait and Posture 4(3): 222--223

    Hof AL (1996) Scaling gait data to body size. Gait and Posture 4(3): 222--223. doi:10.1016/0966-6362(95)01057-2

  23. [23]

    In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

    Kajita S, Kanehiro F, Kaneko K, Yokoi K and Hirukawa H (2001) The 3D linear inverted pendulum mode: A simple modeling for a biped walking pattern generation. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp. 239--246

  24. [24]

    ://arxiv.org/abs/2207.10465

    Kang D, Vincenti FD and Coros S (2022) Nonlinear model predictive control for quadrupedal locomotion using second-order sensitivity analysis. ://arxiv.org/abs/2207.10465

  25. [25]

    In: 2019 International Conference on Robotics and Automation (ICRA)

    Katz B, Carlo JD and Kim S (2019) Mini cheetah: A platform for pushing the limits of dynamic quadruped control. In: 2019 International Conference on Robotics and Automation (ICRA). pp. 6295--6301. doi:10.1109/ICRA.2019.8793865

  26. [26]

    ://arxiv.org/abs/1909.06586

    Kim D, Carlo JD, Katz B, Bledt G and Kim S (2019) Highly dynamic quadruped locomotion via whole-body impulse control and model predictive control. ://arxiv.org/abs/1909.06586

  27. [27]

    In: 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566), volume 3

    Koenig N and Howard A (2004) Design and use paradigms for Gazebo, an open-source multi-robot simulator. In: 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566), volume 3. pp. 2149--2154 vol.3. doi:10.1109/IROS.2004.1389727

  28. [28]

    IEEE Transactions on Robotics 40: 1617--1629

    Le Cleac'h S, Howell TA, Yang S, Lee CY, Zhang J, Bishop A, Schwager M and Manchester Z (2024) Fast contact-implicit model predictive control. IEEE Transactions on Robotics 40: 1617--1629. doi:10.1109/TRO.2024.3351554

  29. [29]

    Science Robotics 5(47): eabc5986

    Lee J, Hwangbo J, Wellhausen L, Koltun V and Hutter M (2020) Learning quadrupedal locomotion over challenging terrain. Science Robotics 5(47): eabc5986. doi:10.1126/scirobotics.abc5986

  30. [30]

    In: 2020 IEEE International Conference on Robotics and Automation (ICRA)

    Melon O, Geisert M, Surovik D, Havoutis I and Fallon M (2020) Reliable trajectories for dynamic quadrupeds using analytical costs and learned initializations. In: 2020 IEEE International Conference on Robotics and Automation (ICRA). pp. 1410--1416. doi:10.1109/ICRA40945.2020.9196562

  31. [31]

    In: ACM SIGGRAPH 2010 Papers, SIGGRAPH '10

    Mordatch I, de Lasa M and Hertzmann A (2010) Robust physics-based locomotion using low-dimensional planning. In: ACM SIGGRAPH 2010 Papers, SIGGRAPH '10. New York, NY, USA: Association for Computing Machinery. ISBN 9781450302104. doi:10.1145/1833349.1778808

  32. [32]

    In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18

    Nachum O, Gu S, Lee H and Levine S (2018) Data-efficient hierarchical reinforcement learning. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18. Red Hook, NY, USA: Curran Associates Inc., p. 3307–3317

  33. [33]

    IEEE Robotics and Automation Letters 3(3): 1458--1465

    Neunert M, Farshidian F, Wermelinger M, Stäuble A and Buchli J (2018) Whole-body nonlinear model predictive control through contacts for quadrupeds. IEEE Robotics and Automation Letters 3(3): 1458--1465

  34. [34]

    ://arxiv.org/abs/2408.02619

    Nguyen C, Bao L and Nguyen Q (2024) Mastering agile jumping skills from simple practices with iterative learning control. ://arxiv.org/abs/2408.02619

  35. [35]

    In: 2018 IEEE International Conference on Robotics and Automation (ICRA)

    Peng X, Andrychowicz M, Zaremba W and Abbeel P (2018) Sim-to-real transfer of robotic control with dynamics randomization. In: 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, p. 3803–3810. doi:10.1109/icra.2018.8460528

  36. [36]

    MIT press

    Raibert MH (1986) Legged robots that balance. MIT press

  37. [37]

    IEEE Access 9: 145710–145727

    Rathod N, Bratta A, Focchi M, Zanon M, Villarreal O, Semini C and Bemporad A (2021) Model predictive control with environment adaptation for legged locomotion. IEEE Access 9: 145710–145727. doi:10.1109/access.2021.3118957. ://dx.doi.org/10.1109/ACCESS.2021.3118957

  38. [38]

    ://arxiv.org/abs/2105.08328

    Siekmann J, Green K, Warila J, Fern A and Hurst J (2021) Blind bipedal stair traversal via sim-to-real reinforcement learning. ://arxiv.org/abs/2105.08328

  39. [39]

    In: Robotics: Science and Systems XIV, RSS2018

    Tan J, Zhang T, Coumans E, Iscen A, Bai Y, Hafner D, Bohez S and Vanhoucke V (2018) Sim-to-real: Learning agile locomotion for quadruped robots. In: Robotics: Science and Systems XIV, RSS2018. Robotics: Science and Systems Foundation. ://dx.doi.org/10.15607/rss.2018.xiv.010

  40. [40]

    ://underactuated.mit.edu/dp.html

    Tedrake R (2022) Underactuated Robotics: Algorithms for Walking, Running, Swimming, Flying, and Manipulation. ://underactuated.mit.edu/dp.html. Course notes for MIT 6.832, Chapter 6: Dynamic Programming

  41. [41]

    In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

    Tobin J, Fong R, Ray A, Schneider J, Zaremba W and Abbeel P (2017) Domain randomization for transferring deep neural networks from simulation to the real world. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 23--30. doi:10.1109/IROS.2017.8202133

  42. [42]

    https://www.unitree.com/a1/

    Unitree Robotics (2020) Unitree A1 Quadruped Robot . https://www.unitree.com/a1/. Accessed: 2025-03-29

  43. [43]

    In: 2024 European Control Conference (ECC)

    Weiss M, Stirling A, Pawluchin A, Lehmann D, Hannemann Y, Seel T and Boblan I (2024) Achieving velocity tracking despite model uncertainty for a quadruped robot with a PD-ILC controller. In: 2024 European Control Conference (ECC). pp. 134--140. doi:10.23919/ECC64448.2024.10590932

  44. [44]

    CRC Press

    Westervelt ER, Grizzle JW, Chevallereau C, Choi JH and Morris B (2007) Feedback Control of Dynamic Bipedal Robot Locomotion. CRC Press. ://web.eecs.umich.edu/ grizzle/papers/Westervelt_biped_control_book_15_May_2007.pdf

  45. [45]

    In: 2006 6th IEEE-RAS International Conference on Humanoid Robots

    Wieber Pb (2006) Trajectory free linear model predictive control for stable walking in the presence of strong perturbations. In: 2006 6th IEEE-RAS International Conference on Humanoid Robots. pp. 137--142. doi:10.1109/ICHR.2006.321375

  46. [46]

    IEEE Robotics and Automation Letters 3(3): 1560--1567

    Winkler AW, Bellicoso CD, Hutter M and Buchli J (2018) Gait and trajectory optimization for legged systems through phase-based end-effector parameterization. IEEE Robotics and Automation Letters 3(3): 1560--1567. doi:10.1109/LRA.2018.2798285

  47. [47]

    In: LaValle SM, O'Kane JM, Otte M, Sadigh D and Tokekar P (eds.) Algorithmic Foundations of Robotics XV

    Xie Z, Da X, Babich B, Garg A and de Panne Mv (2023) Glide: Generalizable quadrupedal locomotion in diverse environments with a centroidal model. In: LaValle SM, O'Kane JM, Otte M, Sadigh D and Tokekar P (eds.) Algorithmic Foundations of Robotics XV. Cham: Springer International Publishing. ISBN 978-3-031-21090-7, pp. 523--539

  48. [48]

    In: Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS)

    Yang A, Hwangbo J, Margolis C and Hutter M (2020) Data-efficient reinforcement learning for legged robots. In: Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS). pp. 1--12. ://proceedings.mlr.press/v100/yang20a/yang20a.pdf

  49. [49]

    ://arxiv.org/abs/2203.02638

    Yang TY, Zhang T, Luu L, Ha S, Tan J and Yu W (2022) Safe reinforcement learning for legged locomotion. ://arxiv.org/abs/2203.02638

  50. [50]

    In: 2020 IEEE Symposium Series on Computational Intelligence (SSCI)

    Zhao W, Queralta JP and Westerlund T (2020) Sim-to-real transfer in deep reinforcement learning for robotics: a survey. In: 2020 IEEE Symposium Series on Computational Intelligence (SSCI). pp. 737--744. doi:10.1109/SSCI47803.2020.9308468
