arxiv: 2604.14344 · v1 · submitted 2026-04-15 · 💻 cs.RO

CART: Context-Aware Terrain Adaptation using Temporal Sequence Selection for Legged Robots

Pith reviewed 2026-05-10 12:47 UTC · model grok-4.3

classification 💻 cs.RO
keywords legged robots · terrain adaptation · multimodal sensing · temporal sequence selection · proprioception · exteroception · context-aware control · vibrational stability

The pith

CART combines vision and proprioception with temporal sequences to enable stable walking on complex terrain for legged robots.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that legged robots can walk more stably on uneven, complex terrain by building an integrated understanding of how surfaces look and feel from sequences of sensor data over time. Most existing methods struggle because they depend heavily on vision, which does not always agree with the physical feedback the robot receives, and so they adapt poorly. CART counters this by selecting relevant temporal sequences from combined visual and proprioceptive sensing to inform a high-level controller. This matters for expanding robot use in outdoor and off-road settings where terrain varies unpredictably. The work supports these claims through comparisons in simulation and on a physical robot, reporting gains in task success and stability with no increase in task completion time.

Core claim

CART is a high-level controller that integrates proprioception and exteroception from onboard sensing to achieve a robust understanding of terrain by using context-aware adaptation with temporal sequence selection. This method addresses the Visual-Texture Paradox, where visual cues do not match actual terrain feel, resulting in improved stability on complex terrains.

What carries the argument

Temporal sequence selection, which processes sequences of multimodal sensor data to build contextual terrain properties for adaptation.
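
The summary does not spell out how the selection is computed; the Figure 2 caption only mentions an attention-based context vector. As a minimal sketch, assuming scaled dot-product attention over a short window of fused vision and proprioception features, the idea of weighting timesteps by relevance might look like the following. The names `context_vector`, `window`, and `query` are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(window, query):
    """Weight each timestep in `window` by its similarity to `query`.

    window: list of per-timestep feature vectors (hypothetical fused
            vision + proprioception features).
    query:  vector describing what the controller currently attends to
            (an assumption made for illustration).
    Returns the weighted-average feature vector and the weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * f for q, f in zip(query, feat)) / scale for feat in window]
    weights = softmax(scores)
    dim = len(window[0])
    context = [sum(w * feat[i] for w, feat in zip(weights, window)) for i in range(dim)]
    return context, weights


# Toy window: [visual firmness cue, measured foot slip] per timestep.
window = [
    [0.9, 0.1],  # looks firm, feels firm
    [0.8, 0.2],
    [0.2, 0.9],  # looks similar, but slip was detected
]
query = [0.0, 1.0]  # attend to slip evidence
context, weights = context_vector(window, query)
```

In this toy case the last timestep receives the largest weight because it carries the strongest slip signal, which is the kind of disagreement between sight and feel that the Visual-Texture Paradox describes.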

If this is right

  • Average success rate in simulation increases by 5 percent compared to multimodal baselines.
  • Stability improves by up to 45 percent in one real-world setting and 24 percent in another.
  • Task completion time remains unchanged despite the added adaptation.
  • The method applies to multiple legged robot hardware platforms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Extending the temporal window or adding more sensor types could further enhance terrain understanding in dynamic environments.
  • This temporal approach may help bridge gaps in purely end-to-end learning methods that lack explicit context modeling.
  • Applying similar sequence selection to other robot tasks like manipulation could improve performance in varied conditions.
  • Validating the vibrational stability metric against direct measures of energy efficiency or failure modes would strengthen the evaluation.

Load-bearing premise

That vibrational stability measured at the robot base accurately reflects the quality of terrain understanding, and that the temporal selection process generalizes without overfitting to the tested conditions.
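
To make the premise concrete, here is one way a base-vibration score could be computed. The Figure 4 caption refers to RMS base vibration in roll, pitch, yaw, and total; the exact definition is not given in this summary, so the sketch below assumes the RMS of the mean-removed angle signal per axis and a root-sum-square total. The function names and the numbers in the example are illustrative assumptions, not the paper's data.

```python
import math


def rms_about_mean(values):
    """RMS of a signal after removing its mean (assumed definition)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def base_vibration(roll, pitch, yaw):
    """Per-axis RMS vibration plus a combined total.

    The root-sum-square combination is an assumption for illustration.
    """
    axes = {
        "roll": rms_about_mean(roll),
        "pitch": rms_about_mean(pitch),
        "yaw": rms_about_mean(yaw),
    }
    total = math.sqrt(sum(v ** 2 for v in axes.values()))
    return {**axes, "total": total}


def relative_improvement(baseline, method):
    """Percent reduction in vibration relative to a baseline."""
    return (baseline - method) / baseline * 100.0


# Hypothetical base angles in degrees, not measurements from the paper.
scores = base_vibration(
    roll=[1.0, -1.0, 1.0, -1.0],
    pitch=[0.0, 0.0, 0.0, 0.0],
    yaw=[2.0, -2.0, 2.0, -2.0],
)
gain = relative_improvement(baseline=2.0, method=1.5)
```

Under this definition a lower score means a steadier base, and a stability gain such as those reported would be read as a percent reduction of the score relative to a baseline.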

What would settle it

A test where CART is evaluated on a new set of terrains with different properties from those used in training and evaluation, checking if the stability and success improvements hold or if performance drops to baseline levels.
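
A minimal sketch of such a test, assuming each logged run is tagged with a terrain label, is a leave-one-terrain-out split: every terrain is held out in turn so that evaluation never sees a surface used for training. The record layout and the `runs` data are hypothetical; only the split logic is shown, not training.

```python
def leave_one_terrain_out(runs):
    """Yield (held_out_terrain, train_runs, test_runs) for each terrain."""
    terrains = sorted({run["terrain"] for run in runs})
    for held_out in terrains:
        train = [run for run in runs if run["terrain"] != held_out]
        test = [run for run in runs if run["terrain"] == held_out]
        yield held_out, train, test


def success_rate(runs):
    """Fraction of runs marked successful."""
    return sum(1 for run in runs if run["success"]) / len(runs)


# Hypothetical run log; terrain names follow the Figure 5 caption.
runs = [
    {"terrain": "grass", "success": True},
    {"terrain": "grass", "success": True},
    {"terrain": "mud", "success": False},
    {"terrain": "mud", "success": True},
    {"terrain": "gravel", "success": True},
    {"terrain": "gravel", "success": False},
]

folds = list(leave_one_terrain_out(runs))
held_out_rates = {name: success_rate(test) for name, _, test in folds}
```

If the held-out success rates and stability scores stay close to the in-distribution ones, the generalization premise holds; a drop toward baseline levels would point to overfitting.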

Figures

Figures reproduced from arXiv: 2604.14344 by Karthik Dantu, Kartikeya Singh, Yash Turkar, Youngjin Kim.

Figure 1: Context switching using CART. The robot traverses slushy snow overlaid on grass. On the left, the image projection from the robot's perspective shows the raw image and the snow-grass segments obtained using [8]. The figure shows an instance of the Visual-Texture Paradox between vision only [19] and CART's estimated context using vision and proprioception.
Figure 2: Overview of the pipeline. CART takes as input a stream of RGBD images S_v, friction meshes S_m obtained using [19], and proprioception data P_t (joint torques and velocities, and foot slips), resulting in a state space S. We train a modular high-level locomotion policy θ(cmd_vel, height) with an added attention-based context vector C_t that determines the context between the exteroception and proprioception using our defined rewards…
Figure 4: Relationship between lateral foot perturbation magnitude (Δq) and base vibration at five walking speeds. Each subplot shows RMS base vibrations (roll, pitch, yaw, and total) as a function of Δq; colored curves indicate different speeds, and error bars denote variability across random perturbation seeds. Overall, larger Δq yields higher vibration.
Figure 3: IsaacSim terrain configurations used for training and testing of all baselines along with CART during our experiments. The top and bottom rows represent the difficulty of the same terrain type used for training and testing.
Figure 5: Real-world experiment scenario with multiple rugged terrains and elevations, including grass, mud, concrete, gravel, and mulch. The experiments were conducted over various underlying terrains, such as grass over mud and grass over gravel. The red markers represent the waypoints, resulting in 7 runs per baseline.
Figure 6: Success rates. CART achieves better success rates during the simulation experiments when compared with existing baselines.
Original abstract

Animals in nature combine multiple modalities, such as sight and feel, to perceive terrain and develop an understanding of how to walk on uneven terrain in a stable manner. Similarly, legged robots need to develop their ability to stably walk on complex terrains by developing an understanding of the relationship between vision and proprioception. Most current terrain adaptation methods are susceptible to failure on complex, off-road terrain as they rely on prior experience, particularly observations from a vision sensor. This experience-based learning often creates a Visual-Texture Paradox between what has been seen and how it actually feels. In this work, we introduce CART, a high-level controller built on a context-aware terrain adaptation approach that integrates proprioception and exteroception from onboard sensing to achieve a robust understanding of terrain. We evaluate our method on multiple terrains using an ANYmal-C robot on the IsaacSim simulator and a Boston Dynamics SPOT robot for our real-world experiments. To evaluate the learned contextual terrain properties, we adapt vibrational stability on the base of the robot as a metric. We compare CART with various state-of-the-art baselines equipped with multimodal sensing in both simulation and the real world. CART achieves an average success rate improvement of 5% over all baselines in simulation and improves the overall stability up to 45% and 24% in the real world without increasing the time taken by the robot to accomplish locomotion tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces CART, a high-level controller for legged robots that performs context-aware terrain adaptation by integrating proprioceptive and exteroceptive (vision) inputs through temporal sequence selection. This is intended to overcome the visual-texture paradox and enable stable locomotion on complex off-road terrains. The method is evaluated on an ANYmal-C robot in IsaacSim simulation and a Boston Dynamics SPOT robot in real-world experiments across multiple terrains. CART is compared against state-of-the-art multimodal baselines and claims an average 5% success-rate improvement in simulation plus stability gains of up to 45% and 24% in the real world, without increasing task completion time. Vibrational stability measured at the robot base is used as the primary metric for assessing the quality of the learned contextual terrain properties.

Significance. If the central empirical claims are substantiated with rigorous controls, CART would represent a practical advance in multimodal terrain adaptation for legged locomotion, directly addressing a known failure mode of vision-only methods. The temporal-sequence approach to fusing modalities is a plausible mechanism for building robust context, and the absence of increased traversal time is a positive practical result. However, the significance is currently limited by the reliance on a single, potentially confounded stability metric whose correlation with actual terrain understanding and generalization remains unverified.

major comments (3)
  1. [Abstract and §4] Abstract and §4 (Evaluation): The central claim that temporal sequence selection produces a robust multimodal terrain understanding rests on vibrational stability at the robot base as the evaluation metric. This metric is vulnerable to confounding by controller tuning, leg compliance, and sensor noise, and may not capture failure modes such as foot slippage or inefficient gaits on unseen terrains; no correlation analysis or ablation against alternative metrics (e.g., foot-force variance, energy consumption, or slip detection) is provided to establish that the reported 5%/45%/24% gains reflect improved contextual understanding rather than incidental controller effects.
  2. [§3 and §4] §3 (Method) and §4: The description of the temporal sequence selection mechanism does not include an analysis of its sensitivity to sequence length, sampling rate, or terrain-specific overfitting. Without cross-terrain generalization tests or hold-out terrain results that isolate the contribution of the selection module, it is unclear whether the observed improvements generalize beyond the specific test set or simply reflect better tuning on the evaluated surfaces.
  3. [§4] §4: The abstract states quantitative improvements but the experimental section supplies insufficient detail on the number of trials per terrain, statistical tests used, baseline implementation fidelity (e.g., whether baselines received identical hyper-parameter tuning), and data exclusion criteria. These omissions prevent independent verification of the 5% success-rate and stability figures and undermine the strength of the comparative claims.
minor comments (2)
  1. [Abstract] The term 'exteroception' is used without an explicit definition or reference in the abstract; a brief clarification in the introduction would improve accessibility for readers outside the immediate subfield.
  2. [§4] Figure captions and axis labels in the experimental results should explicitly state the number of runs and error bars (standard deviation or confidence intervals) to allow immediate assessment of variability.
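
Major comment 3 asks which statistical tests back the success-rate comparison. As one hedged illustration of what such reporting could rest on, the sketch below runs a Monte Carlo permutation test on per-run success outcomes for two methods. The test choice, the function names, and the outcome lists are assumptions for illustration; the paper's own procedure is not described in this summary.

```python
import random


def success_rate(outcomes):
    """Fraction of successful runs in a list of 0/1 outcomes."""
    return sum(outcomes) / len(outcomes)


def permutation_test(a, b, n_resamples=10000, seed=0):
    """Two-sided permutation test on the difference in success rate.

    Shuffles the pooled outcomes and counts how often a random split
    produces a gap at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(success_rate(a) - success_rate(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        gap = abs(success_rate(pooled[:len(a)]) - success_rate(pooled[len(a):]))
        if gap >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical per-run outcomes (1 = success), not results from the paper.
method_runs = [1, 1, 1, 1, 1, 1, 0]
baseline_runs = [1, 1, 0, 1, 0, 1, 0]
p_value = permutation_test(method_runs, baseline_runs)
```

With only a handful of runs per method, a gap of a few percentage points will rarely reach significance, which is why the trial counts requested by the referee matter for interpreting the reported gains.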

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, providing clarifications and committing to revisions that strengthen the evaluation and reporting without altering the core contributions.

  1. Referee: [Abstract and §4] Abstract and §4 (Evaluation): The central claim that temporal sequence selection produces a robust multimodal terrain understanding rests on vibrational stability at the robot base as the evaluation metric. This metric is vulnerable to confounding by controller tuning, leg compliance, and sensor noise, and may not capture failure modes such as foot slippage or inefficient gaits on unseen terrains; no correlation analysis or ablation against alternative metrics (e.g., foot-force variance, energy consumption, or slip detection) is provided to establish that the reported 5%/45%/24% gains reflect improved contextual understanding rather than incidental controller effects.

    Authors: We appreciate the concern about potential confounding in the vibrational stability metric. All compared methods used the identical low-level controller, robot platform, and sensor suite, which controls for tuning and compliance differences. The metric was selected as it directly measures the outcome of terrain adaptation (base smoothness during locomotion). We acknowledge that it does not explicitly quantify every failure mode. In revision we will add a limited correlation analysis using available logged data to compare vibrational stability against foot-force variance and energy consumption on representative terrains, plus a short discussion of limitations with respect to slip and sensor noise. revision: partial

  2. Referee: [§3 and §4] §3 (Method) and §4: The description of the temporal sequence selection mechanism does not include an analysis of its sensitivity to sequence length, sampling rate, or terrain-specific overfitting. Without cross-terrain generalization tests or hold-out terrain results that isolate the contribution of the selection module, it is unclear whether the observed improvements generalize beyond the specific test set or simply reflect better tuning on the evaluated surfaces.

    Authors: We agree that explicit sensitivity and isolation analyses would improve clarity. Sequence length was chosen via preliminary tuning for real-time feasibility; we will add a new paragraph in §4 reporting performance across a range of lengths and sampling rates on the existing terrain set. Our evaluation already spans multiple distinct simulation and real-world terrains with consistent outperformance. To isolate the selection module we will include an ablation replacing it with fixed-length or random selection, showing its specific contribution. These additions will be based on re-analysis of existing runs where possible. revision: yes

  3. Referee: [§4] §4: The abstract states quantitative improvements but the experimental section supplies insufficient detail on the number of trials per terrain, statistical tests used, baseline implementation fidelity (e.g., whether baselines received identical hyper-parameter tuning), and data exclusion criteria. These omissions prevent independent verification of the 5% success-rate and stability figures and undermine the strength of the comparative claims.

    Authors: We regret the insufficient experimental detail. In the revised §4 we will report the precise number of trials executed per terrain and method, the statistical tests applied (including p-values), confirmation that baselines were re-implemented from their original papers with identical hyper-parameter search procedures where applicable, and the exact data exclusion rules (e.g., safety aborts counted as failures). These additions will be textual and tabular and will not require new experiments. revision: yes
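
The correlation analysis promised in the first response can be sketched as a Pearson coefficient between per-run vibration scores and an alternative metric such as foot-force variance. This is a minimal illustration under the assumption that both quantities are logged once per run; the variable names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-run logs, not data from the paper.
vibration_rms = [0.8, 1.1, 1.5, 2.0, 2.4]
foot_force_variance = [10.0, 13.0, 18.0, 22.0, 30.0]
r = pearson(vibration_rms, foot_force_variance)
```

A coefficient near 1 across terrains would support using base vibration as a proxy for contact quality; a weak or terrain-dependent coefficient would support the referee's concern about confounding.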

Circularity Check

0 steps flagged

No circularity: empirical method with direct experimental validation

The paper presents CART as a context-aware controller that integrates proprioception and exteroception via temporal sequence selection, evaluated through direct comparisons of success rate and vibrational stability against multimodal baselines in simulation (ANYmal-C in IsaacSim) and in real-world experiments (Boston Dynamics SPOT). No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or described claims. The vibrational stability metric is introduced as an evaluation choice without reduction to prior fits or self-definitions. All load-bearing claims rest on reported empirical deltas (5% simulation success rate, up to 45% and 24% real-world stability) rather than on any construction that equates outputs to inputs by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the central claim rests on the domain assumption that multimodal proprioceptive-exteroceptive integration via temporal sequences yields superior terrain context; no free parameters, ad-hoc axioms, or invented entities are identifiable.

axioms (1)
  • domain assumption Legged robots benefit from combining vision and proprioception for terrain adaptation on complex surfaces
    Standard premise in robotics locomotion research invoked to motivate the Visual-Texture Paradox.

pith-pipeline@v0.9.0 · 5555 in / 1341 out tokens · 54825 ms · 2026-05-10T12:47:29.273308+00:00 · methodology

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 1 internal anchor

  [1] M. Figliozzi and D. Jennings, “Autonomous delivery robots and their potential impacts on urban freight energy consumption and emissions,” Transportation research procedia, vol. 46, pp. 21–28, 2020
  [2] C. D. Bellicoso, M. Bjelonic, L. Wellhausen, K. Holtmann, F. Günther, M. Tranzatto, P. Fankhauser, and M. Hutter, “Advances in real-world applications for legged robots,” Journal of Field Robotics, vol. 35, no. 8, pp. 1311–1326, 2018
  [3] H. Kolvenbach, C. Bärtschi, L. Wellhausen, R. Grandia, and M. Hutter, “Haptic inspection of planetary soils with legged robots,” IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 1626–1632, 2019
  [4] N. S. Naik, V. V. Shete, and S. R. Danve, “Precision agriculture robot for seeding function,” in 2016 international conference on inventive computation technologies (ICICT), vol. 2. IEEE, 2016, pp. 1–3
  [5] B. Dynamics. (2023) About the spot robot. Accessed: 2024-09-
  [6] [Online]. Available: https://support.bostondynamics.com/s/article/About-the-Spot-Robot-72005
  [7] unitree. (2023) About the unitree robot. [Online]. Available: https://shop.unitree.com/products/unitree-go2? srsltid=AfmBOopSkw67HujLhIwAHpq1DLuCBe7h4Qh z4c4EaotY6eFRrMvbPo8
  [8] M. Hutter, C. Gehring, D. Jud, A. Lauber, C. D. Bellicoso, V. Tsounis, J. Hwangbo, K. Bodie, P. Fankhauser, M. Bloesch, et al., “Anymal-a highly mobile and dynamic quadrupedal robot,” in 2016 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, 2016, pp. 38–44
  [9] K. Viswanath, K. Singh, P. Jiang, P. Sujit, and S. Saripalli, “Offseg: A semantic segmentation framework for off-road driving,” in 2021 IEEE 17th international conference on automation science and engineering (CASE). IEEE, 2021, pp. 354–359
  [10] T. Guan, D. Kothandaraman, R. Chandra, A. J. Sathyamoorthy, and D. Manocha, “Ganav: Group-wise attention for classifying navigable regions in unstructured outdoor environments.”
  [11] C. Zhong, B. Li, and T. Wu, “Off-road drivable area detection: A learning-based approach exploiting lidar reflection texture information,” Remote Sensing, vol. 15, no. 1, p. 27, 2022
  [12] E. Yang, H. Karnan, G. Warnell, P. Stone, and J. Biswas, “Wait, that feels familiar: Learning to extrapolate human preferences for preference-aligned path planning,” in ICRA2023 Workshop on Pre-training for Robotics (PT4R), 2023
  [13] K. Viswanath, P. Jiang, P. Sujit, and S. Saripalli, “Off-road lidar intensity based semantic segmentation,” in International Symposium on Experimental Robotics. Springer, 2023, pp. 608–617
  [14] L. Dabbiru, C. Goodin, N. Scherrer, and D. Carruth, “Lidar data segmentation in off-road environment using convolutional neural networks (cnn),” SAE International Journal of Advances and Current Practices in Mobility, vol. 2, no. 2020-01-0696, pp. 3288–3292, 2020
  [15] O. Kim, J. Seo, S. Ahn, and C. H. Kim, “Ufo: Uncertainty-aware lidar-image fusion for off-road semantic terrain map estimation,” arXiv preprint arXiv:2403.02642, 2024
  [16] B. Gao, S. Hu, X. Zhao, and H. Zhao, “Fine-grained off-road semantic segmentation and mapping via contrastive learning,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2021, pp. 5950–5957
  [17] A. Agarwal, A. Kumar, J. Malik, and D. Pathak, “Legged locomotion in challenging terrains using egocentric vision,” in Conference on robot learning. PMLR, 2023, pp. 403–415
  [18] Z. Fu, A. Kumar, A. Agarwal, H. Qi, J. Malik, and D. Pathak, “Coupling vision and proprioception for navigation of legged robots,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 17273–17283
  [19] T. Miki, J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter, “Learning robust perceptive locomotion for quadrupedal robots in the wild,” Science robotics, vol. 7, no. 62, p. eabk2822, 2022
  [20] P. Ewen, A. Li, Y. Chen, S. Hong, and R. Vasudevan, “These maps are made for walking: Real-time terrain property estimation for mobile robots,” IEEE Robotics and Automation Letters, vol. 7, no. 3, pp. 7083–7090, 2022
  [21] A. J. Sathyamoorthy, K. Weerakoon, M. Elnoor, and D. Manocha, “Using lidar intensity for robot navigation,” arXiv preprint arXiv:2309.07014, 2023
  [22] K. Weerakoon, A. J. Sathyamoorthy, M. Elnoor, and D. Manocha, “Adventr: Autonomous robot navigation in complex outdoor environments,” in International Symposium on Experimental Robotics. Springer, 2023, pp. 219–228
  [23] K. Weerakoon, A. J. Sathyamoorthy, J. Liang, T. Guan, U. Patel, and D. Manocha, “Graspe: Graph based multimodal fusion for robot navigation in unstructured outdoor environments,” arXiv preprint arXiv:2209.05722, 2022
  [24] M. Elnoor, A. J. Sathyamoorthy, K. Weerakoon, and D. Manocha, “Pronav: Proprioceptive traversability estimation for legged robot navigation in outdoor environments,” IEEE Robotics and Automation Letters, 2024
  [25] K. Weerakoon, A. J. Sathyamoorthy, M. Elnoor, and D. Manocha, “Vapor: Legged robot navigation in unstructured outdoor environments using offline reinforcement learning,” in 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 10344–10350
  [26] S. Chen, Z. Wan, S. Yan, C. Zhang, W. Zhang, Q. Liu, D. Zhang, and F. Farrukh, “Slr: Learning quadruped locomotion without privileged information,” arXiv preprint arXiv:2406.04835, 2024
  [27] M. Wermelinger, P. Fankhauser, R. Diethelm, P. Krüsi, R. Siegwart, and M. Hutter, “Navigation planning for legged robots in challenging terrain,” in 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2016, pp. 1184–1189
  [28] L. Wellhausen and M. Hutter, “Artplanner: Robust legged robot navigation in the field,” arXiv preprint arXiv:2303.01420, 2023
  [29] A. J. Sathyamoorthy, K. Weerakoon, M. Elnoor, A. Zore, B. Ichter, F. Xia, J. Tan, W. Yu, and D. Manocha, “Convoi: Context-aware navigation using vision language models in outdoor and indoor environments,” arXiv preprint arXiv:2403.15637, 2024
  [30] J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter, “Learning quadrupedal locomotion over challenging terrain,” Science robotics, vol. 5, no. 47, p. eabc5986, 2020
  [31] Y. Chen, D. Wang, L. Thomas, K. Dantu, and S. J. Koppal, “Design of an adaptive lightweight lidar to decouple robot–camera geometry,” IEEE Transactions on Robotics, vol. 40, pp. 2254–2271, 2024
  [32] F. Neuhaus, T. Koß, R. Kohnen, and D. Paulus, “Mc2slam: Real-time inertial lidar odometry using two-scan motion compensation,” in German Conference on Pattern Recognition. Springer, 2018, pp. 60–72
  [33] T. Taketomi, H. Uchiyama, and S. Ikeda, “Visual slam algorithms: A survey from 2010 to 2016,” IPSJ transactions on computer vision and applications, vol. 9, no. 1, p. 16, 2017
  [34] Z. Gojcic, C. Zhou, J. D. Wegner, and A. Wieser, “The perfect match: 3d point cloud matching with smoothed densities,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 5545–5554
  [35] S. Bazeille, J. Ortiz, F. Rovida, M. Camurri, A. Meguenani, D. G. Caldwell, and C. Semini, “Active camera stabilization to enhance the vision of agile legged robots,” Robotica, vol. 35, no. 4, pp. 942–960, 2017
  [36] Z. Wang, M. Li, X. Liu, Y. Wang, Y. Liu, and H. Chen, “Terrain-adaptive planning of a mobile robot with a multi-axis gimbal system for stable slam,” IEEE Transactions on Field Robotics, 2025
  [37] R. Blickhan, “The spring-mass model for running and hopping,” Journal of biomechanics, vol. 22, no. 11-12, pp. 1217–1227, 1989
  [38] A. J. Ijspeert, “A connectionist central pattern generator for the aquatic and terrestrial gaits of a simulated salamander,” Biological cybernetics, vol. 84, no. 5, pp. 331–348, 2001
  [39] E. Todorov, T. Erez, and Y. Tassa, “Mujoco: A physics engine for model-based control,” in 2012 IEEE/RSJ international conference on intelligent robots and systems. IEEE, 2012, pp. 5026–5033
  [40] “Bullet real-time physics simulation,” https://pybullet.org/wordpress/
  [41] “Nvidia isaac sim,” https://developer.nvidia.com/isaac/sim. NVIDIA
  [42] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017
  [43] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in International conference on machine learning. PMLR, 2018, pp. 1861–1870
  [44] A.-C. Cheng, Y. Ji, Z. Yang, Z. Gongye, X. Zou, J. Kautz, E. Bıyık, H. Yin, S. Liu, and X. Wang, “Navila: Legged robot vision-language-action model for navigation,” arXiv preprint arXiv:2412.04453, 2024
  [45] J. Stolle, P. Arm, M. Mittal, and M. Hutter, “Perceptive pedipulation with local obstacle avoidance,” in 2024 IEEE-RAS 23rd International Conference on Humanoid Robots (Humanoids). IEEE, 2024, pp. 157–164
  [46] W. Sun, B. Cao, L. Chen, Y. Su, Y. Liu, Z. Xie, and H. Liu, “Learning perceptive humanoid locomotion over challenging terrain,” arXiv preprint arXiv:2503.00692, 2025
  [47] N. Rudin, D. Hoeller, P. Reist, and M. Hutter, “Learning to walk in minutes using massively parallel deep reinforcement learning,” in Conference on robot learning. PMLR, 2022, pp. 91–100
  [48] Z. Luo, Y. Dong, X. Li, R. Huang, Z. Shu, E. Xiao, and P. Lu, “Moral: Learning morphologically adaptive locomotion controller for quadrupedal robots on challenging terrains,” IEEE Robotics and Automation Letters, 2024
  [49] B. D. R. kit. (2023) About the spot robot. Accessed: 2024-09-15. [Online]. Available: https://bostondynamics.com/reinforcement-learning-researcher-kit/