Watch Your Step: Learning Semantically-Guided Locomotion in Cluttered Environment

Denan Liang; Lihua Xie; Ruimeng Liu; Shenghai Yuan; Thien-Minh Nguyen; Yuan Zhu

arxiv: 2603.02657 · v2 · submitted 2026-03-03 · 💻 cs.RO

Pith reviewed 2026-05-15 17:31 UTC · model grok-4.3

classification 💻 cs.RO

keywords: legged robots, reinforcement learning, semantic maps, foothold selection, cluttered environments, obstacle avoidance, safe locomotion

The pith

SemLoco uses reinforcement learning to let legged robots step safely around low-lying objects in cluttered areas.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SemLoco, a reinforcement learning framework that helps legged robots avoid stepping on low-lying objects such as cables or devices in densely cluttered environments. It combines a two-stage RL process with pixel-wise foothold safety inference and integrates semantic maps to assign traversability costs rather than relying solely on geometry. This approach tackles the gap between high-level semantic understanding and low-level control, along with errors that arise in real-world elevation maps. If the method works as described, robots can navigate reliably in settings where standard controllers risk damage to sensitive items. Tests indicate the framework extends to complex unstructured real environments.

Core claim

SemLoco is a two-stage RL approach that combines soft and hard constraints to perform pixel-wise foothold safety inference while integrating semantic maps to assign traversability costs, which greatly reduces collisions and improves safety around sensitive objects in densely cluttered environments.

What carries the argument

Two-stage RL policy that performs pixel-wise foothold safety inference and incorporates semantic maps for traversability costs instead of geometry alone.
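To make the idea of semantic traversability costs concrete, here is a minimal sketch in plain Python. It is not the paper's learned policy: in SemLoco the foothold inference is done by the RL controller, whereas this sketch uses a hand-written cost map and a local search purely for illustration. The class names, the cost values in `CLASS_COST`, the `height_scale` parameter, and both function names are assumptions introduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical per-class costs in [0, 1]; the paper does not publish these values.
CLASS_COST: Dict[str, float] = {"floor": 0.0, "cable": 1.0, "device": 1.0}


def traversability_cost(
    semantic: List[List[str]],
    elevation: List[List[float]],
    height_scale: float = 0.05,
) -> List[List[float]]:
    """Per-pixel cost: the larger of a semantic cost and a clipped height cost."""
    cost = []
    for sem_row, elev_row in zip(semantic, elevation):
        row = []
        for label, height in zip(sem_row, elev_row):
            geometric = min(abs(height) / height_scale, 1.0)
            semantic_cost = CLASS_COST.get(label, 1.0)  # unknown labels count as unsafe
            row.append(max(geometric, semantic_cost))
        cost.append(row)
    return cost


def safest_foothold(
    cost: List[List[float]], nominal: Tuple[int, int], radius: int = 1
) -> Tuple[int, int]:
    """Lowest-cost cell in a window around the nominal foothold.

    Ties are broken by squared distance to the nominal cell, then by index.
    """
    r0, c0 = nominal
    best = None
    for r in range(max(0, r0 - radius), min(len(cost), r0 + radius + 1)):
        for c in range(max(0, c0 - radius), min(len(cost[0]), c0 + radius + 1)):
            key = (cost[r][c], (r - r0) ** 2 + (c - c0) ** 2, r, c)
            if best is None or key < best:
                best = key
    return best[2], best[3]


# A thin cable is almost invisible to geometry (5 mm high) but has a high semantic cost.
semantic_map = [
    ["floor", "floor", "floor"],
    ["floor", "cable", "floor"],
    ["floor", "floor", "floor"],
]
elevation_map = [[0.0, 0.0, 0.0], [0.0, 0.005, 0.0], [0.0, 0.0, 0.0]]
cost_map = traversability_cost(semantic_map, elevation_map)
chosen = safest_foothold(cost_map, nominal=(1, 1))
```

In this toy scene the geometric term alone would rate the cable cell as nearly free, while the semantic term marks it as unsafe, so the search moves the foothold to a neighbouring floor cell.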

If this is right

Collisions with low-lying objects drop substantially in cluttered settings.
Safety improves around sensitive objects such as high-cost devices or cables on flat ground.
The controller supports reliable navigation in complex unstructured real-world spaces.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may let legged robots share spaces with humans or delicate equipment with lower accidental damage risk.
It could lessen dependence on perfectly accurate elevation maps for basic safety.
Similar semantic guidance might extend to other robot platforms or tasks involving precise placement.

Load-bearing premise

Accurate semantic maps can be generated in real time and the learned policy transfers from training to real cluttered environments without major elevation map errors.

What would settle it

Place a legged robot equipped with SemLoco in a real cluttered test space containing known low-lying sensitive objects and check whether it still steps on them under realistic elevation map noise.

Figures

Figures reproduced from arXiv: 2603.02657 by Denan Liang, Lihua Xie, Ruimeng Liu, Shenghai Yuan, Thien-Minh Nguyen, Yuan Zhu.

**Figure 1.** SemLoco overview for semantic-aware locomotion in cluttered environments. This figure demonstrates a quadruped robot performing obstacle avoidance in a real-world scene with small sensitive objects. Compared to the traditional pure elevation map, SemLoco integrates a semantic map into low-level control, enabling the controller to perform pixel-wise foothold safety inference. This allows for precise gait planni…

**Figure 2.** Framework of SemLoco. Sub-modules have different styles based on their functions: the red trapezoid represents the neural network, the blue rectangle represents unprocessed raw data, and the green rectangle represents processed ready-to-use data. (a) Training in the simulator: In stage 1, we use virtual obstacles (shown as yellow spheres). Although the robot walks on flat ground, it receives a vi…

**Figure 3.** Footswing tracking for each leg.

2) Rigid Obstacle Stage: After the policy converges in the soft dynamic stage, we transfer it to the second stage, where full physical interactions are enabled. Obstacles are created as rigid bodies, and their density and variety are increased compared to the first stage, further improving the robot’s obstacle avoidance ability. In this stage, the policy adjusts the previou…

**Figure 4.** Examples of obstacle curricula.

F. Training Setup in Simulator and Real-world Deployment

Training: We build the two-stage training environments in the IsaacLab framework [29] and use Proximal Policy Optimization [30] to train policies. Each stage is trained with 4096 parallel environments of the Unitree Go2 robot. We train on a single NVIDIA RTX 5090 GPU, with the soft constraint stage taking 1.5 hours and the …

**Figure 5.** Qualitative comparison of obstacle avoidance in a real cluttered environment. We compare our semantic-aware locomotion policy (Bottom) against an…
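The two training stages described in the captions above (virtual obstacles first, then rigid obstacles with higher density and variety) can be sketched as a small curriculum. This is a hedged illustration only: the `StageConfig` fields, the density and penalty values, and the choice to model the hard constraint as episode termination on contact are all assumptions made here, not details confirmed by the source.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StageConfig:
    name: str
    rigid_obstacles: bool       # False: virtual obstacles with no physical contact
    obstacle_density: float     # obstacles per square metre (hypothetical values)
    soft_penalty: float         # reward penalty per foot placed on an obstacle
    terminate_on_contact: bool  # assumed form of the hard constraint


SOFT_STAGE = StageConfig("soft", False, 0.5, 1.0, False)
RIGID_STAGE = StageConfig("rigid", True, 1.5, 1.0, True)


def step_outcome(
    stage: StageConfig, task_reward: float, feet_on_obstacle: int
) -> Tuple[float, bool]:
    """Return (reward, done) for one control step under the given stage."""
    reward = task_reward - stage.soft_penalty * feet_on_obstacle
    done = stage.terminate_on_contact and feet_on_obstacle > 0
    return reward, done


def next_stage(stage: StageConfig, converged: bool) -> StageConfig:
    """Move from the soft stage to the rigid stage once the policy has converged."""
    if stage is SOFT_STAGE and converged:
        return RIGID_STAGE
    return stage
```

The point of the split is that the first stage can shape stepping behaviour with penalties alone, without the robot tripping over physical objects early in training; the second stage then exposes the same policy to real contact dynamics.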

Original abstract

Although legged robots demonstrate impressive mobility on rough terrain, using them safely in cluttered environments remains a challenge. A key issue is their inability to avoid stepping on low-lying objects, such as high-cost small devices or cables on flat ground. This limitation arises from a disconnection between high-level semantic understanding and low-level control, combined with errors in elevation maps during real-world operation. To address this, we introduce SemLoco, a Reinforcement Learning (RL) framework designed to avoid obstacles precisely in densely cluttered environments. SemLoco uses a two-stage RL approach that combines both soft and hard constraints. It performs pixel-wise foothold safety inference, which enables more accurate foot placement. Additionally, SemLoco integrates a semantic map, allowing it to assign traversability costs instead of relying only on geometric data. SemLoco greatly reduces collisions and improves safety around sensitive objects, enabling reliable navigation in situations where traditional controllers would likely cause damage. Experimental results further show that SemLoco can be effectively applied to more complex, unstructured real-world environments. A demo video can be viewed at https://youtu.be/FSq-RSmIxOM.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces SemLoco, a two-stage reinforcement learning framework for legged robots that combines soft and hard constraints with pixel-wise foothold safety inference and semantic-map-based traversability costs. The central claim is that this approach enables precise obstacle avoidance in densely cluttered environments, greatly reducing collisions with low-lying sensitive objects (e.g., cables or devices) where traditional geometric controllers fail due to elevation-map errors and lack of semantic understanding, with effective transfer to complex real-world scenes.

Significance. If the quantitative claims hold under rigorous validation, the work would meaningfully advance safe deployment of legged robots in human environments by bridging semantic understanding with low-level foot placement, addressing a practical gap that limits current controllers around fragile objects.

major comments (2)

[Abstract] The claim that 'SemLoco greatly reduces collisions and improves safety' is presented without any quantitative metrics, baselines, success rates, or statistical comparisons, leaving the magnitude and reliability of the improvement unsupported by visible evidence.
[Experimental results] The central attribution of collision reduction to pixel-wise inference plus semantic costs assumes accurate real-time semantic maps and robustness to elevation errors, yet no ablations on label noise, elevation drift tolerance, or failure cases under realistic map inaccuracies are reported; this makes it impossible to confirm the policy generalizes beyond idealized training conditions.

minor comments (1)

[Abstract] The abstract references a demo video but the main text should include at least summary tables of key metrics (e.g., collision rates, success rates) to allow readers to assess performance without external media.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We appreciate the opportunity to clarify and strengthen our manuscript based on the feedback provided.

Referee: [Abstract] The claim that 'SemLoco greatly reduces collisions and improves safety' is presented without any quantitative metrics, baselines, success rates, or statistical comparisons, leaving the magnitude and reliability of the improvement unsupported by visible evidence.

Authors: We agree that the abstract would benefit from including quantitative evidence to support the claims. The experimental section of the manuscript includes detailed metrics such as collision rates, success rates in navigation tasks, and comparisons to baseline methods. We will revise the abstract to incorporate key quantitative results, including specific improvements in collision reduction and safety metrics.

Revision: yes

Referee: [Experimental results] The central attribution of collision reduction to pixel-wise inference plus semantic costs assumes accurate real-time semantic maps and robustness to elevation errors, yet no ablations on label noise, elevation drift tolerance, or failure cases under realistic map inaccuracies are reported; this makes it impossible to confirm the policy generalizes beyond idealized training conditions.

Authors: The referee correctly identifies a gap in our analysis. While we demonstrate real-world transfer and some robustness to map errors in the experiments, we did not perform explicit ablations on semantic label noise or elevation drift. We will add these ablations to the revised version, including quantitative evaluations under varying levels of noise and drift to better support the generalization claims.

Revision: yes
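The ablations discussed in this exchange could be set up by perturbing the map inputs before they reach the policy. The sketch below shows one possible way to inject semantic label noise and elevation drift; it is an assumption about how such a study might be structured, not the authors' protocol, and the function names and noise model (uniform label flips, constant drift plus Gaussian noise) are hypothetical.

```python
import random
from typing import List, Sequence


def flip_labels(
    semantic: List[List[str]],
    classes: Sequence[str],
    flip_prob: float,
    rng: random.Random,
) -> List[List[str]]:
    """With probability flip_prob, replace each label by a different random class."""
    noisy = []
    for row in semantic:
        new_row = []
        for label in row:
            others = [c for c in classes if c != label]
            if others and rng.random() < flip_prob:
                new_row.append(rng.choice(others))
            else:
                new_row.append(label)
        noisy.append(new_row)
    return noisy


def add_elevation_drift(
    elevation: List[List[float]],
    drift: float,
    noise_std: float,
    rng: random.Random,
) -> List[List[float]]:
    """Add a constant height offset plus zero-mean Gaussian noise to every cell."""
    return [[h + drift + rng.gauss(0.0, noise_std) for h in row] for row in elevation]


def label_error_rate(clean: List[List[str]], noisy: List[List[str]]) -> float:
    """Fraction of cells whose label differs between the two maps."""
    total = sum(len(row) for row in clean)
    wrong = sum(
        a != b
        for clean_row, noisy_row in zip(clean, noisy)
        for a, b in zip(clean_row, noisy_row)
    )
    return wrong / total
```

Sweeping `flip_prob`, `drift`, and `noise_std` over a grid and recording collision rates at each setting would give the kind of tolerance curve the referee asks for.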

Circularity Check

0 steps flagged

No circularity: RL framework is a new construction without self-referential derivations

The paper describes SemLoco as a two-stage RL method that integrates pixel-wise foothold inference and semantic traversability costs. No equations, parameter fits, or derivations are presented that reduce by construction to the inputs. No self-citations appear as load-bearing steps for uniqueness theorems or ansatzes. The approach is self-contained as an independent RL construction, with claims resting on empirical validation rather than tautological re-labeling of fitted quantities.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claim rests on the effectiveness of semantic integration and RL constraints for foothold planning, with the main domain assumption being reliable real-time semantic labeling beyond geometry.

axioms (1)

Domain assumption: Semantic maps can be generated accurately enough in real time to assign meaningful traversability costs beyond elevation data.
Central to replacing pure geometric maps with semantic costs as described in the abstract.

pith-pipeline@v0.9.0 · 5516 in / 1058 out tokens · 51961 ms · 2026-05-15T17:31:10.979832+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: SemLoco uses a two-stage RL approach that combines both soft and hard constraints. It performs pixel-wise foothold safety inference... integrates semantic map, allowing it to assign traversability costs instead of relying only on geometric data.

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: J(p_k) = C_dir(p_k) + w_col · 1_col(p_k) ... asymmetric directional deviation cost
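The quoted foothold cost has two parts: a directional deviation term C_dir and a collision indicator scaled by w_col. The excerpt does not give the form of C_dir, so the sketch below fills it with a purely hypothetical asymmetric cost (backward deviation weighted more than forward deviation) just to show how the two terms combine. The weights, the default `w_col` value, and both function names are assumptions.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def directional_cost(
    p: Point,
    nominal: Point,
    heading: float,
    w_forward: float = 1.0,
    w_backward: float = 2.0,
    w_lateral: float = 1.5,
) -> float:
    """Hypothetical asymmetric C_dir: deviation behind the nominal point costs more."""
    dx, dy = p[0] - nominal[0], p[1] - nominal[1]
    hx, hy = math.cos(heading), math.sin(heading)
    along = dx * hx + dy * hy      # deviation along the walking direction
    lateral = -dx * hy + dy * hx   # deviation perpendicular to it
    w_along = w_forward if along >= 0 else w_backward
    return w_along * abs(along) + w_lateral * abs(lateral)


def foothold_cost(
    p: Point, nominal: Point, heading: float, in_collision: bool, w_col: float = 10.0
) -> float:
    """J(p_k) = C_dir(p_k) + w_col * 1_col(p_k), with the assumed C_dir above."""
    indicator = 1.0 if in_collision else 0.0
    return directional_cost(p, nominal, heading) + w_col * indicator
```

With a large `w_col`, any candidate that overlaps an obstacle is dominated by collision-free candidates within reach, and the directional term only ranks the remaining safe options.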

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 2 internal anchors

[1] J. Hwangbo, J. Lee et al., “Learning agile and dynamic motor skills for legged robots,” Science Robotics, vol. 4, no. 26, p. eaau5872, 2019.
[2] T. Miki, J. Lee et al., “Learning robust perceptive locomotion for quadrupedal robots in the wild,” Science Robotics, vol. 7, no. 62, p. eabk2822, 2022.
[3] A. Kumar, Z. Fu et al., “Rma: Rapid motor adaptation for legged robots,” in Robotics: Science and Systems (RSS), 2021.
[5] P. Fankhauser, M. Bloesch, and M. Hutter, “Probabilistic terrain mapping for mobile robots with uncertain localization,” IEEE Robot. Autom. Lett., 2018.
[6] D. D. Fan, S. Dey et al., “Learning risk-aware costmaps for traversability in challenging environments,” IEEE Robot. Autom. Lett., 2022.
[7] M. J. Miles, H. Biggie, and C. Heckman, “Terrain-aware semantic mapping for cooperative subterranean exploration,” Frontiers in Robotics and AI, vol. 10, p. 1249586, 2023.
[8] R. Yue, L. Feng et al., “Safety path planning for quadruped robots optimized by multi-sensor fusion,” IFAC-PapersOnLine, vol. 59, no. 27, pp. 55–60, 2025.
[9] S. Ha, J. Lee et al., “Learning-based legged locomotion: State of the art and future perspectives,” The International Journal of Robotics Research, vol. 44, no. 8, pp. 1396–1427, 2025.
[10] R. Grandia, F. Jenelten et al., “Perceptive locomotion through nonlinear model-predictive control,” IEEE Transactions on Robotics, vol. 39, no. 5, pp. 3402–3421, 2023.
[11] D. Kim, J. Di Carlo et al., “Highly dynamic quadruped locomotion via whole-body impulse control and model predictive control,” arXiv preprint arXiv:1909.06586, 2019.
[12] N. Rudin, D. Hoeller et al., “Advanced skills by learning locomotion and local navigation end-to-end,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), 2022.
[13] F. Jenelten, T. Miki et al., “Perceptive locomotion in rough terrain–online foothold optimization,” IEEE Robot. Autom. Lett., 2020.
[14] G. Erni, J. Frey et al., “Mem: Multi-modal elevation mapping for robotics and learning,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), 2023.
[15] D. Maturana, P.-W. Chou et al., “Real-time semantic mapping for autonomous off-road navigation,” in Field and Service Robotics: Results of the 11th International Conference. Springer, 2017, pp. 335–350.
[16] P. Kremer, H. Bavle et al., “S-nav: Semantic-geometric planning for mobile robots,” arXiv preprint arXiv:2307.01613, 2023.
[17] Y. Kim, J. H. Lee et al., “Learning semantic traversability with egocentric video and automated annotation strategy,” IEEE Robot. Autom. Lett., 2024.
[18] X. Cai, M. Everett et al., “Risk-aware off-road navigation via a learned speed distribution map,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), 2022.
[19] S. Achat, J. Marzat, and J. Moras, “Path planning incorporating semantic information for autonomous robot navigation,” in 19th International Conference on Informatics in Control, Automation and Robotics (ICINCO) 2022. SCITEPRESS-Science and Technology Publications, 2022, pp. 285–295.
[20] P. Roth, J. Nubert et al., “Viplanner: Visual semantic imperative learning for local navigation,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), 2024.
[21] Y. Yang, X. Meng et al., “Learning semantics-aware locomotion skills from human demonstration,” in Proceedings of The 6th Conference on Robot Learning (CoRL), ser. Proceedings of Machine Learning Research, vol. 205, 2023, pp. 2205–2214. [Online]. Available: https://proceedings.mlr.press/v205/
[22] S. Ægidius, D. Hadjivelichkov et al., “Watch your stepp: Semantic traversability estimation using pose projected features,” in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 2376–2382.
[23] G. B. Margolis and P. Agrawal, “Walk these ways: Tuning robot control for generalization with multiplicity of behavior,” in Conf. Robot Learn. (CoRL), 2023.
[24] J. Siekmann, Y. Godse et al., “Sim-to-real learning of all common bipedal gaits via periodic reward composition,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), 2021.
[25] M. H. Raibert, Legged robots that balance. MIT press, 1986.
[26] Y. Sun, J. Wang et al., “Yolo-e: a lightweight object detection algorithm for military targets,” Signal, Image and Video Processing, vol. 19, no. 3, p. 241, 2025.
[27] G. Ji, J. Mun et al., “Concurrent training of a control policy and a state estimator for dynamic and robust legged locomotion,” IEEE Robot. Autom. Lett., 2022.
[28] G. B. Margolis, G. Yang et al., “Rapid locomotion via reinforcement learning,” Int. J. Robot. Res., 2024.
[29] M. Mittal, P. Roth et al., “Isaac lab: A gpu-accelerated simulation framework for multi-modal robot learning,” arXiv preprint arXiv:2511.04831, 2025.
[30] J. Schulman, F. Wolski et al., “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
