arxiv: 2604.23761 · v1 · submitted 2026-04-26 · 💻 cs.RO


Unleashing the Agility of Wheeled-Legged Robots for High-Dynamic Reflexive Obstacle Evasion

Yongen Zhao (1, 3), Zihao Xu (2), Wenzhi Lu (1), Zhen Chu (4), Ce Hao (2, 3)

(1) School of Mechanical Engineering, Tianjin University, Tianjin, China; (2) School of Computing, National University of Singapore, Singapore; (3) Beijing Zhongguancun Academy, Beijing; (4) DeepRobotics, Hangzhou, China


Pith reviewed 2026-05-08 05:57 UTC · model grok-4.3


keywords: wheeled-legged robots · obstacle evasion · hierarchical reinforcement learning · emergent gaits · reflexive behaviors · hybrid morphology · dynamic environments


The pith

A hierarchical reinforcement learning framework lets wheeled-legged robots discover reflexive evasion behaviors such as forward lunges and lateral dodges.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops AWARE, a hierarchical reinforcement learning system that trains wheeled-legged robots to avoid fast-moving obstacles. It addresses three difficulties of such platforms (the hybrid wheel-plus-leg morphology, coupled motion modes, and non-holonomic movement limits) by letting policies emerge without hand-designed behaviors. As a result, the robot produces varied gaits, including lunging forward or dodging sideways, that exploit its hybrid form for quick responses. The authors validate the system in Isaac Lab simulation and in physical trials on the M20 platform across several fast-moving-obstacle scenarios. Readers would care because this shows a route to more responsive mobility in unpredictable spaces that relies on the robot's built-in design advantages rather than exhaustive manual tuning.
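To make the "non-holonomic movement limits" concrete, here is a minimal sketch using a textbook unicycle model, which is an assumption for illustration and not the robot model used in the paper. It shows that a purely rolling base has zero sideways velocity in its own frame, so any lateral displacement has to be built up by turning first. The function names and the speed limits in the usage note are hypothetical.

```python
import math

def unicycle_step(x, y, theta, v, omega, dt):
    """Advance a unicycle (rolling without slipping) by one Euler step."""
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + omega * dt)

def body_lateral_velocity(prev, new, dt):
    """Project the world-frame velocity onto the axle (sideways) direction."""
    (x0, y0, th0), (x1, y1, _) = prev, new
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
    return -vx * math.sin(th0) + vy * math.cos(th0)

def lateral_reach(v, omega, horizon, dt=0.001):
    """World-y displacement after driving at speed v while turning at rate omega.

    Starts at the origin facing +x, so y measures how far the base has moved
    sideways relative to its initial heading within the time horizon.
    """
    x = y = theta = 0.0
    for _ in range(int(round(horizon / dt))):
        x, y, theta = unicycle_step(x, y, theta, v, omega, dt)
    return y
```

For example, `lateral_reach(2.0, 2.0, 0.3)` gives the sideways offset a rolling-only base reaches in 0.3 s under these made-up limits, which can be compared against the obstacle radius to see why a legged sidestep may be needed at short reaction times.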

Core claim

The AWARE hierarchical reinforcement learning framework enables wheeled-legged robots to naturally exhibit diverse emergent gaits and evasive behaviors, including forward lunge and lateral dodge, thereby leveraging the robot's hybrid morphology to enhance agility under highly dynamic threats, with robust performance shown in simulation and real-world deployment.

What carries the argument

AWARE, the Adaptive Wheeled-Legged Avoidance and Reflexive Evasion hierarchical reinforcement learning framework, which trains policies that produce emergent reflexive evasion by bridging hybrid morphology, mode coupling, and non-holonomic constraints.

Load-bearing premise

The hierarchical reinforcement learning framework can sufficiently bridge the hybrid morphology, mode coupling, and non-holonomic constraints to produce robust real-world evasion without major sim-to-real failures or safety issues.

What would settle it

If real-world trials show frequent collisions with fast-moving obstacles or an absence of the described emergent behaviors such as forward lunges and lateral dodges, the claim of effective reflexive evasion would be disproven.
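One way to turn "frequent collisions" into a checkable number is to report the avoidance success rate together with a confidence interval, since hardware trial counts are usually small. The sketch below computes a Wilson score interval; the trial counts in the usage note are hypothetical and are not taken from the paper.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials
                         + z * z / (4.0 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)
```

For instance, `wilson_interval(18, 20)` summarizes a hypothetical run of 18 successful evasions in 20 trials. A wide interval signals that the trial count is too small to separate "robust" from "frequently colliding".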

Figures

Figures reproduced from arXiv: 2604.23761 by Yongen Zhao, Zihao Xu, Wenzhi Lu, Zhen Chu, Ce Hao.

**Figure 1.** AWARE: Adaptive Wheeled-Legged Avoidance and Reflexive Evasion. AWARE enables wheeled-legged robots to execute agile obstacle avoidance in both single-mode reflexive evasion (a, c, d, e) and continuous mixed-mode scenarios (b, f), with seamless transitions between smooth navigation avoidance and reflexive evasion maneuvers. The figure presents representative real-world results under three dynamic obstacle …

**Figure 2.** Overview of the AWARE framework. (a) Hierarchical Architecture: A high-level policy generates evasion commands that are executed by specialized low-level experts for either smooth navigation or high-dynamic reflexive evasion. (b) Two-Stage Training: A decoupled pipeline sequentially trains the low-level experts and the high-level policy to optimize overall evasion performance and efficiency. (c) Real-Robot…

**Figure 3.** The training acceleration distribution for two experts.

**Figure 4.** Visualization of the dual-mode high-dynamic obstacle avoidance system. Reflexive Evasion Mode (a–d): Triggered by high-speed obstacles (red ball) from varying directions, the system emerges extreme maneuvers, specifically lateral jumping within the stepping mode (a, b) and forward leaping within the rolling mode (c, d). Navigation Avoidance Mode (e–f): Under lower threat levels (silver ball), the robot exe…

**Figure 5.** Performance comparison of the proposed method and baseline

**Figure 6.** (a) t-SNE visualization of kinematic features for five gait modes. Convex hulls delineate the region occupied by each mode. The low-speed modes form a compact cluster, with Hybrid bridging Stepping and Rolling, while the high-speed modes are distinctly separated, validating the mode categorization. Quantitative analysis of the emergent avoidance behaviors under varying reaction times and approach angles. (…

**Figure 7.** Ablation studies on the AWARE framework. (a) Avoidance Success

**Figure 8.** Real-robot experiments for high-dynamic obstacle avoidance. (a) Reflexive evasion of a y-direction obstacle using a rolling gait. (b) Reflexive

Original abstract

Wheeled-legged robots combine the energy efficiency of wheeled locomotion with the terrain adaptability of legged systems, making them promising platforms for agile mobility in complex and dynamic environments. However, enabling high-dynamic reflexive evasion against fast-moving obstacles remains challenging due to the hybrid morphology, mode coupling, and non-holonomic constraints of such platforms. In this work, we propose AWARE, Adaptive Wheeled-Legged Avoidance and Reflexive Evasion, a hierarchical reinforcement learning framework for high-dynamic obstacle avoidance in wheeled-legged robots. The proposed system naturally exhibits diverse emergent gaits and evasive behaviors, including forward lunge and lateral dodge, thereby leveraging the robot's hybrid morphology to enhance agility under highly dynamic threats. Extensive experiments in Isaac Lab simulation and real-world deployment on the M20 platform across diverse dynamic scenarios demonstrate that AWARE achieves robust and agile obstacle avoidance while revealing behaviorally distinct evasive strategies. These results highlight both the practical effectiveness of AWARE and the intrinsic reflexive agility of wheeled-legged robots.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

AWARE is a straightforward hierarchical RL application that gets real hardware evasion working on a wheeled-legged platform, with reported metrics and emergent behaviors.


The main takeaway is that this paper gets a hierarchical RL controller running on the M20 wheeled-legged robot and shows it can handle fast-moving obstacles in both simulation and real-world tests. The authors split the policy into a high-level part that picks evasion modes and a low-level part that handles the actual gaits, which lets the system produce lunges and dodges without hand-crafted rules for every case. That split is useful because hybrid platforms have tricky mode switches and non-holonomic limits, and the setup appears to exploit the wheels-plus-legs design rather than fight it. The experiments combine Isaac Lab runs with physical deployments across dynamic scenarios and report success rates and observed behavior differences, which is better than many sim-only claims in this area. The architecture itself is standard, but the real-world transfer and the fact that distinct evasive strategies appear without explicit programming count as concrete evidence. One soft spot is that the abstract and the summary available here do not spell out the exact baselines or ablation details, so it is hard to judge how much the hierarchy beats simpler end-to-end alternatives. Still, the hardware results are the load-bearing part here and they seem to hold up. This paper is mainly for people working on practical control of hybrid robots who need a working example of reflexive avoidance rather than a theoretical breakthrough. It deserves a serious referee because the real deployment data makes the claims checkable.
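The high-level/low-level split described in the letter can be sketched roughly as follows. This is an illustrative stand-in, not the authors' implementation: in the paper both levels are learned policies, whereas here the high level is a hand-written time-to-closest-approach rule and the experts are placeholder functions. All names, the 0.5 s threshold, and the gain values are assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Vec2 = Tuple[float, float]

@dataclass
class Obstacle:
    rel_pos: Vec2  # obstacle position relative to the robot, metres
    rel_vel: Vec2  # obstacle velocity relative to the robot, m/s

def time_to_closest_approach(ob: Obstacle) -> float:
    """Time until the obstacle is nearest; inf if it is static or receding."""
    (px, py), (vx, vy) = ob.rel_pos, ob.rel_vel
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        return math.inf
    t = -(px * vx + py * vy) / speed_sq
    return t if t > 0.0 else math.inf

def high_level(ob: Obstacle, reflex_ttc: float = 0.5) -> Tuple[str, Vec2]:
    """Stand-in for the learned high-level policy: pick a mode and a direction."""
    (px, py), (vx, vy) = ob.rel_pos, ob.rel_vel
    ttc = time_to_closest_approach(ob)
    if math.isinf(ttc):
        return "navigation", (0.0, 0.0)
    mode = "reflexive" if ttc < reflex_ttc else "navigation"
    # Predicted obstacle offset at closest approach; move away from it.
    mx, my = px + vx * ttc, py + vy * ttc
    norm = math.hypot(mx, my)
    if norm < 1e-9:  # head-on: sidestep perpendicular to the approach
        speed = math.hypot(vx, vy)
        return mode, (-vy / speed, vx / speed)
    return mode, (-mx / norm, -my / norm)

def navigation_expert(command: Vec2) -> List[float]:
    """Placeholder low-level expert: gentle velocity target (hypothetical gain)."""
    return [0.5 * command[0], 0.5 * command[1]]

def reflexive_expert(command: Vec2) -> List[float]:
    """Placeholder low-level expert: aggressive velocity target (hypothetical gain)."""
    return [3.0 * command[0], 3.0 * command[1]]

EXPERTS: Dict[str, Callable[[Vec2], List[float]]] = {
    "navigation": navigation_expert,
    "reflexive": reflexive_expert,
}

def control_step(ob: Obstacle) -> Tuple[str, List[float]]:
    """One control tick: the high level selects an expert, which produces the action."""
    mode, command = high_level(ob)
    return mode, EXPERTS[mode](command)
```

Calling `control_step(Obstacle((2.0, 0.0), (-8.0, 0.0)))` routes a fast head-on obstacle to the reflexive expert, while a slow obstacle such as `Obstacle((4.0, 0.5), (-1.0, 0.0))` is handled by the navigation expert. Keeping the mode decision separate from gait execution is what allows each expert to be trained or replaced on its own.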

Referee Report

0 major / 3 minor

Summary. The manuscript presents AWARE, a hierarchical reinforcement learning framework for high-dynamic reflexive obstacle evasion on wheeled-legged robots. It claims that the framework bridges hybrid morphology, mode coupling, and non-holonomic constraints to produce emergent gaits and evasive behaviors (e.g., forward lunge, lateral dodge), with validation via Isaac Lab simulations and real-world M20 hardware deployment across dynamic scenarios, including quantitative success rates and observed behavioral distinctions.

Significance. If the reported results hold, the work is significant for showing that hierarchical RL can yield practical, robust evasion on hybrid platforms without explicit programming of maneuvers. The real-world deployment on M20 hardware together with quantitative metrics constitutes falsifiable evidence, which is a clear strength over purely simulated studies. This advances understanding of reflexive agility in wheeled-legged systems and could inform downstream applications in dynamic environments.

minor comments (3)

Abstract: the claim of 'robust and agile obstacle avoidance' and 'extensive experiments' would be strengthened by inserting one or two concrete numbers (e.g., success rate, average evasion time) rather than leaving them for the body only.
§3 (or equivalent methods section): the interface between the high-level evasion-mode policy and the low-level gait controller is described at a high level; a short pseudocode block or explicit state-transition diagram would clarify how non-holonomic constraints are handled at each level.
Figures 4–6 (behavioral results): ensure every sub-figure caption explicitly states the quantitative metric shown (success rate, collision count, etc.) and the number of trials, to allow immediate visual verification of the cross-scenario claims.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive and constructive review of our manuscript on AWARE. We appreciate the acknowledgment of the framework's ability to produce emergent gaits and evasive behaviors through hierarchical RL, as well as the value placed on our real-world M20 hardware experiments and quantitative metrics. The recommendation for minor revision is noted, and we are prepared to address any remaining editorial points in the revised version.

Circularity Check

0 steps flagged

No significant circularity detected in derivation or claims


The paper introduces AWARE, a hierarchical RL framework for reflexive obstacle evasion on wheeled-legged robots. No equations, derivations, or parameter-fitting steps are described that reduce by construction to inputs, self-definitions, or prior self-citations. The central claims rest on empirical validation via Isaac Lab simulation and real-world M20 hardware deployment, including observed emergent behaviors and quantitative success metrics. These constitute independent, falsifiable evidence outside any fitted values or internal definitions. No load-bearing self-citations, uniqueness theorems, or ansatz smuggling appear in the provided text. The architecture (high-level mode selection + low-level gait execution) is a standard decomposition for hybrid systems and does not collapse into tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the approach relies on standard hierarchical reinforcement learning applied to the described robot platform.

pith-pipeline@v0.9.0 · 5555 in / 1024 out tokens · 44909 ms · 2026-05-08T05:57:01.855438+00:00 · methodology

