Long-Distance Real-World Navigation of the Legged-Wheeled Robot Go2-W Using Deep Reinforcement Learning
Pith reviewed 2026-06-26 13:55 UTC · model grok-4.3
The pith
A proprioception-only deep reinforcement learning policy extended to the 16-DoF Go2-W robot with load-distribution training enables 2.8 km autonomous real-world navigation without overheating.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that extending a proprioception-only DRL policy to the 16-DoF legged-wheeled Go2-W and training it to distribute load produces stable locomotion that suppresses hip-joint heat concentration, allowing sustained autonomous traversal of a 2.8 km real-world route that includes sidewalks, a park, and stairs without stopping due to overheating.
What carries the argument
The extended proprioception-only DRL policy augmented with load-distribution training on the Go2-W platform, which balances actuator effort to prevent localized overheating during wheeled segments.
If this is right
- Hybrid legged-wheeled robots become capable of multi-kilometer autonomous missions on mixed outdoor terrain without external cooling or frequent stops.
- Proprioception alone suffices for reliable long-duration locomotion control once load distribution is included in training.
- The same policy transfer approach can be applied to other commercial legged-wheeled platforms to shorten development time for real-world navigation.
- Thermal limits shift from hardware redesign to software objectives that can be optimized during training.
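The last implication can be made concrete with a toy thermal model. The sketch below is a minimal illustration, not the paper's method: it assumes motor heating grows with the square of joint torque and that each joint settles at a steady-state temperature set by a lumped thermal resistance. The function name, the constants `heat_gain` and `thermal_resistance`, and the torque values are all hypothetical and chosen only for illustration.

```python
def steady_state_temps(torques_nm, ambient_c=25.0, heat_gain=0.05,
                       thermal_resistance=2.0):
    """Steady-state temperature per joint under a lumped first-order model.

    Assumed model (not from the paper): heat input = heat_gain * torque**2,
    temperature rise above ambient = thermal_resistance * heat input.
    """
    return [ambient_c + thermal_resistance * heat_gain * tau ** 2
            for tau in torques_nm]


# Illustrative torques for one leg (hip, thigh, calf), same total in both cases.
concentrated = [20.0, 5.0, 5.0]   # load concentrated on the hip
distributed = [10.0, 10.0, 10.0]  # load spread evenly

peak_concentrated = max(steady_state_temps(concentrated))
peak_distributed = max(steady_state_temps(distributed))
print(f"peak, concentrated: {peak_concentrated:.1f} C")
print(f"peak, distributed:  {peak_distributed:.1f} C")
```

Under these assumed numbers the hottest joint runs cooler when the same total effort is spread evenly, which is the intuition behind treating heat as a training objective. Real actuators would need measured thermal parameters.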
Where Pith is reading between the lines
- Similar load-balancing objectives could be added to policies for other hybrid robots to extend their operational range.
- The method may allow legged-wheeled systems to replace wheeled robots on tasks that occasionally require stair climbing without sacrificing flat-ground efficiency.
- Integration with global planners could turn the demonstrated local controller into a city-scale autonomous delivery or inspection system.
Load-bearing premise
That the proprioception-only policy developed for quadrupeds can be extended to the Go2-W and, when retrained for load distribution, will remain stable and thermally safe over multi-kilometer real-world distances.
What would settle it
A single continuous run on the same 2.8 km Tsukuba Challenge route in which the Go2-W stops from hip-joint overheating despite the load-distribution training.
Original abstract
Legged-wheeled robots have long been studied for their potential to combine the efficient flat-ground mobility of wheels with the rough-terrain capability of legs. However, examples of their application to long-range autonomous navigation in real environments remain limited. This paper reports our effort to build a deep reinforcement learning (DRL) based locomotion controller and an autonomous navigation system for the commercially available legged-wheeled robot Go2-W, and to apply them to long-range autonomous navigation in a real environment. For locomotion control, we extended a proprioception-only policy, which we had previously developed for quadruped robots, to the 16-DoF legged-wheeled robot. We also found that wheeled locomotion concentrates the load on the hip joints and causes heat concentration that hinders sustained travel, and obtained a policy that suppresses it by distributing the load. We evaluated the system at the Tsukuba Challenge 2025, demonstrating that it can autonomously traverse an approximately 2.8 km route including sidewalks, a park, and stairs without stopping due to overheating.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports the extension of a prior proprioception-only DRL locomotion policy from quadruped robots to the 16-DoF legged-wheeled Go2-W platform. It identifies hip-joint heat concentration during wheeled locomotion as a platform-specific issue and describes obtaining a policy via load-distribution training to mitigate it. The central empirical result is a successful autonomous 2.8 km traversal at the Tsukuba Challenge 2025 across sidewalks, a park, and stairs without overheating-related stops.
Significance. If the deployment result holds under scrutiny, the work provides a concrete systems-level demonstration of long-range real-world navigation with a commercial legged-wheeled robot using DRL. The explicit identification and training-based mitigation of thermal load concentration on a hybrid platform is a practical contribution. The use of a public competition as the evaluation venue supplies a falsifiable, high-stakes test of sustained operation.
major comments (2)
- [Evaluation at Tsukuba Challenge 2025] The manuscript supplies no quantitative metrics (traversal time, average speed, joint-temperature time series, or power-consumption data), ablation results, or failure-mode analysis to support the claim of 2.8 km autonomous navigation without overheating stoppages. This information is load-bearing for the central empirical assertion.
- [Locomotion control description] The load-distribution training procedure is described only in a single sentence; no modified reward terms, training hyperparameters, simulation setup, or comparison against the baseline policy are provided. These details are required to evaluate how the heat-mitigation claim was achieved and whether the extension to 16 DoF is reproducible.
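The metrics requested in the first comment are straightforward to derive from a run log. The following sketch assumes a hypothetical log of timestamped samples with cumulative distance and a hip-joint temperature reading. The `Sample` fields, the `summarize_run` function, and the temperature limit are illustrative assumptions, and the numbers in the usage example are synthetic, not data from the paper.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t_s: float          # time since start, seconds
    distance_m: float   # cumulative distance travelled, metres
    hip_temp_c: float   # hottest hip-joint temperature, degrees Celsius


def summarize_run(samples, temp_limit_c):
    """Return traversal time, average speed, and peak hip temperature."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    duration_s = samples[-1].t_s - samples[0].t_s
    distance_m = samples[-1].distance_m - samples[0].distance_m
    peak_c = max(s.hip_temp_c for s in samples)
    return {
        "duration_s": duration_s,
        "avg_speed_mps": distance_m / duration_s if duration_s > 0 else 0.0,
        "peak_hip_temp_c": peak_c,
        "exceeded_limit": peak_c >= temp_limit_c,
    }


# Synthetic log: a 2.8 km run sampled three times (values are made up).
log = [
    Sample(0.0, 0.0, 30.0),
    Sample(1000.0, 1400.0, 50.0),
    Sample(2000.0, 2800.0, 55.0),
]
summary = summarize_run(log, temp_limit_c=80.0)
print(summary)
```

A joint-temperature time series reported alongside such a summary, for both the baseline and the load-distributing policy, would address the comment directly.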
Simulated Authors' Rebuttal
We thank the referee for the constructive comments on our manuscript. The feedback identifies opportunities to strengthen the empirical support and methodological detail. We respond to each major comment below and will revise the manuscript accordingly.
- Referee: [Evaluation at Tsukuba Challenge 2025] The manuscript supplies no quantitative metrics (traversal time, average speed, joint-temperature time series, or power-consumption data), ablation results, or failure-mode analysis to support the claim of 2.8 km autonomous navigation without overheating stoppages. This information is load-bearing for the central empirical assertion.
  Authors: We agree that the current manuscript would be strengthened by additional quantitative metrics from the Tsukuba Challenge 2025 deployment. The primary claim rests on successful completion of the 2.8 km route without overheating-related stops. In revision we will incorporate logged data on traversal time, average speed, joint-temperature time series, and power consumption. We will also add a discussion of observed failure modes and any minor issues encountered during the event. Controlled ablations were outside the scope of the public competition evaluation, but we will include qualitative comparisons to the baseline policy where supporting simulation or field data exist.
  Revision: yes
- Referee: [Locomotion control description] The load-distribution training procedure is described only in a single sentence; no modified reward terms, training hyperparameters, simulation setup, or comparison against the baseline policy are provided. These details are required to evaluate how the heat-mitigation claim was achieved and whether the extension to 16 DoF is reproducible.
  Authors: The load-distribution training procedure is presented concisely. We will expand the methods section in revision to specify the modified reward terms used to penalize hip-joint load concentration, the training hyperparameters, the simulation environment configuration for the 16-DoF Go2-W platform, and direct comparisons of joint-load and temperature metrics between the baseline and modified policies. These additions will improve reproducibility of the extension from prior quadruped work.
  Revision: yes
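One plausible form for a reward term that penalizes load concentration is sketched below. This is an assumption for illustration only; the manuscript as reviewed does not specify its reward terms. The function name `load_concentration_penalty`, the normalization by torque limits, and the weight are hypothetical.

```python
def load_concentration_penalty(torques_nm, torque_limits_nm, weight=0.1):
    """Hypothetical reward term: zero for even loading, negative otherwise.

    Each joint's load is expressed as a fraction of its torque limit, and
    the penalty grows with the gap between the most loaded joint and the
    mean load across joints.
    """
    if len(torques_nm) != len(torque_limits_nm) or not torques_nm:
        raise ValueError("torques and limits must be non-empty and equal length")
    ratios = [abs(tau) / limit
              for tau, limit in zip(torques_nm, torque_limits_nm)]
    mean_ratio = sum(ratios) / len(ratios)
    return -weight * (max(ratios) - mean_ratio)


limits = [40.0, 40.0, 40.0]  # illustrative torque limits (hip, thigh, calf)
even = load_concentration_penalty([10.0, 10.0, 10.0], limits)
hip_heavy = load_concentration_penalty([20.0, 5.0, 5.0], limits)
print(even, hip_heavy)
```

In a training loop a term like this would be added to the task reward at each step. Whether the authors used a peak-based, variance-based, or torque-squared penalty is not stated, which is why the referee asks for the exact terms.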
Circularity Check
Empirical systems report with no derivation chain
The paper is a deployment report describing extension of a prior proprioception-only DRL policy to the Go2-W platform, addition of load-distribution training, and real-world evaluation over 2.8 km at Tsukuba Challenge 2025. No equations, predictions, or uniqueness theorems are invoked; the result is an empirical outcome rather than a derived claim. Self-citation of prior policy work is present but not load-bearing for any circular reduction. This matches the default non-circular case for systems papers.
Reference graph
Works this paper leans on
- [1] Bjelonic, M., Klemm, V., Lee, J., Hutter, M.: A survey of Wheeled-Legged robots. In: Robotics in Natural Settings (CLAWAR 2022). Lecture Notes in Networks and Systems, vol. 530, pp. 83–94. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-15226-9_11
- [2] Kimura, H., Nakano, E., Nonaka, Y.: Development of leg-wheel robot and cooperational motion of legs and wheels. Journal of the Robotics Society of Japan 10(4), 520–525 (1992) https://doi.org/10.7210/jrsj.10.520 (in Japanese)
- [3] Endo, G., Hirose, S.: Study on Roller-Walker – improvement of locomotive efficiency of quadruped robots by passive wheels. Advanced Robotics 26(8-9), 969–988 (2012) https://doi.org/10.1163/156855312X633066
- [4] Bjelonic, M., Grandia, R., Harley, O., Galliard, C., Zimmermann, S., Hutter, M.: Whole-body MPC and online gait sequence generation for wheeled-legged robots. In: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 8388–8395 (2021). https://doi.org/10.1109/IROS51168.2021.9636371
- [5] Chamorro, S., Klemm, V., La Iglesia Valls, M., Pal, C., Siegwart, R.: Reinforcement learning for blind stair climbing with legged and wheeled-legged robots. In: Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), pp. 8081–8087 (2024). https://doi.org/10.1109/ICRA57147.2024.10610069
- [6] Unitree Robotics: Unitree Go2-W Driving All Terrain. https://www.unitree.com/go2-w. Accessed: 2026-06-11 (2024)
- [7] Ascento Robotics: Ascento – Secure Assets with Robotics and AI. https://www.ascento.ai. Accessed: 2026-06-11 (2024)
- [8] Lee, J., Bjelonic, M., Reske, A., Wellhausen, L., Miki, T., Hutter, M.: Learning robust autonomous navigation and locomotion for wheeled-legged robots. Science Robotics 9(89), eadi9641 (2024) https://doi.org/10.1126/scirobotics.adi9641
- [9] Irie, K., Yoshida, T., Matsuzawa, T., Suzuki, T., Hara, Y., Tomono, M.: Rough terrain navigation for a quadruped robot using deep reinforcement learning-based blind locomotion control and a stuck-escape strategy. Advanced Robotics 39(18), 1182–1198 (2025) https://doi.org/10.1080/01691864.2025.2561643
- [10] Tan, J., Zhang, T., Coumans, E., Iscen, A., Bai, Y., Hafner, D., Bohez, S., Vanhoucke, V.: Sim-to-real: Learning agile locomotion for quadruped robots. In: Proceedings of Robotics: Science and Systems, Pittsburgh, Pennsylvania (2018). https://doi.org/10.15607/RSS.2018.XIV.010
- [11] Hwangbo, J., Lee, J., Dosovitskiy, A., Bellicoso, D., Tsounis, V., Koltun, V., Hutter, M.: Learning agile and dynamic motor skills for legged robots. Science Robotics 4(26), eaau5872 (2019) https://doi.org/10.1126/scirobotics.aau5872
- [12] Lee, J., Hwangbo, J., Wellhausen, L., Koltun, V., Hutter, M.: Learning quadrupedal locomotion over challenging terrain. Science Robotics 5(47), eabc5986 (2020) https://doi.org/10.1126/scirobotics.abc5986
- [13] Kumar, A., Fu, Z., Pathak, D., Malik, J.: RMA: Rapid Motor Adaptation for Legged Robots. In: Proceedings of Robotics: Science and Systems, Virtual (2021). https://doi.org/10.15607/RSS.2021.XVII.011
- [14] Miki, T., Lee, J., Hwangbo, J., Wellhausen, L., Koltun, V., Hutter, M.: Learning robust perceptive locomotion for quadrupedal robots in the wild. Science Robotics 7(62), eabk2822 (2022) https://doi.org/10.1126/scirobotics.abk2822
- [15] Makoviychuk, V., Wawrzyniak, L., Guo, Y., Lu, M., Storey, K., Macklin, M., Hoeller, D., Rudin, N., Allshire, A., Handa, A., State, G.: Isaac gym: High performance GPU based physics simulation for robot learning. In: Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, vol. 1 (2021). https://doi.org/10.48550/arXiv.2108.10470
- [16] Rudin, N., Hoeller, D., Reist, P., Hutter, M.: Learning to walk in minutes using massively parallel deep reinforcement learning. In: Proceedings of the 5th Conference on Robot Learning, pp. 91–100 (2021). https://doi.org/10.48550/arXiv.2109.11978
- [17] Zakka, K., Tabanpour, B., Liao, Q., Haiderbhai, M., Holt, S., Luo, J.Y., Allshire, A., Frey, E., Sreenath, K., Kahrs, L.A., Sferrazza, C., Tassa, Y., Abbeel, P.: MuJoCo Playground (2025). https://doi.org/10.48550/arXiv.2502.08844
- [18] Xu, Z., Raj, A.H., Xiao, X., Stone, P.: Dexterous legged locomotion in confined 3D spaces with reinforcement learning. In: 2024 IEEE International Conference on Robotics and Automation (ICRA) (2024). https://doi.org/10.1109/ICRA57147.2024.10610668
- [19] Luo, S., Li, S., Yu, R., Wang, Z., Wu, J., Zhu, Q.: PIE: Parkour with implicit-explicit learning framework for legged robots. IEEE Robotics and Automation Letters 9(11), 9986–9993 (2024) https://doi.org/10.1109/LRA.2024.3459797
- [20] Xiao, E., Dong, Y., Lam, J., Lu, P.: Learning stable bipedal locomotion skills for quadrupedal robots on challenging terrains with automatic fall recovery. npj Robotics 3(22) (2025) https://doi.org/10.1038/s44182-025-00043-2
- [21] Klemm, V., Morra, A., Salzmann, C., Tschopp, F., Bodie, K., Gulich, L., Küng, N., Mannhart, D., Pfister, C., Vierneisel, M., Weber, F., Deuber, R., Siegwart, R.: Ascento: A two-wheeled jumping robot. In: 2019 IEEE International Conference on Robotics and Automation (ICRA), pp. 7515–7521 (2019). https://doi.org/10.1109/ICRA.2019.8793792
- [22] Klemm, V., Morra, A., Gulich, L., Mannhart, D., Rohr, D., Kamel, M., Viragh, Y., Siegwart, R.: LQR-assisted whole-body control of a wheeled bipedal robot with kinematic loops. IEEE Robotics and Automation Letters 5(2), 3745–3752 (2020) https://doi.org/10.1109/LRA.2020.2979625
- [23] Bjelonic, M., Bellicoso, C.D., Viragh, Y., Sako, D., Tresoldi, F.D., Jenelten, F., Hutter, M.: Keep rollin’ – whole-body motion control and planning for wheeled quadrupedal robots. IEEE Robotics and Automation Letters 4(2), 2116–2123 (2019) https://doi.org/10.1109/LRA.2019.2899750
- [24] Takubo, T., Yoshioka, T., Arai, T., Mae, Y., Ohara, K.: Leg-wheel hybrid locomotion for multi-legged robot. Transactions of the Japan Society of Mechanical Engineers, Series C 75(759), 2996–3004 (2009) https://doi.org/10.1299/kikaic.75.2996 (in Japanese)
- [25] Oda, K., Ida, Y., Ishikawa, J., Hiraoka, M., Hyon, S.-H.: Realization of whole-body torque-controlled hydraulic wheel-on-leg rover. Journal of the Robotics Society of Japan 40(5), 421–430 (2022) https://doi.org/10.7210/jrsj.40.421 (in Japanese)
- [26] Besseron, G., Grand, C., Amar, F.B., Bidaud, P.: Decoupled control of the high mobility robot Hylos based on a dynamic stability margin. In: 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, Nice, France, pp. 2435–2440 (2008). https://doi.org/10.1109/IROS.2008.4651092
- [27] Pinto, L., Andrychowicz, M., Welinder, P., Zaremba, W., Abbeel, P.: Asymmetric actor critic for image-based robot learning. In: Proceedings of Robotics: Science and Systems, Pittsburgh, Pennsylvania (2018). https://doi.org/10.15607/RSS.2018.XIV.008
- [28] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal Policy Optimization Algorithms (2017). https://doi.org/10.48550/arXiv.1707.06347
- [29] Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., Abbeel, P.: Domain randomization for transferring deep neural networks from simulation to the real world. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 23–30 (2017). https://doi.org/10.1109/IROS.2017.8202133
- [30] Xu, W., Cai, Y., He, D., Lin, J., Zhang, F.: FAST-LIO2: Fast direct LiDAR-inertial odometry. IEEE Transactions on Robotics 38(4), 2053–2073 (2022) https://doi.org/10.1109/TRO.2022.3141876
- [31] Koide, K.: small_gicp: Efficient and parallel algorithms for point cloud registration. Journal of Open Source Software 9(100), 6948 (2024) https://doi.org/10.21105/joss.06948
- [32] Irie, K., Suzuki, T., Hara, Y., Yoshida, T., Tomono, M., Nishimura, K., Yamato, H., Shimizu, M.: Autonomous navigation of a legged robot using data-driven path following. In: Proceedings of the Japan Society of Mechanical Engineers Robotics and Mechatronics Conference (ROBOMECH2021) (2021). 1P1-L01 (in Japanese)
- [33] Yuta, S.: Tsukuba Challenge: Open experiments for autonomous navigation of mobile robots in the city – activities and results of the first and second stages –. Journal of Robotics and Mechatronics 30(4), 504–512 (2018) https://doi.org/10.20965/jrm.2018.p0504
- [34] Hara, Y., Tomizawa, T., Date, H., Kuroda, Y., Tsubouchi, T.: Tsukuba Challenge 2019: Task settings and experimental results. Journal of Robotics and Mechatronics 32(6), 1104–1111 (2020) https://doi.org/10.20965/jrm.2020.p1104
- [35] Saiki, Y.M., Takeuchi, E., Carballo, A., Tokunaga, W., Kuniyoshi, H., Aburadani, A., Hirosawa, A., Nagasaka, Y., Suzuki, Y., Tsubouchi, T.: 1Km autonomous robot navigation on outdoor pedestrian paths “running the Tsukuba Challenge 2007”. In: 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 219–225 (2008). https://doi.org/10.11...
- [36] Koide, K., Yokozuka, M., Oishi, S., Banno, A.: GLIM: 3D range-inertial localization and mapping with GPU-accelerated scan matching factors. Robotics and Autonomous Systems 179, 104750 (2024) https://doi.org/10.1016/j.robot.2024.104750
- [37] Tsukuba Challenge Datasets: Real World Datasets for Autonomous Navigation. https://github.com/tsukubachallenge/tc-datasets. Accessed: 2026-06-11
- [38] Wan, Y., Lin, W., Qian, L., Zou, Y., Wu, W., Wu, S., Zhao, C., Luo, X.: Learning to Balance Motor Thermal Safety and Quadrupedal Locomotion Performance with Residual Policy (2026). https://doi.org/10.48550/arXiv.2605.27046
- [39] Zhang, Z., Fisac, J.F.: Safe Occlusion-Aware autonomous driving via Game-Theoretic active perception. In: Proceedings of Robotics: Science and Systems, Virtual (2021). https://doi.org/10.15607/RSS.2021.XVII.066
- [40] Firoozi, R., Mir, A., Camps, G.S., Schwager, M.: OA-MPC: Occlusion-Aware MPC for guaranteed safe robot navigation with unseen dynamic obstacles. IEEE Transactions on Control Systems Technology 33(3), 940–951 (2025) https://doi.org/10.1109/TCST.2024.3520462
- [41] Mattamala, M., Frey, J., Libera, P., Chebrolu, N., Martius, G., Cadena, C., Hutter, M., Fallon, M.: Wild visual navigation: Fast traversability learning via pre-trained models and online self-supervision. Autonomous Robots 49, 19 (2025) https://doi.org/10.1007/s10514-025-10202-x
- [42] Kim, Y., Lee, J.H., Lee, C., Mun, J., Youm, D., Park, J., Hwangbo, J.: Learning semantic traversability with egocentric video and automated annotation strategy. IEEE Robotics and Automation Letters 9(11), 10423–10430 (2024) https://doi.org/10.1109/LRA.2024.3474548
- [43] Lu, F., Milios, E.: Globally consistent range scan alignment for environment mapping. Autonomous Robots 4, 333–349 (1997) https://doi.org/10.1023/A:1008854305733
- [44] Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 1861–1870 (2018)