Reinforcement Learning-Based Control for an Inline Skating Humanoid Robot

Clemens Schwarke; Ethan Marot; Marco Hutter; Raffaello D'Andrea; Thomas Bi; Victor Klemm

arxiv: 2606.31807 · v1 · pith:Z4RLVFA6new · submitted 2026-06-30 · 💻 cs.RO

Reinforcement Learning-Based Control for an Inline Skating Humanoid Robot

Ethan Marot , Thomas Bi , Clemens Schwarke , Victor Klemm , Marco Hutter , Raffaello D'Andrea This is my paper

Pith reviewed 2026-07-01 05:14 UTC · model grok-4.3

classification 💻 cs.RO

keywords reinforcement learninghumanoid robotinline skatingpassive wheelslocomotion controlcost of transportsim-to-real transferdynamic balance

0 comments

The pith

Reinforcement learning produces a control policy for a humanoid robot on passive inline skates that cuts energy cost by up to 50 percent versus walking and transfers zero-shot to hardware.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper trains a reinforcement learning policy that lets a humanoid robot use consumer inline skates for locomotion instead of feet. Training relies on simulation with spherical and ellipsoidal wheel models, a success-based command curriculum, and a rolling reward to generate propulsion strategies from scratch. The resulting policy achieves lower energy use than standard walking and performs dynamic balance, perturbation rejection, and turning on the physical robot without further tuning. A sympathetic reader would care because passive wheels offer a route to faster, cheaper locomotion for humanoids if the sim-to-real gap can be closed this way.

Core claim

An RL policy trained without human motion data or imitation enables precise 6-DoF control of passive inline skates on a humanoid robot. Propulsion and balance strategies emerge from the reward structure when the policy is trained across spherical and ellipsoidal wheel geometries plus a custom curriculum and rolling reward. The policy reduces Cost of Transport by up to 50 percent relative to walking gaits and transfers directly to the Booster T1 hardware, where it maintains dynamic balance, rejects physical perturbations, and executes agile turns at speed.

What carries the argument

The reinforcement learning policy trained with multiple geometric wheel models during simulation, a success-based command curriculum, and a specialized rolling reward to produce edge-driven skating.

If this is right

The policy achieves up to 50 percent lower Cost of Transport than standard walking gaits.
Locomotion transfers zero-shot from simulation to the physical Booster T1 hardware.
The robot demonstrates dynamic balance and rejection of active physical perturbations.
Agile turning at speed becomes possible through the learned skating strategies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar reward and curriculum techniques could be tested on other passive-wheeled platforms to check whether the 50 percent energy reduction generalizes beyond this robot.
If the approach scales, it opens a path to energy-efficient locomotion that combines legged versatility with wheeled efficiency without adding motors to the wheels.
The absence of motion capture data suggests the method could be applied to robots whose human-like skating motion has no prior examples.

Load-bearing premise

Training with two different wheel shapes in simulation plus the custom curriculum and rolling reward is enough to overcome contact modeling errors and produce policies that work on real passive skates with no hardware fine-tuning.

What would settle it

Deploy the same policy on the physical robot after training with only one wheel geometry instead of both spherical and ellipsoidal models; if balance fails or Cost of Transport stays comparable to walking, the claim does not hold.

Figures

Figures reproduced from arXiv: 2606.31807 by Clemens Schwarke, Ethan Marot, Marco Hutter, Raffaello D'Andrea, Thomas Bi, Victor Klemm.

**Figure 2.** Figure 2: SLA mounts and inline skate bracket and wheels. The mounts are [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Spherical primitives representing wheel contact points are modeled [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 5.** Figure 5: Real-world deployment of the learned skating policy turning at [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 6.** Figure 6: Position-velocity trajectories of leg joints from 0 m/s to 1.8 m/s and back to 0 m/s forward velocity command, with zero yaw commands. Isaac Gym [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

**Figure 7.** Figure 7: CoT values for the baseline walking policy compared to our [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗

read the original abstract

As humanoid robots become increasingly dynamic, coupling them with reinforcement learning offers a promising approach to solving the complex, underactuated mechanics of passive inline skating. Equipping a humanoid robot with passive inline skating wheels presents an opportunity to combine the versatile agility of humanoids with the high-speed, energy-efficient locomotion strategies utilized by human skaters. In this paper, we train and deploy a reinforcement learning control policy that enables novel locomotion strategies for a humanoid robot modified to equip consumer inline skates instead of conventional feet. Unlike previous work limited to quadrupedal robots or actively driven wheels, our system allows for precise 6-DoF control of the skates to execute dynamic, edge-driven propulsion strategies. Our skating strategies emerge entirely from our reward structure, without reliance on human motion data, imitation learning, or kinematic priors. We overcome the inherent instability of passive wheels and simulation contact artifacts by utilizing different geometric wheel models (spherical and ellipsoidal) during training and validation, along with a custom success-based command curriculum and a specialized rolling reward. Consequently, our policy demonstrates up to a 50% reduction in Cost of Transport (CoT) compared to standard walking gaits. The resulting policy successfully transfers zero-shot to the physical Booster T1 hardware. Real-world deployments demonstrate dynamic balance, the ability to reject active physical perturbations, and agile locomotion strategies capable of turning at speed. A video of our results can be found at https://www.youtube.com/watch?v=-_APcOS7uFo.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gets an RL policy for passive inline skating on a humanoid to transfer zero-shot to hardware with up to 50% lower CoT than walking, using multi-geometry wheel models and a custom curriculum.

read the letter

The main takeaway is that they trained an RL controller for a full humanoid on consumer inline skates that runs on the real Booster T1 without fine-tuning and uses less energy than walking while handling turns and pushes.

What is new is the combination of passive wheels with 6-DoF skate control on a biped, trained entirely from reward without motion data or priors. They alternate spherical and ellipsoidal wheel models in simulation, add a success-based command curriculum, and include a rolling reward to push the policy toward proper propulsion. The hardware results show dynamic balance and agile maneuvers that prior quadruped or active-wheel work did not demonstrate on this platform.

The soft spot is the lack of direct evidence in the abstract that the wheel geometry choices actually close the contact gap. No force comparisons or ablation numbers are given to show how much the ellipsoidal model helps versus spherical alone, or how well simulated rolling resistance matches the real skates. If the full paper has those measurements, the zero-shot claim strengthens; otherwise the transfer success rests more on the curriculum and reward tuning.

This is useful for groups working on hybrid wheeled-legged systems or sim-to-real for underactuated contact. Readers who need concrete hardware examples of energy-efficient passive locomotion will get value from the experiments. The methods are described clearly enough on their own terms to merit referee time.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a reinforcement learning policy for controlling a humanoid robot equipped with passive inline skates on the Booster T1 platform. Training alternates between spherical and ellipsoidal wheel geometries, incorporates a success-based command curriculum and rolling reward, and produces emergent skating gaits without imitation learning or kinematic priors. The central claims are up to 50% reduction in Cost of Transport relative to walking gaits and successful zero-shot sim-to-real transfer, with real-world demonstrations of dynamic balance, perturbation rejection, and agile turning maneuvers.

Significance. If the zero-shot transfer and CoT reduction hold under rigorous validation, the work would advance RL control of underactuated passive-wheeled humanoids by showing that energy-efficient, dynamic locomotion strategies can be obtained purely from reward design. The multi-geometry wheel modeling technique for mitigating contact artifacts constitutes a practical methodological contribution for sim-to-real problems in contact-rich locomotion.

major comments (2)

[Abstract] Abstract: The claim of reliable zero-shot transfer to passive inline skates is load-bearing on the spherical/ellipsoidal wheel models plus curriculum and rolling reward overcoming simulation contact artifacts, yet no quantitative evidence (measured vs. simulated normal/tangential force traces, identified rolling-resistance coefficients, or ablation on model fidelity) is supplied to confirm these choices close the gap for the real Booster T1 skates.
[Abstract] Abstract: The reported 50% CoT reduction is presented as a headline result, but the abstract does not specify the exact baseline walking gait, the CoT measurement protocol (simulation vs. hardware), or error bars, making it impossible to assess whether the improvement is robust or sensitive to the free reward-structure weights.

minor comments (1)

[Abstract] Abstract: The provided video link is a positive addition for conveying the locomotion behaviors.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback focused on the abstract. We will revise the abstract for greater precision on the CoT claim and to clarify the nature of the zero-shot validation. We address each point below.

read point-by-point responses

Referee: [Abstract] Abstract: The claim of reliable zero-shot transfer to passive inline skates is load-bearing on the spherical/ellipsoidal wheel models plus curriculum and rolling reward overcoming simulation contact artifacts, yet no quantitative evidence (measured vs. simulated normal/tangential force traces, identified rolling-resistance coefficients, or ablation on model fidelity) is supplied to confirm these choices close the gap for the real Booster T1 skates.

Authors: The manuscript validates zero-shot transfer through successful hardware deployment, including balance, perturbation rejection, and turning maneuvers on the physical Booster T1, without policy fine-tuning. Direct force trace comparisons, rolling-resistance identification, or model-fidelity ablations are not included, as the evaluation prioritized end-to-end locomotion metrics over contact-level instrumentation. We will revise the abstract to state that transfer success is demonstrated via hardware performance rather than force matching, and we will add a sentence noting this as a limitation of the current validation. revision: partial
Referee: [Abstract] Abstract: The reported 50% CoT reduction is presented as a headline result, but the abstract does not specify the exact baseline walking gait, the CoT measurement protocol (simulation vs. hardware), or error bars, making it impossible to assess whether the improvement is robust or sensitive to the free reward-structure weights.

Authors: We will revise the abstract to specify: (1) the baseline is a walking policy trained with the identical RL framework and reward structure on the Booster T1 platform without skates; (2) CoT is computed using the standard mechanical power / forward velocity formulation and reported for both simulation and hardware; (3) the up-to-50% figure includes standard deviation across repeated trials; and (4) the reduction remained consistent under moderate variations of the free reward weights. These details will be added while preserving the abstract length. revision: yes

standing simulated objections not resolved

No quantitative force traces, identified rolling-resistance coefficients, or model-fidelity ablations are present in the manuscript to directly confirm that the spherical/ellipsoidal modeling closes the sim-to-real gap.

Circularity Check

0 steps flagged

No circularity: empirical RL training and hardware results are independent of inputs

full rationale

The paper reports an RL policy trained with geometric wheel models, a success-based curriculum, and a rolling reward to achieve skating locomotion. The 50% CoT reduction and zero-shot hardware transfer are presented as measured outcomes of this training process on the Booster T1, not as quantities derived by algebraic construction or by renaming fitted parameters. No equations reduce the reported performance metrics to the training inputs by definition, and no self-citation chain is invoked to justify uniqueness or force the result. The derivation chain consists of standard RL optimization whose outputs are externally validated on physical hardware.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Only abstract available; specific free parameters such as reward weights and curriculum thresholds are not reported. The approach rests on the domain assumption that varied wheel geometries mitigate sim contact artifacts.

free parameters (1)

reward structure weights
RL reward terms for rolling and success-based curriculum are tuned to produce skating behavior; exact values not stated in abstract.

axioms (1)

domain assumption Simulation contact dynamics with spherical and ellipsoidal wheel models sufficiently approximate real passive inline skate behavior for zero-shot transfer.
Invoked to overcome instability and artifacts during training and validation.

pith-pipeline@v0.9.1-grok · 5810 in / 1269 out tokens · 34882 ms · 2026-07-01T05:14:16.675789+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

26 extracted references · 7 canonical work pages · 4 internal anchors

[1]

Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching

Z. Wuet al., “Perceptive humanoid parkour: Chaining dynamic human skills via motion matching,” 2026. [Online]. Available: https://arxiv.org/abs/2602.15827

work page internal anchor Pith review Pith/arXiv arXiv 2026
[2]

Humanoid whole-body locomotion on narrow terrain via dynamic balance and reinforcement learning,

W. Xie, C. Bai, J. Shi, J. Yang, Y . Ge, W. Zhang, and X. Li, “Humanoid whole-body locomotion on narrow terrain via dynamic balance and reinforcement learning,” in2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2025, pp. 4751– 4758

2025
[3]

(2025) Booster t1: High-performance humanoid robot

Booster Robotics. (2025) Booster t1: High-performance humanoid robot. [Online]. Available: https://www.booster.tech/booster-t1/

2025
[4]

Study on roller-walker — improvement of locomotive efficiency of quadruped robots by passive wheels,

G. Endo and S. Hirose, “Study on roller-walker — improvement of locomotive efficiency of quadruped robots by passive wheels,” Advanced Robotics, vol. 26, no. 8-9, pp. 969–988, 2012

2012
[5]

Skating with a force controlled quadrupedal robot,

M. Bjelonic, C. D. Bellicoso, M. E. Tiryaki, and M. Hutter, “Skating with a force controlled quadrupedal robot,” in2018 IEEE/RSJ Inter- national Conference on Intelligent Robots and Systems (IROS), 2018, pp. 7555–7561

2018
[6]

Design and gait control of a rollerblading robot,

S. Chitta, F. Heger, and V . Kumar, “Design and gait control of a rollerblading robot,” in2004 IEEE International Conference on Robotics and Automation (ICRA), vol. 4, 2004, pp. 3944–3949

2004
[7]

Storytelling through characters at disney parks — sxsw 2023,

Walt Disney Imagineering, “Storytelling through characters at disney parks — sxsw 2023,” YouTube video, 2023. [Online]. Available: https://www.youtube.com/watch?v=lPqzLE4KjhI

2023
[8]

Skater: Synthesized kinematics for advanced traversing efficiency on a humanoid robot via roller skate swizzles,

J. Guet al., “Skater: Synthesized kinematics for advanced traversing efficiency on a humanoid robot via roller skate swizzles,”arXiv preprint arXiv:2601.04948, 2026

work page arXiv 2026
[9]

Keep rollin’—whole-body motion control and planning for wheeled quadrupedal robots,

M. Bjelonic, C. D. Bellicoso, Y . de Viragh, D. Sako, F. D. Tresoldi, F. Jenelten, and M. Hutter, “Keep rollin’—whole-body motion control and planning for wheeled quadrupedal robots,”IEEE Robotics and Automation Letters, vol. 4, no. 2, p. 2116–2123, Apr. 2019

2019
[10]

A simple 2-dimensional model of speed skating which mimics observed forces and motions,

D. Fintelman, O. Braver, and A. Schwab, “A simple 2-dimensional model of speed skating which mimics observed forces and motions,” inMultibody dynamics, ECCOMAS Thematic Conference, Brugge, Belgium, vol. 511, 2011

2011
[11]

Zero-moment point - thirty five years of its life

M. Vukobratovic and B. Borovac, “Zero-moment point - thirty five years of its life.”I. J. Humanoid Robotics, vol. 1, pp. 157–173, 2004

2004
[12]

Locomotion approach of bipedal robot utilizing passive wheel without swing leg based on stability margin maximization and fall prevention functions,

K. Kimura, N. Imaoka, S. Noda, Y . Kakiuchi, K. Okada, and M. Inaba, “Locomotion approach of bipedal robot utilizing passive wheel without swing leg based on stability margin maximization and fall prevention functions,”ROBOMECH Journal, vol. 7, no. 1, p. 35, Oct 2020

2020
[13]

Inline skating motion genera- tor with passive wheels for small size humanoid robots,

N. Ziv, Y . Lee, and G. Ciaravella, “Inline skating motion genera- tor with passive wheels for small size humanoid robots,” in2010 IEEE/ASME International Conference on Advanced Intelligent Mecha- tronics, 2010, pp. 1391–1395

2010
[14]

Kungfubot: Physics-based humanoid whole-body control for learning highly-dynamic skills,

W. Xieet al., “Kungfubot: Physics-based humanoid whole-body control for learning highly-dynamic skills,” inThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

2025
[15]

Hitter: A humanoid table tennis robot via hierarchical planning and learning,

Z. Suet al., “Hitter: A humanoid table tennis robot via hierarchical planning and learning,”arXiv preprint arXiv:2508.21043, 2025

work page arXiv 2025
[16]

Real-time skating motion control of humanoid robots for acceleration and balancing,

N. Takasugi, K. Kojima, S. Nozawa, Y . Kakiuchi, K. Okada, and M. Inaba, “Real-time skating motion control of humanoid robots for acceleration and balancing,” in2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016, pp. 1356– 1363

2016
[17]

HUSKY: Humanoid Skateboarding System via Physics-Aware Whole-Body Control

J. Han, D. Wang, C. Zhang, X. Liu, P. Luo, C. Bai, and X. Li, “Husky: Humanoid skateboarding system via physics-aware whole- body control,”arXiv preprint arXiv:2602.03205, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[18]

Booster gym: An end-to-end reinforcement learning framework for humanoid robot locomotion,

Y . Wang, P. Chen, X. Han, F. Wu, and M. Zhao, “Booster gym: An end-to-end reinforcement learning framework for humanoid robot locomotion,”arXiv preprint arXiv:2506.15132, 2025

work page arXiv 2025
[19]

Dreamwaq: Learning robust quadrupedal locomotion with implicit terrain imagination via deep reinforcement learning,

I. M. A. Nahrendra, B. Yu, and H. Myung, “Dreamwaq: Learning robust quadrupedal locomotion with implicit terrain imagination via deep reinforcement learning,” in2023 IEEE International Conference on Robotics and Automation (ICRA), 2023, pp. 5078–5084

2023
[20]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,”arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[21]

Isaac gym: High performance gpu-based robotic simulation for robot learning,

V . Makoviychuket al., “Isaac gym: High performance gpu-based robotic simulation for robot learning,” inThirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021

2021
[22]

Mujoco: A physics engine for model-based control,

E. Todorov, T. Erez, and Y . Tassa, “Mujoco: A physics engine for model-based control,” in2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012, pp. 5026–5033

2012
[23]

Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning

M. Mittalet al., “Isaac lab: A gpu-accelerated simulation framework for multi-modal robot learning,” 2025. [Online]. Available: https://arxiv.org/abs/2511.04831

work page internal anchor Pith review Pith/arXiv arXiv 2025
[24]

Curriculum learning,

Y . Bengio, J. Louradour, R. Collobert, and J. Weston, “Curriculum learning,” inProceedings of the 26th annual international conference on machine learning, 2009, pp. 41–48

2009
[25]

Automatic goal gen- eration for reinforcement learning agents,

C. Florensa, D. Held, X. Geng, and P. Abbeel, “Automatic goal gen- eration for reinforcement learning agents,” inInternational conference on machine learning. PMLR, 2018, pp. 1515–1528

2018
[26]

Rapid locomotion via reinforcement learning,

G. B. Margolis, G. Yang, K. Paigwar, T. Chen, and P. Agrawal, “Rapid locomotion via reinforcement learning,”The International Journal of Robotics Research, vol. 43, no. 4, pp. 572–587, 2024

2024

[1] [1]

Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching

Z. Wuet al., “Perceptive humanoid parkour: Chaining dynamic human skills via motion matching,” 2026. [Online]. Available: https://arxiv.org/abs/2602.15827

work page internal anchor Pith review Pith/arXiv arXiv 2026

[2] [2]

Humanoid whole-body locomotion on narrow terrain via dynamic balance and reinforcement learning,

W. Xie, C. Bai, J. Shi, J. Yang, Y . Ge, W. Zhang, and X. Li, “Humanoid whole-body locomotion on narrow terrain via dynamic balance and reinforcement learning,” in2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2025, pp. 4751– 4758

2025

[3] [3]

(2025) Booster t1: High-performance humanoid robot

Booster Robotics. (2025) Booster t1: High-performance humanoid robot. [Online]. Available: https://www.booster.tech/booster-t1/

2025

[4] [4]

Study on roller-walker — improvement of locomotive efficiency of quadruped robots by passive wheels,

G. Endo and S. Hirose, “Study on roller-walker — improvement of locomotive efficiency of quadruped robots by passive wheels,” Advanced Robotics, vol. 26, no. 8-9, pp. 969–988, 2012

2012

[5] [5]

Skating with a force controlled quadrupedal robot,

M. Bjelonic, C. D. Bellicoso, M. E. Tiryaki, and M. Hutter, “Skating with a force controlled quadrupedal robot,” in2018 IEEE/RSJ Inter- national Conference on Intelligent Robots and Systems (IROS), 2018, pp. 7555–7561

2018

[6] [6]

Design and gait control of a rollerblading robot,

S. Chitta, F. Heger, and V . Kumar, “Design and gait control of a rollerblading robot,” in2004 IEEE International Conference on Robotics and Automation (ICRA), vol. 4, 2004, pp. 3944–3949

2004

[7] [7]

Storytelling through characters at disney parks — sxsw 2023,

Walt Disney Imagineering, “Storytelling through characters at disney parks — sxsw 2023,” YouTube video, 2023. [Online]. Available: https://www.youtube.com/watch?v=lPqzLE4KjhI

2023

[8] [8]

Skater: Synthesized kinematics for advanced traversing efficiency on a humanoid robot via roller skate swizzles,

J. Guet al., “Skater: Synthesized kinematics for advanced traversing efficiency on a humanoid robot via roller skate swizzles,”arXiv preprint arXiv:2601.04948, 2026

work page arXiv 2026

[9] [9]

Keep rollin’—whole-body motion control and planning for wheeled quadrupedal robots,

M. Bjelonic, C. D. Bellicoso, Y . de Viragh, D. Sako, F. D. Tresoldi, F. Jenelten, and M. Hutter, “Keep rollin’—whole-body motion control and planning for wheeled quadrupedal robots,”IEEE Robotics and Automation Letters, vol. 4, no. 2, p. 2116–2123, Apr. 2019

2019

[10] [10]

A simple 2-dimensional model of speed skating which mimics observed forces and motions,

D. Fintelman, O. Braver, and A. Schwab, “A simple 2-dimensional model of speed skating which mimics observed forces and motions,” inMultibody dynamics, ECCOMAS Thematic Conference, Brugge, Belgium, vol. 511, 2011

2011

[11] [11]

Zero-moment point - thirty five years of its life

M. Vukobratovic and B. Borovac, “Zero-moment point - thirty five years of its life.”I. J. Humanoid Robotics, vol. 1, pp. 157–173, 2004

2004

[12] [12]

Locomotion approach of bipedal robot utilizing passive wheel without swing leg based on stability margin maximization and fall prevention functions,

K. Kimura, N. Imaoka, S. Noda, Y . Kakiuchi, K. Okada, and M. Inaba, “Locomotion approach of bipedal robot utilizing passive wheel without swing leg based on stability margin maximization and fall prevention functions,”ROBOMECH Journal, vol. 7, no. 1, p. 35, Oct 2020

2020

[13] [13]

Inline skating motion genera- tor with passive wheels for small size humanoid robots,

N. Ziv, Y . Lee, and G. Ciaravella, “Inline skating motion genera- tor with passive wheels for small size humanoid robots,” in2010 IEEE/ASME International Conference on Advanced Intelligent Mecha- tronics, 2010, pp. 1391–1395

2010

[14] [14]

Kungfubot: Physics-based humanoid whole-body control for learning highly-dynamic skills,

W. Xieet al., “Kungfubot: Physics-based humanoid whole-body control for learning highly-dynamic skills,” inThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

2025

[15] [15]

Hitter: A humanoid table tennis robot via hierarchical planning and learning,

Z. Suet al., “Hitter: A humanoid table tennis robot via hierarchical planning and learning,”arXiv preprint arXiv:2508.21043, 2025

work page arXiv 2025

[16] [16]

Real-time skating motion control of humanoid robots for acceleration and balancing,

N. Takasugi, K. Kojima, S. Nozawa, Y . Kakiuchi, K. Okada, and M. Inaba, “Real-time skating motion control of humanoid robots for acceleration and balancing,” in2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016, pp. 1356– 1363

2016

[17] [17]

HUSKY: Humanoid Skateboarding System via Physics-Aware Whole-Body Control

J. Han, D. Wang, C. Zhang, X. Liu, P. Luo, C. Bai, and X. Li, “Husky: Humanoid skateboarding system via physics-aware whole- body control,”arXiv preprint arXiv:2602.03205, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[18] [18]

Booster gym: An end-to-end reinforcement learning framework for humanoid robot locomotion,

Y . Wang, P. Chen, X. Han, F. Wu, and M. Zhao, “Booster gym: An end-to-end reinforcement learning framework for humanoid robot locomotion,”arXiv preprint arXiv:2506.15132, 2025

work page arXiv 2025

[19] [19]

Dreamwaq: Learning robust quadrupedal locomotion with implicit terrain imagination via deep reinforcement learning,

I. M. A. Nahrendra, B. Yu, and H. Myung, “Dreamwaq: Learning robust quadrupedal locomotion with implicit terrain imagination via deep reinforcement learning,” in2023 IEEE International Conference on Robotics and Automation (ICRA), 2023, pp. 5078–5084

2023

[20] [20]

Proximal Policy Optimization Algorithms

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,”arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[21] [21]

Isaac gym: High performance gpu-based robotic simulation for robot learning,

V . Makoviychuket al., “Isaac gym: High performance gpu-based robotic simulation for robot learning,” inThirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021

2021

[22] [22]

Mujoco: A physics engine for model-based control,

E. Todorov, T. Erez, and Y . Tassa, “Mujoco: A physics engine for model-based control,” in2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012, pp. 5026–5033

2012

[23] [23]

Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning

M. Mittalet al., “Isaac lab: A gpu-accelerated simulation framework for multi-modal robot learning,” 2025. [Online]. Available: https://arxiv.org/abs/2511.04831

work page internal anchor Pith review Pith/arXiv arXiv 2025

[24] [24]

Curriculum learning,

Y . Bengio, J. Louradour, R. Collobert, and J. Weston, “Curriculum learning,” inProceedings of the 26th annual international conference on machine learning, 2009, pp. 41–48

2009

[25] [25]

Automatic goal gen- eration for reinforcement learning agents,

C. Florensa, D. Held, X. Geng, and P. Abbeel, “Automatic goal gen- eration for reinforcement learning agents,” inInternational conference on machine learning. PMLR, 2018, pp. 1515–1528

2018

[26] [26]

Rapid locomotion via reinforcement learning,

G. B. Margolis, G. Yang, K. Paigwar, T. Chen, and P. Agrawal, “Rapid locomotion via reinforcement learning,”The International Journal of Robotics Research, vol. 43, no. 4, pp. 572–587, 2024

2024