pith. sign in

arxiv: 2606.31807 · v1 · pith:Z4RLVFA6new · submitted 2026-06-30 · 💻 cs.RO

Reinforcement Learning-Based Control for an Inline Skating Humanoid Robot

Pith reviewed 2026-07-01 05:14 UTC · model grok-4.3

classification 💻 cs.RO
keywords reinforcement learninghumanoid robotinline skatingpassive wheelslocomotion controlcost of transportsim-to-real transferdynamic balance
0
0 comments X

The pith

Reinforcement learning produces a control policy for a humanoid robot on passive inline skates that cuts energy cost by up to 50 percent versus walking and transfers zero-shot to hardware.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper trains a reinforcement learning policy that lets a humanoid robot use consumer inline skates for locomotion instead of feet. Training relies on simulation with spherical and ellipsoidal wheel models, a success-based command curriculum, and a rolling reward to generate propulsion strategies from scratch. The resulting policy achieves lower energy use than standard walking and performs dynamic balance, perturbation rejection, and turning on the physical robot without further tuning. A sympathetic reader would care because passive wheels offer a route to faster, cheaper locomotion for humanoids if the sim-to-real gap can be closed this way.

Core claim

An RL policy trained without human motion data or imitation enables precise 6-DoF control of passive inline skates on a humanoid robot. Propulsion and balance strategies emerge from the reward structure when the policy is trained across spherical and ellipsoidal wheel geometries plus a custom curriculum and rolling reward. The policy reduces Cost of Transport by up to 50 percent relative to walking gaits and transfers directly to the Booster T1 hardware, where it maintains dynamic balance, rejects physical perturbations, and executes agile turns at speed.

What carries the argument

The reinforcement learning policy trained with multiple geometric wheel models during simulation, a success-based command curriculum, and a specialized rolling reward to produce edge-driven skating.

If this is right

  • The policy achieves up to 50 percent lower Cost of Transport than standard walking gaits.
  • Locomotion transfers zero-shot from simulation to the physical Booster T1 hardware.
  • The robot demonstrates dynamic balance and rejection of active physical perturbations.
  • Agile turning at speed becomes possible through the learned skating strategies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar reward and curriculum techniques could be tested on other passive-wheeled platforms to check whether the 50 percent energy reduction generalizes beyond this robot.
  • If the approach scales, it opens a path to energy-efficient locomotion that combines legged versatility with wheeled efficiency without adding motors to the wheels.
  • The absence of motion capture data suggests the method could be applied to robots whose human-like skating motion has no prior examples.

Load-bearing premise

Training with two different wheel shapes in simulation plus the custom curriculum and rolling reward is enough to overcome contact modeling errors and produce policies that work on real passive skates with no hardware fine-tuning.

What would settle it

Deploy the same policy on the physical robot after training with only one wheel geometry instead of both spherical and ellipsoidal models; if balance fails or Cost of Transport stays comparable to walking, the claim does not hold.

Figures

Figures reproduced from arXiv: 2606.31807 by Clemens Schwarke, Ethan Marot, Marco Hutter, Raffaello D'Andrea, Thomas Bi, Victor Klemm.

Figure 1
Figure 1. Figure 1: Zero-shot sim-to-real transfer of an inline skating policy. Shown [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: SLA mounts and inline skate bracket and wheels. The mounts are [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Spherical primitives representing wheel contact points are modeled [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Real-world deployment of the learned skating policy turning at [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Position-velocity trajectories of leg joints from 0 m/s to 1.8 m/s and back to 0 m/s forward velocity command, with zero yaw commands. Isaac Gym [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: CoT values for the baseline walking policy compared to our [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗
read the original abstract

As humanoid robots become increasingly dynamic, coupling them with reinforcement learning offers a promising approach to solving the complex, underactuated mechanics of passive inline skating. Equipping a humanoid robot with passive inline skating wheels presents an opportunity to combine the versatile agility of humanoids with the high-speed, energy-efficient locomotion strategies utilized by human skaters. In this paper, we train and deploy a reinforcement learning control policy that enables novel locomotion strategies for a humanoid robot modified to equip consumer inline skates instead of conventional feet. Unlike previous work limited to quadrupedal robots or actively driven wheels, our system allows for precise 6-DoF control of the skates to execute dynamic, edge-driven propulsion strategies. Our skating strategies emerge entirely from our reward structure, without reliance on human motion data, imitation learning, or kinematic priors. We overcome the inherent instability of passive wheels and simulation contact artifacts by utilizing different geometric wheel models (spherical and ellipsoidal) during training and validation, along with a custom success-based command curriculum and a specialized rolling reward. Consequently, our policy demonstrates up to a 50% reduction in Cost of Transport (CoT) compared to standard walking gaits. The resulting policy successfully transfers zero-shot to the physical Booster T1 hardware. Real-world deployments demonstrate dynamic balance, the ability to reject active physical perturbations, and agile locomotion strategies capable of turning at speed. A video of our results can be found at https://www.youtube.com/watch?v=-_APcOS7uFo.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a reinforcement learning policy for controlling a humanoid robot equipped with passive inline skates on the Booster T1 platform. Training alternates between spherical and ellipsoidal wheel geometries, incorporates a success-based command curriculum and rolling reward, and produces emergent skating gaits without imitation learning or kinematic priors. The central claims are up to 50% reduction in Cost of Transport relative to walking gaits and successful zero-shot sim-to-real transfer, with real-world demonstrations of dynamic balance, perturbation rejection, and agile turning maneuvers.

Significance. If the zero-shot transfer and CoT reduction hold under rigorous validation, the work would advance RL control of underactuated passive-wheeled humanoids by showing that energy-efficient, dynamic locomotion strategies can be obtained purely from reward design. The multi-geometry wheel modeling technique for mitigating contact artifacts constitutes a practical methodological contribution for sim-to-real problems in contact-rich locomotion.

major comments (2)
  1. [Abstract] Abstract: The claim of reliable zero-shot transfer to passive inline skates is load-bearing on the spherical/ellipsoidal wheel models plus curriculum and rolling reward overcoming simulation contact artifacts, yet no quantitative evidence (measured vs. simulated normal/tangential force traces, identified rolling-resistance coefficients, or ablation on model fidelity) is supplied to confirm these choices close the gap for the real Booster T1 skates.
  2. [Abstract] Abstract: The reported 50% CoT reduction is presented as a headline result, but the abstract does not specify the exact baseline walking gait, the CoT measurement protocol (simulation vs. hardware), or error bars, making it impossible to assess whether the improvement is robust or sensitive to the free reward-structure weights.
minor comments (1)
  1. [Abstract] Abstract: The provided video link is a positive addition for conveying the locomotion behaviors.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback focused on the abstract. We will revise the abstract for greater precision on the CoT claim and to clarify the nature of the zero-shot validation. We address each point below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The claim of reliable zero-shot transfer to passive inline skates is load-bearing on the spherical/ellipsoidal wheel models plus curriculum and rolling reward overcoming simulation contact artifacts, yet no quantitative evidence (measured vs. simulated normal/tangential force traces, identified rolling-resistance coefficients, or ablation on model fidelity) is supplied to confirm these choices close the gap for the real Booster T1 skates.

    Authors: The manuscript validates zero-shot transfer through successful hardware deployment, including balance, perturbation rejection, and turning maneuvers on the physical Booster T1, without policy fine-tuning. Direct force trace comparisons, rolling-resistance identification, or model-fidelity ablations are not included, as the evaluation prioritized end-to-end locomotion metrics over contact-level instrumentation. We will revise the abstract to state that transfer success is demonstrated via hardware performance rather than force matching, and we will add a sentence noting this as a limitation of the current validation. revision: partial

  2. Referee: [Abstract] Abstract: The reported 50% CoT reduction is presented as a headline result, but the abstract does not specify the exact baseline walking gait, the CoT measurement protocol (simulation vs. hardware), or error bars, making it impossible to assess whether the improvement is robust or sensitive to the free reward-structure weights.

    Authors: We will revise the abstract to specify: (1) the baseline is a walking policy trained with the identical RL framework and reward structure on the Booster T1 platform without skates; (2) CoT is computed using the standard mechanical power / forward velocity formulation and reported for both simulation and hardware; (3) the up-to-50% figure includes standard deviation across repeated trials; and (4) the reduction remained consistent under moderate variations of the free reward weights. These details will be added while preserving the abstract length. revision: yes

standing simulated objections not resolved
  • No quantitative force traces, identified rolling-resistance coefficients, or model-fidelity ablations are present in the manuscript to directly confirm that the spherical/ellipsoidal modeling closes the sim-to-real gap.

Circularity Check

0 steps flagged

No circularity: empirical RL training and hardware results are independent of inputs

full rationale

The paper reports an RL policy trained with geometric wheel models, a success-based curriculum, and a rolling reward to achieve skating locomotion. The 50% CoT reduction and zero-shot hardware transfer are presented as measured outcomes of this training process on the Booster T1, not as quantities derived by algebraic construction or by renaming fitted parameters. No equations reduce the reported performance metrics to the training inputs by definition, and no self-citation chain is invoked to justify uniqueness or force the result. The derivation chain consists of standard RL optimization whose outputs are externally validated on physical hardware.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Only abstract available; specific free parameters such as reward weights and curriculum thresholds are not reported. The approach rests on the domain assumption that varied wheel geometries mitigate sim contact artifacts.

free parameters (1)
  • reward structure weights
    RL reward terms for rolling and success-based curriculum are tuned to produce skating behavior; exact values not stated in abstract.
axioms (1)
  • domain assumption Simulation contact dynamics with spherical and ellipsoidal wheel models sufficiently approximate real passive inline skate behavior for zero-shot transfer.
    Invoked to overcome instability and artifacts during training and validation.

pith-pipeline@v0.9.1-grok · 5810 in / 1269 out tokens · 34882 ms · 2026-07-01T05:14:16.675789+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

26 extracted references · 7 canonical work pages · 4 internal anchors

  1. [1]

    Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching

    Z. Wuet al., “Perceptive humanoid parkour: Chaining dynamic human skills via motion matching,” 2026. [Online]. Available: https://arxiv.org/abs/2602.15827

  2. [2]

    Humanoid whole-body locomotion on narrow terrain via dynamic balance and reinforcement learning,

    W. Xie, C. Bai, J. Shi, J. Yang, Y . Ge, W. Zhang, and X. Li, “Humanoid whole-body locomotion on narrow terrain via dynamic balance and reinforcement learning,” in2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2025, pp. 4751– 4758

  3. [3]

    (2025) Booster t1: High-performance humanoid robot

    Booster Robotics. (2025) Booster t1: High-performance humanoid robot. [Online]. Available: https://www.booster.tech/booster-t1/

  4. [4]

    Study on roller-walker — improvement of locomotive efficiency of quadruped robots by passive wheels,

    G. Endo and S. Hirose, “Study on roller-walker — improvement of locomotive efficiency of quadruped robots by passive wheels,” Advanced Robotics, vol. 26, no. 8-9, pp. 969–988, 2012

  5. [5]

    Skating with a force controlled quadrupedal robot,

    M. Bjelonic, C. D. Bellicoso, M. E. Tiryaki, and M. Hutter, “Skating with a force controlled quadrupedal robot,” in2018 IEEE/RSJ Inter- national Conference on Intelligent Robots and Systems (IROS), 2018, pp. 7555–7561

  6. [6]

    Design and gait control of a rollerblading robot,

    S. Chitta, F. Heger, and V . Kumar, “Design and gait control of a rollerblading robot,” in2004 IEEE International Conference on Robotics and Automation (ICRA), vol. 4, 2004, pp. 3944–3949

  7. [7]

    Storytelling through characters at disney parks — sxsw 2023,

    Walt Disney Imagineering, “Storytelling through characters at disney parks — sxsw 2023,” YouTube video, 2023. [Online]. Available: https://www.youtube.com/watch?v=lPqzLE4KjhI

  8. [8]

    Skater: Synthesized kinematics for advanced traversing efficiency on a humanoid robot via roller skate swizzles,

    J. Guet al., “Skater: Synthesized kinematics for advanced traversing efficiency on a humanoid robot via roller skate swizzles,”arXiv preprint arXiv:2601.04948, 2026

  9. [9]

    Keep rollin’—whole-body motion control and planning for wheeled quadrupedal robots,

    M. Bjelonic, C. D. Bellicoso, Y . de Viragh, D. Sako, F. D. Tresoldi, F. Jenelten, and M. Hutter, “Keep rollin’—whole-body motion control and planning for wheeled quadrupedal robots,”IEEE Robotics and Automation Letters, vol. 4, no. 2, p. 2116–2123, Apr. 2019

  10. [10]

    A simple 2-dimensional model of speed skating which mimics observed forces and motions,

    D. Fintelman, O. Braver, and A. Schwab, “A simple 2-dimensional model of speed skating which mimics observed forces and motions,” inMultibody dynamics, ECCOMAS Thematic Conference, Brugge, Belgium, vol. 511, 2011

  11. [11]

    Zero-moment point - thirty five years of its life

    M. Vukobratovic and B. Borovac, “Zero-moment point - thirty five years of its life.”I. J. Humanoid Robotics, vol. 1, pp. 157–173, 2004

  12. [12]

    Locomotion approach of bipedal robot utilizing passive wheel without swing leg based on stability margin maximization and fall prevention functions,

    K. Kimura, N. Imaoka, S. Noda, Y . Kakiuchi, K. Okada, and M. Inaba, “Locomotion approach of bipedal robot utilizing passive wheel without swing leg based on stability margin maximization and fall prevention functions,”ROBOMECH Journal, vol. 7, no. 1, p. 35, Oct 2020

  13. [13]

    Inline skating motion genera- tor with passive wheels for small size humanoid robots,

    N. Ziv, Y . Lee, and G. Ciaravella, “Inline skating motion genera- tor with passive wheels for small size humanoid robots,” in2010 IEEE/ASME International Conference on Advanced Intelligent Mecha- tronics, 2010, pp. 1391–1395

  14. [14]

    Kungfubot: Physics-based humanoid whole-body control for learning highly-dynamic skills,

    W. Xieet al., “Kungfubot: Physics-based humanoid whole-body control for learning highly-dynamic skills,” inThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

  15. [15]

    Hitter: A humanoid table tennis robot via hierarchical planning and learning,

    Z. Suet al., “Hitter: A humanoid table tennis robot via hierarchical planning and learning,”arXiv preprint arXiv:2508.21043, 2025

  16. [16]

    Real-time skating motion control of humanoid robots for acceleration and balancing,

    N. Takasugi, K. Kojima, S. Nozawa, Y . Kakiuchi, K. Okada, and M. Inaba, “Real-time skating motion control of humanoid robots for acceleration and balancing,” in2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016, pp. 1356– 1363

  17. [17]

    HUSKY: Humanoid Skateboarding System via Physics-Aware Whole-Body Control

    J. Han, D. Wang, C. Zhang, X. Liu, P. Luo, C. Bai, and X. Li, “Husky: Humanoid skateboarding system via physics-aware whole- body control,”arXiv preprint arXiv:2602.03205, 2026

  18. [18]

    Booster gym: An end-to-end reinforcement learning framework for humanoid robot locomotion,

    Y . Wang, P. Chen, X. Han, F. Wu, and M. Zhao, “Booster gym: An end-to-end reinforcement learning framework for humanoid robot locomotion,”arXiv preprint arXiv:2506.15132, 2025

  19. [19]

    Dreamwaq: Learning robust quadrupedal locomotion with implicit terrain imagination via deep reinforcement learning,

    I. M. A. Nahrendra, B. Yu, and H. Myung, “Dreamwaq: Learning robust quadrupedal locomotion with implicit terrain imagination via deep reinforcement learning,” in2023 IEEE International Conference on Robotics and Automation (ICRA), 2023, pp. 5078–5084

  20. [20]

    Proximal Policy Optimization Algorithms

    J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,”arXiv preprint arXiv:1707.06347, 2017

  21. [21]

    Isaac gym: High performance gpu-based robotic simulation for robot learning,

    V . Makoviychuket al., “Isaac gym: High performance gpu-based robotic simulation for robot learning,” inThirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021

  22. [22]

    Mujoco: A physics engine for model-based control,

    E. Todorov, T. Erez, and Y . Tassa, “Mujoco: A physics engine for model-based control,” in2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012, pp. 5026–5033

  23. [23]

    Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning

    M. Mittalet al., “Isaac lab: A gpu-accelerated simulation framework for multi-modal robot learning,” 2025. [Online]. Available: https://arxiv.org/abs/2511.04831

  24. [24]

    Curriculum learning,

    Y . Bengio, J. Louradour, R. Collobert, and J. Weston, “Curriculum learning,” inProceedings of the 26th annual international conference on machine learning, 2009, pp. 41–48

  25. [25]

    Automatic goal gen- eration for reinforcement learning agents,

    C. Florensa, D. Held, X. Geng, and P. Abbeel, “Automatic goal gen- eration for reinforcement learning agents,” inInternational conference on machine learning. PMLR, 2018, pp. 1515–1528

  26. [26]

    Rapid locomotion via reinforcement learning,

    G. B. Margolis, G. Yang, K. Paigwar, T. Chen, and P. Agrawal, “Rapid locomotion via reinforcement learning,”The International Journal of Robotics Research, vol. 43, no. 4, pp. 572–587, 2024