arxiv: 2604.24916 · v2 · submitted 2026-04-27 · 💻 cs.RO · cs.AI


asRoBallet: Closing the Sim2Real Gap via Friction-Aware Reinforcement Learning for Underactuated Spherical Dynamics

Fang Wan, Guangyi Huang, Tianyu Wu, Zishang Zhang, Bangchao Huang, Haoran Sun, Mingdong Chen, Chaoyang Song


Pith reviewed 2026-05-08 03:23 UTC · model grok-4.3

classification 💻 cs.RO cs.AI

keywords ballbot · reinforcement learning · sim2real transfer · friction modeling · omni-wheels · underactuated robotics · locomotion policy · humanoid robot


The pith

Friction-aware RL policy achieves zero-shot transfer to real humanoid ballbot hardware

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents asRoBallet as the first end-to-end reinforcement learning locomotion policy successfully deployed on a humanoid ballbot. It tackles the persistent sim-to-real gap in underactuated spherical systems by building a high-fidelity MuJoCo simulation that models the discrete roller mechanics of ETH-type omni-wheels along with parasitic vibrations and contact discontinuities. A dedicated friction-aware RL framework then learns to handle the coupled rolling, lateral, and torsional friction at the wheel-ball and ball-floor interfaces. This combination produces zero-shot transfer without real-world fine-tuning. The hardware itself is assembled at low cost through subtractive reconfiguration of quadruped components and paired with an iOS-based control interface.

Core claim

We introduce asRoBallet, to the best of our knowledge, the first end-to-end reinforcement learning (RL) locomotion policy deployed on a humanoid ballbot hardware platform. A high-fidelity MuJoCo simulation explicitly models the discrete roller mechanics of ETH-type omni-wheels to capture parasitic vibrations and contact discontinuities. A Friction-Aware Reinforcement Learning framework masters the coupled rolling, lateral, and torsional friction channels at the wheel-ball and ball-floor interfaces to achieve zero-shot Sim2Real transfer.

What carries the argument

High-fidelity MuJoCo simulation of discrete omni-wheel roller mechanics combined with a Friction-Aware Reinforcement Learning framework that masters coupled friction channels
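
As a rough illustration of what three friction channels look like in MuJoCo, the sketch below writes a minimal MJCF model whose `friction` attribute carries the sliding, torsional, and rolling coefficients, with `condim="6"` so that all three act at the contact. The sampling ranges, the ball radius, the helper names `sample_friction` and `ball_mjcf`, and the mapping of the paper's "lateral" channel onto MuJoCo's sliding coefficient are assumptions made for illustration; the paper's actual model and parameter values are not given in the abstract.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical sampling ranges; NOT taken from the paper.
FRICTION_RANGES = {
    "sliding": (0.6, 1.2),        # tangential (lateral) friction coefficient
    "torsional": (0.001, 0.02),   # resistance to spin about the contact normal
    "rolling": (0.0001, 0.002),   # resistance to rolling about the tangent axes
}
CHANNEL_ORDER = ("sliding", "torsional", "rolling")  # order MuJoCo expects


def sample_friction(rng):
    """Draw one (sliding, torsional, rolling) triple, e.g. once per episode."""
    return tuple(rng.uniform(*FRICTION_RANGES[name]) for name in CHANNEL_ORDER)


def ball_mjcf(friction, radius=0.11):
    """Return a minimal MJCF string: a free ball resting on a floor plane.

    condim="6" enables sliding, torsional, and rolling friction at contacts.
    """
    fr = " ".join(f"{c:.6g}" for c in friction)
    return f"""
<mujoco model="ball_on_floor_sketch">
  <worldbody>
    <geom name="floor" type="plane" size="2 2 0.1" condim="6" friction="{fr}"/>
    <body name="ball" pos="0 0 {radius}">
      <freejoint/>
      <geom name="ball_geom" type="sphere" size="{radius}" condim="6" friction="{fr}"/>
    </body>
  </worldbody>
</mujoco>
""".strip()


if __name__ == "__main__":
    rng = random.Random(0)
    xml_text = ball_mjcf(sample_friction(rng))
    # Parse with the standard library to confirm the document is well-formed.
    geom = ET.fromstring(xml_text).find(".//geom[@name='ball_geom']")
    print(geom.get("friction"), geom.get("condim"))
```

The resulting string can be passed to `mujoco.MjModel.from_xml_string` if the `mujoco` Python package is installed. Resampling the triple per episode is one common way to expose a policy to friction variation; whether the authors do this is not stated.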

If this is right

Zero-shot deployment of the RL policy on real hardware without additional training
Accurate capture of previously ignored vibrations and discontinuous contacts in omni-wheel systems
Low-cost ballbot platform constructed by repurposing quadruped robot parts
Intuitive single-operator control through a generalized iOS ecosystem for expressive maneuvers

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The friction-modeling approach could extend to other nonholonomic platforms that rely on rolling contact, such as certain wheeled manipulators.
Adding sensor noise or terrain variation directly into the same simulation might further improve policy robustness without changing the training pipeline.
The subtractive hardware design process could be applied to other robot morphologies to reduce development cost while preserving research utility.

Load-bearing premise

The MuJoCo simulation accurately reproduces the real-world parasitic vibrations, contact discontinuities, and coupled friction effects at the wheel-ball and ball-floor interfaces.

What would settle it

Deploy the trained policy on the physical ballbot and observe whether it produces stable locomotion and balancing without any hardware-specific tuning or retraining; repeated failure would show the modeled friction channels do not close the gap.
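
One way to make "repeated failure" concrete is to report a success rate over repeated hardware trials together with a confidence interval. The sketch below uses the Wilson score interval; the definition of a successful trial (for example, staying upright for a fixed duration) and the counts in the usage example are hypothetical and are not drawn from the paper.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate.

    z=1.96 corresponds to roughly 95% coverage.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie in [0, trials]")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point round-off at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Hypothetical outcome: 18 upright runs out of 20 zero-shot deployments.
    low, high = wilson_interval(18, 20)
    print(f"success rate 0.90, 95% interval [{low:.3f}, {high:.3f}]")
```

With only a handful of trials the interval stays wide, which is why a qualitative demonstration alone cannot separate a reliable policy from a lucky run.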

Figures

Figures reproduced from arXiv: 2604.24916 by Bangchao Huang, Chaoyang Song, Fang Wan, Guangyi Huang, Haoran Sun, Mingdong Chen, Tianyu Wu, Zishang Zhang.

**Figure 1.** Structural transformation from the asOverDog quadruped (top left) to the asRoBallet humanoid (right).

**Figure 2.** Repurpose mobile perception & computing from an iPhone for robotics, the same for asRoBallet & asOverDog.

**Figure 3.** High-fidelity MuJoCo simulation of asRoBallet’s highly underactuated spherical dynamics.

**Figure 4.** iPhone’s ARKit pose estimation performance and peer-to-peer latency results.

**Figure 5.** Scalable Human-Robot Interaction via iOS & our app.

**Figure 6.** Our iOS app ecosystem for multi-modal data streaming and sim-like interactive control.

**Figure 7.** Velocity-tracking learning curves (mean episode reward) for the proposed RL policy and ablations.

**Figure 8.** Successful Sim2Real deployment of asRoBallet on various indoor & outdoor floor textures and lighting

**Figure 9.** Performance of real robot experiments.

Original abstract

We introduce asRoBallet, to the best of our knowledge, the first end-to-end reinforcement learning (RL) locomotion policy deployed on a humanoid ballbot hardware platform. Historically, ballbots have served as a canonical benchmark for underactuated and nonholonomic control, which are characterized by a reality gap in complex friction models for wheel-ball-floor interactions. While current literature demonstrates successful handling of 3D balancing with LQR and MPC, transitioning to actual hardware for a humanoid ballbot using RL is currently hindered by critical gaps in contact modeling, actuator latency & jitter, and safe hardware exploration. This study proposes a high-fidelity MuJoCo simulation that explicitly models the discrete roller mechanics of ETH-type omni-wheels, thereby capturing parasitic vibrations and contact discontinuities that have previously been ignored. We also developed a Friction-Aware Reinforcement Learning framework that achieves zero-shot Sim2Real transfer by mastering the coupled rolling, lateral, and torsional friction channels at the wheel-ball and ball-floor interfaces. We designed asRoBallet through subtractive reconfiguration, repurposing key components from an overconstrained quadruped and integrating them into a newly designed structural frame to achieve a robust research platform at low cost. We also developed a generalized iOS ecosystem that transforms consumer electronics into a low-latency interface, enabling a single operator to orchestrate expressive humanoid maneuvers via intuitive natural motion.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

They built a low-cost humanoid ballbot from repurposed parts and ran an end-to-end RL policy zero-shot on hardware, but the friction modeling that supposedly closes the gap has no quantitative validation against real data.


The key point is that this paper delivers a working hardware deployment of RL locomotion on a humanoid ballbot, which is new for this platform, and they put real effort into a MuJoCo model that includes discrete roller geometry and the three friction channels. The hardware build itself is practical: they took components from an overconstrained quadruped, added a new frame, and used an iOS app for low-latency control. That part shows solid engineering for a research platform at low cost and directly addresses actuator latency and safe exploration issues that have blocked RL on ballbots before.

Referee Report

2 major / 2 minor

Summary. The paper introduces asRoBallet as the first end-to-end RL locomotion policy deployed on humanoid ballbot hardware. It proposes a high-fidelity MuJoCo simulation explicitly modeling discrete roller mechanics of ETH-type omni-wheels to capture parasitic vibrations and contact discontinuities previously ignored. A Friction-Aware RL framework is claimed to enable zero-shot Sim2Real transfer by mastering coupled rolling, lateral, and torsional friction channels at wheel-ball and ball-floor interfaces. The platform is constructed via subtractive reconfiguration from an overconstrained quadruped, with an iOS-based low-latency control interface.

Significance. If the zero-shot hardware transfer is quantitatively validated, the work would advance RL application to underactuated nonholonomic systems by demonstrating that explicit friction modeling in simulation can close the reality gap for ballbot locomotion without real-world fine-tuning. This could inform contact-rich control for other spherical or wheeled platforms where friction discontinuities dominate dynamics.

major comments (2)

[§3] §3 (Simulation and Contact Modeling): The assertion that the MuJoCo model accurately captures coupled rolling/lateral/torsional friction and parasitic vibrations at wheel-ball and ball-floor interfaces is not supported by any quantitative validation (e.g., force-torque sensor comparisons, frequency-domain vibration matching, or ablation on contact parameters). This undermines the central claim that friction-aware RL, rather than policy robustness to mismatch, enables zero-shot transfer.
[Results] Results and Experiments: No error metrics, success rates, ablation studies, or baseline comparisons (e.g., vs. LQR/MPC or non-friction-aware RL) are reported for the hardware deployment. The abstract and methods describe successful deployment but supply no data to evaluate the zero-shot claim or the contribution of the friction modeling.

minor comments (2)

[Abstract] The abstract qualifies the claim of the first RL deployment on a humanoid ballbot with 'to the best of our knowledge'; a brief literature comparison table would strengthen this novelty claim.
[§3] Notation for friction channels (rolling, lateral, torsional) is introduced without explicit equations or parameter values in the provided description; adding these in §3 would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below and outline the revisions we will make to strengthen the quantitative support for our claims.


Referee: [§3] §3 (Simulation and Contact Modeling): The assertion that the MuJoCo model accurately captures coupled rolling/lateral/torsional friction and parasitic vibrations at wheel-ball and ball-floor interfaces is not supported by any quantitative validation (e.g., force-torque sensor comparisons, frequency-domain vibration matching, or ablation on contact parameters). This undermines the central claim that friction-aware RL, rather than policy robustness to mismatch, enables zero-shot transfer.

Authors: We agree that the manuscript would benefit from explicit quantitative validation of the contact model. The simulation parameters were selected based on manufacturer data for the ETH-type omni-wheels and iterative tuning to reproduce observed hardware dynamics, including parasitic vibrations. However, direct comparisons such as force-torque measurements or frequency-domain matching were not included in the original submission. In the revised version, we will add an ablation study varying friction coefficients across the three channels and include spectral analysis of simulated versus hardware vibrations to better substantiate the modeling fidelity and isolate the contribution of friction awareness. revision: yes
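
The spectral comparison proposed in this response could be sketched as follows: take a vibration trace from simulation and one from hardware, find the dominant non-DC frequency of each with a discrete Fourier transform, and report the mismatch. The function names, the sampling rate, and the synthetic sine traces standing in for real logs are all assumptions; no measured data from the paper is used.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate_hz):
    """Return the frequency (Hz) of the largest non-DC bin of a naive DFT."""
    n = len(samples)
    if n < 4:
        raise ValueError("need at least 4 samples")
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC offset
    best_bin, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):  # bins up to the Nyquist frequency
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(centered))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate_hz / n


def synthetic_trace(freq_hz, sample_rate_hz, n, amplitude=1.0):
    """Placeholder for a logged vibration signal (e.g. one IMU axis)."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * t / sample_rate_hz)
            for t in range(n)]


if __name__ == "__main__":
    rate_hz, n = 200.0, 200
    sim_trace = synthetic_trace(25.0, rate_hz, n)  # stand-in for simulation
    hw_trace = synthetic_trace(30.0, rate_hz, n)   # stand-in for hardware
    f_sim = dominant_frequency(sim_trace, rate_hz)
    f_hw = dominant_frequency(hw_trace, rate_hz)
    print(f"sim peak {f_sim:.1f} Hz, hardware peak {f_hw:.1f} Hz, "
          f"mismatch {abs(f_sim - f_hw):.1f} Hz")
```

The naive transform is quadratic in the trace length and is kept only to stay dependency-free; an FFT routine from a numerical library would be the usual choice for real logs.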
Referee: [Results] Results and Experiments: No error metrics, success rates, ablation studies, or baseline comparisons (e.g., vs. LQR/MPC or non-friction-aware RL) are reported for the hardware deployment. The abstract and methods describe successful deployment but supply no data to evaluate the zero-shot claim or the contribution of the friction modeling.

Authors: We acknowledge that the hardware results section currently relies on qualitative demonstration of successful zero-shot transfer without accompanying quantitative metrics. The manuscript prioritizes the novelty of the first end-to-end RL deployment on this platform and the simulation framework, but this leaves the zero-shot performance and the specific role of friction modeling insufficiently quantified. We will revise the results to include success rates over multiple trials, trajectory tracking error metrics, and ablations comparing the friction-aware policy against a non-friction-aware baseline, thereby providing clearer evidence for the contribution of the proposed approach. revision: yes
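
A trajectory-tracking error metric of the kind promised here can be as simple as the root-mean-square error between commanded and measured velocity, computed per policy on the same command sequence. The sketch below assumes one-dimensional velocity logs of equal length; the policy labels and the numbers in the usage example are invented for illustration and are not results from the paper.

```python
import math


def tracking_rmse(commanded, measured):
    """Root-mean-square error between commanded and measured velocities."""
    if len(commanded) != len(measured):
        raise ValueError("logs must have the same length")
    if not commanded:
        raise ValueError("logs must be non-empty")
    squared = sum((c - m) ** 2 for c, m in zip(commanded, measured))
    return math.sqrt(squared / len(commanded))


def compare_policies(commanded, logs):
    """Map each policy label to its RMSE against the shared command sequence."""
    return {name: tracking_rmse(commanded, measured)
            for name, measured in logs.items()}


if __name__ == "__main__":
    # Hypothetical velocity commands (m/s) and two made-up measurement logs.
    commanded = [0.0, 0.5, 0.5, 0.5, 0.0]
    logs = {
        "friction_aware": [0.0, 0.45, 0.52, 0.49, 0.03],
        "baseline": [0.0, 0.30, 0.65, 0.40, 0.15],
    }
    for name, rmse in compare_policies(commanded, logs).items():
        print(f"{name}: RMSE = {rmse:.3f} m/s")
```

Reporting this number for both the friction-aware policy and a non-friction-aware baseline, over several trials each, is what would let a reader attribute any difference to the friction modeling.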

Circularity Check

0 steps flagged

No circularity: empirical hardware result with no self-referential derivations


The paper's core claim is an experimental outcome: successful zero-shot deployment of an end-to-end RL policy on physical ballbot hardware after training in a custom MuJoCo simulator. The abstract and provided context describe modeling choices (discrete roller geometry, friction channels) and an RL framework, but contain no equations, parameter-fitting procedures, or predictions that reduce to their own inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps. The result is presented as a hardware validation rather than a closed mathematical derivation, so the claim is tested against external evidence and no step in the argument feeds back into its own premises.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the central claim rests on unstated assumptions about simulation fidelity and RL training stability that cannot be audited from the given text.

pith-pipeline@v0.9.0 · 9339 in / 1230 out tokens · 65544 ms · 2026-05-08T03:23:26.902846+00:00 · methodology

