pith. sign in

arxiv: 2606.11092 · v3 · pith:IFZS2FZ6new · submitted 2026-06-09 · 💻 cs.RO · cs.AI

RoboNaldo: Accurate, Stable and Powerful Humanoid Soccer Shooting via Motion-Guided Curriculum Reinforcement Learning

Pith reviewed 2026-06-27 13:03 UTC · model grok-4.3

classification 💻 cs.RO cs.AI
keywords humanoid soccerreinforcement learningcurriculum learningmotion trackingwhole-body controlfree-kick shootingmoving-ball shootingUnitree G1
0
0 comments X

The pith

A three-stage curriculum RL method lets humanoid robots learn accurate powerful soccer shooting from one fixed human kick reference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tackles the difficulty of training humanoid robots for soccer shooting, where whole-body stability, high impulse contact, and target accuracy must all be achieved together. Pure motion-tracking RL stays stable but cannot adapt to different ball positions and timings, while pure task-reward RL rarely discovers workable kicks at all. RoboNaldo therefore anchors training on a single human-kick reference motion and advances it through three curriculum stages that first establish stable whole-body coordination, then adapt the motion to random stationary ball locations, and finally add locomotion commands and a kick trigger for moving-ball cases. The resulting policy shows substantially lower shot error and higher ball speed than prior baselines in simulation and transfers to a real Unitree G1 robot with onboard perception, reaching average errors of 0.73 m and 0.86 m from three meters away together with 13.10 m/s post-contact ball velocity. A reader would care because the approach demonstrates a practical route to reliable high-impulse physical skills on humanoid hardware without requiring large libraries of reference motions.

Core claim

RoboNaldo is a three-stage motion-guided curriculum reinforcement learning framework that employs one fixed human-kick reference motion as a scaffold. Optimization first produces a stable whole-body kicking prior, then shifts toward task performance on free-kick scenarios with stationary balls at random positions, and finally extends the policy to moving-ball shooting through a locomotion-command and kick-trigger interface controlled by a high-level heuristic planner during training. Alternative high-level controllers can substitute for the planner at inference time. In simulation this yields free-kick shot error 48.6 percent lower and shoot velocity 2.96 times higher than prior baselines; o

What carries the argument

The three-stage motion-guided curriculum RL framework that progressively relaxes a single human-kick reference motion from a stable prior into task-optimized free-kick and moving-ball behaviors.

If this is right

  • The low-level policy remains compatible with alternative high-level controllers at deployment time.
  • The same curriculum progression produces policies that generalize from simulation to real-world stationary and moving-ball shooting.
  • Whole-body stability and high impulse are retained while accuracy and velocity are improved through progressive task reward weighting.
  • The approach supports both free-kick and moving-ball scenarios using the same trained policy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The scaffolding technique could be reused for other high-impulse whole-body skills such as throwing or catching on humanoid platforms.
  • Reliance on a single reference motion may lower the data-collection burden compared with methods that require many expert demonstrations.
  • Integration of the low-level policy with onboard perception opens a path toward fully autonomous dynamic interaction tasks beyond soccer.
  • Testing the same curriculum on robots with different morphologies would reveal how much of the learned behavior is hardware-specific.

Load-bearing premise

One fixed human-kick reference motion supplies a scaffold whose optimization can be shifted across three curriculum stages without losing stability or the ability to generalize to random stationary and moving ball conditions.

What would settle it

A non-curriculum policy trained from scratch or from the same reference without staged progression that nevertheless reaches comparable shot accuracy, velocity, and real-world transfer on the Unitree G1 would falsify the necessity of the motion-guided curriculum.

Figures

Figures reproduced from arXiv: 2606.11092 by Haoguang Mai, Hongyang Li, Jingbo Wang, Li Chen, Peng Lu, Tianyang Tang, Tianyu Li, Yichao Zhong, Yidan Lu, Yixuan Pan, Yuhang Lu, Zhongyu Li.

Figure 1
Figure 1. Figure 1: RoboNaldo: accurate, stable and powerful whole-body soccer shooting on a real grassy soccer field. Ghost-overlaid time sequence of a Unitree G1 humanoid executing a directed penalty kick. Abstract: Elite humanoid soccer shooting requires whole-body stability, high￾impulse whole-body interactions, and accuracy to targets. Motion tracking-driven reinforcement learning (RL) provides stability in whole-body mo… view at source ↗
Figure 2
Figure 2. Figure 2: RoboNaldo motion-guided curriculum. A single low-level policy is trained sequentially across three stages. Stage 1 learns a stable kick by tracking a retargeted human motion. Stage 2 introduces shooting task rewards to adapt the kick for accurate stationary-ball free-kicks. Stage 3 replaces the reference motion and anchor cue with high-level locomotion and kick-trigger decision commands, enabling the polic… view at source ↗
Figure 3
Figure 3. Figure 3: Real-world perception stack. A head-mounted MID-360 LiDAR tracks the retro-reflective ball at close range, while a chest-mounted IR camera extends ball detection beyond the LiDAR near-field. Both estimates are fused by a constant-velocity Kalman filter and transformed into the pelvis frame for policy input. The target board is localized by AprilTags from the same chest camera. 2 1 0 −1 −2 goal lateral y (m… view at source ↗
Figure 4
Figure 4. Figure 4: Real-robot shooting overview. A: front-goal view shot dispersion; B: free-kick shot trajectories; C: per-region accuracy success rates and speed; D: moving-ball shot trajectories. Moving-ball trials expose a different bottleneck. Human passes introduce timing variation and off-reference approach states, reducing contact reliability, but the kick remains accurate once a clean launch is produced. Among 27 mo… view at source ↗
Figure 5
Figure 5. Figure 5: Real-robot shooting showcase. (a) outdoor free-kick and (b) moving-ball one-touch shooting are time-lapses read left-to-right. (c)–(f) are four free-kick trials read top-to-bottom (contact frame above, landing outcome below), with landing errors of 31.5, 38.7, 45.3, and 35.9 cm [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
read the original abstract

Elite humanoid soccer shooting requires whole-body stability, high-impulse whole-body interactions, and accuracy to targets. Motion tracking-driven reinforcement learning (RL) provides stability in whole-body movement coordination, but a fixed reference makes it hard to adapt to varied ball positions and strike timings; in contrast, task reward-driven RL struggles to explore and discover valid kicks from scratch. We therefore introduce RoboNaldo, a three-stage motion-guided curriculum RL framework for high-impulse humanoid interaction. A single human-kick reference is used as a scaffold and progressively shifts optimization towards shooting performance. The curriculum first learns a stable whole-body kicking prior, then adapts the kick to free-kick settings where the ball is stationary at random positions, and finally extends it to moving-ball shooting through a locomotion-command and kick-trigger interface. A high-level heuristic planner controls this interface during training, while alternative high-level controllers can drive the same low-level policy at inference. In simulation, RoboNaldo demonstrates free-kick shot error 48.6% lower and shoot velocity 2.96x than prior work baselines. In real world on a Unitree G1 with onboard perception, RoboNaldo attains 0.73 m and 0.86 m average target shooting error from 3 m away in free-kick and moving-ball cases, accordingly. And the post-contact ball velocity reaches 13.10 m/s, which is 59-71% of reported professional open-play shot speed. Project page: https://opendrivelab.com/RoboNaldo.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces RoboNaldo, a three-stage motion-guided curriculum RL framework for humanoid soccer shooting. A single fixed human-kick reference motion serves as a scaffold: stage 1 learns a stable whole-body kicking prior, stage 2 adapts it to free-kick settings with stationary balls at random positions, and stage 3 extends it to moving-ball shooting via a locomotion-command and kick-trigger interface controlled by a high-level heuristic planner. The paper reports simulation results of 48.6% lower free-kick shot error and 2.96x higher shoot velocity versus prior baselines, plus real-world results on a Unitree G1 (onboard perception) of 0.73 m and 0.86 m average target error from 3 m in free-kick and moving-ball cases, with post-contact ball velocity of 13.10 m/s (59-71% of professional open-play speeds).

Significance. If the curriculum successfully generalizes as described, the work would demonstrate a practical way to combine motion-tracking stability with task-driven optimization for high-impulse whole-body interactions, offering a template for other dynamic humanoid tasks such as manipulation or locomotion under contact. The real-world transfer on commodity hardware with onboard sensing adds practical value.

major comments (1)
  1. [Abstract / Curriculum description] Abstract and curriculum description: the central performance claims (48.6% error reduction, 2.96x velocity, real-world errors of 0.73 m / 0.86 m) rest on the assumption that a single fixed human-kick reference can scaffold stable generalization across the three stages to random stationary and moving ball conditions without instability or loss of exploration. No ablations, failure-case analysis, or stage-wise contribution metrics are described to test whether the progressive reward shift preserves the prior or leads to collapse when ball locations/timings vary, which is load-bearing for the method's validity.
minor comments (2)
  1. [Abstract] The abstract states clear quantitative gains but does not report error bars, number of trials, or statistical tests; these should be added to support the reported margins.
  2. [Methods] Training details (hyperparameters, reward weights per stage, data exclusion rules, random seed handling) are referenced as missing in the reader's assessment and should be supplied for reproducibility.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on the curriculum design. The comment correctly identifies a gap in the current manuscript regarding explicit validation of the three-stage progression. We will address this through targeted revisions.

read point-by-point responses
  1. Referee: [Abstract / Curriculum description] Abstract and curriculum description: the central performance claims (48.6% error reduction, 2.96x velocity, real-world errors of 0.73 m / 0.86 m) rest on the assumption that a single fixed human-kick reference can scaffold stable generalization across the three stages to random stationary and moving ball conditions without instability or loss of exploration. No ablations, failure-case analysis, or stage-wise contribution metrics are described to test whether the progressive reward shift preserves the prior or leads to collapse when ball locations/timings vary, which is load-bearing for the method's validity.

    Authors: We agree that additional analysis is needed to substantiate the curriculum's role. In the revised manuscript we will add: (1) stage-wise ablations measuring policy performance and stability metrics (e.g., kick success rate, joint torque limits, and reference tracking error) after each stage; (2) failure-case analysis on out-of-distribution ball positions and timings, including cases that cause instability or collapse; and (3) quantitative metrics tracking how the reward weighting shift affects preservation of the motion prior versus task performance. These results will be presented in a new subsection and supporting figures. The existing simulation and real-world performance numbers already provide indirect evidence of successful generalization, but the requested analyses will make the claims more robust. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation relies on external baselines and standard RL curriculum without self-referential reductions

full rationale

The paper describes a three-stage motion-guided curriculum RL method that starts from a single fixed human-kick reference motion, then shifts optimization toward task rewards for stationary and moving balls. All reported performance numbers (48.6% lower error, 2.96x velocity in simulation; 0.73 m / 0.86 m real-world error; 13.10 m/s ball velocity) are measured against external prior-work baselines and real hardware, not derived from quantities defined inside the method itself. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The central claim therefore remains independently falsifiable and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated. The approach implicitly relies on the domain assumption that curriculum stages will transfer stability from motion tracking to task performance.

axioms (1)
  • domain assumption Curriculum learning with a single motion reference can progressively adapt a policy from motion tracking to varied task conditions without loss of stability
    The three-stage design assumes the scaffold remains useful across random ball positions and moving-ball cases.

pith-pipeline@v0.9.1-grok · 5857 in / 1405 out tokens · 24715 ms · 2026-06-27T13:03:24.960760+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

32 extracted references · 19 canonical work pages · 3 internal anchors

  1. [1]

    Haarnoja, B

    T. Haarnoja, B. Moran, G. Lever, S. H. Huang, D. Tirumala, J. Humplik, M. Wulfmeier, S. Tunyasuvunakool, N. Y . Siegel, R. Hafner, et al. Learning agile soccer skills for a bipedal robot with deep reinforcement learning.Science Robotics, 9(89):eadi8022, 2024

  2. [2]

    Y . Wang, C. Luo, P. Chen, J. Liu, W. Sun, T. Guo, K. Yang, B. Hu, Y . Zhang, and M. Zhao. Learn- ing vision-driven reactive soccer skills for humanoid robots.arXiv preprint arXiv:2511.03996, 2025

  3. [3]

    Y . Wang, Q. Zhao, Y . F. Lau, R. Yu, H. W. Tsui, Q. Chen, J. Wang, J. Pang, and P. Tan. HumanX: Toward agile and generalizable humanoid interaction skills from human videos.arXiv preprint arXiv:2602.02473, 2026

  4. [4]

    J. Kong, X. Liu, Y . Lin, J. Han, S. Schwertfeger, C. Bai, and X. Li. Learning soccer skills for humanoid robots: A progressive perception-action framework.arXiv preprint arXiv:2602.05310, 2026

  5. [5]

    X. B. Peng, P. Abbeel, S. Levine, and M. van de Panne. Deepmimic: Example-guided deep reinforcement learning of physics-based character skills.ACM Transactions on Graphics (TOG), 37(4):1–14, 2018

  6. [6]

    X. B. Peng, Z. Ma, P. Abbeel, S. Levine, and A. Kanazawa. Amp: Adversarial motion priors for stylized physics-based character control.ACM Transactions on Graphics (TOG), 40(4):1–20, 2021

  7. [7]

    De-la Cruz-Torres, A

    B. De-la Cruz-Torres, A. Ruiz-de Alarc ´on-Quintero, and M. Navarro-Castro. The role of shot velocity in advanced post-shot metrics: Evidence from the UEFA european football championships.Data, 11(2):39, 2026. doi:10.3390/data11020039

  8. [8]

    Z. Su, Y . Gao, E. Lukas, Y . Li, J. Cai, F. Tulbah, F. Gao, C. Yu, Z. Li, Y . Wu, and K. Sreenath. Toward real-world cooperative and competitive soccer with quadrupedal robot teams. In Conference on Robot Learning (CoRL), 2025. arXiv:2505.13834

  9. [9]

    Z. Su, B. Zhang, N. Rahmanian, Y . Gao, Q. Liao, C. Regan, K. Sreenath, and S. S. Sastry. HITTER: A HumanoId table TEnnis robot via hierarchical planning and learning.arXiv preprint arXiv:2508.21043, 2025

  10. [10]

    Y . Ma, A. Cramariuc, F. Farshidian, and M. Hutter. Learning coordinated badminton skills for legged manipulators.Science Robotics, 10(102):eadu3922, 2025. doi:10.1126/scirobotics. adu3922

  11. [11]

    W. Li, J. Ma, M. Lu, and P. Lu. Like playing a video game: Spatial-temporal optimization of foot trajectories for controlled football kicking in bipedal robots. InIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3565–3572, 2025. doi:10.1109/ IROS60139.2025.11246655. arXiv:2510.01843. 9

  12. [12]

    Z. Xu, M. Seo, D. Lee, H. Fu, J. Hu, J. Cui, Y . Jiang, Z. Wang, A. Brund, J. Biswas, and P. Stone. Learning agile striker skills for humanoid soccer robots from noisy sensory input. InIEEE International Conference on Robotics and Automation (ICRA), 2026. arXiv:2512.06571

  13. [13]

    J. Ren, J. Long, T. Huang, H. Wang, Z. Wang, F. Jia, W. Zhang, J. Wang, P. Luo, and J. Pang. Humanoid goalkeeper: Learning from position conditioned task-motion constraints.arXiv preprint arXiv:2510.18002, 2025

  14. [14]

    H. Weng, Y . Li, N. Sobanbabu, Z. Wang, Z. Luo, T. He, D. Ramanan, and G. Shi. HDMI: Learning interactive humanoid whole-body control from human videos.arXiv preprint arXiv:2509.16757, 2025

  15. [15]

    S. Zhao, Y . Ze, Y . Wang, C. K. Liu, P. Abbeel, G. Shi, and R. Duan. Resmimic: From general motion tracking to humanoid whole-body loco-manipulation via residual learning.arXiv preprint arXiv:2510.05070, 2025

  16. [16]

    Kalaria, S

    D. Kalaria, S. S. Harithas, P. Katara, S. Kwak, S. Bhagat, S. Sastry, S. Sridhar, S. Vemprala, A. Kapoor, and J. C.-K. Huang. DreamControl: Human-inspired whole-body humanoid control for scene interaction via guided diffusion.arXiv preprint arXiv:2509.14353, 2025

  17. [17]

    Falcon: Learning force- adaptive humanoid loco-manipulation.arXiv preprint arXiv:2505.06776, 2025

    Y . Zhang, Y . Yuan, P. Gurunath, I. Gupta, S. Omidshafiei, A.-a. Agha-mohammadi, M. Vazquez- Chanlatte, L. Pedersen, T. He, and G. Shi. FALCON: Learning force-adaptive humanoid loco-manipulation. InLearning for Dynamics and Control Conference (L4DC), 2026. arXiv:2505.06776

  18. [18]

    S. Xu, H. Y . Ling, Y .-X. Wang, and L.-Y . Gui. InterMimic: Towards universal whole-body control for physics-based human-object interactions. InProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR), pages 12266–12277, June 2025. doi:10.1109/CVPR52734.2025.01145. Highlight Paper

  19. [19]

    Bengio, J

    Y . Bengio, J. Louradour, R. Collobert, and J. Weston. Curriculum learning. InProceedings of the 26th Annual International Conference on Machine Learning, pages 41–48, 2009

  20. [20]

    Narvekar, B

    S. Narvekar, B. Peng, M. Leonetti, J. Sinapov, M. E. Taylor, and P. Stone. Curriculum learning for reinforcement learning domains: A framework and survey.Journal of Machine Learning Research, 21(181):1–50, 2020

  21. [21]

    Rudin, D

    N. Rudin, D. Hoeller, P. Reist, and M. Hutter. Learning to walk in minutes using massively parallel deep reinforcement learning. InConference on Robot Learning, pages 91–100, 2022

  22. [22]

    Z. Shen, H. Pi, Y . Xia, Z. Cen, S. Peng, Z. Hu, H. Bao, R. Hu, and X. Zhou. World-grounded human motion recovery via gravity-view coordinates. InSIGGRAPH Asia 2024 Conference Papers, 2024. doi:10.1145/3680528.3687565

  23. [23]

    J. P. Ara´ujo, Y . Ze, P. Xu, J. Wu, and C. K. Liu. Retargeting matters: General motion retargeting for humanoid motion tracking.arXiv preprint arXiv:2510.02252, 2025

  24. [24]

    Q. Liao, T. E. Truong, X. Huang, Y . Gao, G. Tevet, K. Sreenath, and C. K. Liu. Beyondmimic: From motion tracking to versatile humanoid control via guided diffusion.arXiv preprint arXiv:2508.08241, 2025

  25. [25]

    Mittal, C

    M. Mittal, C. Yu, Q. Yu, J. Liu, N. Rudin, D. Hoeller, J. L. Yuan, R. Singh, Y . Guo, H. Mazhar, et al. Orbit: A unified simulation framework for interactive robot learning environments.IEEE Robotics and Automation Letters, 8:3740–3747, 2023

  26. [26]

    Proximal Policy Optimization Algorithms

    J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017

  27. [27]

    Unitree g1 humanoid robot.https://www.unitree.com/g1, 2024

    Unitree Robotics. Unitree g1 humanoid robot.https://www.unitree.com/g1, 2024. 10

  28. [28]

    Makoviychuk, L

    V . Makoviychuk, L. Wawrzyniak, Y . Guo, M. Lu, K. Storey, M. Macklin, D. Hoeller, N. Rudin, A. Allshire, A. Handa, et al. Isaac gym: High performance gpu-based physics simulation for robot learning. InAdvances in Neural Information Processing Systems, 2021

  29. [29]

    Tobin, R

    J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. InIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 23–30, 2017

  30. [30]

    X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel. Sim-to-real transfer of robotic control with dynamics randomization. InIEEE International Conference on Robotics and Automation (ICRA), pages 3803–3810, 2018

  31. [31]

    Jocher and J

    G. Jocher and J. Qiu. Ultralytics YOLO11. https://docs.ultralytics.com/models/ yolo11/, 2024

  32. [32]

    Todorov, T

    E. Todorov, T. Erez, and Y . Tassa. Mujoco: A physics engine for model-based control. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5026–5033, 2012. 11 A Observation Space: Per-Dimension Breakdown Table A.1: Policy observation vector (547 dims). †In Stage 3 the anchor reference is replaced by a 9-dim heuristic locomo...