Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input

Zifan Xu; Myoungkyu Seo; Dongmyeong Lee; Hao Fu; Jiaheng Hu; Jiaxun Cui; Yuqian Jiang; Zhihan Wang; Anastasiia Brund; Joydeep Biswas; Peter Stone

arxiv: 2512.06571 · v3 · submitted 2025-12-06 · 💻 cs.RO

Pith reviewed 2026-05-17 00:16 UTC · model grok-4.3

classification 💻 cs.RO

keywords: reinforcement learning, humanoid robots, ball kicking, sim-to-real transfer, noisy sensors, soccer robotics, whole-body control, policy distillation

The pith

A four-stage reinforcement learning pipeline lets humanoid robots kick soccer balls accurately from noisy sensors and transfer to real hardware.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a reinforcement learning system that trains humanoid robots to perform fast, stable ball kicks that adapt to different ball positions and goals. It extends the standard teacher-student setup into four explicit stages: teacher training on ground-truth state for long-distance ball chasing, teacher training for directional kicking, distillation of the teacher into a student that receives only noisy input, and online constrained reinforcement learning to adapt and refine the student. A sympathetic reader would care because the work shows a concrete way to close the simulation-to-reality gap for whole-body skills that must run under perceptual uncertainty and physical disturbances. Evaluations in simulation and on a physical robot report high kicking accuracy and goal-scoring rates across varied ball-goal configurations.

Core claim

The system achieves robust continual ball-kicking by first training a teacher policy with ground-truth information on long-distance chasing and directional kicking, distilling that policy to a student that receives only noisy sensory input, and finally adapting and refining the student policy through online constrained reinforcement learning. Tailored reward functions, realistic noise modeling, and the adaptation stage together close the sim-to-real gap and sustain performance under perceptual uncertainty, producing strong kicking accuracy and goal-scoring success on both simulated and real humanoid robots across diverse ball-goal setups.
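The stage ordering described above can be written down as a small schedule. This is only an illustrative sketch: the stage names follow the abstract, while the `Stage` class, the `PIPELINE` tuple, and the `describe` helper are hypothetical names introduced here, not anything taken from the paper's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training stage (hypothetical container, for illustration only)."""
    index: int
    name: str
    policy: str        # which policy is trained: "teacher" or "student"
    observations: str  # what that policy sees: "ground-truth" or "noisy"


# Stage names as listed in the abstract; everything else is an assumption.
PIPELINE = (
    Stage(1, "long-distance ball chasing", "teacher", "ground-truth"),
    Stage(2, "directional kicking", "teacher", "ground-truth"),
    Stage(3, "teacher policy distillation", "student", "noisy"),
    Stage(4, "student adaptation and refinement", "student", "noisy"),
)


def describe(pipeline):
    """Return one human-readable line per stage."""
    return [
        f"({s.index}) {s.name} [{s.policy}, {s.observations} input]"
        for s in pipeline
    ]


for line in describe(PIPELINE):
    print(line)
```

The point of the layout is that the switch from ground-truth to noisy input coincides with the switch from teacher to student, which is where the sim-to-real burden falls.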

What carries the argument

The four-stage teacher-student pipeline with online constrained RL for adaptation, which first learns ideal behaviors using perfect state information and then refines the policy to operate reliably under sensor noise and external perturbations.
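To make the distillation step concrete, here is a deliberately tiny sketch in which a student regresses onto a teacher's actions while seeing only a noisy copy of the state. It assumes a scalar state, a linear teacher, Gaussian observation noise, and a least-squares fit; none of these choices come from the paper, whose networks and losses are not described on this page.

```python
import random


def teacher_action(state, gain=2.0):
    """Hypothetical teacher: a linear controller acting on the true state."""
    return gain * state


def distill_student_gain(n_samples=10_000, noise_std=0.2, seed=0):
    """Fit a student gain k so that k * noisy_obs imitates the teacher.

    Closed-form least squares for a single scalar weight:
    k = sum(obs * target) / sum(obs * obs).
    """
    rng = random.Random(seed)
    num = 0.0
    den = 0.0
    for _ in range(n_samples):
        state = rng.uniform(-1.0, 1.0)             # true state (teacher input)
        obs = state + rng.gauss(0.0, noise_std)    # noisy student input
        target = teacher_action(state)             # teacher label
        num += obs * target
        den += obs * obs
    return num / den


student_gain = distill_student_gain()
print(f"teacher gain 2.000, distilled student gain {student_gain:.3f}")
```

In this toy setting the student ends up with a smaller gain than the teacher, because it learns to discount an input it cannot fully trust. That is one intuition for why a further refinement stage after distillation can be useful.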

If this is right

The robot maintains single-support stability while executing rapid leg swings for kicks under varying ball and goal positions.
Performance remains high when external perturbations such as opponents are present.
Removing the constrained RL adaptation stage or the noise modeling causes measurable degradation in kicking success.
The method supplies a reproducible benchmark task for visuomotor whole-body skill learning in humanoid robots.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same staged training pattern could be tested on related agile skills such as walking through crowds or recovering from pushes.
Replacing modeled noise with direct camera input during the student phase might reduce the need for hand-crafted noise models.
Extending the adaptation stage to multi-robot coordination would reveal whether the constrained RL component scales to team interactions.

Load-bearing premise

The simulated noise model and external perturbations are close enough to real sensor noise and physical disturbances that policies trained in simulation transfer to the physical robot without major performance loss.
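A sketch of what such a simulated noise model could look like for a ball-position observation is given below. The function name, the Gaussian position noise, the dropout behaviour, and all parameter values are assumptions chosen for illustration; the page does not enumerate the paper's actual noise parameters.

```python
import random


def noisy_ball_observation(true_xy, last_obs, rng, pos_std=0.05, dropout_prob=0.1):
    """Corrupt a ground-truth ball position (hypothetical noise model).

    With probability dropout_prob the detection is missed and the previous
    observation is repeated; otherwise independent Gaussian noise is added
    to each coordinate.
    """
    if rng.random() < dropout_prob:
        return last_obs
    x, y = true_xy
    return (x + rng.gauss(0.0, pos_std), y + rng.gauss(0.0, pos_std))


rng = random.Random(0)
obs = (1.0, 0.0)
trace = []
for step in range(5):
    true_xy = (1.0 - 0.1 * step, 0.0)   # ball approaching along the x axis
    obs = noisy_ball_observation(true_xy, obs, rng)
    trace.append(obs)
print(trace)
```

The premise under test is that parameters like `pos_std` and `dropout_prob` match what the physical sensors actually produce; if they do not, the student is trained against the wrong disturbance.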

What would settle it

A large drop in real-robot kicking accuracy or goal-scoring success rate relative to simulation when the robot encounters sensor noise or disturbances not captured by the training model would show the transfer does not hold.

Figures

Figures reproduced from arXiv: 2512.06571 by Anastasiia Brund, Dongmyeong Lee, Hao Fu, Jiaheng Hu, Jiaxun Cui, Joydeep Biswas, Myoungkyu Seo, Peter Stone, Yuqian Jiang, Zhihan Wang, Zifan Xu.

**Figure 2.** Left: The network architectures for the teacher and the student network; Right: Multi-stage training framework: (1)

**Figure 3.** Kicking cycle phase (iii): reorienting to locate the

**Figure 4.** An illustration of the realistic perception modeling.

**Figure 6.** An overview of the real-world deployment pipeline

**Figure 7.** Visualizations of success rate, kick accuracy, max ball vel., and energy cost, at different initial ball positions.

Original abstract

Learning fast and robust ball-kicking skills is a critical capability for humanoid soccer robots, yet it remains a challenging problem due to the need for rapid leg swings, postural stability on a single support foot, and robustness under noisy sensory input and external perturbations (e.g., opponents). This paper presents a reinforcement learning (RL)-based system that enables humanoid robots to execute robust continual ball-kicking with adaptability to different ball-goal configurations. The system extends a typical teacher-student training framework -- in which a "teacher" policy is trained with ground truth state information and the "student" learns to mimic it with noisy, imperfect sensing -- by including four training stages: (1) long-distance ball chasing (teacher); (2) directional kicking (teacher); (3) teacher policy distillation (student); and (4) student adaptation and refinement (student). Key design elements -- including tailored reward functions, realistic noise modeling, and online constrained RL for adaptation and refinement -- are critical for closing the sim-to-real gap and sustaining performance under perceptual uncertainty. Extensive evaluations in both simulation and on a real robot demonstrate strong kicking accuracy and goal-scoring success across diverse ball-goal configurations. Ablation studies further highlight the necessity of the constrained RL, noise modeling, and the adaptation stage. This work presents a system for learning robust continual humanoid ball-kicking under imperfect perception, establishing a benchmark task for visuomotor skill learning in humanoid whole-body control.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a practical four-stage RL pipeline for humanoid ball chasing and kicking under noise, with real-robot results and ablations, but the noise model fidelity to physical sensors is not quantitatively checked.

The main thing to know is that this work assembles a four-stage training process in which a humanoid learns to chase a ball from a distance, kick it directionally, distill the skill into a student policy that sees only noisy input, and then refine that student online with constrained RL. The authors report success in simulation and on a physical robot across varied ball and goal positions, plus ablations that test the noise modeling and adaptation steps.

Referee Report

3 major / 2 minor

Summary. The paper presents a four-stage reinforcement learning pipeline for humanoid robots to learn agile, continual ball-kicking skills under noisy sensory input and external perturbations. It extends the standard teacher-student distillation approach by adding stages for long-distance chasing (teacher), directional kicking (teacher), policy distillation to a student with noisy observations, and online constrained RL adaptation/refinement of the student. Tailored reward functions, realistic noise modeling on vision/proprioception, and constrained RL are highlighted as essential for sim-to-real transfer. Extensive simulation and real-robot evaluations are reported to show strong kicking accuracy and goal-scoring success across diverse ball-goal configurations, supported by ablation studies on the constrained RL, noise modeling, and adaptation stage.

Significance. If the empirical results hold under rigorous quantitative scrutiny, the work would provide a practical, reproducible benchmark for visuomotor whole-body control in dynamic humanoid tasks. The explicit four-stage curriculum, emphasis on noise modeling for perceptual uncertainty, and use of constrained RL for adaptation represent concrete engineering contributions that could transfer to other legged locomotion or manipulation problems requiring robustness to sensor noise.

major comments (3)

[Abstract and Evaluation] Abstract and Evaluation section: the central claim that 'realistic noise modeling' is 'key' and 'critical for closing the sim-to-real gap' is load-bearing for the transfer success, yet the manuscript supplies no quantitative validation (KL divergence, Wasserstein distance, or spectral comparison) between the injected noise distributions and logged real-robot sensor statistics under matching ball-goal configurations.
[Evaluation] Evaluation section: success rates and goal-scoring performance are asserted for both simulation and hardware, but the absence of reported numerical metrics, error bars, number of trials, or full experimental protocols makes it impossible to assess whether the observed transfer constitutes a statistically meaningful improvement over baselines.
[Ablation studies] Ablation studies: while the necessity of constrained RL, noise modeling, and the adaptation stage is highlighted, the reported performance deltas or failure modes under each ablation are not quantified, weakening the ability to attribute gains specifically to the noise model versus reward shaping.
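The distribution-level check requested in the first major comment could be prototyped along the following lines. This is a minimal sketch under stated assumptions: the samples are synthetic stand-ins (not real robot logs), the function names are hypothetical, the Wasserstein-1 estimate assumes two equal-size one-dimensional samples, and the KL estimate uses shared-bin histograms with a small smoothing constant.

```python
import math
import random


def wasserstein_1d(a, b):
    """Empirical Wasserstein-1 distance for equal-size 1-D samples.

    For equal sample sizes this is the mean absolute difference between
    the sorted samples.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def histogram_kl(p_samples, q_samples, bins=20, eps=1e-6):
    """Estimate KL(P || Q) from histograms over a shared range."""
    lo = min(min(p_samples), min(q_samples))
    hi = max(max(p_samples), max(q_samples))
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range

    def hist(samples):
        counts = [eps] * bins  # additive smoothing avoids log(0)
        for s in samples:
            idx = min(int((s - lo) / width), bins - 1)
            counts[idx] += 1.0
        total = sum(counts)
        return [c / total for c in counts]

    p, q = hist(p_samples), hist(q_samples)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Synthetic stand-ins: "injected" simulator noise vs. "logged" sensor noise.
rng = random.Random(0)
sim_noise = [rng.gauss(0.0, 0.05) for _ in range(2000)]
real_noise = [rng.gauss(0.0, 0.08) for _ in range(2000)]
print("W1:", wasserstein_1d(sim_noise, real_noise))
print("KL:", histogram_kl(sim_noise, real_noise))
```

Reporting both numbers per sensor channel and per ball-goal configuration would let a reader judge how far the training noise sits from the hardware noise, which is the evidence the comment asks for.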

minor comments (2)

[Methods] Notation for the four training stages and the constrained RL formulation should be introduced with explicit equations or pseudocode in the Methods section for reproducibility.
[Figures] Figure captions for real-robot experiments should include the exact number of trials, success criteria, and any observed failure modes to improve clarity.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment point by point below, indicating the revisions we will make to improve clarity and rigor. We agree that additional quantitative details will strengthen the paper.

Referee: [Abstract and Evaluation] Abstract and Evaluation section: the central claim that 'realistic noise modeling' is 'key' and 'critical for closing the sim-to-real gap' is load-bearing for the transfer success, yet the manuscript supplies no quantitative validation (KL divergence, Wasserstein distance, or spectral comparison) between the injected noise distributions and logged real-robot sensor statistics under matching ball-goal configurations.

Authors: We acknowledge that explicit quantitative validation of the noise model against real sensor statistics would better support the central claim. The noise parameters were derived from logged real-robot proprioceptive and visual data collected under comparable conditions, but the manuscript does not report distribution-level metrics. In the revision we will add a dedicated paragraph and supplementary figure that computes and reports KL divergence, Wasserstein distance, and spectral comparisons between the injected noise and the logged real-robot statistics for matching ball-goal configurations.

revision: yes

Referee: [Evaluation] Evaluation section: success rates and goal-scoring performance are asserted for both simulation and hardware, but the absence of reported numerical metrics, error bars, number of trials, or full experimental protocols makes it impossible to assess whether the observed transfer constitutes a statistically meaningful improvement over baselines.

Authors: We agree that the current presentation lacks sufficient numerical detail for rigorous assessment. The manuscript contains success-rate and goal-scoring results, yet we will expand the Evaluation section with explicit tables reporting mean success rates, standard deviations (error bars), the exact number of trials per configuration (e.g., 50 trials), and a complete experimental protocol description for both simulation and hardware. These additions will enable readers to evaluate statistical significance and reproducibility.

revision: yes

Referee: [Ablation studies] Ablation studies: while the necessity of constrained RL, noise modeling, and the adaptation stage is highlighted, the reported performance deltas or failure modes under each ablation are not quantified, weakening the ability to attribute gains specifically to the noise model versus reward shaping.

Authors: We recognize that quantitative ablation results are needed to isolate the contribution of each component. The manuscript already presents ablation outcomes, but we will augment the section with explicit performance deltas (e.g., percentage-point drops in success rate when noise modeling or constrained RL is removed) and concise descriptions of observed failure modes for each ablated variant. This will clarify the relative importance of noise modeling versus reward shaping.

revision: yes
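The reporting promised in these responses (success rates with uncertainty, and percentage-point ablation deltas) can be computed with very little code. The sketch below uses a Wilson score interval for a binomial success rate; the helper names and the example counts are hypothetical and are not results from the paper.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


def percentage_point_drop(full_rate, ablated_rate):
    """Difference between two success rates, in percentage points."""
    return 100.0 * (full_rate - ablated_rate)


# Made-up counts purely for illustration: 42 of 50 kicks succeed with the
# full system, 30 of 50 with one component ablated.
full_lo, full_hi = wilson_interval(42, 50)
abl_lo, abl_hi = wilson_interval(30, 50)
print(f"full:    0.84 in [{full_lo:.3f}, {full_hi:.3f}]")
print(f"ablated: 0.60 in [{abl_lo:.3f}, {abl_hi:.3f}]")
print(f"drop: {percentage_point_drop(42 / 50, 30 / 50):.1f} percentage points")
```

With only tens of trials per configuration the intervals are wide, so an ablation delta is only convincing when the two intervals are clearly separated or when trial counts are increased.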

Circularity Check

0 steps flagged

No significant circularity in the empirical RL training and evaluation pipeline.

The paper describes a four-stage teacher-student RL framework for learning humanoid ball-kicking policies, incorporating tailored rewards, noise modeling, and online constrained RL. All load-bearing claims of kicking accuracy and goal-scoring success rest on direct empirical evaluations performed in simulation and on physical hardware across diverse ball-goal configurations, which serve as external benchmarks independent of the training objectives. No equations or results reduce by construction to fitted inputs, self-definitions, or self-citation chains; ablations isolate component contributions without creating circular dependencies. The sim-to-real transfer relies on an unverified modeling assumption, but this is a validity concern rather than circularity.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The approach depends on standard RL assumptions plus task-specific choices whose details are only sketched in the abstract.

free parameters (2)

reward function coefficients
Tailored rewards for kicking accuracy, balance, and goal scoring are stated as critical but their numerical weights are not given.
noise model parameters
Parameters chosen to produce realistic sensor noise during training are described as key but not enumerated.

axioms (2)

domain assumption: A teacher policy trained with ground-truth state can acquire effective chasing and kicking behaviors that are worth distilling.
Invoked in the first two training stages.
domain assumption: Constrained RL can refine the student policy without destabilizing previously learned behaviors.
Central to the fourth adaptation stage.
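One common way to realise constrained RL is a Lagrangian relaxation, in which a multiplier prices constraint violations and is adjusted by dual ascent. The sketch below shows only that multiplier mechanism on a toy cost model. It is an assumption that this family of methods is relevant here: the page does not say which constrained RL algorithm the paper uses, and the function names and the cost model are hypothetical.

```python
def update_multiplier(lmbda, avg_cost, cost_limit, lr=0.1):
    """Dual ascent step: raise the multiplier when the constraint is violated,
    lower it (never below zero) when there is slack."""
    return max(0.0, lmbda + lr * (avg_cost - cost_limit))


def penalized_objective(avg_reward, avg_cost, cost_limit, lmbda):
    """Lagrangian that the policy update would maximise for a fixed multiplier."""
    return avg_reward - lmbda * (avg_cost - cost_limit)


# Toy stand-in for "the policy reacts to the penalty": a larger multiplier
# yields lower average cost. A real system would run policy updates here.
lmbda = 0.0
cost_limit = 0.5
for _ in range(500):
    avg_cost = 1.0 / (1.0 + lmbda)
    lmbda = update_multiplier(lmbda, avg_cost, cost_limit)

print(f"multiplier {lmbda:.3f}, cost {1.0 / (1.0 + lmbda):.3f}, limit {cost_limit}")
```

The multiplier settles where the average cost meets its limit, which is the sense in which a constraint can hold a refined policy close to behaviours it is not allowed to break.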

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Rhythm: Learning Interactive Whole-Body Control for Dual Humanoids
cs.RO 2026-03 unverdicted novelty 7.0

Rhythm transfers interactive whole-body behaviors from simulation to real dual Unitree G1 humanoids via interaction-aware retargeting and graph-reward RL.

VOFA: Visual Object Goal Pushing with Force-Adaptive Control for Humanoids
cs.RO 2026-05 unverdicted novelty 6.0

VOFA combines a high-level visuomotor policy with a low-level force-adaptive controller to let humanoids push objects up to 17 kg to arbitrary goals using only noisy onboard vision, achieving over 80% real-world success.

Sim-to-Real Transfer for Muscle-Actuated Robots via Generalized Actuator Networks
cs.RO 2026-04 unverdicted novelty 6.0

GeAN learns actuator dynamics from position trajectories to enable successful sim-to-real transfer of goal-reaching and ball-in-a-cup policies on a 4-DoF pneumatic muscle-actuated robot, reported as the first such tra...

HAIC: Humanoid Agile Object Interaction Control via Dynamics-Aware World Model
cs.RO 2026-02 unverdicted novelty 6.0

HAIC enables robust humanoid interactions with underactuated objects by predicting their dynamics from proprioceptive history and using a world model for adaptive control.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · cited by 4 Pith papers · 1 internal anchor

[1]

Karen Liu, Abder- rahmane Kheddar, Xue Bin Peng, Yuke Zhu, Guanya Shi, Quan Nguyen, Gordon Cheng, Huijun Gao, and Ye Zhao

Z. Gu, J. Li, W. Shen, W. Yu, Z. Xie, S. McCrory, X. Cheng, A. Shamsah, R. Griffin, C. K. Liu,et al., “Humanoid locomotion and manipulation: Current progress and challenges in control, planning, and learning,”arXiv preprint arXiv:2501.02116, 2025

work page arXiv 2025
[2]

Sim-to-real learning for humanoid box loco-manipulation,

J. Dao, H. Duan, and A. Fern, “Sim-to-real learning for humanoid box loco-manipulation,” in2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 16 930–16 936

work page 2024
[3]

Hold my beer: Learning gentle humanoid locomotion and end-effector stabilization control, 2025

Y . Li, Y . Zhang, W. Xiao, C. Pan, H. Weng, G. He, T. He, and G. Shi, “Learning gentle humanoid locomotion and end-effector stabilization control,”arXiv preprint arXiv:2505.24198, 2025

work page arXiv 2025
[4]

Hub: Learning extreme humanoid balance,

T. Zhang, B. Zheng, R. Nai, Y . Hu, Y .-J. Wang, G. Chen, F. Lin, J. Li, C. Hong, K. Sreenath,et al., “Hub: Learning extreme humanoid balance,”arXiv preprint arXiv:2505.07294, 2025

work page arXiv 2025
[5]

End-to-end humanoid robot safe and comfortable loco- motion policy.arXiv preprint arXiv:2508.07611, 2025

Z. Wang, X. Yang, J. Zhao, J. Zhou, T. Ma, Z. Gao, A. Ajoudani, and J. Liang, “End-to-end humanoid robot safe and comfortable locomotion policy,”arXiv preprint arXiv:2508.07611, 2025

work page arXiv 2025
[6]

Amo: Adaptive motion optimization for hyper-dexterous humanoid whole-body control.arXiv preprint arXiv:2505.03738,

J. Li, X. Cheng, T. Huang, S. Yang, R.-Z. Qiu, and X. Wang, “Amo: Adaptive motion optimization for hyper-dexterous humanoid whole- body control,”arXiv preprint arXiv:2505.03738, 2025

work page arXiv 2025
[7]

Learning to walk in minutes using massively parallel deep reinforcement learning,

N. Rudin, D. Hoeller, P. Reist, and M. Hutter, “Learning to walk in minutes using massively parallel deep reinforcement learning,” in Conference on robot learning. PMLR, 2022, pp. 91–100

work page 2022
[8]

Robot parkour learning

Z. Zhuang, Z. Fu, J. Wang, C. Atkeson, S. Schwertfeger, C. Finn, and H. Zhao, “Robot parkour learning,”arXiv preprint arXiv:2309.05665, 2023

work page arXiv 2023
[9]

Legged locomotion in challenging terrains using egocentric vision,

A. Agarwal, A. Kumar, J. Malik, and D. Pathak, “Legged locomotion in challenging terrains using egocentric vision,” inConference on robot learning. PMLR, 2023, pp. 403–415

work page 2023
[10]

Learning visual quadrupedal loco-manipulation from demonstrations,

Z. He, K. Lei, Y . Ze, K. Sreenath, Z. Li, and H. Xu, “Learning visual quadrupedal loco-manipulation from demonstrations,” in2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024, pp. 9102–9109

work page 2024
[11]

A Careful Examination of Large Behavior Models for Multitask Dexterous Manipulation

J. Barreiros, A. Beaulieu, A. Bhat, R. Cory, E. Cousineau, H. Dai, C.-H. Fang, K. Hashimoto, M. Z. Irshad, M. Itkina,et al., “A careful examination of large behavior models for multitask dexterous manipulation,”arXiv preprint arXiv:2507.05331, 2025

work page internal anchor Pith review arXiv 2025
[12]

Dynamic object goal pushing with mobile manipulators through model-free constrained reinforcement learning,

I. Dadiotis, M. Mittal, N. Tsagarakis, and M. Hutter, “Dynamic object goal pushing with mobile manipulators through model-free constrained reinforcement learning,”arXiv preprint arXiv:2502.01546, 2025

work page arXiv 2025
[13]

A reduction of imitation learning and structured prediction to no-regret online learning,

S. Ross, G. Gordon, and D. Bagnell, “A reduction of imitation learning and structured prediction to no-regret online learning,” inProceedings of the fourteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, 2011, pp. 627–635

work page 2011
[14]

Evaluation of constrained reinforcement learning algorithms for legged locomotion,

J. Lee, L. Schroth, V . Klemm, M. Bjelonic, A. Reske, and M. Hut- ter, “Evaluation of constrained reinforcement learning algorithms for legged locomotion,”arXiv preprint arXiv:2309.15430, 2023

work page arXiv 2023
[15]

Policy gradient reinforcement learning for fast quadrupedal locomotion,

N. Kohl and P. Stone, “Policy gradient reinforcement learning for fast quadrupedal locomotion,” inIEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA’04. 2004, vol. 3. IEEE, 2004, pp. 2619–2624

work page 2004
[16]

Learning powerful kicks on the aibo ers-7: The quest for a striker,

M. Hausknecht and P. Stone, “Learning powerful kicks on the aibo ers-7: The quest for a striker,” inRobot Soccer World Cup. Springer, 2010, pp. 254–265

work page 2010
[17]

Humanoid robots in soccer: Robots versus humans in robocup 2050,

R. Gerndt, D. Seifert, J. H. Baltes, S. Sadeghnejad, and S. Behnke, “Humanoid robots in soccer: Robots versus humans in robocup 2050,” IEEE Robotics & Automation Magazine, vol. 22, no. 3, pp. 147–154, 2015

work page 2050
[18]

The robocup humanoid challenge as the millennium challenge for advanced robotics,

H. Kitano and M. Asada, “The robocup humanoid challenge as the millennium challenge for advanced robotics,”Advanced Robotics, vol. 13, no. 8, pp. 723–736, 1998

work page 1998
[19]

Toward real-world cooperative and competitive soccer with quadrupedal robot teams,

Z. Su, Y . Gao, E. Lukas, Y . Li, J. Cai, F. Tulbah, F. Gao, C. Yu, Z. Li, Y . Wu, and K. Sreenath, “Toward real-world cooperative and competitive soccer with quadrupedal robot teams,”arXiv preprint arXiv:2505.13834, 2025, may 20 2025. [Online]. Available: https://arxiv.org/abs/2505.13834

work page arXiv 2025
[20]

Dribblebot: Dynamic legged manipulation in the wild,

Y . Ji, G. B. Margolis, and P. Agrawal, “Dribblebot: Dynamic legged manipulation in the wild,”arXiv preprint arXiv:2304.01159, 2023, april 2023. [Online]. Available: https://arxiv.org/abs/2304.01159

work page arXiv 2023
[21]

Hierarchical reinforcement learning for precise soccer shooting skills using a quadrupedal robot,

Y . Ji, Z. Li, Y . Sun, X. B. Peng, S. Levine, G. Berseth, and K. Sreenath, “Hierarchical reinforcement learning for precise soccer shooting skills using a quadrupedal robot,” in2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct 2022, arXiv:2208.01160. [Online]. Available: https://arxiv.org/abs/2208.01160

work page arXiv 2022
[22]

Autonomous quadrupedal goalkeeping using hierar- chical rl and vision-based localization,

L. Blommers, “Autonomous quadrupedal goalkeeping using hierar- chical rl and vision-based localization,” Master’s thesis, Eindhoven University of Technology, November 2024, master’s thesis

work page 2024
[23]

Creating a dynamic quadrupedal robotic goalkeeper with reinforcement learning,

X. Huang, Z. Li, Y . Xiang, Y . Ni, Y . Chi, Y . Li, L. Yang, X. B. Peng, and K. Sreenath, “Creating a dynamic quadrupedal robotic goalkeeper with reinforcement learning,” in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct 2023, arXiv:2210.04435. [Online]. Available: https://arxiv.org/abs/2210.04435

work page arXiv 2023
[24]

Learning agile soccer skills for a bipedal robot with deep reinforcement learning,

T. Haarnoja, B. Moran, G. Lever, S. H. Huang, D. Tirumala, J. Humplik, M. Wulfmeier, S. Tunyasuvunakool, N. Y . Siegel, R. Hafner, M. Bloesch, K. Hartikainen, A. Byravan, L. Hasenclever, Y . Tassa, F. Sadeghi, N. Batchelor, F. Casarini, S. Saliceti, C. Game, N. Sreendra, K. Patel, M. Gwira, A. Huber, N. Hurley, F. Nori, R. Hadsell, and N. Heess, “Learning...

work page doi:10.1126/scirobotics.adi8022 2024
[25]

Learning vision-driven reactive soccer skills for humanoid robots,

Y . Wang, C. Luo, P. Chen, J. Liu, W. Sun, T. Guo, K. Yang, B. Hu, Y . Zhang, and M. Zhao, “Learning vision-driven reactive soccer skills for humanoid robots,”arXiv preprint arXiv:2511.03996, 2025

work page arXiv 2025
[26]

Dribble master: Learning agile humanoid dribbling through legged locomotion,

Z. Wang, J. Zhou, and Q. Wu, “Dribble master: Learning agile humanoid dribbling through legged locomotion,”arXiv preprint arXiv:2505.12679, May 2025. [Online]. Available: https://arxiv.org/abs/2505.12679

work page arXiv 2025
[28]

Available: https://arxiv.org/abs/2504.20808

[Online]. Available: https://arxiv.org/abs/2504.20808

work page arXiv
[29]

A biomechanics-inspired approach to soccer kicking for humanoid robots,

D. Marew, N. Perera, S. Yu, S. Roelker, and D. Kim, “A biomechanics-inspired approach to soccer kicking for humanoid robots,” inProceedings of the 2024 IEEE-RAS 23rd International Conference on Humanoid Robots (Humanoids), 2024, pp. 722–729. [Online]. Available: https://arxiv.org/abs/2407.14612

work page arXiv 2024
[30]

Dynamic behaviors on the nao robot with closed-loop whole body operational space con- trol,

D. Kim, S. J. Jorgensen, P. Stone, and L. Sentis, “Dynamic behaviors on the nao robot with closed-loop whole body operational space con- trol,” in2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids). IEEE, 2016, pp. 1121–1128

work page 2016
[31]

A whole-body control framework for humanoids operating in human environments,

L. Sentis and O. Khatib, “A whole-body control framework for humanoids operating in human environments,” inProceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006.IEEE, 2006, pp. 2641–2648

work page 2006
[32]

Reinforcement learning for robust parameterized locomotion control of bipedal robots,

Z. Li, X. Cheng, X. B. Peng, P. Abbeel, S. Levine, G. Berseth, and K. Sreenath, “Reinforcement learning for robust parameterized locomotion control of bipedal robots,” in2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 2811–2817

work page 2021
[33]

Asap: Aligning simulation and real-world physics for learning agile humanoid whole-body skills.arXiv preprint arXiv:2502.01143, 2025

T. He, J. Gao, W. Xiao, Y . Zhang, Z. Wang, J. Wang, Z. Luo, G. He, N. Sobanbab, C. Pan,et al., “Asap: Aligning simulation and real-world physics for learning agile humanoid whole-body skills,”arXiv preprint arXiv:2502.01143, 2025

work page arXiv 2025
[34]

Cheng, Y

X. Cheng, Y . Ji, J. Chen, R. Yang, G. Yang, and X. Wang, “Ex- pressive whole-body control for humanoid robots,”arXiv preprint arXiv:2402.16796, 2024

work page arXiv 2024
[35]

Causal policy gradient for whole-body mobile manipulation,

J. Hu, P. Stone, and R. Mart ´ın-Mart´ın, “Causal policy gradient for whole-body mobile manipulation,”arXiv preprint arXiv:2305.04866, 2023

work page arXiv 2023
[36]

On the emergence of whole- body strategies from humanoid robot push-recovery learning,

D. Ferigo, R. Camoriano, P. M. Viceconte, D. Calandriello, S. Traver- saro, L. Rosasco, and D. Pucci, “On the emergence of whole- body strategies from humanoid robot push-recovery learning,”IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 8561–8568, 2021

work page 2021
[37]

Slac: Simulation-pretrained latent action space for whole-body real-world rl,

J. Hu, P. Stone, and R. Mart ´ın-Mart´ın, “Slac: Simulation-pretrained latent action space for whole-body real-world rl,”arXiv preprint arXiv:2506.04147, 2025

work page arXiv 2025
[38]

Learning coor- dinated badminton skills for legged manipulators,

Y . Ma, A. Cramariuc, F. Farshidian, and M. Hutter, “Learning coor- dinated badminton skills for legged manipulators,”Science Robotics, vol. 10, no. 102, p. eadu3922, 2025

work page 2025
[39]

Wococo: Learning whole-body humanoid control with sequential contacts.arXiv preprint arXiv:2406.06005, 2024

C. Zhang, W. Xiao, T. He, and G. Shi, “Wococo: Learning whole- body humanoid control with sequential contacts,”arXiv preprint arXiv:2406.06005, 2024

work page arXiv 2024
[40]

Booster gym: An end-to-end reinforcement learning framework for humanoid robot locomotion,

Y . Wang, P. Chen, X. Han, F. Wu, and M. Zhao, “Booster gym: An end-to-end reinforcement learning framework for humanoid robot locomotion,”arXiv preprint arXiv:2506.15132, 2025

work page arXiv 2025
[41]

unitree rl gym: A reinforcement learn- ing gym for unitree robots (go2, h1, h1 2, g1),

U. Robotics, “unitree rl gym: A reinforcement learn- ing gym for unitree robots (go2, h1, h1 2, g1),” https://github.com/unitreerobotics/unitree rl gym, 2025, bSD-3- Clause License; accessed September 15, 2025

work page 2025
[42]

Humanoid league laws of the game 2025: Robocup humanoid league adult- size rules,

RoboCup Humanoid League Technical Committee, “Humanoid league laws of the game 2025: Robocup humanoid league adult- size rules,” https://humanoid.robocup.org/wp-content/uploads/RC-HL- 2025-Rules.pdf, Apr. 2025, accessed: 2025-09-13

work page 2025
[43]

Clone: Closed-loop whole-body humanoid teleoperation for long- horizon tasks,

Y . Li, Y . Lin, J. Cui, T. Liu, W. Liang, Y . Zhu, and S. Huang, “Clone: Closed-loop whole-body humanoid teleoperation for long- horizon tasks,”arXiv preprint arXiv:2506.08931, 2025

work page arXiv 2025
[44]

Distillation-ppo: A novel two-stage reinforcement learning framework for humanoid robot perceptive locomotion

Q. Zhang, G. Han, J. Sun, W. Zhao, C. Sun, J. Cao, J. Wang, Y . Guo, and R. Xu, “Distillation-ppo: A novel two-stage reinforcement learning framework for humanoid robot perceptive locomotion,”arXiv preprint arXiv:2503.08299, 2025

work page arXiv 2025
[45]

Booster t1,

B. Robotics, “Booster t1,” https://www.boosterobotics.com/robots/, Booster Robotics, 2023–2024, “Made for Develop- ers”, Lightweight, Flexible, Durable. [Online]. Available: https://www.boosterobotics.com/robots/

work page 2023
[46]

Yolov8: A novel object detection algorithm with enhanced performance and robustness,

R. Varghese and M. Sambath, “Yolov8: A novel object detection algorithm with enhanced performance and robustness,” in2024 Inter- national conference on advances in data engineering and intelligent computing systems (ADICS). IEEE, 2024, pp. 1–6

work page 2024
[47]

Legolas: Deep leg-inertial odometry,

J. Wasserman, A. Agarwal, R. Jangir, G. Chowdhary, D. Pathak, and A. Gupta, “Legolas: Deep leg-inertial odometry,” in8th Annual Conference on Robot Learning, 2024

work page 2024

[1] Z. Gu, J. Li, W. Shen, W. Yu, Z. Xie, S. McCrory, X. Cheng, A. Shamsah, R. Griffin, C. K. Liu, et al., “Humanoid locomotion and manipulation: Current progress and challenges in control, planning, and learning,” arXiv preprint arXiv:2501.02116, 2025.

[2] J. Dao, H. Duan, and A. Fern, “Sim-to-real learning for humanoid box loco-manipulation,” in 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 16930–16936.

[3] Y. Li, Y. Zhang, W. Xiao, C. Pan, H. Weng, G. He, T. He, and G. Shi, “Hold my beer: Learning gentle humanoid locomotion and end-effector stabilization control,” arXiv preprint arXiv:2505.24198, 2025.

[4] T. Zhang, B. Zheng, R. Nai, Y. Hu, Y.-J. Wang, G. Chen, F. Lin, J. Li, C. Hong, K. Sreenath, et al., “Hub: Learning extreme humanoid balance,” arXiv preprint arXiv:2505.07294, 2025.

[5] Z. Wang, X. Yang, J. Zhao, J. Zhou, T. Ma, Z. Gao, A. Ajoudani, and J. Liang, “End-to-end humanoid robot safe and comfortable locomotion policy,” arXiv preprint arXiv:2508.07611, 2025.

[6] J. Li, X. Cheng, T. Huang, S. Yang, R.-Z. Qiu, and X. Wang, “Amo: Adaptive motion optimization for hyper-dexterous humanoid whole-body control,” arXiv preprint arXiv:2505.03738, 2025.

[7] N. Rudin, D. Hoeller, P. Reist, and M. Hutter, “Learning to walk in minutes using massively parallel deep reinforcement learning,” in Conference on robot learning. PMLR, 2022, pp. 91–100.

[8] Z. Zhuang, Z. Fu, J. Wang, C. Atkeson, S. Schwertfeger, C. Finn, and H. Zhao, “Robot parkour learning,” arXiv preprint arXiv:2309.05665, 2023.

[9] A. Agarwal, A. Kumar, J. Malik, and D. Pathak, “Legged locomotion in challenging terrains using egocentric vision,” in Conference on robot learning. PMLR, 2023, pp. 403–415.

[10] Z. He, K. Lei, Y. Ze, K. Sreenath, Z. Li, and H. Xu, “Learning visual quadrupedal loco-manipulation from demonstrations,” in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2024, pp. 9102–9109.

[11] J. Barreiros, A. Beaulieu, A. Bhat, R. Cory, E. Cousineau, H. Dai, C.-H. Fang, K. Hashimoto, M. Z. Irshad, M. Itkina, et al., “A careful examination of large behavior models for multitask dexterous manipulation,” arXiv preprint arXiv:2507.05331, 2025.

[12] I. Dadiotis, M. Mittal, N. Tsagarakis, and M. Hutter, “Dynamic object goal pushing with mobile manipulators through model-free constrained reinforcement learning,” arXiv preprint arXiv:2502.01546, 2025.

[13] S. Ross, G. Gordon, and D. Bagnell, “A reduction of imitation learning and structured prediction to no-regret online learning,” in Proceedings of the fourteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, 2011, pp. 627–635.

[14] J. Lee, L. Schroth, V. Klemm, M. Bjelonic, A. Reske, and M. Hutter, “Evaluation of constrained reinforcement learning algorithms for legged locomotion,” arXiv preprint arXiv:2309.15430, 2023.

[15] N. Kohl and P. Stone, “Policy gradient reinforcement learning for fast quadrupedal locomotion,” in IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA’04. 2004, vol. 3. IEEE, 2004, pp. 2619–2624.

[16] M. Hausknecht and P. Stone, “Learning powerful kicks on the aibo ers-7: The quest for a striker,” in Robot Soccer World Cup. Springer, 2010, pp. 254–265.

[17] R. Gerndt, D. Seifert, J. H. Baltes, S. Sadeghnejad, and S. Behnke, “Humanoid robots in soccer: Robots versus humans in robocup 2050,” IEEE Robotics & Automation Magazine, vol. 22, no. 3, pp. 147–154, 2015.

[18] H. Kitano and M. Asada, “The robocup humanoid challenge as the millennium challenge for advanced robotics,” Advanced Robotics, vol. 13, no. 8, pp. 723–736, 1998.

[19] Z. Su, Y. Gao, E. Lukas, Y. Li, J. Cai, F. Tulbah, F. Gao, C. Yu, Z. Li, Y. Wu, and K. Sreenath, “Toward real-world cooperative and competitive soccer with quadrupedal robot teams,” arXiv preprint arXiv:2505.13834, May 2025. [Online]. Available: https://arxiv.org/abs/2505.13834

[20] Y. Ji, G. B. Margolis, and P. Agrawal, “Dribblebot: Dynamic legged manipulation in the wild,” arXiv preprint arXiv:2304.01159, Apr. 2023. [Online]. Available: https://arxiv.org/abs/2304.01159

[21] Y. Ji, Z. Li, Y. Sun, X. B. Peng, S. Levine, G. Berseth, and K. Sreenath, “Hierarchical reinforcement learning for precise soccer shooting skills using a quadrupedal robot,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2022, arXiv:2208.01160. [Online]. Available: https://arxiv.org/abs/2208.01160

[22] L. Blommers, “Autonomous quadrupedal goalkeeping using hierarchical rl and vision-based localization,” Master’s thesis, Eindhoven University of Technology, Nov. 2024.

[23] X. Huang, Z. Li, Y. Xiang, Y. Ni, Y. Chi, Y. Li, L. Yang, X. B. Peng, and K. Sreenath, “Creating a dynamic quadrupedal robotic goalkeeper with reinforcement learning,” in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2023, arXiv:2210.04435. [Online]. Available: https://arxiv.org/abs/2210.04435

[24] T. Haarnoja, B. Moran, G. Lever, S. H. Huang, D. Tirumala, J. Humplik, M. Wulfmeier, S. Tunyasuvunakool, N. Y. Siegel, R. Hafner, M. Bloesch, K. Hartikainen, A. Byravan, L. Hasenclever, Y. Tassa, F. Sadeghi, N. Batchelor, F. Casarini, S. Saliceti, C. Game, N. Sreendra, K. Patel, M. Gwira, A. Huber, N. Hurley, F. Nori, R. Hadsell, and N. Heess, “Learning agile soccer skills for a bipedal robot with deep reinforcement learning,” Science Robotics, 2024, doi:10.1126/scirobotics.adi8022.

[25] Y. Wang, C. Luo, P. Chen, J. Liu, W. Sun, T. Guo, K. Yang, B. Hu, Y. Zhang, and M. Zhao, “Learning vision-driven reactive soccer skills for humanoid robots,” arXiv preprint arXiv:2511.03996, 2025.

[26] Z. Wang, J. Zhou, and Q. Wu, “Dribble master: Learning agile humanoid dribbling through legged locomotion,” arXiv preprint arXiv:2505.12679, May 2025. [Online]. Available: https://arxiv.org/abs/2505.12679

[28] [Online]. Available: https://arxiv.org/abs/2504.20808

[29] D. Marew, N. Perera, S. Yu, S. Roelker, and D. Kim, “A biomechanics-inspired approach to soccer kicking for humanoid robots,” in Proceedings of the 2024 IEEE-RAS 23rd International Conference on Humanoid Robots (Humanoids), 2024, pp. 722–729. [Online]. Available: https://arxiv.org/abs/2407.14612

[30] D. Kim, S. J. Jorgensen, P. Stone, and L. Sentis, “Dynamic behaviors on the nao robot with closed-loop whole body operational space control,” in 2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids). IEEE, 2016, pp. 1121–1128.

[31] L. Sentis and O. Khatib, “A whole-body control framework for humanoids operating in human environments,” in Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006. IEEE, 2006, pp. 2641–2648.

[32] Z. Li, X. Cheng, X. B. Peng, P. Abbeel, S. Levine, G. Berseth, and K. Sreenath, “Reinforcement learning for robust parameterized locomotion control of bipedal robots,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 2811–2817.

[33] T. He, J. Gao, W. Xiao, Y. Zhang, Z. Wang, J. Wang, Z. Luo, G. He, N. Sobanbabu, C. Pan, et al., “Asap: Aligning simulation and real-world physics for learning agile humanoid whole-body skills,” arXiv preprint arXiv:2502.01143, 2025.

[34] X. Cheng, Y. Ji, J. Chen, R. Yang, G. Yang, and X. Wang, “Expressive whole-body control for humanoid robots,” arXiv preprint arXiv:2402.16796, 2024.

[35] J. Hu, P. Stone, and R. Martín-Martín, “Causal policy gradient for whole-body mobile manipulation,” arXiv preprint arXiv:2305.04866, 2023.

[36] D. Ferigo, R. Camoriano, P. M. Viceconte, D. Calandriello, S. Traversaro, L. Rosasco, and D. Pucci, “On the emergence of whole-body strategies from humanoid robot push-recovery learning,” IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 8561–8568, 2021.

[37] J. Hu, P. Stone, and R. Martín-Martín, “Slac: Simulation-pretrained latent action space for whole-body real-world rl,” arXiv preprint arXiv:2506.04147, 2025.

[38] Y. Ma, A. Cramariuc, F. Farshidian, and M. Hutter, “Learning coordinated badminton skills for legged manipulators,” Science Robotics, vol. 10, no. 102, p. eadu3922, 2025.

[39] C. Zhang, W. Xiao, T. He, and G. Shi, “Wococo: Learning whole-body humanoid control with sequential contacts,” arXiv preprint arXiv:2406.06005, 2024.

[40] Y. Wang, P. Chen, X. Han, F. Wu, and M. Zhao, “Booster gym: An end-to-end reinforcement learning framework for humanoid robot locomotion,” arXiv preprint arXiv:2506.15132, 2025.

[41] U. Robotics, “unitree_rl_gym: A reinforcement learning gym for unitree robots (go2, h1, h1_2, g1),” https://github.com/unitreerobotics/unitree_rl_gym, 2025, BSD-3-Clause License; accessed September 15, 2025.

[42] RoboCup Humanoid League Technical Committee, “Humanoid league laws of the game 2025: Robocup humanoid league adult-size rules,” https://humanoid.robocup.org/wp-content/uploads/RC-HL-2025-Rules.pdf, Apr. 2025, accessed: 2025-09-13.

[43] Y. Li, Y. Lin, J. Cui, T. Liu, W. Liang, Y. Zhu, and S. Huang, “Clone: Closed-loop whole-body humanoid teleoperation for long-horizon tasks,” arXiv preprint arXiv:2506.08931, 2025.

[44] Q. Zhang, G. Han, J. Sun, W. Zhao, C. Sun, J. Cao, J. Wang, Y. Guo, and R. Xu, “Distillation-ppo: A novel two-stage reinforcement learning framework for humanoid robot perceptive locomotion,” arXiv preprint arXiv:2503.08299, 2025.

[45] B. Robotics, “Booster t1,” Booster Robotics, 2023–2024, “Made for Developers”, Lightweight, Flexible, Durable. [Online]. Available: https://www.boosterobotics.com/robots/

[46] R. Varghese and M. Sambath, “Yolov8: A novel object detection algorithm with enhanced performance and robustness,” in 2024 International conference on advances in data engineering and intelligent computing systems (ADICS). IEEE, 2024, pp. 1–6.

[47] J. Wasserman, A. Agarwal, R. Jangir, G. Chowdhary, D. Pathak, and A. Gupta, “Legolas: Deep leg-inertial odometry,” in 8th Annual Conference on Robot Learning, 2024.