Tree Learning: A Multi-Skill Continual Learning Framework for Humanoid Robots

Linqi Ye; Yifei Yan

arxiv: 2604.12909 · v1 · submitted 2026-04-14 · 💻 cs.RO


Pith reviewed 2026-05-10 15:12 UTC · model grok-4.3


keywords continual learning · humanoid robots · reinforcement learning · multi-skill learning · locomotion · parameter inheritance · catastrophic forgetting · tree learning


The pith

A root-branch tree of parameters lets humanoid robots add new locomotion skills without forgetting any previous ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that organizing reinforcement learning skills into a root and branches, where new skills inherit parameters from the root, allows humanoid robots to learn multiple abilities sequentially while keeping all earlier skills intact. This matters if true because it avoids the usual trade-off between adding skills and losing old performance, and it does so without needing extremely large models or complex topology changes. The inherited parameters act as motion priors that speed convergence on new tasks like walking or jumping. Additional mechanisms handle different motion types and shape rewards to help learning. Experiments show the method produces higher rewards than training all skills at once and supports instant switching between them.

Core claim

Tree Learning adopts a root-branch hierarchical parameter inheritance mechanism that provides motion priors for branch skills through parameter reuse to fundamentally prevent catastrophic forgetting. A multi-modal feedforward adaptation mechanism combining phase modulation and interpolation supports both periodic and aperiodic motions, while a task-level reward shaping strategy accelerates skill convergence. Unity-based simulation experiments show that, in contrast to simultaneous multi-task training, Tree Learning achieves higher rewards across various representative locomotion skills while maintaining a 100% skill retention rate, enabling seamless multi-skill switching and real-time 2D/3D/

What carries the argument

The root-branch hierarchical parameter inheritance mechanism, which reuses root parameters to supply motion priors to new branch skills and thereby blocks forgetting.
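One way to picture the inheritance mechanism is the minimal sketch below. It assumes a copy-on-branch scheme in which a new skill starts from a copy of its parent's parameters and only that copy is trained; the page does not say whether the paper copies, freezes, or partially shares the root, so this is an illustration rather than the authors' implementation. The names `SkillTree`, `add_branch`, and `update` are hypothetical, and the parameter values are made up.

```python
import copy


class SkillTree:
    """Toy skill tree in which every skill owns its own parameter set (assumption)."""

    def __init__(self, root_name, root_params):
        self.params = {root_name: copy.deepcopy(root_params)}
        self.parent = {root_name: None}

    def add_branch(self, name, parent):
        # Inheritance: the branch starts as an independent copy of the parent's parameters.
        self.params[name] = copy.deepcopy(self.params[parent])
        self.parent[name] = parent
        return self.params[name]

    def update(self, name, grads, lr=0.5):
        # Stand-in for one training step; only the named skill's copy is modified.
        p = self.params[name]
        for key in p:
            p[key] = [w - lr * g for w, g in zip(p[key], grads[key])]


# Hypothetical root skill "walk" with two tiny parameter groups.
tree = SkillTree("walk", {"layer1": [0.5, -0.25], "layer2": [1.0]})
root_before = copy.deepcopy(tree.params["walk"])

tree.add_branch("run", parent="walk")
tree.update("run", {"layer1": [1.0, 1.0], "layer2": [-2.0]})
```

Under this assumption the root is never written to while a branch trains, so retention of the root skill follows from the structure rather than from any regularizer. The cost is that each branch stores its own parameters.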

If this is right

New skills reach higher final rewards than when all skills are trained simultaneously.
Every learned skill remains at full performance with no degradation after new skills are added.
Robots can switch between skills instantly and accept real-time interactive commands.
The same framework handles both repeating motions like walking and one-off motions like jumps.
Performance holds across distinct simulated settings including game-like interaction and navigation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The tree could grow to dozens of skills by adding branches without a matching increase in total parameters.
Training time for each new skill might stay lower than retraining a flat multi-task model from scratch.
The inheritance pattern could be tested on physical humanoid hardware to check whether simulation retention transfers.
Similar root-branch reuse might help continual learning in other embodied tasks such as object manipulation.

Load-bearing premise

Inheriting parameters from the root skill supplies motion priors strong enough to prevent any performance loss on earlier skills when new branches are trained.

What would settle it

Measuring a drop in reward or success rate on any previously mastered skill after training a new branch skill in the same Unity simulation environment would disprove the 100% retention claim.
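The test described above can be written down as a small check. The sketch below assumes each skill has a scalar evaluation reward recorded before and after a new branch is trained; the function names, the tolerance parameter, and the numbers are all hypothetical and are not taken from the paper.

```python
def retention_rate(before, after, tolerance=0.0):
    """Fraction of skills whose reward did not drop by more than `tolerance`."""
    if not before:
        raise ValueError("no skills to evaluate")
    kept = [name for name in before if after[name] >= before[name] - tolerance]
    return len(kept) / len(before)


def regressed_skills(before, after, tolerance=0.0):
    """Names of skills whose reward dropped by more than `tolerance`, sorted."""
    return sorted(name for name in before if after[name] < before[name] - tolerance)


# Hypothetical evaluation rewards, not values from the paper.
before = {"walk": 10.0, "run": 8.0, "crawl": 6.0}
after_ok = {"walk": 10.0, "run": 8.5, "crawl": 6.0}
after_bad = {"walk": 9.0, "run": 8.5, "crawl": 6.0}

print(retention_rate(before, after_ok))
print(regressed_skills(before, after_bad))
```

A single entry in `regressed_skills` would be enough to contradict a 100% retention claim. In practice the tolerance would need to reflect run-to-run evaluation noise.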

Figures

Figures reproduced from arXiv: 2604.12909 by Linqi Ye, Yifei Yan.

**Figure 3.** Interactive multi-skill control. 3. Methodology. 3.1. Feedforward Action Design. For each skill, a motion prior is used as a feedforward action to achieve a specific motion style. Different actions are realized through simple feedforward signals. For periodic and aperiodic actions, phase-based and interpolation-based methods are employed, respectively. 3.1.1. Phase modulation method. For periodic locomotion ski…

**Figure 2.** Tree Learning for Unitree G1. The Tree Learning framework enforces consistency of the global state space and action representation by design. Since the root skill and all branch sub-networks share identical sensor input interfaces and joint control output dimensions, and network switching is only performed between skills with overlapping action sequences and state spaces, the system maintains high consis…

**Figure 4.** Reward comparison of walk skill.

**Figure 8.** Reward comparison of crawl skill.

**Figure 9.** Reward comparison of one-leg stand skill.

**Figure 10.** Final reward comparison.

**Figure 11.** Super Mario simulation scene. The robot sequentially performed (a) walking, (b) running to escape a ghost, (c) climbing up and down stairs, (d) lying prone, (e) crawling through a tunnel, (f) standing up, and (g) jumping to hit boxes to collect coins or gifts along the way. Finally, (h) the robot kicked a ball into the goal and won. (i) is the global view.

**Figure 12.** Ghost chasing scene.

**Figure 14.** Autonomous navigation workflow.

**Figure 13.** Autonomous navigation task. The left shows snapshots during the task. The right shows the top-down trajectory of the robot.

**Figure 17.** Figure 17 presents the statistics of the navigation experiment in three forms: pie chart, bar chart, and histogram. The gait distribution pie chart shows that Walk mode accounts for 87.8%, Stair mode for 10.1%, and Run mode for 2.0%, which is consistent with the expected design of mainly normal walking, automatic acceleration for long-distance targets, and stair climbing. The navigation bar chart visually compares…

**Figure 18.** Figure 18 shows the time series of the control commands v_r (forward velocity) and w_r (turning angular velocity). The v_r curve exhibits a distinct "step-pulse" alternating pattern: v_r stabilizes at 0.8–1.0 m/s during straight segments and drops rapidly to near zero during turning segments. Correspondingly, the w_r curve fluctuates significantly during turning segments (peak value ±1.0 rad/s) and remains small during straig…

**Figure 16.** Autonomous navigation metric.

**Figure 20.** Figure 20 shows the temporal switching of gait modes during the 240-second experiment in the form of a color band diagram. The robot remained in Walk mode throughout the 0–120 s period, conducting stable locomotion in the flat environment. Then, at approximately 120 s, as the operator specified a farther target point, the system automatically switched to Run mode (lasting about 5 s) for acceleration, and then retur…

**Figure 21.** Figure 21 presents the posture stability data of the robot's torso. The upper figure shows the time curves of roll and pitch angles. During the flat-ground walking phase, both roll and pitch angles were controlled within about ±2.5°, demonstrating high posture stability. In the stair interval (160–180 s), the roll angle fluctuations increased to approximately ±(5–7)°, caused by periodic disturbances from the stair…

Original abstract

As reinforcement learning for humanoid robots evolves from single-task to multi-skill paradigms, efficiently expanding new skills while avoiding catastrophic forgetting has become a key challenge in embodied intelligence. Existing approaches either rely on complex topology adjustments in Mixture-of-Experts (MoE) models or require training extremely large-scale models, making lightweight deployment difficult. To address this, we propose Tree Learning, a multi-skill continual learning framework for humanoid robots. The framework adopts a root-branch hierarchical parameter inheritance mechanism, providing motion priors for branch skills through parameter reuse to fundamentally prevent catastrophic forgetting. A multi-modal feedforward adaptation mechanism combining phase modulation and interpolation is designed to support both periodic and aperiodic motions. A task-level reward shaping strategy is also proposed to accelerate skill convergence. Unity-based simulation experiments show that, in contrast to simultaneous multi-task training, Tree Learning achieves higher rewards across various representative locomotion skills while maintaining a 100% skill retention rate, enabling seamless multi-skill switching and real-time interactive control. We further validate the performance and generalization capability of Tree Learning on two distinct Unity-simulated tasks: a Super Mario-inspired interactive scenario and autonomous navigation in a classical Chinese garden environment.
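The abstract names two feedforward modes, phase modulation for periodic motions and interpolation for aperiodic ones, without giving formulas. The sketch below is one plausible reading for a single joint, assuming a sinusoid driven by a gait phase in [0, 1) for the periodic case and piecewise-linear interpolation between keyframes for the aperiodic case. Both signal shapes, the function names, and the keyframe values are assumptions, not the paper's definitions.

```python
import math


def periodic_feedforward(phase, amplitude, offset=0.0):
    """Phase-modulated joint target: one sinusoid cycle per unit of phase."""
    return offset + amplitude * math.sin(2.0 * math.pi * phase)


def interpolated_feedforward(t, keyframes):
    """Piecewise-linear joint target through (time, angle) keyframes.

    Keyframe times are assumed to be strictly increasing. Queries outside
    the keyframe range hold the first or last angle.
    """
    if t <= keyframes[0][0]:
        return keyframes[0][1]
    for (t0, a0), (t1, a1) in zip(keyframes, keyframes[1:]):
        if t <= t1:
            return a0 + (a1 - a0) * (t - t0) / (t1 - t0)
    return keyframes[-1][1]


# Hypothetical knee trajectory for a jump: crouch at 0.5 s, return by 1.0 s.
jump_keyframes = [(0.0, 0.0), (0.5, -1.0), (1.0, 0.0)]

print(periodic_feedforward(0.25, amplitude=0.3))
print(interpolated_feedforward(0.25, jump_keyframes))
```

In a setup like this the learned policy would typically output a correction that is added to the feedforward target, so the prior fixes the motion style while the policy handles balance. Whether the paper combines them this way is not stated on this page.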

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague


Tree Learning gives a clean hierarchical way to add humanoid skills without forgetting, but its edge over joint training could be capacity rather than the method. The framework uses a root-branch tree for parameter inheritance so new skills reuse motion priors from the root, pairs it with phase-modulated feedforward adaptation that handles both periodic and aperiodic motions through interpolation, and adds task-level reward shaping to speed convergence. In Unity simulations it reportedly delivers higher rewards than simultaneous multi-task training across locomotion skills while keeping 100% retention, plus it supports real-time switching and works on a Mario-style interactive task and garden navigation.

This is a straightforward attempt to make continual multi-skill learning lightweight instead of relying on big MoE models or full retraining. The hierarchical reuse idea is sensible for avoiding catastrophic forgetting and the adaptation module tries to cover a useful range of behaviors in one structure.

The soft spot is the baseline comparison. The root-branch design necessarily adds branch-specific parameters, so if the simultaneous training baseline used a single shared network without equivalent total capacity or compute, the reward difference may simply reflect extra model size rather than the continual mechanism itself. The abstract supplies no parameter counts, training budgets, run-to-run variance, or ablations that would separate these factors. All evidence stays in simulation with no hardware results shown.

This is for robotics people working on incremental skill acquisition for legged robots who need something deployable without massive models. A reader building practical skill libraries could test the framework if the full paper has the missing controls and code. It shows honest engagement with the continual-learning bottleneck and deserves a serious referee to verify the experiments and baselines. I would send it for peer review.

Referee Report

2 major / 0 minor

Summary. The manuscript presents Tree Learning, a multi-skill continual learning framework for humanoid robots. It features a root-branch hierarchical parameter inheritance to reuse parameters as motion priors for new skills, thereby preventing catastrophic forgetting. Additional components include a multi-modal feedforward adaptation mechanism for periodic and aperiodic motions using phase modulation and interpolation, and a task-level reward shaping strategy. Experiments conducted in Unity simulation demonstrate that Tree Learning achieves higher rewards than simultaneous multi-task training across locomotion skills while maintaining 100% skill retention, facilitating seamless switching and real-time control. Further validation is provided on a Super Mario-inspired interactive task and autonomous navigation in a simulated Chinese garden environment.

Significance. If the experimental results are robust, the framework represents a significant advancement in continual learning for robotics by offering a lightweight, hierarchical approach that avoids the need for complex MoE topologies or massive models. The emphasis on parameter reuse for priors and the adaptation mechanisms could enable efficient skill expansion on humanoid platforms. The reported 100% retention rate and superior rewards suggest effective mitigation of forgetting, which is a major challenge in multi-task RL. However, the significance is tempered by the reliance on simulation; transfer to physical robots remains untested. The design choices for handling both periodic and aperiodic motions broaden its applicability.

major comments (2)

[Abstract] Abstract: The headline claim that Tree Learning outperforms simultaneous multi-task training on rewards while achieving 100% retention rests on an unverified assumption of comparable setups. The root-branch inheritance necessarily introduces additional branch-specific parameters beyond a single shared network; without reported total parameter counts, training steps per skill, or confirmation that the simultaneous baseline received equivalent capacity and compute, the reward gap cannot be attributed to the continual mechanism rather than specialization capacity.
[Unity-based simulation experiments] Unity-based simulation experiments: The central performance claims of higher rewards across locomotion skills and 100% skill retention lack visible controls, metrics, statistical details, or ablation evidence. This makes it impossible to evaluate whether the results are robust or whether the hierarchical structure's benefits are isolated from other factors such as the reward shaping or adaptation modules.
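The capacity concern in the first comment can be made concrete with a parameter count. The sketch below assumes fully connected policies and, as the worst case for the concern, that every branch is a full-size copy of the root network. The layer widths and branch count are hypothetical and are not taken from the paper; if branches are smaller than the root, the gap shrinks accordingly.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def tree_param_count(layer_sizes, num_branches):
    """Root plus branches, assuming each skill is a full copy of the same network."""
    return (1 + num_branches) * mlp_param_count(layer_sizes)


# Hypothetical sizes: 48 observations, two hidden layers, 23 joint targets.
policy_layers = [48, 256, 128, 23]
single = mlp_param_count(policy_layers)
tree_total = tree_param_count(policy_layers, num_branches=5)

print(single, tree_total, tree_total / single)
```

Under these assumptions the tree holds six times the parameters of one shared network, which is why a capacity-matched baseline matters for attributing the reward gap.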

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address the concerns about experimental comparability and robustness below. We will revise the manuscript to include the requested details on parameter counts, training procedures, ablations, and statistical analyses.


Referee: [Abstract] Abstract: The headline claim that Tree Learning outperforms simultaneous multi-task training on rewards while achieving 100% retention rests on an unverified assumption of comparable setups. The root-branch inheritance necessarily introduces additional branch-specific parameters beyond a single shared network; without reported total parameter counts, training steps per skill, or confirmation that the simultaneous baseline received equivalent capacity and compute, the reward gap cannot be attributed to the continual mechanism rather than specialization capacity.

Authors: We agree that the abstract claim requires supporting details on setup equivalence to allow proper attribution of results. In the revised version, we will explicitly report total parameter counts for the Tree Learning architecture (root plus branches) versus the simultaneous multi-task baseline. The root parameters are shared across all skills as motion priors, while branch-specific parameters are limited to lightweight adaptation layers; the simultaneous baseline was configured with a network whose total capacity matches the full tree size. We will also document the number of training steps allocated per skill for both approaches and confirm identical compute budgets and environment settings. These additions will clarify that performance differences arise from the continual learning design rather than unequal capacity. revision: yes
Referee: [Unity-based simulation experiments] Unity-based simulation experiments: The central performance claims of higher rewards across locomotion skills and 100% skill retention lack visible controls, metrics, statistical details, or ablation evidence. This makes it impossible to evaluate whether the results are robust or whether the hierarchical structure's benefits are isolated from other factors such as the reward shaping or adaptation modules.

Authors: We acknowledge that the experimental presentation would be strengthened by additional controls and evidence. In the revision, we will add: (i) ablation studies that systematically disable the root-branch inheritance, multi-modal adaptation, and task-level reward shaping to isolate each component's contribution; (ii) statistical reporting including mean rewards, standard deviations, and confidence intervals computed over multiple independent runs; (iii) training and retention curves showing performance over time for all skills; and (iv) explicit confirmation that all compared methods used identical simulation parameters, episode lengths, and random seeds. These changes will demonstrate robustness and attribute benefits specifically to the hierarchical structure. revision: yes
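The statistical reporting promised in item (ii) could look like the sketch below. It assumes one final reward per independent run and uses a normal-approximation 95% interval; with only a handful of seeds a Student-t interval would be wider and arguably more appropriate. The function name and the reward values are hypothetical, not data from the paper.

```python
import math
import statistics


def summarize_runs(rewards, z=1.96):
    """Mean, sample standard deviation, and a normal-approximation 95% interval."""
    if len(rewards) < 2:
        raise ValueError("need at least two runs")
    mean = statistics.mean(rewards)
    std = statistics.stdev(rewards)
    half_width = z * std / math.sqrt(len(rewards))
    return mean, std, (mean - half_width, mean + half_width)


# Hypothetical final rewards from five seeds, not data from the paper.
mean, std, ci = summarize_runs([10.0, 12.0, 11.0, 9.0, 13.0])
print(mean, std, ci)
```

Two methods whose intervals overlap heavily would not support a claim that one achieves higher rewards than the other.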

Circularity Check

0 steps flagged

No significant circularity: empirical framework with independent simulation results


The paper introduces Tree Learning as a novel root-branch hierarchical framework with adaptation mechanisms and reward shaping, then reports outcomes from Unity simulation experiments on locomotion skills, interactive scenarios, and navigation. No equations, fitted parameters, or predictions are defined in terms of themselves; the 100% retention and reward comparisons are presented as measured results from separate training runs rather than derived by construction from the method's own inputs. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes, and the central claims rest on external benchmark comparisons rather than renaming or self-referential fitting. The derivation chain is self-contained as an algorithmic proposal plus empirical validation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The framework itself is introduced as a new construct whose internal mechanisms are not further decomposed here.



