TeleGate: Whole-Body Humanoid Teleoperation via Gated Expert Selection with Motion Prior
Pith reviewed 2026-05-16 05:37 UTC · model grok-4.3
The pith
A lightweight gating network selects among expert policies for precise whole-body humanoid teleoperation while a motion prior supplies missing future intent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TeleGate preserves the full capability of domain-specific expert policies by training a lightweight gating network, which dynamically activates experts in real-time based on proprioceptive states and reference trajectories. To compensate for the absence of future reference trajectories in real-time teleoperation, a VAE-based motion prior module extracts implicit future motion intent from historical observations, enabling anticipatory control for motions requiring prediction such as jumping and standing up.
What carries the argument
Lightweight gating network that selects which expert policy to activate, paired with a VAE-based motion prior that infers future intent from past observations.
If this is right
- High-precision real-time tracking holds for running, fall recovery, and jumping without the accuracy loss typical of single-policy distillation.
- Only 2.5 hours of motion-capture data suffice for training that generalizes to both simulation and the physical Unitree G1 robot.
- Success rate and tracking error both improve over baseline methods that merge experts into one policy.
- The same gating-plus-prior structure supports deployment in unstructured environments where motions vary rapidly.
Where Pith is reading between the lines
- The gating idea could apply to other multi-skill robotic domains where merging policies degrades peak performance on any single skill.
- If the motion prior generalizes beyond the training distribution, similar modules might reduce reliance on future reference data in other teleoperation or imitation settings.
- Testing whether adding more experts further widens the motion range without increasing gate error would be a direct next measurement.
Load-bearing premise
The lightweight gating network can reliably pick the correct expert from proprioceptive states and references, and the VAE can accurately infer future motion intent from history alone.
What would settle it
Record the gating network's expert choices during a failed jump or fall-recovery trial; if the wrong expert is chosen more than half the time and performance collapses, the selection mechanism does not work as claimed.
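The check described above can be sketched as a misselection rate over one logged trial. This is only an illustration: the log format (one chosen expert index and one hand-labeled intended expert per control step) and the index-to-expert mapping are assumptions, not something the paper describes.

```python
def misselection_rate(chosen, intended):
    """Fraction of control steps where the gate picked a different
    expert than the hand-labeled intended one (hypothetical log format)."""
    if len(chosen) != len(intended):
        raise ValueError("logs must have the same length")
    if not chosen:
        raise ValueError("empty log")
    wrong = sum(1 for c, i in zip(chosen, intended) if c != i)
    return wrong / len(chosen)


# Made-up log of a failed jump trial. Assumed index mapping:
# 0 Walk/Run, 1 Dance/Fight, 2 Fall/Getup, 3 Jump.
chosen = [0, 0, 3, 3, 0, 0, 3, 3]
intended = [0, 0, 3, 3, 3, 3, 3, 3]

rate = misselection_rate(chosen, intended)
print(rate, rate > 0.5)  # the claim is in doubt only if the rate exceeds 0.5
```

A rate above 0.5 on trials that also fail would point at the selection mechanism; a low rate on failed trials would point at the experts or the motion prior instead.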
Original abstract
Real-time whole-body teleoperation is a critical method for humanoid robots to perform complex tasks in unstructured environments. However, developing a unified controller that robustly supports diverse human motions remains a significant challenge. Existing methods typically distill multiple expert policies into a single general policy, which often inevitably leads to performance degradation, particularly on highly dynamic motions. This paper presents TeleGate, a unified whole-body teleoperation framework for humanoid robots that achieves high-precision tracking across various motions while avoiding the performance loss inherent in knowledge distillation. Our key idea is to preserve the full capability of domain-specific expert policies by training a lightweight gating network, which dynamically activates experts in real-time based on proprioceptive states and reference trajectories. Furthermore, to compensate for the absence of future reference trajectories in real-time teleoperation, we introduce a VAE-based motion prior module that extracts implicit future motion intent from historical observations, enabling anticipatory control for motions requiring prediction such as jumping and standing up. We conducted empirical evaluations in simulation and also deployed our technique on the Unitree G1 humanoid robot. Using only 2.5 hours of motion capture data for training, our TeleGate achieves high-precision real-time teleoperation across diverse dynamic motions (e.g., running, fall recovery, and jumping), significantly outperforming the baseline methods in both tracking accuracy and success rate.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces TeleGate, a whole-body teleoperation system for humanoid robots that preserves specialized expert policies via a lightweight gating network selecting experts in real time from proprioceptive states and reference trajectories, augmented by a VAE-based motion prior that infers future intent from historical observations to support anticipatory control. It claims high-precision tracking on dynamic motions (running, jumping, fall recovery) in simulation and on the Unitree G1, trained from 2.5 hours of mocap data, with significant gains over distillation baselines in accuracy and success rate.
Significance. If the empirical advantages are confirmed with detailed metrics and controls, the gated-expert approach could meaningfully reduce the performance trade-offs typical of single-policy distillation for agile humanoid control, offering a practical route to robust real-time teleoperation with modest data requirements. The explicit separation of expert training from gating and the addition of a learned motion prior are technically coherent contributions.
major comments (3)
- [Abstract] Abstract: the central claim of 'significantly outperforming' baselines in tracking accuracy and success rate is stated without any numerical values, baseline implementations, statistical tests, or error bars, leaving the magnitude and reliability of the reported advantage impossible to assess from the provided text.
- [Evaluation] Evaluation section (implied by abstract): no quantitative results are supplied on gating-network accuracy, switching frequency, chattering, or latency under real-time constraints and sensor noise on the Unitree G1; without these, the skeptic concern that rapid state transitions (jumping, fall recovery) could cause incorrect expert activation remains unaddressed and load-bearing for the stability claim.
- [Method] Method (VAE motion prior): the abstract asserts that the VAE extracts 'implicit future motion intent' enabling anticipatory control, yet no prediction-error metrics, ablation on history length, or comparison against simpler predictors are reported, so the necessity and effectiveness of this module for the claimed dynamic motions cannot be verified.
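The gating metrics requested in the second comment could be computed from a logged sequence of expert indices roughly as follows. This is a sketch under stated assumptions: the control rate, the log format, and the definition of a chatter event (returning to the previous expert within a short window) are all hypothetical choices, not taken from the paper.

```python
def switching_stats(expert_ids, control_hz, chatter_window):
    """Return (switches per second, chatter events) for one logged trial.

    A chatter event (assumed definition) is a switch A -> B followed,
    within `chatter_window` steps, by a switch back to A.
    """
    if not expert_ids:
        raise ValueError("empty log")
    # Steps at which the selected expert differs from the previous step.
    switches = [t for t in range(1, len(expert_ids))
                if expert_ids[t] != expert_ids[t - 1]]
    chatter = 0
    for a, b in zip(switches, switches[1:]):
        returned = expert_ids[b] == expert_ids[a - 1]
        if b - a <= chatter_window and returned:
            chatter += 1
    per_second = len(switches) * control_hz / len(expert_ids)
    return per_second, chatter


# Made-up log at a hypothetical 50 Hz control rate.
log = [0, 0, 3, 0, 0, 0, 3, 3, 3, 3]
print(switching_stats(log, control_hz=50, chatter_window=2))
```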
minor comments (2)
- [Method] Clarify the precise conditioning inputs to the gating network (proprioception vs. reference trajectory encoding) and whether any smoothing or hysteresis is applied to prevent chattering.
- [Experiments] The training data volume (2.5 h) is stated but the split between expert-policy training and gating/VAE training is not detailed; add this breakdown.
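For readers unfamiliar with the hysteresis mentioned in the first minor comment, one common form is to require a new expert to win the argmax for several consecutive steps before switching. The sketch below illustrates that idea only; the excerpt does not say whether TeleGate uses any such mechanism, and the class name and `hold_steps` parameter are invented for this example.

```python
class HysteresisGate:
    """Switch experts only after a challenger wins `hold_steps` steps in a row."""

    def __init__(self, hold_steps):
        self.hold_steps = hold_steps
        self.active = None      # currently selected expert index
        self.candidate = None   # challenger being counted
        self.count = 0          # consecutive wins of the challenger

    def step(self, scores):
        best = max(range(len(scores)), key=scores.__getitem__)
        if self.active is None:
            self.active = best
        if best == self.active:
            # The active expert won again, so any challenger streak resets.
            self.candidate, self.count = None, 0
        elif best == self.candidate:
            self.count += 1
        else:
            self.candidate, self.count = best, 1
        if self.candidate is not None and self.count >= self.hold_steps:
            self.active, self.candidate, self.count = self.candidate, None, 0
        return self.active


def one_hot(k, n=4):
    return [1.0 if i == k else 0.0 for i in range(n)]


gate = HysteresisGate(hold_steps=3)
raw_choices = [0, 3, 0, 3, 3, 3]
print([gate.step(one_hot(k)) for k in raw_choices])
```

The trade-off is added switching delay (here up to `hold_steps` control steps), which matters for fast transitions such as fall recovery.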
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, agreeing to strengthen the manuscript with additional quantitative details and analyses where appropriate.
- Referee: [Abstract] Abstract: the central claim of 'significantly outperforming' baselines in tracking accuracy and success rate is stated without any numerical values, baseline implementations, statistical tests, or error bars, leaving the magnitude and reliability of the reported advantage impossible to assess from the provided text.
  Authors: We agree that the abstract would benefit from explicit numerical support for the performance claims. In the revised version, we will update the abstract to report key quantitative results, including average joint position RMSE, velocity tracking errors, and success rates for dynamic motions (running, jumping, fall recovery), with direct comparisons to the distillation baselines. These values will be drawn from the full evaluation tables and will reference the specific experimental conditions.
  Revision: yes
- Referee: [Evaluation] Evaluation section (implied by abstract): no quantitative results are supplied on gating-network accuracy, switching frequency, chattering, or latency under real-time constraints and sensor noise on the Unitree G1; without these, the skeptic concern that rapid state transitions (jumping, fall recovery) could cause incorrect expert activation remains unaddressed and load-bearing for the stability claim.
  Authors: We acknowledge that these gating-specific metrics are important for addressing real-time stability concerns. We will add a dedicated subsection in the Experiments section that reports gating-network accuracy (expert selection precision), switching frequency, chattering statistics, and end-to-end latency measurements. This analysis will include results from the Unitree G1 hardware under realistic sensor noise, with focused case studies on rapid transitions such as jumping and fall recovery to directly address the concern about incorrect expert activation.
  Revision: yes
- Referee: [Method] Method (VAE motion prior): the abstract asserts that the VAE extracts 'implicit future motion intent' enabling anticipatory control, yet no prediction-error metrics, ablation on history length, or comparison against simpler predictors are reported, so the necessity and effectiveness of this module for the claimed dynamic motions cannot be verified.
  Authors: We agree that explicit validation of the VAE motion prior is needed. In the revision, we will add quantitative prediction-error metrics (e.g., future pose MSE over multiple horizons), an ablation study varying history length, and comparisons against simpler baselines such as constant-velocity extrapolation and LSTM predictors. These results will be presented in the Method and Experiments sections to demonstrate the VAE's contribution to anticipatory control for dynamic motions.
  Revision: yes
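The constant-velocity baseline and the per-horizon MSE named in the last response can be sketched in a few lines. The frame representation (a flat list of pose coordinates per frame) and the function names are assumptions made for illustration; the paper's actual pose encoding is not given in this excerpt.

```python
def constant_velocity_forecast(history, horizon):
    """Extrapolate every coordinate linearly from the last two frames."""
    if len(history) < 2:
        raise ValueError("need at least two history frames")
    last, prev = history[-1], history[-2]
    return [[x + k * (x - p) for x, p in zip(last, prev)]
            for k in range(1, horizon + 1)]


def mse_per_horizon(pred, target):
    """Mean squared error of each predicted frame against its target frame."""
    return [sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)
            for p, t in zip(pred, target)]


# Toy two-coordinate example with a 3-frame prediction horizon.
history = [[0.0, 1.0], [1.0, 1.0]]
target = [[2.0, 1.0], [3.0, 2.0], [4.0, 1.0]]
pred = constant_velocity_forecast(history, horizon=3)
print(pred)
print(mse_per_horizon(pred, target))
```

Reporting the learned prior's error next to this baseline at each horizon would show whether the VAE adds predictive value beyond simple extrapolation.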
Circularity Check
No circularity; claims rest on separate training data and external baseline comparisons
The paper trains a lightweight gating network and VAE motion prior on 2.5 hours of motion-capture data, then evaluates tracking accuracy and success rate on held-out dynamic motions (running, jumping, fall recovery) plus real-robot deployment on Unitree G1. No equations or steps reduce by construction to their own inputs; the gating selection and anticipatory control are learned modules whose outputs are measured against independent baselines rather than defined to match them. No self-citations are load-bearing, no uniqueness theorems are imported from the authors' prior work, and no fitted parameters are relabeled as predictions. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- gating network weights
- VAE latent parameters
axioms (2)
- domain assumption: Expert policies can be trained independently for distinct motion domains without interference
- domain assumption: Historical proprioceptive observations contain sufficient information to infer future motion intent for dynamic actions
invented entities (1)
- VAE-based motion prior module (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "lightweight gating network... dynamically activates experts... VAE-based motion prior module that extracts implicit future motion intent"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "expert policies... gated expert selection... VAE... anticipatory control"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Appendix A. Hyperparameters and Training Settings
PPO Hyperparameters: Proximal Policy Optimization (PPO) is adopted for policy gradient training of both expert policies and the gating network. The learning rate is set to $3 \times 10^{-4}$, with a clip range of $\epsilon_{\text{clip}} = 0.2$. The Generalized Advantage Estimation (GAE) parameter is $\lambda = 0.95$, and the discount factor is $\gamma = 0.97$. Each batch of data is updated 4 times, with ...
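To make the roles of these hyperparameters concrete, the sketch below shows textbook GAE and the PPO clipped surrogate using the stated values ($\gamma = 0.97$, $\lambda = 0.95$, $\epsilon_{\text{clip}} = 0.2$). It is a generic illustration of the standard formulas, not the authors' training code, and it ignores episode termination inside a rollout for brevity.

```python
GAMMA = 0.97   # discount factor from the appendix
LAM = 0.95     # GAE parameter from the appendix
CLIP = 0.2     # PPO clip range from the appendix


def gae(rewards, values, last_value):
    """Generalized Advantage Estimation over one rollout without resets."""
    adv = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_v = last_value if t == len(rewards) - 1 else values[t + 1]
        delta = rewards[t] + GAMMA * next_v - values[t]   # TD residual
        running = delta + GAMMA * LAM * running
        adv[t] = running
    return adv


def clipped_surrogate(ratio, advantage):
    """Per-sample PPO objective (to be maximized)."""
    clipped = min(max(ratio, 1.0 - CLIP), 1.0 + CLIP)
    return min(ratio * advantage, clipped * advantage)


print(gae([1.0, 1.0], [0.0, 0.0], last_value=0.0))
print(clipped_surrogate(1.5, 1.0), clipped_surrogate(0.5, -1.0))
```

With a discount of 0.97 the effective horizon is short (roughly 1 / (1 - 0.97), about 33 steps), which is consistent with a tracking task where rewards are dense.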
VAE, Curriculum Sampling, and Action Scaling: The motion prediction prior based on a Variational Autoencoder (VAE) is jointly trained with the expert policies, with future trajectory reconstruction loss weight $\lambda_{\text{recon}} = 0.5$, KL divergence weight $\lambda_{\text{KL}} = 0.0005$, and latent dimension $d = 32$. The trajectory sampling weight is computed as $w_i = T_i \cdot \bigl(1 + \min(\gamma \cdot f_i, \beta)\bigr)$, ...
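A minimal sketch of how the two loss weights and the sampling weight might be applied is given below. Several details are assumptions, since the excerpt is truncated: a mean-squared reconstruction error, a diagonal Gaussian posterior with a standard normal prior, the grouping $T_i \cdot (1 + \min(\gamma f_i, \beta))$, and the example values passed for `gamma` and `beta`. The meanings of $T_i$ and $f_i$ are not stated in the text available here, so they are treated as plain numbers.

```python
import math

LAMBDA_RECON = 0.5     # reconstruction weight from the appendix
LAMBDA_KL = 0.0005     # KL weight from the appendix


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def prior_loss(pred_future, true_future, mu, log_var):
    """Weighted sum of reconstruction MSE (assumed) and KL divergence."""
    recon = sum((p - t) ** 2 for p, t in zip(pred_future, true_future))
    recon /= len(true_future)
    return LAMBDA_RECON * recon + LAMBDA_KL * kl_to_standard_normal(mu, log_var)


def sampling_weight(T_i, f_i, gamma, beta):
    """w_i = T_i * (1 + min(gamma * f_i, beta)); grouping is an assumption."""
    return T_i * (1.0 + min(gamma * f_i, beta))


print(prior_loss([1.0, 3.0], [1.0, 1.0], mu=[0.0] * 32, log_var=[0.0] * 32))
print(sampling_weight(10.0, 0.5, gamma=2.0, beta=3.0))
```

The very small KL weight relative to the reconstruction weight suggests the latent is meant to stay informative about the future window rather than collapse to the prior.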
Training Environment and Scale: All policies are trained in the MuJoCo physics simulator with NVIDIA RTX A6000 PRO GPUs, and implemented based on the mjlab framework. The number of parallel environments is 32768, with a maximum episode length of 500 steps. During the expert policy phase, four expert groups (Walk/Run, Dance/Fight, Fall/Getup, Jump) are tr...
VAE with Transformer-based Encoder/Decoder: The motion prediction prior adopts a Transformer-based VAE architecture: the encoder $E_\phi$ takes as input the historical reference trajectory $M_t^-$ (5 frames) and outputs latent distribution parameters $(\mu_t, \sigma_t)$; the decoder $D_\psi$ predicts the future window $\tilde{M}_t^+$ (3 frames) conditioned on $z_t$. The architecture is show...
Table V. VAE / Transformer architecture
Component/Hyperparameter | Value
Encoder Transformer Layers | 3
Attention Heads | 8
Hidden Dimension ($d_{\text{model}}$) | 256
...

Expert Policy Network (Actor): The Actor takes as input $s_t = (o_t, m_t, z_t)$ and outputs action $a_t \in \mathbb{R}^{29}$. It adopts a 5-layer MLP with hidden layer dimensions of (512, 512, 256, 256, 128) and ReLU activation. The output ...
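The actor's shape can be illustrated with a small dependency-free MLP using the stated hidden sizes and the 29-dimensional action output. The input sizes `OBS_DIM` and `REF_DIM` are hypothetical (the excerpt does not give them); only the latent size 32 comes from the appendix. The He-style random initialization is also just a placeholder, since real weights would come from PPO training.

```python
import random

HIDDEN = (512, 512, 256, 256, 128)   # hidden sizes from the appendix
ACTION_DIM = 29                      # action dimension from the appendix


def init_mlp(in_dim, hidden, out_dim, seed=0):
    """Build a list of (weights, biases) layers with random placeholder weights."""
    rng = random.Random(seed)
    dims = (in_dim, *hidden, out_dim)
    layers = []
    for n_in, n_out in zip(dims, dims[1:]):
        scale = (2.0 / n_in) ** 0.5
        W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((W, [0.0] * n_out))
    return layers


def forward(layers, x):
    """ReLU on every hidden layer, linear output layer."""
    for k, (W, b) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + bias
             for row, bias in zip(W, b)]
        if k < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


OBS_DIM, REF_DIM, LATENT_DIM = 64, 32, 32   # OBS_DIM and REF_DIM are hypothetical
in_dim = OBS_DIM + REF_DIM + LATENT_DIM     # s_t = (o_t, m_t, z_t) concatenated
actor = init_mlp(in_dim, HIDDEN, ACTION_DIM)
action = forward(actor, [0.0] * in_dim)
print(len(action))  # 29
```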
Critic Network: The Critic takes as input privileged observations (e.g., true state and future reference trajectories) and outputs a scalar state value $V(s_t) \in \mathbb{R}$. The network architecture is the same as the Actor, adopting a 5-layer MLP with hidden layer dimensions of (512, 512, 256, 256, 128), ReLU activation, and a 1-dimensional output layer.
Gating Network: The gating network $G_\theta : (o_t, m_t) \mapsto \mathbb{R}^K$ outputs scores for $K = 4$ experts, and takes the arg max to obtain the current expert index. The network adopts a 5-layer MLP with hidden layer dimensions of (512, 512, 256, 256, 128), ReLU activation, and outputs a 4-dimensional vector corresponding to four expert groups (Walk/Run, Dance/Fight, Fall/Getup, Jump). ...
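The selection step itself is small: take the arg max over the four scores and run only the chosen expert. The sketch below shows that dispatch with stub experts. The order of the names in `EXPERTS` follows the order in which the text lists the groups, but the exact index mapping is an assumption, as are the stub policies and the function names.

```python
EXPERTS = ("Walk/Run", "Dance/Fight", "Fall/Getup", "Jump")  # assumed index order


def select_expert(scores):
    """Hard selection: index of the highest gating score."""
    if len(scores) != len(EXPERTS):
        raise ValueError("expected one score per expert")
    return max(range(len(scores)), key=lambda k: scores[k])


def gated_action(scores, expert_policies, policy_input):
    """Run only the selected expert and return (expert index, action)."""
    k = select_expert(scores)
    return k, expert_policies[k](policy_input)


# Stub experts standing in for the trained policies: each returns a
# constant 29-dimensional action tagged with its own index.
stubs = [lambda s, k=k: [float(k)] * 29 for k in range(len(EXPERTS))]

k, action = gated_action([0.1, -0.4, 0.2, 1.7], stubs, policy_input=None)
print(EXPERTS[k], len(action))
```

Because selection is hard rather than a weighted blend, each expert's output reaches the robot unmodified, which matches the stated goal of preserving full expert capability; the cost is that a single wrong arg max hands control to the wrong policy.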