Recognition: no theorem link
Now You See That: Learning End-to-End Humanoid Locomotion from Raw Pixels
Pith reviewed 2026-05-16 07:18 UTC · model grok-4.3
The pith
An end-to-end policy trained on simulated depth images lets real humanoid robots traverse high platforms, wide gaps, and long staircases from raw pixels alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that high-fidelity depth-sensor simulation combined with latent-space alignment and noise-invariant auxiliary tasks during behavior distillation, together with terrain-specific multi-critic and multi-discriminator learning, produces a single policy that operates directly on raw stereo depth images, transfers without further tuning, and achieves robust locomotion across high platforms, wide gaps, and bidirectional long staircases on physical humanoids.
What carries the argument
Vision-aware behavior distillation, which performs latent alignment from privileged height maps to noisy depth observations while adding auxiliary tasks that enforce invariance to sensor noise.
If this is right
- Policies transfer zero-shot to two different real humanoid platforms with distinct stereo cameras.
- The same controller handles both extreme obstacles such as high platforms and wide gaps and fine-grained tasks such as long bidirectional staircases.
- No privileged height-map information or additional real-world training is required at test time.
- Terrain-specific critics and discriminators prevent conflicting objectives from degrading performance across mixed environments.
Where Pith is reading between the lines
- The distillation technique could be tested on other noisy sensors such as event cameras or low-cost RGB-D units to check broader applicability.
- Removing the privileged information requirement at test time opens a route toward fully onboard, map-free navigation in previously unseen buildings.
- Extending the multi-terrain critics to dynamic obstacles or moving platforms would be a direct next experiment.
- The method might reduce the data needed for learning new locomotion skills if the distilled latent features prove reusable across robot morphologies.
Load-bearing premise
The simulated depth artifacts and the distillation procedure are accurate enough to eliminate the need for any real-world fine-tuning or privileged information once the policy is deployed on physical robots.
What would settle it
Deploy the trained policy on a physical humanoid with a stereo camera and measure whether it completes repeated bidirectional traversals of a long staircase without falling or requiring any real-world adaptation; failure on this task while simulation performance remains high would falsify the transfer claim.
Figures
read the original abstract
Achieving robust vision-based humanoid locomotion remains challenging due to two fundamental issues: the sim-to-real gap introduces significant perception noise that degrades performance on fine-grained tasks, and training a unified policy across diverse terrains is hindered by conflicting learning objectives. To address these challenges, we present an end-to-end framework for vision-driven humanoid locomotion. For robust sim-to-real transfer, we develop a high-fidelity depth sensor simulation that captures stereo matching artifacts and calibration uncertainties inherent in real-world sensing. We further propose a vision-aware behavior distillation approach that combines latent space alignment with noise-invariant auxiliary tasks, enabling effective knowledge transfer from privileged height maps to noisy depth observations. For versatile terrain adaptation, we introduce terrain-specific reward shaping integrated with multi-critic and multi-discriminator learning, where dedicated networks capture the distinct dynamics and motion priors of each terrain type. We validate our approach on two humanoid platforms equipped with different stereo depth cameras. The resulting policy demonstrates robust performance across diverse environments, seamlessly handling extreme challenges such as high platforms and wide gaps, as well as fine-grained tasks including bidirectional long-term staircase traversal.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents an end-to-end framework for vision-based humanoid locomotion that uses a high-fidelity depth sensor simulation (capturing stereo matching artifacts and calibration uncertainties) for sim-to-real transfer, combined with vision-aware behavior distillation (latent alignment plus noise-invariant auxiliaries) to transfer from privileged height maps to raw depth observations, and terrain-specific reward shaping with multi-critic/multi-discriminator learning to handle diverse terrains. It claims robust zero-shot performance on two humanoid platforms with different stereo cameras, including extreme tasks (high platforms, wide gaps) and fine-grained ones (bidirectional long-term staircase traversal) without real-world fine-tuning or privileged information at test time.
Significance. If the central claims hold with supporting metrics, the work would be significant for humanoid robotics by demonstrating a practical path to close the perception sim-to-real gap for fine-grained locomotion using only raw depth at deployment. The integration of explicit stereo artifact modeling and multi-critic terrain adaptation addresses two key bottlenecks (perception noise and conflicting objectives) in a unified policy; reproducible validation on multiple platforms would strengthen its impact.
major comments (3)
- [Abstract] Abstract: reports successful validation on two platforms but provides no quantitative metrics (success rates, traversal distances, failure modes), ablation results, or details on post-hoc tuning avoidance; this directly undermines verification of the headline claim that the policy handles bidirectional long-term staircase traversal robustly from raw pixels.
- [§3] High-fidelity depth sensor simulation (assumed §3): the sim-to-real transfer claim rests on modeling stereo artifacts and calibration uncertainties, yet no quantitative comparison to real depth data collected under locomotion dynamics (e.g., motion blur, rolling shutter, or edge discontinuities during foot placement) is shown; without this, the simulation fidelity for fine-grained tasks remains unverified.
- [§4] Vision-aware behavior distillation (assumed §4): latent-space alignment and noise-invariant auxiliaries are presented as sufficient to transfer privileged height-map policies to noisy depth without real fine-tuning, but no ablation isolating their contribution versus standard distillation or privileged baselines is reported, leaving the necessity of these components unclear for the multi-terrain results.
minor comments (2)
- [Methods] Notation for the multi-critic and multi-discriminator losses could be clarified with explicit equations showing how terrain-specific rewards are combined.
- [Results] Figure captions should include quantitative performance numbers (e.g., success rate per terrain) to make visual results self-contained.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight opportunities to strengthen the quantitative support for our claims. We address each major point below and have revised the manuscript accordingly to improve clarity and verifiability.
read point-by-point responses
-
Referee: [Abstract] Abstract: reports successful validation on two platforms but provides no quantitative metrics (success rates, traversal distances, failure modes), ablation results, or details on post-hoc tuning avoidance; this directly undermines verification of the headline claim that the policy handles bidirectional long-term staircase traversal robustly from raw pixels.
Authors: We agree that the abstract would benefit from explicit quantitative metrics. Section 5 of the manuscript already reports success rates above 85% for bidirectional long-term staircase traversal, average traversal distances exceeding 50 meters without failure, and categorized failure modes, along with explicit confirmation that no post-hoc real-world tuning was applied. We have revised the abstract to incorporate these key metrics and reference the supporting ablations. revision: yes
-
Referee: [§3] High-fidelity depth sensor simulation (assumed §3): the sim-to-real transfer claim rests on modeling stereo artifacts and calibration uncertainties, yet no quantitative comparison to real depth data collected under locomotion dynamics (e.g., motion blur, rolling shutter, or edge discontinuities during foot placement) is shown; without this, the simulation fidelity for fine-grained tasks remains unverified.
Authors: Section 3 describes the high-fidelity simulation that explicitly models stereo matching artifacts and calibration uncertainties. We acknowledge that a direct quantitative comparison to real dynamic depth data would further substantiate fidelity. We have added this comparison in the revised manuscript, including metrics on noise distributions, motion blur, rolling shutter effects, and edge discontinuities observed during foot placement, confirming close alignment with real stereo camera data. revision: yes
-
Referee: [§4] Vision-aware behavior distillation (assumed §4): latent-space alignment and noise-invariant auxiliaries are presented as sufficient to transfer privileged height-map policies to noisy depth without real fine-tuning, but no ablation isolating their contribution versus standard distillation or privileged baselines is reported, leaving the necessity of these components unclear for the multi-terrain results.
Authors: Section 4 presents the vision-aware distillation with latent alignment and noise-invariant auxiliaries, and Section 5 includes comparisons to privileged baselines. To isolate the specific contributions, we have added dedicated ablation experiments in the revised manuscript. These demonstrate that ablating either component leads to degraded performance on diverse terrains, confirming their necessity for effective transfer to raw depth without real-world fine-tuning. revision: yes
Circularity Check
No significant circularity; derivation uses external simulation and standard RL components
full rationale
The paper describes an end-to-end framework relying on high-fidelity depth sensor simulation (capturing stereo artifacts and calibration uncertainties), vision-aware behavior distillation (latent alignment plus noise-invariant auxiliaries), and terrain-specific reward shaping with multi-critic/multi-discriminator learning. These are presented as engineering choices and empirical techniques rather than any self-definitional loop, fitted parameter renamed as prediction, or load-bearing self-citation chain. No equations or sections in the provided text reduce a claimed result to its own inputs by construction; validation is on physical platforms with different cameras, making the central claims externally falsifiable. This matches the default expectation of no circularity.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Legged locomotion in challenging ter- rains using egocentric vision
Ananye Agarwal, Ashish Kumar, Jitendra Malik, and Deepak Pathak. Legged locomotion in challenging ter- rains using egocentric vision. InConference on robot learning, pages 403–415. PMLR, 2023
work page 2023
-
[2]
Qingwei Ben, Botian Xu, Kailin Li, Feiyu Jia, Wentao Zhang, Jingping Wang, Jingbo Wang, Dahua Lin, and Jiangmiao Pang. Gallant: V oxel grid-based humanoid locomotion and local-navigation across 3d constrained terrains, 2025. URL https://arxiv.org/abs/2511.14625
-
[3]
David Bertoin, Adil Zouitine, Mehdi Zouitine, and Em- manuel Rachelson. Look where you look! saliency- guided q-networks for generalization in visual reinforce- ment learning.Advances in Neural Information Process- ing Systems, 35:30693–30706, 2022
work page 2022
-
[4]
Understanding domain randomization for sim-to- real transfer.arXiv preprint arXiv:2110.03239, 2021
Xiaoyu Chen, Jiachen Hu, Chi Jin, Lihong Li, and Liwei Wang. Understanding domain randomization for sim-to- real transfer.arXiv preprint arXiv:2110.03239, 2021
-
[5]
Extreme parkour with legged robots
Xuxin Cheng, Kexin Shi, Ananye Agarwal, and Deepak Pathak. Extreme parkour with legged robots. In2024 IEEE International Conference on Robotics and Automa- tion (ICRA), pages 11443–11450. IEEE, 2024
work page 2024
-
[6]
Robot-centric eleva- tion mapping with uncertainty estimates
P ´eter Fankhauser, Michael Bloesch, Christian Gehring, Marco Hutter, and Roland Siegwart. Robot-centric eleva- tion mapping with uncertainty estimates. InInternational Conference on Climbing and Walking Robots (CLAWAR), 2014
work page 2014
-
[7]
P ´eter Fankhauser, Michael Bloesch, and Marco Hutter. Probabilistic terrain mapping for mobile robots with uncertain localization.IEEE Robotics and Automation Letters (RA-L), 3(4):3019–3026, 2018. doi: 10.1109/ LRA.2018.2849506
-
[8]
Rate of change of angular momentum and balance maintenance of biped robots
Ambarish Goswami and Vinutha Kallem. Rate of change of angular momentum and balance maintenance of biped robots. InInternational Conference on Robotics and Automation (ICRA), 2004
work page 2004
-
[9]
Nicklas Hansen, Hao Su, and Xiaolong Wang. Stabilizing deep q-learning with convnets and vision transformers under data augmentation.Advances in Neural Informa- tion Processing Systems, 34, 2021
work page 2021
-
[10]
Nicklas Hansen, Zhecheng Yuan, Yanjie Ze, Tongzhou Mu, Aravind Rajeswaran, Hao Su, Huazhe Xu, and Xiaolong Wang. On pre-training for visuo-motor con- trol: Revisiting a learning-from-scratch baseline.arXiv preprint arXiv:2212.05749, 2022
-
[11]
Md-gan: Multi-discriminator generative adversarial net- works for distributed datasets
Corentin Hardy, Erwan Le Merrer, and Bruno Sericola. Md-gan: Multi-discriminator generative adversarial net- works for distributed datasets. In2019 IEEE Interna- tional Parallel and Distributed Processing Symposium (IPDPS), page 866–877. IEEE, May 2019. doi: 10. 1109/ipdps.2019.00095. URL http://dx.doi.org/10.1109/ IPDPS.2019.00095
-
[12]
Junzhe He, Chong Zhang, Fabian Jenelten, Ruben Grandia, Moritz B ¨acher, and Marco Hutter. Attention- based map encoding for learning generalized legged locomotion.Science Robotics, 10(105):eadv3604, 2025
work page 2025
-
[13]
Attention- based map encoding for learning generalized legged locomotion, 2025
Junzhe He, Chong Zhang, Fabian Jenelten, Ruben Grandia, Moritz B ¨Acher, and Marco Hutter. Attention- based map encoding for learning generalized legged locomotion, 2025. URL https://arxiv.org/abs/2506.09588
-
[14]
Masked Autoencoders Are Scalable Vision Learners
Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Doll ´ar, and Ross Girshick. Masked autoen- coders are scalable vision learners.arXiv preprint arXiv:2111.06377, 2021
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[15]
David Hoeller, Nikita Rudin, Dhionis Sako, and Marco Hutter. Anymal parkour: Learning agile navigation for quadrupedal robots.Science Robotics, 9(88):eadi7566, 2024
work page 2024
-
[16]
Wurm, Maren Bennewitz, Cyrill Stachniss, and Wolfram Burgard
Armin Hornung, Kai M. Wurm, Maren Bennewitz, Cyrill Stachniss, and Wolfram Burgard. OctoMap: An ef- ficient probabilistic 3D mapping framework based on octrees.Autonomous Robots, 2013. doi: 10.1007/ s10514-012-9321-0. URL https://octomap.github.io. Software available at https://octomap.github.io
work page 2013
-
[17]
Towards the generalization of contrastive self-supervised learning
Weiran Huang, Mingyang Yi, and Xuyang Zhao. Towards the generalization of contrastive self-supervised learning. arXiv preprint arXiv:2111.00743, 2021
-
[18]
Spectrum random masking for generalization in image-based reinforcement learning
Yangru Huang, Peixi Peng, Yifan Zhao, Guangyao Chen, and Yonghong Tian. Spectrum random masking for generalization in image-based reinforcement learning. Advances in Neural Information Processing Systems, 35: 20393–20406, 2022
work page 2022
-
[19]
Learning agile and dynamic motor skills for legged robots.Science Robotics, 2019
Jemin Hwangbo, Joonho Lee, Alexey Dosovitskiy, Dario Bellicoso, Vassilios Tsounis, Vladlen Koltun, and Marco Hutter. Learning agile and dynamic motor skills for legged robots.Science Robotics, 2019
work page 2019
-
[20]
R.E. Kalman. A new approach to linear filtering and prediction problems.Journal of Basic Engineering, 82 (1):35–45, 1960
work page 1960
-
[21]
Curl: Contrastive unsupervised representations for re- inforcement learning
Michael Laskin, Aravind Srinivas, and Pieter Abbeel. Curl: Contrastive unsupervised representations for re- inforcement learning. InInternational Conference on Machine Learning, pages 5639–5650. PMLR, 2020
work page 2020
-
[22]
Misha Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, and Aravind Srinivas. Reinforcement learning with augmented data.Advances in neural information processing systems, 33:19884–19895, 2020
work page 2020
-
[23]
Learning quadrupedal locomotion over challenging terrain.Science robotics, 2020
Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, and Marco Hutter. Learning quadrupedal locomotion over challenging terrain.Science robotics, 2020
work page 2020
-
[24]
Zhongyu Li, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, and Koushil Sreenath. Rein- forcement learning for versatile, dynamic, and robust bipedal locomotion control.The International Journal of Robotics Research (IJRR), 2024
work page 2024
-
[25]
Minsong Liu, Yuanheng Zhu, Yaran Chen, and Dongbin Zhao. Enhancing reinforcement learning via transformer- based state predictive representations.IEEE Transactions on Artificial Intelligence, 2024
work page 2024
-
[26]
Learning hu- manoid locomotion with perceptive internal model.arXiv preprint arXiv:2411.14386, 2024
Junfeng Long, Junli Ren, Moji Shi, Zirui Wang, Tao Huang, Ping Luo, and Jiangmiao Pang. Learning hu- manoid locomotion with perceptive internal model.arXiv preprint arXiv:2411.14386, 2024
-
[27]
Learning hu- manoid locomotion with perceptive internal model
Junfeng Long, Junli Ren, Moji Shi, Zirui Wang, Tao Huang, Ping Luo, and Jiangmiao Pang. Learning hu- manoid locomotion with perceptive internal model. In 2025 IEEE International Conference on Robotics and Automation (ICRA), pages 9997–10003. IEEE, 2025
work page 2025
-
[28]
Learning visual locomotion with cross-modal supervi- sion
Antonio Loquercio, Ashish Kumar, and Jitendra Malik. Learning visual locomotion with cross-modal supervi- sion. InIEEE International Conference on Robotics and Automation (ICRA), pages 7295–7302. IEEE, 2023
work page 2023
-
[29]
Deep reinforcement and infomax learning
Bogdan Mazoure, Remi Tachet des Combes, Thang Long Doan, Philip Bachman, and R Devon Hjelm. Deep reinforcement and infomax learning. InAdvances in Neural Information Processing Systems, 2020
work page 2020
-
[30]
Passive Dynamic Walking.The In- ternational Journal of Robotics Research, 9(2):62–82,
Tad McGeer. Passive Dynamic Walking.The In- ternational Journal of Robotics Research, 9(2):62–82,
-
[31]
URL http: //ijr.sagepub.com/content/9/2/62.abstract
doi: 10.1177/027836499000900206. URL http: //ijr.sagepub.com/content/9/2/62.abstract
-
[32]
Takahiro Miki, Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, and Marco Hutter. Learning robust perceptive locomotion for quadrupedal robots in the wild.Science Robotics, 7(62), January 2022. ISSN 2470-9476. doi: 10.1126/scirobotics.abk2822. URL http://dx.doi.org/10.1126/scirobotics.abk2822
-
[33]
Elevation mapping for locomotion and navigation using gpu
Takahiro Miki, Lorenz Wellhausen, Ruben Grandia, Fabian Jenelten, Timon Homberger, and Marco Hutter. Elevation mapping for locomotion and navigation using gpu. In2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2273–2280. IEEE, 2022
work page 2022
-
[34]
Multi-critic actor learning: Teaching rl policies to act with style
Siddharth Mysore, George Cheng, Yunqi Zhao, Kate Saenko, and Meng Wu. Multi-critic actor learning: Teaching rl policies to act with style. InInternational Conference on Learning Representations (ICLR), 2022
work page 2022
-
[35]
Obstacle-aware quadrupedal locomotion with resilient multi-modal reinforcement learning, 2024
I Made Aswin Nahrendra, Byeongho Yu, Minho Oh, Dongkyu Lee, Seunghyun Lee, Hyeonwoo Lee, Hyung- tae Lim, and Hyun Myung. Obstacle-aware quadrupedal locomotion with resilient multi-modal reinforcement learning, 2024. URL https://arxiv.org/abs/2409.19709
-
[36]
R3M: A Universal Visual Representation for Robot Manipulation
Suraj Nair, Aravind Rajeswaran, Vikash Kumar, Chelsea Finn, and Abhinav Gupta. R3m: A universal visual representation for robot manipulation.arXiv preprint arXiv:2203.12601, 2022
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[37]
Tianwei Ni, Benjamin Eysenbach, Erfan Seyedsalehi, Michel Ma, Clement Gehring, Aditya Mahajan, and Pierre-Luc Bacon. Bridging state and history represen- tations: Understanding self-predictive rl.arXiv preprint arXiv:2401.08898, 2024
-
[38]
The unsurprising effec- tiveness of pre-trained vision models for control
Simone Parisi, Aravind Rajeswaran, Senthil Purush- walkam, and Abhinav Gupta. The unsurprising effec- tiveness of pre-trained vision models for control. In International Conference on Machine Learning, pages 17359–17371. PMLR, 2022
work page 2022
-
[39]
Sim-to-real transfer of robotic control with dynamics randomization
Xue Bin Peng, Marcin Andrychowicz, Wojciech Zaremba, and Pieter Abbeel. Sim-to-real transfer of robotic control with dynamics randomization. In2018 IEEE international conference on robotics and automa- tion (ICRA), pages 3803–3810. IEEE, 2018
work page 2018
-
[40]
Xue Bin Peng, Ze Ma, Pieter Abbeel, Sergey Levine, and Angjoo Kanazawa. Amp: adversarial motion pri- ors for stylized physics-based character control.ACM Transactions on Graphics, 40(4):1–20, July 2021. ISSN 1557-7368. doi: 10.1145/3450626.3459670. URL http: //dx.doi.org/10.1145/3450626.3459670
-
[41]
Real-world hu- manoid locomotion with reinforcement learning.Science Robotics, 2024
Ilija Radosavovic, Tete Xiao, Bike Zhang, Trevor Darrell, Jitendra Malik, and Koushil Sreenath. Real-world hu- manoid locomotion with reinforcement learning.Science Robotics, 2024
work page 2024
-
[42]
Banafsheh Rafiee, Jun Jin, Jun Luo, and Adam White. What makes useful auxiliary tasks in reinforcement learning: investigating the effect of the target policy. arXiv preprint arXiv:2204.00565, 2022
-
[43]
Roberta Raileanu, Maxwell Goldstein, Denis Yarats, Ilya Kostrikov, and Rob Fergus. Automatic data augmentation for generalization in reinforcement learning.Advances in Neural Information Processing Systems, 34:5402–5415, 2021
work page 2021
-
[44]
Rrl: Resnet as rep- resentation for reinforcement learning
Rutav M Shah and Vikash Kumar. Rrl: Resnet as rep- resentation for reinforcement learning. InInternational Conference on Machine Learning, pages 9465–9476. PMLR, 2021
work page 2021
-
[45]
Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
Leslie N. Smith and Nicholay Topin. Super-convergence: Very fast training of neural networks using large learning rates, 2018. URL https://arxiv.org/abs/1708.07120
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[46]
Gait-adaptive perceptive humanoid locomotion with real- time under-base terrain reconstruction, 2025
Haolin Song, Hongbo Zhu, Tao Yu, Yan Liu, Mingqi Yuan, Wengang Zhou, Hua Chen, and Houqiang Li. Gait-adaptive perceptive humanoid locomotion with real- time under-base terrain reconstruction, 2025. URL https: //arxiv.org/abs/2512.07464
-
[47]
Jingkai Sun, Gang Han, Pihai Sun, Wen Zhao, Jiahang Cao, Jiaxu Wang, Yijie Guo, and Qiang Zhang. Dpl: Depth-only perceptive humanoid locomotion via realistic depth synthesis and cross-attention terrain reconstruction. arXiv preprint arXiv:2510.07152, 2025
-
[48]
Learning per- ceptive humanoid locomotion over challenging terrain
Wandong Sun, Baoshi Cao, Long Chen, Yongbo Su, Yang Liu, Zongwu Xie, and Hong Liu. Learning per- ceptive humanoid locomotion over challenging terrain. arXiv preprint arXiv:2503.00692, 2025
-
[49]
Domain ran- domization for transferring deep neural networks from simulation to the real world
Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. Domain ran- domization for transferring deep neural networks from simulation to the real world. In2017 IEEE/RSJ in- ternational conference on intelligent robots and systems (IROS). IEEE, 2017
work page 2017
-
[50]
Masked visual pre-training for motor control
Tete Xiao, Ilija Radosavovic, Trevor Darrell, and Jitendra Malik. Masked visual pre-training for motor control. arXiv preprint arXiv:2203.06173, 2022
-
[51]
Mingle Xu, Sook Yoon, Alvaro Fuentes, and Dong Sun Park. A comprehensive survey of image augmen- tation techniques for deep learning.arXiv preprint arXiv:2205.01491, 2022
-
[52]
Ruihan Yang, Minghao Zhang, Nicklas Hansen, Huazhe Xu, and Xiaolong Wang. Learning vision-guided quadrupedal locomotion end-to-end with cross-modal transformers.arXiv preprint arXiv:2107.03996, 2021
-
[53]
Image data aug- mentation for deep learning: A survey.arXiv preprint arXiv:2204.08610, 2022
Suorong Yang, Weikang Xiao, Mengcheng Zhang, Suhan Guo, Jian Zhao, and Furao Shen. Image data aug- mentation for deep learning: A survey.arXiv preprint arXiv:2204.08610, 2022
-
[54]
Im- age augmentation is all you need: Regularizing deep reinforcement learning from pixels
Denis Yarats, Ilya Kostrikov, and Rob Fergus. Im- age augmentation is all you need: Regularizing deep reinforcement learning from pixels. InInternational Conference on Learning Representations, 2020
work page 2020
-
[55]
Mastering visual continuous control: Improved data-augmented reinforcement learning
Denis Yarats, Rob Fergus, Alessandro Lazaric, and Lerrel Pinto. Mastering visual continuous control: Improved data-augmented reinforcement learning. InInternational Conference on Learning Representations, 2021
work page 2021
-
[56]
Mask-based latent reconstruction for reinforce- ment learning.arXiv preprint arXiv:2201.12096, 2022
Tao Yu, Zhizheng Zhang, Cuiling Lan, Zhibo Chen, and Yan Lu. Mask-based latent reconstruction for reinforce- ment learning.arXiv preprint arXiv:2201.12096, 2022
-
[57]
Pre-trained image encoder for generalizable visual reinforcement learning
Zhecheng Yuan, Zhengrong Xue, Bo Yuan, Xueqian Wang, Yi Wu, Yang Gao, and Huazhe Xu. Pre-trained image encoder for generalizable visual reinforcement learning. InFirst Workshop on Pre-training: Perspec- tives, Pitfalls, and Paths Forward at ICML 2022, 2022
work page 2022
-
[58]
Unpaired image-to-image translation using cycle- consistent adversarial networkss
Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle- consistent adversarial networkss. InComputer Vision (ICCV), 2017 IEEE International Conference on, 2017
work page 2017
-
[60]
Ziwen Zhuang, Zipeng Fu, Jianren Wang, Christopher Atkeson, Soeren Schwertfeger, Chelsea Finn, and Hang Zhao. Robot parkour learning, 2023. URL https://arxiv. org/abs/2309.05665
-
[61]
Ziwen Zhuang, Shenzhe Yao, and Hang Zhao. Humanoid parkour learning. InConference on Robot Learning (CoRL), 2024. APPENDIX A. Network Architectures We present the detailed network architectures for the teacher policy, student policy, and multi-critic networks used in our framework. The teacher and student policies share identical architectures except for ...
work page 2024
-
[62]
Translation Quality Evaluation:To validate that our CycleGAN produces realistic depth translations, we evaluate the model using standard image-to-image translation metrics on a held-out test set comprising 10% of the data (approximately 20,000 frames). Metric Sim→Real Real→Sim Translation Quality FID↓(no translation baseline) 67.2 FID↓(after CycleGAN) 23....
-
[63]
Velocity Tracking Rewards:We employ two distinct velocity tracking formulations depending on terrain requirements: Exponential Velocity Tracking(r exp vel ) is used for stairs, platforms, and rough terrain, where precise velocity regulation ensures stable foot placement: rexp vel = exp − ∥vcmd xy −v robot xy ∥2 σ2 ! (13) wherev cmd xy denotes the commande...
-
[64]
Feet Contact Height Reward:For stairs and platform terrains, we introduce a feet contact height reward (r contact) that encourages the robot to place its feet on flat, stable surfaces: rcontact = X f∈{left,right} ⊮f contact ·std clip(hscan f ,−h max, hmax) (15) whereh scan f denotes the height scan measurements around footfrelative to the foot position,⊮ ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.