ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders
Pith reviewed 2026-05-20 05:24 UTC · model grok-4.3
pith:TS6PKARK Add to your LaTeX paper
What is a Pith Number?\usepackage{pith}
\pithnumber{TS6PKARK}
Prints a linked pith:TS6PKARK badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more
The pith
A single multi-component reward function supports reinforcement learning across four distinct game-inspired robotic morphologies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce ARC-RL, a suite of four MuJoCo continuous-control environments featuring robotic morphologies inspired by the bestiary of ARC Raiders: the 18-DoF tall hexapod Queen, the 12-DoF armoured hexapod Bastion, the 18-DoF compact hexapod Tick, and the 12-DoF quadruped Leaper. All four robots share a unified observation template, action convention, simulation cadence, and a single closed-form multi-component reward function whose only per-morphology variation lives in a small set of weights and parameters. The reward fuses a velocity-tracking tent, a healthy survive bonus, a phase-locked gait-compliance bonus/cost pair, action regularisers, three safety penalties, and a posture anchor;no
What carries the argument
The single closed-form multi-component reward function that combines velocity tracking, survival bonus, phase-locked gait compliance, regularisers, safety penalties and posture anchor, with only small per-morphology weight adjustments.
If this is right
- Online algorithms such as SAC can be compared directly against prior-data methods such as SACfD on the same set of morphologies and reward weights.
- Central pattern generator demonstrators supply fixed expert references and prior data usable for offline-to-online training across all four robots.
- Policies can be developed that respect animation-style stylistic constraints while operating on bodies with no real-world hardware counterpart.
- The playground enables direct measurement of how different learning paradigms cope with morphological diversity under one reward definition.
Where Pith is reading between the lines
- The same unification pattern could be tested on additional game-derived creatures to check whether minor weight changes remain sufficient when body plans differ even more sharply.
- Successful cross-morphology transfer here might indicate that phase-locked gait terms can serve as a lightweight prior for controllers that must later adapt to real hardware with similar stylistic goals.
- The environments could be used to measure whether reward terms tuned on one leg count generalise to others when the underlying physics engine parameters are also varied slightly.
Load-bearing premise
A single closed-form multi-component reward function with only small per-morphology weight variations can produce effective policies across all four distinct morphologies without motion-capture data or morphology-specific redesign.
What would settle it
Train a policy on one morphology using the shared reward and test whether it produces stable, gait-compliant locomotion on a second morphology with a different leg count; consistent failure to transfer or meet the compliance terms would show the unified reward does not suffice.
Figures
read the original abstract
Reinforcement learning for legged locomotion has matured into a stack of multi-component reward functions and physics-engine benchmarks whose morphologies are uniformly derived from real commercial hardware. Game NPCs, however, are bound by stylistic constraints absent from sim-to-real robotics and routinely take the form of creatures with no real-robot counterpart. We introduce ARC-RL, a suite of four MuJoCo continuous-control environments featuring robotic morphologies inspired by the bestiary of ARC Raiders: the 18-DoF tall hexapod Queen, the 12-DoF armoured hexapod Bastion, the 18-DoF compact hexapod Tick, and the 12-DoF quadruped Leaper. All four robots share a unified observation template, action convention, simulation cadence, and a single closed-form multi-component reward function whose only per-morphology variation lives in a small set of weights and parameters. The reward fuses a velocity-tracking tent, a healthy survive bonus, a phase-locked gait-compliance bonus/cost pair, action regularisers, three safety penalties, and a posture anchor; no motion-capture data enters the reward at any point. We additionally provide hand-crafted Central Pattern Generator demonstrators per morphology, which serve both as fixed expert references and as sources of prior data for offline-to-online training. On this playground, we conduct a controlled empirical study comparing standard online algorithms (SAC, SPEQ, SOPE-EO) and methods augmented with prior data (SACfD, SPEQ-O2O, SOPE), and characterise how each paradigm copes with the playground's morphological diversity and animation-style stylistic constraints.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ARC-RL, a suite of four MuJoCo continuous-control environments with robotic morphologies inspired by ARC Raiders: the 18-DoF Queen hexapod, 12-DoF Bastion hexapod, 18-DoF Tick hexapod, and 12-DoF Leaper quadruped. All four share a unified observation template, action convention, simulation cadence, and a single closed-form multi-component reward function whose only per-morphology variation is in a small set of weights and parameters. The reward combines a velocity-tracking tent, survive bonus, phase-locked gait-compliance bonus/cost pair, action regularisers, safety penalties, and posture anchor, with no motion-capture data used. Hand-crafted CPG demonstrators are provided per morphology as expert references and prior data sources. The manuscript conducts a controlled empirical study comparing online algorithms (SAC, SPEQ, SOPE-EO) and prior-data-augmented variants (SACfD, SPEQ-O2O, SOPE) to characterise algorithm performance on morphological diversity and animation-style constraints.
Significance. If the unification claim holds, ARC-RL could provide a useful benchmark for RL on stylistically constrained, non-realistic legged morphologies that differ from standard robotics testbeds. The provision of CPG demonstrators for both reference and offline-to-online training is a concrete strength that supports reproducibility and controlled comparisons. The work targets a gap between sim-to-real robotics benchmarks and game NPC control.
major comments (2)
- [Reward function definition] Reward function section: The central claim that a single closed-form reward produces effective policies across all four morphologies with only small per-morphology weight/parameter changes is load-bearing. The phase-locked gait-compliance term requires definitions of leg phases and coupling. Hexapods (Queen, Bastion, Tick) have six legs while Leaper has four; nominal phase offsets and the coupling graph necessarily differ. Please provide the exact equation for this term and state whether the phase definitions and coupling structure are strictly identical across morphologies or whether they introduce morphology-specific structure beyond the claimed small weight set.
- [Empirical study] Empirical study section: The abstract states that a controlled empirical study is performed to characterise how each paradigm copes with morphological diversity. However, the available text contains no quantitative results, tables of returns, success rates, or statistical comparisons. If such results exist in the full manuscript, they must directly test whether the unified reward enables comparable policy learning across the four robots; otherwise the unification hypothesis cannot be evaluated.
minor comments (2)
- [Abstract] Abstract: Consider adding one sentence summarising the main empirical outcome (e.g., which algorithm family handled the stylistic constraints best) to give readers an immediate sense of the findings.
- [Notation and equations] Notation: Ensure that the names of reward components (velocity-tracking tent, phase-locked gait-compliance, posture anchor) are used consistently between the prose description and any equations or pseudocode.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive comments, which help clarify key aspects of our unification claim and empirical evaluation. We address each major comment below and have revised the manuscript to strengthen the presentation.
read point-by-point responses
-
Referee: [Reward function definition] Reward function section: The central claim that a single closed-form reward produces effective policies across all four morphologies with only small per-morphology weight/parameter changes is load-bearing. The phase-locked gait-compliance term requires definitions of leg phases and coupling. Hexapods (Queen, Bastion, Tick) have six legs while Leaper has four; nominal phase offsets and the coupling graph necessarily differ. Please provide the exact equation for this term and state whether the phase definitions and coupling structure are strictly identical across morphologies or whether they introduce morphology-specific structure beyond the claimed small weight set.
Authors: We thank the referee for highlighting this critical detail. The phase definitions and coupling graph are indeed morphology-specific to reflect the structural differences between the three hexapods and the quadruped. These differences are encoded strictly through the small per-morphology parameter set (nominal phase offsets, coupling weights, and leg-specific scaling factors), leaving the algebraic form of the phase-locked term identical across all robots. In the revised manuscript we have inserted the exact closed-form equation for the phase-locked gait-compliance bonus/cost pair, together with explicit tables listing the phase offsets and coupling adjacency matrices for each morphology. This addition makes the limited scope of the morphology-specific parameters fully transparent while preserving the single-reward unification claim. revision: yes
-
Referee: [Empirical study] Empirical study section: The abstract states that a controlled empirical study is performed to characterise how each paradigm copes with morphological diversity. However, the available text contains no quantitative results, tables of returns, success rates, or statistical comparisons. If such results exist in the full manuscript, they must directly test whether the unified reward enables comparable policy learning across the four robots; otherwise the unification hypothesis cannot be evaluated.
Authors: We agree that quantitative evidence is essential to substantiate the unification hypothesis. The full manuscript contains a dedicated empirical study section (Section 5) that reports mean returns, success rates (sustained forward velocity without falling), and paired statistical comparisons (Welch t-tests with Holm-Bonferroni correction) for all six algorithms across the four morphologies. These results are presented in Tables 2–4 and Figure 3, which directly compare learning curves under the shared reward and show that performance differences track morphological complexity rather than reward inconsistency. In the revision we have added an explicit summary table in the main text that cross-references these results to the unification claim and moved the full statistical appendix into the main body for easier evaluation. revision: yes
Circularity Check
No circularity: benchmark introduction with independently specified reward and demonstrators
full rationale
The paper introduces ARC-RL as a new MuJoCo benchmark suite. The unified observation template, action convention, simulation cadence, and closed-form multi-component reward (velocity-tracking tent, survive bonus, phase-locked gait-compliance, regularisers, safety penalties, posture anchor) are defined directly in the abstract and full text without reference to algorithm performance or fitted results. Hand-crafted CPG demonstrators per morphology are stated as prior data sources and fixed references, not derived from the RL comparisons. No equations reduce a prediction to a fitted input by construction, no self-citation chain supports a uniqueness claim, and the empirical study characterises algorithm behaviour on the provided playground rather than deriving the playground from the algorithms. The central claims remain self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- per-morphology weights and parameters
axioms (1)
- domain assumption MuJoCo physics engine can accurately simulate the described 12-DoF and 18-DoF legged morphologies and their contact dynamics
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
rt = r_fwd + r_h + r+_gait − (c_gait + c_ctrl + … + c_post). … phase clock ϕ ∈ [0,2π) … duty fraction d … per-foot phase offset Δ_i … alternating tripod for the three hexapods, a diagonal-pair trot for Leaper.
-
IndisputableMonolith/Foundation/DimensionForcing.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
fixed frame-skip of 25 MuJoCo substeps per control step … gait frequency f_g
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Guillaume Bellegarda and Auke Ijspeert
arXiv:2206.11795. Guillaume Bellegarda and Auke Ijspeert. CPG-RL: Learning central pattern generators for quadruped locomotion.IEEE Robotics and Automation Letters, 7(4):12547–12554,
-
[2]
Guillaume Bellegarda, Milad Shafiee, and Auke Ijspeert
DOI: 10.1109/LRA.2022.3218167. Guillaume Bellegarda, Milad Shafiee, and Auke Ijspeert. Visual CPG-RL: Learning central pat- tern generators for visually-guided quadruped locomotion. InIEEE International Conference on Robotics and Automation (ICRA), pp. 1420–1427,
-
[3]
Dota 2 with Large Scale Deep Reinforcement Learning
DOI: 10.1613/jair.3912. Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław D˛ ebiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, et al. Dota 2 with large scale deep reinforcement learning.arXiv preprint arXiv:1912.06680,
work page internal anchor Pith review Pith/arXiv arXiv doi:10.1613/jair.3912 1912
-
[4]
Boston Dynamics. Spot: The agile mobile robot, 2024.https://bostondynamics.com/ products/spot/. Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym.arXiv preprint arXiv:1606.01540,
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[5]
Ken Caluwaerts, Atil Iscen, J. Chase Kew, Wenhao Yu, Tingnan Zhang, Daniel Freeman, Kuang- Huei Lee, Lisa Lee, Stefano Saliceti, Vincent Zhuang, et al. Barkour: Benchmarking animal-level agility with quadruped robots.arXiv preprint arXiv:2305.14654,
-
[6]
arXiv:2309.14341. Embark Studios. ARC Raiders. Video game. Released 30 October 2025,
-
[7]
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine
arXiv:2106.13281. Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. InProceedings of the 35th International Conference on Machine Learning (ICML), volume 80 ofProceedings of Machine Learning Research, pp. 1861–1870,
-
[8]
Benchmarking the spectrum of agent capabilities.arXiv preprint arXiv:2109.06780,
Danijar Hafner. Benchmarking the spectrum of agent capabilities.arXiv preprint arXiv:2109.06780,
-
[9]
Marco Hutter, Christian Gehring, Dominic Jud, Andreas Lauber, C
DOI: 10.1038/s41586-025-08744-2. Marco Hutter, Christian Gehring, Dominic Jud, Andreas Lauber, C. Dario Bellicoso, Vassilios Tsou- nis, Jemin Hwangbo, Karen Bodie, Péter Fankhauser, Michael Bloesch, Remo Diethelm, Samuel Bachmann, Amir Melzer, and Mark Hoepflinger. ANYmal – a highly mobile and dynamic quadrupedal robot. InIEEE/RSJ International Conference...
-
[10]
DOI: 10.1109/IROS.2016.7758092. Jemin Hwangbo, Joonho Lee, Alexey Dosovitskiy, Dario Bellicoso, Vassilios Tsounis, Vladlen Koltun, and Marco Hutter. Learning agile and dynamic motor skills for legged robots.Science Robotics, 4(26):eaau5872,
-
[11]
DOI: 10.1126/scirobotics.aau5872. Auke Jan Ijspeert. Central pattern generators for locomotion control in animals and robots: A review. Neural Networks, 21(4):642–653,
-
[12]
DOI: 10.1016/j.neunet.2008.03.014. Atil Iscen, Ken Caluwaerts, Jie Tan, Tingnan Zhang, Erwin Coumans, Vikas Sindhwani, and Vincent Vanhoucke. Policies modulating trajectory generators. InConference on Robot Learning (CoRL), volume 87 ofProceedings of Machine Learning Research, pp. 916–926,
-
[13]
Unity: A general platform for intelligent agents.arXiv preprint arXiv:1809.02627,
Arthur Juliani, Vincent-Pierre Berges, Ervin Teng, Andrew Cohen, Jonathan Harper, Chris Elion, Chris Goy, Yuan Gao, Hunter Henry, Marwan Mattar, and Danny Lange. Unity: A general platform for intelligent agents.arXiv preprint arXiv:1809.02627,
-
[14]
Guanda Li, Auke Ijspeert, and Mitsuhiro Hayashibe
DOI: 10.1126/scirobotics.abc5986. Guanda Li, Auke Ijspeert, and Mitsuhiro Hayashibe. AI-CPG: Adaptive imitated central pattern gen- erators for bipedal locomotion learned through reinforced reflex neural networks.IEEE Robotics and Automation Letters, 9(6):5190–5197,
-
[15]
DOI: 10.1109/LRA.2024.3388842. Viktor Makoviychuk, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey, Miles Macklin, David Hoeller, Nikita Rudin, Arthur Allshire, Ankur Handa, and Gavriel State. Isaac Gym: High performance GPU-based physics simulation for robot learning. InAdvances in Neural Informa- tion Processing Systems Datasets and Benchmarks Track,
-
[16]
Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning
arXiv:2108.10470. Gabriel B. Margolis and Pulkit Agrawal. Walk these ways: Tuning robot control for generalization with multiplicity of behavior. InProceedings of the 6th Conference on Robot Learning (CoRL), volume 205 ofProceedings of Machine Learning Research,
work page internal anchor Pith review Pith/arXiv arXiv
-
[17]
11 V olodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A
DOI: 10.1126/scirobotics.abk2822. 11 V olodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Belle- mare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, et al. Human- level control through deep reinforcement learning.Nature, 518(7540):529–533,
-
[18]
Siddharth Mysore, Bassel Mabsout, Renato Mancuso, and Kate Saenko
DOI: 10.1038/nature14236. Siddharth Mysore, Bassel Mabsout, Renato Mancuso, and Kate Saenko. Regularizing action policies for smooth control with reinforcement learning. InIEEE International Conference on Robotics and Automation (ICRA), pp. 1810–1816,
-
[19]
Xue Bin Peng, Glen Berseth, KangKang Yin, and Michiel van de Panne
DOI: 10.1109/ICRA48506.2021.9561138. Xue Bin Peng, Glen Berseth, KangKang Yin, and Michiel van de Panne. DeepLoco: Dynamic locomotion skills using hierarchical deep reinforcement learning.ACM Transactions on Graphics (Proc. SIGGRAPH), 36(4),
-
[20]
Xue Bin Peng, Pieter Abbeel, Sergey Levine, and Michiel van de Panne
DOI: 10.1145/3072959.3073602. Xue Bin Peng, Pieter Abbeel, Sergey Levine, and Michiel van de Panne. DeepMimic: Example- guided deep reinforcement learning of physics-based character skills.ACM Transactions on Graphics (Proc. SIGGRAPH), 37(4),
-
[21]
Xue Bin Peng, Erwin Coumans, Tingnan Zhang, Tsang-Wei Edward Lee, Jie Tan, and Sergey Levine
DOI: 10.1145/3197517.3201311. Xue Bin Peng, Erwin Coumans, Tingnan Zhang, Tsang-Wei Edward Lee, Jie Tan, and Sergey Levine. Learning agile robotic locomotion skills by imitating animals. InRobotics: Science and Systems (RSS),
-
[22]
Xue Bin Peng, Ze Ma, Pieter Abbeel, Sergey Levine, and Angjoo Kanazawa
DOI: 10.15607/RSS.2020.XVI.064. Xue Bin Peng, Ze Ma, Pieter Abbeel, Sergey Levine, and Angjoo Kanazawa. AMP: Adversarial motion priors for stylized physics-based character control.ACM Transactions on Graphics (Proc. SIGGRAPH), 40(4),
-
[23]
Xue Bin Peng, Yunrong Guo, Lina Halper, Sergey Levine, and Sanja Fidler
DOI: 10.1145/3450626.3459670. Xue Bin Peng, Yunrong Guo, Lina Halper, Sergey Levine, and Sanja Fidler. ASE: Large-scale reusable adversarial skill embeddings for physically simulated characters.ACM Transactions on Graphics (Proc. SIGGRAPH), 41(4),
-
[24]
Carlo Romeo, Girolamo Macaluso, Alessandro Sestini, and Andrew D
DOI: 10.1145/3528223.3530110. Carlo Romeo, Girolamo Macaluso, Alessandro Sestini, and Andrew D. Bagdanov. SPEQ: Offline stabilization phases for efficient Q-learning in high update-to-data ratio reinforcement learning. Reinforcement Learning Journal (Proc. RLC 2025),
-
[25]
Carlo Romeo, Girolamo Macaluso, Alessandro Sestini, and Andrew D
arXiv:2501.08669. Carlo Romeo, Girolamo Macaluso, Alessandro Sestini, and Andrew D. Bagdanov. SOPE: Stabiliz- ing off-policy evaluation for online RL with prior data.arXiv preprint arXiv:2605.05863,
-
[26]
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
DOI: 10.1038/s41586-020-03051-4. Alessandro Sestini, Joakim Bergdahl, Konrad Tollmar, Andrew D. Bagdanov, and Linus Gisslén. Towards informed design and validation assistance in computer games using imitation learning. arXiv preprint arXiv:2208.07811,
work page internal anchor Pith review doi:10.1038/s41586-020-03051-4
-
[27]
Yecheng Shao, Yongbin Jin, Xianwei Liu, Weiyan He, Hongtao Wang, and Wei Yang
arXiv:2310.10486. Yecheng Shao, Yongbin Jin, Xianwei Liu, Weiyan He, Hongtao Wang, and Wei Yang. Learning free gait transition for quadruped robots via phase-guided controller.IEEE Robotics and Automation Letters, 7(2):1230–1237,
-
[28]
Jonah Siekmann, Yesh Godse, Alan Fern, and Jonathan Hurst
DOI: 10.1109/LRA.2021.3136645. Jonah Siekmann, Yesh Godse, Alan Fern, and Jonathan Hurst. Sim-to-real learning of all common bipedal gaits via periodic reward composition. InIEEE International Conference on Robotics and Automation (ICRA), pp. 7309–7315, 2021a. DOI: 10.1109/ICRA48506.2021.9561814. 12 Jonah Siekmann, Kevin Green, John Warila, Alan Fern, and...
-
[29]
DOI: 10.1038/nature16961. David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of Go without human knowledge.Nature, 550(7676):354–359,
-
[30]
DOI: 10.1038/nature24270. David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.Science, 362(6419):1140– 1144,
-
[31]
DOI: 10.1126/science.aar6404. SIMA Team, Maria Abi Raad, Arun Ahuja, Catarina Barros, Frederic Besse, Andrew Bolt, Adrian Bolton, Bethanie Brownfield, Gavin Buttimore, Max Cant, et al. Scaling instructable agents across many simulated worlds.arXiv preprint arXiv:2404.10179,
-
[32]
Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Bud- den, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, Timothy Lillicrap, and Martin Ried- miller. DeepMind control suite.arXiv preprint arXiv:1801.00690,
work page internal anchor Pith review Pith/arXiv arXiv
-
[33]
DOI: 10.1109/IROS.2012.6386109. Mark Towers, Ariel Kwiatkowski, Jordan Terry, John U. Balis, Gianluca De Cola, Tristan Deleu, Manuel Goulão, Andreas Kallinteris, Markus Krimmel, Arjun KG, et al. Gymnasium: A standard interface for reinforcement learning environments.arXiv preprint arXiv:2407.17032,
-
[34]
Unitree Go1, 2021.https://www.unitree.com/go1/
Unitree Robotics. Unitree Go1, 2021.https://www.unitree.com/go1/. Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Michaël Mathieu, Andrew Dudzik, Juny- oung Chung, David H. Choi, Richard Powell, Timo Ewalds, Petko Georgiev, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning.Nature, 575(7782):350–354,
work page 2021
-
[35]
DOI: 10.1038/s41586-019-1724-z. Peter R. Wurman, Samuel Barrett, Kenta Kawamoto, James MacGlashan, Kaushik Subramanian, Thomas J. Walsh, Roberto Capobianco, Alisa Devlic, Franziska Eckert, Florian Fuchs, et al. Outracing champion Gran Turismo drivers with deep reinforcement learning.Nature, 602(7896): 223–228,
-
[36]
Zhaoming Xie, Glen Berseth, Patrick Clary, Jonathan Hurst, and Michiel van de Panne
DOI: 10.1038/s41586-021-04357-7. Zhaoming Xie, Glen Berseth, Patrick Clary, Jonathan Hurst, and Michiel van de Panne. Feedback control for Cassie with deep reinforcement learning. InIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS),
-
[37]
Jiaqi Yang, Songyi Lu, Miao Han, Yuze Li, Yongqi Ma, Zihao Lin, and Hangxin Li
DOI: 10.1109/IROS.2018.8593722. Jiaqi Yang, Songyi Lu, Miao Han, Yuze Li, Yongqi Ma, Zihao Lin, and Hangxin Li. Mapless nav- igation for UA Vs via reinforcement learning from demonstrations.Science China Technological Sciences, 66(5):1263–1270,
-
[38]
DOI: 10.1007/s11431-022-2292-3. Kevin Zakka, Baruch Tabanpour, Qiayuan Liao, Mustafa Haiderbhai, Samuel Holt, Jing Yuan Luo, Arthur Allshire, Erik Frey, Koushil Sreenath, Lueder A. Kahrs, Carmelo Sferrazza, Yuval Tassa, and Pieter Abbeel. MuJoCo playground,
-
[39]
Robotics: Science and Systems (RSS) 2025, Outstanding Demo Paper Award. arXiv:2502.08844. 13 Xinyu Zhang, Zhiyuan Xiao, Qingrui Zhang, and Wei Pan. SYNLOCO: Synthesizing central pattern generator with reinforcement learning for quadruped locomotion. InIEEE Conference on Deci- sion and Control (CDC),
-
[40]
Authors corrected from earlier draft, which incorrectly attributed the paper to Bellegarda et al
arXiv:2310.06606. Authors corrected from earlier draft, which incorrectly attributed the paper to Bellegarda et al. 14
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.