Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning
Pith reviewed 2026-05-17 21:01 UTC · model grok-4.3
The pith
Self-supervised multisensory pretraining allows robots to learn contact-rich manipulation with few real-world trials.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MSDP trains a transformer encoder via masked autoencoding to reconstruct multisensory observations from subsets of sensor embeddings, fostering cross-modal prediction. For policy learning, the asymmetric architecture freezes the encoder and lets the critic extract dynamic features via cross-attention while the actor receives pooled representations, resulting in accelerated learning and robust performance in contact-rich tasks.
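The masking step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the token layout, the 0.5 mask ratio, and the helper names `split_visible_masked` and `reconstruction_loss` are all assumptions made for the example, and the transformer itself is omitted.

```python
import random

def split_visible_masked(num_tokens, mask_ratio, rng):
    """Randomly choose which sensor-embedding tokens the encoder gets to see."""
    num_masked = int(round(num_tokens * mask_ratio))
    order = list(range(num_tokens))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked

def reconstruction_loss(targets, predictions, indices):
    """Mean squared error between targets and predictions over the given tokens."""
    total, count = 0.0, 0
    for i in indices:
        for t, p in zip(targets[i], predictions[i]):
            total += (t - p) ** 2
            count += 1
    return total / count if count else 0.0

# Hypothetical layout: 4 vision patches, 1 force token, 1 proprioception token.
modalities = ["vision"] * 4 + ["force", "proprio"]
rng = random.Random(0)  # fixed seed so the split is repeatable
visible, masked = split_visible_masked(len(modalities), mask_ratio=0.5, rng=rng)

# The encoder would receive only the `visible` tokens; a decoder would then
# be trained to reconstruct the observations, including the hidden modalities.
print("visible:", [modalities[i] for i in visible])
print("masked: ", [modalities[i] for i in masked])
```

When a force or proprioception token lands in the masked set, the model can only reconstruct it from the remaining modalities, which is the cross-modal prediction pressure the summary refers to.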
What carries the argument
MultiSensory Dynamic Pretraining (MSDP): masked autoencoding of multisensory observations with a transformer encoder, combined with an asymmetric actor-critic architecture for downstream reinforcement learning.
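The asymmetry between the two heads can be illustrated with a small sketch. It assumes a single attention head with no key, value, or query projection matrices, and uses toy two-dimensional embeddings; the function names are hypothetical, and in the paper's setup the critic's query is a learned parameter rather than a fixed vector.

```python
import math

def mean_pool(embeddings):
    """Actor input: average the frozen token embeddings into one vector."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(tok[d] for tok in embeddings) / n for d in range(dim)]

def attention_weights(query, keys):
    """Softmax over scaled dot products between one query and each key."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_readout(query, embeddings):
    """Critic input: attention-weighted sum of the frozen token embeddings."""
    weights = attention_weights(query, embeddings)
    dim = len(embeddings[0])
    return [sum(w * tok[d] for w, tok in zip(weights, embeddings))
            for d in range(dim)]

# Toy frozen encoder outputs (assumed values, one row per sensor token).
frozen_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

actor_features = mean_pool(frozen_tokens)
uniform_readout = cross_attention_readout([0.0, 0.0], frozen_tokens)
focused_readout = cross_attention_readout([10.0, 0.0], frozen_tokens)
```

With an all-zero query the attention weights are uniform and the critic's readout coincides with the actor's pooled vector. As the query moves away from zero, the critic can emphasize particular tokens while the actor's input stays fixed, which is the stability-versus-adaptability split the review describes.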
If this is right
- Accelerated learning in multiple contact-rich robot manipulation tasks
- Robust performance under sensor noise and changes in object dynamics
- High success rates on real robots with as few as 6,000 online interactions
- Effective sensor fusion through cross-modal prediction during pretraining
Where Pith is reading between the lines
- Such pretraining could be applied to additional sensory modalities like touch to further improve manipulation skills.
- The separation of stable actor input and dynamic critic features might generalize to other RL settings where representations need to balance consistency and adaptability.
- Offline pretraining on multisensory data may help bridge simulation to real-world transfer in robotics.
Load-bearing premise
The representations learned by masked autoencoding on multisensory observations already contain the dynamic, task-relevant features the critic needs, with no fine-tuning of the frozen encoder and no task-specific adaptation during pretraining.
What would settle it
A direct comparison showing whether removing the masked autoencoding pretraining or the cross-attention mechanism in the critic leads to significantly lower success rates and less robustness in the real-robot contact-rich tasks.
Original abstract
Effective contact-rich manipulation requires robots to synergistically leverage vision, force, and proprioception. However, Reinforcement Learning agents struggle to learn in such multisensory settings, especially amidst sensory noise and dynamic changes. We propose MultiSensory Dynamic Pretraining (MSDP), a novel framework for learning expressive multisensory representations tailored for task-oriented policy learning. MSDP is based on masked autoencoding and trains a transformer-based encoder by reconstructing multisensory observations from only a subset of sensor embeddings, leading to cross-modal prediction and sensor fusion. For downstream policy learning, we introduce a novel asymmetric architecture, where a cross-attention mechanism allows the critic to extract dynamic, task-specific features from the frozen embeddings, while the actor receives a stable pooled representation to guide its actions. Our method demonstrates accelerated learning and robust performance under diverse perturbations, including sensor noise, and changes in object dynamics. Evaluations in multiple challenging, contact-rich robot manipulation tasks in simulation and the real world showcase the effectiveness of MSDP. Our approach exhibits strong robustness to perturbations and achieves high success rates on the real robot with as few as 6,000 online interactions, offering a simple yet powerful solution for complex multisensory robotic control. Website: https://msdp-pearl.github.io/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes MultiSensory Dynamic Pretraining (MSDP), a self-supervised framework that pretrains a transformer encoder via masked autoencoding on multisensory observations (vision, force, proprioception) to induce cross-modal fusion. For downstream reinforcement learning, it introduces an asymmetric actor-critic architecture in which the critic extracts task-specific dynamic features from the frozen embeddings via cross-attention while the actor receives a pooled representation. The paper claims that this yields accelerated learning, robustness to sensor noise and object dynamics changes, and high real-robot success rates in contact-rich manipulation tasks using as few as 6,000 online interactions.
Significance. If the empirical claims are substantiated, the work could provide a practical route to improved sample efficiency and robustness in multisensory robotic RL by separating representation learning from task-specific fine-tuning. The asymmetric critic design and emphasis on cross-modal prediction address a recognized challenge in contact-rich settings.
major comments (2)
- [Abstract] The headline claim of high real-robot success with only 6,000 interactions is presented without quantitative baselines, ablation studies, statistical details, or description of the pretraining corpus and downstream task suite. This absence prevents evaluation of whether the reported gains are driven by MSDP rather than task selection or architecture alone.
- [Pretraining and downstream sections] The masked reconstruction objective is purely reconstructive and action-free. No analysis is supplied showing that the resulting embeddings encode forward dynamics or force transients required by the critic for value estimation; if the embeddings primarily capture static correlations, the robustness and sample-efficiency claims would rest on the asymmetric architecture rather than the pretraining.
minor comments (2)
- [Method] Clarify the precise masking ratio, sensor-specific embedding dimensions, and reconstruction loss weighting across modalities to support reproducibility.
- [Experiments] Real-robot results should report trial counts, success-rate confidence intervals, and perturbation magnitudes for the claimed robustness.
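One common way to report the success-rate confidence intervals requested above is the Wilson score interval, which behaves better than the normal approximation when trial counts are small. The sketch below is illustrative only: the function name is hypothetical and the trial counts are made up for the example, not taken from the paper.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4 * trials * trials)
    ) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical example: 18 successes in 20 real-robot trials.
low, high = wilson_interval(18, 20)
print(f"success rate 0.90, 95% CI [{low:.3f}, {high:.3f}]")
```

With only a few dozen trials the interval stays wide, which is why reporting trial counts alongside the success rate matters for judging the robustness claims.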
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below, clarifying our contributions and indicating planned revisions to strengthen the manuscript.
- Referee: [Abstract] The headline claim of high real-robot success with only 6,000 interactions is presented without quantitative baselines, ablation studies, statistical details, or description of the pretraining corpus and downstream task suite. This absence prevents evaluation of whether the reported gains are driven by MSDP rather than task selection or architecture alone.
  Authors: We agree that the abstract, as a concise summary, omits specific quantitative details that would better contextualize the results. In the revised manuscript we will expand the abstract to report key success rates with statistical significance, reference the scale of the pretraining corpus, and briefly describe the downstream task suite. We will also explicitly note that full baselines, ablations, and statistical analyses appear in Sections 4 and 5. These additions will make it clearer that the reported gains are attributable to MSDP rather than task choice alone.
  Revision: yes
- Referee: [Pretraining and downstream sections] The masked reconstruction objective is purely reconstructive and action-free. No analysis is supplied showing that the resulting embeddings encode forward dynamics or force transients required by the critic for value estimation; if the embeddings primarily capture static correlations, the robustness and sample-efficiency claims would rest on the asymmetric architecture rather than the pretraining.
  Authors: The masked multisensory reconstruction objective is indeed reconstructive and action-free; however, because the model must reconstruct force and proprioceptive signals from partial or missing visual and proprioceptive inputs across time, it is forced to capture cross-modal temporal correlations and force transients. Our experiments already demonstrate that MSDP embeddings yield faster learning and greater robustness to object-dynamics changes and sensor noise than baselines that use the same asymmetric architecture without pretraining. To directly address the concern, we will add a new analysis subsection that probes the frozen embeddings for their ability to predict short-term force changes and state transitions, together with an ablation isolating the contribution of pretraining versus the critic’s cross-attention mechanism.
  Revision: partial
Circularity Check
No significant circularity; pretraining objective independent of downstream task
The paper's core derivation separates masked autoencoding pretraining (reconstructive, action-free, on multisensory observations) from the downstream asymmetric actor-critic RL stage. No equations or steps reduce the reported success rates or robustness claims to a fitted quantity defined by the evaluation data itself. The pretraining loss is defined independently of task rewards, and the frozen embeddings are used without task-specific adaptation during pretraining. This satisfies the default expectation of a self-contained pipeline with no load-bearing self-definition or fitted-input-as-prediction patterns.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, reality_from_one_distinction
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Paper passage: MSDP is based on masked autoencoding and trains a transformer-based encoder by reconstructing multisensory observations from only a subset of sensor embeddings, leading to cross-modal prediction and sensor fusion.
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Paper passage: The critic uses a single cross-attention layer with a learnable query and the multisensory embeddings from the MSDP encoder as keys and values.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.