Continual Domain Randomization
Pith reviewed 2026-05-24 03:02 UTC · model grok-4.3
The pith
Continual Domain Randomization trains robotic policies by sequentially adding simulation parameter randomizations while using continual learning to retain effects from earlier stages.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By combining domain randomization with continual learning, a policy can be trained sequentially on subsets of randomization parameters starting from a non-randomized simulation; this yields effective learning in simulation and robust real-robot performance on reaching and grasping tasks that matches or outperforms baselines using combined randomization or sequential randomization without continual learning.
What carries the argument
Continual Domain Randomization (CDR), the mechanism that applies continual learning to retain the effects of previous randomization subsets while training on new parameter groups in sequence.
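The mechanism can be illustrated with a deliberately tiny sketch. It is not the authors' implementation: the source does not name the continual learning algorithm, so a plain L2 anchoring penalty toward the parameters learned in earlier stages is assumed as a stand-in, and each "stage" is a toy quadratic loss rather than an RL problem. The stage names, targets, and the penalty weight `lam` are all hypothetical.

```python
# Toy sketch: sequential "randomization stages" with an L2 anchoring penalty.
# Assumptions (not from the paper): quadratic stage losses, hypothetical stage
# names, and an L2 anchor standing in for the unspecified continual learning method.

def stage_loss(theta, target):
    """Toy per-stage loss: half the squared distance to that stage's optimum."""
    return 0.5 * sum((t - g) ** 2 for t, g in zip(theta, target))

def train_stage(theta, target, anchors, lam, lr=0.1, steps=300):
    """Gradient descent on the stage loss plus lam/2 * ||theta - anchor||^2 per anchor."""
    theta = list(theta)
    for _ in range(steps):
        for i in range(len(theta)):
            grad = theta[i] - target[i]
            grad += lam * sum(theta[i] - a[i] for a in anchors)
            theta[i] -= lr * grad
    return theta

def run_schedule(stages, lam):
    """Train on each stage in order, anchoring to parameters from earlier stages."""
    theta = [0.5, 0.5]
    anchors = []
    for _name, target in stages:
        theta = train_stage(theta, target, anchors, lam)
        anchors.append(list(theta))
    return theta

# Hypothetical schedule: a non-randomized start, then two parameter groups.
STAGES = [
    ("no_randomization", [0.0, 0.0]),
    ("group_a", [1.0, 0.0]),
    ("group_b", [0.0, 1.0]),
]

with_cl = run_schedule(STAGES, lam=1.0)      # sequential with anchoring
without_cl = run_schedule(STAGES, lam=0.0)   # sequential, no continual learning

print("loss on group_a, with anchoring:   ", stage_loss(with_cl, STAGES[1][1]))
print("loss on group_a, without anchoring:", stage_loss(without_cl, STAGES[1][1]))
```

In this toy setting the anchored run ends with a lower loss on the earlier stage than the unanchored run, which is the qualitative behaviour the method relies on. Whether the same holds for real policies is exactly what the review asks the authors to demonstrate.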
If this is right
- Policies achieve robust real-world transfer without needing to solve the full randomization problem simultaneously.
- Training begins in an easier non-randomized setting before complexity is added incrementally.
- Continual learning preserves the benefits of each randomization stage for the final policy.
- The method produces performance that is at least as good as standard domain randomization on the tested robotic tasks.
Where Pith is reading between the lines
- The same sequential schedule could be applied to other manipulation or locomotion tasks to test whether the robustness gains generalize.
- Different continual learning techniques might be substituted to determine which best retains randomization effects without additional hyperparameter tuning.
- The order in which parameter subsets are introduced may influence final policy quality and could be optimized as a separate design choice.
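To make the last point concrete, the space of orderings can be enumerated directly. This is only a sketch of the search space, assuming the non-randomized stage always comes first; the group names are hypothetical placeholders and no evaluation of the schedules is implied.

```python
from itertools import permutations

def candidate_schedules(groups, first="no_randomization"):
    """Return every ordering of the parameter groups, each prefixed by the fixed first stage."""
    return [(first,) + order for order in permutations(groups)]

# Hypothetical parameter groups; the paper's actual subsets are not listed here.
GROUPS = ("group_a", "group_b", "group_c")

schedules = candidate_schedules(GROUPS)
for schedule in schedules:
    print(" -> ".join(schedule))
```

The number of orderings grows factorially with the number of groups, so an exhaustive comparison is only practical for a handful of subsets.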
Load-bearing premise
That continual learning methods can reliably prevent catastrophic forgetting of prior randomization effects when new parameter subsets are introduced sequentially, preserving policy performance across the training sequence.
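One common way to check this premise is a forgetting score per stage: the best return a stage reached earlier in training minus its return after the final stage. The sketch below assumes a lower-triangular table of evaluation returns; the numbers are placeholders invented for illustration, not measurements from the paper.

```python
def forgetting(returns):
    """Per-stage forgetting after the final stage.

    returns[i][j] is the mean return on stage j, evaluated after training
    through stage i (so row i has i + 1 entries). The score for stage j is
    its best earlier return minus its return after the last stage.
    """
    final = returns[-1]
    last = len(returns) - 1
    scores = []
    for j in range(last):
        best_earlier = max(returns[i][j] for i in range(j, last))
        scores.append(best_earlier - final[j])
    return scores

# Placeholder evaluation table (hypothetical values, three stages).
RETURNS = [
    [1.00],
    [0.90, 1.00],
    [0.70, 0.95, 1.00],
]

scores = forgetting(RETURNS)
print(scores)
```

Scores near zero for every earlier stage would support the premise; large positive scores would indicate that later randomizations overwrote earlier ones.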
What would settle it
A direct comparison experiment in which the CDR-trained policy exhibits clearly worse real-robot success rates or measurable forgetting of earlier randomization effects than a combined-randomization baseline would falsify the central claim.
Original abstract
Domain Randomization (DR) is commonly used for sim2real transfer of reinforcement learning (RL) policies in robotics. Most DR approaches require a simulator with a fixed set of tunable parameters from the start of the training, from which the parameters are randomized simultaneously to train a robust model for use in the real world. However, the combined randomization of many parameters increases the task difficulty and might result in sub-optimal policies. To address this problem and to provide a more flexible training process, we propose Continual Domain Randomization (CDR) for RL that combines domain randomization with continual learning to enable sequential training in simulation on a subset of randomization parameters at a time. Starting from a model trained in a non-randomized simulation where the task is easier to solve, the model is trained on a sequence of randomizations, and continual learning is employed to remember the effects of previous randomizations. Our robotic reaching and grasping tasks experiments show that the model trained in this fashion learns effectively in simulation and performs robustly on the real robot while matching or outperforming baselines that employ combined randomization or sequential randomization without continual learning. Our code and videos are available at https://continual-dr.github.io/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Continual Domain Randomization (CDR), which augments domain randomization for RL policies by training sequentially on subsets of randomization parameters, starting from a non-randomized simulation and employing continual learning to retain effects of prior randomizations. Experiments on robotic reaching and grasping tasks indicate that CDR policies learn effectively in simulation and transfer robustly to the real robot, matching or outperforming baselines that use combined randomization or sequential randomization without continual learning.
Significance. If the empirical results hold with proper verification of retention, CDR offers a more flexible alternative to simultaneous multi-parameter randomization, potentially yielding higher-performing policies for sim-to-real transfer by reducing task difficulty during training.
major comments (2)
- [Abstract, Experiments] Abstract and Experiments section: The central claim that CDR matches or outperforms the combined-randomization and sequential-without-CL baselines on real-robot reaching/grasping requires that the continual-learning component actually preserves policy performance on earlier randomization subsets after later subsets are introduced. No diagnostic metrics (e.g., return on task-1 after completing task-3), algorithm name, regularization strength, or replay-buffer size are reported, rendering the advantage dependent on an unverified retention property.
- [Experiments] Experiments section: No quantitative metrics, learning curves, or statistical significance tests are provided for the reported positive results versus baselines, limiting assessment of whether the observed robustness is reliable or merely anecdotal.
minor comments (1)
- [Abstract] The abstract states that code and videos are available at a URL, but the manuscript should include a brief description of the randomization schedule and task sequence to allow readers to understand the experimental protocol without external resources.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the major comments below and will revise the manuscript to incorporate additional verification and quantitative analysis.
Point-by-point responses
- Referee: [Abstract, Experiments] Abstract and Experiments section: The central claim that CDR matches or outperforms the combined-randomization and sequential-without-CL baselines on real-robot reaching/grasping requires that the continual-learning component actually preserves policy performance on earlier randomization subsets after later subsets are introduced. No diagnostic metrics (e.g., return on task-1 after completing task-3), algorithm name, regularization strength, or replay-buffer size are reported, rendering the advantage dependent on an unverified retention property.
  Authors: We agree that explicit verification of retention is important for substantiating the advantage of CDR. The manuscript will be revised to include diagnostic metrics such as policy returns on earlier randomization subsets (e.g., task-1) after training on later subsets. We will also specify the continual learning algorithm used, regularization strength, and any replay buffer details to allow full verification of the retention property.
  revision: yes
- Referee: [Experiments] Experiments section: No quantitative metrics, learning curves, or statistical significance tests are provided for the reported positive results versus baselines, limiting assessment of whether the observed robustness is reliable or merely anecdotal.
  Authors: We acknowledge the lack of quantitative support in the current version. The revised manuscript will include learning curves from simulation training, quantitative real-robot performance metrics with variability measures (e.g., standard deviation across runs), and statistical significance tests comparing CDR against the baselines to demonstrate reliability of the results.
  revision: yes
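As an illustration of the kind of significance test the rebuttal promises, a two-sided Fisher exact test on per-trial success counts is one option for small real-robot samples. This is a sketch, not the authors' analysis; the trial counts below are placeholders invented for the example and are not results from the paper.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are methods, columns are (successes, failures). The p-value sums the
    hypergeometric probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Placeholder counts (hypothetical): 20 real-robot trials per method.
cdr_success, cdr_fail = 18, 2
baseline_success, baseline_fail = 12, 8

p_value = fisher_exact_two_sided(cdr_success, cdr_fail, baseline_success, baseline_fail)
print(f"two-sided p-value: {p_value:.4f}")
```

An exact test is chosen here because real-robot evaluations usually involve few trials, where normal approximations are unreliable. With repeated training seeds, a test on per-seed success rates would be a reasonable alternative.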
Circularity Check
No circularity in empirical method paper
The paper proposes Continual Domain Randomization by combining domain randomization with continual learning for sequential parameter randomization in RL, then evaluates the approach via reaching and grasping experiments on simulated and real robots against combined-randomization and sequential-without-CL baselines. No mathematical derivation, fitted parameters renamed as predictions, or self-referential definitions appear; the central claim rests on direct experimental comparison rather than any reduction to inputs by construction. No load-bearing self-citations or ansatzes are present.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: continual learning techniques can mitigate forgetting across sequential additions of domain randomization parameters.