A Model-Driven Approach for Developing Families of Reinforcement Learning Environments

Istvan David; Xiaoran Liu

arxiv: 2606.20324 · v1 · pith:TKXVOTXQnew · submitted 2026-06-18 · 💻 cs.SE · cs.LG

Xiaoran Liu, Istvan David

Pith reviewed 2026-06-26 16:11 UTC · model grok-4.3

keywords model-driven engineering · reinforcement learning · genetic algorithms · environment generation · model transformations · curriculum learning · wildfire mitigation

The pith

A hybrid genetic algorithm generates families of reinforcement learning environments by treating mutations and constraints as model transformations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to replace the manual, error-prone creation of multiple similar but varied training environments for reinforcement learning agents with an automated model-driven process. A hybrid genetic algorithm performs the search, where changes to environments and rules limiting those changes are written as model transformations run by an existing transformation engine. This produces families of environments that support agent convergence, as shown in a wildfire mitigation example and in curriculum learning setups that gradually increase difficulty. A reader would care because realistic RL tasks often fail to converge without many environment variants, yet building those variants by hand does not scale to complex problems.

Core claim

The paper claims that a hybrid genetic algorithm, which combines population-based global search with heuristic local search, can generate families of RL environments when mutations and constraints are expressed as model transformations and executed by a state-of-the-art model transformation engine, with soundness shown through application to wildfire mitigation and curriculum learning.

What carries the argument

The hybrid genetic algorithm that operationalizes mutations and constraints as model transformations executed by a model transformation engine.
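
To make the mechanism concrete, the following is a minimal Python sketch of a hybrid genetic algorithm on a toy tile sequence. It is an illustration under stated assumptions, not the authors' implementation: the tile alphabet, the `mutate` and `is_valid` functions (plain-Python stand-ins for the mutation and constraint model transformations, which the paper runs on a model transformation engine), the `fitness` objective, and all parameter values are hypothetical. The sketch uses mutation only and elitist selection.

```python
import random

TILES = "VRWB"  # hypothetical alphabet: Vegetation, Road, Water, Burning

def mutate(env, rng):
    """Stand-in for a mutation transformation: rewrite one tile."""
    i = rng.randrange(len(env))
    return env[:i] + (rng.choice(TILES),) + env[i + 1:]

def is_valid(env):
    """Stand-in for a constraint: at least one road, at most half burning."""
    return "R" in env and env.count("B") * 2 <= len(env)

def fitness(env, target):
    """Hypothetical objective: closeness to a target count of burning tiles."""
    return -abs(env.count("B") - target)

def local_search(env, target, rng, steps=5):
    """Heuristic local search: hill-climb over valid single-tile mutations."""
    best = env
    for _ in range(steps):
        cand = mutate(best, rng)
        if is_valid(cand) and fitness(cand, target) > fitness(best, target):
            best = cand
    return best

def hybrid_ga(seed_env, target, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    population = [seed_env] * pop_size
    for _ in range(generations):
        # Global search: mutate every individual, drop constraint violations.
        offspring = [mutate(e, rng) for e in population]
        offspring = [e for e in offspring if is_valid(e)]
        # Local search refines each surviving offspring.
        offspring = [local_search(e, target, rng) for e in offspring]
        # Elitist selection over parents and offspring, best first.
        pool = population + offspring
        pool.sort(key=lambda e: fitness(e, target), reverse=True)
        population = pool[:pop_size]
    return population

seed_env = tuple("RVVVVVVVV")  # valid initial environment: one road, no fire
family = hybrid_ga(seed_env, target=3)
```

Because selection is elitist and every candidate is filtered through `is_valid`, the returned family contains only constraint-satisfying environments and its best member is never worse than the seed.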

If this is right

Development of RL environment families shifts from labor-intensive manual work to an automated search process.
Environment variants become available at scale for any RL problem that needs multiple similar but distinct training settings.
Curriculum learning setups can be produced by generating sequences of environments with controlled increases in difficulty.
The same transformation-based search applies to other domains that require families of simulation environments, such as wildfire mitigation training.
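
The curriculum consequence can be sketched as follows. This is a hedged illustration only: the difficulty proxy (number of burning tiles, written `B`) and the `max_step` rule are assumptions made for the example and are not taken from the paper.

```python
def difficulty(env):
    """Hypothetical difficulty proxy: number of burning tiles ('B')."""
    return env.count("B")

def build_curriculum(family, max_step=1):
    """Order distinct environments by difficulty and keep a stage only if
    its difficulty rises by at most `max_step` over the last kept stage."""
    curriculum = []
    for env in sorted(set(family), key=lambda e: (difficulty(e), e)):
        if not curriculum:
            curriculum.append(env)
            continue
        gap = difficulty(env) - difficulty(curriculum[-1])
        if 0 < gap <= max_step:
            curriculum.append(env)
    return curriculum

# Toy family of four-tile environments (R = road, V = vegetation, B = burning).
family = ["RVVV", "RBVV", "RBBV", "RVBV", "RBBB"]
stages = build_curriculum(family)
# stages == ["RVVV", "RBVV", "RBBV", "RBBB"]
```

Environments that tie on difficulty with an already kept stage are skipped, so each stage is strictly harder than the previous one under the chosen proxy.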

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The approach could be combined with existing RL frameworks so that generated environments are directly usable for training runs.
Model-based representation of environments might allow additional checks for safety or consistency before training begins.
The method could be tested on other RL domains, such as robotics or game playing, to see whether the same transformation encoding works without redesign.
Extending the search objective to include measured convergence speed or sample efficiency would produce environment families tuned for faster learning.

Load-bearing premise

That expressing mutations and constraints as model transformations is sufficient to produce environment families that support RL convergence without requiring substantial additional manual intervention or domain-specific tuning.

What would settle it

Running the generated environment families on the wildfire mitigation task and finding that RL agents fail to converge to meaningful behavior unless the environments or search process receive extra manual adjustments.

Figures

Figures reproduced from arXiv: 2606.20324 by Istvan David, Xiaoran Liu.

**Figure 1.** Example Burning Forest instance and its metamodel.
Adjacent text (Section 2.1.3, Curriculum learning): In this work, we use curriculum learning (CL) as the representative example of learning paradigms that rely on families of environments. CL is a training strategy in which the learning process is organized as a sequence of training criteria that evolve over time [74], typically from simpler to more complex settings [4]. This allo…

**Figure 2.** Overview of the approach.
Adjacent text (Example: Defining the initial environment): In the running example, the environment model defines the Forest composed of Tiles, specialized into BurningTile, RoadTile, VegetationTile, WaterTile, StartTile, and GoalTile, with a FireTruck agent whose currentState references a single tile, as shown in Fig. 1b. The expert specifies the initial environment model as a Burning Forest contain…

**Figure 3.** A family of Burning Forest environments in CL.

**Figure 4.** Environment generation process.
Adjacent text (Section 4.4, Generating families of environments): In Step 4, a GA generates the family of environments that meet the objective function and respect the defined constraints.

**Figure 5.** Generated curriculum of increasing complexity.

**Figure 6.** Cumulative reward with different prefixes.

**Figure 7.** Cumulative reward across all curricula.
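
The metamodel named in the Figure 2 text (a Forest of Tiles with six specializations, plus a FireTruck whose currentState references one tile) can be approximated with plain Python dataclasses. This is a sketch under assumptions: the `x`/`y` attributes and the three well-formedness rules in `check_well_formed` are hypothetical and are not listed in the excerpt; the paper uses a metamodel and model transformations rather than Python classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Tile:
    x: int  # hypothetical coordinates; the excerpt lists no attributes
    y: int

# Specializations named in the excerpt.
class BurningTile(Tile): pass
class RoadTile(Tile): pass
class VegetationTile(Tile): pass
class WaterTile(Tile): pass
class StartTile(Tile): pass
class GoalTile(Tile): pass

@dataclass
class FireTruck:
    current_state: Tile  # mirrors `currentState`: references a single tile

@dataclass
class Forest:
    tiles: List[Tile] = field(default_factory=list)
    agent: Optional[FireTruck] = None

def check_well_formed(forest):
    """Hypothetical well-formedness rules; returns a list of violations."""
    errors = []
    if sum(isinstance(t, StartTile) for t in forest.tiles) != 1:
        errors.append("expected exactly one StartTile")
    if sum(isinstance(t, GoalTile) for t in forest.tiles) != 1:
        errors.append("expected exactly one GoalTile")
    agent = forest.agent
    if agent is None or not any(t is agent.current_state for t in forest.tiles):
        errors.append("FireTruck.currentState must reference a tile of the Forest")
    return errors

start = StartTile(0, 0)
forest = Forest(
    [start, RoadTile(1, 0), BurningTile(1, 1), GoalTile(2, 0)],
    FireTruck(start),
)
violations = check_well_formed(forest)  # empty list for this instance
```

A check of this kind is one way generated variants could be screened for structural validity before any training run.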

Original abstract

Virtual training environments are software-intensive systems in which reinforcement learning (RL) agents learn, adapt, and demonstrate meaningful behavior. Virtual training environments offer a safe and cost-efficient alternative to training agents in real-world settings. However, to converge, most realistic RL problems require training in multiple, mostly similar but slightly different environments - i.e., families of environment variants. The typical development process of environment families is a labor-intensive and error-prone manual endeavor that does not scale well. To alleviate these issues, in this paper, we propose a model-driven approach for developing families of RL training environments. To obtain the family of environments, we develop an approach and prototype tool. In our approach, a hybrid genetic algorithm - a combination of population-based global search and heuristic local search - generates environment families. Mutations and constraints are expressed as model transformations and are operationalized into a search process by a state-of-the-art model transformation engine. We demonstrate the soundness of our approach in a wildfire mitigation scenario and curriculum learning - a particular learning paradigm that relies on environment families.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper combines model-driven engineering with a hybrid GA to generate RL environment families via model transformations, but provides no metrics showing reduced effort or better RL outcomes versus manual methods.

The main point is a model-driven setup that uses a hybrid genetic algorithm to create families of RL environments, with mutations and constraints handled as model transformations run through a transformation engine.

This is a fresh combination for the RL training problem. The authors describe the approach clearly, have built a prototype, and have applied it to wildfire mitigation and curriculum learning. Those demonstrations confirm the method can produce environment variants without obvious breakage.

The soft spot is the missing evidence. The paper states the goal is to cut down on manual, error-prone work, yet it reports no comparisons of development time, number of hand-tuned parameters, or RL learning curves against manually built equivalents. Without those numbers it is hard to know whether the generated families actually help agents converge or just shift the manual work elsewhere.

This is for software engineers and applied RL researchers who build training setups and want ideas for automating variant generation. It is coherent on its own terms and shows honest engagement with the problem, so it deserves a serious referee who will read the full details and ask for the missing comparisons.

Referee Report

2 major / 2 minor

Summary. The paper proposes a model-driven approach to generate families of RL training environments via a hybrid genetic algorithm that encodes mutations and constraints as model transformations executed by a model transformation engine. The approach is demonstrated for soundness on a wildfire mitigation scenario and on curriculum learning, with the claim that it reduces the labor-intensive manual development of environment variants.

Significance. If the generated families demonstrably support RL convergence with substantially lower manual effort than conventional methods, the work would address a practical bottleneck in RL engineering. The model-transformation framing and hybrid GA are technically coherent, but the significance hinges on evidence that the automation yields measurable gains in development effort or learning performance; without such evidence the contribution remains primarily methodological.

major comments (2)

[Evaluation] Evaluation section (wildfire mitigation and curriculum learning demonstrations): the paper shows that the hybrid GA produces environment families but reports no quantitative metrics comparing development effort (e.g., person-hours, number of hand-tuned parameters), RL convergence (success rate, sample efficiency, learning curves), or post-generation manual intervention against manually constructed baseline families. This absence directly undermines the central claim that the transformation-based approach alleviates labor-intensive manual development.
[Approach] Approach description (hybrid GA and model-transformation operationalization): while mutations and constraints are expressed as model transformations, the manuscript does not specify how domain-specific RL convergence requirements (e.g., reward shaping, state-space coverage) are encoded or validated within the transformation rules, leaving open whether substantial expert knowledge is still required to define the initial metamodel and constraints.
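
The RL metrics requested in the first comment could be computed along the following lines. This is a hedged sketch: the two definitions (success rate as the fraction of successful evaluation episodes, sample efficiency as the number of episodes until a moving-average reward threshold is first met) are common conventions chosen for illustration, and the reward curves are made-up toy numbers, not results from the paper.

```python
def success_rate(outcomes):
    """Fraction of evaluation episodes that ended in success."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

def episodes_to_threshold(rewards, threshold, window=3):
    """Number of episodes elapsed when the mean reward over the last
    `window` episodes first reaches `threshold`; None if it never does."""
    for end in range(window, len(rewards) + 1):
        if sum(rewards[end - window:end]) / window >= threshold:
            return end
    return None

# Toy per-episode rewards for two hypothetical training runs.
curve_a = [0, 2, 5, 8, 9, 9, 10]
curve_b = [0, 1, 2, 4, 6, 8, 9]

rate = success_rate([True, False, True, True])   # 0.75
fast = episodes_to_threshold(curve_a, 7)          # 5
slow = episodes_to_threshold(curve_b, 7)          # 7
```

Reporting such numbers for generated and manually built families side by side is the kind of baseline comparison the comment asks for.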

minor comments (2)

[Abstract] The abstract and introduction use the term 'soundness' for the demonstrations; clarify whether this refers to syntactic validity of generated environments, semantic correctness for RL, or empirical convergence.
Figure captions and pseudocode for the hybrid GA should explicitly label the population-based and local-search components to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the evaluation would benefit from quantitative comparisons and will revise the manuscript to include them. We also clarify the encoding of domain-specific requirements in the approach description.

Referee: [Evaluation] Evaluation section (wildfire mitigation and curriculum learning demonstrations): the paper shows that the hybrid GA produces environment families but reports no quantitative metrics comparing development effort (e.g., person-hours, number of hand-tuned parameters), RL convergence (success rate, sample efficiency, learning curves), or post-generation manual intervention against manually constructed baseline families. This absence directly undermines the central claim that the transformation-based approach alleviates labor-intensive manual development.

Authors: We acknowledge that the demonstrations focus on soundness and feasibility without direct quantitative baselines. To strengthen the central claim, we will revise the evaluation section to report metrics on development effort (person-hours and hand-tuned parameters) and RL performance (success rates, sample efficiency, learning curves) against manually constructed families, along with any post-generation interventions required.

revision: yes

Referee: [Approach] Approach description (hybrid GA and model-transformation operationalization): while mutations and constraints are expressed as model transformations, the manuscript does not specify how domain-specific RL convergence requirements (e.g., reward shaping, state-space coverage) are encoded or validated within the transformation rules, leaving open whether substantial expert knowledge is still required to define the initial metamodel and constraints.

Authors: Domain expertise is required to define the initial metamodel and constraints that capture RL requirements such as reward shaping and state-space coverage. Once established, the transformations and hybrid GA automate variant generation and validation. We will revise the approach section to explicitly detail how these RL-specific elements are encoded in the transformation rules and validated during search, clarifying the division between initial expert input and subsequent automation.

revision: yes

Circularity Check

0 steps flagged

No circularity; methodological proposal with external demonstration

The paper presents a model-driven engineering method using a hybrid genetic algorithm operationalized via model transformations to generate families of RL environments. It demonstrates the prototype on wildfire mitigation and curriculum learning scenarios. No equations, fitted parameters, predictions derived from inputs, uniqueness theorems, or self-citation chains appear in the provided text. The central claim rests on the soundness of the prototype tool and empirical demonstration rather than any self-referential reduction of outputs to inputs by construction. This is a standard non-circular software engineering contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no free parameters, axioms, or invented entities are described.

pith-pipeline@v0.9.1-grok · 5706 in / 915 out tokens · 39073 ms · 2026-06-26T16:11:03.040245+00:00 · methodology

Reference graph

Works this paper leans on

84 extracted references · 33 canonical work pages · 2 internal anchors

[1]

Hani Abdeen, Dániel Varró, Houari Sahraoui, András Szabolcs Nagy, Csaba Debreceni, Ábel Hegedüs, and Ákos Horváth. 2014. Multi-objective optimization in rule-based design space exploration. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering (ASE ’14). ACM, 289–300. doi:10.1145/2642937.2643005
[2]

OpenAI: Marcin Andrychowicz et al. 2020. Learning dexterous in-hand manipulation. The International Journal of Robotics Research 39, 1 (2020), 3–20. doi:10.1177/0278364919887447
[3]

Angela Barriga, Rogardt Heldal, Adrian Rutle, and Ludovico Iovino. 2022. PARMOREL: a framework for customizable model repair. Soft. Sys. Mod. 21, 5 (2022), 1739–1762.
[4]

Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. 2009. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML ’09). ACM, 41–48. doi:10.1145/1553374.1553380
[5]

Gábor Bergmann, Istvan David, Ábel Hegedüs, Ákos Horváth, István Ráth, Zoltán Ujhelyi, and Dániel Varró. 2015. VIATRA 3: A Reactive Model Transformation Platform. In Theory and Practice of Model Transformations - 8th International Conference, ICMTSTAF 2015, L’Aquila, Italy, July 20-21, 2015. Proceedings (LNCS, Vol. 9152). Springer, 101–110. doi:10.1007/978-3-319-21155-8_8
[6]

Dimitris Bertsimas and John Tsitsiklis. 1993. Simulated Annealing. Statist. Sci. 8, 1 (1993), 10–15. doi:10.1214/ss/1177011077
[7]

Alexandru Burdusel, Steffen Zschaler, and Stefan John. 2021. Automatic generation of atomic multiplicity-preserving search operators for search-based model engineering. Soft. Sys. Mod. 20, 6 (2021), 1857–1887.
[8]

Thomas Chaffre, Julien Moras, Adrien Chan-Hon-Tong, and Julien Marzat. 2020. Sim-to-Real Transfer with Incremental Environment Complexity for Reinforcement Learning of Depth-Based Robot Navigation. In Proceedings of the 17th International Conference on Informatics, Automation and Robotics, ICINCO 2020. 314–323. https://ensta.hal.science/hal-02958155
[9]

Karl Cobbe, Chris Hesse, Jacob Hilton, and John Schulman. 2020. Leveraging Procedural Generation to Benchmark Reinforcement Learning. In Proc of the 37th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 119). PMLR, 2048–2056.
[10]

Frank Cowell. 2011. Measuring Inequality (3 ed.). Oxford University Press, London, England.
[11]

Kyanna Dagenais and Istvan David. 2025. Complex Model Transformations by Reinforcement Learning with Uncertain Human Guidance. In 2025 ACM/IEEE 28th International Conference on Model Driven Engineering Languages and Systems (MODELS). doi:10.1109/MODELS67397.2025.00025
[12]

Istvan David and Eugene Syriani. 2022. DEVS Model Construction as a Reinforcement Learning Problem. In 2022 Annual Modeling and Simulation Conference (ANNSIM). IEEE, 30–41. doi:10.23919/ANNSIM55834.2022.9859369
[13]

Istvan David and Eugene Syriani. 2024. Automated Inference of Simulators in Digital Twins. CRC Press, Chapter 8, 122–148. doi:10.1201/9781003425724-11
[14]

Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, and Sergey Levine. 2020. Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design. In Advances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc., 13049–13061.
[15]

Agoston E Eiben and James E Smith. 2015. Introduction to evolutionary computing. Springer.
[16]

Martin Eisenberg, Hans-Peter Pichler, Antonio Garmendia, and Manuel Wimmer
[17]

Towards Reinforcement Learning for In-Place Model Transformations. In 2021 ACM/IEEE 24th Intl Conf. on Model Driven Engineering Languages and Systems (MODELS). 82–88.
[18]

Tarek A El-Mihoub, Adrian A Hopgood, Lars Nolle, and Alan Battersby. [n. d.]. Hybrid Genetic Algorithms: A Review. ([n. d.])
[19]

Maged Elaasar, Nicolas Rouquette, David Wagner, Bentley James Oakes, Abdelwahab Hamou-Lhadj, and Mohammad Hamdaqa. 2023. openCAESAR: Balancing Agility and Rigor in Model-Based Systems Engineering. In 2023 ACM/IEEE International Conference on Model Driven Engineering Languages and Systems Companion (MODELS-C). 221–230. doi:10.1109/MODELS-C59198.2023.00051
[20]

Rafael Figueiredo Prudencio, Marcos R. O. A. Maximo, and Esther Luna Colombini
[21]

A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems. IEEE Trans Neural Netw Learn Syst 35, 8 (2024), 10237–10257. doi:10.1109/TNNLS.2023.3250269
[22]

Carlos Florensa, David Held, Markus Wulfmeier, Michael Zhang, and Pieter Abbeel. 2017. Reverse Curriculum Generation for Reinforcement Learning. In Proceedings of the 1st Annual Conference on Robot Learning (Proceedings of Machine Learning Research, Vol. 78). PMLR, 482–495.
[23]

Nicola Gatto, Evgeny Kusmenko, and Bernhard Rumpe. 2019. Modeling Deep Reinforcement Learning Based Architectures for Cyber-Physical Systems. In 2019 ACM/IEEE 22nd International Conference on Model Driven Engineering Languages and Systems Companion. 196–202. doi:10.1109/MODELS-C.2019.00033
[24]

Timothy Hospedales et al. 2022. Meta-Learning in Neural Networks: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 9 (2022), 5149–
[25]

doi:10.1109/TPAMI.2021.3079209
[26]

Mengkang Hu et al. 2025. AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1 (KDD ’25). ACM, 496–507. doi:10.1145/3690624.3709321
[27]

Xuemin Hu, Shen Li, Tingyu Huang, Bo Tang, Rouxing Huai, and Long Chen
[28]

How Simulation Helps Autonomous Driving: A Survey of Sim2real, Digital Twins, and Parallel Intelligence. IEEE Transactions on Intelligent Vehicles 9, 1 (2024), 593–612. doi:10.1109/TIV.2023.3312777
[29]

Stefan John, Alexandru Burdusel, Robert Bill, Daniel Struber, Gabriele Taentzer, Steffen Zschaler, and Manuel Wimmer. 2019. Searching for optimal models: Comparing two encoding approaches. In 12th International Conference on Model Transformations ICMT 2019. 1–22.
[30]

Stefan John, Jens Kosiol, Leen Lambers, and Gabriele Taentzer. 2023. A graph-based framework for model-driven optimization facilitating impact analysis of mutation operator properties. Soft. Sys. Mod. 22, 4 (2023), 1281–1318.
[31]

Lawrence Johnson, Georgios N Yannakakis, and Julian Togelius. 2010. Cellular automata for real-time generation of infinite cave levels. In Proceedings of the 2010 Workshop on Procedural Content Generation in Games. 1–4.
[32]

Joerg Kienzle et al. 2023. Global Decision Making Over Deep Variability in Feedback-Driven Software Development. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE ’22). ACM, Article 178, 6 pages. doi:10.1145/3551349.3559551
[33]

Taewoo Kim, Minsu Jang, and Jaehong Kim. 2021. A survey on simulation environments for reinforcement learning. In 2021 18th International Conference on Ubiquitous Robots (UR). IEEE, 63–67.
[34]

Thomas Kühne, Gergely Mezei, Eugene Syriani, Hans Vangheluwe, and Manuel Wimmer. 2010. Explicit Transformation Modeling. In Models in Software Engineering. Springer, 240–255.
[35]

Evgeny Kusmenko et al. 2022. A Model-Driven Generative Self Play-Based Toolchain for Developing Games and Players. In Proceedings of the 21st ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE 2022). ACM, 95–107. doi:10.1145/3564719.3568687
[36]

Marta Kwiatkowska and Xiyue Zhang. 2023. When to Trust AI: Advances and Challenges for Certification of Neural Networks. In 2023 18th Conference on Computer Science and Intelligence Systems (FedCSIS). 25–37. doi:10.15439/2023F2324
[37]

Hartmut Lackner and Bernd-Holger Schlingloff. 2017. Chapter Four - Advances in Testing Software Product Lines. Advances in Computers, Vol. 107. Elsevier, 157–217. doi:10.1016/bs.adcom.2017.07.001
[38]

José Lameh, Alexandra Dubray, and Marija Jankovic. 2025. Modeling variability in product line engineering (PLE) for systems engineering (SE). Proceedings of the Design Society 5 (2025), 2491–2500. doi:10.1017/pds.2025.10263
[39]

William Liang, Sam Wang, Hung-Ju Wang, Osbert Bastani, Dinesh Jayaraman, and Yecheng Jason Ma. 2024. Eurekaverse: Environment curriculum generation via large language models. arXiv preprint arXiv:2411.01775 (2024)
[40]

Sotirios Liaskos, Shakil M. Khan, John Mylopoulos, and Reza Golipour. 2025. Model-Driven Design and Generation of Training Simulators for Reinforcement Learning. In Conceptual Modeling. Springer, 170–191.
[41]

Jiashuo Liu, Zheyan Shen, Yue He, Xingxuan Zhang, Renzhe Xu, Han Yu, and Peng Cui. 2023. Towards Out-Of-Distribution Generalization: A Survey. arXiv:2108.13624 [cs.LG] https://arxiv.org/abs/2108.13624
[42]

Xiaoran Liu and Istvan David. 2025. AI Simulation by Digital Twins: Systematic Survey, Reference Framework, and Mapping to a Standardized Architecture. Software and Systems Modeling (2025). doi:10.1007/s10270-025-01306-0
[43]

Xiaoran Liu and Istvan David. 2026. A Reference Architecture of Reinforcement Learning Frameworks. In 2026 IEEE 23rd International Conference on Software Architecture (ICSA). doi:10.1109/ICSA66085.2026.00016
[44]

Viktor Makoviychuk et al. 2021. Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning. doi:10.48550/arXiv.2108.10470
[45]

Dhruv Malik, Yuanzhi Li, and Pradeep Ravikumar. 2021. When Is Generalizable Reinforcement Learning Tractable?. In Advances in Neural Information Processing Systems, Vol. 34. Curran Associates, Inc., 8032–8045.
[46]

Tambet Matiisen, Avital Oliver, Taco Cohen, and John Schulman. 2019. Teacher–student curriculum learning. IEEE transactions on neural networks and learning systems 31, 9 (2019), 3732–3740.
[47]

Reginald McLean, Evangelos Chatzaroulas, Luc McCutcheon, Frank Röder, Tianhe Yu, Zhanpeng He, K.R. Zentner, Ryan Julian, J K Terry, Isaac Woungang, Nariman Farsad, and Pablo Samuel Castro. 2025. Meta-World+: An Improved, Standardized, RL Benchmark. In The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track.
[48]

Marjan Mernik, Jan Heering, and Anthony M. Sloane. 2005. When and how to develop domain-specific languages. ACM Comput. Surv. 37, 4 (Dec. 2005), 316–344. doi:10.1145/1118890.1118892
[49]

Tim Molderez, Bjarno Oeyen, Coen De Roover, and Wolfgang De Meuter. 2019. Marlon: A domain-specific language for multi-agent reinforcement learning on networks. In Proc of the 34th ACM/SIGAPP Symposium on Applied Computing. ACM, 1322–1329. doi:10.1145/3297280.3297413
[50]

Pablo Moscato et al. [n. d.]. On evolution, search, optimization, genetic algorithms and martial arts: Towards memetic algorithms. ([n. d.]). MODELS ’26, October 4–9, 2026, Malaga, Spain Liu and David
[51]

Dirk Muthig and Colin Atkinson. 2002. Model-Driven Product Line Architectures. In Software Product Lines. Springer, 110–129.
[52]

Sanmit Narvekar, Bei Peng, Matteo Leonetti, Jivko Sinapov, Matthew E. Taylor, and Peter Stone. 2020. Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey. J Machine Learning Research 21, 181 (2020), 1–50.
[53]

Sanmit Narvekar, Jivko Sinapov, Matteo Leonetti, and Peter Stone. 2016. Source task creation for curriculum learning. In Proceedings of the 2016 international conference on autonomous agents & multiagent systems. 566–574.
[54]

Hira Naveed, Chetan Arora, Hourieh Khalajzadeh, John Grundy, and Omar Haggag. 2024. Model driven engineering for machine learning components: A systematic literature review. Inf Softw Technol 169 (2024), 107423.
[55]

Evangelos Ntentos, Stephen John Warnett, and Uwe Zdun. 2024. Supporting architectural decision making on training strategies in reinforcement learning architectures. In 21st Intl Conf on Software Architecture (ICSA). IEEE, 90–100.
[56]

Maria Joao Varanda Pereira, Joao Fonseca, and Pedro Rangel Henriques. 2016. Ontological approach for DSL development. Computer Languages, Systems & Structures 45 (2016), 35–52.
[57]

Andrei Pitkevich and Ilya Makarov. 2024. A Survey on Sim-to-Real Transfer Methods for Robotic Manipulation. In IEEE Intl Symposium on Intelligent Systems and Informatics (SISY). 000259–000266. doi:10.1109/SISY62279.2024.10737545
[58]

Martin L Puterman. 1990. Markov decision processes. Handbooks in operations research and management science 2 (1990), 331–434.
[59]

Sheldon M Ross. 2022. Simulation. Academic Press.
[60]

Oszkár Semeráth, Aren A Babikian, Boqi Chen, Chuning Li, Kristóf Marussy, Gábor Szárnyas, and Dániel Varró. 2021. Automated generation of consistent, diverse and structurally realistic graph models. Soft. Sys. Mod. 20, 5 (2021), 1713–1734. doi:10.1007/s10270-021-00884-z
[61]

Oszkár Semeráth, Rebeka Farkas, Gábor Bergmann, and Dániel Varró. 2020. Diversity of graph models and graph generators in mutation testing. Int J Softw Tools Technol Transf 22, 1 (2020), 57–78. doi:10.1007/s10009-019-00530-6
[62]

C. E. Shannon. 1948. A mathematical theory of communication. The Bell System Technical Journal 27, 3 (1948), 379–423. doi:10.1002/j.1538-7305.1948.tb01338.x
[63]

Ronald W. Shephard and Rolf Färe. 1974. The Law of Diminishing Returns. In Production Theory. Springer, 287–318. doi:10.1007/978-3-642-80864-7_17
[64]

Daniele F Silva, Rafael P Torchelsen, and Marilton S Aguiar. 2025. Procedural game level generation with GANs: potential, weaknesses, and unresolved challenges in the literature. Multimedia Tools and Applications (2025), 1–27.
[65]

Natalie Sinani et al. 2024. Towards a Domain-Specific Modelling Environment for Reinforcement Learning. arXiv preprint arXiv:2410.09368 (2024)
[66]

Shagun Sodhani, Amy Zhang, and Joelle Pineau. 2021. Multi-task reinforcement learning with context-based representations. In International conference on machine learning. PMLR, 9767–9779.
[67]

Petru Soviany et al. 2022. Curriculum Learning: A Survey. International Journal of Computer Vision 130, 6 (2022), 1526–1565. doi:10.1007/s11263-022-01611-x
[68]

Richard S Sutton and Andrew G Barto. 1998. Reinforcement learning: An introduction. MIT Press, Cambridge.
[69]

Jordan Terry et al. 2021. PettingZoo: Gym for Multi-Agent Reinforcement Learning. In Advances in Neural Information Processing Systems, Vol. 34. Curran Associates, Inc., 15032–15043.
[70]

Krishnaiyan Thulasiraman and Madisetti NS Swamy. 2011. Graphs: theory and algorithms. John Wiley & Sons.
[71]

Massimo Tisi, Frédéric Jouault, Piero Fraternali, Stefano Ceri, and Jean Bézivin
[72]

On the Use of Higher-Order Model Transformations. In Proceedings of the 5th European Conference on Model Driven Architecture - Foundations and Applications (ECMDA-FA ’09). Springer, 18–33. doi:10.1007/978-3-642-02674-4_3
[73]

T.M. Cover and Joy A Thomas. 1991. Elements of Information Theory (99 ed.). John Wiley & Sons, Nashville, TN.
[74]

Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. 2017. Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, 23–30.
[75]

Julian Togelius, Alex J Champandard, Pier Luca Lanzi, Michael Mateas, Ana Paiva, Mike Preuss, and Kenneth O Stanley. 2013. Procedural content generation: Goals, challenges and actionable steps.
[76]

Mark Towers et al. 2025. Gymnasium: A Standard Interface for Reinforcement Learning Environments. doi:10.48550/arXiv.2407.17032
[77]

Y.R. Tsoy. 2003. The influence of population size and search time limit on genetic algorithm. In 7th Korea-Russia International Symposium on Science and Technology, Proceedings KORUS 2003. (IEEE Cat. No.03EX737), Vol. 3. 181–187 vol.3
[78]

Dejin Wang and Seyede Fatemeh Ghoreishi. 2025. RGDR: Reward-Guided Domain Randomization for Autonomous Driving. In 2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC 2025), IEEE.
[79]

Xin Wang et al. 2022. A Survey on Curriculum Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 9 (2022), 4555–4576. doi:10.1109/TPAMI.2021.3069908
[80]

Christopher JCH Watkins and Peter Dayan. 1992. Q-learning. Machine learning 8, 3 (1992), 279–292.


[1] Hani Abdeen, Dániel Varró, Houari Sahraoui, András Szabolcs Nagy, Csaba Debreceni, Ábel Hegedüs, and Ákos Horváth. 2014. Multi-objective optimization in rule-based design space exploration. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering (ASE ’14). ACM, 289–300. doi:10.1145/2642937.2643005

[2] OpenAI: Marcin Andrychowicz et al. 2020. Learning dexterous in-hand manipulation. The International Journal of Robotics Research 39, 1 (2020), 3–20. doi:10.1177/0278364919887447

[3] Angela Barriga, Rogardt Heldal, Adrian Rutle, and Ludovico Iovino. 2022. PARMOREL: a framework for customizable model repair. Soft. Sys. Mod. 21, 5 (2022), 1739–1762

[4] Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. 2009. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML ’09). ACM, 41–48. doi:10.1145/1553374.1553380

[5] Gábor Bergmann, Istvan David, Ábel Hegedüs, Ákos Horváth, István Ráth, Zoltán Ujhelyi, and Dániel Varró. 2015. VIATRA 3: A Reactive Model Transformation Platform. In Theory and Practice of Model Transformations - 8th International Conference, ICMTSTAF 2015, L’Aquila, Italy, July 20-21, 2015. Proceedings (LNCS, Vol. 9152). Springer, 101–110. doi:10.1007/978-3-319-21155-8_8

[6] Dimitris Bertsimas and John Tsitsiklis. 1993. Simulated Annealing. Statist. Sci. 8, 1 (1993), 10–15. doi:10.1214/ss/1177011077

[7] Alexandru Burdusel, Steffen Zschaler, and Stefan John. 2021. Automatic generation of atomic multiplicity-preserving search operators for search-based model engineering. Soft. Sys. Mod. 20, 6 (2021), 1857–1887

[8] Thomas Chaffre, Julien Moras, Adrien Chan-Hon-Tong, and Julien Marzat. 2020. Sim-to-Real Transfer with Incremental Environment Complexity for Reinforcement Learning of Depth-Based Robot Navigation. In Proceedings of the 17th International Conference on Informatics, Automation and Robotics, ICINCO 2020. 314–323. https://ensta.hal.science/hal-02958155

[9] Karl Cobbe, Chris Hesse, Jacob Hilton, and John Schulman. 2020. Leveraging Procedural Generation to Benchmark Reinforcement Learning. In Proc of the 37th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 119). PMLR, 2048–2056

[10] Frank Cowell. 2011. Measuring Inequality (3 ed.). Oxford University Press, London, England

[11] Kyanna Dagenais and Istvan David. 2025. Complex Model Transformations by Reinforcement Learning with Uncertain Human Guidance. In 2025 ACM/IEEE 28th International Conference on Model Driven Engineering Languages and Systems (MODELS). doi:10.1109/MODELS67397.2025.00025

[12] Istvan David and Eugene Syriani. 2022. DEVS Model Construction as a Reinforcement Learning Problem. In 2022 Annual Modeling and Simulation Conference (ANNSIM). IEEE, 30–41. doi:10.23919/ANNSIM55834.2022.9859369

[13] Istvan David and Eugene Syriani. 2024. Automated Inference of Simulators in Digital Twins. CRC Press, Chapter 8, 122–148. doi:10.1201/9781003425724-11

[14] Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, and Sergey Levine. 2020. Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design. In Advances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc., 13049–13061

[15] Agoston E Eiben and James E Smith. 2015. Introduction to evolutionary computing. Springer

[16] Martin Eisenberg, Hans-Peter Pichler, Antonio Garmendia, and Manuel Wimmer. 2021. Towards Reinforcement Learning for In-Place Model Transformations. In 2021 ACM/IEEE 24th Intl Conf. on Model Driven Engineering Languages and Systems (MODELS). 82–88

[18] Tarek A El-Mihoub, Adrian A Hopgood, Lars Nolle, and Alan Battersby. [n. d.]. Hybrid Genetic Algorithms: A Review. ([n. d.])

[19] Maged Elaasar, Nicolas Rouquette, David Wagner, Bentley James Oakes, Abdelwahab Hamou-Lhadj, and Mohammad Hamdaqa. 2023. openCAESAR: Balancing Agility and Rigor in Model-Based Systems Engineering. In 2023 ACM/IEEE International Conference on Model Driven Engineering Languages and Systems Companion (MODELS-C). 221–230. doi:10.1109/MODELS-C59198.2023.00051

[20] Rafael Figueiredo Prudencio, Marcos R. O. A. Maximo, and Esther Luna Colombini. 2024. A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems. IEEE Trans Neural Netw Learn Syst 35, 8 (2024), 10237–10257. doi:10.1109/TNNLS.2023.3250269

[22] Carlos Florensa, David Held, Markus Wulfmeier, Michael Zhang, and Pieter Abbeel. 2017. Reverse Curriculum Generation for Reinforcement Learning. In Proceedings of the 1st Annual Conference on Robot Learning (Proceedings of Machine Learning Research, Vol. 78). PMLR, 482–495

[23] Nicola Gatto, Evgeny Kusmenko, and Bernhard Rumpe. 2019. Modeling Deep Reinforcement Learning Based Architectures for Cyber-Physical Systems. In 2019 ACM/IEEE 22nd International Conference on Model Driven Engineering Languages and Systems Companion. 196–202. doi:10.1109/MODELS-C.2019.00033

[24] Timothy Hospedales et al. 2022. Meta-Learning in Neural Networks: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 9 (2022), 5149–. doi:10.1109/TPAMI.2021.3079209

[26] Mengkang Hu et al. 2025. AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1 (KDD ’25). ACM, 496–507. doi:10.1145/3690624.3709321

[27] Xuemin Hu, Shen Li, Tingyu Huang, Bo Tang, Rouxing Huai, and Long Chen. 2024. How Simulation Helps Autonomous Driving: A Survey of Sim2real, Digital Twins, and Parallel Intelligence. IEEE Transactions on Intelligent Vehicles 9, 1 (2024), 593–612. doi:10.1109/TIV.2023.3312777

[29] Stefan John, Alexandru Burdusel, Robert Bill, Daniel Struber, Gabriele Taentzer, Steffen Zschaler, and Manuel Wimmer. 2019. Searching for optimal models: Comparing two encoding approaches. In 12th International Conference on Model Transformations ICMT 2019. 1–22

[30] Stefan John, Jens Kosiol, Leen Lambers, and Gabriele Taentzer. 2023. A graph-based framework for model-driven optimization facilitating impact analysis of mutation operator properties. Soft. Sys. Mod. 22, 4 (2023), 1281–1318

[31] Lawrence Johnson, Georgios N Yannakakis, and Julian Togelius. 2010. Cellular automata for real-time generation of infinite cave levels. In Proceedings of the 2010 Workshop on Procedural Content Generation in Games. 1–4

[32] Joerg Kienzle et al. 2023. Global Decision Making Over Deep Variability in Feedback-Driven Software Development. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE ’22). ACM, Article 178, 6 pages. doi:10.1145/3551349.3559551

[33] Taewoo Kim, Minsu Jang, and Jaehong Kim. 2021. A survey on simulation environments for reinforcement learning. In 2021 18th International Conference on Ubiquitous Robots (UR). IEEE, 63–67

[34] Thomas Kühne, Gergely Mezei, Eugene Syriani, Hans Vangheluwe, and Manuel Wimmer. 2010. Explicit Transformation Modeling. In Models in Software Engineering. Springer, 240–255

[35] Evgeny Kusmenko et al. 2022. A Model-Driven Generative Self Play-Based Toolchain for Developing Games and Players. In Proceedings of the 21st ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE 2022). ACM, 95–107. doi:10.1145/3564719.3568687

[36] Marta Kwiatkowska and Xiyue Zhang. 2023. When to Trust AI: Advances and Challenges for Certification of Neural Networks. In 2023 18th Conference on Computer Science and Intelligence Systems (FedCSIS). 25–37. doi:10.15439/2023F2324

[37] Hartmut Lackner and Bernd-Holger Schlingloff. 2017. Chapter Four - Advances in Testing Software Product Lines. Advances in Computers, Vol. 107. Elsevier, 157–217. doi:10.1016/bs.adcom.2017.07.001

[38] José Lameh, Alexandra Dubray, and Marija Jankovic. 2025. Modeling variability in product line engineering (PLE) for systems engineering (SE). Proceedings of the Design Society 5 (2025), 2491–2500. doi:10.1017/pds.2025.10263

[39] William Liang, Sam Wang, Hung-Ju Wang, Osbert Bastani, Dinesh Jayaraman, and Yecheng Jason Ma. 2024. Eurekaverse: Environment curriculum generation via large language models. arXiv preprint arXiv:2411.01775 (2024)

[40] Sotirios Liaskos, Shakil M. Khan, John Mylopoulos, and Reza Golipour. 2025. Model-Driven Design and Generation of Training Simulators for Reinforcement Learning. In Conceptual Modeling. Springer, 170–191

[41] Jiashuo Liu, Zheyan Shen, Yue He, Xingxuan Zhang, Renzhe Xu, Han Yu, and Peng Cui. 2023. Towards Out-Of-Distribution Generalization: A Survey. arXiv:2108.13624 [cs.LG] https://arxiv.org/abs/2108.13624

[42] Xiaoran Liu and Istvan David. 2025. AI Simulation by Digital Twins: Systematic Survey, Reference Framework, and Mapping to a Standardized Architecture. Software and Systems Modeling (2025). doi:10.1007/s10270-025-01306-0

[43] Xiaoran Liu and Istvan David. 2026. A Reference Architecture of Reinforcement Learning Frameworks. In 2026 IEEE 23rd International Conference on Software Architecture (ICSA). doi:10.1109/ICSA66085.2026.00016

[44] Viktor Makoviychuk et al. 2021. Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning. doi:10.48550/arXiv.2108.10470

[45] Dhruv Malik, Yuanzhi Li, and Pradeep Ravikumar. 2021. When Is Generalizable Reinforcement Learning Tractable?. In Advances in Neural Information Processing Systems, Vol. 34. Curran Associates, Inc., 8032–8045

[46] Tambet Matiisen, Avital Oliver, Taco Cohen, and John Schulman. 2019. Teacher–student curriculum learning. IEEE transactions on neural networks and learning systems 31, 9 (2019), 3732–3740

[47] Reginald McLean, Evangelos Chatzaroulas, Luc McCutcheon, Frank Röder, Tianhe Yu, Zhanpeng He, K.R. Zentner, Ryan Julian, J K Terry, Isaac Woungang, Nariman Farsad, and Pablo Samuel Castro. 2025. Meta-World+: An Improved, Standardized, RL Benchmark. In The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track

[48] Marjan Mernik, Jan Heering, and Anthony M. Sloane. 2005. When and how to develop domain-specific languages. ACM Comput. Surv. 37, 4 (Dec. 2005), 316–344. doi:10.1145/1118890.1118892

[49] Tim Molderez, Bjarno Oeyen, Coen De Roover, and Wolfgang De Meuter. 2019. Marlon: A domain-specific language for multi-agent reinforcement learning on networks. In Proc of the 34th ACM/SIGAPP Symposium on Applied Computing. ACM, 1322–1329. doi:10.1145/3297280.3297413

[50] Pablo Moscato et al. [n. d.]. On evolution, search, optimization, genetic algorithms and martial arts: Towards memetic algorithms. ([n. d.]).

MODELS ’26, October 4–9, 2026, Malaga, Spain. Liu and David

[51] Dirk Muthig and Colin Atkinson. 2002. Model-Driven Product Line Architectures. In Software Product Lines. Springer, 110–129

[52] Sanmit Narvekar, Bei Peng, Matteo Leonetti, Jivko Sinapov, Matthew E. Taylor, and Peter Stone. 2020. Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey. J Machine Learning Research 21, 181 (2020), 1–50

[53] Sanmit Narvekar, Jivko Sinapov, Matteo Leonetti, and Peter Stone. 2016. Source task creation for curriculum learning. In Proceedings of the 2016 international conference on autonomous agents & multiagent systems. 566–574

[54] Hira Naveed, Chetan Arora, Hourieh Khalajzadeh, John Grundy, and Omar Haggag. 2024. Model driven engineering for machine learning components: A systematic literature review. Inf Softw Technol 169 (2024), 107423

[55] Evangelos Ntentos, Stephen John Warnett, and Uwe Zdun. 2024. Supporting architectural decision making on training strategies in reinforcement learning architectures. In 21st Intl Conf on Software Architecture (ICSA). IEEE, 90–100

[56] Maria Joao Varanda Pereira, Joao Fonseca, and Pedro Rangel Henriques. 2016. Ontological approach for DSL development. Computer Languages, Systems & Structures 45 (2016), 35–52

[57] Andrei Pitkevich and Ilya Makarov. 2024. A Survey on Sim-to-Real Transfer Methods for Robotic Manipulation. In IEEE Intl Symposium on Intelligent Systems and Informatics (SISY). 000259–000266. doi:10.1109/SISY62279.2024.10737545

[58] Martin L Puterman. 1990. Markov decision processes. Handbooks in operations research and management science 2 (1990), 331–434

[59] Sheldon M Ross. 2022. Simulation. Academic Press

[60] Oszkár Semeráth, Aren A Babikian, Boqi Chen, Chuning Li, Kristóf Marussy, Gábor Szárnyas, and Dániel Varró. 2021. Automated generation of consistent, diverse and structurally realistic graph models. Soft. Sys. Mod. 20, 5 (2021), 1713–1734. doi:10.1007/s10270-021-00884-z

[61] Oszkár Semeráth, Rebeka Farkas, Gábor Bergmann, and Dániel Varró. 2020. Diversity of graph models and graph generators in mutation testing. Int J Softw Tools Technol Transf 22, 1 (2020), 57–78. doi:10.1007/s10009-019-00530-6

[62] C. E. Shannon. 1948. A mathematical theory of communication. The Bell System Technical Journal 27, 3 (1948), 379–423. doi:10.1002/j.1538-7305.1948.tb01338.x

[63] Ronald W. Shephard and Rolf Färe. 1974. The Law of Diminishing Returns. In Production Theory. Springer, 287–318. doi:10.1007/978-3-642-80864-7_17

[64] Daniele F Silva, Rafael P Torchelsen, and Marilton S Aguiar. 2025. Procedural game level generation with GANs: potential, weaknesses, and unresolved challenges in the literature. Multimedia Tools and Applications (2025), 1–27

[65] Natalie Sinani et al. 2024. Towards a Domain-Specific Modelling Environment for Reinforcement Learning. arXiv preprint arXiv:2410.09368 (2024)

[66] Shagun Sodhani, Amy Zhang, and Joelle Pineau. 2021. Multi-task reinforcement learning with context-based representations. In International conference on machine learning. PMLR, 9767–9779

[67] Petru Soviany et al. 2022. Curriculum Learning: A Survey. International Journal of Computer Vision 130, 6 (2022), 1526–1565. doi:10.1007/s11263-022-01611-x

[68] Richard S Sutton and Andrew G Barto. 1998. Reinforcement learning: An introduction. MIT Press, Cambridge

[69] Jordan Terry et al. 2021. PettingZoo: Gym for Multi-Agent Reinforcement Learning. In Advances in Neural Information Processing Systems, Vol. 34. Curran Associates, Inc., 15032–15043

[70] Krishnaiyan Thulasiraman and Madisetti NS Swamy. 2011. Graphs: theory and algorithms. John Wiley & Sons

[71] Massimo Tisi, Frédéric Jouault, Piero Fraternali, Stefano Ceri, and Jean Bézivin. 2009. On the Use of Higher-Order Model Transformations. In Proceedings of the 5th European Conference on Model Driven Architecture - Foundations and Applications (ECMDA-FA ’09). Springer, 18–33. doi:10.1007/978-3-642-02674-4_3

[73] T.M. Cover and Joy A Thomas. 1991. Elements of Information Theory (99 ed.). John Wiley & Sons, Nashville, TN

[74] Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. 2017. Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, 23–30

[75] Julian Togelius, Alex J Champandard, Pier Luca Lanzi, Michael Mateas, Ana Paiva, Mike Preuss, and Kenneth O Stanley. 2013. Procedural content generation: Goals, challenges and actionable steps

[76] Mark Towers et al. 2025. Gymnasium: A Standard Interface for Reinforcement Learning Environments. doi:10.48550/arXiv.2407.17032

[77] Y.R. Tsoy. 2003. The influence of population size and search time limit on genetic algorithm. In 7th Korea-Russia International Symposium on Science and Technology, Proceedings KORUS 2003. (IEEE Cat. No.03EX737), Vol. 3. 181–187 vol.3

[78] Dejin Wang and Seyede Fatemeh Ghoreishi. 2025. RGDR: Reward-Guided Domain Randomization for Autonomous Driving. In 2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC 2025), IEEE

[79] Xin Wang et al. 2022. A Survey on Curriculum Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 9 (2022), 4555–4576. doi:10.1109/TPAMI.2021.3069908

[80] Christopher JCH Watkins and Peter Dayan. 1992. Q-learning. Machine learning 8, 3 (1992), 279–292