pith. sign in

arxiv: 2606.20324 · v1 · pith:TKXVOTXQnew · submitted 2026-06-18 · 💻 cs.SE · cs.LG

A Model-Driven Approach for Developing Families of Reinforcement Learning Environments

Pith reviewed 2026-06-26 16:11 UTC · model grok-4.3

classification 💻 cs.SE cs.LG
keywords model-driven engineeringreinforcement learninggenetic algorithmsenvironment generationmodel transformationscurriculum learningwildfire mitigation
0
0 comments X

The pith

A hybrid genetic algorithm generates families of reinforcement learning environments by treating mutations and constraints as model transformations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to replace the manual, error-prone creation of multiple similar but varied training environments for reinforcement learning agents with an automated model-driven process. A hybrid genetic algorithm performs the search, where changes to environments and rules limiting those changes are written as model transformations run by an existing transformation engine. This produces families of environments that support agent convergence, as shown in a wildfire mitigation example and in curriculum learning setups that gradually increase difficulty. A reader would care because realistic RL tasks often fail to converge without many environment variants, yet building those variants by hand does not scale to complex problems.

Core claim

The paper claims that a hybrid genetic algorithm, which combines population-based global search with heuristic local search, can generate families of RL environments when mutations and constraints are expressed as model transformations and executed by a state-of-the-art model transformation engine, with soundness shown through application to wildfire mitigation and curriculum learning.

What carries the argument

The hybrid genetic algorithm that operationalizes mutations and constraints as model transformations executed by a model transformation engine.

If this is right

  • Development of RL environment families shifts from labor-intensive manual work to an automated search process.
  • Environment variants become available at scale for any RL problem that needs multiple similar but distinct training settings.
  • Curriculum learning setups can be produced by generating sequences of environments with controlled increases in difficulty.
  • The same transformation-based search applies to other domains that require families of simulation environments, such as wildfire mitigation training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be combined with existing RL frameworks so that generated environments are directly usable for training runs.
  • Model-based representation of environments might allow additional checks for safety or consistency before training begins.
  • The method could be tested on other RL domains, such as robotics or game playing, to see whether the same transformation encoding works without redesign.
  • Extending the search objective to include measured convergence speed or sample efficiency would produce environment families tuned for faster learning.

Load-bearing premise

That expressing mutations and constraints as model transformations is sufficient to produce environment families that support RL convergence without requiring substantial additional manual intervention or domain-specific tuning.

What would settle it

Running the generated environment families on the wildfire mitigation task and finding that RL agents fail to converge to meaningful behavior unless the environments or search process receive extra manual adjustments.

Figures

Figures reproduced from arXiv: 2606.20324 by Istvan David, Xiaoran Liu.

Figure 1
Figure 1. Figure 1: Example Burning Forest instance and its metamodel 2.1.3 Curriculum learning. In this work, we use curriculum learn￾ing (CL) as the representative example of learning paradigms that rely on families of environments. CL is a training strategy in which the learning process is organized as a sequence of training criteria that evolve over time [74], typically from simpler to more complex settings [4]. This allo… view at source ↗
Figure 2
Figure 2. Figure 2: Overview of the approach Example Defining the initial environment In the running example, the environment model defines the Forest composed of Tiles, specialized into BurningTile, RoadTile, VegetationTile, WaterTile, StartTile, and GoalTile, with a FireTruck agent whose currentState references a single tile, as shown in Fig. 1b. The expert specifies the initial environment model as a Burning Forest contain… view at source ↗
Figure 3
Figure 3. Figure 3: A family of Burning Forest environments in CL [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Environment generation process 4.4 Generating families of environments In Step 4②✐, a GA generates the family of environments that meet the objective function and respect the defined constraints. ( [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Generated curriculum of increasing complexity [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Cumulative reward with different prefixes [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Cumulative reward across all curricula [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗
read the original abstract

Virtual training environments are software-intensive systems in which reinforcement learning (RL) agents learn, adapt, and demonstrate meaningful behavior. Virtual training environments offer a safe and cost-efficient alternative to training agents in real-world settings. However, to converge, most realistic RL problems require training in multiple, mostly similar but slightly different environments - i.e., families of environment variants. The typical development process of environment families is a labor-intensive and error-prone manual endeavor that does not scale well. To alleviate these issues, in this paper, we propose a model-driven approach for developing families of RL training environments. To obtain the family of environments, we develop an approach and prototype tool. In our approach, a hybrid genetic algorithm - a combination of population-based global search and heuristic local search - generates environment families. Mutations and constraints are expressed as model transformations and are operationalized into a search process by a state-of-the-art model transformation engine. We demonstrate the soundness of our approach in a wildfire mitigation scenario and curriculum learning - a particular learning paradigm that relies on environment families.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a model-driven approach to generate families of RL training environments via a hybrid genetic algorithm that encodes mutations and constraints as model transformations executed by a model transformation engine. The approach is demonstrated for soundness on a wildfire mitigation scenario and on curriculum learning, with the claim that it reduces the labor-intensive manual development of environment variants.

Significance. If the generated families demonstrably support RL convergence with substantially lower manual effort than conventional methods, the work would address a practical bottleneck in RL engineering. The model-transformation framing and hybrid GA are technically coherent, but the significance hinges on evidence that the automation yields measurable gains in development effort or learning performance; without such evidence the contribution remains primarily methodological.

major comments (2)
  1. [Evaluation] Evaluation section (wildfire mitigation and curriculum learning demonstrations): the paper shows that the hybrid GA produces environment families but reports no quantitative metrics comparing development effort (e.g., person-hours, number of hand-tuned parameters), RL convergence (success rate, sample efficiency, learning curves), or post-generation manual intervention against manually constructed baseline families. This absence directly undermines the central claim that the transformation-based approach alleviates labor-intensive manual development.
  2. [Approach] Approach description (hybrid GA and model-transformation operationalization): while mutations and constraints are expressed as model transformations, the manuscript does not specify how domain-specific RL convergence requirements (e.g., reward shaping, state-space coverage) are encoded or validated within the transformation rules, leaving open whether substantial expert knowledge is still required to define the initial metamodel and constraints.
minor comments (2)
  1. [Abstract] The abstract and introduction use the term 'soundness' for the demonstrations; clarify whether this refers to syntactic validity of generated environments, semantic correctness for RL, or empirical convergence.
  2. Figure captions and pseudocode for the hybrid GA should explicitly label the population-based and local-search components to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the evaluation would benefit from quantitative comparisons and will revise the manuscript to include them. We also clarify the encoding of domain-specific requirements in the approach description.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section (wildfire mitigation and curriculum learning demonstrations): the paper shows that the hybrid GA produces environment families but reports no quantitative metrics comparing development effort (e.g., person-hours, number of hand-tuned parameters), RL convergence (success rate, sample efficiency, learning curves), or post-generation manual intervention against manually constructed baseline families. This absence directly undermines the central claim that the transformation-based approach alleviates labor-intensive manual development.

    Authors: We acknowledge that the demonstrations focus on soundness and feasibility without direct quantitative baselines. To strengthen the central claim, we will revise the evaluation section to report metrics on development effort (person-hours and hand-tuned parameters) and RL performance (success rates, sample efficiency, learning curves) against manually constructed families, along with any post-generation interventions required. revision: yes

  2. Referee: [Approach] Approach description (hybrid GA and model-transformation operationalization): while mutations and constraints are expressed as model transformations, the manuscript does not specify how domain-specific RL convergence requirements (e.g., reward shaping, state-space coverage) are encoded or validated within the transformation rules, leaving open whether substantial expert knowledge is still required to define the initial metamodel and constraints.

    Authors: Domain expertise is required to define the initial metamodel and constraints that capture RL requirements such as reward shaping and state-space coverage. Once established, the transformations and hybrid GA automate variant generation and validation. We will revise the approach section to explicitly detail how these RL-specific elements are encoded in the transformation rules and validated during search, clarifying the division between initial expert input and subsequent automation. revision: yes

Circularity Check

0 steps flagged

No circularity; methodological proposal with external demonstration

full rationale

The paper presents a model-driven engineering method using a hybrid genetic algorithm operationalized via model transformations to generate families of RL environments. It demonstrates the prototype on wildfire mitigation and curriculum learning scenarios. No equations, fitted parameters, predictions derived from inputs, uniqueness theorems, or self-citation chains appear in the provided text. The central claim rests on the soundness of the prototype tool and empirical demonstration rather than any self-referential reduction of outputs to inputs by construction. This is a standard non-circular software engineering contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no free parameters, axioms, or invented entities are described.

pith-pipeline@v0.9.1-grok · 5706 in / 915 out tokens · 39073 ms · 2026-06-26T16:11:03.040245+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

84 extracted references · 33 canonical work pages · 2 internal anchors

  1. [1]

    Hani Abdeen, Dániel Varró, Houari Sahraoui, András Szabolcs Nagy, Csaba Debreceni, Ábel Hegedüs, and Ákos Horváth. 2014. Multi-objective optimization in rule-based design space exploration. InProceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering (ASE ’14). ACM, 289–300. doi:10.1145/2642937.2643005

  2. [2]

    OpenAI: Marcin Andrychowicz et al . 2020. Learning dexterous in-hand ma- nipulation.The International Journal of Robotics Research39, 1 (2020), 3–20. doi:10.1177/0278364919887447

  3. [3]

    Angela Barriga, Rogardt Heldal, Adrian Rutle, and Ludovico Iovino. 2022. PAR- MOREL: a framework for customizable model repair.Soft. Sys. Mod.21, 5 (2022), 1739–1762

  4. [4]

    Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. 2009. Curriculum learning. InProceedings of the 26th Annual International Conference on Machine Learning (ICML ’09). ACM, 41–48. doi:10.1145/1553374.1553380

  5. [5]

    Gábor Bergmann, Istvan David, Ábel Hegedüs, Ákos Horváth, István Ráth, Zoltán Ujhelyi, and Dániel Varró. 2015. VIATRA 3: A Reactive Model Transformation Platform. InTheory and Practice of Model Transformations - 8th International Conference, ICMTSTAF 2015, L’Aquila, Italy, July 20-21, 2015. Proceedings (LNCS, Vol. 9152). Springer, 101–110. doi:10.1007/978...

  6. [6]

    Dimitris Bertsimas and John Tsitsiklis. 1993. Simulated Annealing.Statist. Sci.8, 1 (1993), 10 – 15. doi:10.1214/ss/1177011077

  7. [7]

    Alexandru Burdusel, Steffen Zschaler, and Stefan John. 2021. Automatic genera- tion of atomic multiplicity-preserving search operators for search-based model engineering.Soft. Sys. Mod.20, 6 (2021), 1857–1887

  8. [8]

    Thomas Chaffre, Julien Moras, Adrien Chan-Hon-Tong, and Julien Marzat. 2020. Sim-to-Real Transfer with Incremental Environment Complexity for Reinforce- ment Learning of Depth-Based Robot Navigation. InProceedings of the 17th International Conference on Informatics, Automation and Robotics, ICINCO 2020. 314–323. https://ensta.hal.science/hal-02958155

  9. [9]

    Karl Cobbe, Chris Hesse, Jacob Hilton, and John Schulman. 2020. Leveraging Procedural Generation to Benchmark Reinforcement Learning. InProc of the 37th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 119). PMLR, 2048–2056

  10. [10]

    2011.Measuring Inequality(3 ed.)

    Frank Cowell. 2011.Measuring Inequality(3 ed.). Oxford University Press, London, England

  11. [11]

    Kyanna Dagenais and Istvan David. 2025. Complex Model Transformations by Reinforcement Learning with Uncertain Human Guidance. In2025 ACM/IEEE 28th International Conference on Model Driven Engineering Languages and Systems (MODELS). doi:10.1109/MODELS67397.2025.00025

  12. [12]

    Istvan David and Eugene Syriani. 2022. DEVS Model Construction as a Rein- forcement Learning Problem. In2022 Annual Modeling and Simulation Conference (ANNSIM). IEEE, 30–41. doi:10.23919/ANNSIM55834.2022.9859369

  13. [13]

    2024.Automated Inference of Simulators in Digital Twins

    Istvan David and Eugene Syriani. 2024.Automated Inference of Simulators in Digital Twins. CRC Press, Chapter 8, 122–148. doi:10.1201/9781003425724-11

  14. [14]

    Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Rus- sell, Andrew Critch, and Sergey Levine. 2020. Emergent Complexity and Zero- shot Transfer via Unsupervised Environment Design. InAdvances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc., 13049–13061

  15. [15]

    2015.Introduction to evolutionary computing

    Agoston E Eiben and James E Smith. 2015.Introduction to evolutionary computing. Springer

  16. [16]

    Martin Eisenberg, Hans-Peter Pichler, Antonio Garmendia, and Manuel Wimmer

  17. [17]

    In2021 ACM/IEEE 24th Intl Conf

    Towards Reinforcement Learning for In-Place Model Transformations. In2021 ACM/IEEE 24th Intl Conf. on Model Driven Engineering Languages and Systems (MODELS). 82–88

  18. [18]

    Tarek A El-Mihoub, Adrian A Hopgood, Lars Nolle, and Alan Battersby. [n. d.]. Hybrid Genetic Algorithms: A Review. ([n. d.])

  19. [19]

    Maged Elaasar, Nicolas Rouquette, David Wagner, Bentley James Oakes, Abdel- wahab Hamou-Lhadj, and Mohammad Hamdaqa. 2023. openCAESAR: Balancing Agility and Rigor in Model-Based Systems Engineering. In2023 ACM/IEEE Interna- tional Conference on Model Driven Engineering Languages and Systems Companion (MODELS-C). 221–230. doi:10.1109/MODELS-C59198.2023.00051

  20. [20]

    Rafael Figueiredo Prudencio, Marcos R. O. A. Maximo, and Esther Luna Colombini

  21. [21]

    doi:10.1109/TNNLS.2023.3250269

    A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems.IEEE Trans Neural Netw Learn Syst35, 8 (2024), 10237–10257. doi:10.1109/TNNLS.2023.3250269

  22. [22]

    Carlos Florensa, David Held, Markus Wulfmeier, Michael Zhang, and Pieter Abbeel. 2017. Reverse Curriculum Generation for Reinforcement Learning. In Proceedings of the 1st Annual Conference on Robot Learning (Proceedings of Machine Learning Research, Vol. 78). PMLR, 482–495

  23. [23]

    Nicola Gatto, Evgeny Kusmenko, and Bernhard Rumpe. 2019. Modeling Deep Reinforcement Learning Based Architectures for Cyber-Physical Systems. In2019 ACM/IEEE 22nd International Conference on Model Driven Engineering Languages and Systems Companion. 196–202. doi:10.1109/MODELS-C.2019.00033

  24. [24]

    Timothy Hospedales et al. 2022. Meta-Learning in Neural Networks: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence44, 9 (2022), 5149–

  25. [25]

    doi:10.1109/TPAMI.2021.3079209

  26. [26]

    Mengkang Hu et al. 2025. AgentGen: Enhancing Planning Abilities for Large Lan- guage Model based Agent via Environment and Task Generation. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1 (KDD ’25). ACM, 496–507. doi:10.1145/3690624.3709321

  27. [27]

    Xuemin Hu, Shen Li, Tingyu Huang, Bo Tang, Rouxing Huai, and Long Chen

  28. [28]

    doi:10.1109/TIV.2023.3312777

    How Simulation Helps Autonomous Driving: A Survey of Sim2real, Digital Twins, and Parallel Intelligence.IEEE Transactions on Intelligent Vehicles9, 1 (2024), 593–612. doi:10.1109/TIV.2023.3312777

  29. [29]

    Stefan John, Alexandru Burdusel, Robert Bill, Daniel Struber, Gabriele Taentzer, Steffen Zschaler, and Manuel Wimmer. 2019. Searching for optimal models: Comparing two encoding approaches. In12th International Conference on Model Transformations ICMT 2019. 1–22

  30. [30]

    Stefan John, Jens Kosiol, Leen Lambers, and Gabriele Taentzer. 2023. A graph- based framework for model-driven optimization facilitating impact analysis of mutation operator properties.Soft. Sys. Mod.22, 4 (2023), 1281–1318

  31. [31]

    Lawrence Johnson, Georgios N Yannakakis, and Julian Togelius. 2010. Cellular automata for real-time generation of infinite cave levels. InProceedings of the 2010 Workshop on Procedural Content Generation in Games. 1–4

  32. [32]

    Joerg Kienzle et al . 2023. Global Decision Making Over Deep Variability in Feedback-Driven Software Development. InProceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE ’22). ACM, Article 178, 6 pages. doi:10.1145/3551349.3559551

  33. [33]

    Taewoo Kim, Minsu Jang, and Jaehong Kim. 2021. A survey on simulation environments for reinforcement learning. In2021 18th International Conference on Ubiquitous Robots (UR). IEEE, 63–67

  34. [34]

    Thomas Kühne, Gergely Mezei, Eugene Syriani, Hans Vangheluwe, and Manuel Wimmer. 2010. Explicit Transformation Modeling. InModels in Software Engi- neering. Springer, 240–255

  35. [35]

    Evgeny Kusmenko et al . 2022. A Model-Driven Generative Self Play-Based Toolchain for Developing Games and Players. InProceedings of the 21st ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE 2022). ACM, 95–107. doi:10.1145/3564719.3568687

  36. [36]

    Marta Kwiatkowska and Xiyue Zhang. 2023. When to Trust AI: Advances and Challenges for Certification of Neural Networks. In2023 18th Conference on Com- puter Science and Intelligence Systems (FedCSIS). 25–37. doi:10.15439/2023F2324

  37. [37]

    Hartmut Lackner and Bernd-Holger Schlingloff. 2017. Chapter Four - Advances in Testing Software Product Lines. Advances in Computers, Vol. 107. Elsevier, 157–217. doi:10.1016/bs.adcom.2017.07.001

  38. [38]

    José Lameh, Alexandra Dubray, and Marija Jankovic. 2025. Modeling variability in product line engineering (PLE) for systems engineering (SE).Proceedings of the Design Society5 (2025), 2491–2500. doi:10.1017/pds.2025.10263

  39. [39]

    William Liang, Sam Wang, Hung-Ju Wang, Osbert Bastani, Dinesh Jayaraman, and Yecheng Jason Ma. 2024. Eurekaverse: Environment curriculum generation via large language models.arXiv preprint arXiv:2411.01775(2024)

  40. [40]

    Khan, John Mylopoulos, and Reza Golipour

    Sotirios Liaskos, Shakil M. Khan, John Mylopoulos, and Reza Golipour. 2025. Model-Driven Design and Generation of Training Simulators for Reinforcement Learning. InConceptual Modeling. Springer, 170–191

  41. [41]

    Jiashuo Liu, Zheyan Shen, Yue He, Xingxuan Zhang, Renzhe Xu, Han Yu, and Peng Cui. 2023. Towards Out-Of-Distribution Generalization: A Survey. arXiv:2108.13624 [cs.LG] https://arxiv.org/abs/2108.13624

  42. [42]

    Xiaoran Liu and Istvan David. 2025. AI Simulation by Digital Twins: Systematic Survey, Reference Framework, and Mapping to a Standardized Architecture. Software and Systems Modeling(2025). doi:10.1007/s10270-025-01306-0

  43. [43]

    Xiaoran Liu and Istvan David. 2026. A Reference Architecture of Reinforcement Learning Frameworks. In2026 IEEE 23rd International Conference on Software Architecture (ICSA). doi:10.1109/ICSA66085.2026.00016

  44. [44]

    Viktor Makoviychuk et al . 2021. Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning. doi:10.48550/arXiv.2108.10470

  45. [45]

    Dhruv Malik, Yuanzhi Li, and Pradeep Ravikumar. 2021. When Is Generalizable Reinforcement Learning Tractable?. InAdvances in Neural Information Processing Systems, Vol. 34. Curran Associates, Inc., 8032–8045

  46. [46]

    Tambet Matiisen, Avital Oliver, Taco Cohen, and John Schulman. 2019. Teacher– student curriculum learning.IEEE transactions on neural networks and learning systems31, 9 (2019), 3732–3740

  47. [47]

    Zentner, Ryan Julian, J K Terry, Isaac Woungang, Nariman Farsad, and Pablo Samuel Castro

    Reginald McLean, Evangelos Chatzaroulas, Luc McCutcheon, Frank Röder, Tianhe Yu, Zhanpeng He, K.R. Zentner, Ryan Julian, J K Terry, Isaac Woungang, Nariman Farsad, and Pablo Samuel Castro. 2025. Meta-World+: An Improved, Standardized, RL Benchmark. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track

  48. [48]

    Marjan Mernik, Jan Heering, and Anthony M. Sloane. 2005. When and how to develop domain-specific languages.ACM Comput. Surv.37, 4 (Dec. 2005), 316–344. doi:10.1145/1118890.1118892

  49. [49]

    Tim Molderez, Bjarno Oeyen, Coen De Roover, and Wolfgang De Meuter. 2019. Marlon: A domain-specific language for multi-agent reinforcement learning on networks. InProc of the 34th ACM/SIGAPP Symposium on Applied Computing. ACM, 1322–1329. doi:10.1145/3297280.3297413

  50. [50]

    Pablo Moscato et al. [n. d.]. On evolution, search, optimization, genetic algorithms and martial arts: Towards memetic algorithms. ([n. d.]). MODELS ’26, October 4–9, 2026, Malaga, Spain Liu and David

  51. [51]

    Dirk Muthig and Colin Atkinson. 2002. Model-Driven Product Line Architectures. InSoftware Product Lines. Springer, 110–129

  52. [52]

    Taylor, and Peter Stone

    Sanmit Narvekar, Bei Peng, Matteo Leonetti, Jivko Sinapov, Matthew E. Taylor, and Peter Stone. 2020. Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey.J Machine Learning Research21, 181 (2020), 1–50

  53. [53]

    Sanmit Narvekar, Jivko Sinapov, Matteo Leonetti, and Peter Stone. 2016. Source task creation for curriculum learning. InProceedings of the 2016 international conference on autonomous agents & multiagent systems. 566–574

  54. [54]

    Hira Naveed, Chetan Arora, Hourieh Khalajzadeh, John Grundy, and Omar Haggag. 2024. Model driven engineering for machine learning components: A systematic literature review.Inf Softw Technol169 (2024), 107423

  55. [55]

    Evangelos Ntentos, Stephen John Warnett, and Uwe Zdun. 2024. Supporting architectural decision making on training strategies in reinforcement learning architectures. In21st Intl Conf on Software Architecture (ICSA). IEEE, 90–100

  56. [56]

    Maria Joao Varanda Pereira, Joao Fonseca, and Pedro Rangel Henriques. 2016. Ontological approach for DSL development.Computer Languages, Systems & Structures45 (2016), 35–52

  57. [57]

    Andrei Pitkevich and Ilya Makarov. 2024. A Survey on Sim-to-Real Transfer Methods for Robotic Manipulation. InIEEE Intl Symposium on Intelligent Systems and Informatics (SISY). 000259–000266. doi:10.1109/SISY62279.2024.10737545

  58. [58]

    Martin L Puterman. 1990. Markov decision processes.Handbooks in operations research and management science2 (1990), 331–434

  59. [59]

    2022.Simulation

    Sheldon M Ross. 2022.Simulation. academic press

  60. [60]

    Oszkár Semeráth, Aren A Babikian, Boqi Chen, Chuning Li, Kristóf Marussy, Gábor Szárnyas, and Dániel Varró. 2021. Automated generation of consistent, diverse and structurally realistic graph models.Soft. Sys. Mod.20, 5 (2021), 1713–1734. doi:10.1007/s10270-021-00884-z

  61. [61]

    Oszkár Semeráth, Rebeka Farkas, Gábor Bergmann, and Dániel Varró. 2020. Diversity of graph models and graph generators in mutation testing.Int J Softw Tools Technol Transf22, 1 (2020), 57–78. doi:10.1007/s10009-019-00530-6

  62. [62]

    C. E. Shannon. 1948. A mathematical theory of communication.The Bell System Technical Journal27, 3 (1948), 379–423. doi:10.1002/j.1538-7305.1948.tb01338.x

  63. [63]

    Shephard and Rolf Färe

    Ronald W. Shephard and Rolf Färe. 1974. The Law of Diminishing Returns. In Production Theory. Springer, 287–318. doi:10.1007/978-3-642-80864-7_17

  64. [64]

    Daniele F Silva, Rafael P Torchelsen, and Marilton S Aguiar. 2025. Procedural game level generation with GANs: potential, weaknesses, and unresolved challenges in the literature.Multimedia Tools and Applications(2025), 1–27

  65. [65]

    Natalie Sinani et al. 2024. Towards a Domain-Specific Modelling Environment for Reinforcement Learning.arXiv preprint arXiv:2410.09368(2024)

  66. [66]

    Shagun Sodhani, Amy Zhang, and Joelle Pineau. 2021. Multi-task reinforce- ment learning with context-based representations. InInternational conference on machine learning. PMLR, 9767–9779

  67. [67]

    Petru Soviany et al. 2022. Curriculum Learning: A Survey.International Journal of Computer Vision130, 6 (2022), 1526–1565. doi:10.1007/s11263-022-01611-x

  68. [68]

    1998.Reinforcement learning: An intro- duction

    Richard S Sutton and Andrew G Barto. 1998.Reinforcement learning: An intro- duction. MIT press Cambridge

  69. [69]

    Jordan Terry et al. 2021. PettingZoo: Gym for Multi-Agent Reinforcement Learn- ing. InAdvances in Neural Information Processing Systems, Vol. 34. Curran Asso- ciates, Inc., 15032–15043

  70. [70]

    2011.Graphs: theory and algorithms

    Krishnaiyan Thulasiraman and Madisetti NS Swamy. 2011.Graphs: theory and algorithms. John Wiley & Sons

  71. [71]

    Massimo Tisi, Frédéric Jouault, Piero Fraternali, Stefano Ceri, and Jean Bézivin

  72. [72]

    InProceedings of the 5th European Conference on Model Driven Architecture - Foundations and Applications (ECMDA-FA ’09)

    On the Use of Higher-Order Model Transformations. InProceedings of the 5th European Conference on Model Driven Architecture - Foundations and Applications (ECMDA-FA ’09). Springer, 18–33. doi:10.1007/978-3-642-02674-4_3

  73. [73]

    Cover and Joy A Thomas

    T.M. Cover and Joy A Thomas. 1991.Elements of Information Theory(99 ed.). John Wiley & Sons, Nashville, TN

  74. [74]

    Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. 2017. Domain randomization for transferring deep neural networks from simulation to the real world. In2017 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, 23–30

  75. [75]

    Julian Togelius, Alex J Champandard, Pier Luca Lanzi, Michael Mateas, Ana Paiva, Mike Preuss, and Kenneth O Stanley. 2013. Procedural content generation: Goals, challenges and actionable steps

  76. [76]

    Mark Towers et al. 2025. Gymnasium: A Standard Interface for Reinforcement Learning Environments. doi:10.48550/arXiv.2407.17032

  77. [77]

    Y.R. Tsoy. 2003. The influence of population size and search time limit on genetic algorithm. In7th Korea-Russia International Symposium on Science and Technology, Proceedings KORUS 2003. (IEEE Cat. No.03EX737), Vol. 3. 181–187 vol.3

  78. [78]

    Dejin Wang and Seyede Fatemeh Ghoreishi. 2025. RGDR: Reward-Guided Do- main Randomization for Autonomous Driving. In2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC 2025), IEEE

  79. [79]

    Xin Wang et al. 2022. A Survey on Curriculum Learning.IEEE Transactions on Pattern Analysis and Machine Intelligence44, 9 (2022), 4555–4576. doi:10.1109/ TPAMI.2021.3069908

  80. [80]

    Christopher JCH Watkins and Peter Dayan. 1992. Q-learning.Machine learning 8, 3 (1992), 279–292

Showing first 80 references.