LLM-based Generation of Semantically Diverse and Realistic Domain Model Instances
Pith reviewed 2026-05-10 15:21 UTC · model grok-4.3
The pith
Large language models can generate mostly correct and semantically realistic instances of UML domain models when prompted with class diagram descriptions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LLMs prompted with class diagram descriptions, used together with existing validation tools, produce instances that are mostly syntactically correct, conform to the domain model, contain only a few semantic errors, and exhibit semantically diverse realistic values whose combinations within each instance remain coherent.
What carries the argument
The combination of large language models with two prompting strategies applied to class diagram descriptions, followed by validation with existing model-checking tools.
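The pattern described above can be pictured as a generate-then-validate loop. The Python sketch below is an illustration only: `generate` and `validate` are hypothetical callables standing in for an LLM call and a model-checking tool, and the stub implementations and instance strings are invented so the example runs on its own. None of it is taken from the paper.

```python
from typing import Callable, List, Tuple


def generate_valid_instances(
    generate: Callable[[str], str],
    validate: Callable[[str], List[str]],
    diagram_description: str,
    attempts: int = 5,
) -> Tuple[List[str], List[Tuple[str, List[str]]]]:
    """Call the generator `attempts` times and split outputs by validation result."""
    accepted: List[str] = []
    rejected: List[Tuple[str, List[str]]] = []
    for _ in range(attempts):
        candidate = generate(diagram_description)
        errors = validate(candidate)  # an empty list means the candidate passed
        if errors:
            rejected.append((candidate, errors))
        else:
            accepted.append(candidate)
    return accepted, rejected


class FakeGenerator:
    """Hypothetical stand-in for an LLM; every third output is malformed."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, description: str) -> str:
        self.calls += 1
        if self.calls % 3 == 0:
            return "Book"  # deliberately missing its attribute values
        return f"Book(title='Example {self.calls}')"


def fake_validate(instance: str) -> List[str]:
    """Hypothetical stand-in for a validation tool: require an attribute list."""
    return [] if "(" in instance else ["missing attribute values"]


accepted, rejected = generate_valid_instances(
    FakeGenerator(), fake_validate, "class Book { title : String }", attempts=6
)
print(len(accepted), "accepted;", len(rejected), "rejected")
```

Keeping the generator and the validator as separate callables mirrors the division of labor in the claim: the model proposes values, and an independent tool decides whether the result is structurally acceptable.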
If this is right
- Educators obtain ready-to-use concrete examples for teaching domain modeling without manual construction.
- Research projects can draw on diverse yet model-conformant data sets for analysis or simulation.
- Modeling environments can automate the creation of test populations that respect both structure and domain meaning.
- The effort needed to prepare example instances for validation or demonstration drops without sacrificing semantic realism.
Where Pith is reading between the lines
- The same prompting-plus-validation pattern might apply to other diagram types such as state machines or activity diagrams.
- Adding domain-specific knowledge bases could further reduce the remaining semantic errors observed in the experiments.
- Scaling tests to larger industrial models would show whether the approach remains practical when class counts and constraints grow.
- Generated instances could serve as training data for other model-related machine learning tasks that need realistic examples.
Load-bearing premise
Large language models given only class diagram descriptions can reliably infer real-world domain semantics and produce coherent value sets without extra domain training or knowledge bases.
What would settle it
Running the generation process on a fresh collection of domain models and finding that most resulting instances contain multiple semantic inconsistencies or unrealistic value combinations.
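One way to make this test concrete is sketched below in Python. The thresholds are assumptions made for illustration: "multiple" is read as at least two semantic errors in an instance, and "most" as more than half of the instances. The error counts are invented and are not results from the paper.

```python
from typing import Sequence


def claim_is_contradicted(
    errors_per_instance: Sequence[int],
    multiple: int = 2,      # assumed reading of "multiple" inconsistencies
    majority: float = 0.5,  # assumed reading of "most" instances
) -> bool:
    """Return True if most instances contain multiple semantic errors."""
    if not errors_per_instance:
        raise ValueError("at least one instance is required")
    flagged = sum(1 for count in errors_per_instance if count >= multiple)
    return flagged / len(errors_per_instance) > majority


# Invented per-instance error counts for two hypothetical replication runs.
few_errors = [0, 0, 1, 0, 2, 0]
many_errors = [2, 3, 0, 4, 2, 1]

print(claim_is_contradicted(few_errors))
print(claim_is_contradicted(many_errors))
```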
Original abstract
Large Language Models (LLMs) have been recently proposed for supporting domain modeling tasks mostly related to the completion of partial models by recommending additional model elements. However, there are many more modeling tasks, one of them being the instantiation of domain models to represent concrete domain objects. While there is considerable work supporting the generation of structurally valid instantiations, there are still open challenges to incorporating real-world semantics by having realistic values contained in instances and ensuring the generation of semantically diverse models. Only then will such generated models become human-understandable and helpful in educational or data-driven research contexts. To tackle these challenges, this paper presents an approach that employs LLMs and two prompting strategies in combination with existing model validation tools for instantiating semantically realistic and diverse domain models expressed as UML class diagrams. We have applied our approach to models used in education and available in the literature from different domains and evaluated the generated instances in terms of syntactic correctness, model conformance, semantic correctness, and diversity of the generated values. The results show that the generated instances are mostly syntactically correct, that they conform to the domain model, and that there are only a few semantic errors. Moreover, the generated instance values are semantically diverse, i.e., concrete realistic examples in line with the domain and the combination of the values within one model are semantically coherent.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an LLM-based method using two prompting strategies combined with existing model validation tools to instantiate UML class diagrams. The generated instances are evaluated for syntactic correctness, conformance to the domain model, semantic correctness (few errors), and semantic diversity (realistic, domain-appropriate values with coherent combinations within each instance). The approach is tested on educational and literature models from multiple domains, with results indicating mostly positive outcomes on all four criteria.
Significance. If the semantic realism and diversity claims hold under rigorous evaluation, the work would address a clear gap in domain model instantiation: moving beyond structural validity to produce human-understandable, realistic examples useful for education and data-driven research. The combination of LLMs with validation tools is a practical contribution, but its impact depends on demonstrating that the semantic properties are reproducible and not artifacts of subjective assessment.
major comments (2)
- [Evaluation] Evaluation section: the abstract and results claim 'mostly syntactically correct,' 'conform to the domain model,' 'only a few semantic errors,' and 'semantically diverse' values that are 'concrete realistic examples' with 'semantically coherent' combinations, yet no exact metrics, trial counts, error definitions, or inter-rater reliability statistics are provided. This makes the central empirical claims difficult to reproduce or falsify.
- [Results] Results section: semantic correctness and diversity are assessed without reported baselines (e.g., random or template-based value assignment), control conditions, or comparison to prior non-LLM instantiation techniques. Without these, it is unclear whether the LLM prompting genuinely improves semantic realism or merely produces plausible output.
minor comments (2)
- [Approach] The description of the two prompting strategies would benefit from explicit examples of the prompts used and how they differ in handling class attributes versus associations.
- [Evaluation] Consider reporting the exact number of models tested, their sizes (number of classes/attributes), and the domains represented to allow readers to assess generalizability.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments, which highlight important opportunities to strengthen the empirical rigor of our work. We address each major comment below and commit to revisions that improve reproducibility and contextualization without altering the core claims or contributions of the manuscript.
Referee: [Evaluation] Evaluation section: the abstract and results claim 'mostly syntactically correct,' 'conform to the domain model,' 'only a few semantic errors,' and 'semantically diverse' values that are 'concrete realistic examples' with 'semantically coherent' combinations, yet no exact metrics, trial counts, error definitions, or inter-rater reliability statistics are provided. This makes the central empirical claims difficult to reproduce or falsify.
Authors: We agree that the current Evaluation section would benefit from greater quantitative detail and explicit definitions to support reproducibility. In the revised manuscript, we will expand this section to report the precise number of generation trials and instances produced for each domain model, exact counts and percentages for syntactic correctness and model conformance (e.g., number of instances passing parser checks and OCL validation), clear operational definitions of semantic errors (e.g., attribute values that are implausible given domain knowledge or combinations that violate real-world coherence), and the assessment procedure for semantic diversity (manual review of value variety and intra-instance coherence). The semantic evaluation was performed by the authors with iterative cross-checking for consensus; we will describe this process in detail and acknowledge the absence of formal inter-rater reliability metrics as a limitation. revision: yes
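For reference, one common inter-rater reliability statistic of the kind the referee asks for is Cohen's kappa, which corrects the observed agreement between two raters for agreement expected by chance. The Python sketch below is a generic illustration, not the authors' procedure; the two rating lists are invented.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Share of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented labels: each position is one generated instance.
rater_1 = ["ok", "ok", "error", "ok", "error", "ok", "ok", "error"]
rater_2 = ["ok", "ok", "error", "error", "error", "ok", "ok", "ok"]

kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 3))
```

Reporting a statistic like this alongside the raw agreement rate would let readers judge how much of the semantic assessment depends on individual reviewers.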
Referee: [Results] Results section: semantic correctness and diversity are assessed without reported baselines (e.g., random or template-based value assignment), control conditions, or comparison to prior non-LLM instantiation techniques. Without these, it is unclear whether the LLM prompting genuinely improves semantic realism or merely produces plausible output.
Authors: Our primary aim was to demonstrate the feasibility of the LLM-based approach combined with validation tools for producing instances with the targeted semantic properties, rather than to perform a full comparative benchmark. We acknowledge that the absence of explicit baselines leaves open questions about relative improvement. In the revision, we will add a dedicated subsection in Results that compares our outputs to prior non-LLM techniques discussed in the related work (e.g., random instantiation and constraint-based solvers), explaining that such methods reliably achieve structural validity but typically produce semantically unrealistic or repetitive values. We will also include a small illustrative baseline using template-based random assignment on one of the evaluated models to contrast semantic quality, while noting that a comprehensive controlled experiment lies beyond the scope of this feasibility study. revision: partial
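To show what a template-based random assignment baseline might look like, the Python sketch below fills each attribute with a random value chosen only by its declared type. It is an assumption about the general shape of such a baseline, not the one the authors promise; the `Book` class, its attributes, and the value ranges are invented.

```python
import random
import string
from typing import Any, Dict


def random_value(type_name: str, rng: random.Random) -> Any:
    """Pick a value from a per-type template, ignoring domain meaning."""
    if type_name == "String":
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    if type_name == "Integer":
        return rng.randint(0, 1000)
    if type_name == "Boolean":
        return rng.choice([True, False])
    raise ValueError(f"unsupported type: {type_name}")


def random_instance(attributes: Dict[str, str], rng: random.Random) -> Dict[str, Any]:
    """Build one instance by assigning a random value to every attribute."""
    return {name: random_value(type_name, rng) for name, type_name in attributes.items()}


# Hypothetical class description: attribute name -> primitive type name.
book = {"title": "String", "pages": "Integer", "available": "Boolean"}

rng = random.Random(0)  # fixed seed so repeated runs give the same instances
instances = [random_instance(book, rng) for _ in range(3)]
for instance in instances:
    print(instance)
```

Values produced this way are type-correct, but a title made of random letters carries no domain meaning, which is the contrast in semantic quality that such a baseline is meant to expose.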
Circularity Check
No circularity: empirical evaluation is independent of inputs
The paper presents a prompting-based LLM method for generating UML class diagram instances, then evaluates outputs via mechanical validation tools for syntactic correctness and model conformance plus separate checks for semantic correctness and value diversity. No equations, fitted parameters, predictions derived from those parameters, self-definitional constructs, uniqueness theorems, or ansatzes appear in the abstract or described approach. Central claims rest on external validation steps rather than reducing to the generation process by construction, so the derivation chain is self-contained.