Large Language Models to Enhance Business Process Modeling: Past, Present, and Future Trends
Pith reviewed 2026-05-10 12:54 UTC · model grok-4.3
The pith
Large language models are being used to turn natural-language process descriptions into BPMN models through prompting and iterative refinement, though semantic accuracy and evaluation remain problematic.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Following a structured review, the analysis reveals a clear shift from rule-based and traditional NLP pipelines toward LLM-based architectures that rely on prompt engineering, intermediate representations, and iterative refinement mechanisms. While these approaches significantly expand the capabilities of automated process model generation, the literature exposes persistent challenges related to semantic correctness, evaluation fragmentation, reproducibility, and limited validation in real-world organizational contexts.
What carries the argument
A structured literature review that classifies existing AI-driven text-to-BPMN approaches and examines the integration of LLMs through prompt engineering, intermediate representations, and iterative refinement.
If this is right
- LLM-based methods expand automated process model generation beyond the limits of rigid rule systems.
- Generated models still risk semantic inaccuracies that could mislead organizational decisions.
- Fragmented evaluation practices make it hard to compare or trust different proposed solutions.
- Reproducibility issues hinder cumulative progress in the field.
- The review suggests that drawing on contextual knowledge through retrieval, together with interactive modeling designs, could advance the area.
Where Pith is reading between the lines
- Standardized benchmarks using real business cases could help resolve the evaluation fragmentation noted in the review.
- Without addressing reproducibility, many LLM-based tools may remain difficult for practitioners to adopt confidently.
- Combining LLMs with domain knowledge retrieval might reduce semantic errors in ways not fully explored in current studies.
- Future interactive systems could allow users to correct models in real time, addressing some validation gaps.
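As a rough illustration of the retrieval idea, the sketch below ranks domain-knowledge snippets by naive word overlap with a process description and prepends the best matches to a generation prompt. Everything here is hypothetical: the snippet store, the helper names `retrieve` and `build_prompt`, and the output instruction. A practical RAG setup would more likely use embedding-based search, and no LLM is called.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set; deliberately naive (no stemming, no stopword removal)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, snippets: list[str], k: int = 2) -> list[str]:
    """Return up to k snippets sharing the most words with the query.

    sorted() is stable even with reverse=True, so ties keep their input order.
    Snippets with zero overlap are dropped.
    """
    query_words = tokenize(query)
    ranked = sorted(snippets, key=lambda s: len(query_words & tokenize(s)), reverse=True)
    return [s for s in ranked[:k] if query_words & tokenize(s)]


def build_prompt(description: str, snippets: list[str], k: int = 2) -> str:
    """Assemble a hypothetical text-to-BPMN prompt grounded in retrieved context."""
    context = retrieve(description, snippets, k)
    lines = ["Domain context:"]
    lines += [f"- {c}" for c in context]
    lines += [
        "",
        "Process description:",
        description,
        "",
        "Return the process as JSON with keys 'tasks' and 'flows'.",
    ]
    return "\n".join(lines)


# Hypothetical organizational knowledge base.
glossary = [
    "A purchase order must be approved by a manager before payment.",
    "Invoices are archived after payment.",
    "Employees book travel via the HR portal.",
]
description = "The clerk creates a purchase order; after approval the supplier receives payment."
print(build_prompt(description, glossary))
```

Because the overlap count includes function words such as "a" and "the", the ranking is crude. The sketch only shows where retrieved context enters the prompt.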
Load-bearing premise
That the studies identified by the structured search strategy represent the broader field and that the observed challenges are typical rather than specific to the selected papers.
What would settle it
A comparative experiment that applies multiple reviewed methods to the same set of real-world process descriptions and evaluates the semantic correctness of outputs against expert models using a unified, reproducible metric.
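A unified metric could take many forms. Here is one, sketched under strong simplifying assumptions: each model is reduced to a set of activity labels and a set of (source, target) sequence flows, labels are matched after lowercasing and whitespace normalization, and precision, recall, and F1 are computed for each set. The model format and function names are hypothetical. Exact label matching ignores paraphrases and gateway semantics, which a real evaluation against expert models would need to handle.

```python
def _normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences still match."""
    return " ".join(label.lower().split())


def prf(predicted: set, reference: set) -> tuple[float, float, float]:
    """Precision, recall, and F1 of a predicted set against a reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def compare_models(generated: dict, reference: dict) -> dict:
    """Score a generated model against an expert model on activities and flows.

    Assumed (hypothetical) model shape:
      {"activities": [label], "flows": [(source_label, target_label)]}
    """
    def activities(m: dict) -> set:
        return {_normalize(a) for a in m["activities"]}

    def flows(m: dict) -> set:
        return {(_normalize(s), _normalize(t)) for s, t in m["flows"]}

    return {
        "activities": prf(activities(generated), activities(reference)),
        "flows": prf(flows(generated), flows(reference)),
    }


expert = {
    "activities": ["Receive order", "Check stock", "Ship goods"],
    "flows": [("Receive order", "Check stock"), ("Check stock", "Ship goods")],
}
generated = {
    "activities": ["receive  order", "Check stock", "Send invoice"],
    "flows": [("Receive order", "Check stock"), ("Check stock", "Send invoice")],
}
print(compare_models(generated, expert))
```

In this example two of three activities match, so precision, recall, and F1 for activities are all 2/3. One of two flows matches, so all three flow scores are 0.5.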
Original abstract
Recent advances in Generative Artificial Intelligence, particularly Large Language Models (LLMs), have stimulated growing interest in automating or assisting Business Process Modeling tasks using natural language. Several approaches have been proposed to transform textual process descriptions into BPMN and related workflow models. However, the extent to which these approaches effectively support complex process modeling in organizational settings remains unclear. This article presents a literature review of AI-driven methods for transforming natural language into BPMN process models, with a particular focus on the role of LLMs. Following a structured review strategy, relevant studies were identified and analyzed to classify existing approaches, examine how LLMs are integrated into text-to-model pipelines, and investigate the evaluation practices used to assess generated models. The analysis reveals a clear shift from rule-based and traditional NLP pipelines toward LLM-based architectures that rely on prompt engineering, intermediate representations, and iterative refinement mechanisms. While these approaches significantly expand the capabilities of automated process model generation, the literature also exposes persistent challenges related to semantic correctness, evaluation fragmentation, reproducibility, and limited validation in real-world organizational contexts. Based on these findings, this review identifies key research gaps and discusses promising directions for future research, including the integration of contextual knowledge through Retrieval-Augmented Generation (RAG), its integration with LLMs, the development of interactive modeling architectures, and the need for more comprehensive and standardized evaluation frameworks.
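The iterative-refinement pattern described in the abstract can be sketched as a generate, validate, and re-prompt loop. In the minimal Python sketch below, `generate` stands in for any LLM call. It is a hypothetical callable, not a specific API. The model is assumed to be a dict of task names and (source, target) flows with implicit `start` and `end` nodes, and `validate` performs only a few structural checks. Pipelines in the reviewed literature may use richer feedback, such as schema validation or user corrections. This is an assumption-laden illustration, not any particular study's method.

```python
from typing import Callable


def validate(model: dict) -> list[str]:
    """Structural checks on a hypothetical {"tasks": [...], "flows": [(src, tgt)]} model.

    "start" and "end" are implicit start/end event nodes.
    """
    nodes = {"start", "end", *model["tasks"]}
    sources = {src for src, _ in model["flows"]}
    targets = {tgt for _, tgt in model["flows"]}
    errors = []
    for src, tgt in model["flows"]:
        if src not in nodes or tgt not in nodes:
            errors.append(f"flow {src} -> {tgt} references an unknown node")
    if "start" not in sources:
        errors.append("start event has no outgoing flow")
    if "end" not in targets:
        errors.append("end event has no incoming flow")
    for task in model["tasks"]:
        if task not in targets:
            errors.append(f"task '{task}' has no incoming flow")
        if task not in sources:
            errors.append(f"task '{task}' has no outgoing flow")
    return errors


def refine(
    description: str, generate: Callable[[str], dict], max_rounds: int = 3
) -> tuple[dict, list[str]]:
    """Generate, validate, and re-prompt with the validator's findings until clean."""
    prompt = description
    model: dict = {}
    errors = ["no model generated"]
    for _ in range(max_rounds):
        model = generate(prompt)
        errors = validate(model)
        if not errors:
            break
        # Feed the concrete problems back so the next attempt can correct them.
        prompt = description + "\nFix these problems:\n" + "\n".join(errors)
    return model, errors


# Stand-in for an LLM: the first answer forgets the final flow, the second adds it.
def fake_generate(prompt: str) -> dict:
    flows = [("start", "Check stock")]
    if "Fix these problems" in prompt:
        flows.append(("Check stock", "end"))
    return {"tasks": ["Check stock"], "flows": flows}


model, errors = refine("Check the stock, then finish.", fake_generate)
print(model, errors)
```

Structural checks like these catch dangling or disconnected elements. They cannot tell whether the modeled process is the right one, so the semantic-correctness problem highlighted by the review survives this kind of loop.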
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a literature review of AI-driven methods for transforming natural language into BPMN process models, with emphasis on LLMs. Following a structured review strategy, it classifies approaches, examines LLM integration through prompt engineering, intermediate representations, and iterative refinement, surveys evaluation practices, identifies a shift from rule-based and traditional NLP pipelines to LLM-based architectures, highlights persistent challenges in semantic correctness, evaluation fragmentation, reproducibility, and real-world validation, and proposes future directions including RAG integration, interactive modeling, and standardized evaluation frameworks.
Significance. If the sampled studies prove representative, the review would usefully consolidate an emerging intersection of LLMs and business process management, surfacing actionable gaps around evaluation standards and organizational deployment that could steer both BPM tooling and domain-specific LLM research.
major comments (1)
- [Abstract and Methodology] The abstract states that a 'structured review strategy' was followed to identify and analyze studies, yet no section supplies the concrete protocol: databases queried, exact search strings, date range, inclusion/exclusion criteria, screening process, inter-rater reliability, or final paper count (PRISMA-style flow). This omission is load-bearing for the central claim of a 'clear shift' from rule-based/NLP to LLM architectures and for the generality of the listed challenges, because without these details the observed trends cannot be distinguished from selection or recency bias.
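Inter-rater reliability during screening is commonly reported as Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement and p_e is the agreement expected by chance from each screener's label frequencies. A minimal Python sketch follows. It assumes two screeners who each label every candidate paper as include or exclude. The function name and label strings are illustrative.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # and returning 1.0 here is a convention, not part of the definition.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


screener_1 = ["include"] * 4 + ["exclude"] * 6
screener_2 = ["include"] * 3 + ["exclude"] * 6 + ["include"]
print(round(cohens_kappa(screener_1, screener_2), 3))  # prints 0.583
```

In this example the screeners agree on eight of ten abstracts and each marks four as include, so p_o = 0.8, p_e = (4*4 + 6*6)/100 = 0.52, and kappa = 0.28/0.48, or about 0.58.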
minor comments (2)
- [Classification of approaches] The classification of approaches would be clearer if accompanied by a summary table listing each category, representative papers, and key LLM integration mechanisms.
- [Discussion and future work] Future-research directions (RAG, interactive architectures, standardized evaluation) are listed but not prioritized or linked back to specific gaps identified in the reviewed studies.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive critique of our literature review. We agree that the absence of a detailed methodology section undermines the transparency and defensibility of our claims, and we will revise the manuscript to rectify this.
Point-by-point responses
Referee: [Abstract and Methodology] The abstract states that a 'structured review strategy' was followed to identify and analyze studies, yet no section supplies the concrete protocol: databases queried, exact search strings, date range, inclusion/exclusion criteria, screening process, inter-rater reliability, or final paper count (PRISMA-style flow). This omission is load-bearing for the central claim of a 'clear shift' from rule-based/NLP to LLM architectures and for the generality of the listed challenges, because without these details the observed trends cannot be distinguished from selection or recency bias.
Authors: We agree that this is a substantive shortcoming. The current manuscript does not contain a dedicated methodology section with the requested protocol details. In the revised version we will insert a new Section 2 ('Review Methodology') that fully documents the structured review process. It will specify the databases queried (Google Scholar, IEEE Xplore, ACM Digital Library, ScienceDirect, arXiv), the exact Boolean search strings, the date range (2018–2024), inclusion/exclusion criteria, the two-stage screening procedure, any inter-rater reliability measures employed, and the final number of included studies together with a PRISMA flow diagram. These additions will allow readers to assess potential selection or recency bias and will directly support the validity of the reported shift toward LLM-based architectures and the identified challenges.
Revision: yes
Circularity Check
No circularity: literature review with external claims only
The paper is a structured literature review that identifies, classifies, and synthesizes external studies on AI-driven text-to-BPMN transformation. Its core claims (a shift from rule-based/NLP to LLM architectures with prompt engineering and iterative refinement; persistent challenges in semantic correctness, evaluation, and reproducibility) rest on analysis of cited papers rather than on any internal equations, fitted parameters, self-definitional loops, or load-bearing self-citations. No derivations, predictions, uniqueness theorems, or ansatzes are present that could reduce to the paper's own inputs by construction. The review methodology, while underspecified, does not create circularity, because the findings are drawn from independently published external work rather than from the paper's own outputs.