Large Language Models to Enhance Business Process Modeling: Past, Present, and Future Trends

João Bettencourt; Sérgio Guerreiro

arXiv: 2604.14034 · v1 · submitted 2026-04-15 · 💻 cs.SE · cs.AI · cs.IR


Pith reviewed 2026-05-10 12:54 UTC · model grok-4.3

classification 💻 cs.SE · cs.AI · cs.IR

keywords Large Language Models · Business Process Modeling · BPMN · Text-to-Model · Literature Review · Prompt Engineering · Process Automation


The pith

Large language models are transforming natural language into BPMN process models through prompting and iteration, though semantic accuracy and evaluation remain problematic.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper conducts a literature review on methods that use AI, especially large language models, to convert textual descriptions of business processes into formal models such as BPMN diagrams. It documents a move away from earlier rule-based and traditional natural language processing techniques toward LLM approaches that depend on careful prompt design, intermediate steps, and repeated refinement. A reader might care because such tools could allow business users without modeling expertise to create accurate process maps quickly, potentially improving efficiency in organizations. The review points out that these new methods increase what is possible but still encounter difficulties in ensuring the models match the intended meaning and in applying consistent tests outside controlled settings.

Core claim

Following a structured review, the analysis reveals a clear shift from rule-based and traditional NLP pipelines toward LLM-based architectures that rely on prompt engineering, intermediate representations, and iterative refinement mechanisms. While these approaches significantly expand the capabilities of automated process model generation, the literature exposes persistent challenges related to semantic correctness, evaluation fragmentation, reproducibility, and limited validation in real-world organizational contexts.

What carries the argument

A structured literature review that classifies existing AI-driven text-to-BPMN approaches and examines the integration of LLMs through prompt engineering, intermediate representations, and iterative refinement.
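
To make the three mechanisms named above concrete, here is a minimal sketch of how a prompt, an intermediate representation, and an iterative refinement loop might fit together. It is not taken from the paper or from any reviewed system. The JSON shape (`tasks`, `flows`), the function names, and the stubbed model are all assumptions introduced for illustration; a real pipeline would call an actual LLM and emit BPMN XML from the validated representation.

```python
import json
from typing import Callable

# Hypothetical intermediate representation (an assumption, not a standard):
# {"tasks": [names...], "flows": [[source, target], ...]}
# "start" and "end" are reserved node names.


def validate_ir(ir: dict) -> list:
    """Return a list of structural problems found in the representation."""
    problems = []
    tasks = ir.get("tasks", [])
    if not tasks:
        problems.append("no tasks listed")
    known = set(tasks) | {"start", "end"}
    for flow in ir.get("flows", []):
        if not isinstance(flow, list) or len(flow) != 2:
            problems.append(f"malformed flow: {flow}")
            continue
        for node in flow:
            if node not in known:
                problems.append(f"flow references unknown node: {node}")
    return problems


def generate_model(description: str, llm: Callable[[str], str], max_rounds: int = 3) -> dict:
    """Prompt the model, validate its answer, and feed problems back until valid."""
    prompt = (
        "Extract the process as JSON with keys 'tasks' and 'flows'.\n"
        f"Description: {description}"
    )
    for _ in range(max_rounds):
        reply = llm(prompt)
        try:
            ir = json.loads(reply)
        except json.JSONDecodeError:
            prompt += "\nThe previous answer was not valid JSON. Return JSON only."
            continue
        if not isinstance(ir, dict):
            prompt += "\nThe previous answer was not a JSON object."
            continue
        problems = validate_ir(ir)
        if not problems:
            return ir
        # Iterative refinement: append the detected problems to the prompt.
        prompt += "\nFix these problems: " + "; ".join(problems)
    raise ValueError("no valid model after refinement")


def make_stub_llm() -> Callable[[str], str]:
    """Stand-in for a real model: the first reply has a dangling flow, the second is fixed."""
    replies = iter([
        '{"tasks": ["Receive order"], "flows": [["start", "Receive order"], '
        '["Receive order", "Ship goods"]]}',
        '{"tasks": ["Receive order", "Ship goods"], "flows": [["start", "Receive order"], '
        '["Receive order", "Ship goods"], ["Ship goods", "end"]]}',
    ])
    return lambda prompt: next(replies)


model = generate_model("An order is received and then goods are shipped.", make_stub_llm())
print(model["tasks"])
```

The validator only checks structure (every flow endpoint is a declared node). It says nothing about whether the model matches the intended meaning of the text, which is the semantic-correctness gap the review highlights.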

If this is right

LLM-based methods expand automated process model generation beyond the limits of rigid rule systems.
Generated models still risk semantic inaccuracies that could mislead organizational decisions.
Fragmented evaluation practices make it hard to compare or trust different proposed solutions.
Reproducibility issues hinder cumulative progress in the field.
Greater use of contextual knowledge via retrieval and interactive designs is suggested to advance the area.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Standardized benchmarks using real business cases could help resolve the evaluation fragmentation noted in the review.
Without addressing reproducibility, many LLM-based tools may remain difficult for practitioners to adopt confidently.
Combining LLMs with domain knowledge retrieval might reduce semantic errors in ways not fully explored in current studies.
Future interactive systems could allow users to correct models in real time, addressing some validation gaps.

Load-bearing premise

That the studies identified by the structured search strategy represent the broader field and that the observed challenges are typical rather than specific to the selected papers.

What would settle it

A comparative experiment that applies multiple reviewed methods to the same set of real-world process descriptions and evaluates the semantic correctness of outputs against expert models using a unified, reproducible metric.
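
As a rough illustration of what a unified, reproducible metric could look like, the sketch below scores a generated model against an expert model using F1 over task labels and over control-flow edges. The choice of F1, the dictionary layout, and the function names are assumptions made for this example; the paper does not prescribe a metric, and exact label matching is only a crude proxy for semantic correctness.

```python
def f1(predicted: set, reference: set) -> float:
    """Harmonic mean of precision and recall over two sets."""
    if not predicted and not reference:
        return 1.0
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def normalize(label: str) -> str:
    """Case- and whitespace-insensitive label comparison (a simplifying assumption)."""
    return " ".join(label.lower().split())


def compare_models(generated: dict, expert: dict) -> dict:
    """Score a generated model against an expert model on tasks and flows."""
    gen_tasks = {normalize(t) for t in generated["tasks"]}
    exp_tasks = {normalize(t) for t in expert["tasks"]}
    gen_flows = {(normalize(a), normalize(b)) for a, b in generated["flows"]}
    exp_flows = {(normalize(a), normalize(b)) for a, b in expert["flows"]}
    return {
        "task_f1": f1(gen_tasks, exp_tasks),
        "flow_f1": f1(gen_flows, exp_flows),
    }


# Hypothetical example data, not drawn from any reviewed study.
expert = {
    "tasks": ["Receive order", "Check stock", "Ship goods"],
    "flows": [("Receive order", "Check stock"), ("Check stock", "Ship goods")],
}
generated = {
    "tasks": ["receive order", "Check stock"],
    "flows": [("receive order", "Check stock")],
}
scores = compare_models(generated, expert)
print(scores)
```

Applying one fixed scoring function to every method and every process description is what makes results comparable across studies. A fuller benchmark would also need to handle paraphrased labels, gateways, and behavioral equivalence, none of which this sketch attempts.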

Figures

Figures reproduced from arXiv: 2604.14034 by João Bettencourt, Sérgio Guerreiro.

**Figure 1.** Number of included papers organized by year (from authors).

**Figure 2.** Included results per year, by category. Generative AI approaches increase, while noGenAI approaches show no growth (from authors).

Original abstract

Recent advances in Generative Artificial Intelligence, particularly Large Language Models (LLMs), have stimulated growing interest in automating or assisting Business Process Modeling tasks using natural language. Several approaches have been proposed to transform textual process descriptions into BPMN and related workflow models. However, the extent to which these approaches effectively support complex process modeling in organizational settings remains unclear. This article presents a literature review of AI-driven methods for transforming natural language into BPMN process models, with a particular focus on the role of LLMs. Following a structured review strategy, relevant studies were identified and analyzed to classify existing approaches, examine how LLMs are integrated into text-to-model pipelines, and investigate the evaluation practices used to assess generated models. The analysis reveals a clear shift from rule-based and traditional NLP pipelines toward LLM-based architectures that rely on prompt engineering, intermediate representations, and iterative refinement mechanisms. While these approaches significantly expand the capabilities of automated process model generation, the literature also exposes persistent challenges related to semantic correctness, evaluation fragmentation, reproducibility, and limited validation in real-world organizational contexts. Based on these findings, this review identifies key research gaps and discusses promising directions for future research, including the integration of contextual knowledge through Retrieval-Augmented Generation (RAG), its integration with LLMs, the development of interactive modeling architectures, and the need for more comprehensive and standardized evaluation frameworks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper presents a literature review of AI-driven methods for transforming natural language into BPMN process models, with emphasis on LLMs. Following a structured review strategy, it classifies approaches, examines LLM integration through prompt engineering, intermediate representations, and iterative refinement, surveys evaluation practices, identifies a shift from rule-based and traditional NLP pipelines to LLM-based architectures, highlights persistent challenges in semantic correctness, evaluation fragmentation, reproducibility, and real-world validation, and proposes future directions including RAG integration, interactive modeling, and standardized evaluation frameworks.

Significance. If the sampled studies prove representative, the review would usefully consolidate an emerging intersection of LLMs and business process management, surfacing actionable gaps around evaluation standards and organizational deployment that could steer both BPM tooling and domain-specific LLM research.

major comments (1)

[Abstract and Methodology] The abstract states that a 'structured review strategy' was followed to identify and analyze studies, yet no section supplies the concrete protocol: databases queried, exact search strings, date range, inclusion/exclusion criteria, screening process, inter-rater reliability, or final paper count (PRISMA-style flow). This omission is load-bearing for the central claim of a 'clear shift' from rule-based/NLP to LLM architectures and for the generality of the listed challenges, because without these details the observed trends cannot be distinguished from selection or recency bias.

minor comments (2)

[Classification of approaches] The classification of approaches would be clearer if accompanied by a summary table listing each category, representative papers, and key LLM integration mechanisms.
[Discussion and future work] Future-research directions (RAG, interactive architectures, standardized evaluation) are listed but not prioritized or linked back to specific gaps identified in the reviewed studies.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and constructive critique of our literature review. We agree that the absence of a detailed methodology section undermines the transparency and defensibility of our claims, and we will revise the manuscript to rectify this.

Point-by-point responses

Referee: [Abstract and Methodology] The abstract states that a 'structured review strategy' was followed to identify and analyze studies, yet no section supplies the concrete protocol: databases queried, exact search strings, date range, inclusion/exclusion criteria, screening process, inter-rater reliability, or final paper count (PRISMA-style flow). This omission is load-bearing for the central claim of a 'clear shift' from rule-based/NLP to LLM architectures and for the generality of the listed challenges, because without these details the observed trends cannot be distinguished from selection or recency bias.

Authors: We agree that this is a substantive shortcoming. The current manuscript does not contain a dedicated methodology section with the requested protocol details. In the revised version we will insert a new Section 2 ('Review Methodology') that fully documents the structured review process. It will specify the databases queried (Google Scholar, IEEE Xplore, ACM Digital Library, ScienceDirect, arXiv), the exact Boolean search strings, the date range (2018–2024), inclusion/exclusion criteria, the two-stage screening procedure, any inter-rater reliability measures employed, and the final number of included studies together with a PRISMA flow diagram. These additions will allow readers to assess potential selection or recency bias and will directly support the validity of the reported shift toward LLM-based architectures and the identified challenges.

Revision: yes

Circularity Check

0 steps flagged

No circularity: literature review with external claims only

full rationale

The paper is a structured literature review that identifies, classifies, and synthesizes external studies on AI-driven text-to-BPMN transformation. Its core claims (shift from rule-based/NLP to LLM architectures with prompt engineering and iterative refinement; persistent challenges in semantic correctness, evaluation, and reproducibility) rest on analysis of cited papers rather than any internal equations, fitted parameters, self-definitional loops, or load-bearing self-citations. No derivations, predictions, uniqueness theorems, or ansatzes are present that could reduce to the paper's own inputs by construction. The review methodology, while potentially underspecified, does not create circularity because the findings are benchmarked against independently published external work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a survey and introduces no new mathematical objects, fitted parameters, or postulated entities. Its claims rest on the assumption that the reviewed literature accurately reflects the state of the field.


