Process Extraction from Text: Benchmarking the State of the Art and Paving the Way for Future Challenges

Chiara Ghidini; Han van der Aa; Mauro Dragoni; Patrizio Bellan; Simone Paolo Ponzetto

arxiv: 2110.03754 · v2 · pith:NZLVAYSCnew · submitted 2021-10-07 · 💻 cs.AI

Patrizio Bellan, Mauro Dragoni, Chiara Ghidini, Han van der Aa, Simone Paolo Ponzetto

classification 💻 cs.AI

keywords process · evaluation · compare · extraction · model · proposed · problem · qualitative

The extraction of process models from text refers to the problem of turning the information contained in unstructured textual process descriptions into a formal representation, i.e., a process model. Several automated approaches have been proposed to tackle this problem, but they are highly heterogeneous in scope and underlying assumptions, i.e., they differ in input, target output, and the data used in their evaluation. As a result, it is currently unclear how well existing solutions solve the model-extraction problem and how they compare to each other. We overcome this issue by comparing 10 state-of-the-art approaches for model extraction in a systematic manner, covering both qualitative and quantitative aspects. The qualitative evaluation compares the primary studies on: (1) the main characteristics of each solution; (2) the type of process model elements extracted from the input data; (3) the experimental evaluation performed to evaluate the proposed framework. The results show a heterogeneity of techniques, elements extracted, and evaluations conducted that often makes the approaches impossible to compare. To overcome this difficulty, we propose a quantitative comparison of the tools proposed by the papers on the unifying task of process model entity and relation extraction, so that they can be compared directly. The results show three distinct groups of tools in terms of performance, with no tool obtaining very good scores and all showing serious limitations. Moreover, the proposed evaluation pipeline can be considered a reference task, with a well-defined dataset and metrics, that can be used to compare new tools. The paper also reflects, based on the results of the qualitative and quantitative evaluation, on the limitations and challenges that the community needs to address in the future to produce significant advances in this area.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Automated BPMN Model Generation from Textual Process Descriptions: A Multi-Stage LLM-Driven Approach
cs.SE 2026-04 unverdicted novelty 6.0

Multi-stage LLM pipeline creates validated BPMN models from text and reconstructs them with average similarity above 0.75 across 387 cases from 750 public diagrams.

Automatic Generation of Executable BPMN Models from Medical Guidelines
cs.AI 2026-04 unverdicted novelty 5.0

LLM-based pipeline converts medical guidelines into executable BPMN models with over 92% per-patient decision agreement and an entropy detector for policy ambiguity.