An Innovative Next Activity Prediction Using Process Entropy and Dynamic Attribute-Wise-Transformer in Predictive Business Process Monitoring
Pith reviewed 2026-05-23 02:59 UTC · model grok-4.3
The pith
Dataset entropy determines whether a dynamic transformer or an interpretable decision tree performs better for next activity prediction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that the DAW-Transformer achieves superior performance on high-entropy datasets such as Sepsis and Filtered Hospital Logs, while interpretable methods like Decision Trees perform competitively on low-entropy datasets such as BPIC 2020 Prepaid Travel Costs, and that an entropy-based model selection framework can guide the choice between them.
What carries the argument
The entropy-based model selection framework together with the DAW-Transformer that integrates multi-head attention and dynamic windowing to capture long-range dependencies across all attributes.
Load-bearing premise
That the proposed entropy measure of dataset complexity reliably indicates which algorithm family will perform best, and that the dynamic windowing in DAW-Transformer is the key reason for any observed gains on high-entropy logs.
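To make the premise concrete, here is a minimal sketch of how an entropy score could drive model selection. This page does not say how the paper computes entropy, so the sketch assumes Shannon entropy over the distribution of trace variants. The function names, the threshold value, and the toy logs are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def variant_entropy(traces):
    """Shannon entropy (in bits) of the trace-variant distribution.

    ASSUMPTION: the paper's entropy measure is not specified on this page;
    this stand-in counts how often each distinct activity sequence occurs.
    """
    counts = Counter(tuple(trace) for trace in traces)
    total = sum(counts.values())
    # H = sum_i p_i * log2(1 / p_i), with p_i = count_i / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def recommend_model(entropy_bits, threshold=1.0):
    """Map an entropy score to a model family.

    The threshold is HYPOTHETICAL; the paper's cut-off is not given here.
    """
    return "DAW-Transformer" if entropy_bits > threshold else "Decision Tree"


# Toy logs invented for illustration only.
low_entropy_log = [["A", "B", "C"]] * 4
high_entropy_log = [
    ["A", "B", "C"],
    ["A", "C", "B"],
    ["B", "A", "C"],
    ["C", "B", "A"],
]

low_choice = recommend_model(variant_entropy(low_entropy_log))
high_choice = recommend_model(variant_entropy(high_entropy_log))
```

With these toy inputs, the single-variant log scores 0 bits and the log with four equally frequent variants scores 2 bits, so the two fall on opposite sides of the assumed threshold. Whether a variant-level measure or an event-level measure matches the paper's definition is exactly what the premise leaves open.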
What would settle it
An experiment on additional event logs, stratified by entropy level, in which the DAW-Transformer fails to outperform interpretable baselines on high-entropy logs or Decision Trees fail to remain competitive on low-entropy logs.
Original abstract
Next activity prediction in predictive business process monitoring is crucial for operational efficiency and informed decision-making. While machine learning and Artificial Intelligence have achieved promising results, challenges remain in balancing interpretability and accuracy, particularly due to the complexity and evolving nature of event logs. This paper presents two contributions: (i) an entropy-based model selection framework that quantifies dataset complexity to recommend suitable algorithms, and (ii) the DAW-Transformer (Dynamic Attribute-Wise Transformer), which integrates multi-head attention with a dynamic windowing mechanism to capture long-range dependencies across all attributes. Experiments on six public event logs show that the DAW-Transformer achieves superior performance on high-entropy datasets (e.g., Sepsis, Filtered Hospital Logs), whereas interpretable methods like Decision Trees perform competitively on low-entropy datasets (e.g., BPIC 2020 Prepaid Travel Costs). These results highlight the importance of aligning model choice with dataset entropy to balance accuracy and interpretability.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims two contributions in next activity prediction for predictive business process monitoring: (i) an entropy-based model selection framework that quantifies dataset complexity to recommend algorithm families, and (ii) the DAW-Transformer, which augments multi-head attention with a dynamic windowing mechanism to capture long-range dependencies across attributes. Experiments on six public event logs are reported to show DAW-Transformer superiority on high-entropy logs (Sepsis, Filtered Hospital) and competitive performance of interpretable methods such as Decision Trees on low-entropy logs (BPIC 2020 Prepaid Travel Costs).
Significance. If the central claims hold, the work would provide a practical, data-driven heuristic for trading off accuracy against interpretability in process monitoring by aligning model choice with an entropy measure of log complexity. The evaluation on six public logs is a positive feature that supports reproducibility and cross-dataset comparison.
Major comments (2)
- [Experiments] Experiments section: the claim that DAW-Transformer superiority on high-entropy logs is due to the dynamic windowing mechanism (rather than multi-head attention or attribute-wise processing alone) is load-bearing for both the architecture contribution and the entropy-based selection framework, yet no ablation is presented that replaces dynamic windowing with a fixed-length or average-length window while holding all other components constant.
- [Framework] Framework description (likely §3): the entropy-based recommendation of model families risks post-hoc fitting if the entropy thresholds or mapping from entropy values to recommended algorithms were derived or tuned using the same six evaluation logs rather than a separate validation procedure or theoretical derivation.
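The ablation requested in the first comment can be sketched as a set of interchangeable prefix-windowing functions, so that only the window changes while the model and the evaluation protocol stay fixed. This is a rough illustration under stated assumptions: the page does not describe the paper's dynamic windowing mechanism, so the `dynamic_stand_in` entry simply keeps the full prefix, and all names and the default `fixed_k` are hypothetical.

```python
from statistics import mean


def fixed_window(prefix, k):
    """Keep only the last k events of a running-case prefix."""
    return list(prefix[-k:]) if k > 0 else []


def average_length(traces):
    """Rounded mean trace length, used as the 'average-length' window size."""
    return max(1, round(mean([len(trace) for trace in traces])))


def make_window_variants(traces, fixed_k=3):
    """Build the windowing variants to compare in an ablation.

    ASSUMPTION: 'dynamic_stand_in' is a placeholder that keeps the whole
    prefix; it is NOT the paper's mechanism, which is not described here.
    """
    avg_k = average_length(traces)
    return {
        "fixed": lambda prefix: fixed_window(prefix, fixed_k),
        "average": lambda prefix: fixed_window(prefix, avg_k),
        "dynamic_stand_in": lambda prefix: list(prefix),
    }


# Toy traces invented for illustration only.
toy_traces = [["A", "B"], ["A", "B", "C", "D", "E", "F"]]
variants = make_window_variants(toy_traces, fixed_k=3)
example_prefix = ["A", "B", "C", "D", "E"]
windowed = {name: fn(example_prefix) for name, fn in variants.items()}
```

Feeding each variant's output to the same encoder and classifier, with the same hyperparameters and splits, would isolate the contribution of the window itself.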
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions we will make.
- Referee: [Experiments] Experiments section: the claim that DAW-Transformer superiority on high-entropy logs is due to the dynamic windowing mechanism (rather than multi-head attention or attribute-wise processing alone) is load-bearing for both the architecture contribution and the entropy-based selection framework, yet no ablation is presented that replaces dynamic windowing with a fixed-length or average-length window while holding all other components constant.
  Authors: We agree that an explicit ablation isolating the dynamic windowing component is necessary to substantiate the architectural claim. In the revised manuscript we will add an ablation study that replaces the dynamic windowing mechanism with both fixed-length and average-length windows while keeping multi-head attention, attribute-wise processing, and all other hyperparameters identical. Results will be reported on the same six logs with the same evaluation protocol.
  Revision: yes
- Referee: [Framework] Framework description (likely §3): the entropy-based recommendation of model families risks post-hoc fitting if the entropy thresholds or mapping from entropy values to recommended algorithms were derived or tuned using the same six evaluation logs rather than a separate validation procedure or theoretical derivation.
  Authors: The entropy thresholds and the mapping from entropy ranges to model families were derived from theoretical considerations of process complexity (drawing on prior work relating entropy to branching factor and trace variability) together with distributional analysis performed on a larger collection of public logs before the six evaluation logs were selected. Nevertheless, to eliminate any perception of post-hoc fitting we will (i) document the exact derivation procedure and the auxiliary logs used, and (ii) add a sensitivity analysis that recomputes recommendations on a held-out set of logs. These clarifications and additional checks will appear in the revised §3.
  Revision: partial
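One way to probe the post-hoc fitting concern is a leave-one-log-out check: fit the entropy threshold on all logs but one, then test whether it recommends the right family for the held-out log. The sketch below substitutes this simple procedure for the sensitivity analysis the authors promise, whose design is not given here. The entropy values and best-model labels are invented for illustration, and the function names are hypothetical.

```python
def fit_threshold(points):
    """Pick the midpoint threshold with the fewest misclassified logs.

    points: list of (entropy, best_family) pairs, where best_family is
    "tree" or "transformer". Requires at least two points.
    """
    values = sorted(entropy for entropy, _ in points)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        # A log is misclassified when "entropy above threshold" disagrees
        # with "transformer was the best family".
        return sum(
            (entropy > threshold) != (family == "transformer")
            for entropy, family in points
        )

    return min(candidates, key=errors)


def leave_one_out_accuracy(points):
    """Share of logs whose held-out recommendation matches the best family.

    Requires at least three points so each fit still sees two logs.
    """
    hits = 0
    for i, (entropy, family) in enumerate(points):
        rest = points[:i] + points[i + 1:]
        threshold = fit_threshold(rest)
        hits += (entropy > threshold) == (family == "transformer")
    return hits / len(points)


# HYPOTHETICAL (entropy, best family) pairs; not results from the paper.
toy_points = [
    (0.5, "tree"), (0.8, "tree"), (1.1, "tree"),
    (2.0, "transformer"), (2.4, "transformer"), (3.0, "transformer"),
]
loo_accuracy = leave_one_out_accuracy(toy_points)
```

With only six logs such an estimate is coarse, so a high score on the evaluation logs alone would not rule out post-hoc fitting. It becomes more informative when run on the larger auxiliary collection the authors mention.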
Circularity Check
No significant circularity identified
The provided abstract and description present an empirical study proposing an entropy-based model selection framework and the DAW-Transformer architecture, validated via experiments on six public event logs. No equations, self-citations, or derivation steps are shown that reduce a claimed prediction or result to its inputs by construction (e.g., no fitted parameters renamed as predictions, no self-definitional entropy measures, no load-bearing self-citations). The claims rest on comparative performance results rather than a closed mathematical chain, so they can be checked independently against external benchmarks.