An Innovative Next Activity Prediction Using Process Entropy and Dynamic Attribute-Wise-Transformer in Predictive Business Process Monitoring

Hadi Zare; Homayoun Najjaran; Maryam Ahang; Mostafa Abbasi

arxiv: 2502.10573 · v2 · submitted 2025-02-14 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-23 02:59 UTC · model grok-4.3

keywords: next activity prediction · predictive business process monitoring · process entropy · dynamic attribute-wise transformer · model selection · event logs

The pith

Dataset entropy determines whether a dynamic transformer or an interpretable decision tree performs better for next activity prediction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes an entropy measure to quantify the complexity of business process event logs. It pairs this measure with a model selection rule that favors the new DAW-Transformer on high-entropy logs and simpler interpretable algorithms such as decision trees on low-entropy logs. The DAW-Transformer applies multi-head attention together with dynamic windowing to track dependencies across attributes in evolving sequences. Experiments on six public logs illustrate that this entropy-guided choice improves the accuracy-interpretability balance.
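The selection rule described above can be sketched in a few lines. The abstract does not give the paper's entropy formula or its threshold, so this sketch assumes Shannon entropy over directly-follows activity pairs and a hypothetical cut-off `threshold_bits`; both are illustrative choices, not the authors' definitions.

```python
import math
from collections import Counter


def transition_entropy(traces):
    """Shannon entropy (in bits) of the directly-follows pair distribution.

    Assumption: this stands in for the paper's entropy measure, whose
    exact formula is not stated in the abstract.
    """
    counts = Counter(
        (a, b) for trace in traces for a, b in zip(trace, trace[1:])
    )
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def recommend_model(traces, threshold_bits=2.0):
    """Return (model family, entropy) for an event log.

    threshold_bits is a hypothetical cut-off, not a value from the paper.
    """
    h = transition_entropy(traces)
    family = "DAW-Transformer" if h > threshold_bits else "decision tree"
    return family, h
```

Here `traces` is a list of activity sequences, one per case. A log in which every case follows the same path yields low entropy and falls on the decision-tree side; a log with many distinct transitions falls on the transformer side.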

Core claim

The paper establishes that the DAW-Transformer achieves superior performance on high-entropy datasets such as Sepsis and Filtered Hospital Logs, while interpretable methods like Decision Trees perform competitively on low-entropy datasets such as BPIC 2020 Prepaid Travel Costs, and that an entropy-based model selection framework can guide the choice between them.

What carries the argument

The entropy-based model selection framework together with the DAW-Transformer that integrates multi-head attention and dynamic windowing to capture long-range dependencies across all attributes.
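To make the combination of multi-head attention and a per-position window concrete, here is a minimal NumPy sketch. It is not the DAW-Transformer: there are no learned projections, `x` is assumed to be a matrix of already concatenated per-attribute event embeddings, and the interpretation of "dynamic windowing" as a causal look-back length that may differ at each position (`window_sizes`) is an assumption, since the abstract gives no internal details.

```python
import numpy as np


def window_mask(window_sizes):
    """mask[i, j] is True when position i may attend to position j.

    Position i sees itself and the window_sizes[i] - 1 events before it.
    """
    sizes = np.asarray(window_sizes)
    if sizes.min() < 1:
        raise ValueError("every window must cover at least the current event")
    idx = np.arange(len(sizes))
    dist = idx[:, None] - idx[None, :]  # how far j lies behind i
    return (dist >= 0) & (dist < sizes[:, None])


def windowed_multi_head_attention(x, window_sizes, num_heads):
    """Scaled dot-product self-attention with a per-position causal window.

    x: (seq_len, d_model) array. Queries, keys and values are all x itself
    (no learned weights), which keeps the sketch short.
    """
    n, d = x.shape
    if d % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d // num_heads
    heads = x.reshape(n, num_heads, d_head).transpose(1, 0, 2)   # (h, n, d_head)
    scores = heads @ heads.transpose(0, 2, 1) / np.sqrt(d_head)  # (h, n, n)
    scores = np.where(window_mask(window_sizes)[None], scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ heads                                        # (h, n, d_head)
    return out.transpose(1, 0, 2).reshape(n, d), weights
```

Passing the same value for every entry of `window_sizes` reduces this to ordinary fixed-window causal attention, which is the comparison the referee report asks for.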

Load-bearing premise

That the proposed entropy measure of dataset complexity reliably indicates which algorithm family will perform best, and that the dynamic windowing in DAW-Transformer is the key reason for any observed gains on high-entropy logs.

What would settle it

An experiment on additional event logs stratified by entropy level in which the DAW-Transformer does not outperform on high-entropy cases or decision trees do not remain competitive on low-entropy cases.

Figures

Figures reproduced from arXiv: 2502.10573 by Hadi Zare, Homayoun Najjaran, Maryam Ahang, Mostafa Abbasi.

**Figure 1.** Entropy-Driven Next Activity Prediction: High-entropy datasets use DAW-Transformer for accuracy, while ...

**Figure 2.** Confusion matrix for the high-entropy sepsis dataset, demonstrating improved performance with the DAW ...

**Figure 3.** Confusion matrix for the low-entropy road traffic fine dataset, demonstrating improved performance with the ...

Original abstract

Next activity prediction in predictive business process monitoring is crucial for operational efficiency and informed decision-making. While machine learning and Artificial Intelligence have achieved promising results, challenges remain in balancing interpretability and accuracy, particularly due to the complexity and evolving nature of event logs. This paper presents two contributions: (i) an entropy-based model selection framework that quantifies dataset complexity to recommend suitable algorithms, and (ii) the DAW-Transformer (Dynamic Attribute-Wise Transformer), which integrates multi-head attention with a dynamic windowing mechanism to capture long-range dependencies across all attributes. Experiments on six public event logs show that the DAW-Transformer achieves superior performance on high-entropy datasets (e.g., Sepsis, Filtered Hospital Logs), whereas interpretable methods like Decision Trees perform competitively on low-entropy datasets (e.g., BPIC 2020 Prepaid Travel Costs). These results highlight the importance of aligning model choice with dataset entropy to balance accuracy and interpretability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a usable entropy heuristic for picking transformers versus trees on event logs, but the evidence does not isolate the dynamic windowing as the cause of gains.

The main takeaway is that entropy of an event log can guide whether to reach for a transformer or stick with something interpretable like a decision tree. On the six public logs they test, the DAW-Transformer looks stronger when entropy is high and the tree holds its own when entropy is low. That split is the concrete observation worth noting for anyone who has to pick a model for next-activity prediction without running every option every time.

They also ship the DAW-Transformer itself, which adds a dynamic window per attribute on top of standard multi-head attention. The experiments are run on the same public datasets that the field already uses, so the comparison is at least reproducible in principle. That is the part that earns credit: a simple, testable rule of thumb tied to measurable data properties rather than another architecture paper that claims universal superiority.

The soft spots are exactly where the stress-test note says. There is no ablation that keeps the rest of the transformer fixed and only turns the dynamic window on or off, so it is not clear the dynamic part is doing the work rather than attention in general or hyper-parameter choices. The abstract gives no formula for the entropy measure or how the high/low threshold is chosen, which leaves open the possibility that the selection rule was tuned on the same logs used for the final numbers. If the full paper has those details and a hold-out check on the rule itself, the concern shrinks; otherwise it stays material.

This work is aimed at the predictive business process monitoring crowd who already work with event logs and need practical model-selection advice. A reader outside that subfield will not find new theory or methods that transfer elsewhere. It is worth sending to a serious referee because the claim is narrow, the data are public, and the experiments are set up so that the gaps can be fixed with targeted revisions rather than a full rewrite.

Referee Report

2 major / 0 minor

Summary. The paper claims two contributions in next activity prediction for predictive business process monitoring: (i) an entropy-based model selection framework that quantifies dataset complexity to recommend algorithm families, and (ii) the DAW-Transformer, which augments multi-head attention with a dynamic windowing mechanism to capture long-range dependencies across attributes. Experiments on six public event logs are reported to show DAW-Transformer superiority on high-entropy logs (Sepsis, Filtered Hospital) and competitive performance of interpretable methods such as Decision Trees on low-entropy logs (BPIC 2020 Prepaid Travel Costs).

Significance. If the central claims hold, the work would provide a practical, data-driven heuristic for trading off accuracy against interpretability in process monitoring by aligning model choice with an entropy measure of log complexity. The evaluation on six public logs is a positive feature that supports reproducibility and cross-dataset comparison.

major comments (2)

[Experiments] Experiments section: the claim that DAW-Transformer superiority on high-entropy logs is due to the dynamic windowing mechanism (rather than multi-head attention or attribute-wise processing alone) is load-bearing for both the architecture contribution and the entropy-based selection framework, yet no ablation is presented that replaces dynamic windowing with a fixed-length or average-length window while holding all other components constant.
[Framework] Framework description (likely §3): the entropy-based recommendation of model families risks post-hoc fitting if the entropy thresholds or mapping from entropy values to recommended algorithms were derived or tuned using the same six evaluation logs rather than a separate validation procedure or theoretical derivation.
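The ablation requested in the first comment amounts to changing only the window policy while every other part of the pipeline stays fixed. A minimal sketch of the data side of such a harness follows. The function name `build_samples`, the default `fixed_len`, and the reading of "dynamic" as "the whole prefix seen so far" are assumptions for illustration; the paper's actual windowing rule is not given in the abstract.

```python
from statistics import mean


def build_samples(traces, strategy, fixed_len=3):
    """Turn traces into (window, next_activity) pairs under one window policy.

    strategy:
      "fixed"   - at most fixed_len preceding events
      "average" - at most round(mean trace length) preceding events
      "dynamic" - the whole prefix (assumed meaning, see lead-in)
    Only the window changes between strategies; the prediction targets and
    their order are identical, so downstream results are comparable.
    """
    caps = {
        "fixed": fixed_len,
        "average": round(mean(len(t) for t in traces)),
        "dynamic": None,
    }
    if strategy not in caps:
        raise ValueError(f"unknown strategy: {strategy!r}")
    cap = caps[strategy]

    samples = []
    for trace in traces:
        for i in range(1, len(trace)):
            width = i if cap is None else min(i, cap)
            samples.append((tuple(trace[i - width:i]), trace[i]))
    return samples
```

Training the same model once per strategy on these samples, with identical hyperparameters and splits, would isolate the contribution of the window policy from that of attention or attribute-wise processing.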

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions we will make.

Referee: [Experiments] Experiments section: the claim that DAW-Transformer superiority on high-entropy logs is due to the dynamic windowing mechanism (rather than multi-head attention or attribute-wise processing alone) is load-bearing for both the architecture contribution and the entropy-based selection framework, yet no ablation is presented that replaces dynamic windowing with a fixed-length or average-length window while holding all other components constant.

Authors: We agree that an explicit ablation isolating the dynamic windowing component is necessary to substantiate the architectural claim. In the revised manuscript we will add an ablation study that replaces the dynamic windowing mechanism with both fixed-length and average-length windows while keeping multi-head attention, attribute-wise processing, and all other hyperparameters identical. Results will be reported on the same six logs with the same evaluation protocol.

revision: yes

Referee: [Framework] Framework description (likely §3): the entropy-based recommendation of model families risks post-hoc fitting if the entropy thresholds or mapping from entropy values to recommended algorithms were derived or tuned using the same six evaluation logs rather than a separate validation procedure or theoretical derivation.

Authors: The entropy thresholds and the mapping from entropy ranges to model families were derived from theoretical considerations of process complexity (drawing on prior work relating entropy to branching factor and trace variability) together with distributional analysis performed on a larger collection of public logs before the six evaluation logs were selected. Nevertheless, to eliminate any perception of post-hoc fitting we will (i) document the exact derivation procedure and the auxiliary logs used, and (ii) add a sensitivity analysis that recomputes recommendations on a held-out set of logs. These clarifications and additional checks will appear in the revised §3.

revision: partial
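One way to probe the post-hoc fitting concern is a leave-one-log-out check: fit the entropy threshold on all logs but one, then see whether the rule picks the right model family for the held-out log. The sketch below is a hypothetical procedure, not the sensitivity analysis the authors promise. It assumes each log is summarised as a pair `(entropy, transformer_wins)`, where `transformer_wins` is a boolean taken from measured results.

```python
def fit_threshold(logs):
    """Pick the entropy cut-off that agrees with the most logs.

    logs: list of (entropy, transformer_wins) pairs.
    Candidates are midpoints between consecutive sorted entropy values.
    """
    values = sorted(h for h, _ in logs)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values

    def agreement(threshold):
        return sum((h > threshold) == wins for h, wins in logs)

    return max(candidates, key=agreement)


def leave_one_log_out(logs):
    """Fraction of logs whose best model family is predicted correctly
    by a threshold fitted on the remaining logs only."""
    hits = 0
    for i, (h, wins) in enumerate(logs):
        rest = logs[:i] + logs[i + 1:]
        threshold = fit_threshold(rest)
        hits += (h > threshold) == wins
    return hits / len(logs)
```

A score near 1.0 would suggest the rule transfers to unseen logs; a score well below the in-sample agreement would suggest the threshold was fitted to the evaluation set. With only six logs the estimate is coarse, which is one reason a larger held-out collection would be more convincing.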

Circularity Check

0 steps flagged

No significant circularity identified

The provided abstract and description present an empirical study proposing an entropy-based model selection framework and the DAW-Transformer architecture, validated via experiments on six public event logs. No equations, self-citations, or derivation steps are shown that reduce a claimed prediction or result to its inputs by construction (e.g., no fitted parameters renamed as predictions, no self-definitional entropy measures, no load-bearing self-citations). The claims rest on comparative performance results rather than a closed mathematical chain, making the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the DAW-Transformer is presented as a new model but without internal details.
