
arXiv: 2502.10573 · v2 · submitted 2025-02-14 · 💻 cs.LG · cs.AI

An Innovative Next Activity Prediction Using Process Entropy and Dynamic Attribute-Wise-Transformer in Predictive Business Process Monitoring

Pith reviewed 2026-05-23 02:59 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords next activity prediction · predictive business process monitoring · process entropy · dynamic attribute-wise transformer · model selection · event logs

The pith

Dataset entropy indicates whether a dynamic transformer or an interpretable decision tree is likely to perform better for next activity prediction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes an entropy measure to quantify the complexity of business process event logs. It pairs this measure with a model selection rule that favors the new DAW-Transformer on high-entropy logs and simpler interpretable algorithms such as decision trees on low-entropy logs. The DAW-Transformer applies multi-head attention together with dynamic windowing to track dependencies across attributes in evolving sequences. Experiments on six public logs illustrate that this entropy-guided choice improves the accuracy-interpretability balance.
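The page does not reproduce the paper's entropy formula. As a rough illustration only, the sketch below assumes one plausible complexity measure: the conditional Shannon entropy of the next activity given the current one, H(A_{t+1} | A_t), computed from directly-follows counts. Both the measure and the name `transition_entropy` are assumptions, not the authors' definition.

```python
from collections import Counter, defaultdict
from math import log2


def transition_entropy(traces):
    """Conditional entropy H(next activity | current activity), in bits.

    Hypothetical stand-in for the paper's process entropy: 0.0 means every
    activity has exactly one observed successor, so the next step is fully
    determined by the current one.
    """
    follows = defaultdict(Counter)  # current activity -> counts of successors
    for trace in traces:
        for cur, nxt in zip(trace, trace[1:]):
            follows[cur][nxt] += 1
    total = sum(sum(c.values()) for c in follows.values())
    if total == 0:
        return 0.0
    h = 0.0
    for successors in follows.values():
        n_cur = sum(successors.values())
        weight = n_cur / total  # P(current activity)
        for n in successors.values():
            p = n / n_cur  # P(next | current)
            h -= weight * p * log2(p)
    return h


sequential = [["register", "approve", "pay"]] * 3
branching = [
    ["register", "check", "pay"],
    ["register", "reject"],
    ["register", "approve", "pay"],
]
print(transition_entropy(sequential))  # 0.0
print(round(transition_entropy(branching), 3))  # 0.951
```

A strictly sequential log scores 0 bits, while a log in which one activity fans out to several equally likely successors scores higher. A selection rule of the kind described above would threshold a score like this.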

Core claim

The paper establishes that the DAW-Transformer achieves superior performance on high-entropy datasets such as Sepsis and Filtered Hospital Logs, while interpretable methods like Decision Trees perform competitively on low-entropy datasets such as BPIC 2020 Prepaid Travel Costs, and that an entropy-based model selection framework can guide the choice between them.

What carries the argument

The entropy-based model selection framework together with the DAW-Transformer that integrates multi-head attention and dynamic windowing to capture long-range dependencies across all attributes.
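The page gives no internals for the DAW-Transformer. For orientation, the sketch below shows standard causal scaled dot-product multi-head self-attention over one prefix window, in plain Python so it runs without dependencies. It assumes, hypothetically, that each event is a single vector concatenating the embeddings of all its attributes. Learned weights are passed in as plain matrices, and the output projection, positional encoding, and dynamic windowing are omitted, so this is not the authors' architecture.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matmul(a, b):
    """(n x k) @ (k x m) for lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def multi_head_attention(x, w_q, w_k, w_v, num_heads):
    """Causal multi-head self-attention over one window of events.

    x: n x d event vectors (hypothetically, concatenated attribute embeddings).
    w_q, w_k, w_v: d x d projection matrices (learned in a real model).
    Returns n x d outputs: per-head results concatenated, no output projection.
    """
    n, d = len(x), len(x[0])
    if d % num_heads:
        raise ValueError("d must be divisible by num_heads")
    dh = d // num_heads
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    out = [[0.0] * d for _ in range(n)]
    for h in range(num_heads):
        cols = range(h * dh, (h + 1) * dh)  # this head's slice of the features
        for i in range(n):
            # Causal mask: event i attends only to events 0..i in the window.
            scores = [sum(q[i][c] * k[j][c] for c in cols) / math.sqrt(dh)
                      for j in range(i + 1)]
            weights = softmax(scores)
            for c in cols:
                out[i][c] = sum(weights[j] * v[j][c] for j in range(i + 1))
    return out


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
window = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
result = multi_head_attention(window, identity, identity, identity, num_heads=2)
print(result[0])  # [1.0, 0.0, 0.0, 0.0]: the first event can only attend to itself
```

Splitting the feature dimension across heads lets each head attend over a different subspace of the concatenated attribute embeddings.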

Load-bearing premise

That the proposed entropy measure of dataset complexity reliably indicates which algorithm family will perform best, and that the dynamic windowing in DAW-Transformer is the key reason for any observed gains on high-entropy logs.

What would settle it

An experiment on additional event logs, stratified by entropy level, in which the DAW-Transformer fails to outperform interpretable baselines on high-entropy logs or decision trees fail to remain competitive on low-entropy logs.
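The falsification test can be made mechanical. The sketch below flags event logs whose results contradict the claim, given each log's entropy and the accuracies of the two model families. The entropy threshold and the tolerance that defines "competitive" are hypothetical parameters not specified on this page, and the example numbers are invented placeholders, not reported results.

```python
def contradicting_logs(results, threshold, tolerance=0.02):
    """Return names of logs whose results contradict the entropy-based claim.

    results: {log_name: (entropy_bits, acc_transformer, acc_tree)}
    High-entropy logs contradict the claim if the transformer does not win;
    low-entropy logs contradict it if the tree trails by more than `tolerance`.
    Both threshold and tolerance are hypothetical, not values from the paper.
    """
    bad = []
    for name, (h, acc_tf, acc_tree) in results.items():
        if h >= threshold:
            if acc_tf <= acc_tree:
                bad.append(name)
        elif acc_tree < acc_tf - tolerance:
            bad.append(name)
    return sorted(bad)


# Placeholder numbers for illustration only.
toy = {
    "log_a": (1.8, 0.80, 0.75),  # high entropy, transformer wins
    "log_b": (0.3, 0.90, 0.89),  # low entropy, tree within tolerance
    "log_c": (1.6, 0.70, 0.72),  # high entropy, tree wins: contradiction
    "log_d": (0.4, 0.95, 0.85),  # low entropy, tree far behind: contradiction
}
print(contradicting_logs(toy, threshold=1.0))  # ['log_c', 'log_d']
```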

Figures

Figures reproduced from arXiv: 2502.10573 by Hadi Zare, Homayoun Najjaran, Maryam Ahang, Mostafa Abbasi.

Figure 1. Entropy-Driven Next Activity Prediction: High-entropy datasets use DAW-Transformer for accuracy, while …

Figure 2. Confusion matrix for the high-entropy sepsis dataset, demonstrating improved performance with the DAW …

Figure 3. Confusion matrix for the low-entropy road traffic fine dataset, demonstrating improved performance with the …
Abstract

Next activity prediction in predictive business process monitoring is crucial for operational efficiency and informed decision-making. While machine learning and Artificial Intelligence have achieved promising results, challenges remain in balancing interpretability and accuracy, particularly due to the complexity and evolving nature of event logs. This paper presents two contributions: (i) an entropy-based model selection framework that quantifies dataset complexity to recommend suitable algorithms, and (ii) the DAW-Transformer (Dynamic Attribute-Wise Transformer), which integrates multi-head attention with a dynamic windowing mechanism to capture long-range dependencies across all attributes. Experiments on six public event logs show that the DAW-Transformer achieves superior performance on high-entropy datasets (e.g., Sepsis, Filtered Hospital Logs), whereas interpretable methods like Decision Trees perform competitively on low-entropy datasets (e.g., BPIC 2020 Prepaid Travel Costs). These results highlight the importance of aligning model choice with dataset entropy to balance accuracy and interpretability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this section is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims two contributions in next activity prediction for predictive business process monitoring: (i) an entropy-based model selection framework that quantifies dataset complexity to recommend algorithm families, and (ii) the DAW-Transformer, which augments multi-head attention with a dynamic windowing mechanism to capture long-range dependencies across attributes. Experiments on six public event logs are reported to show DAW-Transformer superiority on high-entropy logs (Sepsis, Filtered Hospital) and competitive performance of interpretable methods such as Decision Trees on low-entropy logs (BPIC 2020 Prepaid Travel Costs).

Significance. If the central claims hold, the work would provide a practical, data-driven heuristic for trading off accuracy against interpretability in process monitoring by aligning model choice with an entropy measure of log complexity. The evaluation on six public logs is a positive feature that supports reproducibility and cross-dataset comparison.

major comments (2)
  1. [Experiments] Experiments section: the claim that DAW-Transformer superiority on high-entropy logs is due to the dynamic windowing mechanism (rather than multi-head attention or attribute-wise processing alone) is load-bearing for both the architecture contribution and the entropy-based selection framework, yet no ablation is presented that replaces dynamic windowing with a fixed-length or average-length window while holding all other components constant.
  2. [Framework] Framework description (likely §3): the entropy-based recommendation of model families risks post-hoc fitting if the entropy thresholds or mapping from entropy values to recommended algorithms were derived or tuned using the same six evaluation logs rather than a separate validation procedure or theoretical derivation.
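One way to set up the ablation requested in comment 1 is to hold the model fixed and change only how prefixes are cut into input windows. The sketch below builds (window, next activity) training pairs under three strategies. The "fixed" and "average" variants follow the referee's wording. The "dynamic" variant (the whole prefix, optionally capped) is only an assumption about what dynamic windowing means, since its actual rule is not described here. The `<pad>` token and the function names are hypothetical.

```python
from statistics import mean

PAD = "<pad>"  # hypothetical padding token


def last_k(prefix, k):
    """Keep the last k events of a prefix, left-padding shorter prefixes."""
    tail = prefix[-k:] if k > 0 else []
    return [PAD] * (k - len(tail)) + tail


def make_windows(traces, strategy, fixed_size=3, cap=None):
    """Build (window, next_activity) pairs under one windowing strategy.

    "fixed":   every window has fixed_size events.
    "average": window length = rounded mean trace length of `traces`
               (in a real ablation, compute this on the training split only).
    "dynamic": assumed variant: the full prefix, cut to the last `cap`
               events when cap is set.
    """
    if strategy not in ("fixed", "average", "dynamic"):
        raise ValueError(f"unknown strategy: {strategy}")
    avg_size = max(1, round(mean(len(t) for t in traces)))
    samples = []
    for trace in traces:
        for t in range(1, len(trace)):
            prefix, target = trace[:t], trace[t]
            if strategy == "fixed":
                window = last_k(prefix, fixed_size)
            elif strategy == "average":
                window = last_k(prefix, avg_size)
            else:
                window = prefix[-cap:] if cap else prefix
            samples.append((window, target))
    return samples


log = [["a", "b", "c", "d"]]
print(make_windows(log, "fixed", fixed_size=2))
# [(['<pad>', 'a'], 'b'), (['a', 'b'], 'c'), (['b', 'c'], 'd')]
```

Training the same attention model on each variant and comparing per-log accuracy would attribute any gain to the windowing rule rather than to attention or attribute-wise processing.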

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions we will make.

  1. Referee: [Experiments] Experiments section: the claim that DAW-Transformer superiority on high-entropy logs is due to the dynamic windowing mechanism (rather than multi-head attention or attribute-wise processing alone) is load-bearing for both the architecture contribution and the entropy-based selection framework, yet no ablation is presented that replaces dynamic windowing with a fixed-length or average-length window while holding all other components constant.

    Authors: We agree that an explicit ablation isolating the dynamic windowing component is necessary to substantiate the architectural claim. In the revised manuscript we will add an ablation study that replaces the dynamic windowing mechanism with both fixed-length and average-length windows while keeping multi-head attention, attribute-wise processing, and all other hyperparameters identical. Results will be reported on the same six logs with the same evaluation protocol. revision: yes

  2. Referee: [Framework] Framework description (likely §3): the entropy-based recommendation of model families risks post-hoc fitting if the entropy thresholds or mapping from entropy values to recommended algorithms were derived or tuned using the same six evaluation logs rather than a separate validation procedure or theoretical derivation.

    Authors: The entropy thresholds and the mapping from entropy ranges to model families were derived from theoretical considerations of process complexity (drawing on prior work relating entropy to branching factor and trace variability) together with distributional analysis performed on a larger collection of public logs before the six evaluation logs were selected. Nevertheless, to eliminate any perception of post-hoc fitting we will (i) document the exact derivation procedure and the auxiliary logs used, and (ii) add a sensitivity analysis that recomputes recommendations on a held-out set of logs. These clarifications and additional checks will appear in the revised §3. revision: partial
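The promised sensitivity analysis could take roughly this form: sweep the entropy threshold and, for each value, measure how often the rule picks the model family that actually scored best on logs held out from threshold selection. The two-family rule, the function names, and the toy inputs are assumptions for illustration; no values here come from the paper.

```python
def recommend(entropy_bits, threshold):
    """Hypothetical two-family rule: transformer at or above the threshold."""
    return "DAW-Transformer" if entropy_bits >= threshold else "DecisionTree"


def threshold_sensitivity(held_out, thresholds):
    """Agreement rate of the entropy rule with the best family, per threshold.

    held_out: list of (entropy_bits, best_family) pairs measured on logs
    that were NOT used to choose the threshold.
    """
    if not held_out:
        raise ValueError("need at least one held-out log")
    result = {}
    for thr in thresholds:
        hits = sum(recommend(h, thr) == best for h, best in held_out)
        result[thr] = hits / len(held_out)
    return result


# Placeholder held-out results for illustration only.
toy_held_out = [(0.2, "DecisionTree"), (1.5, "DAW-Transformer"), (0.9, "DAW-Transformer")]
print(threshold_sensitivity(toy_held_out, [0.5, 1.0]))  # {0.5: 1.0, 1.0: 0.6666666666666666}
```

A flat agreement curve over a wide band of thresholds would support the claim that the mapping was not tuned to the evaluation logs; a narrow spike would point the other way.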

Circularity Check

0 steps flagged

No significant circularity identified


The provided abstract and description present an empirical study proposing an entropy-based model selection framework and the DAW-Transformer architecture, validated via experiments on six public event logs. No equations, self-citations, or derivation steps are shown that reduce a claimed prediction or result to its inputs by construction (e.g., no fitted parameters renamed as predictions, no self-definitional entropy measures, no load-bearing self-citations). The claims rest on comparative performance results rather than a closed mathematical chain, making the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the DAW-Transformer is presented as a new model but without internal details.


