Promoting Simple Agents: Ensemble Methods for Event-Log Prediction
Pith reviewed 2026-05-09 22:55 UTC · model grok-4.3
The pith
Lightweight n-gram models combined with a promotion ensemble achieve accuracy comparable to neural networks for event-log prediction at lower computational cost.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Experiments on synthetic patterns and five real-world process mining datasets show that n-grams with appropriate context windows achieve comparable accuracy to neural models while requiring substantially fewer resources. Unlike windowed neural architectures, which show unstable performance patterns, n-grams provide stable and consistent accuracy. While classical ensemble methods like voting improve n-gram performance, they require running many agents in parallel during inference, increasing memory consumption and latency. The proposed promotion algorithm dynamically selects between two active models during inference, reducing overhead compared to classical voting schemes. On real-world data, these ensembles match or exceed the accuracy of non-windowed neural models at lower computational cost.
What carries the argument
The promotion algorithm, which dynamically selects between two active n-gram models during inference to reduce overhead.
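The paper's pseudocode is not reproduced on this page, so the following is only a hypothetical sketch of what "selecting between two active models" could look like. The class name `PromotionPair`, the rolling `window`, the `margin` threshold, and the swap rule are all assumptions made for illustration, not the authors' algorithm.

```python
from collections import deque


class PromotionPair:
    """Hypothetical two-slot selector: the lead model answers, the challenger is scored alongside it."""

    def __init__(self, lead, challenger, window=50, margin=0.05):
        # lead and challenger are any callables mapping a prefix (tuple) to a predicted label.
        self.lead, self.challenger = lead, challenger
        self.margin = margin  # assumed promotion threshold, not taken from the paper
        self.lead_hits = deque(maxlen=window)
        self.chal_hits = deque(maxlen=window)
        self._last = None  # (lead guess, challenger guess) from the latest predict()

    def predict(self, prefix):
        # Only two models are evaluated per event, regardless of how many exist offline.
        self._last = (self.lead(prefix), self.challenger(prefix))
        return self._last[0]

    def observe(self, actual):
        """Score both guesses against the observed activity, then swap roles if warranted."""
        if self._last is None:
            return
        self.lead_hits.append(self._last[0] == actual)
        self.chal_hits.append(self._last[1] == actual)
        self._last = None
        if self._rate(self.chal_hits) > self._rate(self.lead_hits) + self.margin:
            # Promote the challenger; the scores travel with the models.
            self.lead, self.challenger = self.challenger, self.lead
            self.lead_hits, self.chal_hits = self.chal_hits, self.lead_hits

    @staticmethod
    def _rate(hits):
        return sum(hits) / len(hits) if hits else 0.0


# Toy models (hypothetical): a constant guesser and a "repeat the last activity" guesser.
def always_a(prefix):
    return "a"


def repeat_last(prefix):
    return prefix[-1] if prefix else "a"


pair = PromotionPair(always_a, repeat_last, window=10, margin=0.0)
prefix = ()
for activity in "bbbbbb":
    pair.predict(prefix)
    pair.observe(activity)
    prefix += (activity,)
```

On this toy stream the challenger overtakes the lead after the second event and is promoted. The point of the sketch is the cost profile: each event triggers two model evaluations, however many candidate models exist.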
If this is right
- N-grams with suitable context windows achieve comparable accuracy to neural models but require substantially fewer resources.
- N-grams deliver stable, consistent accuracy, unlike windowed neural architectures, whose performance fluctuates.
- Classical voting improves n-gram performance but raises memory and latency costs; promotion reduces this overhead.
- On real-world datasets the resulting ensembles match or exceed non-windowed neural models with lower cost.
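To make the overhead of classical voting concrete, here is a minimal sketch of plurality voting over several predictors. The function names and the tie-breaking rule (earliest-listed model wins) are illustrative assumptions; the paper may use a different voting scheme.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common prediction; ties go to the earliest-listed model."""
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for guess in predictions:
        if counts[guess] == best:
            return guess


def vote_over_models(models, prefix):
    """Query every model for the same prefix, then vote.

    All k models must be resident and evaluated per event, which is the
    memory and latency cost that grows with ensemble size.
    """
    return majority_vote([model(prefix) for model in models])
```

With k models in the ensemble, every incoming event costs k lookups and k models held in memory, whereas a two-model selection scheme keeps that number fixed at two.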
Where Pith is reading between the lines
- Dynamic selection like promotion could extend to other streaming prediction domains where full ensembles are too expensive at inference time.
- Process mining systems running on limited hardware might adopt n-grams to enable real-time monitoring without neural-scale compute.
- Controlled experiments varying log complexity could clarify exactly when the stability of n-grams outweighs neural capacity.
Load-bearing premise
The five real-world process mining datasets and the chosen context windows are representative enough to support the claimed general superiority in the resource-accuracy trade-off, and window selection involved no hidden data leakage.
What would settle it
A new, independent event-log dataset on which the promotion ensembles either fall short of non-windowed neural accuracy or match it only at equal or higher computational cost would disprove the central result.
Original abstract
We compare lightweight automata-based models (n-grams) with neural architectures (LSTM, Transformer) for next-activity prediction in streaming event logs. Experiments on synthetic patterns and five real-world process mining datasets show that n-grams with appropriate context windows achieve comparable accuracy to neural models while requiring substantially fewer resources. Unlike windowed neural architectures, which show unstable performance patterns, n-grams provide stable and consistent accuracy. While we demonstrate that classical ensemble methods like voting improve n-gram performance, they require running many agents in parallel during inference, increasing memory consumption and latency. We propose an ensemble method, the promotion algorithm, that dynamically selects between two active models during inference, reducing overhead compared to classical voting schemes. On real-world datasets, these ensembles match or exceed the accuracy of non-windowed neural models with lower computational cost.
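For readers unfamiliar with n-gram predictors, the following is a minimal sketch of frequency-based next-activity prediction with a fixed context window. The class name `NGramPredictor`, its methods, and the activity labels are hypothetical; the authors' implementation may differ (for example in smoothing or fallback to shorter contexts).

```python
from collections import Counter, defaultdict


class NGramPredictor:
    """Counts which activity follows each length-limited context and predicts the most frequent one."""

    def __init__(self, context_size):
        self.context_size = context_size
        self.counts = defaultdict(Counter)

    def _context(self, prefix):
        # A slice with -0 would return the whole prefix, so handle size 0 explicitly.
        if self.context_size == 0:
            return ()
        return tuple(prefix[-self.context_size:])

    def update(self, prefix, next_activity):
        """Record one observed transition; cheap enough to run per event on a stream."""
        self.counts[self._context(prefix)][next_activity] += 1

    def predict(self, prefix):
        """Return the most frequent successor of the current context, or None if unseen."""
        seen = self.counts.get(self._context(prefix))
        if not seen:
            return None
        return seen.most_common(1)[0][0]


# Made-up trace for illustration only.
model = NGramPredictor(context_size=2)
trace = ["register", "check", "approve", "register", "check", "approve"]
for i, activity in enumerate(trace):
    model.update(trace[:i], activity)
```

The model is just a table of counters keyed by context, which is why its memory and compute footprint stays small compared with a trained neural network.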
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper compares lightweight n-gram automata models against neural architectures (LSTM, Transformer) for next-activity prediction on streaming event logs. It shows that n-grams with suitable context windows achieve comparable accuracy to non-windowed neural models at lower computational cost, while being more stable than windowed neural variants. The authors introduce a 'promotion algorithm' ensemble that dynamically switches between two active n-gram models during inference to reduce the overhead of classical voting ensembles, and validate the approach on synthetic patterns plus five real-world process mining datasets.
Significance. If the accuracy and resource claims hold after addressing experimental gaps, the work would provide a practical, low-overhead alternative for event-log prediction in resource-constrained process mining settings. The promotion algorithm offers a targeted ensemble technique that improves on voting by limiting active models at inference time. The multi-dataset empirical comparison is a strength, though it remains entirely empirical without parameter-free derivations or machine-checked proofs.
major comments (3)
- [Experiments] Experiments section (real-world results): the central claim that n-gram ensembles 'match or exceed the accuracy of non-windowed neural models with lower computational cost' is presented without error bars, standard deviations across runs, or statistical significance tests on the five datasets. This directly affects whether the reported stability and trade-off can be considered reliable.
- [Model Description and Experiments] Context window selection procedure (described in the n-gram model setup and experimental protocol): insufficient detail is given on how 'appropriate context windows' were chosen for each dataset. If any test-set information was used in this selection, it would constitute leakage and undermine the general superiority claim in the abstract.
- [Results] Comparison to baselines (results tables): the non-windowed LSTM/Transformer baselines must be confirmed to use identical train/test splits, preprocessing, and metrics as the n-gram ensembles. Any mismatch in implementation would invalidate the accuracy-cost conclusion.
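The first major comment asks for variability across runs. As a minimal sketch of what that reporting could look like, the helper below computes a mean and sample standard deviation per method using only the standard library. The function name and the accuracy values are placeholders invented for illustration; they are not results from the paper.

```python
from statistics import mean, stdev


def summarize_runs(accuracies):
    """Return (mean, sample standard deviation) over independent runs."""
    if len(accuracies) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return mean(accuracies), stdev(accuracies)


# Placeholder numbers, NOT measurements from the paper.
example_runs = [0.80, 0.82, 0.81]
avg, spread = summarize_runs(example_runs)
print(f"accuracy = {avg:.3f} +/- {spread:.3f} over {len(example_runs)} runs")
```

A paired significance test across datasets would be a separate step on top of these per-run summaries.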
minor comments (2)
- [Algorithm] The promotion algorithm pseudocode could benefit from explicit notation for the promotion/demotion thresholds and state transitions to improve reproducibility.
- [Tables] Some result tables would be clearer with explicit column headers indicating whether accuracy or resource metrics are reported, and with consistent ordering of methods across datasets.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below, providing clarifications and committing to revisions that strengthen the experimental reporting without altering the core findings.
Point-by-point responses
- Referee: [Experiments] Experiments section (real-world results): the central claim that n-gram ensembles 'match or exceed the accuracy of non-windowed neural models with lower computational cost' is presented without error bars, standard deviations across runs, or statistical significance tests on the five datasets. This directly affects whether the reported stability and trade-off can be considered reliable.
  Authors: We agree that reporting variability and statistical tests would make the reliability of the accuracy and stability claims more robust. In the revised manuscript, we will add error bars representing standard deviations across multiple independent runs and include paired statistical significance tests (e.g., Wilcoxon signed-rank) for the key comparisons on the five real-world datasets.
  Revision: yes
- Referee: [Model Description and Experiments] Context window selection procedure (described in the n-gram model setup and experimental protocol): insufficient detail is given on how 'appropriate context windows' were chosen for each dataset. If any test-set information was used in this selection, it would constitute leakage and undermine the general superiority claim in the abstract.
  Authors: Context windows were selected solely from training data using cross-validation on the training portions of each dataset, with no test-set information involved at any stage. We will revise the n-gram model setup and experimental protocol sections to provide a step-by-step description of this leakage-free procedure, including the validation strategy employed.
  Revision: yes
- Referee: [Results] Comparison to baselines (results tables): the non-windowed LSTM/Transformer baselines must be confirmed to use identical train/test splits, preprocessing, and metrics as the n-gram ensembles. Any mismatch in implementation would invalidate the accuracy-cost conclusion.
  Authors: The non-windowed LSTM and Transformer baselines were trained and evaluated using precisely the same train/test splits, preprocessing pipeline, and evaluation metrics as the n-gram models and ensembles. We will add explicit confirmation of this equivalence, along with implementation details, to the experimental setup and results sections in the revision.
  Revision: yes
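The leakage concern in the second response comes down to choosing the context window without ever touching test traces. The sketch below illustrates that idea with a single holdout slice carved out of the training data, which is a simplification of the cross-validation the authors describe. All function names, the `val_fraction` default, and the toy traces are assumptions.

```python
from collections import Counter, defaultdict


def ngram_accuracy(fit_traces, eval_traces, n):
    """Fit context-size-n successor counts on fit_traces; return next-activity accuracy on eval_traces."""
    counts = defaultdict(Counter)
    for trace in fit_traces:
        for i, activity in enumerate(trace):
            counts[tuple(trace[max(0, i - n):i])][activity] += 1
    hits = total = 0
    for trace in eval_traces:
        for i, activity in enumerate(trace):
            seen = counts.get(tuple(trace[max(0, i - n):i]))
            guess = seen.most_common(1)[0][0] if seen else None
            hits += guess == activity
            total += 1
    return hits / total if total else 0.0


def select_context_size(train_traces, candidates, val_fraction=0.25):
    """Pick the candidate with the best accuracy on a validation slice of the TRAINING data only."""
    cut = max(1, int(len(train_traces) * (1 - val_fraction)))
    fit, val = train_traces[:cut], train_traces[cut:]
    # The test set never enters this function, so the choice cannot leak test information.
    return max(candidates, key=lambda n: ngram_accuracy(fit, val, n))


# Toy training traces (made up): what follows "a" depends on what came before it.
toy_train = [["a", "b", "a", "c"] for _ in range(4)]
best_n = select_context_size(toy_train, candidates=[0, 1, 2])
```

On the toy traces a context of one activity cannot tell whether "a" is followed by "b" or "c", while a context of two can, so the larger window wins on the validation slice.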
Circularity Check
No circularity; claims rest on direct empirical comparisons
The paper is an empirical comparison of n-gram ensembles against LSTM/Transformer models on synthetic patterns and five real-world process-mining logs. No mathematical derivation chain, equations, or proofs are present that could reduce by construction to fitted parameters, self-definitions, or self-citations. Performance claims (accuracy, stability, resource cost) are supported by reported experimental measurements rather than any tautological reduction. Self-citations, if any, are not load-bearing for the central results.
Axiom & Free-Parameter Ledger
free parameters (1)
- context window size