LLM-Orchestrated Conformance Checking in Stroke Care Without Computer-Interpretable Guidelines
Pith reviewed 2026-06-27 16:12 UTC · model grok-4.3
The pith
Orchestrated LLMs extract patient traces and rules from raw texts to measure stroke care conformance without computer-interpretable guidelines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A modular architecture coordinates several LLMs and helper components to extract patient traces directly from unstructured discharge letters, derive normative rules from textual clinical guidelines, translate the rules into executable scripts, and compute a Trace Conformance Indicator that quantifies how many traces satisfy the rules. Applied to stroke care data from the neurological ward of Alessandria Hospital, the system checked hundreds of traces against 50 rules derived from the reference guideline and found more than 86 percent of the traces conformant.
What carries the argument
The modular LLM-orchestration pipeline that sequentially extracts traces, identifies rules, generates executable scripts, and produces a Trace Conformance Indicator.
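As a rough illustration of that pipeline's control flow, here is a minimal Python sketch. The three stage functions stand in for LLM calls and are hypothetical. `toy_extract_trace` parses a made-up `activity@minute` notation instead of a discharge letter, and `toy_compile_rule` only understands rules of the form `X before Y`. The indicator is assumed to be the share of traces that satisfy every rule; the paper's exact definition may differ.

```python
from dataclasses import dataclass
from typing import Callable

Event = tuple[str, int]          # (activity, minutes since admission)
Trace = list[Event]
Rule = Callable[[Trace], bool]   # an "executable script": trace -> satisfied?


@dataclass
class Pipeline:
    # Each field stands in for an LLM-backed component (hypothetical).
    extract_trace: Callable[[str], Trace]       # discharge letter -> trace
    extract_rules: Callable[[str], list[str]]   # guideline text -> textual rules
    compile_rule: Callable[[str], Rule]         # textual rule -> executable check

    def run(self, letters: list[str], guideline: str) -> float:
        traces = [self.extract_trace(letter) for letter in letters]
        rules = [self.compile_rule(text) for text in self.extract_rules(guideline)]
        # Assumed indicator: share of traces that satisfy every rule.
        conformant = sum(all(rule(trace) for rule in rules) for trace in traces)
        return conformant / len(traces) if traces else 0.0


def toy_extract_trace(letter: str) -> Trace:
    # Placeholder for an LLM call: parses "activity@minute" tokens.
    return [(act, int(minute)) for act, minute in (tok.split("@") for tok in letter.split())]


def toy_extract_rules(guideline: str) -> list[str]:
    # Placeholder for an LLM call: one rule per non-empty line.
    return [line.strip() for line in guideline.splitlines() if line.strip()]


def toy_compile_rule(text: str) -> Rule:
    # Placeholder for LLM code generation; supports only "X before Y".
    first, _, second = text.split()

    def check(trace: Trace) -> bool:
        times = {act: minute for act, minute in trace}  # last occurrence wins
        if first not in times or second not in times:
            return True  # rule does not apply to this trace
        return times[first] < times[second]

    return check


pipeline = Pipeline(toy_extract_trace, toy_extract_rules, toy_compile_rule)
letters = ["admission@0 ct_scan@20 thrombolysis@60",
           "admission@0 thrombolysis@30 ct_scan@50"]
print(pipeline.run(letters, "ct_scan before thrombolysis"))  # 0.5
```

Keeping each stage behind a plain function boundary is what makes the sketch modular: a different model or prompt can be swapped into one stage without touching the others.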
If this is right
- Conformance checking becomes possible in hospitals that have only ordinary text guidelines.
- Hundreds of patient records can be assessed automatically against dozens of rules derived from a single guideline document.
- A single numeric Trace Conformance Indicator summarizes overall guideline adherence for an entire event log.
- The same pipeline can be reused on new domains once suitable text sources are supplied.
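The value of a single indicator depends on how conformance is aggregated, and the summary does not spell this out. The sketch below contrasts two plausible definitions, both assumptions rather than the paper's formula: a strict one (share of traces with no violated rule) and a softer one (mean per-trace share of applicable rules satisfied). Rule outcomes are encoded as `True`, `False`, or `None` when a rule does not apply to a trace.

```python
from typing import Optional

# outcomes[trace_id][rule_id]: True = satisfied, False = violated, None = not applicable
Outcomes = dict[str, dict[str, Optional[bool]]]


def tci_strict(outcomes: Outcomes) -> float:
    """Share of traces that violate no applicable rule (one assumed definition)."""
    if not outcomes:
        return 0.0
    ok = sum(all(v is not False for v in rules.values()) for rules in outcomes.values())
    return ok / len(outcomes)


def tci_average(outcomes: Outcomes) -> float:
    """Mean per-trace share of applicable rules satisfied (an alternative definition)."""
    scores = []
    for rules in outcomes.values():
        applicable = [v for v in rules.values() if v is not None]
        # A trace to which no rule applies is counted as fully conformant.
        scores.append(sum(applicable) / len(applicable) if applicable else 1.0)
    return sum(scores) / len(scores) if scores else 0.0


log = {
    "p1": {"r1": True, "r2": True, "r3": None},
    "p2": {"r1": True, "r2": False, "r3": True},
    "p3": {"r1": None, "r2": True, "r3": True},
    "p4": {"r1": False, "r2": False, "r3": True},
}
print(tci_strict(log), round(tci_average(log), 2))  # 0.5 0.75
```

On this toy log the two definitions give 0.5 and 0.75, so a headline percentage is only comparable across studies if the aggregation rule is stated.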
Where Pith is reading between the lines
- Hospitals could run periodic automated audits without first investing in formal guideline encoding.
- Non-conformant traces flagged by the indicator could be routed to clinicians for targeted review.
- The approach might lower the cost of maintaining compliance monitoring across multiple clinical pathways.
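Flagged traces could be turned into a clinician review queue along the following lines. `review_queue` is a hypothetical helper, not part of the described system. It assumes the same `True`/`False`/`None` outcome encoding and orders traces by the number of violated rules.

```python
from typing import Optional

Outcomes = dict[str, dict[str, Optional[bool]]]  # True / False / None per rule


def review_queue(outcomes: Outcomes) -> list[tuple[str, list[str]]]:
    """Non-conformant traces with their violated rules, most violations first."""
    flagged = []
    for trace_id, rules in outcomes.items():
        violated = sorted(rule for rule, v in rules.items() if v is False)
        if violated:
            flagged.append((trace_id, violated))
    # Python's sort is stable (also with reverse=True), so ties keep log order.
    flagged.sort(key=lambda item: len(item[1]), reverse=True)
    return flagged


log = {
    "p1": {"r1": True, "r2": True},
    "p2": {"r1": True, "r2": False},
    "p3": {"r1": False, "r2": False},
}
for trace_id, violated in review_queue(log):
    print(trace_id, violated)
# p3 ['r1', 'r2']
# p2 ['r2']
```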
Load-bearing premise
Large language models can reliably extract accurate patient traces and guideline rules from unstructured clinical text without introducing significant errors or hallucinations.
What would settle it
A side-by-side comparison of LLM outputs with independent expert manual annotation of the same discharge letters and guideline text. Extraction accuracy below 80 percent, or derived rule sets that differ from the expert rules on more than 10 percent of conditions, would undercut the claim.
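This falsification test can be written down directly. In the sketch below, every definition is an assumption. Extraction accuracy is the share of expert-annotated fields that the LLM reproduced exactly. Rule-set disagreement is the share of conditions, normalized to strings, that appear in only one of the two rule sets. The 80 percent and 10 percent thresholds come from the text above; the field names are made up.

```python
def extraction_accuracy(llm: dict[str, str], expert: dict[str, str]) -> float:
    """Share of expert-annotated fields that the LLM reproduced exactly."""
    if not expert:
        raise ValueError("expert annotation is empty")
    hits = sum(llm.get(field) == value for field, value in expert.items())
    return hits / len(expert)


def rule_disagreement(llm_rules: set[str], expert_rules: set[str]) -> float:
    """Share of conditions that appear in only one of the two rule sets."""
    union = llm_rules | expert_rules
    return len(llm_rules ^ expert_rules) / len(union) if union else 0.0


def undermines_claim(accuracy: float, disagreement: float,
                     min_accuracy: float = 0.80, max_disagreement: float = 0.10) -> bool:
    """True if either threshold stated in the text is crossed."""
    return accuracy < min_accuracy or disagreement > max_disagreement


expert = {"onset": "08:10", "imaging": "08:40", "thrombolysis": "yes", "discharge": "home"}
llm = {"onset": "08:10", "imaging": "08:45", "thrombolysis": "yes", "discharge": "home"}
acc = extraction_accuracy(llm, expert)
dis = rule_disagreement({"c1", "c2", "c3", "c4"}, {"c1", "c2", "c3", "c5"})
print(acc, dis, undermines_claim(acc, dis))  # 0.75 0.4 True
```

Exact string matching is a deliberate simplification. Real rule conditions would need normalization (units, synonyms, paraphrase) before the two sets can be compared.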
Original abstract
Objective: Conformance checking in healthcare seeks to assess whether patient care pathways adhere to clinical guidelines. However, its practical application often depends on the availability of formal, machine-interpretable representations of guidelines, such as Computer-Interpretable Guidelines (CIGs), which are seldom available in real-world clinical settings.
Methods: This work introduces a modular framework based on the orchestration of Large Language Models (LLMs) to support medical conformance checking directly from unstructured clinical and guideline texts, without requiring predefined CIGs. The proposed architecture integrates multiple LLMs and supporting components to extract patient traces from clinical discharge letters, identify normative rules from textual clinical guidelines, translate these rules into executable scripts, and compute a Trace Conformance Indicator to quantify compliance within the event log.
Results: The framework was implemented and evaluated in the stroke care domain at the neurological ward of Alessandria Hospital. Hundreds of patient traces were automatically extracted from hospital data and assessed against 50 rules derived from the reference guideline. The analysis showed that more than 86% of the available traces were conformant.
Conclusion: The results demonstrate the feasibility of using orchestrated LLMs for practical healthcare conformance analysis. At the same time, the study provides evidence of a high level of adherence to stroke care guidelines at Alessandria Hospital.
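To make "translate these rules into executable scripts" concrete, here is one hedged example of what a generated script might look like. It checks that every occurrence of a trigger activity is followed by a response activity within a time window, similar in spirit to a timed Declare response constraint. The activity names and the 24-hour window are invented for illustration and are not taken from the paper's 50 rules. Traces are assumed to be sorted by timestamp.

```python
from datetime import datetime, timedelta
from typing import Callable

Event = tuple[str, datetime]
Trace = list[Event]  # assumed sorted by timestamp


def response_within(trigger: str, response: str, window: timedelta) -> Callable[[Trace], bool]:
    """Every `trigger` event must be followed by a `response` event within `window`."""
    def check(trace: Trace) -> bool:
        for i, (activity, t) in enumerate(trace):
            if activity != trigger:
                continue
            if not any(a == response and t <= u <= t + window for a, u in trace[i + 1:]):
                return False
        return True  # also true when the trigger never occurs
    return check


# Hypothetical rule; names and the 24-hour window are illustrative only.
rule = response_within("stroke_unit_admission", "dysphagia_screening", timedelta(hours=24))
t0 = datetime(2024, 1, 1, 8, 0)
on_time = [("stroke_unit_admission", t0), ("dysphagia_screening", t0 + timedelta(hours=3))]
late = [("stroke_unit_admission", t0), ("dysphagia_screening", t0 + timedelta(hours=30))]
print(rule(on_time), rule(late), rule([]))  # True False True
```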
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a modular framework that orchestrates multiple LLMs to perform conformance checking in healthcare directly from unstructured texts, without Computer-Interpretable Guidelines. It extracts patient traces from clinical discharge letters, derives normative rules from guideline texts, translates rules into executable scripts, and computes a Trace Conformance Indicator. The framework is implemented and evaluated in the stroke care domain using data from Alessandria Hospital, where hundreds of traces were assessed against 50 LLM-derived rules, yielding a reported conformance rate exceeding 86%.
Significance. If the LLM-based extractions prove reliable, the approach would enable conformance analysis in real-world settings lacking formal CIGs and provide evidence of high guideline adherence at the studied hospital. The real-hospital deployment and use of actual patient data constitute a practical strength that could support broader adoption of LLM-orchestrated process mining in clinical informatics.
major comments (1)
- [Results] The headline finding that >86% of hundreds of traces are conformant depends entirely on the accuracy of the LLM extraction stages for patient traces (from discharge letters) and normative rules (from the guideline). No ground-truth validation, error rates, precision/recall metrics, or human review of the extracted traces and rules is reported, so it is impossible to tell whether the conformance percentage reflects actual guideline adherence or artifacts of LLM extraction.
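The precision/recall metrics the referee asks for could be computed per trace roughly as follows. This sketch compares extracted activity labels with a gold-standard trace as multisets, so it ignores timestamps and ordering. The activity names are made up.

```python
from collections import Counter


def event_prf(extracted: list[str], gold: list[str]) -> tuple[float, float, float]:
    """Precision, recall and F1 of extracted activities vs. a gold trace (multiset match)."""
    true_pos = sum((Counter(extracted) & Counter(gold)).values())
    precision = true_pos / len(extracted) if extracted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# One spurious "mri" event extracted, nothing missed.
p, r, f = event_prf(["admission", "ct_scan", "thrombolysis", "mri"],
                    ["admission", "ct_scan", "thrombolysis"])
print(p, r, round(f, 3))  # 0.75 1.0 0.857
```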
minor comments (2)
- [Methods] The description of the orchestration architecture would benefit from explicit pseudocode or a diagram showing the sequence of LLM calls and data flows between components.
- [Abstract] The abstract and results paragraph should state the exact number of traces analyzed rather than 'hundreds' to allow readers to assess statistical power.
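One concrete reason the exact count matters is that the uncertainty around a conformance rate shrinks with the number of traces. The sketch computes a 95 percent Wilson score interval for a binomial proportion. The counts 172 of 200 and 688 of 800 (both 86 percent) are hypothetical stand-ins for "hundreds" of traces, and treating traces as independent trials is itself an assumption.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical counts, both 86% conformant.
for conformant, total in [(172, 200), (688, 800)]:
    lo, hi = wilson_interval(conformant, total)
    print(f"{total} traces: {lo:.3f} to {hi:.3f}")
```

With these made-up counts the interval is roughly 0.81 to 0.90 for 200 traces and roughly 0.83 to 0.88 for 800. The same 86 percent therefore supports noticeably different conclusions depending on n.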
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The major comment highlights an important limitation in the current presentation of results, which we address below with a commitment to revision.
Point-by-point responses
Referee: [Results] The headline finding that >86% of hundreds of traces are conformant depends entirely on the accuracy of the LLM extraction stages for patient traces (from discharge letters) and normative rules (from the guideline). No ground-truth validation, error rates, precision/recall metrics, or human review of the extracted traces and rules is reported, so it is impossible to tell whether the conformance percentage reflects actual guideline adherence or artifacts of LLM extraction.
Authors: We agree that the lack of reported validation for the LLM extraction stages is a substantive limitation that affects the strength of the conformance claims. The manuscript presents the framework as a feasibility demonstration in a real clinical setting and reports the observed rate from the hospital data, but does not include ground-truth checks or quantitative error metrics on the trace and rule extractions. In the revised manuscript we will add a dedicated validation subsection that includes: (i) human review of a random sample of extracted patient traces against the original discharge letters, (ii) human review of the 50 derived rules against the source guideline text, and (iii) reported agreement rates together with any observed error categories. This addition will allow readers to assess the reliability of the extraction pipeline and thereby interpret the >86% conformance figure more confidently.
Revision: yes
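The promised agreement rates could be reported with a chance-corrected statistic. Here is a minimal sketch under one assumption: each sampled trace receives a binary conformant/non-conformant verdict once from the LLM pipeline and once from a human reviewer working on the original letter. The code draws a reproducible random sample and then computes Cohen's kappa. The helper names, trace IDs, seed and verdicts are illustrative.

```python
import random


def sample_for_review(trace_ids: list[str], k: int, seed: int = 0) -> list[str]:
    """Reproducible simple random sample of traces for manual review."""
    return random.Random(seed).sample(trace_ids, k)


def cohens_kappa(a: list[bool], b: list[bool]) -> float:
    """Cohen's kappa for two raters' binary judgements on the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)  # agreement expected by chance
    if expected == 1:
        return 1.0  # both raters constant and identical: kappa undefined, sketch returns 1.0
    return (observed - expected) / (1 - expected)


sample = sample_for_review([f"trace_{i}" for i in range(300)], k=10)
# Hypothetical verdicts on the sample: True = conformant.
llm = [True] * 8 + [False, False]
human = [True] * 7 + [False, False, True]
print(len(sample), round(cohens_kappa(llm, human), 3))  # 10 0.375
```

In this example raw agreement is 0.8 but kappa is only 0.375: with both raters marking 80 percent of traces conformant, much of the raw agreement would arise by chance.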
Circularity Check
No circularity; empirical result derived from external hospital traces and guideline text
The paper presents an LLM-orchestrated pipeline for extracting traces and rules from real hospital discharge letters and guideline documents, then computes conformance on those extracted artifacts. No equations, fitted parameters, or self-citations are used to derive the 86% figure; it is reported as a direct count from the Alessandria Hospital data set. The central claim therefore rests on the accuracy of the LLM extractions rather than on any definitional or self-referential reduction. Absence of ground-truth validation for the extractions is a correctness concern, not a circularity issue.