Recognition: 2 theorem links · Lean Theorem
Integrating Causal Machine Learning into Clinical Decision Support Systems: Insights from Literature and Practice
Pith reviewed 2026-05-15 00:23 UTC · model grok-4.3
The pith
Causal machine learning can be integrated into clinical decision support systems through design requirements and principles derived from the literature and physician input, yielding interpretable, treatment-specific insights.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Through design science methodology, a structured literature review and interviews with physicians establish eight empirically grounded design requirements, seven design principles, and nine practical features for CDSSs that use causal machine learning. These elements enable interfaces that deliver causal insights, fit into clinical workflows, and support trust, usability, and collaboration. The work also surfaces tensions in automation, responsibility, and regulation, indicating the need for adaptive certification of ML-based medical products.
What carries the argument
The central mechanism is the translation of insights from a structured literature review and physician interviews into eight design requirements, seven principles, and nine features that shape clinician-facing interfaces for causal ML outputs.
If this is right
- CDSSs shift from correlation-based predictions to causation-based reasoning that specifies treatment effects for individual patients.
- Interfaces integrate causal outputs directly into existing clinical workflows without adding friction.
- Design features promote trust and usability in collaborative human-AI decision processes.
- Regulatory approaches must adapt to handle ML-based medical products through ongoing certification rather than static approval.
- Tensions around automation levels and assignment of responsibility become explicit design considerations.
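The first implication above — treatment effects specified for individual patients rather than a single correlational risk score — can be illustrated with a minimal sketch. This is not from the paper: the toy data, the linear models, and the "T-learner" strategy (fit one outcome model per treatment arm, report the difference) are assumptions made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cohort: one covariate x, binary treatment t, outcome y.
# The true treatment effect grows with x (assumed for illustration).
n = 400
x = rng.uniform(0, 1, n)
t = rng.integers(0, 2, n)
y = 1.0 + 0.5 * x + t * (0.2 + 0.8 * x) + rng.normal(0, 0.05, n)

def fit_linear(xs, ys):
    """Least-squares fit of y ~ a + b*x; returns (a, b)."""
    A = np.column_stack([np.ones_like(xs), xs])
    coef, *_ = np.linalg.lstsq(A, ys, rcond=None)
    return coef

# T-learner: separate outcome models for treated and untreated patients.
a0, b0 = fit_linear(x[t == 0], y[t == 0])
a1, b1 = fit_linear(x[t == 1], y[t == 1])

def ite(x_new):
    """Estimated individual treatment effect for covariate value x_new."""
    return (a1 + b1 * x_new) - (a0 + b0 * x_new)

# A correlational CDSS would emit one risk score; the causal sketch instead
# answers "how much does treatment change this patient's outcome?"
print(round(float(ite(0.5)), 2))  # close to the true effect 0.2 + 0.8*0.5 = 0.6
```

The point of the sketch is the interface contract, not the estimator: the clinician-facing output is a per-patient effect, which is what the paper's design requirements (e.g., confidence-aware communication of outcomes) are meant to surface.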
Where Pith is reading between the lines
- The derived principles could be validated through pilots in multiple medical specialties to check performance under different data conditions and patient populations.
- Widespread use of such designs might reduce reliance on spurious correlations that currently contribute to diagnostic or treatment errors.
- Comparable design processes could transfer to other domains requiring causal explanations, such as financial risk assessment or legal outcome prediction.
Load-bearing premise
The design requirements and principles drawn from the literature review and physician interviews are generalizable and complete enough to guide effective CDSS design across varied clinical settings.
What would settle it
A controlled trial in which clinicians using CDSS interfaces built from these requirements and principles show no gains in decision accuracy, trust, or workflow efficiency relative to standard systems would falsify the proposed guidance.
original abstract
Current clinical decision support systems (CDSSs) typically base their predictions on correlation, not causation. In recent years, causal machine learning (ML) has emerged as a promising way to improve decision-making with CDSSs by offering interpretable, treatment-specific reasoning. However, existing research often emphasizes model development rather than designing clinician-facing interfaces. To address this gap, we investigated how CDSSs based on causal ML should be designed to effectively support collaborative clinical decision-making. Using a design science research methodology, we conducted a structured literature review and interviewed experienced physicians. From these, we derived eight empirically grounded design requirements, developed seven design principles, and proposed nine practical design features. Our results establish guidance for designing CDSSs that deliver causal insights, integrate seamlessly into clinical workflows, and support trust, usability, and human-AI collaboration. We also reveal tensions around automation, responsibility, and regulation, highlighting the need for an adaptive certification process for ML-based medical products.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that current CDSSs rely on correlational rather than causal ML, creating a gap in clinician-facing interfaces. Using design science research, the authors conduct a structured literature review and physician interviews to derive eight design requirements, seven design principles, and nine practical features. These artifacts are presented as establishing guidance for CDSSs that deliver causal insights, integrate into clinical workflows, support trust/usability/human-AI collaboration, and address tensions in automation, responsibility, and regulation.
Significance. If the derived requirements, principles, and features prove generalizable and effective when implemented, the work would address a recognized gap between causal ML model development and usable clinical interfaces, offering concrete design guidance for trustworthy human-AI collaboration in medicine. However, the absence of methodological transparency and any evaluation step limits the immediate impact to exploratory insights rather than validated design knowledge.
major comments (3)
- [§4] §4 (Results): The eight requirements, seven principles, and nine features are presented as empirically grounded outputs of the literature review and interviews, yet no details are supplied on the literature search strategy, number of papers screened/included, physician sample size, interview protocol, or thematic analysis procedure. This omission makes it impossible to evaluate the completeness or transferability of the artifacts.
- [§4 and Discussion] §4 and Discussion: The central claim that the artifacts 'establish guidance' for CDSS design across diverse settings rests on extraction alone, without any subsequent validation step (prototype instantiation, controlled usability trial, or cross-setting replication) to test whether the proposed features actually improve causal insight delivery, workflow integration, or trust. The leap from 'extracted from sources' to 'establishes guidance' therefore remains an unverified extrapolation.
- [Methods] Methods section: The design science research methodology is described at a high level but supplies no information on how conflicts between literature findings and interview data were resolved or how the final set of requirements/principles/features was consolidated, leaving the traceability of the design artifacts unclear.
minor comments (2)
- [Abstract] Abstract: The claim of a 'structured literature review' would benefit from a brief mention of the review protocol or PRISMA-style reporting to strengthen credibility.
- [Tables/Figures] Figure/Table captions: Ensure that any tables summarizing the eight requirements or nine features include explicit links back to the source literature or interview quotes for traceability.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments, which help strengthen the methodological transparency and framing of our design science research. We address each major comment point by point below.
point-by-point responses
-
Referee: [§4] §4 (Results): The eight requirements, seven principles, and nine features are presented as empirically grounded outputs of the literature review and interviews, yet no details are supplied on the literature search strategy, number of papers screened/included, physician sample size, interview protocol, or thematic analysis procedure. This omission makes it impossible to evaluate the completeness or transferability of the artifacts.
Authors: We agree that these details are necessary for assessing rigor and transferability. In the revised manuscript, we will expand the Methods section with the complete literature search strategy (including databases searched, keywords, and inclusion/exclusion criteria), the exact numbers of papers screened and included, the physician sample size and recruitment approach, the full interview protocol (semi-structured questions, duration, and format), and the thematic analysis procedure (including coding framework, author involvement, and any measures of consistency). revision: yes
-
Referee: [§4 and Discussion] §4 and Discussion: The central claim that the artifacts 'establish guidance' for CDSS design across diverse settings rests on extraction alone, without any subsequent validation step (prototype instantiation, controlled usability trial, or cross-setting replication) to test whether the proposed features actually improve causal insight delivery, workflow integration, or trust. The leap from 'extracted from sources' to 'establishes guidance' therefore remains an unverified extrapolation.
Authors: We accept the distinction between derivation and validation. Our design science contribution centers on synthesizing requirements, principles, and features from the literature review and interviews; we do not present these as empirically validated in clinical use. In the revised Discussion, we will explicitly qualify the artifacts as proposed guidance derived from the sources, acknowledge the absence of instantiation or usability testing, and add a dedicated subsection on limitations and future validation steps (e.g., prototype development and controlled trials). We will revise the abstract and conclusion language accordingly to avoid overstatement. revision: partial
-
Referee: [Methods] Methods section: The design science research methodology is described at a high level but supplies no information on how conflicts between literature findings and interview data were resolved or how the final set of requirements/principles/features was consolidated, leaving the traceability of the design artifacts unclear.
Authors: We agree that traceability requires explicit description of the integration process. In the revised Methods section, we will detail how conflicts between literature-derived and interview-derived insights were identified and resolved (e.g., through author consensus meetings, weighting by source frequency or physician emphasis), and the iterative consolidation steps that produced the final eight requirements, seven principles, and nine features (including grouping criteria, elimination rules, and final selection rationale). revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper follows a design science research methodology that synthesizes findings from an external structured literature review and new physician interviews to derive design requirements, principles, and features. These inputs are independent data sources rather than self-defined quantities, fitted parameters, or self-citation chains. No equations, predictions, or uniqueness theorems are invoked that reduce the central claims to the paper's own prior outputs by construction. The derivation chain remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Design science research methodology can reliably translate literature insights and physician interviews into actionable design requirements and principles.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean — reality_from_one_distinction (unclear)
Relation between the paper passage and the cited Recognition theorem is unclear.
We derived eight empirically grounded design requirements, developed seven design principles, and proposed nine practical design features.
-
IndisputableMonolith/Cost/FunctionalEquation.lean — washburn_uniqueness_aczel (unclear)
Relation between the paper passage and the cited Recognition theorem is unclear.
DR4: The system should make causal reasoning understandable... SR4.1 Visualizes causal relationships through clear, interpretable diagrams
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Thirty-Fourth European Conference on Information Systems (ECIS 2026), Milan, Italy. INTEGRATING CAUSAL MACHINE LEARNING INTO CLINICAL DECISION SUPPORT SYSTEMS: INSIGHTS FROM LITERATURE AND PRACTICE. Completed Research Paper. Domenique Zipperling, University of Bayreuth and Fraunhofer FIT, Bayreuth, Germany, domenique.zipperling@fit.fraunhofer.de, Lukas Schm...
work page 2026
-
[2]
with limited attention to how clinicians can effectively interact with such systems in practice (Feuerriegel et al., 2024; Zheng et al., 2020). For causal ML to enhance clinical decision-making, its predictions and reasoning must be made interpretable and actionable through well-designed interfaces that facilitate trustworthy collaboration (Holzinger et a...
work page 2024
-
[3]
and performed a structured literature review (SLR), analyzed through open, axial, and selective coding (Gioia, 2021; Strauss & Corbin, 1998; Wolfswinkel et al., 2013), thereby retrieving design requirements (DRs). Finally, we pursued an approach to formulate design principles (DPs) and design features (DFs) grounded in our DRs (Walk et al., 2024). Our ana...
work page 2021
-
[4]
Visualization of the difference between an associative ML-based CDSS (left) and a causal ML-based CDSS (right) based on Feuerriegel et al. (2024). Causal ML extends associative ML by estimating individual treatment effects, enabling clinicians to compare outcomes across treatments and assess the impact of specific interventions (see Figure
work page 2024
-
[5]
Yet, realizing this promise remains challenging
Yet, realizing this promise remains challenging (Feuerriegel et al., 2024). Methods such as structural causal models and directed acyclic graphs (DAGs) provide frameworks for representing causal assumptions (Kaddour et al., 2022), but establishing a valid causal structure requires substantial domain expertise to specify, evaluate, and refine these models...
work page 2024
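The snippet above notes that DAGs encode causal assumptions and that specifying a valid structure takes domain expertise. As a minimal sketch of the machine-checkable part (not from the paper — the variable names and edges are hypothetical), a causal graph can be stored as an adjacency mapping and verified to be acyclic before any effect estimation is attempted:

```python
def is_acyclic(graph):
    """Depth-first check that a directed graph (dict: node -> successors) has no cycles."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def visit(v):
        color[v] = GRAY
        for w in graph.get(v, ()):
            if color.get(w, WHITE) == GRAY:
                return False  # back edge => cycle
            if color.get(w, WHITE) == WHITE and not visit(w):
                return False
        color[v] = BLACK
        return True

    return all(visit(v) for v in graph if color[v] == WHITE)

# Hypothetical clinical DAG: age and severity confound treatment and recovery.
dag = {
    "age": ["severity", "treatment"],
    "severity": ["treatment", "recovery"],
    "treatment": ["recovery"],
    "recovery": [],
}
print(is_acyclic(dag))                        # True: a valid DAG
print(is_acyclic({"a": ["b"], "b": ["a"]}))   # False: a <-> b is a cycle
```

Acyclicity is only the syntactic half; whether the edges reflect real clinical mechanisms is exactly the expertise question the snippet raises.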
-
[6]
Afterward, we anonymized and transcribed all interviews, then coded them using MAXQDA (Kuckartz, 2012). Causal ML-based CDSS HistoryPredictionTreatment Planning Prior toTreatmentRecovery SuccessNo Treatment Associative ML-based CDSS HistoryPredictionPrior toTreatmentRecovery SuccessNo TreatmentPhysical TherapySurgery Treatment EffectTreatment EffectTreatm...
work page 2012
-
[7]
Methodology based on Hevner (2007). We focus on deriving design requirements, principles, and features rather than conducting design cycles. The rigor cycle ensures the theoretical foundation of our work by integrating established scientific knowledge from recent literature. We conducted a rigorous SLR as proposed by Webster & Watson (2002) and Wolfswinke...
work page 2007
-
[8]
Overview of experts and interviews. Design RequirementsDesign Cycle 1Design Cycle 2Practical Knowledge BaseObjective: Uncover practical problemMethod: Semi-structured Expert InterviewsResult: Identification of Design Requirements Theoretical Knowledge BaseObjective: Uncover existing knowledge baseMethod: Rigorous Structured Literature ReviewResult: Identi...
work page 2026
-
[9]
Rigorous structured literature review based on Wolfswinkel et al. (2013). To derive preliminary DPs, we interpreted the eight DRs through five theory-grounded layers: (1) work-system integration, (2) workflow integration, and (3) decision-support transparency, explainability, and clinician trust (Salwei & Carayon, 2022), (4) governance, validation, and mo...
work page 2013
-
[10]
Overview of DRs, Layers, DPs, and DFs. Colored lines show the specific links. 4.1 Extracted Design Requirements Within the relevance and rigor cycle, eight DRs were extracted. Table 2 provides an overview of all DRs, their respective SRs, and corresponding sources from literature and expert interviews. Additionally, we group them into three categories: ge...
work page 2023
-
[11]
Generic (Thews et al., 1996; Vidal et al., 2025; Weiss et al.,
3, 4, 6 SR1.2: Enables fast, natural, and ergonomic data input through intuitive interaction mechanisms. Generic (Thews et al., 1996; Vidal et al., 2025; Weiss et al.,
work page 1996
-
[12]
SR2.1: Structures and prioritizes results to enhance interpretability and reduce cognitive load
5, 6, 10 DR2: The system should communicate outcomes in a clear, actionable, and confidence-aware manner. SR2.1: Structures and prioritizes results to enhance interpretability and reduce cognitive load. Generic (Cálem et al., 2024; Holzinger et al., 2021; Pierce et al., 2022; Thanathornwong, 2018; Weiss et al.,
work page 2024
-
[13]
Generic (Bienefeld et al., 2023; Pierce et al.,
1, 3 – 7, 9, 10 SR2.2: Uses multimodal and visual representations to support rapid comprehension and perception. Generic (Bienefeld et al., 2023; Pierce et al.,
work page 2023
-
[14]
1, 6, 7, 9, 10 SR2.3: Translates findings into actionable alerts and recommendations for timely intervention. Generic (Bienefeld et al., 2023; Müller et al., 2020; Pierce et al., 2022; Vidal et al., 2025; Weiss et al.,
work page 2023
-
[15]
ML-based (Bienefeld et al., 2023; Feuerriegel et al., 2024; Müller et al., 2020; Weiss et al.,
1, 3, 5, 7 – 10 SR2.4: Transparently conveys uncertainty and confidence levels to support informed decisions. ML-based (Bienefeld et al., 2023; Feuerriegel et al., 2024; Müller et al., 2020; Weiss et al.,
work page 2023
-
[16]
SR3.1: Dynamically tailors explanations and interactions to the clinical situation and task
2, 4, 5, 10 DR3: The system should adapt to varying clinical contexts, cognitive demands, and ethical complexities. SR3.1: Dynamically tailors explanations and interactions to the clinical situation and task. Generic (Hamon et al., 2022; Pierce et al.,
work page 2022
-
[17]
1, 3, 4, 6 SR3.2 Demonstrates reliability and usefulness to foster clinician trust and adoption. Generic (Aslan et al., 2022; Jensen & Andreassen, 2008; Müller-Sielaff et al., 2023; Stevens & Stetson,
work page 2022
-
[18]
3, 6, 7, 10 SR3.3 Provides reflective and ethical support in complex or uncertain decision contexts. ML-based - 1, 2, 5, 6, 9 DR4: The system should make causal reasoning understandable, traceable, and logically coherent. SR4.1 Visualizes causal relationships through clear, interpretable diagrams and models. Causal ML-based (Bienefeld et al., 2023; Holzin...
work page 2023
-
[19]
1 – 7 SR4.2 Offers interactive, layered explanations of reasoning processes on demand. Causal ML-based (Aslan et al., 2022; Cálem et al., 2024; Hamon et al., 2022; Holzinger et al., 2021; Müller-Sielaff et al., 2023; Palma et al., 2006; Pierce et al.,
work page 2022
-
[20]
3, 6, 7, 10 SR4.3 Links conclusions to verifiable clinical evidence and scientific sources. ML-based (Aslan et al., 2022; Müller et al., 2020; Müller-Sielaff et al., 2023; Papageorgiou et al., 2011; Pierce et al., 2022; Vidal et al.,
work page 2022
-
[21]
1 – 10 SR4.4: Enables interactive exploration of causal networks to support visual reasoning. Causal ML-based (Bienefeld et al., 2023; Hamon et al., 2022; Metsch et al., 2024; Müller et al., 2020; Thews et al.,
work page 2023
-
[22]
Causal ML-based (Palma et al., 2006; Weiss et al.,
1, 10 SR4.5: Maintains logical consistency and causal completeness within reasoning structures. Causal ML-based (Palma et al., 2006; Weiss et al.,
work page 2006
-
[23]
SR5.1 Supports integration of expert knowledge and evolving medical evidence
- DR5: The system should allow clinicians to modify and refine causal models through interaction. SR5.1 Supports integration of expert knowledge and evolving medical evidence. Causal ML-based (Bienefeld et al., 2023; Cálem et al., 2024; Constantinou et al., 2016; Holzinger et al., 2021; Iakovidis & Papageorgiou, 2011; Metsch et al., 2024; Müller-Sielaff e...
work page 2023
-
[24]
5, 8, 9 SR5.2 Enables exploration of causal dynamics through simulations and what-if analyses. Causal ML-based (Bienefeld et al., 2023; Cálem et al., 2024; Constantinou et al., 2016; Feuerriegel et al., 2024; Holzinger et al., 2021; Metsch et al., 2024; Müller et al., 2020; Müller-Sielaff et al., 2023; Rabbi et al., 2020; Weiss et al.,
work page 2023
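SR5.2's simulations and what-if analyses can be sketched with a toy structural causal model. The equations and coefficients here are invented for illustration (not taken from the paper): intervening on treatment means overriding its structural equation and re-simulating the downstream variables, holding the patient's exogenous draws fixed.

```python
import random

def simulate(do_treatment=None, seed=0):
    """Toy structural causal model; do_treatment overrides the treatment equation."""
    rnd = random.Random(seed)
    severity = rnd.uniform(0, 1)
    # Observational policy: sicker patients are treated more often.
    treatment = do_treatment if do_treatment is not None else int(severity > 0.5)
    # Recovery improves with treatment, worsens with severity (assumed equations).
    recovery = 0.3 + 0.5 * treatment - 0.4 * severity + rnd.gauss(0, 0.01)
    return recovery

# What-if comparison for the same patient: the same seed fixes the same
# severity and noise draws, so only the intervention differs.
effect = simulate(do_treatment=1) - simulate(do_treatment=0)
print(round(effect, 2))  # 0.5, the assumed treatment coefficient
```

Exposing this kind of do(T=1) vs do(T=0) comparison interactively is what the cited sources frame as what-if exploration of causal dynamics.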
-
[25]
SR6.1 Captures and visualizes time-based data to show trends and treatment responses
1, 4, 6, 9, 10 DR6: The system should facilitate longitudinal and comparative monitoring of patients. SR6.1 Captures and visualizes time-based data to show trends and treatment responses. Generic (Bienefeld et al., 2023; Müller et al., 2020; Palma et al., 2003, 2006; Rabbi et al., 2020; Thews et al., 1996; Weiss et al.,
work page 2023
-
[26]
Generic (Bienefeld et al., 2023; Feuerriegel et al., 2024; Thews et al.,
5, 7 SR6.2 Allows comparative evaluation of patient trajectories across cohorts or datasets. Generic (Bienefeld et al., 2023; Feuerriegel et al., 2024; Thews et al.,
work page 2023
-
[27]
SR7.1 Complies with regulatory and clinical standards to ensure validated performance
10 DR7: The system should ensure reliable, compliant, and unbiased causal modeling. SR7.1 Complies with regulatory and clinical standards to ensure validated performance. ML-based - 3, 4, 8, 9 SR7.2 Prevents bias and data manipulation that could distort causal reasoning. Causal ML-based (Dijk et al., 2025; Thews et al.,
work page 2025
-
[28]
SR 8.1: Customizes interfaces and explanations to user preferences and roles
1, 4, 7 DR8: The system should adapt to individual clinician preferences, workflows, and learning needs. SR 8.1: Customizes interfaces and explanations to user preferences and roles. Generic (Cálem et al., 2024; Jensen & Andreassen, 2008; Vidal et al.,
work page 2024
-
[29]
simple, colorful visualizations
Overview: Design requirements, their sub-requirements, and the respective source. For multimodal and visual representation (SR2.2), the system should use intuitive graphics like survival plots, color-coded scales (E7, E9), or simple dashboar...
work page 2026
-
[30]
to check if it would have decided like [them]
and enabling interactive explanations that reflect situational diversity (Hamon et al., 2022). To foster trust, the system must be useful and reliable in practice (SR3.2). Clinicians viewed AI as a means of cross-validating reasoning, “to check if it would have decided like [them]” (E3), and to challenge assumptions constructively (E6, E10). It should fea...
work page 2022
-
[31]
to benchmark outcomes and validate effects (Feuerriegel et al., 2024). Ensuring causal interaction integrity (DR7) is essential so that the system’s reasoning remains valid, compliant, and protected from manipulation or bias. To ensure regulatory compliance (SR7.1), it must meet medical device standards. Clinicians stressed that the CDSS must not “halluci...
work page 2024
-
[32]
N., Inkpen, K., Teevan, J., Kikin-Gil, R., & Horvitz, E
https://doi.org/10.3390/healthcare13172154 Amershi, S., Weld, D., Vorvoreanu, M., Fourney, A., Nushi, B., Collisson, P., Suh, J., Iqbal, S., Bennett, P. N., Inkpen, K., Teevan, J., Kikin-Gil, R., & Horvitz, E. (2019). Guidelines for Human-AI Interaction. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, CHI ’19, 1–13. https://d...
-
[33]
https://doi.org/10.1016/j.jclinepi.2024.111538 Jensen, K., & Andreassen, S. (2008). Generic causal probabilistic networks: A solution to a problem of transferability in medical decision support. Computer Methods and Programs in Biomedicine, 89(2), 189–201. https://doi.org/10.1016/j.cmpb.2007.10.015 Ji, M., Genchev, G. Z., Huang, H., Xu, T., Lu, H., & Yu, ...
-
[34]
https://doi.org/10.34133/research.0467 Kaddour, J., Lynch, A., Liu, Q., Kusner, M. J., & Silva, R. (2022). Causal Machine Learning: A Survey and Open Problems (arXiv:2206.15475). arXiv. https://doi.org/10.48550/arXiv.2206.15475 Karimi, A.-H., Schölkopf, B., & Valera, I. (2021). Algorithmic Recourse: From Counterfactual Explanations to Interventions. Proce...
-
[35]
https://doi.org/10.1038/s41746-020-0221-y Thanathornwong, B. (2018). Bayesian-Based Decision Support System for Assessing the Needs for Orthodontic Treatment. Healthcare Informatics Research, 24(1), 22–28. https://doi.org/10.4258/hir.2018.24.1.22
-
[36]
https://doi.org/10.1007/s10994-025-06855-5 Weiss, S. M., Kulikowski, C. A., Amarel, S., & Safir, A. (1978). A model-based method for computer-aided medical decision-making. Artificial Intelligence, 11(1–2), 145–172. https://doi.org/10.1016/0004-3702(78)90015-2 Whiting, L. S. (2008). Semi-structured interviews: Guidance for novice researchers. Nursing Stan...
-
[37]
F., Furtmueller, E., & Wilderom, C
https://www.spectaris.de/fileadmin/Content/Medizintechnik/DIHK_MedicalMountains_SPECTARIS_MDR_Survey_2023.pdf Wolfswinkel, J. F., Furtmueller, E., & Wilderom, C. P. M. (2013). Using grounded theory as a method for rigorously reviewing literature. European Journal of Information Systems, 22(1), 45–55. https://doi.org/10.1057/ejis.2011.51 Zhang, Y., Kreif, ...