
arXiv: 2605.19258 · v1 · pith:IM6FJJSB · submitted 2026-05-19 · 💻 cs.LG · cs.AI

ExECG: An Explainable AI Framework for ECG models

Pith reviewed 2026-05-20 07:31 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords: explainable AI · ECG · framework · reproducibility · deep learning · arrhythmia · visualization · standardization

The pith

The ExECG framework standardizes ECG data handling and unifies explainable AI methods in a three-stage Python pipeline to improve reproducibility.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ExECG, a Python framework intended to make explainable AI practical for deep learning models that diagnose heart conditions from ECG signals. Explanation pipelines for these models currently differ from study to study, which makes it hard to reuse methods or compare results reliably. ExECG addresses this by splitting the process into three stages: a Wrapper that handles different ECG data formats, an Explainer that runs various XAI techniques under the same rules, and a Visualizer that presents the results uniformly. A sympathetic reader would care because better explanations could help clinicians understand and trust model predictions, especially when accuracy alone is not enough to justify a clinical decision. The paper illustrates the framework with code examples and two case studies: atrial fibrillation (AF) classification and potassium-level estimation.

Core claim

ExECG is a Python framework that provides a three-stage pipeline for ECG explainability: the Wrapper standardizes access across heterogeneous ECG formats and intermediate representations, the Explainer unifies diverse XAI methods under a shared execution protocol, and the Visualizer supports consistent cross-method comparison within a unified interface, as demonstrated by end-to-end usage examples and two case studies.

What carries the argument

The three-stage pipeline (Wrapper, Explainer, Visualizer) that standardizes data access, unifies XAI execution, and enables consistent visualization for ECG models.
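To make the data flow concrete, here is a minimal sketch of a three-stage pipeline in Python. It is not ExECG's API: the classes `Wrapper`, `Explainer`, and `Visualizer`, the `layout` flag, and the method names are hypothetical, and a linear toy model stands in for a trained ECG network so that the input gradient is known exactly.

```python
import numpy as np


class Wrapper:
    """Hypothetical Wrapper: maps a source layout to (leads, samples) and exposes gradients."""

    def __init__(self, weights, layout="leads_first"):
        self.weights = np.asarray(weights, dtype=float)  # toy linear model, shape (leads, samples)
        self.layout = layout

    def canonical(self, x):
        x = np.asarray(x, dtype=float)
        return x.T if self.layout == "samples_first" else x

    def predict(self, x):
        return float(np.sum(self.weights * self.canonical(x)))

    def gradient(self, x):
        # For the linear score sum(W * x), the input gradient is W at every x.
        return self.weights.copy()


def saliency(wrapper, x):
    return np.abs(wrapper.gradient(x))


def grad_times_input(wrapper, x):
    return wrapper.gradient(x) * wrapper.canonical(x)


class Explainer:
    """Hypothetical Explainer: every method shares the signature (wrapper, x) -> (leads, samples)."""

    methods = {"saliency": saliency, "grad_x_input": grad_times_input}

    def __init__(self, wrapper):
        self.wrapper = wrapper

    def explain(self, x, method):
        return self.methods[method](self.wrapper, x)


class Visualizer:
    """Hypothetical Visualizer step: per-lead scaling to [0, 1] so maps align with the waveform."""

    @staticmethod
    def normalize_per_lead(attr):
        a = np.abs(attr)
        peak = a.max(axis=1, keepdims=True)
        return np.divide(a, peak, out=np.zeros_like(a), where=peak > 0)


rng = np.random.default_rng(0)
weights = rng.normal(size=(12, 500))
ecg = rng.normal(size=(500, 12))  # source recording stored samples-first
explainer = Explainer(Wrapper(weights, layout="samples_first"))
maps = {m: Visualizer.normalize_per_lead(explainer.explain(ecg, m)) for m in Explainer.methods}
print({m: a.shape for m, a in maps.items()})  # {'saliency': (12, 500), 'grad_x_input': (12, 500)}
```

The design point the sketch illustrates is that explanation methods never see the source layout; only the wrapper does, so supporting a new input format touches one class rather than every method.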

If this is right

  • Applying different XAI methods to ECG models becomes more interoperable and less dependent on custom code for each method.
  • Reproducibility of explanations improves because all methods follow the same execution protocol.
  • Cross-method comparisons of explanations are easier due to the unified visualization interface.
  • Clinical trust in ECG diagnostic models increases through more consistent and accessible explanations.
  • Error analysis and justification for specific model outputs are facilitated by the standardized pipeline.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Adoption of this framework could lead to more standardized practices in publishing ECG XAI results across different research groups.
  • Extensions to other biosignals such as EEG or wearable sensor data might follow naturally from the modular design.
  • Integration with popular ECG analysis libraries could amplify its impact on practical clinical tools.
  • Validation studies comparing user trust levels before and after using the framework would test its real-world value.

Load-bearing premise

That unifying diverse XAI methods under one shared execution protocol and providing consistent visualization will meaningfully improve reuse, reproducibility, and clinical trust without requiring additional validation or domain-specific adjustments.

What would settle it

Running the same XAI method directly versus through the ExECG Explainer on identical ECG data and model, and observing materially different explanation outputs or visualizations.
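This test can be scripted. The sketch below assumes a toy differentiable score, sum(W * tanh(x)), a gradient-times-input attribution, and a hypothetical pipeline path whose only extra work is transposing a samples-first recording into (leads, samples); none of these functions come from ExECG.

```python
import numpy as np


def grad_x_input(weights, x):
    """Gradient x input for the toy score sum(W * tanh(x)); x has shape (leads, samples)."""
    grad = weights * (1.0 - np.tanh(x) ** 2)  # d/dx tanh(x) = 1 - tanh(x)^2
    return grad * x


def run_direct(weights, x_leads_first):
    # Reference path: call the method directly on data already in (leads, samples) layout.
    return grad_x_input(weights, x_leads_first)


def run_through_pipeline(weights, x_samples_first):
    # Hypothetical stand-in for Wrapper -> Explainer: canonicalize layout, then call the same method.
    x_canonical = np.asarray(x_samples_first, dtype=float).T
    return grad_x_input(weights, x_canonical)


def outputs_match(a, b, rtol=1e-6, atol=1e-9):
    return a.shape == b.shape and bool(np.allclose(a, b, rtol=rtol, atol=atol))


rng = np.random.default_rng(42)
W = rng.normal(size=(12, 500))
x = rng.normal(size=(12, 500))
direct = run_direct(W, x)
piped = run_through_pipeline(W, x.T)
print(outputs_match(direct, piped))            # True
print(outputs_match(direct, direct[:, ::-1]))  # False: a time-reversed map is flagged
```

A real check would swap in the trained model and the library call for each method; tolerances may need loosening on GPU, where floating-point reductions are not always deterministic.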

Figures

Figures reproduced from arXiv: 2605.19258 by Jong-Hwan Jang, Yong-yeon Jo.

Figure 1. Design principles and framework overview of ExECG. ExECG is developed following four design principles (Standardization, Reproducibility, Integration, and Extensibility), motivated by practical needs in clinical and research settings. The framework operationalizes these principles through three modular components: Wrapper, Explainer, and Visualizer.
Figure 2. Three-stage pipeline of ExECG. Wrapper standardizes model I/O and exposes internal signals (e.g., activations and gradients) required by explainers; Explainer runs attribution-, counterfactual-, and concept-based XAI methods under a unified interface; and Visualizer renders explanation outputs in ECG-aligned plots.
Figure 3. An example of attribution methods for AF classification. Six attribution methods are applied to the same ECG sample, with importance scores displayed as aligned heatmaps below the waveform. Most methods consistently highlight the P-wave region, suggesting the model relies on atrial activity for AF detection.
Figure 4. An example of counterfactual explanation for AF classification. The original ECG (blue, AF probability 0.0005) is overlaid with its counterfactual (red, AF probability 0.7712). The counterfactual shows P-wave attenuation and irregular RR intervals, which are characteristic patterns of AF.
Figure 5. TCAV analysis for AF classification. Left: heatmap of TCAV scores across four clinical concepts and network layers. Right: TCAV scores with 95% confidence intervals. The atrial fibrillation concept shows scores consistently above 0.5 (chance level) across all layers, indicating stable reliance on AF-related features, while other concepts remain near or below chance.
Figure 6. Integrated XAI visualization in standard 12-lead ECG format. ExECG renders explanations on the clinical 4×3 grid layout familiar to practitioners. The chart overlays the original ECG (blue) with the counterfactual (green), displays attribution importance as background shading, and reports TCAV-derived concept significance, enabling multi-method interpretation within a single view.
Figure 7. Comparison of attribution methods for potassium level estimation. Attribution heatmaps for six XAI methods applied to a potassium regression model. Integrated Gradients, SmoothGrad, and Vanilla Saliency consistently emphasize T-wave regions, while Grad-CAM variants assign broader importance across the QRS-T segment.
Figure 8. Counterfactual explanation for potassium level estimation. The original ECG (blue, predicted value 0.33) is overlaid with its counterfactual (red, predicted value 0.60). The counterfactual shows peaked T-waves, widened QRS complexes, and prolonged PR intervals, which are characteristic ECG changes associated with hyperkalemia.
Figure 9. TCAV analysis for potassium level estimation. Left: heatmap of TCAV scores for four clinical concepts across network layers. Right: TCAV scores with 95% confidence intervals. The T-wave abnormal concept exhibits scores significantly above 0.5, indicating that the model relies on T-wave morphology for potassium prediction.
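Figures 5 and 9 read TCAV scores against 0.5. In TCAV (Kim et al., 2018), the score for a concept is the fraction of target-class inputs whose gradient of the class logit, taken with respect to a layer's activations, has a positive dot product with the concept activation vector (CAV). A direction unrelated to the model's reasoning is, by symmetry, about as likely to give a positive as a negative dot product, which is why 0.5 serves as the chance reference. The sketch below uses synthetic activations and gradients, replaces the linear classifier of the original method with a difference of means, and uses random unit vectors as the random baseline; all of these are simplifying assumptions.

```python
import numpy as np


def concept_activation_vector(concept_acts, random_acts):
    """Unit vector pointing from random-example activations toward concept activations.

    Simplification: difference of means. Kim et al. (2018) instead train a linear
    classifier on the two activation sets and use the normal to its decision boundary.
    """
    v = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
    return v / np.linalg.norm(v)


def tcav_score(class_grads, cav):
    """Fraction of target-class examples whose directional derivative along the CAV is positive.

    class_grads: (n_examples, d) gradients of the target-class logit w.r.t. layer activations.
    """
    return float(np.mean(class_grads @ cav > 0))


rng = np.random.default_rng(0)
d = 16
concept_dir = np.zeros(d)
concept_dir[0] = 1.0
concept_acts = rng.normal(size=(100, d)) + 2.0 * concept_dir  # synthetic concept examples
random_acts = rng.normal(size=(100, d))                       # synthetic random examples
class_grads = rng.normal(size=(200, d)) + 1.5 * concept_dir   # synthetic gradients leaning toward the concept

concept_score = tcav_score(class_grads, concept_activation_vector(concept_acts, random_acts))
random_dirs = rng.normal(size=(50, d))
random_scores = [tcav_score(class_grads, u / np.linalg.norm(u)) for u in random_dirs]
print(round(concept_score, 2), round(float(np.mean(random_scores)), 2))
```

Comparing the concept score with the spread of many random-direction scores, rather than with 0.5 alone, mirrors how TCAV tests significance across repeated random runs.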
Original abstract

Deep learning has enabled ECG diagnostic models with strong performance in tasks such as arrhythmia classification and abnormality detection. However, accuracy alone is insufficient for clinical deployment because it does not explain why a specific output was produced, limiting justification, error analysis, and trust. Although ECG XAI has been extensively investigated and steadily improved, practical pipelines and reporting conventions vary across studies, hindering reuse and reproducibility. To address these issues, we present the Explainable AI framework for ECG models (ExECG), a Python framework that provides a three-stage pipeline: Wrapper standardizes access across heterogeneous ECG formats and intermediate representations, Explainer unifies diverse XAI methods under a shared execution protocol, and Visualizer supports consistent cross-method comparison within a unified interface. We demonstrate end-to-end usage with concise examples and two case studies, highlighting interoperable and reproducible ECG explainability.

Editorial analysis

A structured set of objections, weighed in public.

It comprises a desk editor's note, a referee report, a simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this section is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces ExECG, a Python framework for explainable AI applied to ECG diagnostic models. It proposes a three-stage pipeline in which the Wrapper standardizes access across heterogeneous ECG formats and intermediate representations, the Explainer unifies diverse XAI methods under a shared execution protocol, and the Visualizer enables consistent cross-method comparison. The contribution is demonstrated through concise code examples and two case studies that illustrate end-to-end usage and interoperability.

Significance. If implemented and adopted as described, the framework could reduce ad-hoc pipeline variability in ECG XAI research and thereby support greater reproducibility and reuse. The primary value is engineering-oriented: a unified interface rather than new theoretical methods or empirical XAI advances. Impact will ultimately depend on community uptake and any subsequent validation studies.

major comments (1)
  1. [Abstract] The claim that standardizing formats, unifying XAI execution, and providing consistent visualization will improve reuse, reproducibility, and clinical trust is not accompanied by any quantitative support. The demonstrations are limited to usage examples and case studies; no metrics (e.g., implementation-time reduction, inter-user explanation consistency, or comparison against existing ad-hoc pipelines) are reported to substantiate the asserted benefits.
minor comments (2)
  1. The manuscript would benefit from an explicit link to the source repository and installation instructions so that readers can immediately reproduce the reported examples.
  2. A short table or paragraph comparing ExECG feature coverage against existing general-purpose XAI libraries (e.g., Captum, Alibi) would help readers assess the incremental contribution of the ECG-specific wrappers and visualizers.
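The major comment asks for quantitative support, and Figures 3 and 7 assert that several attribution methods "consistently" highlight the same region. One way to put a number on cross-method agreement is a rank correlation plus a top-k overlap between attribution maps. Neither metric is reported in the paper; the functions, the choice of Spearman correlation on absolute attributions, the value of k, and the synthetic maps are all assumptions for illustration.

```python
import numpy as np


def spearman(a, b):
    """Spearman rank correlation of two maps, ranked by |attribution| (assumes no exact ties)."""
    rank_a = np.argsort(np.argsort(np.abs(np.ravel(a))))
    rank_b = np.argsort(np.argsort(np.abs(np.ravel(b))))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


def top_k_overlap(a, b, k):
    """Fraction of the k largest-|attribution| positions that the two maps share."""
    top_a = set(np.argsort(-np.abs(np.ravel(a)))[:k].tolist())
    top_b = set(np.argsort(-np.abs(np.ravel(b)))[:k].tolist())
    return len(top_a & top_b) / k


def agreement_matrix(maps, k=25):
    """Pairwise (Spearman, top-k overlap) for every pair of named attribution maps."""
    names = list(maps)
    result = {}
    for i, m in enumerate(names):
        for n in names[i + 1:]:
            result[(m, n)] = (spearman(maps[m], maps[n]), top_k_overlap(maps[m], maps[n], k))
    return result


t = np.linspace(0.0, 1.0, 500)
rng = np.random.default_rng(1)
focal = np.exp(-((t - 0.7) / 0.03) ** 2)  # synthetic sharp emphasis, e.g. a T-wave window
broad = np.exp(-((t - 0.6) / 0.2) ** 2)   # synthetic broad emphasis, Grad-CAM-like
maps = {
    "saliency": focal + 0.05 * rng.normal(size=t.size),
    "integrated_gradients": focal + 0.05 * rng.normal(size=t.size),
    "grad_cam": broad,
}
for (m, n), (rho, overlap) in agreement_matrix(maps).items():
    print(f"{m} vs {n}: spearman={rho:.2f}, top-25 overlap={overlap:.2f}")
```

Reporting such a table per case study would turn "most methods consistently highlight" into a checkable statement.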

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback and for recognizing the engineering value of ExECG in reducing pipeline variability for ECG XAI research. We address the single major comment below and outline the revisions we will make.

Point-by-point responses
  1. Referee: [Abstract] The claim that standardizing formats, unifying XAI execution, and providing consistent visualization will improve reuse, reproducibility, and clinical trust is not accompanied by any quantitative support. The demonstrations are limited to usage examples and case studies; no metrics (e.g., implementation-time reduction, inter-user explanation consistency, or comparison against existing ad-hoc pipelines) are reported to substantiate the asserted benefits.

    Authors: We agree that the abstract asserts benefits of standardization and unification without accompanying quantitative evidence, and that the provided demonstrations consist of usage examples and case studies rather than controlled measurements. As the manuscript presents a framework whose primary contribution is a unified interface rather than new empirical XAI results, we focused on design and interoperability. To address this point, we will revise the abstract to qualify the claims, stating that the framework is intended to facilitate improved reuse and reproducibility through its standardized pipeline, as illustrated by the case studies, without asserting measured improvements. We will also add a short paragraph in the discussion section acknowledging the lack of user studies or timing benchmarks and noting that such validation is left for future work. These changes will be reflected in the revised manuscript. (Revision: yes.)

Circularity Check

0 steps flagged

No significant circularity

Full rationale

The paper presents a software framework (ExECG) consisting of a three-stage pipeline for standardizing ECG data access, unifying XAI execution protocols, and enabling consistent visualization. No mathematical derivations, equations, fitted parameters, predictions, or first-principles results are claimed or present. The contribution is purely descriptive and implementational, with demonstrations via examples and case studies that do not reduce to self-referential inputs or prior fitted quantities by construction. No self-citation chains, uniqueness theorems, or ansatzes are invoked in a load-bearing manner that would create circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on domain assumptions about ECG data heterogeneity and the feasibility of unifying XAI methods; no free parameters or new entities are introduced.

axioms (2)
  • Domain assumption: diverse XAI methods can be unified under a shared execution protocol without significant loss of their individual strengths or applicability to ECG data.
    Invoked directly in the description of the Explainer stage to enable consistent cross-method comparison.
  • Domain assumption: standardizing access across heterogeneous ECG formats and intermediate representations will improve interoperability and reuse across studies.
    Basis for the Wrapper component as stated in the abstract.
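The first assumption can be made concrete with a shared call signature whose return value carries method-specific payloads. In the sketch below, a saliency method and a gradient-based counterfactual search both implement one hypothetical `explain(x, weights)` protocol on a logistic toy model. The class names, the `ExplanationResult` container, and the counterfactual objective (raise the log-probability while penalizing squared distance from the original signal) are assumptions; ExECG's own counterfactual method may differ.

```python
from dataclasses import dataclass, field
from typing import Protocol

import numpy as np


def sigmoid(z):
    return float(1.0 / (1.0 + np.exp(-z)))


@dataclass
class ExplanationResult:
    kind: str                                    # "attribution" or "counterfactual"
    payload: dict = field(default_factory=dict)  # method-specific outputs


class ExplainerMethod(Protocol):
    """Hypothetical shared execution protocol."""
    kind: str

    def explain(self, x: np.ndarray, weights: np.ndarray) -> ExplanationResult: ...


class Saliency:
    kind = "attribution"

    def explain(self, x, weights):
        # Toy model p = sigmoid(sum(W * x)); its input gradient is p * (1 - p) * W.
        p = sigmoid(np.sum(weights * x))
        return ExplanationResult(self.kind, {"map": np.abs(p * (1.0 - p) * weights)})


class GradientCounterfactual:
    kind = "counterfactual"

    def __init__(self, target=0.8, step=0.01, proximity=0.1, max_iter=1000):
        self.target, self.step = target, step
        self.proximity, self.max_iter = proximity, max_iter

    def explain(self, x, weights):
        x_cf = x.copy()
        for _ in range(self.max_iter):
            p = sigmoid(np.sum(weights * x_cf))
            if p >= self.target:
                break
            # Ascend log p - proximity * ||x_cf - x||^2, using d(log p)/dx = (1 - p) * W.
            grad = (1.0 - p) * weights - 2.0 * self.proximity * (x_cf - x)
            x_cf = x_cf + self.step * grad
        return ExplanationResult(self.kind, {
            "signal": x_cf,
            "prob_before": sigmoid(np.sum(weights * x)),
            "prob_after": sigmoid(np.sum(weights * x_cf)),
        })


def run_all(methods: list[ExplainerMethod], x, weights) -> dict[str, ExplanationResult]:
    """Every method is called the same way; only the payload contents differ."""
    return {type(m).__name__: m.explain(x, weights) for m in methods}


rng = np.random.default_rng(0)
W = rng.normal(size=(12, 100)) / 10.0
ecg = -0.5 * W + 0.1 * rng.normal(size=W.shape)  # synthetic input with a low predicted probability
results = run_all([Saliency(), GradientCounterfactual()], ecg, W)
cf = results["GradientCounterfactual"].payload
print(round(cf["prob_before"], 4), round(cf["prob_after"], 4))
```

Anything a method reports beyond a map (here the before and after probabilities) lives in the payload, which is exactly where a shared protocol can lose information if a downstream visualizer ignores it.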

pith-pipeline@v0.9.0 · 5666 in / 1399 out tokens · 46260 ms · 2026-05-20T07:31:26.167625+00:00 · methodology


