arxiv: 2508.09165 · v3 · submitted 2025-08-06 · 💻 cs.LG · cs.CV

Masked Training for Robust Arrhythmia Detection from Digitalized Multiple Layout ECG Images

Pith reviewed 2026-05-19 01:02 UTC · model grok-4.3

classification 💻 cs.LG cs.CV
keywords: ECG image analysis, masked training, arrhythmia detection, atrial fibrillation, patch attention, missing data, digitized ECG, multi-layout ECG

The pith

PatchECG detects atrial fibrillation from digitized multi-layout ECG images by masking missing patches and using disordered attention, achieving 0.778 AUROC on real clinical scans.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops PatchECG to process ECGs that survive only as paper printouts in varying layouts. Digitization creates timing mismatches across leads and stretches of completely lost signal. The model divides each lead into patches, drops any fully absent ones, and applies masked training plus a disordered attention layer that uses temporal and lead embeddings to relate the remaining patches without interpolation. Evaluation on PTB-XL under seven simulated conditions and on 400 real images from Chaoyang Hospital shows gains over baselines, especially on standard 12-lead layouts, together with attention maps that match cardiologist markings. This opens a route to automated reading of large legacy paper ECG collections that current digital pipelines cannot handle directly.

Core claim

PatchECG combines an adaptive variable block count missing learning mechanism with a masked training strategy. The model segments each lead into fixed-length patches, discards entirely missing patches, and encodes the remainder via a pluggable patch encoder. A disordered patch attention mechanism with patch-level temporal and lead embeddings captures cross-lead and temporal dependencies without interpolation. Trained on PTB-XL and evaluated under seven simulated layout conditions, it achieves an average AUROC of approximately 0.835 across all simulated layouts. On the Chaoyang cohort, the model attains an overall AUROC of 0.778 for atrial fibrillation detection, rising to 0.893 on the 12x1 subset.
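The segment-and-discard step described in the core claim can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of NaN to mark missing samples, and the toy patch length are all assumptions made for illustration.

```python
import math

def extract_patches(signal, patch_len):
    """Split one lead into fixed-length patches and drop fully missing ones.

    Assumption of this sketch: missing samples are encoded as NaN.
    Returns (patch_index, samples) pairs for the surviving patches.
    """
    patches = []
    for p in range(len(signal) // patch_len):
        chunk = signal[p * patch_len:(p + 1) * patch_len]
        if all(math.isnan(x) for x in chunk):
            continue  # entirely missing: discard rather than interpolate
        patches.append((p, chunk))  # partially missing patches are kept as-is
    return patches

def tokenize_ecg(leads, patch_len):
    """Flatten {lead_name: samples} into (lead_name, patch_index, samples) tokens."""
    tokens = []
    for lead_name, signal in leads.items():
        for p, chunk in extract_patches(signal, patch_len):
            tokens.append((lead_name, p, chunk))
    return tokens

nan = float("nan")
# Hypothetical two-lead recording with blackout stretches.
leads = {
    "I":  [0.1, 0.2, nan, nan, 0.3, 0.4],
    "II": [nan, nan, nan, nan, 0.5, 0.6],
}
tokens = tokenize_ecg(leads, patch_len=2)
print([(name, p) for name, p, _ in tokens])
```

Each surviving token keeps its lead name and patch index, so the number of tokens varies from recording to recording while the position of every patch stays recoverable.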

What carries the argument

Disordered patch attention mechanism with patch-level temporal and lead embeddings that operates on variable numbers of surviving patches after discarding fully missing ones.
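One way to picture attention over a variable number of surviving patches is sketched below. It is a simplified reading, not the paper's architecture: it assumes single-head scaled dot-product attention with identity projections, randomly initialized embedding tables, and additive lead and temporal embeddings. Under those assumptions the attention output does not depend on the order in which tokens are fed in, because position is carried entirely by the embeddings.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention (identity Q/K/V, an assumption)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, tokens)) for j in range(d)])
    return out

def embed(content, lead_id, time_id, lead_table, time_table):
    """Add lead and temporal embeddings to an encoded patch."""
    return [c + l + t for c, l, t in zip(content, lead_table[lead_id], time_table[time_id])]

rng = random.Random(0)
dim = 4
lead_table = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(12)]  # hypothetical
time_table = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(10)]  # hypothetical

# Surviving patches as (lead_id, time_id, encoded content); values are placeholders.
surviving = [(0, 0, [0.1] * dim), (0, 2, [0.3] * dim), (1, 2, [0.5] * dim)]
tokens = [embed(c, l, t, lead_table, time_table) for l, t, c in surviving]
out = self_attention(tokens)

# Feeding the same tokens in a different order permutes the outputs the same way.
perm = [2, 0, 1]
out_perm = self_attention([tokens[i] for i in perm])
same = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    for i, src in enumerate(perm)
    for a, b in zip(out_perm[i], out[src])
)
print(same)
```

The token list can have any length, which is what lets the mechanism operate on whatever patches remain after the fully missing ones are dropped.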

If this is right

  • Arrhythmia detection becomes possible directly on legacy paper ECG archives without requiring full signal reconstruction or interpolation.
  • Performance gains are largest on standard 12x1 layouts, suggesting immediate clinical utility for the most common recording formats.
  • Attention maps that approach inter-clinician agreement provide built-in interpretability for diagnostic review.
  • The same patch-masking approach can be applied to other signal types that suffer partial sensor loss or format variation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could reduce the cost and error of retrospective studies that rely on historical paper records.
  • Extending the masking strategy to multi-modal inputs such as ECG plus chest X-ray might improve joint cardiac diagnosis.
  • Deployment in low-resource clinics with inconsistent scanning equipment becomes more feasible if the model tolerates greater layout diversity.

Load-bearing premise

The seven simulated layout conditions and missing patterns used on PTB-XL represent the temporal asynchrony and contiguous blackout artifacts that appear when real paper ECGs are digitized.
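To make the premise concrete, the sketch below builds a visibility mask for a multi-column printout. It is an illustration only: the column-wise lead ordering, the 10-second record length, the 100 Hz sampling rate, and the optional full-length rhythm strip are assumptions based on common clinical layouts, not the paper's simulation code.

```python
LEADS = ["I", "II", "III", "aVR", "aVL", "aVF", "V1", "V2", "V3", "V4", "V5", "V6"]

def layout_mask(n_cols, total_s=10.0, fs=100, rhythm_leads=()):
    """Return {lead: [bool per sample]}; True means the sample appears on the printout.

    Assumption: leads are assigned to columns in the order of LEADS, and each
    column shows one consecutive time window of equal length.
    """
    if len(LEADS) % n_cols != 0:
        raise ValueError("n_cols must divide the number of leads")
    n = int(total_s * fs)
    col_len = n // n_cols
    per_col = len(LEADS) // n_cols
    mask = {}
    for i, lead in enumerate(LEADS):
        col = i // per_col
        start, stop = col * col_len, (col + 1) * col_len
        mask[lead] = [start <= t < stop for t in range(n)]
    for lead in rhythm_leads:
        mask[lead] = [True] * n  # full-length rhythm strip
    return mask

# A four-column layout with lead II as rhythm strip (hypothetical configuration).
m = layout_mask(4, rhythm_leads=("II",))
print(sum(m["I"]), sum(m["II"]), sum(m["V6"]))  # visible samples per lead
```

In a mask like this, leads in different columns cover different time windows, which is the temporal asynchrony the premise refers to. Whether real scans follow such clean boundaries is exactly what the premise leaves open.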

What would settle it

A large drop in AUROC or attention alignment on a new collection of digitized ECG images whose missing patterns fall outside the seven simulated conditions would show the robustness claim does not hold.

Figures

Figures reproduced from arXiv: 2508.09165 by Deyun Zhang, Jun Li, Kexin Wang, Qinghao Zhao, Shanwei Zhang, Shenda Hong, Shengyong Chen, Shijia Geng, Xingliang Wu, Xingpeng Liu, Yirao Tao, Yuxi Zhou.

Figure 1. a: Traditional image methods focus on the image background, are limited in accuracy, and cannot effectively capture asynchronous ECG dependencies. b: Signal methods are highly accurate but cannot capture asynchronous ECG dependencies or effectively deal with partial blackout missing. c: We propose a variable block number mechanism and patch guided attention mechanism based on masking training…
Figure 2. Results of PatchECG and Other Methods with Different Layouts
Figure 3. Digitally generated ECG image
… good. Although some methods have higher recall rates, their specificity is significantly lower, making them prone to misclassifying negative samples as positive, leading to an increased risk of misdiagnosis. Due to pre-training on tens of millions of ECGs, ECGFounder exhibits strong robustness. When there is a certain degree of data loss, the method fills in the da…
Figure 4. a: Accurate prediction of atrial fibrillation; b: Misdiagnosis of atrial fibrillation as non-atrial fibrillation; this is a false negative; c: Successful identification of non-atrial fibrillation; d: Misdiagnosis of non-atrial fibrillation as atrial fibrillation; this is a false positive
Figure 5. Framework Overview.
… scenarios with different layouts of ECG, the above methods are difficult to accurately capture the collaborative dependencies and temporal features between leads, making it difficult to achieve effective modeling of ECG signals. In summary, although progress has been made in ECG signal data imputation and direct modeling based on missing data, so far there has been no re…
Figure 6. Data used in model development and validation for the ECG arrhythmia detection.
Original abstract

Background: Electrocardiograms are indispensable for diagnosing cardiovascular diseases, yet in many settings they exist only as paper printouts stored in multiple recording layouts. Converting these images into digital signals introduces two key challenges: temporal asynchrony among leads and partial blackout missing, where contiguous signal segments become entirely unavailable. Existing models cannot adequately handle these concurrent problems while maintaining interpretability. Methods: We propose PatchECG, combining an adaptive variable block count missing learning mechanism with a masked training strategy. The model segments each lead into fixed-length patches, discards entirely missing patches, and encodes the remainder via a pluggable patch encoder. A disordered patch attention mechanism with patch-level temporal and lead embeddings captures cross-lead and temporal dependencies without interpolation. PatchECG was trained on PTB-XL and evaluated under seven simulated layout conditions, with external validation on 400 real ECG images from Chaoyang Hospital across three clinical layouts. Results: PatchECG achieves an average AUROC of approximately 0.835 across all simulated layouts. On the Chaoyang cohort, the model attains an overall AUROC of 0.778 for atrial fibrillation detection, rising to 0.893 on the 12x1 subset -- surpassing the pre-trained baseline by 0.111 and 0.190, respectively. Model attention aligns with cardiologist annotations at a rate approaching inter-clinician agreement. Conclusions: PatchECG provides a robust, interpolation-free, and interpretable solution for arrhythmia detection from digitized ECG images across diverse layouts. Its direct modeling of asynchronous and partially missing signals, combined with clinically aligned attention, positions it as a practical tool for cardiac diagnostics from legacy ECG archives in real-world clinical environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces PatchECG, a model for arrhythmia detection from digitized multi-layout ECG images that combines an adaptive variable block count missing learning mechanism with masked training. Each lead is segmented into patches, missing patches are discarded, and a disordered patch attention mechanism with temporal and lead embeddings models cross-lead dependencies without interpolation. The model is trained on PTB-XL under seven simulated layout conditions and externally validated on 400 real ECG images from Chaoyang Hospital, reporting average AUROC ~0.835 on simulations and 0.778 overall (0.893 on 12x1 subset) for atrial fibrillation, outperforming a pre-trained baseline, with attention aligning to cardiologist annotations.

Significance. If the reported robustness and attention alignment hold under real digitization conditions, this could have meaningful clinical impact by enabling reliable arrhythmia detection from legacy paper ECG archives across varying layouts. The external validation on actual hospital images and the emphasis on interpretability without interpolation are notable strengths that address a practical gap in handling asynchronous and partially missing signals.

major comments (1)
  1. [Methods: evaluation under seven simulated layout conditions and external validation on Chaoyang Hospital images] No quantitative comparison (histograms, statistical tests, or distribution metrics) is provided for missing-segment lengths, lead misalignment, or blackout patterns between the seven simulated conditions on PTB-XL and the actual artifacts in the 400 real Chaoyang images. This is load-bearing for the central claim of robustness: if real artifacts differ in contiguous gap statistics or asynchrony, the external AUROC gains (0.778 overall, 0.893 on 12x1) and attention alignment could be specific to the Chaoyang cohort rather than generalizable.
minor comments (2)
  1. [Abstract] Abstract and Results: reported AUROCs (e.g., 0.778, 0.893) are given without error bars, confidence intervals, or standard deviations, and no ablation on the disordered patch attention component is described, limiting assessment of its specific contribution to the performance.
  2. The selection criteria, annotation protocol, and any quality controls for the 400 Chaoyang real images are not detailed, which affects reproducibility of the external validation.
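The first minor comment asks for uncertainty estimates on the reported AUROCs. One common way to obtain them is a percentile bootstrap, sketched below under stated assumptions: the labels and scores are toy placeholders, not the paper's data, and the helper names are hypothetical.

```python
import random

def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as 0.5 (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC, resampling cases with replacement."""
    if len(set(labels)) < 2:
        raise ValueError("both classes are required")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that lost one class
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Toy placeholder data, for illustration only.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]
point = auroc(labels, scores)
lo, hi = bootstrap_ci(labels, scores)
print(point, lo, hi)
```

With only 400 external images, and fewer in each layout subset, an interval of this kind would show how much of a reported gap between models could be sampling noise.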

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment below and describe the revisions we will make to strengthen the work.

Point-by-point responses
  1. Referee: [Methods: evaluation under seven simulated layout conditions and external validation on Chaoyang Hospital images] No quantitative comparison (histograms, statistical tests, or distribution metrics) is provided for missing-segment lengths, lead misalignment, or blackout patterns between the seven simulated conditions on PTB-XL and the actual artifacts in the 400 real Chaoyang images. This is load-bearing for the central claim of robustness: if real artifacts differ in contiguous gap statistics or asynchrony, the external AUROC gains (0.778 overall, 0.893 on 12x1) and attention alignment could be specific to the Chaoyang cohort rather than generalizable.

    Authors: We agree that a direct quantitative comparison of artifact statistics between the simulated conditions and the real Chaoyang Hospital images would provide stronger support for the generalizability of our robustness claims. The seven simulated layouts were constructed to reflect common digitization artifacts observed in clinical practice, including variable contiguous missing segments and lead asynchrony, but we did not previously report explicit distribution metrics. In the revised manuscript we will add histograms of missing-segment lengths, summary statistics (means, variances, and ranges), and statistical tests (e.g., Kolmogorov-Smirnov) comparing blackout patterns and misalignment between the PTB-XL simulations and the 400 real images. These additions will allow readers to evaluate the degree of overlap in artifact distributions and will clarify that the observed performance gains are not limited to the specific characteristics of the Chaoyang cohort.

    Revision: yes
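The Kolmogorov-Smirnov comparison mentioned in the response can be sketched as follows. This computes only the two-sample KS statistic, without a p-value, and the gap lengths are made-up placeholders rather than measurements from PTB-XL or the Chaoyang images.

```python
def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between two empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare CDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

# Placeholder missing-segment lengths in samples (hypothetical values).
simulated_gaps = [250, 250, 500, 500, 750]
real_gaps = [200, 260, 480, 520, 900]
d = ks_statistic(simulated_gaps, real_gaps)
print(d)
```

A statistic near 0 would indicate that simulated and real gap lengths are distributed similarly, and a value near 1 that they barely overlap. A library routine such as `scipy.stats.ks_2samp` would additionally report a p-value.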

Circularity Check

0 steps flagged

No significant circularity; results are measured on held-out external data

Full rationale

The paper's central claims consist of empirical performance metrics obtained by training PatchECG on PTB-XL under seven simulated layout conditions and then measuring AUROC on both held-out simulated layouts and a separate external cohort of 400 real digitized ECG images from Chaoyang Hospital. These AUROCs (approximately 0.835 average on simulations; 0.778 overall and 0.893 on the 12x1 subset on Chaoyang) are direct evaluation outcomes on data unseen during training rather than quantities derived by construction from the loss function, fitted parameters, or self-cited prior results. No equations reduce the reported performance or attention-alignment statistics to tautological inputs, and the method description (adaptive missing-patch handling plus disordered patch attention) introduces independent architectural choices whose validity is assessed externally. The derivation chain is therefore self-contained against the provided benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the assumption that PTB-XL contains representative arrhythmia patterns and that the simulated missing mechanisms generalize to real digitization artifacts; no new physical entities are postulated.

axioms (1)
  • Domain assumption: PTB-XL dataset labels and signal quality are sufficiently accurate for supervised training of arrhythmia detectors
    The model is trained on PTB-XL and evaluated for arrhythmia detection performance.
invented entities (1)
  • Disordered patch attention mechanism (no independent evidence)
    purpose: Captures cross-lead and temporal dependencies without requiring interpolation of missing segments
    Introduced to handle asynchrony and partial blackout in multi-layout ECG images.
