Masked Training for Robust Arrhythmia Detection from Digitalized Multiple Layout ECG Images
Pith reviewed 2026-05-19 01:02 UTC · model grok-4.3
The pith
PatchECG detects atrial fibrillation from digitized multi-layout ECG images by masking missing patches and using disordered attention, achieving 0.778 AUROC on real clinical scans.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PatchECG combines an adaptive variable block count missing learning mechanism with a masked training strategy. The model segments each lead into fixed-length patches, discards entirely missing patches, and encodes the remainder via a pluggable patch encoder. A disordered patch attention mechanism with patch-level temporal and lead embeddings captures cross-lead and temporal dependencies without interpolation. Trained on PTB-XL and evaluated under seven simulated layout conditions, it achieves an average AUROC of approximately 0.835 across all simulated layouts. On the Chaoyang cohort, the model attains an overall AUROC of 0.778 for atrial fibrillation detection, rising to 0.893 on the 12x1 subset.
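The segment-and-discard step described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `segment_and_filter`, the use of NaN to mark missing samples, the patch length, and the zero-filling of partially missing patches are all assumptions made for illustration.

```python
import math

def segment_and_filter(leads, patch_len):
    """Split each lead into fixed-length patches and drop fully missing ones.

    leads: list of equal-length lists of floats; math.nan marks a missing sample
    (an assumed convention, not taken from the paper).
    Returns a list of (lead_index, patch_index, values) for surviving patches.
    """
    kept = []
    for lead_idx, signal in enumerate(leads):
        n_patches = len(signal) // patch_len  # any trailing remainder is ignored
        for patch_idx in range(n_patches):
            patch = signal[patch_idx * patch_len:(patch_idx + 1) * patch_len]
            if all(math.isnan(v) for v in patch):
                continue  # entirely missing: discard rather than interpolate
            # Assumption: partially missing patches are kept, gaps zero-filled.
            values = [0.0 if math.isnan(v) else v for v in patch]
            kept.append((lead_idx, patch_idx, values))
    return kept

nan = math.nan
leads = [
    [0.1, 0.2, 0.3, 0.4, nan, nan, nan, nan],  # second patch fully missing
    [nan, nan, nan, nan, 0.5, nan, 0.7, 0.8],  # first missing, second partial
]
patches = segment_and_filter(leads, patch_len=4)
print([(lead, idx) for lead, idx, _ in patches])  # [(0, 0), (1, 1)]
```

Because each surviving patch carries its own lead and patch index, the number of patches passed downstream can vary from record to record without padding the missing regions.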
What carries the argument
Disordered patch attention mechanism with patch-level temporal and lead embeddings that operates on variable numbers of surviving patches after discarding fully missing ones.
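One way to read "disordered" is that the attention treats surviving patches as an unordered set, with position carried only by the lead and temporal embeddings. The sketch below illustrates that property under strong simplifying assumptions: a single attention head with identity projections, random embeddings, and mean pooling. None of these choices comes from the paper, and the helper names are hypothetical.

```python
import math
import random

def add_embeddings(tokens, lead_emb, time_emb):
    """tokens: list of (lead_idx, patch_idx, feature_vector)."""
    return [
        [f + l + t for f, l, t in zip(feat, lead_emb[lead], time_emb[step])]
        for lead, step, feat in tokens
    ]

def self_attention_pool(vectors):
    """Single-head dot-product self-attention (identity projections), mean-pooled."""
    dim = len(vectors[0])
    scale = math.sqrt(dim)
    outputs = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in vectors]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, vectors)) / total
            for d in range(dim)
        ])
    return [sum(o[d] for o in outputs) / len(outputs) for d in range(dim)]

rng = random.Random(0)
dim = 4
lead_emb = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(12)]
time_emb = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(8)]
# Five surviving patches from different leads and time steps (variable count).
tokens = [(lead, step, [rng.uniform(-1, 1) for _ in range(dim)])
          for lead, step in [(0, 0), (0, 1), (1, 3), (6, 2), (11, 7)]]

pooled = self_attention_pool(add_embeddings(tokens, lead_emb, time_emb))
shuffled = tokens[:]
rng.shuffle(shuffled)
pooled_shuffled = self_attention_pool(add_embeddings(shuffled, lead_emb, time_emb))
same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(pooled, pooled_shuffled))
print(same)
```

The pooled representation does not depend on the order in which patches are presented, which is why dropped patches need no placeholder: the embeddings alone tell the model where each patch came from.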
If this is right
- Arrhythmia detection becomes possible directly on legacy paper ECG archives without requiring full signal reconstruction or interpolation.
- Performance gains are largest on standard 12x1 layouts, suggesting immediate clinical utility for the most common recording formats.
- Attention maps that approach inter-clinician agreement provide built-in interpretability for diagnostic review.
- The same patch-masking approach can be applied to other signal types that suffer partial sensor loss or format variation.
Where Pith is reading between the lines
- The method could reduce the cost and error of retrospective studies that rely on historical paper records.
- Extending the masking strategy to multi-modal inputs such as ECG plus chest X-ray might improve joint cardiac diagnosis.
- Deployment in low-resource clinics with inconsistent scanning equipment becomes more feasible if the model tolerates greater layout diversity.
Load-bearing premise
The seven simulated layout conditions and missing patterns used on PTB-XL represent the temporal asynchrony and contiguous blackout artifacts that appear when real paper ECGs are digitized.
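To make the premise concrete, a layout condition can be modeled as a per-lead visibility window. The sketch below assumes the conventional printed formats (for example, a 3x4 grid showing each lead for a quarter of a 10 s recording, optionally with a full-length rhythm strip); the function names, the column-by-column lead order, and the 10 s duration are illustrative assumptions, not details confirmed by the paper.

```python
LEADS = ["I", "II", "III", "aVR", "aVL", "aVF", "V1", "V2", "V3", "V4", "V5", "V6"]

def layout_windows(rows, cols, duration_s=10.0, rhythm_leads=()):
    """Return {lead: (start_s, end_s)}, the time span in which each lead is visible.

    Assumption: leads fill the grid column by column, and each column shows
    one consecutive time slice of the recording.
    """
    if rows * cols != len(LEADS):
        raise ValueError("rows * cols must equal 12")
    width = duration_s / cols
    windows = {}
    for i, lead in enumerate(LEADS):
        col = i // rows
        windows[lead] = (col * width, (col + 1) * width)
    for lead in rhythm_leads:
        windows[lead] = (0.0, duration_s)  # rhythm strip spans the whole record
    return windows

def visibility_mask(window, fs_hz, duration_s=10.0):
    """Boolean mask per sample: True where the lead is visible on paper."""
    start, end = window
    n = int(round(duration_s * fs_hz))
    return [start <= i / fs_hz < end for i in range(n)]

windows = layout_windows(3, 4, rhythm_leads=("II",))
print(windows["V1"])                             # (5.0, 7.5)
print(sum(visibility_mask(windows["V1"], 100)))  # 250
```

Everything outside a lead's window is a contiguous blackout, and leads in different columns never overlap in time, which is the asynchrony the premise refers to. Whether real scans add further artifacts beyond this idealized pattern is exactly what the premise leaves open.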
What would settle it
A large drop in AUROC or attention alignment on a new collection of digitized ECG images whose missing patterns fall outside the seven simulated conditions would show the robustness claim does not hold.
Original abstract
Background: Electrocardiograms are indispensable for diagnosing cardiovascular diseases, yet in many settings they exist only as paper printouts stored in multiple recording layouts. Converting these images into digital signals introduces two key challenges: temporal asynchrony among leads and partial blackout missing, where contiguous signal segments become entirely unavailable. Existing models cannot adequately handle these concurrent problems while maintaining interpretability.
Methods: We propose PatchECG, combining an adaptive variable block count missing learning mechanism with a masked training strategy. The model segments each lead into fixed-length patches, discards entirely missing patches, and encodes the remainder via a pluggable patch encoder. A disordered patch attention mechanism with patch-level temporal and lead embeddings captures cross-lead and temporal dependencies without interpolation. PatchECG was trained on PTB-XL and evaluated under seven simulated layout conditions, with external validation on 400 real ECG images from Chaoyang Hospital across three clinical layouts.
Results: PatchECG achieves an average AUROC of approximately 0.835 across all simulated layouts. On the Chaoyang cohort, the model attains an overall AUROC of 0.778 for atrial fibrillation detection, rising to 0.893 on the 12x1 subset, surpassing the pre-trained baseline by 0.111 and 0.190, respectively. Model attention aligns with cardiologist annotations at a rate approaching inter-clinician agreement.
Conclusions: PatchECG provides a robust, interpolation-free, and interpretable solution for arrhythmia detection from digitized ECG images across diverse layouts. Its direct modeling of asynchronous and partially missing signals, combined with clinically aligned attention, positions it as a practical tool for cardiac diagnostics from legacy ECG archives in real-world clinical environments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces PatchECG, a model for arrhythmia detection from digitized multi-layout ECG images that combines an adaptive variable block count missing learning mechanism with masked training. Each lead is segmented into patches, missing patches are discarded, and a disordered patch attention mechanism with temporal and lead embeddings models cross-lead dependencies without interpolation. The model is trained on PTB-XL under seven simulated layout conditions and externally validated on 400 real ECG images from Chaoyang Hospital, reporting average AUROC ~0.835 on simulations and 0.778 overall (0.893 on 12x1 subset) for atrial fibrillation, outperforming a pre-trained baseline, with attention aligning to cardiologist annotations.
Significance. If the reported robustness and attention alignment hold under real digitization conditions, this could have meaningful clinical impact by enabling reliable arrhythmia detection from legacy paper ECG archives across varying layouts. The external validation on actual hospital images and the emphasis on interpretability without interpolation are notable strengths that address a practical gap in handling asynchronous and partially missing signals.
major comments (1)
- [Methods: evaluation under seven simulated layout conditions and external validation on Chaoyang Hospital images] No quantitative comparison (histograms, statistical tests, or distribution metrics) is provided for missing-segment lengths, lead misalignment, or blackout patterns between the seven simulated conditions on PTB-XL and the actual artifacts in the 400 real Chaoyang images. This is load-bearing for the central claim of robustness: if real artifacts differ in contiguous gap statistics or asynchrony, the external AUROC gains (0.778 overall, 0.893 on 12x1) and the attention alignment could be specific to the Chaoyang cohort rather than generalizable.
minor comments (2)
- [Abstract] Abstract and Results: reported AUROCs (e.g., 0.778, 0.893) are given without error bars, confidence intervals, or standard deviations, and no ablation on the disordered patch attention component is described, limiting assessment of its specific contribution to the performance.
- The selection criteria, annotation protocol, and any quality controls for the 400 Chaoyang real images are not detailed, which affects reproducibility of the external validation.
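For the first minor comment, a percentile bootstrap is one common way to attach an interval to an AUROC. The sketch below is a generic illustration on made-up toy labels and scores, not a reproduction of the paper's evaluation; `auroc`, `bootstrap_ci`, the number of resamples, and the seed are all assumptions.

```python
import random

def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi

# Toy data for illustration only; these are not results from the paper.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.5, 0.7]
point = auroc(labels, scores)
lo, hi = bootstrap_ci(labels, scores)
print(point)  # 0.875
```

With a 400-image cohort split across three layouts, the per-layout intervals could be wide, which is why the comment asks for them alongside the point estimates.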
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the major comment below and describe the revisions we will make to strengthen the work.
Point-by-point responses
- Referee: [Methods: evaluation under seven simulated layout conditions and external validation on Chaoyang Hospital images] No quantitative comparison (histograms, statistical tests, or distribution metrics) is provided for missing-segment lengths, lead misalignment, or blackout patterns between the seven simulated conditions on PTB-XL and the actual artifacts in the 400 real Chaoyang images. This is load-bearing for the central claim of robustness: if real artifacts differ in contiguous gap statistics or asynchrony, the external AUROC gains (0.778 overall, 0.893 on 12x1) and the attention alignment could be specific to the Chaoyang cohort rather than generalizable.
Authors: We agree that a direct quantitative comparison of artifact statistics between the simulated conditions and the real Chaoyang Hospital images would provide stronger support for the generalizability of our robustness claims. The seven simulated layouts were constructed to reflect common digitization artifacts observed in clinical practice, including variable contiguous missing segments and lead asynchrony, but we did not previously report explicit distribution metrics. In the revised manuscript we will add histograms of missing-segment lengths, summary statistics (means, variances, and ranges), and statistical tests (e.g., Kolmogorov-Smirnov) comparing blackout patterns and misalignment between the PTB-XL simulations and the 400 real images. These additions will allow readers to evaluate the degree of overlap in artifact distributions and will clarify that the observed performance gains are not limited to the specific characteristics of the Chaoyang cohort.
Revision: yes
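The comparison promised in the rebuttal could look roughly like the following sketch, which extracts contiguous missing-run lengths from boolean masks and computes the two-sample Kolmogorov-Smirnov statistic between two sets of lengths. The helper names and the toy inputs are hypothetical; in practice SciPy's `scipy.stats.ks_2samp` would also supply a p-value.

```python
def missing_run_lengths(mask):
    """Lengths of contiguous True (missing) runs in a boolean sequence."""
    runs, current = [], 0
    for missing in mask:
        if missing:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)  # close a run that reaches the end of the mask
    return runs

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

# Toy inputs for illustration only; not measurements from either dataset.
simulated_mask = [False, True, True, False, True, True, True, True, False]
simulated_runs = missing_run_lengths(simulated_mask)  # [2, 4]
real_runs = [2, 4, 4, 9]
print(ks_statistic(simulated_runs, real_runs))  # 0.25
```

A statistic near 0 would indicate that simulated and real gap lengths follow similar distributions; a value near 1 would indicate that the simulations miss the kinds of gaps seen in real scans.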
Circularity Check
No significant circularity; results are measured on held-out external data
The paper's central claims consist of empirical performance metrics obtained by training PatchECG on PTB-XL under seven simulated layout conditions and then measuring AUROC on both held-out simulated layouts and a separate external cohort of 400 real digitized ECG images from Chaoyang Hospital. These AUROCs (approximately 0.835 average on simulations; 0.778 overall and 0.893 on the 12x1 subset on Chaoyang) are direct evaluation outcomes on data unseen during training rather than quantities derived by construction from the loss function, fitted parameters, or self-cited prior results. No equations reduce the reported performance or attention-alignment statistics to tautological inputs, and the method description (adaptive missing-patch handling plus disordered patch attention) introduces independent architectural choices whose validity is assessed externally. The derivation chain is therefore self-contained against the provided benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: PTB-XL dataset labels and signal quality are sufficiently accurate for supervised training of arrhythmia detectors
invented entities (1)
- disordered patch attention mechanism: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: adaptive variable block count missing learning mechanism with a masked training strategy... disordered patch attention mechanism with patch-level temporal and lead embeddings
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: PatchECG achieves an average AUROC of approximately 0.835 across all simulated layouts
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.