PEACE: Cross-modal Enhanced Pediatric-Adult ECG Alignment for Robust Pediatric Diagnosis

Chengyu Liu; Heyang Xu; Hongxiang Gao; Jianqing Li; Xinran Liu; Yuwen Li; Zongmin Wang

arxiv: 2605.00647 · v1 · submitted 2026-05-01 · 💻 cs.LG

Xinran Liu, Yuwen Li, Hongxiang Gao, Heyang Xu, Jianqing Li, Zongmin Wang, Chengyu Liu

Pith reviewed 2026-05-09 19:57 UTC · model grok-4.3

classification 💻 cs.LG

keywords pediatric ECG · adult-to-pediatric transfer · cross-modal alignment · clinical semantic supervision · domain adaptation · low-resource diagnosis · ECG feature alignment

The pith

Structured clinical semantic supervision aligns adult ECG features with pediatric diagnostic targets to improve low-resource transfer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PEACE as a cross-modal framework that decomposes ECG signals along clinical semantic axes, extracts label-conditioned features, and applies curriculum-gated optimization to transfer representations from adult to pediatric data. Adult-trained models typically fail on pediatric ECGs due to physiological differences and the scarcity of labeled pediatric examples. By generating label-conditioned semantic descriptors with Gemini for use only as auxiliary training signals, the approach enables standard ECG-only inference while lifting performance on a dedicated pediatric dataset. A reader would care because reliable automated pediatric ECG diagnosis could help address data limitations without requiring large new labeled collections or paired clinical reports.

Core claim

PEACE integrates tri-axial clinical semantic decomposition, label-query feature extraction, and curriculum-gated optimization to align transferable adult ECG representations with pediatric diagnostic targets. Since ZZU-pECG provides no paired clinical reports, label-conditioned semantic descriptors are generated using Gemini with concise clinical prompts and used only as auxiliary training supervision; inference remains ECG-only. On ZZU-pECG this produces 59.39 percent AUC zero-shot, 79.03 percent 50-shot, and 90.89 percent full fine-tuning, together with 96.65 percent AUC on the shared PTB-XL label space.

What carries the argument

Tri-axial clinical semantic decomposition paired with label-query feature extraction that supplies auxiliary supervision from generated semantic descriptors to reduce adult-pediatric domain mismatch.

If this is right

Zero-shot adult-to-pediatric transfer reaches 59.39 percent AUC on ZZU-pECG.
With 50 labeled pediatric samples performance rises to 79.03 percent AUC and to 90.89 percent with full fine-tuning.
On the label space shared with PTB-XL the aligned model attains 96.65 percent AUC.
After training, inference uses only the ECG input because semantic descriptors serve solely as auxiliary signals.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Prospective clinical validation on new patient cohorts would be required to confirm utility beyond the retrospective ZZU-pECG collection.
Adding explicit age-aware modeling could further refine alignment across pediatric age subgroups where heart morphology changes rapidly.
The same auxiliary-supervision pattern may apply to other medical-signal domains that exhibit population mismatch and lack paired textual reports.

Load-bearing premise

The Gemini-generated label-conditioned semantic descriptors accurately capture clinically relevant ECG semantics and supply effective auxiliary supervision without adding bias or noise.

What would settle it

Training an otherwise identical model without the semantic-supervision component and observing no reduction in AUC on ZZU-pECG under zero-shot or 50-shot conditions would falsify the central claim.
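The ablation test described above comes down to comparing AUC between the full model and a variant trained without semantic supervision. The following is a minimal sketch of that comparison, assuming per-sample scores for a single diagnostic class; the function names `roc_auc` and `auc_drop` are hypothetical, and the paper does not state how AUC is averaged across classes.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_drop(labels, full_scores, ablated_scores):
    """Positive values mean the ablated model is worse than the full model."""
    return roc_auc(labels, full_scores) - roc_auc(labels, ablated_scores)
```

A drop near zero under the zero-shot or 50-shot protocol would be the falsifying outcome named above; a real comparison would also need repeated seeds, as the referee report requests.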

Figures

Figures reproduced from arXiv: 2605.00647 by Chengyu Liu, Heyang Xu, Hongxiang Gao, Jianqing Li, Xinran Liu, Yuwen Li, Zongmin Wang.

**Figure 2.** Sample distribution across diagnostic labels, illustrating the relative proportion of each...

**Figure 3.** Category-wise performance evaluation of PEACE on ZZU-pECG.

**Figure 4.** Visual comparison of IRBBB morphologies across adult and pediatric populations. The Lead II waveform segments highlight the significant rhythm and interval discrepancies between age groups.

**Figure 5.** Grad-CAM++ attention visualization for a...

**Figure 6.** Grad-CAM++ attention visualization for an...

**Figure 7.** Grad-CAM++ attention visualization for an...

**Figure 8.** Grad-CAM++ attention visualization for an...

**Figure 9.** Grad-CAM++ attention visualization for an...

**Figure 10.** Grad-CAM++ attention visualization for an...

**Figure 11.** Grad-CAM++ attention visualization for an...

**Figure 12.** Grad-CAM++ attention visualization for a...

**Figure 13.** Grad-CAM++ attention visualization for an...

**Figure 14.** Grad-CAM++ attention visualization for an...

**Figure 15.** Grad-CAM++ attention visualization for an...

**Figure 16.** Grad-CAM++ attention visualization for a...

Original abstract

Automated pediatric electrocardiogram (ECG) diagnosis remains challenging because models trained predominantly on adult data suffer from substantial cross-population mismatch, while pediatric labels are often scarce. We present PEACE (Pediatric-Adult ECG Alignment via Cross-modal Enhancement), a structured cross-modal alignment framework for adult-to-pediatric ECG transfer. PEACE integrates tri-axial clinical semantic decomposition, label-query feature extraction, and curriculum-gated optimization to align transferable adult ECG representations with pediatric diagnostic targets. Since ZZU-pECG provides no paired clinical reports, we generate label-conditioned semantic descriptors using Gemini with concise clinical prompts and use them only as auxiliary training supervision; inference remains ECG-only. On ZZU-pECG, PEACE achieves 59.39%, 79.03%, and 90.89% AUC under zero-shot, 50-shot, and full fine-tuning settings, respectively, and reaches 96.65% AUC on the shared PTB-XL label space. These results suggest that structured clinical semantic supervision can improve low-resource adult-to-pediatric ECG transfer, while prospective clinical validation and more explicit age-aware modeling remain necessary before real-world deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces PEACE, a cross-modal alignment framework for adult-to-pediatric ECG transfer that combines tri-axial clinical semantic decomposition, label-query feature extraction, and curriculum-gated optimization. Because the ZZU-pECG dataset lacks paired clinical reports, the method generates label-conditioned semantic descriptors via the Gemini LLM using concise prompts and employs them solely as auxiliary training supervision (inference remains ECG-only). It reports AUCs of 59.39% (zero-shot), 79.03% (50-shot), and 90.89% (full fine-tuning) on ZZU-pECG plus 96.65% on the shared PTB-XL label space, claiming that structured clinical semantic supervision improves low-resource transfer.

Significance. If the attribution of gains to the semantic supervision holds, the work offers a practical route to leverage abundant adult ECG data for pediatric diagnosis under label scarcity, with the curriculum gating and cross-modal components providing a structured way to mitigate population mismatch. The grounding on public datasets (PTB-XL) and the ECG-only inference design are positive for reproducibility and deployment.

major comments (2)

[Abstract] Abstract and Methods: The central claim that 'structured clinical semantic supervision can improve low-resource adult-to-pediatric ECG transfer' rests on the quality of the Gemini-generated descriptors, yet no expert validation, inter-rater agreement, or clinical-fidelity check is reported; because descriptors are produced only from class labels plus prompts on a dataset without paired reports, any systematic bias or age-inappropriate content directly contaminates the auxiliary signal.
[Results] Results and Experiments: The reported AUC gains (e.g., 59.39% zero-shot, 79.03% 50-shot) are presented without baselines that isolate the semantic branch, without error bars or statistical tests, and without an ablation that removes the cross-modal semantic supervision while keeping all other components fixed; this leaves open the possibility that gains arise from architectural choices unrelated to the LLM descriptors.

minor comments (1)

[Abstract] The abstract states that 'prospective clinical validation and more explicit age-aware modeling remain necessary,' but the manuscript would benefit from a dedicated limitations paragraph that quantifies the dependence on the external LLM and discusses reproducibility of the descriptor generation process.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our work. We address the major concerns point by point below, providing clarifications and indicating the revisions we will make to strengthen the manuscript.

Referee: [Abstract] Abstract and Methods: The central claim that 'structured clinical semantic supervision can improve low-resource adult-to-pediatric ECG transfer' rests on the quality of the Gemini-generated descriptors, yet no expert validation, inter-rater agreement, or clinical-fidelity check is reported; because descriptors are produced only from class labels plus prompts on a dataset without paired reports, any systematic bias or age-inappropriate content directly contaminates the auxiliary signal.

Authors: We agree that the lack of explicit validation for the LLM-generated descriptors is a limitation in the current version. The ZZU-pECG dataset indeed lacks paired clinical reports, which is why we relied on label-conditioned generation with concise clinical prompts. In the revised manuscript, we will add qualitative examples of the generated semantic descriptors in an appendix, along with a discussion of potential biases (e.g., age-inappropriate content) in the Limitations section. We will also perform a small-scale expert review on a subset of descriptors if feasible, or explicitly state that such validation is planned for future work. The curriculum-gated optimization is designed to reduce the influence of noisy supervision by gradually increasing its weight, and inference remains purely ECG-based to avoid propagating any biases at test time. We believe this addresses the core concern while acknowledging the inherent data constraints. revision: partial
Referee: [Results] Results and Experiments: The reported AUC gains (e.g., 59.39% zero-shot, 79.03% 50-shot) are presented without baselines that isolate the semantic branch, without error bars or statistical tests, and without an ablation that removes the cross-modal semantic supervision while keeping all other components fixed; this leaves open the possibility that gains arise from architectural choices unrelated to the LLM descriptors.

Authors: This is a valid point, and we will strengthen the experimental section accordingly. In the revision, we will include an ablation study that removes the cross-modal semantic supervision (i.e., the LLM descriptors and associated alignment loss) while retaining the tri-axial clinical semantic decomposition, label-query feature extraction, and curriculum-gated optimization. We will report mean AUC with standard deviations from multiple random seeds (e.g., 5 runs) and include statistical tests such as Wilcoxon signed-rank tests to compare against baselines. These additions will isolate the contribution of the semantic supervision and provide evidence that the gains are attributable to it rather than other architectural elements. revision: yes
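The first author response describes curriculum-gated optimization as gradually increasing the weight of the noisy semantic supervision. A minimal sketch of one such gate follows, assuming a linear warm-up; the schedule shape, the names `curriculum_gate` and `total_loss`, and the `warmup_steps` parameter are all hypothetical, since the source does not specify the gating function.

```python
def curriculum_gate(step, warmup_steps, max_weight=1.0):
    """Linearly ramp the auxiliary-loss weight from 0 to max_weight (assumed schedule)."""
    if warmup_steps <= 0:
        return max_weight
    progress = min(1.0, max(0.0, step / warmup_steps))
    return max_weight * progress


def total_loss(cls_loss, align_loss, step, warmup_steps, max_weight=1.0):
    """Classification loss plus the gated cross-modal alignment loss."""
    return cls_loss + curriculum_gate(step, warmup_steps, max_weight) * align_loss
```

Setting `max_weight=0.0` gives the ablated variant proposed in the second response: the alignment term vanishes while the rest of the training loop is unchanged.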

Circularity Check

0 steps flagged

No significant circularity; empirical framework with external grounding

The paper presents an empirical cross-modal alignment framework evaluated on public datasets (ZZU-pECG and PTB-XL) with reported AUC metrics under zero-shot, few-shot, and full fine-tuning regimes. No mathematical derivation chain, equations, or uniqueness theorems are invoked that reduce by construction to the paper's own inputs or fitted parameters. The use of Gemini for generating label-conditioned descriptors is an external auxiliary step (training-only, inference ECG-only) rather than a self-referential loop; the descriptors are not claimed to be derived from model outputs or predictions. No self-citations are load-bearing for core claims, and no ansatz or renaming of known results occurs. The framework remains self-contained against external benchmarks without the specific reductions required for circularity flags.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that LLM-generated semantic descriptors serve as reliable auxiliary supervision for ECG alignment without paired reports, plus standard assumptions in deep learning transfer.

axioms (1)

Domain assumption: Gemini generates accurate label-conditioned semantic descriptors from concise clinical prompts that are useful for ECG feature alignment.
Invoked to justify auxiliary supervision since ZZU-pECG provides no paired clinical reports.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages

[1] Temporal Unification: ECG records (3–10 s) were resampled to 500 Hz and uniformly adjusted to 10 seconds (5,000 samples) via truncation or zero-padding.
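The temporal unification step can be sketched as follows. This is an illustration under stated assumptions: the source does not name a resampling method, so linear interpolation is swapped in here, and the function names and the 250 Hz input rate in the usage note are hypothetical.

```python
def resample_linear(x, fs_in, fs_out):
    """Resample a 1-D signal by linear interpolation (assumed method, not the paper's)."""
    if len(x) < 2 or fs_in == fs_out:
        return list(x)
    n_out = int(round(len(x) * fs_out / fs_in))
    out = []
    for i in range(n_out):
        t = i * fs_in / fs_out            # position measured in input samples
        lo = min(int(t), len(x) - 1)
        hi = min(lo + 1, len(x) - 1)
        frac = t - lo
        out.append(x[lo] * (1.0 - frac) + x[hi] * frac)
    return out


def fix_length(x, target_len=5000):
    """Truncate long records and zero-pad short ones to exactly target_len samples."""
    return list(x[:target_len]) + [0.0] * max(0, target_len - len(x))


def unify(x, fs_in, fs_out=500, seconds=10):
    """Resample to fs_out, then force a fixed duration of `seconds`."""
    return fix_length(resample_linear(x, fs_in, fs_out), fs_out * seconds)
```

For example, a 3-second record sampled at 250 Hz (750 samples) becomes 1,500 samples after resampling and is then zero-padded to 5,000.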
[2] Ontology Mapping: We mapped raw clinical strings to 12 target categories (e.g., mapping "left ventricular hypertrophy" to LVH) to ensure cross-dataset label consistency.
[3] Signal Quality Filtering: After merging machine_measurements and label records via subject_id, we excluded records with missing annotations or excessive noise, resulting in 483,051 valid samples.
[4] Splitting: Using a fixed seed (42), we partitioned the data into training, validation, and test sets (8:1:1 ratio).

A.2.2 ZZU-pECG Preprocessing. Labels in ZZU-pECG follow the AHA code format. Our workflow addresses pediatric-specific signal characteristics:
[5] Label Conversion: We mapped 14 AHA codes (e.g., I101, L147) to our 12 core disease labels (e.g., LAFB, TAB_). Specifically, codes A1/A2 were assigned to NORM, while L145/L146 were mapped to STTC.
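The label conversion can be illustrated with a small lookup table. The sketch below includes only the mappings visible in this excerpt (A1/A2 to NORM, L145/L146 to STTC, and I101 to LAFB from Table 5); the remaining codes and labels are not reproduced here, so `LABELS` is a deliberately partial, illustrative subset, and `encode_labels` is a hypothetical helper name.

```python
# Partial mapping: only the code-to-label pairs stated in the excerpt.
AHA_TO_LABEL = {
    "A1": "NORM",
    "A2": "NORM",
    "L145": "STTC",
    "L146": "STTC",
    "I101": "LAFB",
}

# Illustrative subset of the 12 target labels (the full list is not given here).
LABELS = ["NORM", "STTC", "LAFB"]


def encode_labels(codes):
    """Return (multi-hot vector over LABELS, list of codes with no known mapping)."""
    vector = [0] * len(LABELS)
    unmapped = []
    for code in codes:
        label = AHA_TO_LABEL.get(code)
        if label is None:
            unmapped.append(code)
        else:
            vector[LABELS.index(label)] = 1
    return vector, unmapped
```

Returning the unmapped codes separately makes it easy to audit which records would be dropped or need manual review, a choice made for this sketch rather than one described in the source.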
[6] Signal Refinement: Signals were standardized to 10 s at 500 Hz. We applied a filtering suite consisting of high-pass (0.5 Hz), low-pass (100 Hz), and 50 Hz notch filters to remove baseline wander and power-line interference.
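A filtering suite of this kind could be sketched with three cascaded second-order (biquad) sections. The cut-off frequencies come from the text; everything else is an assumption: the source does not give filter type or order, so this uses the widely known audio-EQ-cookbook biquad formulas, with hypothetical Q values (0.7071 for the Butterworth-style high-pass and low-pass, 30 for the notch) and hypothetical function names.

```python
import math


def biquad_coeffs(kind, f0, fs, q):
    """Return normalized (b, a) for a cookbook biquad; `a` omits the leading 1."""
    w0 = 2.0 * math.pi * f0 / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    if kind == "highpass":
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    elif kind == "lowpass":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    elif kind == "notch":
        b = [1.0, -2 * cos_w0, 1.0]
    else:
        raise ValueError(f"unknown filter kind: {kind}")
    a0 = 1 + alpha
    return [v / a0 for v in b], [-2 * cos_w0 / a0, (1 - alpha) / a0]


def biquad_filter(x, b, a):
    """Apply one biquad section in direct form I, starting from zero state."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y


def clean_lead(x, fs=500.0):
    """High-pass 0.5 Hz, low-pass 100 Hz, then 50 Hz notch (Q values are assumptions)."""
    stages = [("highpass", 0.5, 0.7071), ("lowpass", 100.0, 0.7071), ("notch", 50.0, 30.0)]
    for kind, f0, q in stages:
        b, a = biquad_coeffs(kind, f0, fs, q)
        x = biquad_filter(x, b, a)
    return x
```

This single-pass cascade introduces phase distortion and a start-up transient; a production pipeline would more likely use a zero-phase design from a signal-processing library, which the source neither confirms nor rules out.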
[7] Amplitude Calibration: To mitigate gain variances across acquisition hardware, we scaled pediatric signal amplitudes relative to the peak-to-peak statistics of the MIMIC-IV dataset, ensuring a consistent input distribution.
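One way to read the amplitude calibration step is as a single gain that matches a peak-to-peak statistic of the pediatric set to that of the reference set. The sketch below assumes the statistic is the median peak-to-peak amplitude; the source does not say which statistic was used, and `reference_p2p` is a hypothetical placeholder rather than a value from MIMIC-IV.

```python
import statistics


def peak_to_peak(signal):
    """Amplitude range of one record."""
    return max(signal) - min(signal)


def calibration_gain(signals, reference_p2p):
    """Gain that maps the median peak-to-peak of `signals` onto reference_p2p."""
    median_p2p = statistics.median(peak_to_peak(s) for s in signals)
    if median_p2p == 0:
        raise ValueError("cannot calibrate flat signals")
    return reference_p2p / median_p2p


def calibrate(signals, reference_p2p):
    """Scale every record by the same dataset-level gain."""
    gain = calibration_gain(signals, reference_p2p)
    return [[v * gain for v in s] for s in signals]
```

A single dataset-level gain preserves relative amplitude differences between patients, which matters for amplitude-dependent diagnoses; per-record normalization would erase them.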
[8] Stratified Splitting: From 12,334 original records, 7,593 valid multi-label samples were selected. We employed multi-label stratified sampling to maintain class balance across subsets (8:1:1 ratio).

Table 5: ZZU-pECG Label Mapping

| Original Label | Mapping Target | Full Name |
| A1 | NORM | normal ECG |
| A2 | NORM | normal ECG |
| I101 | LAFB | left anterior fascicular block |
| I105 | IR... | |
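Multi-label stratified sampling can be approximated with a greedy assignment that handles the rarest labels first. The sketch below is a simplified variant in the spirit of iterative stratification, not the authors' procedure; the function name and the seed default are assumptions (seed 42 is stated in the source only for the MIMIC-IV split).

```python
import random


def multilabel_stratified_split(labels, ratios=(0.8, 0.1, 0.1), seed=42):
    """Greedy split: each sample goes to the subset most short of its rarest label."""
    n, n_labels = len(labels), len(labels[0])
    freq = [sum(row[j] for row in labels) for j in range(n_labels)]
    want = [[r * freq[j] for j in range(n_labels)] for r in ratios]  # per-label quotas
    want_total = [r * n for r in ratios]                             # size quotas

    def rarest_label(i):
        pos = [j for j in range(n_labels) if labels[i][j]]
        return min(pos, key=lambda j: freq[j]) if pos else None

    order = list(range(n))
    random.Random(seed).shuffle(order)
    # Stable sort: samples carrying rare labels first, unlabeled samples last.
    order.sort(key=lambda i: n + 1 if rarest_label(i) is None else freq[rarest_label(i)])

    subsets = [[] for _ in ratios]
    for i in order:
        j = rarest_label(i)
        if j is None:
            k = max(range(len(ratios)), key=lambda s: want_total[s])
        else:
            k = max(range(len(ratios)), key=lambda s: (want[s][j], want_total[s]))
        subsets[k].append(i)
        want_total[k] -= 1
        for jj in range(n_labels):
            if labels[i][jj]:
                want[k][jj] -= 1
    return subsets
```

The function returns index lists for train, validation, and test in that order; processing rare labels first keeps minority classes from being exhausted before every subset has received its share.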
[9] Ontology Refinement: We mapped SCP codes to our 12-label system based on the sub-diagnostic hierarchy. Notably, T-wave abnormalities (e.g., NDT) were subsumed under the STTC category according to PTB-XL’s diagnostic logic, resulting in a zero count for the standalone TAB_ label in this dataset.
[10] Standardization: 10-second signals (500 Hz) were processed using a StandardScaler fitted on training set statistics to normalize the signal mean and variance.
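The key point of this step is that the scaling statistics come from the training split only and are then reused unchanged on validation and test data. A dependency-free sketch follows, assuming one global mean and standard deviation over all training samples; whether the authors scaled per lead or globally is not stated, and the function names are hypothetical.

```python
import statistics


def fit_scaler(train_signals):
    """Estimate mean and population standard deviation from training data only."""
    flat = [v for signal in train_signals for v in signal]
    mean = statistics.fmean(flat)
    std = statistics.pstdev(flat)
    if std == 0:
        raise ValueError("training data has zero variance")
    return mean, std


def transform(signal, mean, std):
    """Apply the training-set statistics to any record (train, validation, or test)."""
    return [(v - mean) / std for v in signal]
```

The population standard deviation is used here because that is what scikit-learn's StandardScaler computes; fitting on the full dataset instead would leak test statistics into training.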
[11] Data Splitting: We selected 16,363 valid samples containing target labels and split them into training, validation, and test sets at an 8:1:1 ratio. After preprocessing, the number of samples of each category used in the final experiment for the three datasets is shown in the table below.

Table 6: PTB-XL Label Mapping. Columns: Original Label, Mapping Target, Full...
[12] Cold Start (N < 10): Performance gains are marginal (<1.3%), as the model struggles to align sparse pediatric samples with the pretrained adult clinical priors.
[13] Rapid Adaptation (10 ≤ N ≤ 50): A significant performance surge is observed, with AUC improving by approximately 13 percentage points (from 66.13% to 79.03%). At this stage, 50 samples appear sufficient to capture core pediatric-specific morphological features.
[14] Saturation Phase (N > 50): Beyond 50-shot, performance gains hit a bottleneck (<1%), suggesting that further increasing the annotation burden provides diminishing returns for cross-modal transfer. Based on this empirical evidence, we established the 50-shot protocol as our primary few-shot adaptation benchmark, as it achieves 99.2% of the 100-shot perfor...
[15] Zero-shot Evaluation Protocol. In this regime, all model parameters remain frozen at the state obtained from MIMIC-IV pre-training. This protocol strictly isolates the cross-population generalization capacity of the learned representations. We optimize class-specific classification thresholds on the target validation set by identifying the values that m...
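Per-class threshold selection on a validation set can be sketched as a simple search over candidate cut-offs. The selection criterion is truncated in this excerpt, so the sketch assumes F1 as a stand-in and exposes it as a replaceable `metric` argument; all function names are hypothetical.

```python
def f1_at(labels, scores, threshold):
    """F1 score when predicting positive for scores >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(labels, scores, metric=f1_at):
    """Try every observed score as a cut-off; keep the lowest one with the best metric."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: metric(labels, scores, t))


def per_class_thresholds(label_matrix, score_matrix, metric=f1_at):
    """Select one threshold per class (column) from validation labels and scores."""
    n_classes = len(label_matrix[0])
    return [
        best_threshold(
            [row[j] for row in label_matrix],
            [row[j] for row in score_matrix],
            metric,
        )
        for j in range(n_classes)
    ]
```

Thresholds chosen this way affect threshold-dependent metrics only; AUC is computed from the raw scores and does not change. Tuning on target validation data also means the zero-shot setting is not entirely free of target-domain labels.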
[16] Few-shot Adaptation Protocol (50-shot). The model is initialized with pre-trained weights. The ECG encoder and LQN fusion module are unfrozen, while the text encoder maintains its foundational 10 transformer layers in a frozen state to prevent catastrophic forgetting of medical priors. We randomly sample 50 labeled instances per diagnostic class (or the ma...
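Per-class few-shot sampling in a multi-label dataset can be sketched as follows. The fallback rule for classes with fewer than 50 examples is truncated in this excerpt, so the sketch assumes that all available examples are taken in that case; the function name and the seed are hypothetical.

```python
import random


def sample_k_shot(labels, k=50, seed=42):
    """Pick up to k positive samples per class; return the sorted union of indices."""
    rng = random.Random(seed)
    chosen = set()
    for j in range(len(labels[0])):
        pool = [i for i, row in enumerate(labels) if row[j]]
        # Assumed fallback: take every example when a class has fewer than k.
        chosen.update(rng.sample(pool, min(k, len(pool))))
    return sorted(chosen)
```

Because one record can carry several labels, the union may contain fewer than k times the number of classes, and a class can end up with more than k positives through samples drawn for other classes.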
[17] Full Supervised Fine-tuning Protocol. This regime utilizes the entire labeled training portion of the target dataset. The optimization schedule and the graduated freezing strategy (first 10 transformer layers of BioClinicalBERT remain fixed) are consistent with the few-shot protocol to ensure stability during cross-domain adaptation. Training is conducted ...
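The graduated freezing strategy can be expressed as a rule over parameter names. The sketch below assumes BERT-style names of the form `encoder.layer.<index>.` as produced by common BERT implementations; whether the embedding layer is also frozen is not stated in the source, so it is exposed as a flag, and `should_freeze` is a hypothetical helper.

```python
import re

# Matches "encoder.layer.<n>." at the start of a name or after a module prefix.
_LAYER = re.compile(r"(?:^|\.)encoder\.layer\.(\d+)\.")


def should_freeze(param_name, n_frozen_layers=10, freeze_embeddings=True):
    """Decide from its name whether a text-encoder parameter stays fixed."""
    match = _LAYER.search(param_name)
    if match:
        return int(match.group(1)) < n_frozen_layers
    # Assumption: embeddings follow the frozen lower layers unless the flag is off.
    return freeze_embeddings and "embeddings." in param_name
```

In a PyTorch training script, a rule like this would typically be applied by looping over the model's named parameters and setting `requires_grad = False` on the ones it selects.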
