Beyond Patient Invariance: Learning Cardiac Dynamics via Action-Conditioned JEPAs
Pith reviewed 2026-05-08 12:03 UTC · model grok-4.3
The pith
Predicting future heart electrical states after disease onset separates stable anatomy from changing pathology more effectively than invariance-based self-supervised learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Adapting joint-embedding predictive architectures to physiological time-series by conditioning on disease onsets enables models to simulate electrophysiological dynamics, with pathology acting as a transition vector on the patient's latent state. This explicitly disentangles stable anatomical features from transient pathological forces and outperforms supervised learning on triage classification, especially when labeled examples are limited.
What carries the argument
Action-conditioned JEPA that predicts future latent states of the heart given a disease onset represented as a transition vector.
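One way to write this mechanism down is sketched below in generic JEPA notation. The encoder f_θ, predictor g_φ, horizon Δ, weight λ, and regularizer 𝓡 are assumed symbols, not taken from the paper; 𝓡 stands in for whatever anti-collapse term the framework uses, which is not reproduced here.

```latex
\begin{aligned}
z_t &= f_\theta(x_t), \qquad z_{t+\Delta} = f_\theta(x_{t+\Delta}), \\
\hat{z}_{t+\Delta} &= g_\phi\left(z_t, a\right), \\
\mathcal{L}(\theta, \phi) &= \left\lVert \hat{z}_{t+\Delta} - z_{t+\Delta} \right\rVert_2^2 + \lambda \, \mathcal{R}(z_t)
\end{aligned}
```

Here x_t is the ECG at time t and a is the disease-onset action. The prediction error is measured in latent space rather than on raw voltages, which is what distinguishes a JEPA from a generative forecaster.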
If this is right
- The model achieves higher accuracy than fully supervised baselines on critical cardiac triage tasks.
- It improves AUROC by more than 0.05 over supervised learning when labeled training data are scarce.
- Modeling biological dynamics provides a denser supervision signal than static classification.
- Stable anatomical features are disentangled from dynamic pathological forces in the learned representations.
Where Pith is reading between the lines
- Event conditioning on time series could apply to other physiological signals such as continuous vital signs or respiratory recordings.
- The learned transition vectors might enable forecasting of individual patient trajectories under different disease onsets.
- Greater sample efficiency could support diagnostic development in clinical environments where labeled ECG data remains limited.
Load-bearing premise
That pathology can be represented as a transition vector acting on a patient's latent state and that the adapted model will learn to simulate disease-driven dynamics accurately from unlabeled ECG sequences.
What would settle it
If the action-conditioned model shows no improvement or lower accuracy than invariance-based self-supervised or fully supervised models on the MIMIC-IV-ECG triage task in low-resource regimes, the claimed benefit of learning dynamics would not hold.
Original abstract
Self-supervised learning in healthcare has largely relied on invariance-based objectives, which maximize similarity between different views of the same patient. While effective for static anatomy, this paradigm is fundamentally misaligned with clinical diagnosis, as it mathematically compels the model to suppress the transient pathological changes it is intended to detect. We propose a shift towards Action-Conditioned (or Event-Conditioned) World Models that learn to simulate the dynamics of disease progression. Adapting the LeJEPA framework to physiological time-series, we define pathology not as a static label, but as a transition vector acting on a patient's latent state. By predicting the future electrophysiological state of the heart given a disease onset, our model explicitly disentangles stable anatomical features from dynamic pathological forces. Evaluated on the MIMIC-IV-ECG dataset, our approach outperforms fully supervised baselines on the critical triage task. Crucially, we demonstrate superior sample efficiency: in low-resource regimes, our world model outperforms supervised learning by over 0.05 AUROC. These results suggest that modeling biological dynamics provides a dense supervision signal that is far more robust than static classification. Source code is available at https://github.com/cljosegfer/lesaude-dynamics
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper adapts the LeJEPA framework to ECG time-series from MIMIC-IV, replacing patient-invariance objectives with an action-conditioned predictor that treats pathology as a transition vector acting on latent states. By training to forecast future electrophysiological states given disease-onset actions, the model is claimed to disentangle stable anatomical features from dynamic pathological forces. On the triage task it reports outperforming fully supervised baselines, with gains exceeding 0.05 AUROC in low-resource regimes, and releases code.
Significance. If the reported gains arise from genuine simulation of disease dynamics rather than label leakage or static correlations, the work would supply a concrete alternative to invariance-based SSL for medical time-series and demonstrate improved sample efficiency on a clinically relevant task. The public code release is a positive contribution that enables direct verification.
major comments (3)
- Abstract and §3 (Action-Conditioned JEPA): the manuscript states the model is trained on 'unlabeled ECG sequences' yet conditions the predictor on 'disease onset.' The source of this action vector must be specified (e.g., whether it is derived from the same MIMIC-IV clinical codes that define the supervised triage labels). Without this, the 0.05 AUROC improvement in low-resource settings cannot be attributed to learned dynamics rather than leakage of the target supervision signal.
- §4.3 (Low-resource experiments): the headline claim that the world model 'outperforms supervised learning by over 0.05 AUROC' is load-bearing for the central thesis. The section should report the number of independent runs, standard deviations, and a statistical test; an ablation that isolates the contribution of the action-conditioning (versus a non-conditioned LeJEPA baseline) is also required to confirm that the gain is not an artifact of the particular data split or hyper-parameter choice.
- §3.1 (Latent-state transition): the definition of pathology as a 'transition vector acting on a patient's latent state' is introduced without an explicit equation showing how the action is injected into the JEPA predictor (e.g., additive conditioning, cross-attention, or concatenation). This detail is necessary to evaluate whether the architecture can in principle separate anatomy from pathology or merely memorizes correlations present in the training sequences.
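For the statistical test requested in the §4.3 comment (the rebuttal proposes five seeds and a paired Wilcoxon signed-rank test), a minimal exact version is sketched below in pure Python. The function names are hypothetical; scipy.stats.wilcoxon is the usual library implementation. The input is a list of per-seed paired differences (e.g. AUROC of the world model minus AUROC of the supervised baseline); the values used here are placeholders, not results.

```python
from itertools import product
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def exact_wilcoxon_two_sided(diffs: Sequence[float]) -> float:
    """Exact two-sided Wilcoxon signed-rank p-value for paired differences."""
    d = [x for x in diffs if x != 0]  # zero differences carry no sign and are dropped
    if not d:
        raise ValueError("all differences are zero")
    ranks = average_ranks([abs(x) for x in d])
    w_obs = sum(r for r, x in zip(ranks, d) if x > 0)  # sum of positive-signed ranks
    center = sum(ranks) / 2  # expected value of W under the null hypothesis
    dev = abs(w_obs - center)
    # Under H0 all 2**n sign patterns are equally likely; count those at least as extreme.
    extreme = sum(
        1
        for signs in product((False, True), repeat=len(d))
        if abs(sum(r for r, s in zip(ranks, signs) if s) - center) >= dev - 1e-12
    )
    return extreme / 2 ** len(d)


# Placeholder: five positive paired differences (only their signs and ranks matter).
p_all_positive = exact_wilcoxon_two_sided([1, 2, 3, 4, 5])
```

With five paired runs, the smallest achievable exact two-sided p-value is 2/2^5 = 0.0625, reached only when every seed favors the same model. A five-seed design therefore cannot reach significance at the 0.05 level with this test, which is worth keeping in mind when interpreting the planned analysis.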
minor comments (2)
- Abstract: the abstract and introduction repeatedly use 'Event-Conditioned' and 'Action-Conditioned' interchangeably; a single consistent term should be adopted.
- Figure 1: the caption should explicitly label the 'action' input and the 'future state' prediction target so that readers can map the diagram to the equations in §3.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which have identified important areas for clarification and strengthening of our empirical claims. We address each major comment point by point below and will incorporate revisions to improve the manuscript.
Point-by-point responses
- Referee: Abstract and §3 (Action-Conditioned JEPA): the manuscript states the model is trained on 'unlabeled ECG sequences' yet conditions the predictor on 'disease onset.' The source of this action vector must be specified (e.g., whether it is derived from the same MIMIC-IV clinical codes that define the supervised triage labels). Without this, the 0.05 AUROC improvement in low-resource settings cannot be attributed to learned dynamics rather than leakage of the target supervision signal.
Authors: We appreciate the referee highlighting the need for explicit specification of the action source. The disease-onset actions are derived from the clinical diagnostic codes in MIMIC-IV, which overlap with those used to construct the supervised triage labels. However, during the self-supervised pretraining phase the model receives only the raw ECG sequences and the action embeddings as conditioning; it has no access to the downstream triage label itself. The actions are temporally aligned with sequence onsets but serve exclusively as transition signals for future-state prediction. To eliminate ambiguity and address leakage concerns, we will revise §3 and the abstract to detail the exact extraction process, the temporal decoupling from evaluation labels, and an explicit statement that pretraining does not optimize for the triage task. We will also add a note clarifying that any performance gain arises from the learned dynamics rather than direct label supervision. (Revision: yes.)
- Referee: §4.3 (Low-resource experiments): the headline claim that the world model 'outperforms supervised learning by over 0.05 AUROC' is load-bearing for the central thesis. The section should report the number of independent runs, standard deviations, and a statistical test; an ablation that isolates the contribution of the action-conditioning (versus a non-conditioned LeJEPA baseline) is also required to confirm that the gain is not an artifact of the particular data split or hyper-parameter choice.
Authors: We agree that the reported gains require stronger statistical support and isolation of the action-conditioning effect. In the revised manuscript we will report AUROC results averaged over five independent runs (different random seeds for data splits and initialization), including standard deviations. We will also include a paired statistical test (Wilcoxon signed-rank) comparing the action-conditioned model against the supervised baseline. In addition, we will insert a new ablation table contrasting the full action-conditioned JEPA against a non-conditioned LeJEPA baseline trained under identical conditions, thereby confirming that the observed improvement is attributable to the conditioning mechanism rather than split or hyper-parameter artifacts. (Revision: yes.)
- Referee: §3.1 (Latent-state transition): the definition of pathology as a 'transition vector acting on a patient's latent state' is introduced without an explicit equation showing how the action is injected into the JEPA predictor (e.g., additive conditioning, cross-attention, or concatenation). This detail is necessary to evaluate whether the architecture can in principle separate anatomy from pathology or merely memorizes correlations present in the training sequences.
Authors: We thank the referee for noting the missing formalization. We will add an explicit equation in §3.1 that defines the predictor update: given latent state z_t and action embedding a, the conditioned input is formed by concatenation [z_t; a] followed by a linear projection and subsequent transformer layers. This formulation is intended to allow the model to learn additive or multiplicative modifications to the anatomical representation induced by pathology. The revised text will also discuss how this conditioning supports disentanglement of stable versus transient features, directly addressing the concern about memorization versus genuine dynamics modeling. (Revision: yes.)
Circularity Check
No significant circularity; derivation adapts external framework with independent empirical evaluation
The paper adapts the LeJEPA framework to ECG time-series by defining pathology as an action/transition vector and trains a world model to predict future states from unlabeled sequences. The central result (a 0.05 AUROC gain over supervised baselines in low-resource regimes on MIMIC-IV-ECG) is presented as an empirical outcome of this modeling choice rather than a quantity derived by construction from fitted inputs. No equations, self-citations, or uniqueness theorems are invoked that reduce the claimed disentanglement or performance to tautological inputs. The approach is evaluated against external supervised benchmarks and contains no load-bearing self-referential steps.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the LeJEPA framework can be directly adapted to physiological time-series data.
invented entities (1)
- Pathology as a transition vector on the latent state (no independent evidence).
Reference graph
Works this paper leans on
[1] Adrien Bardes, Jean Ponce, and Yann LeCun. VICReg: Variance-invariance-covariance regularization for self-supervised learning. arXiv preprint arXiv:2105.04906.
[2] Yann LeCun. A path towards autonomous machine intelligence, version 0.9.2, 2022-06-27. Open Review, 62(1):1–62, 2022.
[3] Chaoxu Ren, Le Sun, and Dandan Peng. A contrastive predictive coding-based classification framework for healthcare sensor data. Journal of Healthcare Engineering, 2022(1):5649253, 2022.
Venue: ICLR 2026, the 2nd Workshop on World Models.
[4] Nils Strodthoff, Juan Miguel Lopez Alcaraz, and Wilhelm Haverkamp. MIMIC-IV-ECG-Ext-ICD: Diagnostic labels for MIMIC-IV-ECG (version 1.0.1). PhysioNet. RRID: SCR_007345. https://doi.org/10.13026/hdyc-1h77, 2024a. Nils Strodthoff, Juan Miguel Lopez Alcaraz, and Wilhelm Haverkamp. Prospects for artificial intelligence-enha...
[5] Erik Xie, Wyatt Chang, Raquel Rodriguez Martinez, and Brandon Ballinger. JETS: A self-supervised joint embedding time series foundation model for behavioral data in healthcare. In NeurIPS 2025 Workshop on Learning from Time Series for Health, 2025.
[6] Yijun Yang, Zhao-Yang Wang, Qiuping Liu, Shuwen Sun, Kang Wang, Rama Chellappa, Zongwei Zhou, Alan Yuille, Lei Zhu, Yu-Dong Zhang, et al. Medical world model: Generative simulation of tumor evolution for treatment planning. arXiv preprint arXiv:2506.02327.
[7] Ya Zhou, Yujie Yang, Jianhuang Gan, Xiangjie Li, Jing Yuan, and Wei Zhao. Multi-scale masked autoencoder for electrocardiogram anomaly detection. arXiv preprint arXiv:2502.05494.
[8] Label Processing and Cardiac State Definition (2026). Each ECG in the dataset is annotated with ICD-10 codes, with an average of 8.15 codes per record across 15,197 unique medical conditions. To focus the world model on cardiac electrophysiology, we filtered these labels for ICD-10 Chapter IX (Diseases of the Circulatory System), identified by the prefix 'I'. Follow...
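The label-processing excerpt above describes keeping ICD-10 Chapter IX codes by their 'I' prefix, and the rebuttal says disease-onset actions are derived from diagnostic codes aligned with sequence onsets, without giving the exact rule. The sketch below assumes one plausible rule: an onset action is the multi-hot set of cardiac codes present at a later ECG but absent at an earlier ECG of the same patient. The function names, action vocabulary, and records are hypothetical and for illustration only.

```python
from typing import Iterable, List, Set


def cardiac_codes(codes: Iterable[str]) -> Set[str]:
    """Keep ICD-10 Chapter IX (circulatory system) codes, i.e. those starting with 'I'."""
    return {c for c in codes if c.startswith("I")}


def onset_action(codes_before: Iterable[str], codes_after: Iterable[str],
                 vocab: List[str]) -> List[int]:
    """Hypothetical onset action: multi-hot over `vocab` marking cardiac codes
    present at the later ECG but absent at the earlier one."""
    new_codes = cardiac_codes(codes_after) - cardiac_codes(codes_before)
    return [1 if code in new_codes else 0 for code in vocab]


# Illustrative records (ICD-10 codes written without dots); not taken from the dataset.
earlier = ["I10", "E119"]          # essential hypertension, type 2 diabetes
later = ["I10", "E119", "I4891"]   # unspecified atrial fibrillation appears
vocab = ["I10", "I214", "I4891"]   # hypothetical action vocabulary
action = onset_action(earlier, later, vocab)
```

Under this rule, the chronic code I10 present at both time points yields no action, and the non-cardiac code E119 is filtered out entirely. Only the newly appearing atrial fibrillation code is marked. Whether the paper uses set differences, first-occurrence timestamps, or another alignment is exactly the detail the referee asks the authors to specify.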