CGM-JEPA: Learning Consistent Continuous Glucose Monitor Representations via Predictive Self-Supervised Pretraining
Pith reviewed 2026-05-09 19:57 UTC · model grok-4.3
The pith
Predicting masked latent representations from CGM time series yields abstractions that transfer across modalities for detecting insulin resistance and beta-cell dysfunction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CGM-JEPA is a self-supervised pretraining framework that predicts masked latent representations rather than raw values, yielding abstractions that transfer across modalities. X-CGM-JEPA adds a masked Glucodensity cross-view objective for complementary distributional information. Pretrained on approximately 389k unlabeled CGM readings from 228 subjects and evaluated on two clinical cohorts across cohort generalization, venous-to-CGM transfer, and home CGM regimes under repeated cross-validation, X-CGM-JEPA ranks first or second on AUROC for both insulin resistance and beta-cell dysfunction endpoints in every regime while no baseline does, with gains of up to 6.5 percentage points in cohort generalization and 3.6 points in venous-to-CGM transfer.
What carries the argument
The predictive self-supervised pretraining that learns to forecast masked latent vectors from CGM sequences, augmented by cross-view prediction of Glucodensity summaries to enforce distributional consistency.
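The masked-latent objective can be made concrete in a few lines. This is a toy NumPy sketch, not the paper's architecture: the linear "encoders", mean-pooling, and latent dimension are illustrative stand-ins (the paper uses learned patch encoders and a predictor network). Only the window shape (288 five-minute readings split into 24 hourly patches of 12), the default mask ratio of 0.25, and the EMA momentum of 0.997 are taken from the reported setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes from the paper's setup; D=16 is an illustrative latent size.
P, PATCH, D = 24, 12, 16  # patches per window, samples per patch, latent dim

def encode(patches, W):
    # Toy linear "encoder" standing in for the paper's patch encoder.
    return patches @ W

def jepa_loss(window, W_ctx, W_tgt, W_pred, mask_ratio=0.25):
    """Masked-latent prediction: predict the EMA target encoder's latents
    for masked patches from the visible (context) patches."""
    patches = window.reshape(P, PATCH)
    n_mask = max(1, int(mask_ratio * P))
    masked = rng.choice(P, size=n_mask, replace=False)
    visible = np.setdiff1d(np.arange(P), masked)

    targets = encode(patches[masked], W_tgt)            # stop-grad branch
    ctx = encode(patches[visible], W_ctx).mean(axis=0)  # pooled context
    preds = np.tile(ctx @ W_pred, (n_mask, 1))          # toy predictor
    return np.mean((preds - targets) ** 2)

def ema_update(W_tgt, W_ctx, momentum=0.997):
    # Target encoder tracks the context encoder by exponential moving average.
    return momentum * W_tgt + (1.0 - momentum) * W_ctx

window = rng.normal(size=P * PATCH)  # fake 24-hour CGM window
W_ctx = rng.normal(scale=0.1, size=(PATCH, D))
W_tgt = rng.normal(scale=0.1, size=(PATCH, D))
W_pred = np.eye(D)

loss = jepa_loss(window, W_ctx, W_tgt, W_pred)
W_tgt_new = ema_update(W_tgt, W_ctx)
```

The key design choice carried over from JEPA-style methods is that the loss lives in latent space and the target branch receives no gradients, so the encoder is not rewarded for memorizing raw glucose values.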
If this is right
- Higher AUROC for insulin resistance and beta-cell dysfunction detection holds across cohort shifts and modality changes.
- Performance gaps shrink between subgroups such as different ethnicities under modality shift.
- Label-aware clustering improves when the distributional Glucodensity view is added to sparse in-domain data.
- Consistent top-two ranking occurs in all three evaluation regimes where single-view baselines fluctuate.
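The Glucodensity view referenced above is a distributional summary of a subject's glucose readings. A minimal sketch with a hand-rolled Gaussian kernel density estimate; the mg/dL grid range and the bandwidth below are illustrative assumptions, not the paper's choices.

```python
import numpy as np

def glucodensity(glucose, grid=None, bandwidth=5.0):
    """Gaussian-KDE estimate of a subject's glucose distribution
    ("Glucodensity"), evaluated on a common mg/dL grid and normalized
    to integrate to 1. Grid and bandwidth are illustrative defaults."""
    if grid is None:
        grid = np.linspace(40, 300, 261)  # mg/dL, 1 mg/dL spacing
    diffs = (grid[:, None] - glucose[None, :]) / bandwidth
    dens = np.exp(-0.5 * diffs**2).sum(axis=1)   # unnormalized KDE
    dens /= dens.sum() * (grid[1] - grid[0])     # normalize the density
    return grid, dens

rng = np.random.default_rng(1)
readings = rng.normal(110, 20, size=288)  # one synthetic day of CGM, mg/dL
grid, dens = glucodensity(readings)
```

Because every subject's density lives on the same grid, the resulting vectors are directly comparable across subjects, which is what makes them usable as a cross-view prediction target.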
Where Pith is reading between the lines
- The same masked-latent prediction approach could be applied to other multi-view physiological signals, such as continuous heart-rate or activity traces, to probe its generality.
- Scaling the unlabeled pretraining corpus beyond 228 subjects would likely strengthen the abstractions for rarer subphenotypes.
- Real-world deployment might reduce reliance on matched venous sampling for initial screening by using only home CGM streams.
Load-bearing premise
Pretraining to predict masked latent representations on unlabeled CGM time series produces abstractions that reliably transfer to clinical cohorts under modality shifts and capture clinically relevant structure for insulin resistance and beta-cell dysfunction detection.
What would settle it
A supervised baseline trained directly on the target clinical cohort data achieving strictly higher AUROC than X-CGM-JEPA across the venous-to-CGM transfer and cohort generalization regimes on the same evaluation splits would falsify the claimed transfer advantage.
Original abstract
Continuous Glucose Monitoring (CGM) can detect early metabolic subphenotypes (insulin resistance, IR; $\beta$-cell dysfunction), but population-scale deployment faces two coupled problems. First, the same physiological state appears through multiple views (CGM time series, venous OGTT, Glucodensity summaries), so single-view representations fail to transfer when deployment shifts the modality or setting. Second, baselines perform inconsistently across these shifts. Both problems point to one remedy: representations that abstract away from any single view to capture higher-level temporal and distributional structure. We propose CGM-JEPA, a self-supervised pretraining framework which predicts masked latent representations rather than raw values, yielding abstraction that transfers across modalities. X-CGM-JEPA adds a masked Glucodensity cross-view objective for complementary distributional information. We pretrain on $\sim$389k unlabeled CGM readings from 228 subjects and evaluate on two clinical cohorts ($N=27$ and $N=17$ public-release subsets) across three regimes (cohort generalization, venous-to-CGM transfer, home CGM) under 20-iteration $\times$ 2-fold cross-validation. X-CGM-JEPA ranks first or second on AUROC for both endpoints across all three regimes while no baseline does, exceeding the strongest baseline by up to $+6.5$ pp in cohort generalization and $+3.6$ pp in venous-to-CGM transfer (paired Wilcoxon, $p<0.001$). Under modality shift, it matches mean AUROC while redistributing toward weaker subgroups (ethnicity AUROC gap shrinks 25-54%); on sparse in-domain venous data, the distributional view lifts label-aware clustering (ARI $+39\%$, NMI $+40\%$). Code and weights: https://github.com/cruiseresearchgroup/CGM-JEPA
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes CGM-JEPA, a predictive self-supervised pretraining framework for CGM time series that learns representations by predicting masked latent representations (rather than raw values) to achieve abstraction and cross-modality transfer. X-CGM-JEPA extends this with a masked Glucodensity cross-view objective. The model is pretrained on ~389k unlabeled CGM readings from 228 subjects and evaluated on two small public clinical cohorts (N=27 and N=17) across three regimes (cohort generalization, venous-to-CGM transfer, home CGM) using 20-iteration 2-fold cross-validation. It claims to rank first or second on AUROC for both IR and β-cell dysfunction endpoints in all regimes, outperforming the strongest baseline by up to +6.5 pp (cohort generalization) and +3.6 pp (venous-to-CGM transfer) with paired Wilcoxon p<0.001, while also showing benefits in subgroup equity and clustering under modality shift.
Significance. If the empirical superiority holds under more robust validation, the work would advance self-supervised representation learning for physiological time series by demonstrating that latent-prediction objectives can produce abstractions robust to modality shifts (CGM vs. venous OGTT vs. Glucodensity). This has potential clinical value for scalable metabolic subphenotyping. The public release of code and pretrained weights is a clear strength that supports reproducibility and extension.
major comments (2)
- [Evaluation on clinical cohorts (Section 4 / results tables)] The headline claim that X-CGM-JEPA is the only method to rank first or second across all regimes, with gains up to +6.5 pp and +3.6 pp (paired Wilcoxon p<0.001), rests on AUROC estimates from 20-iteration × 2-fold CV on N=27 and N=17 cohorts. This produces test folds of only ~13–14 subjects; AUROC variance is high at this scale, and the Wilcoxon test over the 20 runs does not establish that the observed ranking or effect size would replicate on new draws from the same population. This directly weakens the inference that the JEPA pretraining yields abstractions that reliably capture IR and β-cell structure under modality shift.
- [Experimental setup (Section 3.3 / 4.1)] No details are provided on baseline implementations (e.g., exact architectures, training procedures), the hyperparameter search protocol, or explicit checks for data leakage between the pretraining cohort (228 subjects) and the two evaluation cohorts. These omissions are load-bearing because the reported gains cannot be confidently attributed to the proposed latent-prediction objective versus implementation artifacts or cohort overlap.
minor comments (2)
- [Abstract] The abstract is information-dense; the distinction between CGM-JEPA and X-CGM-JEPA contributions could be stated more explicitly in the first paragraph.
- [Notation throughout] Notation: the manuscript alternates between 'CGM-JEPA' and 'X-CGM-JEPA' without a clear table or section defining which variant is used in each reported result.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, with honest acknowledgment of limitations where appropriate, and propose specific revisions to strengthen the manuscript.
Point-by-point responses
Referee: [Evaluation on clinical cohorts (Section 4 / results tables)] The headline claim that X-CGM-JEPA is the only method to rank first or second across all regimes, with gains up to +6.5 pp and +3.6 pp (paired Wilcoxon p<0.001), rests on AUROC estimates from 20-iteration × 2-fold CV on N=27 and N=17 cohorts. This produces test folds of only ~13–14 subjects; AUROC variance is high at this scale, and the Wilcoxon test over the 20 runs does not establish that the observed ranking or effect size would replicate on new draws from the same population. This directly weakens the inference that the JEPA pretraining yields abstractions that reliably capture IR and β-cell structure under modality shift.
Authors: We agree that the small cohort sizes (N=27 and N=17) inherently limit statistical power and that AUROC estimates exhibit high variance at this scale; the paired Wilcoxon test on CV iterations compares methods on identical splits but does not prove replicability on new draws from the population. The 20-iteration 2-fold CV was selected as a practical approach for small-sample clinical data to enable stable ranking and paired statistical testing while using all available labels. The consistent first- or second-place ranking of X-CGM-JEPA across three distinct regimes and two endpoints provides convergent evidence supporting the value of latent-prediction pretraining for modality-robust representations. In the revised manuscript we will add a dedicated limitations paragraph in the Discussion section that explicitly notes the small sample sizes, high AUROC variance, and the need for external validation on larger independent cohorts. We will also augment the results with bootstrap-derived 95% confidence intervals on AUROC differences (resampled over the 20 CV iterations) to better quantify uncertainty. This constitutes a partial revision, as we cannot expand the underlying clinical datasets but can substantially improve transparency and caveats around the claims. revision: partial
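The bootstrap the authors propose can be sketched directly: resample the 20 per-iteration paired AUROC differences with replacement and take percentile bounds on the resampled means. All numbers below are synthetic, and `bootstrap_ci_diff` is a hypothetical helper, not code from the paper.

```python
import numpy as np

def bootstrap_ci_diff(auroc_a, auroc_b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired AUROC difference,
    resampling CV iterations with replacement."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(auroc_a) - np.asarray(auroc_b)  # paired per iteration
    boot = rng.choice(diffs, size=(n_boot, len(diffs)), replace=True)
    boot_means = boot.mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return diffs.mean(), (lo, hi)

# Hypothetical per-iteration AUROCs for two methods over 20 CV iterations.
rng = np.random.default_rng(42)
model_auc = 0.80 + 0.03 * rng.standard_normal(20)
baseline_auc = 0.75 + 0.03 * rng.standard_normal(20)
mean_diff, (lo, hi) = bootstrap_ci_diff(model_auc, baseline_auc)
```

A CI excluding zero quantifies uncertainty over the CV iterations, but, as the response concedes, it does not by itself establish replicability on new cohorts.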
Referee: [Experimental setup (Section 3.3 / 4.1)] No details are provided on baseline implementations (e.g., exact architectures, training procedures), the hyperparameter search protocol, or explicit checks for data leakage between the pretraining cohort (228 subjects) and the two evaluation cohorts. These omissions are load-bearing because the reported gains cannot be confidently attributed to the proposed latent-prediction objective versus implementation artifacts or cohort overlap.
Authors: We thank the referee for highlighting these transparency gaps. In the revised manuscript we will expand Section 3.3 (with additional details moved to the supplement) to provide complete specifications for every baseline: exact architectures (layer counts, hidden dimensions, attention heads for Transformer variants, etc.), training procedures (optimizer, learning-rate schedules, batch sizes, early-stopping criteria), and the hyperparameter search protocol (grid or random search ranges, validation split used for selection, and final chosen values). For data leakage, the pretraining cohort comprises 228 subjects from a private internal collection, while the two evaluation cohorts are distinct public-release subsets from separate clinical studies; subject identifiers, demographics, and collection dates confirm zero overlap. We will add an explicit statement of disjointness plus a cohort-characteristics table in Section 4.1. These additions will enable full reproduction and allow readers to attribute performance differences to the JEPA objectives rather than implementation details. revision: yes
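The promised disjointness statement reduces to an ID-overlap check between the pretraining subjects and each evaluation cohort. A sketch with hypothetical subject identifiers; the real cohorts' ID schemes are not given here.

```python
def check_disjoint(pretrain_ids, *eval_cohorts):
    """Return any subject IDs shared between the pretraining set and each
    evaluation cohort; an empty dict means no overlap (no leakage by ID)."""
    pre = set(pretrain_ids)
    overlaps = {i: sorted(pre & set(c)) for i, c in enumerate(eval_cohorts)}
    return {k: v for k, v in overlaps.items() if v}

# Toy IDs matching the reported cohort sizes (illustrative only).
pretrain = [f"P{i:03d}" for i in range(228)]
cohort_a = [f"A{i:02d}" for i in range(27)]
cohort_b = [f"B{i:02d}" for i in range(17)]
leaks = check_disjoint(pretrain, cohort_a, cohort_b)
```

ID disjointness is necessary but not sufficient: the cohort-characteristics table the authors promise is also needed to rule out the same individuals appearing under different identifiers.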
Circularity Check
No significant circularity in the derivation chain
full rationale
The paper's central claims are empirical AUROC rankings and improvements obtained by pretraining a self-supervised model (predicting masked latent representations, plus a Glucodensity cross-view objective) on unlabeled CGM data from 228 subjects and then evaluating on separate held-out clinical cohorts (N=27 and N=17) under 20-iteration 2-fold CV. Nothing in the derivation chain reduces the reported performance gains to quantities defined by the inputs by construction: no self-definitional equations, fitted-input parameters, or self-citations do that work. The method follows standard predictive self-supervised learning without self-definitional, fitted-input, or uniqueness-imported circularity; this is an ordinary, non-circular empirical ML evaluation.
Axiom & Free-Parameter Ledger
free parameters (1)
- masking ratio and other SSL hyperparameters
axioms (1)
- domain assumption: Predicting masked latent representations abstracts away from single-view surface statistics to capture transferable temporal and distributional structure
Pipeline notes
Pretraining and evaluation details extracted from the paper's appendix.
Pretraining pipeline
1. Load the pooled pretraining CSV (cgm_initial_cohort.csv; 22 Stanford + 206 Colas subjects, ≈389k rows at 5-minute cadence).
2. Slice each subject's stream into 24-hour windows of length 288.
3. Tokenize each window into P = 24 non-overlapping hourly patches of size 12.
4. (X-CGM-JEPA only) Look up the precomputed Glucodensity tensor for the same window from the pickle cache (Section M.5.2); no live KDE is run.
5. Mask a random subset of patches at the configured mask ratio (default 0.25, varied in the ablation), splitting the window into context (visible) and target (masked) sets.
6. Forward: the context encoder f_θ encodes the visible patches; the EMA target encoder f̄_θ encodes the full window and provides the target latents at the masked positions; the predictor p_φ predicts those target latents from the context. For X-CGM-JEPA, an auxiliary cross-view predictor q_ω predicts the masked Glucodensity embedding u = g_ψ(D^(t)) from the same context.
7. Backward: the total loss is L_CGM + λ·L_GD (with L_GD = 0 for vanilla CGM-JEPA); an Adam step updates θ, φ, ω, ψ, and f̄_θ is updated by an EMA of θ at momentum 0.997. Gradients are stopped on the CGM target branch and flow into g_ψ.
8. Checkpoint: after 100 epochs, the encoder f_θ is logged as a versioned wandb artifact together with all metadata (mask ratio, λ, seed), so it is fully reproducible at downstream time.
Downstream pipeline (linear probe)
1. Resolve a pretrained encoder by its wandb artifact version; architecture and tokenizer dimensions are read from the artifact metadata, so the loader can reconstruct the model without a separate checkpoint config file.
2. Load the labeled subject sets from train_split.json (initial cohort, n = 27) and validation_split.json (validation cohort, n = 17); each subject's daily window is already on the 5-minute grid (Section M.4.3).
3. Freeze f_θ; route each labeled window through the same patchify → encode pipeline used at pretraining time, then mean-pool the patch tokens into a fixed-dimensional subject embedding z ∈ R^d (d = 96 for the encoders in this paper).
4. Probe: fit an ℓ2-regularized logistic-regression probe (LogisticRegressionCV) with class-balanced weighting and inner 2-fold CV over the regularization grid C ∈ {10^-3, 10^-2, 10^-1, 1, 10, 100}, scored by AUROC. The outer 2-fold cross-validation is repeated for 20 random seeds, giving 20 × 2 = 40 paired test folds per (encoder, task) cell.
5. Evaluate: report mean AUROC, PR-AUC, and F1 across folds and use a paired bootstrap test for headline comparisons. Optional probes (Linear SVC, Ridge, Random Forest, kNN) follow the same outer protocol. The evaluation pipeline is parameterized by an (extract_method, val_extract_method) pair: in-clinic venous (ctru_venous → ctru_venous) trains and tests on the O…
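The outer 20-repeat × 2-fold protocol can be sketched on synthetic embeddings. To keep the sketch dependency-free, the paper's LogisticRegressionCV probe is replaced by a toy nearest-centroid score; the subject count, embedding dimension, and class balance below are illustrative assumptions.

```python
import numpy as np

def auroc(scores, labels):
    # Mann-Whitney statistic: P(positive score > negative score), ties as 1/2.
    pos, neg = scores[labels == 1], scores[labels == 0]
    gt = (pos[:, None] > neg[None, :]).mean()
    eq = (pos[:, None] == neg[None, :]).mean()
    return gt + 0.5 * eq

def repeated_2fold(Z, y, n_repeats=20, seed=0):
    """20-repeat x 2-fold outer protocol on frozen embeddings Z, with a
    nearest-centroid probe standing in for the paper's logistic probe."""
    rng = np.random.default_rng(seed)
    n, scores = len(y), []
    for _ in range(n_repeats):
        idx = rng.permutation(n)
        fold_a, fold_b = idx[: n // 2], idx[n // 2:]
        for test, train in ((fold_a, fold_b), (fold_b, fold_a)):
            mu1 = Z[train][y[train] == 1].mean(axis=0)
            mu0 = Z[train][y[train] == 0].mean(axis=0)
            s = Z[test] @ (mu1 - mu0)  # project onto class-mean difference
            scores.append(auroc(s, y[test]))
    return float(np.mean(scores)), float(np.std(scores))

# Synthetic frozen embeddings for 44 hypothetical subjects, 8-dimensional.
rng = np.random.default_rng(7)
y = np.tile([0, 1], 22)
Z = rng.normal(size=(44, 8))
Z[y == 1] += 1.0  # inject separability so the sketch has signal
mean_auc, std_auc = repeated_2fold(Z, y)
```

The 40 paired fold scores this produces are exactly the unit over which the paper's paired statistical comparisons (and the rebuttal's bootstrap) operate.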