
arxiv: 2605.21301 · v1 · pith:E5J6NGN3new · submitted 2026-05-20 · 💻 cs.LG · cs.CV

Automatic Discovery of Disease Subgroups by Contrasting with Healthy Controls

Pith reviewed 2026-05-21 06:02 UTC · model grok-4.3

classification 💻 cs.LG cs.CV
keywords subgroup discovery · contrastive learning · medical imaging · patient stratification · deep learning · EM optimization · disease clustering · healthy controls

The pith

Contrasting patients with healthy controls lets a deep model isolate subgroups driven only by disease factors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops Deep UCSL, a contrastive method for discovering subgroups within a patient group by explicitly comparing patients to healthy control subjects. It starts from the assumption that controls and patients share normal sources of variation that are irrelevant to disease, so suppressing those shared factors should leave clusters shaped by pathology. A deep network learns a feature space while an expectation-maximization procedure alternates between inferring subgroup assignments and updating the encoder; a regularization term keeps the features focused on disease-specific signals. The authors report that the resulting subgroups are quantitatively better than those from earlier methods on an MNIST toy case and four real medical-imaging datasets.

Core claim

Assuming that healthy subjects share common but irrelevant factors of variation with the patients, we motivate and develop a Contrastive Subgroup Discovery method, entitled Deep UCSL. By contrasting patients with controls, Deep UCSL identifies subgroups driven solely by pathological factors, ignoring common variability shared with healthy subjects. Our framework employs a deep feature extractor to learn a discriminative representation space. Mathematically, we derive a novel loss based on the conditional joint likelihood of latent clusters and patient/control labels, optimized via an Expectation-Maximization strategy alternating between subgroup inference and feature encoder updates. A regularization term further encourages representations to capture disease-specific variability while ignoring variability shared with controls.

What carries the argument

Deep UCSL contrastive framework that uses a deep feature extractor and an EM-optimized loss on the joint likelihood of latent clusters and patient/control labels, plus regularization to suppress shared variability.
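
To make the alternation concrete, here is a minimal, simplified sketch of an EM-style loop on 1-D toy data. It is not the paper's loss or architecture: the deep encoder is replaced by one logistic classifier per subgroup, the cluster prior is assumed uniform, controls are given full weight in every classifier, and no regularization term is included. All names (e_step, m_step, fit) and all data values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def e_step(patients, experts):
    """Subgroup inference: soft assignment of each patient to a subgroup.

    With a uniform cluster prior (an assumption of this sketch), the
    responsibility of subgroup k is proportional to how strongly classifier k
    labels the sample as a patient.
    """
    resp = []
    for x in patients:
        scores = [sigmoid(w * x + b) for (w, b) in experts]
        total = sum(scores)
        resp.append([s / total for s in scores])
    return resp


def m_step(patients, controls, resp, experts, lr=0.1, steps=50):
    """Parameter update: weighted logistic-regression gradient steps."""
    updated = []
    for k, (w, b) in enumerate(experts):
        for _ in range(steps):
            gw = gb = 0.0
            # Patients (label 1) count in proportion to their responsibility.
            for x, r in zip(patients, resp):
                err = sigmoid(w * x + b) - 1.0
                gw += r[k] * err * x
                gb += r[k] * err
            # Controls (label 0) are contrasted against every subgroup.
            for x in controls:
                err = sigmoid(w * x + b)
                gw += err * x
                gb += err
            w -= lr * gw
            b -= lr * gb
        updated.append((w, b))
    return updated


def fit(patients, controls, experts, n_iter=10):
    """Alternate subgroup inference and classifier updates."""
    for _ in range(n_iter):
        resp = e_step(patients, experts)
        experts = m_step(patients, controls, resp, experts)
    return e_step(patients, experts), experts


# Hypothetical toy data: two patient subgroups on either side of the controls.
patients = [-3.2, -2.9, -3.1, 3.0, 2.8, 3.3]
controls = [-0.3, 0.1, 0.2, -0.1]
resp, experts = fit(patients, controls, experts=[(-1.0, 0.0), (1.0, 0.0)])
for x, r in zip(patients, resp):
    print(x, [round(v, 3) for v in r])
```

A single linear classifier cannot separate these patients from the controls, because patients sit on both sides of them. Splitting the patients into two softly assigned subgroups, each with its own patient-versus-control boundary, is what lets the contrast with controls shape the clusters in this toy setting.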

If this is right

  • Subgroups become driven only by pathological factors rather than by variations also present in healthy subjects.
  • Quantitative measures of subgroup quality improve on both the MNIST example and four distinct medical imaging datasets.
  • The learned representations emphasize disease-specific signals while down-weighting common healthy variation.
  • The same EM alternation between cluster assignment and encoder update can be reused on new patient-control cohorts.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The contrastive principle could be tested on non-imaging modalities such as genomic or electronic health record data to check whether the same suppression of shared variation improves clustering.
  • If the method succeeds, downstream tasks like treatment-response prediction might benefit from using the discovered subgroups as strata.
  • Direct comparison with other contrastive or domain-adaptation techniques would clarify whether the specific EM-plus-regularization combination is necessary for the reported gains.

Load-bearing premise

Healthy subjects share common but irrelevant factors of variation with the patients that can be safely ignored or suppressed to isolate purely pathological drivers of subgroups.

What would settle it

If subgroup homogeneity or interpretability shows no gain over non-contrastive baselines on a held-out medical imaging dataset, or if controls do not exhibit the assumed shared variability, the central claim would not hold.

Original abstract

In biomedical Subgroup Discovery, practitioners are interested in discovering interpretable and homogeneous subgroups within a group of patients. In this paper, assuming that healthy subjects (i.e., controls) share common but irrelevant factors of variation with the patients, we motivate and develop a Contrastive Subgroup Discovery method, entitled Deep UCSL. By contrasting patients with controls, Deep UCSL identifies subgroups driven solely by pathological factors, ignoring common variability shared with healthy subjects. Our framework employs a deep feature extractor to learn a discriminative representation space. Mathematically, we derive a novel loss based on the conditional joint likelihood of latent clusters and patient/control labels, optimized via an Expectation-Maximization strategy alternating between subgroup inference and feature encoder updates. A regularization term further encourages representations to capture disease-specific variability while ignoring variability shared with controls. Compared to previous related works, our approach quantitatively improves the quality of the estimated subgroups, as demonstrated on a MNIST example and four distinct real medical imaging datasets. Code and datasets are available at: https://github.com/rlouiset/deep_ucsl.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Deep UCSL, a contrastive subgroup discovery method that contrasts patients with healthy controls to identify subgroups driven solely by pathological factors. It employs a deep feature extractor, derives a loss from the conditional joint likelihood of latent clusters and patient/control labels, optimizes via EM alternation between subgroup inference and encoder updates, and adds a regularization term to suppress shared variability. Quantitative improvements in subgroup quality are reported on an MNIST example and four real medical imaging datasets, with code and data released.

Significance. If the core assumption holds and the empirical gains are robust, the work offers a principled way to isolate pathology-specific structure in biomedical data, which could improve interpretability of patient subgroups. The self-contained loss derivation, EM procedure, and public code release are clear strengths that support reproducibility. The approach extends contrastive ideas to subgroup discovery but its broader impact depends on how well the control contrast generalizes when non-pathological factors differ in distribution between groups.

major comments (2)
  1. [Method and Experiments] The central claim that subgroups are 'driven solely by pathological factors' (Abstract) rests on the assumption that controls capture all shared non-pathological variability. No analysis or experiment is presented showing that the learned representations are uncorrelated with known non-disease covariates such as age, sex, or scanner after training; without this check the regularization term alone does not guarantee isolation of pathological drivers.
  2. [Abstract and Results] Quantitative improvements are asserted on MNIST and four medical datasets, yet no exact metrics, baseline methods, statistical tests, data splits, or cross-validation details are supplied. This absence is load-bearing for the claim of improvement over prior work and prevents verification that post-hoc choices did not inflate performance.
minor comments (2)
  1. [Method] Clarify the precise form of the regularization term and its weighting hyper-parameter in the loss; the current description leaves open how strongly it enforces the desired separation versus the likelihood term.
  2. [Experiments] Figure captions and axis labels in the MNIST and medical-dataset visualizations should explicitly state what quantity is plotted (e.g., t-SNE of the learned representation colored by inferred subgroup).
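
The covariate check requested in major comment 1 could look roughly like the following sketch, which correlates each latent dimension with one non-disease covariate. It is an illustration only: the function names, the toy latents, and the age values are hypothetical, and a real analysis would also need categorical covariates (sex, scanner) and multiple-comparison handling.

```python
import math


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def covariate_report(latents, covariate):
    """Return (dimension, pearson, spearman) for every latent dimension."""
    report = []
    for d in range(len(latents[0])):
        column = [row[d] for row in latents]
        report.append((d, pearson(column, covariate), spearman(column, covariate)))
    return report


# Hypothetical data: five subjects, two latent dimensions, age in years.
latents = [[0.1, 2.0], [0.2, -1.0], [0.4, 0.5], [0.8, -0.5], [1.6, 1.0]]
age = [20, 30, 40, 50, 60]
for dim, r_p, r_s in covariate_report(latents, age):
    print(f"dim {dim}: pearson={r_p:.3f} spearman={r_s:.3f}")
```

Large absolute correlations on some dimension would suggest that the representation still encodes the covariate, which is the situation the referee asks the authors to rule out.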

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

  1. Referee: [Method and Experiments] The central claim that subgroups are 'driven solely by pathological factors' (Abstract) rests on the assumption that controls capture all shared non-pathological variability. No analysis or experiment is presented showing that the learned representations are uncorrelated with known non-disease covariates such as age, sex, or scanner after training; without this check the regularization term alone does not guarantee isolation of pathological drivers.

    Authors: We agree that an explicit check for correlation between the learned representations and known non-disease covariates would provide stronger empirical support for the claim that pathological factors are isolated. In the revised manuscript we will add a new analysis subsection in the Experiments section. For the medical imaging datasets that include age and sex metadata, we will report Pearson and Spearman correlations between the final representations and these covariates, as well as any available scanner information. Where such metadata are unavailable we will explicitly note the limitation and discuss how the regularization term is intended to mitigate shared variability. These additions will be accompanied by the corresponding code updates in the public repository. revision: yes

  2. Referee: [Abstract and Results] Quantitative improvements are asserted on MNIST and four medical datasets, yet no exact metrics, baseline methods, statistical tests, data splits, or cross-validation details are supplied. This absence is load-bearing for the claim of improvement over prior work and prevents verification that post-hoc choices did not inflate performance.

    Authors: We acknowledge that the current presentation of results lacks sufficient detail for full reproducibility and independent verification. In the revised manuscript we will expand both the Abstract and the Results section. The Abstract will be updated to report the primary quantitative metrics (e.g., adjusted Rand index, normalized mutual information) and the key baselines. A new subsection will provide: (i) exact definitions of all metrics, (ii) the complete list of baseline methods with references, (iii) statistical tests performed together with p-values and correction method, (iv) precise train/validation/test split ratios and any stratification used, and (v) the cross-validation procedure (including number of folds and random seeds). All experimental details will be cross-referenced to the released code. revision: yes
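
For readers unfamiliar with the adjusted Rand index named in the response above, the following sketch computes it from two labelings using the standard pair-counting formula. The function name is hypothetical and the code is not taken from the authors' repository; it only shows what such a metric measures.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same samples")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Count sample pairs that fall in the same cell / same cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / comb(n, 2)   # agreement expected by chance
    max_index = 0.5 * (sum_a + sum_b)       # agreement of a perfect match
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Cluster identities are arbitrary, so a relabeled copy still scores 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# A partial match scores strictly between 0 and 1.
print(adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1]))
```

Because the index is invariant to how clusters are numbered, it suits subgroup discovery, where inferred subgroup labels carry no fixed meaning.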

Circularity Check

0 steps flagged

Derivation from conditional joint likelihood is self-contained


The paper derives its loss directly from the conditional joint likelihood of latent clusters and observed patient/control labels, then applies standard EM alternation for optimization. This follows conventional probabilistic clustering with side information and does not reduce any claimed result to the inputs by construction. The added regularization term is explicitly motivated to promote disease-specific variability rather than being a fitted quantity renamed as a prediction. No self-citation chains, uniqueness theorems imported from prior author work, or ansatzes smuggled via citation appear as load-bearing steps in the provided description. The central claim therefore rests on the modeling assumptions and derivation rather than tautology or statistical forcing.
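
As a reading aid, one standard way to write EM for a latent cluster variable c, given an input x and a patient/control label y, is sketched here. The symbols q, θ and K are notation assumed for illustration; the paper's actual loss, its factorization of the joint likelihood, and its regularization term may differ.

```latex
\begin{aligned}
\log p_\theta(y \mid x)
  &= \log \sum_{c=1}^{K} p_\theta(c, y \mid x) \\
  &\ge \sum_{c=1}^{K} q(c) \, \log \frac{p_\theta(c, y \mid x)}{q(c)}
  \qquad \text{(Jensen's inequality)} \\[4pt]
\text{E-step:} \quad q(c) &\leftarrow p_\theta(c \mid x, y)
  = \frac{p_\theta(c, y \mid x)}{\sum_{c'=1}^{K} p_\theta(c', y \mid x)} \\[4pt]
\text{M-step:} \quad \theta &\leftarrow \arg\max_{\theta} \sum_{c=1}^{K} q(c) \, \log p_\theta(c, y \mid x)
\end{aligned}
```

The E-step makes the bound tight at the current parameters and the M-step can only raise it, which is the usual reason an EM alternation does not decrease the log-likelihood.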

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that controls provide a clean contrast for irrelevant variation and on several implementation choices whose values are not detailed in the abstract.

free parameters (2)
  • regularization weight
    Controls the emphasis on disease-specific variability versus shared factors; value not specified in abstract.
  • number of subgroups
    Determines the granularity of discovered clusters; chosen or inferred but not detailed.
axioms (1)
  • domain assumption: healthy subjects share common but irrelevant factors of variation with the patients
    Explicitly stated as the motivation for the contrastive approach in the abstract.

pith-pipeline@v0.9.0 · 5726 in / 1147 out tokens · 44114 ms · 2026-05-21T06:02:37.529861+00:00 · methodology

