arXiv: 2506.22901 · v2 · pith:M6KNH3T5new · submitted 2025-06-28 · 💻 cs.LG · cs.AI · q-bio.BM · q-bio.GN

Missing-Modality-Aware Graph Neural Network for Cancer Classification

Pith reviewed 2026-05-21 23:41 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · q-bio.BM · q-bio.GN
keywords missing modalities · graph neural network · multiomics data · cancer classification · multimodal fusion · attention mechanism · patient graph

The pith

A graph neural network connecting patients by shared missing-modality patterns improves cancer classification from incomplete multiomics data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method to handle missing modalities in multimodal biological data without imputing values or dropping patients. It first fuses available modality embeddings through an attention mechanism that weighs each modality according to its presence and contribution for that patient. The fused representations then become node features in a patient graph whose edges reflect similarities in missingness patterns across modalities. A graph neural network processes this graph to produce the final classification. This design keeps fusion cost linear in the number of modalities and is shown to outperform prior fusion techniques on three real-world multiomics cancer datasets.

Core claim

MAGNET fuses lower-dimensional modality embeddings with a dynamic patient-modality multi-head attention mechanism whose complexity grows linearly with the number of modalities while adapting to each patient's missing-pattern variability. It then constructs a patient graph whose nodes carry these fused embeddings and whose connectivity is set by modality missingness, after which a graph neural network generates the predictions. On three public multiomics datasets containing real-world missingness, this approach outperforms state-of-the-art fusion methods for cancer classification.
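
The masked fusion step described above can be sketched for a single patient and a single attention head. This is a minimal illustration, not the authors' implementation: the function name `fuse_with_mask` and the raw `scores` input are hypothetical, and in a trained model the scores would come from learned projections rather than being passed in directly.

```python
import math

def fuse_with_mask(embeddings, scores, mask):
    """Fuse one patient's modality embeddings with masked softmax weights.

    embeddings: list of M equal-length vectors, one per modality
                (rows for missing modalities may hold any placeholder).
    scores:     list of M raw attention scores (hypothetical input).
    mask:       list of M ints, 1 if the modality is observed, 0 if missing.
    """
    if not any(mask):
        raise ValueError("patient has no observed modality")
    # Softmax restricted to observed modalities; missing ones get weight 0.
    top = max(s for s, m in zip(scores, mask) if m)
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum in a single pass over the M modalities (linear in M).
    fused = [0.0] * len(embeddings[0])
    for w, emb in zip(weights, embeddings):
        if w == 0.0:
            continue  # skip placeholders of missing modalities entirely
        for k, value in enumerate(emb):
            fused[k] += w * value
    return weights, fused

# Third modality is missing, so its large score and placeholder are ignored.
weights, fused = fuse_with_mask(
    embeddings=[[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]],
    scores=[0.0, 0.0, 5.0],
    mask=[1, 1, 0],
)
print(weights, fused)
```

Because the mask is applied before normalisation, the weights of the observed modalities still sum to one, and no separate model is needed for each missingness pattern.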

What carries the argument

The patient graph whose edges are determined by modality missingness patterns, allowing the subsequent graph neural network to propagate information among patients who share similar data-availability profiles.
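
To make the idea concrete, the sketch below builds a patient graph from availability masks and runs one round of neighbour averaging. The summary does not state the paper's exact edge rule or GNN variant, so both choices here are assumptions made for illustration: edges join patients whose masks have Jaccard similarity at or above a hypothetical `threshold`, and propagation is plain mean aggregation.

```python
def build_missingness_graph(masks, threshold=0.5):
    """Connect patients whose availability masks are similar.

    masks: list of per-patient lists, 1 = modality observed, 0 = missing.
    Assumed rule: add an edge when the Jaccard similarity of two masks
    is at least `threshold`.
    """
    neighbors = {i: [] for i in range(len(masks))}
    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            shared = sum(a & b for a, b in zip(masks[i], masks[j]))
            union = sum(a | b for a, b in zip(masks[i], masks[j]))
            similarity = shared / union if union else 0.0
            if similarity >= threshold:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors

def mean_aggregate(features, neighbors):
    """One propagation round: average each node with its neighbours."""
    updated = []
    for i, feat in enumerate(features):
        group = [feat] + [features[j] for j in neighbors[i]]
        updated.append([sum(col) / len(group) for col in zip(*group)])
    return updated

masks = [[1, 1, 1], [1, 1, 0], [0, 0, 1]]
features = [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]]  # stand-ins for fused embeddings
graph = build_missingness_graph(masks)
print(graph)
print(mean_aggregate(features, graph))
```

In this toy case patients 0 and 1 share two of three modalities and become neighbours, while patient 2 stays isolated and keeps its own features.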

If this is right

  • All patients with partial modalities can be retained rather than excluded or imputed.
  • Fusion cost remains linear rather than combinatorial as the number of modalities grows.
  • The model adapts automatically to different missing-pattern distributions without retraining for each pattern.
  • Predictions improve by exploiting shared missingness as an additional source of structure in the data.
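
The contrast between linear and combinatorial growth can be checked with a short count. With M modalities there are 2^M - 1 availability patterns that keep at least one modality, whereas a masked fusion needs only one pass over M modalities. The helper name `availability_patterns` is illustrative.

```python
from itertools import product

def availability_patterns(num_modalities):
    """All 0/1 availability patterns with at least one observed modality."""
    return [p for p in product((0, 1), repeat=num_modalities) if any(p)]

for m in (3, 5, 10):
    # Number of distinct patterns versus number of modalities.
    print(m, len(availability_patterns(m)))
# prints:
# 3 7
# 5 31
# 10 1023
```

For three modalities the seven patterns correspond to zero, one, or two modalities missing (1 + 3 + 3), which is the setting pictured in Figure 1.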

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same graph-construction idea could be applied to other clinical tasks such as survival prediction where missingness may also correlate with outcomes.
  • Missingness patterns themselves may serve as a weak but useful proxy for unmeasured patient factors such as disease severity or care access.
  • Future experiments could replace the missingness-based edges with edges derived from additional metadata to test whether the current signal is the strongest available.
  • The linear scaling property suggests the method remains practical when new modalities are added to existing multiomics collections.

Load-bearing premise

The assumption that connectivity in the patient graph determined by modality missingness patterns provides meaningful signal for the GNN to improve predictions beyond what the fused embeddings alone achieve.

What would settle it

An ablation that removes the graph neural network step and classifies directly from the fused embeddings, then measures whether accuracy drops on the same three datasets, would test whether the missingness-based graph structure adds predictive value.
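
One way to read such an ablation is as a paired comparison across matched runs (same seeds and splits for both variants). The sketch below computes a paired t statistic by hand. The scores are placeholder numbers invented purely to exercise the function; they are not results from the paper.

```python
import math

def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired per-run scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 runs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the per-run differences (n - 1 in the denominator).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if var_diff == 0.0:
        raise ValueError("differences are constant; t is undefined")
    return mean_diff / math.sqrt(var_diff / n), n - 1

# Placeholder scores for five matched runs (NOT from the paper).
full_model = [0.80, 0.82, 0.81, 0.79, 0.83]
no_graph = [0.78, 0.79, 0.80, 0.77, 0.80]
t_value, dof = paired_t_statistic(full_model, no_graph)
# Two-sided 5% critical value of Student's t with 4 degrees of freedom.
print(t_value, dof, abs(t_value) > 2.776)
```

With only five runs the test has little power, so reporting the per-run differences alongside the statistic is usually more informative than the verdict alone.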

Figures

Figures reproduced from arXiv: 2506.22901 by Chen (Cherise) Chen, Haiping Lu, Sina Tabakhi.

Figure 1: Different missing-modality patterns (zero, one, or two modalities missing) across 12 patients with three modalities. Each colored row within a modality …

Figure 2: MAGNET performs classification tasks in three steps. (1) MAGNET encodes each modality into a lower-dimensional embedding of the same size. (2) A patient-modality multi-head attention mechanism computes the importance of each modality for each patient and applies a modality mask to set attention weights to zero for missing modalities. It aggregates patient embeddings into a fused embedding using the attenti…

Figure 3: Effect of varying missingness ratios on simulated cancer data using Macro F1, averaged over five independent runs per ratio. (a) One modality remains …

Figure 4: UMAP visualization of the training and test data from the BRCA dataset across each input omics modality and the patient representations learned by the …

Original abstract

A key challenge in learning from multimodal biological data is missing modalities, where data from one or more modalities are absent for some patients. Existing approaches either exclude patients with missing modalities, impute missing modalities, or make predictions directly with partial modalities. However, most of these methods rely on inflexible, patient-agnostic fusion strategies and do not scale computationally to the combinatorial growth of missing-modality patterns as the number of modalities increases. To address these limitations, we propose MAGNET (Missing-modality-Aware Graph neural NETwork) to enhance multimodal prediction with partial modalities, featuring a dynamic patient-modality multi-head attention mechanism to fuse lower-dimensional modality embeddings based on their contribution and missingness. MAGNET fusion's complexity increases linearly with the number of modalities while adapting to missing-pattern variability. To generate predictions, MAGNET further constructs a patient graph with fused multimodal embeddings as node features and connectivity determined by the modality missingness, followed by a graph neural network. Experiments on three public multiomics datasets for cancer classification, with real-world missingness, show that MAGNET outperforms state-of-the-art fusion methods. The data and code are available at https://github.com/SinaTabakhi/MAGNET.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes MAGNET, a graph neural network for multimodal cancer classification from multiomics data with missing modalities. It uses a dynamic patient-modality multi-head attention mechanism to fuse lower-dimensional embeddings in a manner whose complexity scales linearly with the number of modalities while conditioning on missingness patterns. A patient graph is constructed with the fused embeddings as node features and edges determined by modality missingness patterns; a GNN is then applied to produce predictions. Experiments on three public multiomics datasets exhibiting real-world missingness report that MAGNET outperforms existing state-of-the-art fusion methods, with code released.

Significance. If the reported gains are shown to arise from the missingness-derived graph rather than the fusion module alone, the work would offer a scalable alternative to imputation or patient exclusion in incomplete multimodal settings. The linear-complexity attention and explicit use of missingness for graph construction are ideas that could transfer to other clinical prediction tasks. Code release supports reproducibility.

major comments (3)
  1. [Experiments] Experiments section: No ablation isolates the contribution of the patient graph whose connectivity is set by modality missingness patterns. The paper does not compare the full MAGNET model against the same dynamic attention fusion module followed by a non-graph classifier (e.g., MLP or linear layer) on the fused embeddings. Without this control, it remains possible that the headline outperformance is attributable only to the fusion step rather than the GNN operating on missingness-based edges.
  2. [Experimental Setup] Experimental setup and results: The manuscript provides insufficient detail on baseline re-implementations, hyperparameter tuning protocols, number of independent runs or random seeds, and statistical significance testing (e.g., paired t-tests or confidence intervals). These omissions prevent full verification of the claimed superiority on the three datasets.
  3. [Method] Method section on patient-graph construction: The assumption that edges defined by shared missingness patterns supply label-relevant structure is not supported by any diagnostic analysis (e.g., comparison of graph properties against random or feature-similarity graphs, or edge-weight correlation with labels). This component is load-bearing for the central claim yet lacks direct evidence.
minor comments (2)
  1. [Abstract] The abstract states that experiments use 'real-world missingness' but does not quantify the per-modality missing rates or list the exact modalities present in each of the three datasets.
  2. [Figures] Figure captions and axis labels in the experimental results could be expanded to include the precise missingness percentages and the number of modalities for each dataset.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and will revise the manuscript accordingly to strengthen the presentation and evidence.

Point-by-point responses
  1. Referee: [Experiments] Experiments section: No ablation isolates the contribution of the patient graph whose connectivity is set by modality missingness patterns. The paper does not compare the full MAGNET model against the same dynamic attention fusion module followed by a non-graph classifier (e.g., MLP or linear layer) on the fused embeddings. Without this control, it remains possible that the headline outperformance is attributable only to the fusion step rather than the GNN operating on missingness-based edges.

    Authors: We agree that an ablation isolating the graph component is important for validating the central claim. In the revised manuscript we will add a direct comparison of the full MAGNET model against a variant that applies the identical dynamic patient-modality multi-head attention fusion module followed by an MLP (or linear) classifier on the fused embeddings, omitting the GNN and missingness-based edges. This control will clarify the incremental contribution of the patient graph. revision: yes

  2. Referee: [Experimental Setup] Experimental setup and results: The manuscript provides insufficient detail on baseline re-implementations, hyperparameter tuning protocols, number of independent runs or random seeds, and statistical significance testing (e.g., paired t-tests or confidence intervals). These omissions prevent full verification of the claimed superiority on the three datasets.

    Authors: We will expand the experimental setup and results sections (and add an appendix if needed) to provide the requested details: descriptions of baseline re-implementations and any missing-modality adaptations; the hyperparameter search protocol and ranges; the number of independent runs (five runs with distinct random seeds); and statistical significance testing including paired t-tests, p-values, and 95% confidence intervals for all reported metrics. revision: yes

  3. Referee: [Method] Method section on patient-graph construction: The assumption that edges defined by shared missingness patterns supply label-relevant structure is not supported by any diagnostic analysis (e.g., comparison of graph properties against random or feature-similarity graphs, or edge-weight correlation with labels). This component is load-bearing for the central claim yet lacks direct evidence.

    Authors: We accept that direct diagnostic evidence would strengthen the justification for the missingness-based graph. In the revision we will add an analysis that (i) compares structural properties (clustering coefficient, modularity, average degree) of the missingness-derived graph against random graphs and feature-similarity graphs, and (ii) reports the correlation between edge weights (derived from shared missingness) and label agreement across patient pairs. These diagnostics will be presented in a new subsection or appendix. revision: yes
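
A simplified, unweighted version of the second diagnostic can be sketched as follows: compare the fraction of graph edges that join same-label patients with the fraction of all patient pairs that share a label. The function names are illustrative, and the rebuttal's proposed edge-weight correlation would need the actual weights, which are not given here.

```python
from itertools import combinations

def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints carry the same label."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for i, j in edges if labels[i] == labels[j])
    return same / len(edges)

def pairwise_label_agreement(labels):
    """Baseline: fraction of all patient pairs that share a label."""
    pairs = list(combinations(range(len(labels)), 2))
    same = sum(1 for i, j in pairs if labels[i] == labels[j])
    return same / len(pairs)

# Toy example with four patients and two classes.
labels = [0, 0, 1, 1]
edges = [(0, 1), (2, 3), (1, 2)]
print(edge_homophily(edges, labels))     # 2 of 3 edges agree
print(pairwise_label_agreement(labels))  # 2 of 6 pairs agree
```

If edge homophily is no higher than the all-pairs baseline, the missingness-derived edges are unlikely to carry label-relevant structure on their own.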

Circularity Check

0 steps flagged

No circularity: empirical architecture evaluated on external benchmarks

The paper presents MAGNET as a new architecture combining dynamic multi-head attention fusion (linear in modalities) with a patient graph whose edges are set by missingness patterns and then processed by a GNN. All performance claims rest on experiments against external baselines on three public multiomics datasets with real missingness. No equations, fitted parameters, or self-citations are shown to reduce the reported gains or the graph-construction step to quantities defined inside the paper itself. The derivation is therefore self-contained and externally falsifiable.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard supervised learning assumptions plus the modeling choice that missingness patterns carry predictive signal. No new physical entities are postulated. Hyperparameters such as number of attention heads and GNN layers are present but not enumerated as free parameters here.

axioms (2)
  • domain assumption: Standard assumptions of supervised classification on tabular multiomics features hold for the cancer datasets used.
    Invoked implicitly when reporting classification performance on public datasets.
  • ad hoc to paper: The attention mechanism can meaningfully weigh modality contributions conditional on missingness.
    Core design choice stated in the abstract's description of the fusion step.
