Missing-Modality-Aware Graph Neural Network for Cancer Classification

Chen (Cherise) Chen; Haiping Lu; Sina Tabakhi

arXiv: 2506.22901 · v2 · pith:M6KNH3T5new · submitted 2025-06-28 · cs.LG · cs.AI · q-bio.BM · q-bio.GN

Pith reviewed 2026-05-21 23:41 UTC · model grok-4.3

classification: cs.LG, cs.AI, q-bio.BM, q-bio.GN

keywords: missing modalities, graph neural network, multiomics data, cancer classification, multimodal fusion, attention mechanism, patient graph

The pith

A graph neural network connecting patients by shared missing-modality patterns improves cancer classification from incomplete multiomics data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method to handle missing modalities in multimodal biological data without imputing values or dropping patients. It first fuses available modality embeddings through an attention mechanism that weighs each modality according to its presence and contribution for that patient. The fused representations then become node features in a patient graph whose edges reflect similarities in missingness patterns across modalities. A graph neural network processes this graph to produce the final classification. This design keeps fusion cost linear in the number of modalities and is shown to outperform prior fusion techniques on three real-world multiomics cancer datasets.

Core claim

MAGNET fuses lower-dimensional modality embeddings with a dynamic patient-modality multi-head attention mechanism whose complexity grows linearly with the number of modalities while adapting to each patient's missing-pattern variability. It then constructs a patient graph whose nodes carry these fused embeddings and whose connectivity is set by modality missingness, after which a graph neural network generates the predictions. On three public multiomics datasets containing real-world missingness, this approach outperforms state-of-the-art fusion methods for cancer classification.
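As a rough illustration of the masked fusion step in the core claim, the following single-head sketch scores each modality once, forces the weight of missing modalities to zero, and takes a weighted average of the rest. It is not the authors' implementation: the function name, the dot-product scoring, and `score_vector` (a stand-in for learned parameters) are assumptions, and the paper's mechanism is multi-head.

```python
import math


def masked_attention_fuse(embeddings, mask, score_vector):
    """Fuse one patient's modality embeddings with a masked softmax.

    embeddings:   list of M vectors of equal length d (one per modality);
                  entries for missing modalities are ignored.
    mask:         list of M ints, 1 if the modality is available, 0 if missing.
    score_vector: hypothetical learned vector of length d used for scoring.
    """
    if not any(mask):
        raise ValueError("at least one modality must be available")

    # One score per modality: a single pass over the M modalities.
    scores = [sum(x * w for x, w in zip(emb, score_vector)) for emb in embeddings]

    # Softmax restricted to available modalities; missing ones get weight 0.
    max_score = max(s for s, m in zip(scores, mask) if m)
    exps = [math.exp(s - max_score) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum over available modalities only.
    dim = len(score_vector)
    fused = [
        sum(w * emb[j] for w, emb, m in zip(weights, embeddings, mask) if m)
        for j in range(dim)
    ]
    return fused, weights
```

Because each patient needs only one score per modality, the work in this sketch grows linearly with the number of modalities, which is the property the core claim emphasizes.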

What carries the argument

The patient graph whose edges are determined by modality missingness patterns, allowing the subsequent graph neural network to propagate information among patients who share similar data-availability profiles.
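The page does not state the exact edge rule, so the sketch below is only one plausible reading: it links two patients when the Jaccard similarity of their modality-availability masks reaches a threshold, then runs one round of plain mean aggregation over neighbors to show how information could propagate. The function names, the Jaccard choice, and the threshold value are all assumptions made for illustration.

```python
def missingness_graph(masks, threshold=0.5):
    """Build weighted edges between patients with similar availability masks.

    masks: list of per-patient lists, 1 = modality present, 0 = missing.
    Returns a dict mapping (i, j) with i < j to the Jaccard similarity.
    """
    edges = {}
    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            shared = sum(a & b for a, b in zip(masks[i], masks[j]))
            union = sum(a | b for a, b in zip(masks[i], masks[j]))
            similarity = shared / union if union else 0.0
            if similarity >= threshold:
                edges[(i, j)] = similarity
    return edges


def mean_aggregate(features, edges):
    """One generic message-passing step: average each node with its neighbors."""
    neighbors = {i: [i] for i in range(len(features))}  # include a self-loop
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    dim = len(features[0])
    return [
        [sum(features[k][d] for k in nbrs) / len(nbrs) for d in range(dim)]
        for nbrs in (neighbors[i] for i in range(len(features)))
    ]
```

In this reading, a patient with no sufficiently similar neighbor keeps its own fused embedding unchanged, so the graph step can only add information where availability profiles overlap.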

If this is right

All patients with partial modalities can be retained rather than excluded or imputed.
Fusion cost remains linear rather than combinatorial as the number of modalities grows.
The model adapts automatically to different missing-pattern distributions without retraining for each pattern.
Predictions improve by exploiting shared missingness as an additional source of structure in the data.
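To make the linear-versus-combinatorial contrast concrete: with M modalities, a patient can show any non-empty subset of them, so there are 2^M - 1 possible availability patterns, while a masked fusion step needs only one score per modality. The short enumeration below (the helper name is hypothetical) counts those patterns; for the three-modality setting of Figure 1 it yields 7.

```python
from itertools import product


def availability_patterns(num_modalities):
    """Enumerate every pattern with at least one modality present (1 = present)."""
    return [p for p in product((0, 1), repeat=num_modalities) if any(p)]


for m in (3, 5, 10):
    # Number of patterns grows as 2**m - 1; per-patient attention scores grow as m.
    print(m, len(availability_patterns(m)), 2 ** m - 1)
```

A method that trains or tunes something per pattern faces the second column; a method that scores modalities individually faces the first.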

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same graph-construction idea could be applied to other clinical tasks such as survival prediction where missingness may also correlate with outcomes.
Missingness patterns themselves may serve as a weak but useful proxy for unmeasured patient factors such as disease severity or care access.
Future experiments could replace the missingness-based edges with edges derived from additional metadata to test whether the current signal is the strongest available.
The linear scaling property suggests the method remains practical when new modalities are added to existing multiomics collections.

Load-bearing premise

The assumption that connectivity in the patient graph determined by modality missingness patterns provides meaningful signal for the GNN to improve predictions beyond what the fused embeddings alone achieve.

What would settle it

An ablation that removes the graph neural network step and classifies directly from the fused embeddings, then measures whether accuracy drops on the same three datasets, would test whether the missingness-based graph structure adds predictive value.

Figures

Figures reproduced from arXiv: 2506.22901 by Chen (Cherise) Chen, Haiping Lu, Sina Tabakhi.

**Figure 1.** Different missing-modality patterns (zero, one, or two modalities missing) across 12 patients with three modalities. Each colored row within a modality…

**Figure 2.** MAGNET performs classification tasks in three steps. (1) MAGNET encodes each modality into a lower-dimensional embedding of the same size. (2) A patient-modality multi-head attention mechanism computes the importance of each modality for each patient and applies a modality mask to set attention weights to zero for missing modalities. It aggregates patient embeddings into a fused embedding using the attenti…

**Figure 3.** Effect of varying missingness ratios on simulated cancer data using Macro F1, averaged over five independent runs per ratio. (a) One modality remains…

**Figure 4.** UMAP visualization of the training and test data from the BRCA dataset across each input omics modality and the patient representations learned by the…

Original abstract

A key challenge in learning from multimodal biological data is missing modalities, where data from one or more modalities are absent for some patients. Existing approaches either exclude patients with missing modalities, impute missing modalities, or make predictions directly with partial modalities. However, most of these methods rely on inflexible, patient-agnostic fusion strategies and do not scale computationally to the combinatorial growth of missing-modality patterns as the number of modalities increases. To address these limitations, we propose MAGNET (Missing-modality-Aware Graph neural NETwork) to enhance multimodal prediction with partial modalities, featuring a dynamic patient-modality multi-head attention mechanism to fuse lower-dimensional modality embeddings based on their contribution and missingness. MAGNET fusion's complexity increases linearly with the number of modalities while adapting to missing-pattern variability. To generate predictions, MAGNET further constructs a patient graph with fused multimodal embeddings as node features and connectivity determined by the modality missingness, followed by a graph neural network. Experiments on three public multiomics datasets for cancer classification, with real-world missingness, show that MAGNET outperforms state-of-the-art fusion methods. The data and code are available at https://github.com/SinaTabakhi/MAGNET.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MAGNET ties a missingness-pattern patient graph to dynamic attention fusion for incomplete multiomics cancer data, with reported gains and code release, but lacks ablations to show the graph adds value beyond fusion.

The key takeaway is that MAGNET uses a patient graph whose connections are defined by modality missingness patterns, layered on top of a dynamic multi-head attention fusion that adapts to which modalities are present for each patient. This setup is new in how it explicitly models the missingness structure in the graph rather than treating missing data only through the fusion step. The attention mechanism keeps computation linear in the number of modalities, which addresses the scaling issue with many possible missing patterns. The experiments use three real multiomics cancer datasets with natural missingness, report better classification performance than several state-of-the-art methods, and include a code release on GitHub. Those elements make the work practical and verifiable to some degree.

The main soft spot is the missing ablation study. The paper does not compare the full model against the fusion module alone followed by a standard classifier. Without that, it is difficult to tell whether the GNN on the missingness graph is contributing meaningful signal or if the gains are driven primarily by the attention-based fusion. The assumption that missingness patterns provide useful connectivity for prediction needs direct testing.

Overall, this paper targets applied researchers in multimodal machine learning for biomedicine who face incomplete data in cancer studies. A reader working on similar problems could pick up the method and the code to try on their own datasets. The combination of a clear practical problem, real data experiments, and open code makes it worth a serious referee's time, even if revisions are needed for stronger evidence on each component. I recommend sending it for peer review with specific requests for ablations and statistical details.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes MAGNET, a graph neural network for multimodal cancer classification from multiomics data with missing modalities. It uses a dynamic patient-modality multi-head attention mechanism to fuse lower-dimensional embeddings in a manner whose complexity scales linearly with the number of modalities while conditioning on missingness patterns. A patient graph is constructed with the fused embeddings as node features and edges determined by modality missingness patterns; a GNN is then applied to produce predictions. Experiments on three public multiomics datasets exhibiting real-world missingness report that MAGNET outperforms existing state-of-the-art fusion methods, with code released.

Significance. If the reported gains are shown to arise from the missingness-derived graph rather than the fusion module alone, the work would offer a scalable alternative to imputation or patient exclusion in incomplete multimodal settings. The linear-complexity attention and explicit use of missingness for graph construction are ideas that could transfer to other clinical prediction tasks. Code release supports reproducibility.

major comments (3)

[Experiments] Experiments section: No ablation isolates the contribution of the patient graph whose connectivity is set by modality missingness patterns. The paper does not compare the full MAGNET model against the same dynamic attention fusion module followed by a non-graph classifier (e.g., MLP or linear layer) on the fused embeddings. Without this control, it remains possible that the headline outperformance is attributable only to the fusion step rather than the GNN operating on missingness-based edges.
[Experimental Setup] Experimental setup and results: The manuscript provides insufficient detail on baseline re-implementations, hyperparameter tuning protocols, number of independent runs or random seeds, and statistical significance testing (e.g., paired t-tests or confidence intervals). These omissions prevent full verification of the claimed superiority on the three datasets.
[Method] Method section on patient-graph construction: The assumption that edges defined by shared missingness patterns supply label-relevant structure is not supported by any diagnostic analysis (e.g., comparison of graph properties against random or feature-similarity graphs, or edge-weight correlation with labels). This component is load-bearing for the central claim yet lacks direct evidence.
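One simple form the label diagnostic requested in the third comment could take is edge-level label agreement: the fraction of graph edges that join same-label patients, compared with the fraction expected for a random pair of patients. The sketch below is an illustration under that assumption, with hypothetical function names; it is not an analysis from the paper.

```python
from collections import Counter


def edge_label_agreement(edges, labels):
    """Fraction of edges whose two endpoints share a class label."""
    edges = list(edges)
    if not edges:
        raise ValueError("graph has no edges")
    return sum(labels[i] == labels[j] for i, j in edges) / len(edges)


def chance_agreement(labels):
    """Probability that two distinct, randomly chosen patients share a label."""
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two patients")
    same_pairs = sum(c * (c - 1) // 2 for c in Counter(labels).values())
    return same_pairs / (n * (n - 1) // 2)
```

If the agreement on missingness-derived edges is no higher than the chance value, the graph is unlikely to carry label-relevant structure on its own; a clearly higher value would support the load-bearing premise.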

minor comments (2)

[Abstract] The abstract states that experiments use 'real-world missingness' but does not quantify the per-modality missing rates or list the exact modalities present in each of the three datasets.
[Figures] Figure captions and axis labels in the experimental results could be expanded to include the precise missingness percentages and the number of modalities for each dataset.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and will revise the manuscript accordingly to strengthen the presentation and evidence.

Referee: [Experiments] Experiments section: No ablation isolates the contribution of the patient graph whose connectivity is set by modality missingness patterns. The paper does not compare the full MAGNET model against the same dynamic attention fusion module followed by a non-graph classifier (e.g., MLP or linear layer) on the fused embeddings. Without this control, it remains possible that the headline outperformance is attributable only to the fusion step rather than the GNN operating on missingness-based edges.

Authors: We agree that an ablation isolating the graph component is important for validating the central claim. In the revised manuscript we will add a direct comparison of the full MAGNET model against a variant that applies the identical dynamic patient-modality multi-head attention fusion module followed by an MLP (or linear) classifier on the fused embeddings, omitting the GNN and missingness-based edges. This control will clarify the incremental contribution of the patient graph. revision: yes

Referee: [Experimental Setup] Experimental setup and results: The manuscript provides insufficient detail on baseline re-implementations, hyperparameter tuning protocols, number of independent runs or random seeds, and statistical significance testing (e.g., paired t-tests or confidence intervals). These omissions prevent full verification of the claimed superiority on the three datasets.

Authors: We will expand the experimental setup and results sections (and add an appendix if needed) to provide the requested details: descriptions of baseline re-implementations and any missing-modality adaptations; the hyperparameter search protocol and ranges; the number of independent runs (five runs with distinct random seeds); and statistical significance testing including paired t-tests, p-values, and 95% confidence intervals for all reported metrics. revision: yes

Referee: [Method] Method section on patient-graph construction: The assumption that edges defined by shared missingness patterns supply label-relevant structure is not supported by any diagnostic analysis (e.g., comparison of graph properties against random or feature-similarity graphs, or edge-weight correlation with labels). This component is load-bearing for the central claim yet lacks direct evidence.

Authors: We accept that direct diagnostic evidence would strengthen the justification for the missingness-based graph. In the revision we will add an analysis that (i) compares structural properties (clustering coefficient, modularity, average degree) of the missingness-derived graph against random graphs and feature-similarity graphs, and (ii) reports the correlation between edge weights (derived from shared missingness) and label agreement across patient pairs. These diagnostics will be presented in a new subsection or appendix. revision: yes
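The paired t-tests and 95% confidence intervals promised in the second response, computed over matched runs with the same seeds, could look roughly like the sketch below. The function name and the small critical-value table are assumptions for illustration; it returns the t statistic and interval for the mean difference but not a p-value, which in practice would usually come from a statistics library such as SciPy (`scipy.stats.ttest_rel`).

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 9: 2.262}


def paired_summary(scores_a, scores_b):
    """Paired t statistic and 95% CI for the mean of (scores_a - scores_b).

    scores_a, scores_b: per-run metric values (e.g. Macro F1) for two methods,
    matched by run, so that run k of each method shares a seed and split.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("runs must be paired")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n - 1 not in T_CRIT_95:
        raise ValueError("no critical value tabulated for this number of runs")
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    if std_err == 0:
        raise ValueError("differences are constant; t statistic is undefined")
    t_stat = mean_diff / std_err
    half_width = T_CRIT_95[n - 1] * std_err
    return t_stat, (mean_diff - half_width, mean_diff + half_width)
```

With five runs there are four degrees of freedom, so the interval is wide; an interval that excludes zero is the minimum a reader would want before accepting a claimed improvement.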

Circularity Check

0 steps flagged

No circularity: empirical architecture evaluated on external benchmarks

The paper presents MAGNET as a new architecture combining dynamic multi-head attention fusion (linear in modalities) with a patient graph whose edges are set by missingness patterns and then processed by a GNN. All performance claims rest on experiments against external baselines on three public multiomics datasets with real missingness. No equations, fitted parameters, or self-citations are shown to reduce the reported gains or the graph-construction step to quantities defined inside the paper itself. The derivation is therefore self-contained and externally falsifiable.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard supervised learning assumptions plus the modeling choice that missingness patterns carry predictive signal. No new physical entities are postulated. Hyperparameters such as number of attention heads and GNN layers are present but not enumerated as free parameters here.

axioms (2)

Domain assumption: Standard assumptions of supervised classification on tabular multiomics features hold for the cancer datasets used.
Invoked implicitly when reporting classification performance on public datasets.
Ad hoc to paper: The attention mechanism can meaningfully weigh modality contributions conditional on missingness.
Core design choice stated in the abstract description of the fusion step.

Reference graph

Works this paper leans on

67 extracted references · 67 canonical work pages · 1 internal anchor

[1]

Embracing cancer complexity: hallmarks of systemic disease,

C. Swanton, E. Bernard, C. Abbosh, F. Andr ´e, J. Auwerx, A. Balmain, D. Bar-Sagi, R. Bernards, S. Bullman, J. DeGregori et al., “Embracing cancer complexity: hallmarks of systemic disease,” Cell, vol. 187, no. 7, pp. 1589–1616, 2024

work page 2024
[2]

An information fusion system-driven deep neural networks with application to cancer mortality risk estimate,

C.-H. Yang, S.-H. Moi, L.-Y . Chuang, and Y .-D. Lin, “An information fusion system-driven deep neural networks with application to cancer mortality risk estimate,” IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 2, pp. 2905–2916, 2025

work page 2025
[3]

In- terpretable identification of cancer genes across biological networks via transformer-powered graph representation learning,

X. Su, P. Hu, D. Li, B. Zhao, Z. Niu, T. Herget, P. S. Yu, and L. Hu, “In- terpretable identification of cancer genes across biological networks via transformer-powered graph representation learning,” Nature Biomedical Engineering, pp. 1–19, 2025

work page 2025
[4]

Integrative omics for health and disease,

K. J. Karczewski and M. P. Snyder, “Integrative omics for health and disease,” Nature Reviews Genetics , vol. 19, no. 5, pp. 299–310, 2018

work page 2018
[5]

Multimodal biomedical AI,

J. N. Acosta, G. J. Falcone, P. Rajpurkar, and E. J. Topol, “Multimodal biomedical AI,” Nature Medicine, vol. 28, no. 9, pp. 1773–1784, 2022

work page 2022
[6]

Machine learning for integrating data in biology and medicine: principles, practice, and opportunities,

M. Zitnik, F. Nguyen, B. Wang, J. Leskovec, A. Goldenberg, and M. M. Hoffman, “Machine learning for integrating data in biology and medicine: principles, practice, and opportunities,” Information Fusion , vol. 50, pp. 71–91, 2019

work page 2019
[7]

MSAFF: multi-way soft attention fusion framework with the large foundation models for the diagnosis of Alzheimer’s disease,

X.-A. Bi, W. Shen, Y . Shan, D. Chen, L. Xu, K. Chen, and Z. Liu, “MSAFF: multi-way soft attention fusion framework with the large foundation models for the diagnosis of Alzheimer’s disease,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–15, 2025

work page 2025
[8]

Benchmarking joint multi-omics dimensionality reduction approaches for the study of cancer,

L. Cantini, P. Zakeri, C. Hernandez, A. Naldi, D. Thieffry, E. Remy, and A. Baudot, “Benchmarking joint multi-omics dimensionality reduction approaches for the study of cancer,” Nature Communications, vol. 12, no. 1, p. 124, 2021

work page 2021
[9]

Similarity network fusion for aggregating data types on a genomic scale,

B. Wang, A. M. Mezlini, F. Demir, M. Fiume, Z. Tu, M. Brudno, B. Haibe-Kains, and A. Goldenberg, “Similarity network fusion for aggregating data types on a genomic scale,” Nature Methods, vol. 11, no. 3, pp. 333–337, 2014

work page 2014
[10]

MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification,

T. Wang, W. Shao, Z. Huang, H. Tang, J. Zhang, Z. Ding, and K. Huang, “MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification,” Nature Communications, vol. 12, no. 1, p. 3445, 2021

work page 2021
[11]

Learning from data with structured missingness,

R. Mitra, S. F. McGough, T. Chakraborti, C. Holmes, R. Copping, N. Hagenbuch, S. Biedermann, J. Noonan, B. Lehmann, A. Shenvi et al., “Learning from data with structured missingness,”Nature Machine Intelligence, vol. 5, no. 1, pp. 13–23, 2023

work page 2023
[12]

A review of integrative imputation for multi-omics datasets,

M. Song, J. Greenbaum, J. Luttrell IV , W. Zhou, C. Wu, H. Shen, P. Gong, C. Zhang, and H.-W. Deng, “A review of integrative imputation for multi-omics datasets,” Frontiers in Genetics , vol. 11, p. 570255, 2020

work page 2020
[13]

Exploiting interdata relationships in next-generation proteomics analysis,

B. Vitrinel, H. W. Koh, F. M. Kar, S. Maity, J. Rendleman, H. Choi, and C. V ogel, “Exploiting interdata relationships in next-generation proteomics analysis,” Molecular & Cellular Proteomics , vol. 18, no. 8, pp. S5–S14, 2019

work page 2019
[14]

Multimodal learning with incomplete modalities by knowledge distillation,

Q. Wang, L. Zhan, P. Thompson, and J. Zhou, “Multimodal learning with incomplete modalities by knowledge distillation,” in Proceedings JOURNAL OF LATEX CLASS FILES, VOL. XX, NO. XX, XXX 2025 11 of the 26th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2020, pp. 1828–1838

work page 2025
[15]

M3Care: learning with missing modalities in multimodal healthcare data,

C. Zhang, X. Chu, L. Ma, Y . Zhu, Y . Wang, J. Wang, and J. Zhao, “M3Care: learning with missing modalities in multimodal healthcare data,” in Proceedings of the 28th ACM SIGKDD Conference on Knowl- edge Discovery and Data Mining , 2022, pp. 2418–2428

work page 2022
[16]

Multimodal patient representation learning with missing modalities and labels,

Z. Wu, A. Dadu, N. Tustison, B. Avants, M. Nalls, J. Sun, and F. Faghri, “Multimodal patient representation learning with missing modalities and labels,” in International Conference on Learning Representations, 2024

work page 2024
[17]

V AEs in the presence of missing data,

M. Collier, A. Nazabal, and C. Williams, “V AEs in the presence of missing data,” in ICML Workshop on the Art of Learning with Missing Values (Artemiss), 2020

work page 2020
[18]

Multimodal deep learning

J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, A. Y . Ng et al. , “Multimodal deep learning.” in Proceedings of the 28th International Conference on Machine Learning , vol. 11, 2011, pp. 689–696

work page 2011
[19]

DrFuse: learning disentangled representation for clinical multi-modal fusion with missing modality and modal inconsistency,

W. Yao, K. Yin, W. K. Cheung, J. Liu, and J. Qin, “DrFuse: learning disentangled representation for clinical multi-modal fusion with missing modality and modal inconsistency,” in Proceedings of the AAAI Confer- ence on Artificial Intelligence , vol. 38, 2024, pp. 16 416–16 424

work page 2024
[20]

Handling missing data with graph representation learning,

J. You, X. Ma, Y . Ding, M. J. Kochenderfer, and J. Leskovec, “Handling missing data with graph representation learning,” Advances in Neural Information Processing Systems , vol. 33, pp. 19 075–19 087, 2020

work page 2020
[21]

Are multi- modal transformers robust to missing modality?

M. Ma, J. Ren, L. Zhao, D. Testuggine, and X. Peng, “Are multi- modal transformers robust to missing modality?” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , 2022, pp. 18 177–18 186

work page 2022
[22]

MedFuse: multi-modal fusion with clinical time-series data and chest x-ray images,

N. Hayat, K. J. Geras, and F. E. Shamout, “MedFuse: multi-modal fusion with clinical time-series data and chest x-ray images,” in Machine Learning for Healthcare Conference . PMLR, 2022, pp. 479–503

work page 2022
[23]

Learning factorized multimodal representations,

Y .-H. H. Tsai, P. P. Liang, A. Zadeh, L.-P. Morency, and R. Salakhutdi- nov, “Learning factorized multimodal representations,” in International Conference on Learning Representations , 2019

work page 2019
[24]

SMIL: multimodal learning with severely missing modality,

M. Ma, J. Ren, L. Zhao, S. Tulyakov, C. Wu, and X. Peng, “SMIL: multimodal learning with severely missing modality,” in Proceedings of the AAAI Conference on Artificial Intelligence , vol. 35, 2021, pp. 2302–2310

work page 2021
[25]

Robust multimodal learning with missing modalities via parameter-efficient adaptation,

M. K. Reza, A. Prater-Bennette, and M. S. Asif, “Robust multimodal learning with missing modalities via parameter-efficient adaptation,” IEEE Transactions on Pattern Analysis and Machine Intelligence , 2024

work page 2024
[26]

Learn- ing missing modal electronic health records with unified multi-modal data embedding and modality-aware attention,

K. Lee, S. Lee, S. Hahn, H. Hyun, E. Choi, B. Ahn, and J. Lee, “Learn- ing missing modal electronic health records with unified multi-modal data embedding and modality-aware attention,” in Machine Learning for Healthcare Conference . PMLR, 2023, pp. 423–442

work page 2023
[27]

Generating missing values for simulation purposes: a multivariate amputation procedure,

R. M. Schouten, P. Lugtig, and G. Vink, “Generating missing values for simulation purposes: a multivariate amputation procedure,” Journal of Statistical Computation and Simulation , vol. 88, no. 15, pp. 2909–2930, 2018

work page 2018
[28]

Multi-omics data integration using ratio- based quantitative profiling with quartet reference materials,

Y . Zheng, Y . Liu, J. Yang, L. Dong, R. Zhang, S. Tian, Y . Yu, L. Ren, W. Hou, F. Zhu et al. , “Multi-omics data integration using ratio- based quantitative profiling with quartet reference materials,” Nature Biotechnology, vol. 42, no. 7, pp. 1133–1149, 2024

work page 2024
[29]

A roadmap for multi-omics data integration using deep learning,

M. Kang, E. Ko, and T. B. Mersha, “A roadmap for multi-omics data integration using deep learning,” Briefings in Bioinformatics , vol. 23, no. 1, p. bbab454, 2022

work page 2022
[30]

Multimodal learning for multi-omics: a survey,

S. Tabakhi, M. N. I. Suvon, P. Ahadian, and H. Lu, “Multimodal learning for multi-omics: a survey,” World Scientific Annual Review of Artificial Intelligence, vol. 1, p. 2250004, 2023

work page 2023
[31]

Using machine learning approaches for multi-omics data analysis: a review,

P. S. Reel, S. Reel, E. Pearson, E. Trucco, and E. Jefferson, “Using machine learning approaches for multi-omics data analysis: a review,” Biotechnology Advances, vol. 49, p. 107739, 2021

work page 2021
[32]

Integrative network fusion: a multi-omics approach in molecular profiling,

M. Chierici, N. Bussola, A. Marcolini, M. Francescatto, A. Zandon `a, L. Trastulla, C. Agostinelli, G. Jurman, and C. Furlanello, “Integrative network fusion: a multi-omics approach in molecular profiling,” Fron- tiers in Oncology , vol. 10, p. 1065, 2020

work page 2020
[33]

Integration of multiomics data with graph convolutional networks to identify new can- cer genes and their associated molecular mechanisms,

R. Schulte-Sasse, S. Budach, D. Hnisz, and A. Marsico, “Integration of multiomics data with graph convolutional networks to identify new can- cer genes and their associated molecular mechanisms,” Nature Machine Intelligence, vol. 3, no. 6, pp. 513–526, 2021

work page 2021
[34]

Missing data in multi-omics in- tegration: recent advances through artificial intelligence,

J. E. Flores, D. M. Claborne, Z. D. Weller, B.-J. M. Webb-Robertson, K. M. Waters, and L. M. Bramer, “Missing data in multi-omics in- tegration: recent advances through artificial intelligence,” Frontiers in Artificial Intelligence, vol. 6, p. 1098308, 2023

work page 2023
[35]

Deep structure integrative representation of multi-omics data for cancer subtyping,

B. Yang, Y . Yang, and X. Su, “Deep structure integrative representation of multi-omics data for cancer subtyping,” Bioinformatics, vol. 38, no. 13, pp. 3337–3342, 2022

work page 2022
[36]

Disease-image-specific learning for diagnosis-oriented neuroimage synthesis with incomplete multi- modality data,

Y . Pan, M. Liu, Y . Xia, and D. Shen, “Disease-image-specific learning for diagnosis-oriented neuroimage synthesis with incomplete multi- modality data,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 10, pp. 6839–6853, 2021

work page 2021
[37]

Multiple imputation after 18+ years,

D. B. Rubin, “Multiple imputation after 18+ years,” Journal of the American Statistical Association , vol. 91, no. 434, pp. 473–489, 1996

work page 1996
[38]

Multiple imputation: a primer,

J. L. Schafer, “Multiple imputation: a primer,” Statistical Methods in Medical Research, vol. 8, no. 1, pp. 3–15, 1999

work page 1999
[39]

Missing data in clinical research: a tutorial on multiple imputation,

P. C. Austin, I. R. White, D. S. Lee, and S. van Buuren, “Missing data in clinical research: a tutorial on multiple imputation,” Canadian Journal of Cardiology, vol. 37, no. 9, pp. 1322–1331, 2021

work page 2021
[40]

TOBMI: trans-omics block missing data imputation using a k-nearest neighbor weighted approach,

X. Dong, L. Lin, R. Zhang, Y . Zhao, D. C. Christiani, Y . Wei, and F. Chen, “TOBMI: trans-omics block missing data imputation using a k-nearest neighbor weighted approach,” Bioinformatics, vol. 35, no. 8, pp. 1278–1283, 2019

work page 2019
[41]

Cobolt: integrative analysis of multimodal single-cell sequencing data,

B. Gong, Y . Zhou, and E. Purdom, “Cobolt: integrative analysis of multimodal single-cell sequencing data,” Genome Biology, vol. 22, pp. 1–21, 2021

work page 2021
[42]

MultiVI: deep generative model for the integration of multimodal data,

T. Ashuach, M. I. Gabitto, R. V . Koodli, G.-A. Saldi, M. I. Jordan, and N. Yosef, “MultiVI: deep generative model for the integration of multimodal data,” Nature Methods, vol. 20, no. 8, pp. 1222–1231, 2023

work page 2023
[43]

Deep adversarial learning for multi-modality missing data completion,

L. Cai, Z. Wang, H. Gao, D. Shen, and S. Ji, “Deep adversarial learning for multi-modality missing data completion,” in Proceedings of the 24th ACM SIGKDD Conference on Knowledge Discovery & Data Mining , 2018, pp. 1158–1166

work page 2018
[44]

VIGAN: missing view imputation with generative adversarial networks,

C. Shang, A. Palmer, J. Sun, K.-S. Chen, J. Lu, and J. Bi, “VIGAN: missing view imputation with generative adversarial networks,” in IEEE International Conference on Big Data . IEEE, 2017, pp. 766–775

work page 2017
[45]

A flexible generative model for heterogeneous tabular EHR with missing modality,

H. He, W. hao, Y . Xi, Y . Chen, B. Malin, and J. Ho, “A flexible generative model for heterogeneous tabular EHR with missing modality,” in International Conference on Learning Representations , 2024

work page 2024
[46]

NEMO: cancer subtyping by integration of partial multi-omic data,

N. Rappoport and R. Shamir, “NEMO: cancer subtyping by integration of partial multi-omic data,” Bioinformatics, vol. 35, no. 18, pp. 3348– 3356, 2019

work page 2019
[47]

MRGCN: cancer subtyping with multi-reconstruction graph convolutional network using full and partial multi-omics dataset,

B. Yang, Y . Yang, M. Wang, and X. Su, “MRGCN: cancer subtyping with multi-reconstruction graph convolutional network using full and partial multi-omics dataset,” Bioinformatics, vol. 39, no. 6, p. btad353, 2023

work page 2023
[48]

Augmented sparse representation for incomplete multiview clustering,

J. Chen, S. Yang, X. Peng, D. Peng, and Z. Wang, “Augmented sparse representation for incomplete multiview clustering,” IEEE Transactions on Neural Networks and Learning Systems , vol. 35, no. 3, pp. 4058– 4071, 2022

[49] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” Advances in Neural Information Processing Systems, vol. 30, 2017

[50] W. L. Hamilton, “Graph representation learning,” Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 14, no. 3, pp. 1–159, 2020

[51] L. Van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. 11, 2008

[52] J. N. Weinstein, E. A. Collisson, G. B. Mills, K. R. Shaw, B. A. Ozenberger, K. Ellrott, I. Shmulevich, C. Sander, and J. M. Stuart, “The Cancer Genome Atlas Pan-Cancer analysis project,” Nature Genetics, vol. 45, no. 10, pp. 1113–1120, 2013

[53] P.-K. Raj-Kumar, J. Liu, J. A. Hooke, A. J. Kovatich, L. Kvecher, C. D. Shriver, and H. Hu, “PCA-PAM50 improves consistency between breast cancer intrinsic and clinical subtyping reclassifying a subset of luminal A tumors as luminal B,” Scientific Reports, vol. 9, no. 1, p. 7956, 2019

[54] The Cancer Genome Atlas Research Network, “Comprehensive molecular characterization of urothelial bladder carcinoma,” Nature, vol. 507, no. 7492, p. 315, 2014

[55] Y. El-Manzalawy, T.-Y. Hsieh, M. Shivakumar, D. Kim, and V. Honavar, “Min-redundancy and max-relevance multi-view feature selection for predicting ovarian cancer survival using multi-omics data,” BMC Medical Genomics, vol. 11, no. 3, pp. 19–31, 2018

[56] M. J. Goldman, B. Craft, M. Hastie, K. Repečka, F. McDade, A. Kamath, A. Banerjee, Y. Luo, D. Rogers, A. N. Brooks et al., “Visualizing and interpreting cancer genomics data via the Xena platform,” Nature Biotechnology, vol. 38, no. 6, pp. 675–678, 2020

[57] E. R. Girden, ANOVA: repeated measures. Sage, 1992, no. 84

[58] D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” in International Conference on Learning Representations, 2015

[59] P. J. Rousseeuw, “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,” Journal of Computational and Applied Mathematics, vol. 20, pp. 53–65, 1987

[60] D. L. Davies and D. W. Bouldin, “A cluster separation measure,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-1, no. 2, pp. 224–227, 1979

[61] S. Ma, “Dataset for ‘moving towards genome-wide data integration for patient stratification with integrate any omics’,” Oct. 2024, Zenodo. [Online]. Available: https://doi.org/10.5281/zenodo.13989262

[62] S. Ma, A. G. Zeng, B. Haibe-Kains, A. Goldenberg, J. E. Dick, and B. Wang, “Moving towards genome-wide data integration for patient stratification with integrate any omics,” Nature Machine Intelligence, vol. 7, no. 1, pp. 29–42, 2025

[63] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, “Graph attention networks,” in International Conference on Learning Representations, 2018

[64] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in International Conference on Learning Representations, 2017

[65] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph neural networks?” in International Conference on Learning Representations, 2019

[66] R. Liaw, E. Liang, R. Nishihara, P. Moritz, J. E. Gonzalez, and I. Stoica, “Tune: a research platform for distributed model selection and training,” arXiv preprint arXiv:1807.05118, 2018

[67] L. Li, K. Jamieson, A. Rostamizadeh, E. Gonina, J. Ben-Tzur, M. Hardt, B. Recht, and A. Talwalkar, “A system for massively parallel hyperparameter tuning,” Proceedings of Machine Learning and Systems, vol. 2, pp. 230–246, 2020

