
arxiv: 2606.06867 · v1 · pith:5M6URJAYnew · submitted 2026-06-05 · 💻 cs.CV

Multi-FRuGaL: Multimodal Flexible Redundancy-aware Decomposed Gated Learning for Cancer Diagnosis and Prognosis

Pith reviewed 2026-06-27 22:40 UTC · model grok-4.3

classification 💻 cs.CV
keywords multimodal fusion · missing modalities · cancer prognosis · gated learning · head and neck cancer · survival analysis · HPV prediction · redundancy-aware

The pith

Multi-FRuGaL separates redundant signals from complementary ones in incomplete medical data using gated fusion to improve cancer prognosis accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Multi-FRuGaL as a way to fuse radiology, pathology, and clinical data for head and neck cancer when some inputs are missing. It adds a decomposition layer and an input-conditioned gate that learns to emphasize useful modality signals while downplaying redundant or absent ones. An information-aware objective guides the fusion. On the HANCOCK dataset this raises the area under the curve for 5-year survival from 0.601 to 0.8496 and for recurrence from 0.672 to 0.8102. On HECKTOR it reaches 0.975 AUC for HPV status. The framework stays defined even with several modalities absent.

Core claim

Multi-FRuGaL integrates per-modality encoders with a signal decomposition layer, an input-conditioned gating network, and an information-aware fusion objective to separate redundant from modality-specific complementary signals, selectively upweighting informative modalities and suppressing redundant or noisy inputs, and remaining well-defined even when multiple modalities are absent. Evaluated on HANCOCK (N=763, five modalities) and HECKTOR (N=588, three modalities), it improves mean performance across survival, recurrence, and HPV tasks.

What carries the argument

The input-conditioned gating network that, together with the signal decomposition layer, learns to weight modalities according to their contribution to the task.
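
As a rough illustration of how an input-conditioned gate can stay defined when modalities are absent, here is a minimal sketch. It is not the paper's implementation: the abstract gives no equations, so the masked softmax, the function name `masked_gated_fusion`, and the zero-vector fallback are all assumptions made for this example.

```python
import math

def masked_gated_fusion(embeddings, gate_logits, present):
    """Fuse per-modality embeddings with a softmax gate over present modalities.

    embeddings:  list of equal-length float lists, one per modality
                 (a placeholder vector is expected where a modality is missing)
    gate_logits: list of floats, one score per modality (hypothetical gate output)
    present:     list of bools, False where the modality is missing
    """
    dim = len(embeddings[0])
    idx = [i for i, p in enumerate(present) if p]
    if not idx:
        # Assumed fallback: with no modality available, return zeros
        # so the output is still defined.
        return [0.0] * dim, [0.0] * len(embeddings)
    # Numerically stable softmax restricted to the present modalities.
    m = max(gate_logits[i] for i in idx)
    exps = {i: math.exp(gate_logits[i] - m) for i in idx}
    total = sum(exps.values())
    weights = [exps[i] / total if i in exps else 0.0
               for i in range(len(embeddings))]
    # Weighted sum; missing modalities contribute nothing because their weight is 0.
    fused = [sum(w * e[d] for w, e in zip(weights, embeddings))
             for d in range(dim)]
    return fused, weights

# Third modality is missing, so its large logit is ignored.
fused, weights = masked_gated_fusion(
    embeddings=[[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]],
    gate_logits=[0.0, 0.0, 10.0],
    present=[True, True, False],
)
# weights == [0.5, 0.5, 0.0], fused == [0.5, 0.5]
```

Masking before normalization means the weights over available modalities always sum to one, which is one simple way to keep the fused vector on a comparable scale regardless of how many inputs are absent.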

If this is right

  • It achieves a concordance index of 0.6814 for overall survival on HANCOCK.
  • It reaches 0.975 AUC for HPV prediction on HECKTOR.
  • Performance holds under severe missing-modality conditions.
  • The method produces discriminative multimodal representations.
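  • On HANCOCK, the concordance index reaches 0.7421 for recurrence-free survival and 0.7143 for progression-free survival; on HECKTOR, it reaches 0.7203 for recurrence-free survival.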
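
For readers less familiar with the concordance index cited in the list above, the following sketch computes Harrell's C on toy data. It is a simplified textbook version and an assumption about the metric's exact form: the paper's evaluation code is not available, tied event times are ignored here, and the function name is chosen for this example.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ordered correctly by risk.

    times:  observed follow-up times
    events: 1 if the event was observed, 0 if censored
    risks:  predicted risk scores (higher means an earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # higher risk, earlier event: correct order
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied risks count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Perfectly ordered toy cohort: the highest risk has the shortest time.
c_perfect = concordance_index([1, 2, 3], [1, 1, 1], [3.0, 2.0, 1.0])   # 1.0
# Fully reversed ordering.
c_reversed = concordance_index([1, 2, 3], [1, 1, 1], [1.0, 2.0, 3.0])  # 0.0
```

A value of 0.5 corresponds to random ordering, so the reported indices of roughly 0.68 to 0.74 sit between chance and perfect ranking.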

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar gating could help fusion models in other domains with incomplete sensor data, such as autonomous driving.
  • Explicit signal decomposition might reduce the need for imputation techniques in medical AI.
  • Testing on datasets with more than five modalities would reveal scalability limits.
  • The approach may generalize to non-cancer tasks like predicting treatment response.

Load-bearing premise

The input-conditioned gating network and information-aware fusion objective can reliably separate redundant from modality-specific signals and remain effective when multiple modalities are absent.

What would settle it

Run a controlled experiment on HANCOCK in which two modalities are randomly dropped from every sample. A survival AUC below 0.75 under that protocol would indicate that the gating does not reliably identify informative signals.
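
The dropout protocol described above could be set up along the following lines. This is only a sketch of the masking step: the helper name `drop_modalities`, the seed handling, and the use of uniform sampling without replacement are assumptions, and the sample and modality counts are taken from the HANCOCK figures quoted on this page.

```python
import random

def drop_modalities(n_samples, n_modalities, n_drop, seed=0):
    """Return a per-sample presence mask with exactly n_drop modalities removed."""
    if not 0 <= n_drop < n_modalities:
        raise ValueError("n_drop must leave at least one modality")
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    masks = []
    for _ in range(n_samples):
        # Choose which modalities to remove for this sample, uniformly at random.
        dropped = set(rng.sample(range(n_modalities), n_drop))
        masks.append([m not in dropped for m in range(n_modalities)])
    return masks

# HANCOCK-sized setup: 763 patients, five modalities, two dropped per patient.
masks = drop_modalities(n_samples=763, n_modalities=5, n_drop=2)
```

Each mask would then be applied at inference time before computing the survival AUC and comparing it with the 0.75 threshold.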

Figures

Figures reproduced from arXiv: 2606.06867 by Carla Pitarch-Abaigar, Sanket Kachole, Sanyukta Adap, Shubham Innani, Siddhesh Thakur, Spyridon Bakas, Suhang You.

Figure 1: Overview of the proposed Multimodal Flexible Redundancy-aware decomposed Gated Learning (Multi-FRuGaL) framework.
Figure 2: Performance under random modality dropout across HANCOCK recurrence, HANCOCK survival, and HECKTOR HPV prediction.
Figure 3: Kaplan–Meier survival curves on the HANCOCK dataset across competing methods for Overall Survival (OS), Recurrence-Free Survival ...
Figure 4: Stage-wise t-SNE visualizations across HANCOCK modalities. Each point represents one patient embedding. Colors denote K-means ...
Figure 5: Cross-modal cosine similarity matrices for modality-specific, modality-shared, and gated representations on the HANCOCK dataset. The ...
Figure 6: Distribution of learned gate values across HANCOCK modalities. The histogram view shows the overlap and concentration of gate ...
Figure 7: Kaplan–Meier survival curves on the HECKTOR dataset for recurrence-free survival (RFS) across competing methods. Patients were ...
Figure 8: Stage-wise t-SNE visualizations across HECKTOR modalities. For each modality, the five panels correspond to the raw, modality ...
Figure 9: Cross-modal cosine similarity matrices for modality-specific, modality-shared, and gated representations on the HECKTOR dataset. ...
Figure 10: Distribution of learned gate values across HECKTOR modalities. The histogram view shows the overlap and concentration of gate ...

Original abstract

Modern medicine relies on heterogeneous data sources spanning radiology, pathology, text reports, and structured clinical information. However, real-world patient data are frequently incomplete, with missing or sparsely acquired modalities, limiting the effectiveness of standard multimodal fusion approaches. To this end, we propose the Multimodal Flexible Redundancy-aware decomposed GAted Learning (Multi-FRuGaL) framework, a decomposition-aware, adaptive gated intermediate-fusion framework that performs modality-level representation learning under missing data. Multi-FRuGaL integrates per-modality encoders with a signal decomposition layer, an input-conditioned gating network, and an information-aware fusion objective to separate redundant from modality-specific complementary signals, selectively upweighting informative modalities and suppressing redundant or noisy inputs, and remaining well-defined even when multiple modalities are absent. We evaluate Multi-FRuGaL on two multimodal head and neck cancer cohorts: the HANCOCK challenge dataset (N = 763) comprising five modalities and two prognostic endpoints (5-year survival and 2-year recurrence), and the HECKTOR challenge dataset (N = 588) comprising three modalities for human papillomavirus (HPV) status classification. Multi-FRuGaL consistently achieves higher mean performance than the evaluated baselines across multiple tasks, improving AUC from 0.601 to 0.8496 for survival, from 0.672 to 0.8102 for recurrence, and achieving 0.975 AUC for HPV prediction on HECKTOR. For survival analysis, it further achieves a concordance index of 0.6814 for overall survival, 0.7421 for recurrence-free survival, and 0.7143 for progression-free survival on HANCOCK, and 0.7203 for recurrence-free survival on HECKTOR. Qualitative analyses further show that Multi-FRuGaL learns discriminative and robust multimodal representations, even under severe missing-modality conditions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The paper proposes Multi-FRuGaL, a decomposition-aware adaptive gated intermediate-fusion framework for multimodal cancer diagnosis and prognosis under missing data. It combines per-modality encoders, a signal decomposition layer, an input-conditioned gating network, and an information-aware fusion objective to separate redundant from modality-specific signals and remain well-defined when modalities are absent. Evaluation is on HANCOCK (N=763, five modalities, survival and recurrence endpoints) and HECKTOR (N=588, three modalities, HPV status). The paper reports AUC gains from 0.601 to 0.8496 (survival) and from 0.672 to 0.8102 (recurrence), an AUC of 0.975 for HPV status, and concordance indices between 0.68 and 0.74.

Significance. If the input-conditioned gating and fusion objective demonstrably isolate informative signals under missing modalities with statistical rigor, the work could meaningfully advance robust multimodal fusion for incomplete clinical datasets, where standard approaches often degrade. The reported numerical improvements on two challenge cohorts would then represent a practical contribution to prognostic modeling in head and neck cancer.

major comments (3)
  1. [Abstract] The headline AUC gains (0.601→0.8496 survival, 0.672→0.8102 recurrence) are attributed to the decomposition layer + gating network + fusion objective, yet the text supplies only aggregate means with no per-missing-rate performance curves, no controlled ablation isolating the gating network's contribution, and no statistical tests (e.g., paired t-tests or bootstrap CIs) comparing against baselines. This leaves open whether gains arise from the claimed redundancy-aware mechanism or from encoder capacity and dataset specifics.
  2. [Abstract] The claim that the framework “remains well-defined” and yields “robust representations even under severe missing-modality conditions” is load-bearing for the central contribution, but no quantitative support (e.g., ablation tables varying the number of absent modalities or gating-weight statistics) is provided beyond the overall means; the weakest assumption, that the input-conditioned gating reliably separates signals when multiple modalities are absent, therefore lacks direct evidence.
  3. [Abstract] No information is given on baseline fairness (identical encoders and training protocols), data splits, handling of missingness patterns, or multiple-run variance, making it impossible to assess whether the reported improvements are reproducible or confounded by implementation details.
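
The bootstrap comparison requested in the first comment could look roughly like the sketch below. It is one common way to build a paired percentile interval for an AUC difference, not a procedure taken from the paper; the function names, the number of resamples, and the choice to skip single-class resamples are assumptions for illustration.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def paired_bootstrap_auc_diff(labels, scores_a, scores_b,
                              n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for AUC(a) - AUC(b), resampling the same patients for both models."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample patients with replacement
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # skip resamples that contain a single class
        a = auc(y, [scores_a[i] for i in idx])
        b = auc(y, [scores_b[i] for i in idx])
        diffs.append(a - b)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Because both models are scored on the same resampled patients, the interval reflects the paired difference rather than two independent estimates. An interval that excludes zero would support the claim that the gain is not a resampling artifact.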

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments highlighting areas where additional evidence and details would strengthen the presentation. We address each major comment below and commit to revisions that provide the requested quantitative support and reproducibility information.

Point-by-point responses
  1. Referee: [Abstract] The headline AUC gains (0.601→0.8496 survival, 0.672→0.8102 recurrence) are attributed to the decomposition layer + gating network + fusion objective, yet the text supplies only aggregate means with no per-missing-rate performance curves, no controlled ablation isolating the gating network's contribution, and no statistical tests (e.g., paired t-tests or bootstrap CIs) comparing against baselines. This leaves open whether gains arise from the claimed redundancy-aware mechanism or from encoder capacity and dataset specifics.

    Authors: We agree that the abstract reports only aggregate means and that stronger isolation of the proposed mechanisms is needed. In the revised manuscript we will add per-missing-rate performance curves, controlled ablations that isolate the gating network (and other components) while holding encoder capacity fixed, and statistical comparisons (paired t-tests and bootstrap CIs) against baselines. These additions will directly address whether the reported gains derive from the redundancy-aware decomposition and gating rather than other factors.

    Revision: yes

  2. Referee: [Abstract] The claim that the framework “remains well-defined” and yields “robust representations even under severe missing-modality conditions” is load-bearing for the central contribution, but no quantitative support (e.g., ablation tables varying the number of absent modalities or gating-weight statistics) is provided beyond the overall means; the weakest assumption, that the input-conditioned gating reliably separates signals when multiple modalities are absent, therefore lacks direct evidence.

    Authors: We acknowledge that direct quantitative evidence for behavior under multiple missing modalities is required to support the central claim. The revision will include ablation tables that systematically vary the number of absent modalities together with gating-weight statistics (means, variances, and distributions) across these conditions. This will provide concrete evidence that the input-conditioned gating continues to separate informative from redundant signals even when several modalities are absent.

    Revision: yes

  3. Referee: [Abstract] No information is given on baseline fairness (identical encoders and training protocols), data splits, handling of missingness patterns, or multiple-run variance, making it impossible to assess whether the reported improvements are reproducible or confounded by implementation details.

    Authors: We agree that these implementation and experimental details are essential for assessing reproducibility. The revised manuscript will explicitly document that all baselines used identical encoders and training protocols, describe the data-splitting procedure (including stratification by missingness patterns), explain how missing modalities were simulated and handled during training and inference, and report performance variance across multiple independent runs with different random seeds.

    Revision: yes
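
The gating-weight statistics promised in the second response could be tabulated with something like the following sketch. The data layout (one list of gate weights and one presence mask per patient) and the function name `gate_stats_by_missing` are hypothetical; the rebuttal does not say how the authors would compute or present these numbers.

```python
from collections import defaultdict
from statistics import mean, pvariance

def gate_stats_by_missing(gates, present):
    """Summarize gate weights grouped by how many modalities are absent.

    gates:   per-sample lists of gate weights, one per modality
    present: per-sample lists of bools, False where the modality is missing
    Returns {n_missing: {modality_index: (mean, variance, count)}},
    computed over samples where that modality is present.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for g, p in zip(gates, present):
        n_missing = p.count(False)
        for m, (w, is_present) in enumerate(zip(g, p)):
            if is_present:
                buckets[n_missing][m].append(w)
    return {
        k: {m: (mean(v), pvariance(v), len(v)) for m, v in mods.items()}
        for k, mods in buckets.items()
    }

# Toy example with three modalities; the third is absent for two patients.
stats = gate_stats_by_missing(
    gates=[[0.5, 0.5, 0.0], [0.25, 0.75, 0.0], [0.2, 0.3, 0.5]],
    present=[[True, True, False], [True, True, False], [True, True, True]],
)
```

Comparing the per-modality means across the `n_missing` groups would show whether the gate shifts weight toward the remaining modalities in a consistent way as inputs disappear.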

Circularity Check

0 steps flagged

No circularity: empirical architecture evaluated on external benchmarks

Full rationale

The manuscript describes a multimodal gated fusion framework evaluated via standard train/test splits on two independent challenge datasets (HANCOCK N=763, HECKTOR N=588). Reported AUC gains (0.601→0.8496 survival, 0.672→0.8102 recurrence, 0.975 HPV) are measured quantities on held-out data, not quantities defined by the same fitted parameters. No equations appear that equate any performance metric to an input by construction, no self-citation chain is invoked to justify uniqueness of the gating or decomposition layer, and no ansatz is smuggled via prior work. The central claims rest on empirical comparison rather than self-referential definitions, satisfying the self-contained criterion against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no model equations, hyperparameter lists, or explicit assumptions are provided, so the ledger cannot be populated beyond noting that neural-network weights constitute fitted parameters whose values are not reported.

pith-pipeline@v0.9.1-grok · 5919 in / 1120 out tokens · 22353 ms · 2026-06-27T22:40:35.259275+00:00 · methodology

