arxiv: 2603.29977 · v2 · submitted 2026-03-31 · 💻 cs.LG · cs.AI · q-bio.QM

Recognition: 1 theorem link


Quantifying Cross-Modal Interactions in Multimodal Glioma Survival Prediction via InterSHAP: Evidence for Additive Signal Integration

Iain Swift, JingHua Ye, Ruairi O'Reilly


Pith reviewed 2026-05-13 23:41 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · q-bio.QM

keywords multimodal fusion, glioma survival, cross-modal interaction, shapley values, cox models, additive variance, rna-seq, whole-slide images


The pith

Multimodal glioma survival models gain performance by adding image and RNA signals rather than learning their interactions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests the widespread assumption that multimodal models improve cancer survival predictions through synergistic interactions between data types such as whole-slide images and RNA-seq. It adapts the InterSHAP metric to Cox proportional hazards models and applies it to four fusion architectures on TCGA glioma data. The central result is an inverse relationship: stronger predictive models show equal or lower interaction contributions, while variance decomposition attributes most power to stable additive effects from each modality separately. This finding matters for model design because it indicates that performance improvements do not require increasingly complex cross-modal learning.

Core claim

Adapting InterSHAP to survival models on TCGA-GBM and TCGA-LGG data (n=575), the study finds that architectures with rising C-index from 0.64 to 0.82 display cross-modal interaction shares falling from 4.8% to 3.0%. Variance decomposition shows consistent additive splits across all tested fusion strategies, with whole-slide images contributing approximately 40% and RNA-seq approximately 55%, leaving interaction at roughly 4%. Performance therefore stems from complementary signal aggregation instead of learned synergy between modalities.

What carries the argument

InterSHAP, the Shapley interaction index metric adapted to Cox proportional hazards models, which isolates the fraction of output variance due to cross-modal interactions between whole-slide image and RNA-seq features.
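
For two modalities the Shapley interaction index has a closed form, which makes the quantity described here easy to illustrate. The sketch below is not the paper's implementation. It assumes that a modality is masked by replacing it with a baseline vector (here the cohort mean), that the model returns a scalar log-risk per patient, and that shares are formed from mean absolute contributions. The names `decompose_two_modalities`, `contribution_shares`, and `toy_log_risk` are hypothetical.

```python
import numpy as np

def decompose_two_modalities(f, wsi, rna, wsi_base, rna_base):
    """Split f(wsi, rna) - f(baseline) into two main effects and one interaction.

    f maps (wsi_batch, rna_batch) to a 1-D array of log-risk scores.
    Masking a modality means substituting its baseline vector (an assumption).
    """
    wsi_b = np.broadcast_to(wsi_base, wsi.shape)
    rna_b = np.broadcast_to(rna_base, rna.shape)
    f_none = f(wsi_b, rna_b)  # both modalities masked
    f_wsi = f(wsi, rna_b)     # only WSI present
    f_rna = f(wsi_b, rna)     # only RNA present
    f_both = f(wsi, rna)      # both present
    main_wsi = f_wsi - f_none
    main_rna = f_rna - f_none
    # Two-player Shapley interaction: what is left after both main effects.
    interaction = f_both - f_wsi - f_rna + f_none
    return main_wsi, main_rna, interaction

def contribution_shares(main_wsi, main_rna, interaction):
    """Normalise mean absolute contributions so that they sum to 1."""
    parts = np.array([np.abs(main_wsi).mean(),
                      np.abs(main_rna).mean(),
                      np.abs(interaction).mean()])
    total = parts.sum()
    return parts / total if total > 0 else np.zeros(3)

def toy_log_risk(wsi, rna, gamma):
    """Hypothetical log-risk: additive terms plus a gamma-weighted cross term."""
    return wsi.sum(axis=1) + rna.sum(axis=1) + gamma * wsi[:, 0] * rna[:, 0]

rng = np.random.default_rng(0)
wsi = rng.normal(size=(200, 4))   # stand-in for slide-level embeddings
rna = rng.normal(size=(200, 6))   # stand-in for expression features
wsi_base, rna_base = wsi.mean(axis=0), rna.mean(axis=0)

additive_shares = contribution_shares(*decompose_two_modalities(
    lambda w, r: toy_log_risk(w, r, 0.0), wsi, rna, wsi_base, rna_base))
synergy_shares = contribution_shares(*decompose_two_modalities(
    lambda w, r: toy_log_risk(w, r, 2.0), wsi, rna, wsi_base, rna_base))
print("additive model [WSI, RNA, interaction]:", additive_shares)
print("synergy model  [WSI, RNA, interaction]:", synergy_shares)
```

A purely additive model yields an interaction share of zero up to floating-point error, while the cross term produces a clearly positive share. The choice of baseline affects all three numbers, so any comparison across architectures should hold it fixed.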

If this is right

Performance gains arise from better aggregation of independent modality signals rather than from modeling interactions.
Cross-modal interactions remain small and stable at around 4% of variance no matter which fusion architecture is used.
Simpler fusion methods can reach high discrimination without added complexity for interaction terms.
The metric supplies an auditing method to compare fusion strategies by separating additive from interactive contributions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Designers could favor separate feature extractors over joint layers when building multimodal survival models.
The same additive pattern may appear in other cancers or modality pairs and could be checked by reusing the metric.
Low interaction reduces the need for joint training across sites, supporting privacy-preserving federated setups.

Load-bearing premise

The adaptation of InterSHAP from classification to Cox survival models measures genuine cross-modal interactions without distortion from architecture or preprocessing choices.

What would settle it

An experiment showing a high-C-index fusion architecture with interaction contribution above 5% or one where the interaction share changes sharply under different preprocessing would falsify the stability and small-size claims.

Figures

Figures reproduced from arXiv: 2603.29977 by Iain Swift, JingHua Ye, Ruairi O'Reilly.

**Figure 1.** Fusion architectures evaluated. (A) Early Fusion MLP (8.8M params). (B) Cross-Attention (1.8M params). (C) Bilinear Fusion (0.54M params). (D) Gated Fusion (3.2M params). Histopathology: WSIs were processed at 20× magnification into non-overlapping 224×224 patches. ResNet-50 (ImageNet-pretrained) extracted 2,048-dimensional features per patch, averaged to slide-level embeddings [1]. Transcriptomics: RNA-Se…

Original abstract

Multimodal deep learning for cancer prognosis is commonly assumed to benefit from synergistic cross-modal interactions, yet this assumption has not been directly tested in survival prediction settings. This work adapts InterSHAP, a Shapley interaction index-based metric, from classification to Cox proportional hazards models and applies it to quantify cross-modal interactions in glioma survival prediction. Using TCGA-GBM and TCGA-LGG data (n=575), we evaluate four fusion architectures combining whole-slide image (WSI) and RNA-seq features. Our central finding is an inverse relationship between predictive performance and measured interaction: architectures achieving superior discrimination (C-index 0.64 → 0.82) exhibit equivalent or lower cross-modal interaction (4.8% → 3.0%). Variance decomposition reveals stable additive contributions across all architectures (WSI ≈ 40%, RNA ≈ 55%, Interaction ≈ 4%), indicating that performance gains arise from complementary signal aggregation rather than learned synergy. These findings provide a practical model auditing tool for comparing fusion strategies, reframe the role of architectural complexity in multimodal fusion, and have implications for privacy-preserving federated deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows low cross-modal interactions in glioma survival models using adapted InterSHAP, indicating additive fusion drives gains, but the metric adaptation lacks validation.


The central finding here is that multimodal glioma survival models achieve better performance through additive combination of imaging and genomic signals rather than through cross-modal synergies, with measured interactions staying low at around 4% across different fusion architectures. The authors adapt the InterSHAP interaction index to Cox proportional hazards models and apply it to TCGA data with 575 cases combining whole-slide images and RNA-seq. They test four fusion strategies and report that higher C-index values correspond to equal or lower interaction percentages, while the variance breakdown remains stable with WSI contributing roughly 40 percent, RNA 55 percent, and interactions only 4 percent. This challenges the usual assumption that multimodal gains come from learned synergies and instead points to complementary signal coverage as the main driver.

The work is new in its specific use of this adapted metric for survival prediction in glioma, and it offers a concrete auditing method for comparing fusion approaches. The consistency of the additive pattern across models is a clear positive, and the numbers do not reduce directly to the performance metric itself.

The soft spots are mainly around the unvalidated adaptation step. There are no reported checks on whether replacing classification outputs with Cox linear predictors preserves the properties of the Shapley interaction index under censoring or varying model depths. Without error bars or significance tests on the percentage contributions, the inverse relationship between performance and interaction remains suggestive rather than definitive. If the adaptation introduces architecture-dependent bias, the conclusion that simpler additive models suffice could be overstated.

This paper is aimed at researchers in multimodal medical AI who care about interpretability and model simplicity. Readers working on fusion strategies or clinical deployment would find the auditing tool useful. It shows clear thinking in engaging the literature on synergies versus additivity. I would recommend sending it for peer review so that the metric adaptation and statistical details can be examined closely.

Referee Report

2 major / 2 minor

Summary. This paper adapts the InterSHAP metric from classification to Cox proportional hazards models to quantify cross-modal interactions in multimodal glioma survival prediction. Using TCGA-GBM and TCGA-LGG data (n=575), it evaluates four fusion architectures combining WSI and RNA-seq features and reports an inverse relationship between predictive performance (C-index 0.64 to 0.82) and measured cross-modal interaction (4.8% to 3.0%), with variance decomposition showing stable additive contributions (WSI ≈40%, RNA ≈55%, interaction ≈4%) across architectures, concluding that performance gains arise from complementary additive signal integration rather than learned synergy.

Significance. If the InterSHAP adaptation to Cox models is validated as unbiased, the results would provide concrete evidence against the assumption that multimodal fusion benefits primarily from synergistic interactions in survival settings. This reframes architectural choices toward simpler additive models, supplies a practical auditing metric for fusion strategies, and carries implications for efficient deployment including privacy-preserving federated scenarios.

major comments (2)

[Methods] Methods section: The adaptation replaces classification logits with the Cox partial likelihood or linear predictor for the Shapley interaction index, but the manuscript supplies no simulation studies on synthetic censored data with known interaction strengths, no comparison to alternative interaction metrics, and no analysis of potential bias from censoring rates or baseline hazard estimation; this is load-bearing because the central claim of an inverse performance-interaction relationship and the 3–4.8% interaction range rest entirely on the metric's fidelity.
[Results] Results section (referenced via abstract ranges): The reported C-index values and interaction percentages are given as point estimates without error bars, cross-validation standard deviations, or statistical tests for differences across the four architectures, so it is impossible to determine whether the claimed inverse relationship (higher C-index with lower interaction) is robust or could be explained by sampling variability.
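
The simulation study requested in the first major comment could start from data like the following. This is a minimal sketch under stated assumptions, not something taken from the manuscript: one scalar feature per modality, a constant baseline hazard of 1 (so event times are exponential), independent exponential censoring, and zero-baseline masking for the ground-truth share. The function names and the coefficients 0.8 and 1.0 are hypothetical.

```python
import numpy as np

def simulate_censored_cohort(n, gamma, censor_scale=2.0, seed=0,
                             beta_wsi=0.8, beta_rna=1.0):
    """Draw a toy cohort whose true log-risk has a known cross-modal term.

    log_risk = beta_wsi * x_wsi + beta_rna * x_rna + gamma * x_wsi * x_rna
    gamma = 0 gives a purely additive ground truth.
    """
    rng = np.random.default_rng(seed)
    x_wsi = rng.normal(size=n)
    x_rna = rng.normal(size=n)
    log_risk = beta_wsi * x_wsi + beta_rna * x_rna + gamma * x_wsi * x_rna
    # Proportional hazards with baseline hazard 1: T ~ Exp(rate = exp(log_risk)).
    event_time = rng.exponential(scale=np.exp(-log_risk))
    # Independent censoring; a smaller censor_scale censors more patients.
    censor_time = rng.exponential(scale=censor_scale, size=n)
    time = np.minimum(event_time, censor_time)
    event = (event_time <= censor_time).astype(int)
    return x_wsi, x_rna, time, event

def true_interaction_share(x_wsi, x_rna, gamma, beta_wsi=0.8, beta_rna=1.0):
    """Ground-truth interaction share under zero-baseline masking."""
    parts = np.array([np.abs(beta_wsi * x_wsi).mean(),
                      np.abs(beta_rna * x_rna).mean(),
                      np.abs(gamma * x_wsi * x_rna).mean()])
    return parts[2] / parts.sum()

x_wsi, x_rna, time, event = simulate_censored_cohort(n=500, gamma=0.5)
print("censoring rate:", 1.0 - event.mean())
print("true interaction share:", true_interaction_share(x_wsi, x_rna, 0.5))
```

A validation run would fit each fusion architecture to such cohorts across a grid of `gamma` and `censor_scale` values, then compare the interaction share recovered by the adapted metric with the known ground truth.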

minor comments (2)

[Abstract] Abstract: The cohort size n=575 is stated but the breakdown between TCGA-GBM and TCGA-LGG cases, the censoring rate, and the exact train/validation/test splits are not provided, which would aid reproducibility of the variance decomposition.
[Discussion] Discussion: The claim that lower interaction aids privacy-preserving federated deployment is asserted without any supporting calculation or reference to how interaction strength correlates with information leakage in multimodal settings.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive review. We address each major comment below and will revise the manuscript to incorporate additional validation and statistical reporting.


Referee: [Methods] Methods section: The adaptation replaces classification logits with the Cox partial likelihood or linear predictor for the Shapley interaction index, but the manuscript supplies no simulation studies on synthetic censored data with known interaction strengths, no comparison to alternative interaction metrics, and no analysis of potential bias from censoring rates or baseline hazard estimation; this is load-bearing because the central claim of an inverse performance-interaction relationship and the 3–4.8% interaction range rest entirely on the metric's fidelity.

Authors: We acknowledge that explicit validation simulations for the Cox adaptation are absent from the current manuscript. In the revised version we will add simulation experiments on synthetic censored survival data with controlled ground-truth interaction strengths, vary censoring rates, and compare the adapted InterSHAP against alternative interaction metrics. We will also report sensitivity to baseline hazard estimation. These additions will directly substantiate the metric's fidelity and the reported interaction range. revision: yes
Referee: [Results] Results section (referenced via abstract ranges): The reported C-index values and interaction percentages are given as point estimates without error bars, cross-validation standard deviations, or statistical tests for differences across the four architectures, so it is impossible to determine whether the claimed inverse relationship (higher C-index with lower interaction) is robust or could be explained by sampling variability.

Authors: We agree that uncertainty measures and statistical testing are required. The revised manuscript will report C-index and interaction values together with cross-validation standard deviations, include error bars on all relevant figures, and add statistical comparisons (e.g., paired tests with multiple-comparison correction) across the four architectures. These changes will allow readers to assess the robustness of the observed inverse relationship. revision: yes
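
One way to attach uncertainty to a C-index is sketched below. It uses a percentile bootstrap over patients, which is a different choice from the cross-validation standard deviations and paired tests the authors mention, and it is shown only to make the idea concrete. The C-index follows Harrell's pairwise definition; the function names and the toy data are hypothetical.

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Fraction of comparable pairs in which the higher risk fails first.

    A pair (i, j) is comparable when patient i has an observed event and
    time[i] < time[j]. Tied risk scores count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

def bootstrap_ci(time, event, risk, n_boot=100, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the C-index (resampling patients)."""
    rng = np.random.default_rng(seed)
    n = len(time)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        c = harrell_c_index(time[idx], event[idx], risk[idx])
        if not np.isnan(c):
            stats.append(c)
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Toy cohort: risk scores that truly drive exponential event times.
rng = np.random.default_rng(1)
risk = rng.normal(size=40)
event_time = rng.exponential(scale=np.exp(-risk))
censor_time = rng.exponential(scale=2.0, size=40)
time = np.minimum(event_time, censor_time)
event = event_time <= censor_time

point = harrell_c_index(time, event, risk)
lo, hi = bootstrap_ci(time, event, risk)
print(f"C-index {point:.3f}, 95% bootstrap interval [{lo:.3f}, {hi:.3f}]")
```

The same resampling loop can wrap an interaction-share computation instead of the C-index, which would give intervals for both quantities in the claimed inverse relationship.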

Circularity Check

0 steps flagged

No circularity: InterSHAP-derived interactions computed independently of C-index


The paper adapts InterSHAP to Cox proportional hazards models and applies the metric to trained fusion architectures to obtain interaction percentages and variance decomposition. These quantities are direct outputs of the Shapley-based computation on model predictions rather than parameters fitted to or defined by the reported C-index values. No equation equates interaction strength to discrimination performance, and the observed inverse relationship is an empirical result from separate measurements. The derivation relies on external metric adaptation and data application without self-definitional reduction, fitted-input renaming, or load-bearing self-citation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the validity of adapting InterSHAP to Cox models and on the assumption that the four tested fusion architectures are representative of multimodal survival modeling practice.

axioms (1)

domain assumption: InterSHAP can be validly adapted from classification to Cox proportional hazards models without introducing systematic bias in interaction estimates
The abstract states the adaptation was performed but provides no validation or sensitivity analysis for this step.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/BranchSelection.lean branch_selection echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Variance decomposition reveals stable additive contributions across all architectures (WSI≈40%, RNA≈55%, Interaction≈4%)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

[1] Steyaert, S., Qiu, Y.L., Zheng, Y., Mukherjee, P., Vogel, H., Gevaert, O.: Multimodal deep learning to predict prognosis in adult and pediatric brain tumors. Communications Medicine 3(1), 44 (2023). Nature Publishing Group

[2] Zheng, Y., Carrillo-Perez, F., Pizurica, M., Heiland, D.H., Gevaert, O.: Spatial cellular architecture predicts prognosis in glioblastoma. Nature Communications 14, 4122 (2023). Nature Publishing Group

[3] Bakas, S., Reyes, M., Jakab, A., Bauer, S., Rempfler, M., Crimi, A., Shinohara, R.T., Berger, C., Ha, S.M., Rozycki, M., et al.: Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv preprint arXiv:1811.02629 (2018)

[4] Mobadersany, P., Yousefi, S., Amgad, M., Gutman, D.A., Barnholtz-Sloan, J.S., Velazquez Vega, J.E., Brat, D.J., Cooper, L.A.D.: Predicting cancer outcomes from histology and genomics using convolutional networks. Proceedings of the National Academy of Sciences 115(13), E2970–E2979 (2018). National Academy of Sciences

[5] Wenderoth, L., Hemker, K., Simidjievski, N., Jamnik, M.: Measuring cross-modal interactions in multimodal models. In: Proceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI-25), pp. 21501–21509. AAAI Press (2025)

[6] Grabisch, M., Roubens, M.: An axiomatic approach to the concept of interaction among players in cooperative games. International Journal of Game Theory 28(4), 547–565 (1999). Springer

[7] Lundberg, S.M., Lee, S.-I.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems 30, pp. 4765–4774. Curran Associates (2017)

[8] Lipton, Z.C.: The mythos of model interpretability. Queue 16(3), 31–57 (2018). ACM

[9] Rudin, C.: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1(5), 206–215 (2019). Nature Publishing Group

[10] Kaufman, S., Rosset, S., Perlich, C., Stitelman, O.: Leakage in data mining: Formulation, detection, and avoidance. ACM Transactions on Knowledge Discovery from Data 6(4), Article 15, 1–21 (2012). ACM

[11] Cox, D.R.: Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological) 34(2), 187–220 (1972). Wiley

[12] Ilse, M., Tomczak, J., Welling, M.: Attention-based deep multiple instance learning. In: Proceedings of the 35th International Conference on Machine Learning (ICML), pp. 2127–2136. PMLR (2018)

[13] Lu, M.Y., Williamson, D.F.K., Chen, T.Y., Chen, R.J., Barbieri, M., Mahmood, F.: Data-efficient and weakly supervised computational pathology on whole-slide images. Nature Biomedical Engineering 5, 555–570 (2021). Nature Publishing Group

[14] McMahan, H.B., Moore, E., Ramage, D., Hampson, S., Agüera y Arcas, B.: Communication-efficient learning of deep networks from decentralized data. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 1273–1282. PMLR (2017)

[15] The Cancer Imaging Archive: TCGA-GBM data collection (2016). doi:10.7937/K9/TCIA.2016.RNYFUYE9

[16] The Cancer Imaging Archive: TCGA-LGG data collection (2016). doi:10.7937/K9/TCIA.2016.L4LTD3TK