pith. machine review for the scientific record.

arxiv: 2603.29977 · v2 · submitted 2026-03-31 · cs.LG · cs.AI · q-bio.QM

Quantifying Cross-Modal Interactions in Multimodal Glioma Survival Prediction via InterSHAP: Evidence for Additive Signal Integration

Pith reviewed 2026-05-13 23:41 UTC · model grok-4.3

classification: cs.LG · cs.AI · q-bio.QM
keywords: multimodal fusion · glioma survival · cross-modal interaction · shapley values · cox models · additive variance · rna-seq · whole-slide images

The pith

Multimodal glioma survival models gain performance by adding image and RNA signals rather than learning their interactions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests the widespread assumption that multimodal models improve cancer survival predictions through synergistic interactions between data types such as whole-slide images and RNA-seq. It adapts the InterSHAP metric to Cox proportional hazards models and applies it to four fusion architectures on TCGA glioma data. The central result is an inverse relationship: stronger predictive models show equal or lower interaction contributions, while variance decomposition attributes most power to stable additive effects from each modality separately. This finding matters for model design because it indicates that performance improvements do not require increasingly complex cross-modal learning.

Core claim

Adapting InterSHAP to survival models on TCGA-GBM and TCGA-LGG data (n=575), the study finds that as the C-index rises from 0.64 to 0.82 across the four fusion architectures, the cross-modal interaction share stays equivalent or falls, moving from 4.8% to 3.0%. Variance decomposition shows a consistent additive split across all tested fusion strategies, with whole-slide images contributing approximately 40% and RNA-seq approximately 55%, leaving interaction at roughly 4%. Performance therefore stems from complementary signal aggregation rather than learned synergy between modalities.
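The C-index quoted above is a rank statistic over comparable patient pairs. The following is a minimal sketch of Harrell's concordance index, assuming no tied survival times and treating risk ties as half-concordant; the function name and toy data are illustrative and are not taken from the paper.

```python
import numpy as np

def concordance_index(time, event, risk):
    """Harrell's C: fraction of comparable pairs that the risk score orders correctly."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a pair is anchored on a subject whose event was observed
        for j in range(len(time)):
            if time[j] > time[i]:  # subject j outlived subject i
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # shorter survival got the higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk counts as half
    return concordant / comparable

# Toy cohort: the third subject is censored at t=6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.2, 0.4]
print(concordance_index(times, events, risks))  # 1.0
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.64 and 0.82 sit.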

What carries the argument

InterSHAP, a metric based on the Shapley interaction index and here adapted to Cox proportional hazards models, which isolates the fraction of output variance due to cross-modal interactions between whole-slide image and RNA-seq features.
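With only two modalities, the pairwise Shapley interaction reduces to a four-term difference of model outputs, which makes the idea easy to illustrate. The sketch below rests on several assumptions that are not confirmed by the source: a missing modality is replaced by its dataset mean, the attributed output is the model's log-risk, and shares are normalised mean absolute magnitudes rather than the paper's variance decomposition. `two_modality_shares` and `toy_log_risk` are hypothetical names.

```python
import numpy as np

def two_modality_shares(f, wsi, rna):
    """Split f(wsi, rna) into WSI, RNA and interaction shares.

    Assumption: an 'absent' modality is replaced by its dataset mean.
    """
    wsi_base = np.broadcast_to(wsi.mean(axis=0), wsi.shape)
    rna_base = np.broadcast_to(rna.mean(axis=0), rna.shape)
    v_none = f(wsi_base, rna_base)  # both modalities masked
    v_wsi = f(wsi, rna_base)        # only WSI present
    v_rna = f(wsi_base, rna)        # only RNA present
    v_both = f(wsi, rna)            # full model output
    main_wsi = v_wsi - v_none
    main_rna = v_rna - v_none
    # Two-player Shapley interaction: what the pair adds beyond both main effects.
    interaction = v_both - v_wsi - v_rna + v_none
    mags = np.array([np.abs(main_wsi).mean(),
                     np.abs(main_rna).mean(),
                     np.abs(interaction).mean()])
    return dict(zip(("wsi", "rna", "interaction"), mags / mags.sum()))

# Toy stand-in for a trained fusion model: additive terms plus one product term.
rng = np.random.default_rng(0)
wsi = rng.normal(size=(200, 4))
rna = rng.normal(size=(200, 6))
a, b = rng.normal(size=4), rng.normal(size=6)

def toy_log_risk(w, r, gamma=0.3):
    return w @ a + r @ b + gamma * w[:, 0] * r[:, 0]

print(two_modality_shares(toy_log_risk, wsi, rna))
```

Setting `gamma=0.0` makes the toy model purely additive, and the interaction share then vanishes up to floating-point error, which is the behaviour a valid interaction metric should show.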

If this is right

  • Performance gains arise from better aggregation of independent modality signals rather than from modeling interactions.
  • Cross-modal interactions remain small and stable at around 4% of variance no matter which fusion architecture is used.
  • Simpler fusion methods can reach high discrimination without added complexity for interaction terms.
  • The metric supplies an auditing method to compare fusion strategies by separating additive from interactive contributions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Designers could favor separate feature extractors over joint layers when building multimodal survival models.
  • The same additive pattern may appear in other cancers or modality pairs and could be checked by reusing the metric.
  • Low interaction reduces the need for joint training across sites, supporting privacy-preserving federated setups.

Load-bearing premise

The adaptation of InterSHAP from classification to Cox survival models measures genuine cross-modal interactions without distortion from architecture or preprocessing choices.
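For orientation, the standard Cox model and its log partial likelihood (reference [11]) can be written as follows. The notation is illustrative and not taken from the paper: $f_\theta$ is the network's log-risk output, $\delta_i$ the event indicator, $R(t_i)$ the set of subjects still at risk at time $t_i$, and $b_w$, $b_r$ are assumed baseline inputs for a masked modality.

```latex
h(t \mid x) = h_0(t)\,\exp\bigl(f_\theta(x)\bigr),
\qquad
\ell(\theta) = \sum_{i:\,\delta_i = 1}
  \Bigl[\, f_\theta(x_i) - \log \sum_{j \in R(t_i)} \exp\bigl(f_\theta(x_j)\bigr) \Bigr]

% Two-modality interaction on the log-risk scale (assumed value function):
\Delta(x_w, x_r) = f_\theta(x_w, x_r) - f_\theta(x_w, b_r) - f_\theta(b_w, x_r) + f_\theta(b_w, b_r)
```

The baseline hazard $h_0(t)$ cancels in the partial likelihood, so $f_\theta$ is a natural scalar to attribute. The choice of scale matters: a model of the form $f_\theta = g(x_w) + h(x_r)$ gives $\Delta = 0$ on the log-risk scale but is multiplicative, and therefore interacting, on the hazard scale. Which output the paper attributes is not stated here, so this is a sketch of the premise rather than the authors' definition.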

What would settle it

An experiment showing a high-C-index fusion architecture with interaction contribution above 5% or one where the interaction share changes sharply under different preprocessing would falsify the stability and small-size claims.

Figures

Figures reproduced from arXiv: 2603.29977 by Iain Swift, JingHua Ye, Ruairi O'Reilly.

Figure 1: Fusion architectures evaluated. (A) Early Fusion MLP (8.8M params). (B) Cross-Attention (1.8M params). (C) Bilinear Fusion (0.54M params). (D) Gated Fusion (3.2M params). Histopathology: WSIs were processed at 20× magnification into non-overlapping 224×224 patches. ResNet-50 (ImageNet-pretrained) extracted 2,048-dimensional features per patch, averaged to slide-level embeddings [1]. Transcriptomics: RNA-Se…
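The histopathology step in the caption (non-overlapping patches, per-patch features, mean pooling) can be sketched as below. This is an illustration under assumptions: ragged image edges are dropped, and `encoder` is a hypothetical stand-in for the ResNet-50 feature extractor, which in the paper yields 2,048-dimensional vectors.

```python
import numpy as np

def tile_patches(image, size=224):
    """Cut an H x W x C array into non-overlapping size x size patches.

    Assumption: partial patches at the right and bottom edges are discarded.
    """
    h, w, c = image.shape
    rows, cols = h // size, w // size
    cropped = image[: rows * size, : cols * size]
    patches = cropped.reshape(rows, size, cols, size, c).swapaxes(1, 2)
    return patches.reshape(rows * cols, size, size, c)

def slide_embedding(patches, encoder):
    """Average per-patch feature vectors into one slide-level vector."""
    features = np.stack([encoder(p) for p in patches])
    return features.mean(axis=0)

# Stand-in encoder: per-channel mean instead of a pretrained CNN.
def mean_colour(patch):
    return patch.mean(axis=(0, 1))

image = np.random.default_rng(0).random((500, 700, 3))
patches = tile_patches(image)
print(patches.shape)                                 # (6, 224, 224, 3)
print(slide_embedding(patches, mean_colour).shape)   # (3,)
```

Mean pooling discards patch position, so the slide embedding is invariant to patch order.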
Original abstract

Multimodal deep learning for cancer prognosis is commonly assumed to benefit from synergistic cross-modal interactions, yet this assumption has not been directly tested in survival prediction settings. This work adapts InterSHAP, a Shapley interaction index-based metric, from classification to Cox proportional hazards models and applies it to quantify cross-modal interactions in glioma survival prediction. Using TCGA-GBM and TCGA-LGG data (n=575), we evaluate four fusion architectures combining whole-slide image (WSI) and RNA-seq features. Our central finding is an inverse relationship between predictive performance and measured interaction: architectures achieving superior discrimination (C-index 0.64→0.82) exhibit equivalent or lower cross-modal interaction (4.8%→3.0%). Variance decomposition reveals stable additive contributions across all architectures (WSI≈40%, RNA≈55%, Interaction≈4%), indicating that performance gains arise from complementary signal aggregation rather than learned synergy. These findings provide a practical model auditing tool for comparing fusion strategies, reframe the role of architectural complexity in multimodal fusion, and have implications for privacy-preserving federated deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. This paper adapts the InterSHAP metric from classification to Cox proportional hazards models to quantify cross-modal interactions in multimodal glioma survival prediction. Using TCGA-GBM and TCGA-LGG data (n=575), it evaluates four fusion architectures combining WSI and RNA-seq features and reports an inverse relationship between predictive performance (C-index 0.64 to 0.82) and measured cross-modal interaction (4.8% to 3.0%), with variance decomposition showing stable additive contributions (WSI ≈40%, RNA ≈55%, interaction ≈4%) across architectures, concluding that performance gains arise from complementary additive signal integration rather than learned synergy.

Significance. If the InterSHAP adaptation to Cox models is validated as unbiased, the results would provide concrete evidence against the assumption that multimodal fusion benefits primarily from synergistic interactions in survival settings. This reframes architectural choices toward simpler additive models, supplies a practical auditing metric for fusion strategies, and carries implications for efficient deployment including privacy-preserving federated scenarios.

major comments (2)
  1. [Methods] Methods section: The adaptation replaces classification logits with the Cox partial likelihood or linear predictor for the Shapley interaction index, but the manuscript supplies no simulation studies on synthetic censored data with known interaction strengths, no comparison to alternative interaction metrics, and no analysis of potential bias from censoring rates or baseline hazard estimation; this is load-bearing because the central claim of an inverse performance-interaction relationship and the 3–4.8% interaction range rest entirely on the metric's fidelity.
  2. [Results] Results section (referenced via abstract ranges): The reported C-index values and interaction percentages are given as point estimates without error bars, cross-validation standard deviations, or statistical tests for differences across the four architectures, so it is impossible to determine whether the claimed inverse relationship (higher C-index with lower interaction) is robust or could be explained by sampling variability.
minor comments (2)
  1. [Abstract] Abstract: The cohort size n=575 is stated but the breakdown between TCGA-GBM and TCGA-LGG cases, the censoring rate, and the exact train/validation/test splits are not provided, which would aid reproducibility of the variance decomposition.
  2. [Discussion] Discussion: The claim that lower interaction aids privacy-preserving federated deployment is asserted without any supporting calculation or reference to how interaction strength correlates with information leakage in multimodal settings.
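Major comment 2 asks for uncertainty estimates on the reported metrics. One common way to obtain them is a paired percentile bootstrap over patients; the sketch below applies it to the difference in C-index between two models. It is an illustration only: the function names, the toy data, and the choice of bootstrap are assumptions, not the analysis the authors ran or promised.

```python
import numpy as np

def c_index(time, event, risk):
    """Harrell's C, vectorised; risk ties count one half, NaN if no comparable pair."""
    comparable = (event[:, None] == 1) & (time[None, :] > time[:, None])
    if comparable.sum() == 0:
        return float("nan")
    r_i, r_j = risk[:, None], risk[None, :]
    score = (r_i > r_j) + 0.5 * (r_i == r_j)
    return (score * comparable).sum() / comparable.sum()

def bootstrap_ci(time, event, risk_a, risk_b, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for C(model A) - C(model B).

    The same resampled patients are scored by both models, so the comparison is paired.
    """
    rng = np.random.default_rng(seed)
    n = len(time)
    diffs = np.empty(n_boot)
    for k in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample patients with replacement
        diffs[k] = (c_index(time[idx], event[idx], risk_a[idx])
                    - c_index(time[idx], event[idx], risk_b[idx]))
    lo, hi = np.nanpercentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Toy cohort: model A ranks patients perfectly, model B is pure noise.
rng = np.random.default_rng(1)
time = rng.exponential(size=200)
event = (rng.random(200) < 0.7).astype(int)
risk_a = -time
risk_b = rng.normal(size=200)
print(bootstrap_ci(time, event, risk_a, risk_b, n_boot=200))
```

An interval that excludes zero suggests the two architectures differ in discrimination beyond resampling noise; the same resampling loop could wrap an interaction-share computation in place of `c_index`.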

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive review. We address each major comment below and will revise the manuscript to incorporate additional validation and statistical reporting.

  1. Referee: [Methods] Methods section: The adaptation replaces classification logits with the Cox partial likelihood or linear predictor for the Shapley interaction index, but the manuscript supplies no simulation studies on synthetic censored data with known interaction strengths, no comparison to alternative interaction metrics, and no analysis of potential bias from censoring rates or baseline hazard estimation; this is load-bearing because the central claim of an inverse performance-interaction relationship and the 3–4.8% interaction range rest entirely on the metric's fidelity.

    Authors: We acknowledge that explicit validation simulations for the Cox adaptation are absent from the current manuscript. In the revised version we will add simulation experiments on synthetic censored survival data with controlled ground-truth interaction strengths, vary censoring rates, and compare the adapted InterSHAP against alternative interaction metrics. We will also report sensitivity to baseline hazard estimation. These additions will directly substantiate the metric's fidelity and the reported interaction range. revision: yes

  2. Referee: [Results] Results section (referenced via abstract ranges): The reported C-index values and interaction percentages are given as point estimates without error bars, cross-validation standard deviations, or statistical tests for differences across the four architectures, so it is impossible to determine whether the claimed inverse relationship (higher C-index with lower interaction) is robust or could be explained by sampling variability.

    Authors: We agree that uncertainty measures and statistical testing are required. The revised manuscript will report C-index and interaction values together with cross-validation standard deviations, include error bars on all relevant figures, and add statistical comparisons (e.g., paired tests with multiple-comparison correction) across the four architectures. These changes will allow readers to assess the robustness of the observed inverse relationship. revision: yes
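The simulation study promised in the first response needs censored data whose true interaction strength is known in advance. A minimal sketch of such a generator follows, assuming exponential event and censoring times, two scalar stand-in features, and a single product term controlled by `gamma`; all names and coefficients are hypothetical and chosen only for illustration.

```python
import numpy as np

def log_risk(x_w, x_r, gamma):
    """Known ground truth: two additive terms plus one cross-modal product term."""
    return 0.8 * x_w + 1.0 * x_r + gamma * x_w * x_r

def simulate(n=2000, gamma=0.0, censor_hazard=0.2, seed=0):
    """Draw right-censored survival data with hazard exp(log_risk)."""
    rng = np.random.default_rng(seed)
    x_w = rng.normal(size=n)  # stand-in for a WSI-derived feature
    x_r = rng.normal(size=n)  # stand-in for an RNA-derived feature
    hazard = np.exp(log_risk(x_w, x_r, gamma))
    event_time = rng.exponential(scale=1.0 / hazard)
    censor_time = rng.exponential(scale=1.0 / censor_hazard, size=n)
    time = np.minimum(event_time, censor_time)
    event = (event_time <= censor_time).astype(int)  # 1 = event observed
    return x_w, x_r, time, event

def true_interaction_share(x_w, x_r, gamma):
    """Interaction share of the true log-risk, using zero (the feature mean) as baseline."""
    zero = np.zeros_like(x_w)
    base = log_risk(zero, zero, gamma)
    main_w = log_risk(x_w, zero, gamma) - base
    main_r = log_risk(zero, x_r, gamma) - base
    inter = log_risk(x_w, x_r, gamma) - log_risk(x_w, zero, gamma) \
        - log_risk(zero, x_r, gamma) + base
    mags = np.array([np.abs(main_w).mean(), np.abs(main_r).mean(), np.abs(inter).mean()])
    return mags[2] / mags.sum()

for gamma in (0.0, 0.25, 0.5):
    x_w, x_r, time, event = simulate(gamma=gamma)
    print(gamma, true_interaction_share(x_w, x_r, gamma), 1 - event.mean())
```

A validation run would fit each fusion architecture to such data, apply the adapted metric, and compare the measured share against `true_interaction_share` while varying `censor_hazard`.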

Circularity Check

0 steps flagged

No circularity: InterSHAP-derived interactions computed independently of C-index

The paper adapts InterSHAP to Cox proportional hazards models and applies the metric to trained fusion architectures to obtain interaction percentages and variance decomposition. These quantities are direct outputs of the Shapley-based computation on model predictions rather than parameters fitted to or defined by the reported C-index values. No equation equates interaction strength to discrimination performance, and the observed inverse relationship is an empirical result from separate measurements. The derivation relies on external metric adaptation and data application without self-definitional reduction, fitted-input renaming, or load-bearing self-citation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the validity of adapting InterSHAP to Cox models and on the assumption that the four tested fusion architectures are representative of multimodal survival modeling practice.

axioms (1)
  • domain assumption: InterSHAP can be validly adapted from classification to Cox proportional hazards models without introducing systematic bias in interaction estimates
    The abstract states the adaptation was performed but provides no validation or sensitivity analysis for this step.

pith-pipeline@v0.9.0 · 5523 in / 1240 out tokens · 36737 ms · 2026-05-13T23:41:56.763447+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/BranchSelection.lean branch_selection echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    Variance decomposition reveals stable additive contributions across all architectures (WSI≈40%, RNA≈55%, Interaction≈4%)

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

  1. Steyaert, S., Qiu, Y.L., Zheng, Y., Mukherjee, P., Vogel, H., Gevaert, O.: Multimodal deep learning to predict prognosis in adult and pediatric brain tumors. Communications Medicine 3(1), 44 (2023). Nature Publishing Group
  2. Zheng, Y., Carrillo-Perez, F., Pizurica, M., Heiland, D.H., Gevaert, O.: Spatial cellular architecture predicts prognosis in glioblastoma. Nature Communications 14, 4122 (2023). Nature Publishing Group
  3. Bakas, S., Reyes, M., Jakab, A., Bauer, S., Rempfler, M., Crimi, A., Shinohara, R.T., Berger, C., Ha, S.M., Rozycki, M., et al.: Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv preprint arXiv:1811.02629 (2018)
  4. Mobadersany, P., Yousefi, S., Amgad, M., Gutman, D.A., Barnholtz-Sloan, J.S., Velazquez Vega, J.E., Brat, D.J., Cooper, L.A.D.: Predicting cancer outcomes from histology and genomics using convolutional networks. Proceedings of the National Academy of Sciences 115(13), E2970–E2979 (2018). National Academy of Sciences
  5. Wenderoth, L., Hemker, K., Simidjievski, N., Jamnik, M.: Measuring cross-modal interactions in multimodal models. In: Proceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI-25), pp. 21501–21509. AAAI Press (2025)
  6. Grabisch, M., Roubens, M.: An axiomatic approach to the concept of interaction among players in cooperative games. International Journal of Game Theory 28(4), 547–565 (1999). Springer
  7. Lundberg, S.M., Lee, S.-I.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems 30, pp. 4765–4774. Curran Associates (2017)
  8. Lipton, Z.C.: The mythos of model interpretability. Queue 16(3), 31–57 (2018). ACM
  9. Rudin, C.: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1(5), 206–215 (2019). Nature Publishing Group
  10. Kaufman, S., Rosset, S., Perlich, C., Stitelman, O.: Leakage in data mining: Formulation, detection, and avoidance. ACM Transactions on Knowledge Discovery from Data 6(4), Article 15, 1–21 (2012). ACM
  11. Cox, D.R.: Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological) 34(2), 187–220 (1972). Wiley
  12. Ilse, M., Tomczak, J., Welling, M.: Attention-based deep multiple instance learning. In: Proceedings of the 35th International Conference on Machine Learning (ICML), pp. 2127–2136. PMLR (2018)
  13. Lu, M.Y., Williamson, D.F.K., Chen, T.Y., Chen, R.J., Barbieri, M., Mahmood, F.: Data-efficient and weakly supervised computational pathology on whole-slide images. Nature Biomedical Engineering 5, 555–570 (2021). Nature Publishing Group
  14. McMahan, H.B., Moore, E., Ramage, D., Hampson, S., Agüera y Arcas, B.: Communication-efficient learning of deep networks from decentralized data. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 1273–1282. PMLR (2017)
  15. The Cancer Imaging Archive: TCGA-GBM data collection (2016). doi:10.7937/K9/TCIA.2016.RNYFUYE9
  16. The Cancer Imaging Archive: TCGA-LGG data collection (2016). doi:10.7937/K9/TCIA.2016.L4LTD3TK