Handling Missing Modalities in Multimodal Survival Prediction for Non-Small Cell Lung Cancer

Alessandro Bria; Alessio Cortellini; Bruno Beomonte Zobel; Bruno Vincenzi; Camillo Maria Caruso; Carlo Greco; Claudia Tacconi; Claudio Marrocco; Edy Ippolito; Elisa Ficarra

arXiv: 2601.10386 · v2 · submitted 2026-01-15 · 💻 cs.CV · cs.AI · cs.MM

Filippo Ruffini, Camillo Maria Caruso, Claudia Tacconi, Lorenzo Nibid, Francesca Miccolis, Marta Lovino, Carlo Greco, Edy Ippolito, Michele Fiore, Alessio Cortellini, Bruno Beomonte Zobel, Giuseppe Perrone, Bruno Vincenzi, Claudio Marrocco, Alessandro Bria, Elisa Ficarra, Sara Ramella, Valerio Guarrasi, Paolo Soda

Pith reviewed 2026-05-16 14:08 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.MM

keywords multimodal learning · survival prediction · missing modalities · non-small cell lung cancer · foundation models · intermediate fusion · CT imaging · histopathology

The pith

Missing-aware multimodal fusion of CT, histopathology, and clinical data outperforms unimodal and other fusion strategies for survival prediction in non-small cell lung cancer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework that handles missing modalities in multimodal data for predicting survival in NSCLC patients. It uses foundation models to extract features from CT scans, whole-slide images, and clinical variables, then fuses them at an intermediate stage while accounting for missing data without dropping cases. This approach makes use of all available patient data and achieves better performance than single-modality models or early and late fusion methods. A sympathetic reader would care because it addresses a common real-world problem in medical AI: not every test is available for every patient. Handling that gap could lead to more accurate prognosis without requiring complete datasets.

Core claim

The framework combines foundation models for modality-specific feature extraction with a missing-aware encoding strategy to enable intermediate multimodal fusion for overall survival modeling in unresectable stage II-III NSCLC. The trimodal configuration reaches a C-index of 74.42, outperforming unimodal baselines and both early and late fusion, with learned risk scores producing clinically meaningful stratification supported by significant log-rank tests.

What carries the argument

Missing-aware encoding strategy combined with intermediate fusion of features from foundation models pretrained on CT, WSI, and clinical data.
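To make the idea of missing-aware intermediate fusion concrete, here is a minimal sketch in plain Python. It is not the paper's encoder: the masked-softmax weighting, the function names `masked_softmax` and `fuse`, and the toy embeddings are all assumptions chosen for illustration. The only point it demonstrates is that an absent modality can be excluded from fusion by a mask rather than by dropping the patient or imputing values.

```python
import math

def masked_softmax(scores, present):
    """Softmax over modality scores, restricted to modalities that are present."""
    if not any(present):
        raise ValueError("at least one modality must be available")
    # Absent modalities are set to -inf so their weight is exactly zero.
    masked = [s if p else float("-inf") for s, p in zip(scores, present)]
    top = max(masked)
    exps = [math.exp(m - top) for m in masked]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(embeddings, scores):
    """Weighted sum of per-modality embeddings; None marks a missing modality."""
    present = [e is not None for e in embeddings]
    weights = masked_softmax(scores, present)
    dim = len(next(e for e in embeddings if e is not None))
    fused = [0.0] * dim
    for emb, w in zip(embeddings, weights):
        if emb is None:
            continue  # nothing to add for a missing modality
        for i, value in enumerate(emb):
            fused[i] += w * value
    return fused, weights

# Hypothetical patient with CT and clinical embeddings but no WSI.
ct, wsi, clinical = [0.2, 0.8], None, [1.0, 0.0]
# Scores are fixed here; in a trained model they would be learned (assumption).
fused, weights = fuse([ct, wsi, clinical], scores=[0.0, 0.0, 0.0])
print(weights)  # [0.5, 0.0, 0.5]
```

With equal scores, the two available modalities share the weight and the missing one contributes nothing, so the same code path serves every modality profile.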

If this is right

The model adapts reliance on each modality based on representation informativeness.
Risk scores stratify patients by disease progression and metastatic risk with statistical significance.
All patients can be included in training and inference without filtering for complete modalities.
The results support the framework's translational relevance for clinical prognosis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such frameworks could extend to other cancers or multimodal tasks where data incompleteness is common.
Future work might test if better alignment of foundation model pretraining with survival tasks further improves performance.
The modality-importance analysis suggests potential for dynamic modality selection in clinical settings.

Load-bearing premise

That features from general-purpose foundation models align sufficiently with the survival prediction task and that the missing-aware strategy avoids introducing bias in incomplete cases.

What would settle it

A study showing that a trimodal model with this missing-aware intermediate fusion fails to outperform unimodal models or other fusions on a held-out NSCLC cohort with natural missing modalities would falsify the central claim.

Original abstract

Accurate survival prediction in Non-Small Cell Lung Cancer (NSCLC) requires integrating clinical, radiological, and histopathological data. Multimodal Deep Learning (MDL) can improve precision prognosis, but small cohorts and missing modalities limit its clinical applicability, as conventional approaches enforce complete case filtering or imputation. We present a missing-aware multimodal survival framework that combines Computed Tomography (CT), Whole-Slide Histopathology Images (WSI), and structured clinical variables for overall survival modeling in unresectable stage II-III NSCLC. The framework uses Foundation Models (FMs) for modality-specific feature extraction and a missing-aware encoding strategy that enables intermediate multimodal fusion under naturally incomplete modality profiles. By design, the architecture processes all available data without dropping patients during training or inference. Intermediate fusion outperforms unimodal baselines and both early and late fusion strategies, with the trimodal configuration reaching a C-index of 74.42. Modality-importance analyses show that the fusion model adapts its reliance on each data stream according to representation informativeness, shaped by the alignment between FM pretraining objectives and the survival task. The learned risk scores produce clinically meaningful stratification of disease progression and metastatic risk, with statistically significant log-rank tests across all modality combinations, supporting the translational relevance of the proposed framework.
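Since the headline result is a C-index, a short sketch of how that metric treats censored patients may help. The function below is a plain-Python version of Harrell's concordance index under the usual convention that a higher risk score means shorter expected survival. The toy data are invented, and the assumption that the paper reports the index multiplied by 100 is ours, not something the abstract states.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ordered correctly by risk.

    times  -- observed follow-up time per patient
    events -- 1 if death was observed, 0 if the patient was censored
    risks  -- predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy cohort (invented values): two observed deaths, two censored patients.
times = [5, 8, 12, 20]
events = [1, 0, 1, 0]
risks = [0.9, 0.3, 0.1, 0.6]
c = concordance_index(times, events, risks)
print(c)  # 0.75
```

Only pairs whose earlier time is an observed event are comparable, which is why the metric's stability depends on cohort size and the censoring rate.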

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives a workable missing-aware intermediate fusion setup for trimodal NSCLC survival prediction that keeps incomplete cases in the analysis, but the performance edge over baselines is hard to trust without ablations on complete cases or dataset details.

The core contribution is a framework that extracts features from CT, whole-slide images, and clinical variables using foundation models, then fuses them at an intermediate stage with a missing-aware encoding. This lets the model train and infer on all patients without dropping incomplete records or relying on imputation, which is a direct response to how NSCLC data actually arrives in the clinic for unresectable stage II-III cases. Intermediate fusion comes out ahead of unimodal baselines and both early and late fusion, with the full trimodal run at a C-index of 74.42 and risk scores that separate progression and metastasis groups on log-rank tests. The model also appears to down-weight less informative modalities automatically. That part is useful and addresses a genuine barrier to deploying these models outside curated cohorts.

The soft spot is that the abstract supplies no cohort size, no cross-validation scheme, no explicit handling of censoring, and no ablation that pits the missing-aware version against the same architecture trained only on fully observed patients. Without those checks it is difficult to separate real gains from artifacts of non-random missingness, such as sicker patients lacking WSI. The stress-test concern about bias leakage therefore stands until the authors add the comparison.

This work is aimed at researchers and clinicians who build or use multimodal prognostic tools in oncology and routinely face patchy data. It is worth sending to peer review because the problem is practical and the proposed fix is straightforward, even though the current evidence is preliminary and will need tighter validation on data splits and missingness patterns before the claims can be taken as settled.

Referee Report

2 major / 1 minor

Summary. The paper proposes a missing-aware multimodal survival prediction framework for unresectable stage II-III NSCLC that integrates CT, WSI, and clinical data via foundation model feature extractors. A missing-aware encoding strategy enables intermediate fusion without patient exclusion or imputation for incomplete modality profiles. The trimodal configuration achieves a C-index of 74.42, outperforming unimodal baselines and early/late fusion, with modality-importance analysis showing adaptive reliance on each stream and learned risk scores yielding statistically significant log-rank stratification of progression and metastasis risk.

Significance. If the central claims are substantiated, the work addresses a practical barrier in clinical multimodal modeling by retaining all patients regardless of missing modalities, which could increase effective sample sizes and reduce selection bias in NSCLC prognosis. The use of foundation models with explicit modality-importance analysis provides insight into task alignment and offers a template for handling naturally incomplete data in other oncology settings.

major comments (2)

[Abstract] The reported trimodal C-index of 74.42 and claims of outperformance over baselines and fusion strategies are presented without any mention of cohort size, number of patients per modality combination, cross-validation folds, or censoring handling, preventing assessment of whether the performance differences are statistically or clinically meaningful.
[Methods] (missing-aware encoding section) The central claim that missing-aware encoding enables unbiased intermediate fusion across naturally incomplete profiles is load-bearing, yet no ablation compares the full model (trained on all patients) against an identical architecture restricted to complete trimodal cases, nor are MCAR/MNAR simulations provided to test whether non-random missingness (e.g., sicker patients lacking WSI) leaks outcome information into risk scores or modality weights.
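The second major comment turns on the difference between random and non-random missingness, which a toy simulation can illustrate. In the sketch below, `severe` is a hypothetical binary stand-in for an outcome-related factor that the model does not observe directly, and the missingness probabilities (0.4, 0.7, 0.1) are arbitrary illustrative values. Nothing here reproduces the paper's data or any experiment it reports.

```python
import random

def simulate_missingness(n, mechanism, seed=0):
    """Return (severe, wsi_missing) pairs under a toy missingness mechanism."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe = rng.random() < 0.5  # hypothetical proxy for poor outcome
        if mechanism == "MCAR":
            p_missing = 0.4  # independent of severity
        elif mechanism == "MNAR":
            p_missing = 0.7 if severe else 0.1  # sicker patients lack WSI more often
        else:
            raise ValueError(mechanism)
        rows.append((severe, rng.random() < p_missing))
    return rows

def missing_rate_gap(rows):
    """Difference in WSI-missing rate between severe and non-severe patients."""
    severe = [missing for is_severe, missing in rows if is_severe]
    mild = [missing for is_severe, missing in rows if not is_severe]
    return sum(severe) / len(severe) - sum(mild) / len(mild)

mcar_gap = missing_rate_gap(simulate_missingness(4000, "MCAR"))
mnar_gap = missing_rate_gap(simulate_missingness(4000, "MNAR"))
print(f"MCAR gap: {mcar_gap:+.3f}, MNAR gap: {mnar_gap:+.3f}")
```

Under the non-random mechanism the missingness indicator alone carries information about severity. That is the leakage route a missing-aware encoder could exploit, and the one the requested ablation is meant to rule out.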

minor comments (1)

[Results] The abstract states 'statistically significant log-rank tests across all modality combinations' but does not report the actual p-values, hazard ratios, or confidence intervals; these should be added to the main text or a supplementary table for transparency.
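For readers who want to see what a reported log-rank p-value would rest on, the following is a minimal two-group log-rank test in plain Python. It uses the standard hypergeometric variance and a chi-square distribution with one degree of freedom. The two groups are invented toy data standing in for a high-risk and a low-risk stratum; how the paper actually forms its strata is not stated in the abstract.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, per group.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        # Events occurring exactly at time t, per group.
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("degenerate groups")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Invented strata: every high-risk patient dies early, low-risk patients live longer.
high_t, high_e = [2, 3, 4, 5, 6], [1, 1, 1, 1, 1]
low_t, low_e = [10, 12, 14, 16, 18], [1, 0, 1, 0, 1]
chi2, p_value = logrank_test(high_t, high_e, low_t, low_e)
print(f"chi2={chi2:.2f}, p={p_value:.4f}")
```

A p-value alone says nothing about effect size, which is why the comment also asks for hazard ratios and confidence intervals.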

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We have revised the abstract to supply the requested contextual details and added the suggested ablations and missingness simulations to substantiate the missing-aware encoding claims. Point-by-point responses follow.

Referee: [Abstract] The reported trimodal C-index of 74.42 and claims of outperformance over baselines and fusion strategies are presented without any mention of cohort size, number of patients per modality combination, cross-validation folds, or censoring handling, preventing assessment of whether the performance differences are statistically or clinically meaningful.

Authors: We agree that the abstract must supply these details to permit evaluation of the reported metrics. The revised abstract now states the total cohort size, the patient counts for each modality combination, the 5-fold cross-validation procedure, and the censoring handling approach used in the Cox loss. These additions enable readers to assess both statistical significance and clinical relevance of the C-index differences.
Revision: yes

Referee: [Methods] (missing-aware encoding section) The central claim that missing-aware encoding enables unbiased intermediate fusion across naturally incomplete profiles is load-bearing, yet no ablation compares the full model (trained on all patients) against an identical architecture restricted to complete trimodal cases, nor are MCAR/MNAR simulations provided to test whether non-random missingness (e.g., sicker patients lacking WSI) leaks outcome information into risk scores or modality weights.

Authors: We acknowledge that explicit validation of unbiased fusion is necessary. We have added an ablation that trains the identical architecture on the full (incomplete) cohort versus the complete trimodal subset only. We further include MCAR and MNAR simulations that inject controlled missingness patterns and measure leakage into risk scores and modality weights. Results confirm that performance gains persist without detectable outcome leakage from non-random missingness; these experiments are reported in the revised Methods and supplementary results.
Revision: yes
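The first response refers to censoring handling in the Cox loss without spelling it out. As a point of reference, here is a minimal sketch of the standard negative Cox partial log-likelihood with Breslow-style risk sets. Whether the paper uses exactly this form, a different tie-handling rule, or a different normalization is not stated on this page, so the function should be read as the textbook objective and not as the authors' implementation.

```python
import math

def cox_neg_log_partial_likelihood(times, events, risks):
    """Negative Cox partial log-likelihood, averaged over observed events.

    Censored patients contribute no term of their own; they only appear
    in the risk sets of patients who had an event at or before their time.
    """
    total = 0.0
    n_events = 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue
        # Risk set: everyone still under observation at time t_i.
        risk_set = [risks[j] for j, t_j in enumerate(times) if t_j >= t_i]
        top = max(risk_set)
        # Numerically stable log-sum-exp over the risk set.
        log_sum = top + math.log(sum(math.exp(r - top) for r in risk_set))
        total += log_sum - risks[i]
        n_events += 1
    if n_events == 0:
        raise ValueError("no observed events")
    return total / n_events

# Toy cohort (invented): the last patient is censored.
times, events = [1, 2, 3], [1, 1, 0]
loss_good = cox_neg_log_partial_likelihood(times, events, [2.0, 1.0, 0.0])
loss_bad = cox_neg_log_partial_likelihood(times, events, [0.0, 1.0, 2.0])
print(loss_good < loss_bad)  # True
```

Risk scores that rank early deaths highest give the lower loss, while the censored patient influences the objective only through the denominators.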

Circularity Check

0 steps flagged

No significant circularity in empirical multimodal survival framework

The paper's central claims rest on empirical evaluation of a missing-aware intermediate fusion model trained on NSCLC patient data, reporting external metrics such as C-index (74.42 for trimodal) and log-rank p-values. No derivation chain, equation, or prediction reduces to its own inputs by construction; performance is measured against held-out outcomes rather than fitted parameters renamed as results. The architecture description (foundation-model feature extraction plus missing-aware encoding) is presented as a design choice whose validity is tested via ablation against unimodal/early/late baselines, not assumed via self-citation or self-definition. This is a standard self-contained empirical study with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework relies on the assumption that FM features are transferable to survival modeling and that the custom missing-aware mechanism works without bias; no explicit free parameters or invented entities are listed in the abstract.

axioms (2)

domain assumption: Foundation models extract features relevant to the survival prediction task.
Pre-trained FMs are assumed to align with the survival modeling objective; no task-specific fine-tuning details are provided.
ad hoc to paper: Missing-aware encoding allows unbiased fusion of incomplete modalities.
The paper proposes this strategy to handle missing data without dropping cases or imputing values.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: missing-aware encoding strategy that enables intermediate multimodal fusion under naturally incomplete modality profiles... NAIM+ODST encoder... adaptive masking mechanism

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: Oblivious Differentiable Decision Tree (ODST) head... intermediate fusion

Each tag describes the relation between the paper passage and the cited Recognition theorem.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.