arxiv: 2604.09197 · v1 · submitted 2026-04-10 · 💻 cs.CV · cs.AI

Vision Transformers for Preoperative CT-Based Prediction of Histopathologic Chemotherapy Response Score in High-Grade Serous Ovarian Carcinoma

Francesca Fati, Felipe Coutinho, Marika Reinius, Marina Rosanu, Gabriel Funingana, Luigi De Vitis, Gabriella Schivardi, Hannah Clayton, Alice Traversa, Zeyu Gao, Guilherme Penteado, Shangqi Gao, Francesco Pastori, Ramona Woitek, Maria Cristina Ghioni, Giovanni Damiano Aletti, Mercedes Jimenez-Linan, Sarah Burge, Nicoletta Colombo, Evis Sala, Maria Francesca Spadea, Timothy L. Kline, James D. Brenton, Jaime Cardoso, Francesco Multinu, Elena De Momi, Mireia Crispin-Ortuzar, Ines P. Machado


Pith reviewed 2026-05-10 17:00 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords Vision Transformer · CT imaging · high-grade serous ovarian carcinoma · chemotherapy response score · neoadjuvant chemotherapy · multimodal fusion · preoperative prediction


The pith

A multimodal Vision Transformer predicts the post-treatment chemotherapy response score in ovarian cancer from pre-treatment CT scans and clinical data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether routine preoperative CT imaging and basic clinical variables can forecast the Chemotherapy Response Score, a validated post-surgical histopathological measure of how well high-grade serous ovarian tumors responded to neoadjuvant chemotherapy. Because the score is only available after surgery, an early non-invasive estimate could inform multidisciplinary team discussions about expected treatment response for patients who are unsuitable for primary cytoreduction. The authors build a 2.5D model that feeds lesion-dense omental slices into a pre-trained Vision Transformer, then merges the image embeddings with clinical features before predicting the CRS category. Strong internal results on one hospital's data (ROC-AUC 0.95) contrast with a clear drop on an external set (ROC-AUC 0.68), illustrating both the promise and the practical limits of cross-site generalization. If the approach holds, it would supply an investigational adjunct that narrows uncertainty before chemotherapy starts.

Core claim

The central claim is that a 2.5D multimodal framework processing omental CT slices with a pre-trained Vision Transformer encoder, fused at an intermediate stage with clinical variables, can preoperatively predict the histopathological Chemotherapy Response Score in high-grade serous ovarian carcinoma patients receiving neoadjuvant chemotherapy, reaching 0.95 ROC-AUC and 95% accuracy on the internal test cohort (n=41) and 0.68 ROC-AUC and 67% accuracy on an external cohort (n=70).

What carries the argument

A 2.5D multimodal pipeline that extracts representations from selected CT slices via a pre-trained Vision Transformer and combines them with clinical variables through an intermediate fusion module to output CRS class probabilities.
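The fusion step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the ViT encoder is replaced by precomputed per-slice embedding vectors, slices are mean-pooled into one patient-level vector, each modality passes through its own small dense layer before concatenation, and all weights are random. The names `fuse_and_classify` and `init_layer`, the layer sizes, and the choice of three output classes are hypothetical.

```python
import math
import random

def linear(x, w, b):
    """Dense layer: w holds one weight row per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def init_layer(rng, n_out, n_in):
    """Random weights and zero biases (hypothetical, untrained)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def fuse_and_classify(slice_embeddings, clinical, params):
    # 2.5D step: pool per-slice embeddings into one patient-level vector
    # (mean pooling is an assumption, not taken from the paper).
    n = len(slice_embeddings)
    img = [sum(col) / n for col in zip(*slice_embeddings)]
    # Intermediate fusion: project each modality, then concatenate the
    # hidden representations before the shared classification head.
    h_img = relu(linear(img, *params["img"]))
    h_cli = relu(linear(clinical, *params["cli"]))
    fused = h_img + h_cli  # list concatenation
    return softmax(linear(fused, *params["head"]))

rng = random.Random(0)
params = {
    "img": init_layer(rng, 4, 8),   # 8-dim stand-in for a ViT embedding
    "cli": init_layer(rng, 4, 3),   # 3 hypothetical clinical variables
    "head": init_layer(rng, 3, 8),  # 3 output classes (assumed)
}
slices = [[rng.random() for _ in range(8)] for _ in range(5)]
probs = fuse_and_classify(slices, [0.6, 1.0, 0.2], params)
```

Merging after each modality has its own learned projection is what separates intermediate fusion from concatenating raw inputs (early fusion) or combining per-modality predictions (late fusion).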

If this is right

Pre-treatment estimates of CRS could be discussed in multidisciplinary team (MDT) meetings to set expectations about the likelihood of successful cytoreduction after chemotherapy.
Patients predicted to have poor response might be considered for alternative regimens or clinical trials earlier in the pathway.
The same imaging-clinical fusion approach could be tested on other response biomarkers that are currently only measurable postoperatively.
Routine clinical CT data already acquired for staging would suffice, avoiding the need for additional specialized scans.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If domain shift remains the main obstacle, future work could test simple harmonization or federated training to stabilize performance across hospitals.
The same slice-selection and fusion strategy might apply to predicting response in other heterogeneous solid tumors where neoadjuvant therapy is standard.
Combining the current imaging signal with emerging blood-based or genomic markers could raise external performance without requiring new imaging hardware.

Load-bearing premise

CT imaging features extracted by the Vision Transformer carry information about a tumor's future biological response to chemotherapy that is not limited to the scanner or patient population used for training.

What would settle it

An independent prospective cohort of at least 100 patients in which the model's predicted CRS categories show no statistically significant association with actual post-treatment histopathology, for example an external ROC-AUC statistically indistinguishable from 0.5.
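For readers less familiar with the metric, ROC-AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, so 0.5 corresponds to chance-level ranking. The sketch below computes it by direct pairwise comparison. It assumes a binary framing (one response category against the rest), which the abstract does not confirm; `roc_auc` is a hypothetical helper and the labels and scores are toy values, not study data.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of ROC-AUC for binary labels 0/1."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # A positive scored above a negative counts 1, a tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
# A model that gives every patient the same score ranks at chance level.
chance = roc_auc([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5])  # 0.5
```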

Original abstract

Purpose. High-grade serous ovarian carcinoma (HGSOC) is characterized by pronounced biological and spatial heterogeneity and is frequently diagnosed at an advanced stage. Neoadjuvant chemotherapy (NACT) followed by delayed primary surgery is commonly employed in patients unsuitable for primary cytoreduction. The Chemotherapy Response Score (CRS) is a validated histopathological biomarker of response to NACT, but it is only available postoperatively. In this study, we investigate whether pre-treatment computed tomography (CT) imaging and clinical data can be used to predict CRS as an investigational decision-support adjunct to inform multidisciplinary team (MDT) discussions regarding expected treatment response.

Methods. We proposed a 2.5D multimodal deep learning framework that processes lesion-dense omental slices using a pre-trained Vision Transformer encoder and integrates the resulting visual representations with clinical variables through an intermediate fusion module to predict CRS.

Results. Our multimodal model, integrating imaging and clinical data, achieved a ROC-AUC of 0.95 alongside 95% accuracy and 80% precision on the internal test cohort (IEO, n=41 patients). On the external test set (OV04, n=70 patients), it achieved a ROC-AUC of 0.68, alongside 67% accuracy and 75% precision.

Conclusion. These preliminary results demonstrate the feasibility of transformer-based deep learning for preoperative prediction of CRS in HGSOC using routine clinical data and CT imaging. As an investigational, pre-treatment decision-support tool, this approach may assist MDT discussions by providing early, non-invasive estimates of treatment response.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper applies a 2.5D ViT with clinical fusion to predict CRS preoperatively in HGSOC and reports strong internal numbers that drop on external data, leaving the feasibility claim only partially supported.


The main thing here is that they adapted a pre-trained Vision Transformer to handle 2.5D omental CT slices plus clinical variables for predicting the post-NACT histopathology score before surgery. Internal test performance hits 0.95 AUC on 41 patients, which would look usable for MDT support if it were reliable, while the external cohort of 70 patients falls to 0.68 AUC. That gap is the clearest signal in the work.

They do something straightforward but useful by selecting lesion-dense slices and using intermediate fusion rather than early or late concatenation. Reporting both internal and external results is better than the usual single-center setup, and the clinical framing around neoadjuvant decision support matches a real need in advanced HGSOC, where CRS is only known postoperatively.

The internal result is the strongest part on the page, but the small test size combined with a high-capacity model makes the 0.95 AUC vulnerable to split bias or overfitting. The external drop is consistent with scanner or population differences and does not rescue the internal claim. Without seeing the exact train-test handling, augmentation strategy, or any capacity ablations, it is hard to tell how much of the internal win is real versus artifact.

This is the kind of paper that would interest medical imaging researchers working on transformers for CT or clinicians in gynecologic oncology who want early response estimates. A reader already building similar multimodal models could pick up the 2.5D slice selection and fusion details as a starting point, though they would need to replicate with larger cohorts. It deserves peer review because the task is clinically relevant and they attempted external validation, even if the methods section will require heavy scrutiny on robustness and reporting.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a 2.5D multimodal deep learning framework using a pre-trained Vision Transformer to extract features from lesion-dense omental CT slices, fused with clinical variables, to predict the histopathologic Chemotherapy Response Score (CRS) in high-grade serous ovarian carcinoma patients before neoadjuvant chemotherapy. It reports strong performance on an internal test cohort of 41 patients (ROC-AUC 0.95, 95% accuracy, 80% precision) and moderate performance on an external cohort of 70 patients (ROC-AUC 0.68, 67% accuracy, 75% precision), concluding that this demonstrates feasibility for preoperative prediction as a decision-support tool.

Significance. If the central claims hold after addressing validation concerns, this work could represent a meaningful step toward non-invasive, imaging-based prediction of treatment response in HGSOC, potentially aiding multidisciplinary team discussions. The multimodal integration and use of Vision Transformers on CT data are timely, and the inclusion of external validation strengthens the preliminary findings. However, the large performance drop highlights the need for robust generalization strategies.

major comments (2)

[Abstract (Results)] The internal test cohort consists of only n=41 patients, yet the multimodal ViT model achieves an AUC of 0.95. Given the high model capacity of Vision Transformers and the absence of reported cross-validation, bootstrapped confidence intervals, or ablation studies on the train/test split, this performance risks being inflated by overfitting or split bias, which is load-bearing for the feasibility claim.
[Abstract (Results)] The substantial drop from internal AUC 0.95 to external AUC 0.68 suggests significant domain shift or unaddressed differences in CT acquisition protocols between IEO and OV04 cohorts; the manuscript should detail preprocessing, normalization, and any domain adaptation techniques used to mitigate this.
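The sample-size concern can be made concrete with a binomial confidence interval on the reported accuracies. The sketch below uses the Wilson score interval. The counts 39 of 41 and 47 of 70 are back-calculated from the reported 95% and 67% and are assumptions, since the abstract gives no raw counts; `wilson_interval` is a hypothetical helper.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Assumed counts, back-calculated from the reported percentages.
internal_lo, internal_hi = wilson_interval(39, 41)
external_lo, external_hi = wilson_interval(47, 70)
```

Under these assumed counts the internal interval spans roughly 0.84 to 0.99, so a point estimate of 95% on 41 patients is compatible with a noticeably lower true accuracy.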

minor comments (2)

[Abstract] Clarify whether CRS prediction is treated as binary or multi-class classification, as precision and accuracy metrics are reported without specifying the positive class or handling of CRS categories (typically 1-3).
[Abstract] The conclusion states 'preliminary results demonstrate the feasibility'; consider tempering this given the external performance and small internal sample.
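The first minor comment matters because precision is defined per class: in a multi-class setting a single "precision" figure is ambiguous unless the positive class or the averaging scheme is stated. The sketch below computes one-vs-rest precision for each CRS category alongside overall accuracy. The labels are illustrative toy values, not study data, and `per_class_precision` is a hypothetical helper.

```python
def per_class_precision(y_true, y_pred, classes=(1, 2, 3)):
    """One-vs-rest precision for each class; None if the class is never predicted."""
    result = {}
    for c in classes:
        # True labels of all cases the model assigned to class c.
        assigned = [t for t, p in zip(y_true, y_pred) if p == c]
        result[c] = (sum(1 for t in assigned if t == c) / len(assigned)
                     if assigned else None)
    return result

def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Toy labels over CRS categories 1 to 3 (illustrative only).
y_true = [1, 2, 3, 3, 2, 1]
y_pred = [1, 2, 3, 2, 2, 3]
precisions = per_class_precision(y_true, y_pred)
overall = accuracy(y_true, y_pred)
```

Here the same predictions yield three different precision values, one per category, which is why the reported 80% and 75% need a stated positive class to be interpretable.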

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments and recommendations. We have addressed each major comment in detail below and will incorporate the suggested revisions to enhance the manuscript's rigor and transparency regarding model validation and preprocessing details.


Referee: The internal test cohort consists of only n=41 patients, yet the multimodal ViT model achieves an AUC of 0.95. Given the high model capacity of Vision Transformers and the absence of reported cross-validation, bootstrapped confidence intervals, or ablation studies on the train/test split, this performance risks being inflated by overfitting or split bias, which is load-bearing for the feasibility claim.

Authors: We appreciate the referee's concern regarding the small internal test cohort size and the risk of overfitting. The internal dataset was partitioned at the patient level to avoid data leakage. To reduce overfitting, we utilized a pre-trained Vision Transformer with transfer learning and applied data augmentation during training. Nevertheless, we agree that additional safeguards are warranted. In the revised version, we will report 95% confidence intervals obtained via bootstrapping for all performance metrics on both internal and external cohorts. We will also include an ablation analysis comparing performance across multiple random train/test splits and clarify the exact splitting procedure in the Methods section. These additions will better substantiate the feasibility claim. revision: yes
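The bootstrapped confidence intervals promised in this response could take the following form. This is a minimal percentile-bootstrap sketch, not the authors' procedure: it resamples patients with replacement (the patient is assumed to be the resampling unit, since each patient contributes several slices) and recomputes the metric on each resample. The helper names, the 2000 resamples, and the toy labels are assumptions.

```python
import random

def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a patient-level metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample patients with replacement, keeping label/prediction pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi

# Toy cohort of 41 patients with 39 correct predictions (assumed counts).
y_true = [1] * 41
y_pred = [1] * 39 + [0] * 2
lo, hi = bootstrap_ci(y_true, y_pred, accuracy)
```

The same function accepts any metric with the signature `metric(y_true, y_pred)`, so AUC or precision can be substituted, provided the metric handles resamples in which a class happens to be absent.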
Referee: The substantial drop from internal AUC 0.95 to external AUC 0.68 suggests significant domain shift or unaddressed differences in CT acquisition protocols between IEO and OV04 cohorts; the manuscript should detail preprocessing, normalization, and any domain adaptation techniques used to mitigate this.

Authors: We thank the referee for pointing out the importance of detailing the handling of domain differences. The Methods section currently describes the CT preprocessing pipeline, which includes clipping Hounsfield units, rescaling, and selecting lesion-dense omental slices per patient based on expert annotation. No domain adaptation methods were employed, as the study aimed to assess out-of-distribution performance in a clinical-like scenario. We will revise the manuscript to provide a more comprehensive description of the preprocessing steps, including intensity normalization and slice selection criteria. Additionally, we will discuss the observed differences in CT acquisition protocols between the two cohorts in the Discussion section to better explain the performance drop and its implications for generalizability. revision: yes
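The clipping and rescaling step mentioned in this response is commonly implemented as an intensity window followed by min-max scaling. The sketch below illustrates that pattern only. The window bounds of -150 and 250 HU are hypothetical soft-tissue values chosen for illustration; the page does not state the bounds the authors used, and `clip_and_rescale` is not their function.

```python
def clip_and_rescale(hu_values, lo=-150.0, hi=250.0):
    """Clip Hounsfield units to [lo, hi], then map linearly to [0, 1].

    The default window is a hypothetical soft-tissue range, not the
    window used in the paper.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    span = hi - lo
    return [(min(max(v, lo), hi) - lo) / span for v in hu_values]

# Air (-1000 HU) and dense bone (3000 HU) saturate at the window edges.
scaled = clip_and_rescale([-1000, -150, 50, 250, 3000])
# scaled == [0.0, 0.0, 0.5, 1.0, 1.0]
```

A fixed window makes intensities comparable across scanners only to the extent that the scanners are calibrated consistently, which is one reason acquisition differences between cohorts can still cause domain shift after this step.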

Circularity Check

0 steps flagged

No circularity: standard supervised prediction on held-out data


The paper describes a conventional end-to-end supervised learning pipeline: a 2.5D multimodal ViT processes CT slices, fuses with clinical variables, and is trained to predict the postoperative CRS label. Reported ROC-AUC, accuracy, and precision are computed on explicitly held-out internal (n=41) and external (n=70) test cohorts. No equation or claim reduces the target metric to a fitted parameter by construction, no self-citation is invoked as a uniqueness theorem or load-bearing premise, and the pre-trained encoder is an external initialization rather than a redefinition of the output. The performance drop on external data is consistent with an evaluation that is independent of the training data. The evaluation is therefore self-contained and falsifiable against the external labels.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard deep-learning assumptions that CT-derived features correlate with histopathologic response and that the chosen architecture and fusion strategy are appropriate for the task; no new entities or ad-hoc axioms are introduced beyond routine medical imaging ML practice.

axioms (1)

domain assumption: CT imaging features are sufficiently informative of underlying histopathologic chemotherapy response to allow supervised learning.
Implicit premise required for any imaging-based prediction model; stated in purpose and methods sections of abstract.



Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

[1]

New England Journal of Medicine 363(10), 943–953 (2010)

Vergote, I., Tropé, C.G., Amant, F., Kristensen, G.B., Ehlen, T., Johnson, N., Verheijen, R.H.M., Burg, M.E.L., Lacave, A.J., Panici, P.B., Kenter, G.G., Casado, A., Mendiola, C., Coens, C., Verleye, L., Stuart, G.C.E., Pecorelli, S., Reed, N.S.: Neoadjuvant chemotherapy or primary surgery in stage iiic or iv ovarian cancer. New England Journal of Med...
[2]

Nature Communications 14(1), 6756 (2023)

Crispin-Ortuzar, M., Woitek, R., Reinius, M.A.V., Moore, E., Beer, L., Bura, V., Rundo, L., McCague, C., Ursprung, S., Escudero Sanchez, L., Martin-Gonzalez, P., Mouliere, F., Chandrananda, D., Morris, J., Goranova, T., Piskorz, A.M., Singh, N., Sahdev, A., Pintican, R., Zerunian, M., Rosenfeld, N., Addley, H., Jimenez-Linan, M., Markowetz, F., Sala, E., ...
[3]

In: International Workshop on Biomedical Image Registration, pp

Machado, I.P., Reithmeir, A., Kogl, F., Rundo, L., Funingana, G., Reinius, M., Mungmeeprued, G., Gao, Z., McCague, C., Kerfoot, E., Woitek, R., Sala, E., Ou, Y., Brenton, J., Schnabel, J., Crispin, M.: A self-supervised image registration approach for measuring local response patterns in metastatic ovarian cancer. In: International Workshop on Biomedic...
[4]

The Lancet 386(9990), 249–257 (2015)

Kehoe, S., Hook, J., Nankivell, M., Jayson, G.C., Kitchener, H., Lopes, T., Luesley, D., Perren, T., Bannoo, S., Mascarenhas, M., Dobbs, S., Essapen, S., Twigg, J., Herod, J., McCluggage, G., Parmar, M., Swart, A.-M.: Primary chemotherapy versus primary surgery for newly diagnosed advanced ovarian cancer (chorus): an open-label, randomised, controlled, ...
[5]

International Journal of Computer Assisted Radiology and Surgery 20(9), 1923–1929 (2025)

Drury, B., Machado, I.P., Gao, Z., Buddenkotte, T., Mahani, G., Funingana, G., Reinius, M., McCague, C., Woitek, R., Sahdev, A., Sala, E., Brenton, J.D., Crispin-Ortuzar, M.: Multi-task deep learning for automatic image segmentation and treatment response assessment in metastatic ovarian cancer. International Journal of Computer Assisted Radiology and Sur...
[6]

Journal of Clinical Oncology 43(7), 868–891 (2025)

Gaillard, S., Lacchetti, C., Armstrong, D.K., Cliby, W.A., Edelson, M.I., Garcia, A.A., Ghebre, R.G., Gressel, G.M., Lesnock, J.L., Meyer, L.A., Moore, K.N., O’Cearbhaill, R.E., Olawaiye, A.B., Salani, R., Sparacio, D., Driel, W.J., Tew, W.P.: Neoadjuvant chemotherapy for newly diagnosed, advanced ovarian cancer: Asco guideline update. Journal of Clinical...
[7]

Journal of Clinical Oncology 33(22), 2457–2463 (2015)

Böhm, S., Faruqi, A., Said, I., Lockley, M., Brockbank, E., Jeyarajah, A., Fitzpatrick, A., Ennis, D., Dowe, T., Santos, J.L., Cook, L.S., Tinker, A.V., Le, N.D., Gilks, B.C., Singh, N.: Chemotherapy response score: development and validation of a system to quantify histopathologic response to neoadjuvant chemotherapy in tubo-ovarian high-grade serous...
[8]

Gynecologic Oncology 194, 1–10 (2025)

Zannoni, G.F., Angelico, G., Spadola, S., Bragantini, E., Troncone, G., Fraggetta, F., Santoro, A.: Chemotherapy response score (crs): A comprehensive review of its prognostic and predictive value in high-grade serous carcinoma (hgsc). Gynecologic Oncology 194, 1–10 (2025)
[9]

Journal of gynecologic oncology 28(6), 73 (2017)

Lee, J.-Y., Chung, Y.S., Na, K., Kim, H.M., Park, C.K., Nam, E.J., Kim, S., Kim, S.W., Kim, Y.T., Kim, H.-S.: External validation of chemotherapy response score system for histopathological assessment of tumor regression after neoadjuvant chemotherapy in tubo-ovarian high-grade serous carcinoma. Journal of gynecologic oncology 28(6), 73 (2017)
[10]

Gynecologic Oncology 151(2), 264–268 (2018)

Rajkumar, S., Polson, A., Nath, R., Lane, G., Sayasneh, A., Jakes, A., Begum, S., Mehra, G.: Prognostic implications of histological tumor regression (böhm’s score) in patients receiving neoadjuvant chemotherapy for high grade serous tubal & ovarian carcinoma. Gynecologic Oncology 151(2), 264–268 (2018)
[11]

Elsevier (2023)

Colombo, N., Gadducci, A., Landoni, F., Lorusso, D., Sabatini, R., Artioli, G., Berardi, R., Ceccherini, R., Cecere, S.C., Cormio, G., De Angelis, C., Legge, F., Lissoni, A., Mammoliti, S., Mangili, G., Naglieri, E., Petrilla, M.C., Ricciardi, G.R.R., Ronzino, G., Salutari, V., Sambataro, D., Savarese, A., Scandurra, G., Tasca, G., Toma, F., Valabrega,...
[12]

Frontiers in oncology 12, 868265 (2022)

Rundo, L., Beer, L., Escudero Sanchez, L., Crispin-Ortuzar, M., Reinius, M., McCague, C., Sahin, H., Bura, V., Pintican, R., Zerunian, M., Ursprung, S., Allajbeu, I., Addley, H., Martin-Gonzalez, P., Buddenkotte, T., Singh, N., Sahdev, A., Funingana, I., Jimenez-Linan, M., Markowetz, F., Brenton, J.D., Sala, E., Woitek, R.: Clinically interpretable radiom...
[13]

Fati, F., Rosanu, M., De Vitis, L., Rota, A., Traversa, A., Ribero, L., Schivardi, G., Petralia, G., Aletti, G.D., Colombo, N., Peiretti, M., Angioni, S., Casarin, J., Veraldi, R., Zaffino, P., Spadea, M.F., Multinu, F., De Momi, E.: Deep learning for decision support in ovarian cancer treatment planning (2025)
[14]

European radiology experimental 7(1), 77 (2023)

Buddenkotte, T., Rundo, L., Woitek, R., Sanchez, L.E., Beer, L., Crispin-Ortuzar, M., Etmann, C., Mukherjee, S., Bura, V., McCague, C., Sahin, H., Pintican, R., Zerunian, M., Allajbeu, I., Singh, N., Anju, S., Havrilesky, L., Cohn, D.E., Bateman, N.W., Conrads, T.P., Darcy, K.M., Maxwell, G.L., Freymann, J.B., Öktem, O., Brenton, J.D., Sala, E., Schö...
[15]

International Journal of Gynecological Cancer 29(2), 353–356 (2019)

Böhm, S., Le, N., Lockley, M., Brockbank, E., Faruqi, A., Said, I., Jeyarajah, A., Wuntakal, R., Gilks, B., Singh, N.: Histopathologic response to neoadjuvant chemotherapy as a prognostic biomarker in tubo-ovarian high-grade serous carcinoma: updated chemotherapy response score (crs) results. International Journal of Gynecological Cancer 29(2), 353–356 (2019)
[16]

Diagnostics 12(3), 633 (2022)

Santoro, A., Travaglino, A., Inzani, F., Straccia, P., Arciuolo, D., Valente, M., D’Alessandris, N., Scaglione, G., Angelico, G., Piermattei, A., et al.: Prognostic value of chemotherapy response score (crs) assessed on the adnexa in ovarian high-grade serous carcinoma: a systematic review and meta-analysis. Diagnostics 12(3), 633 (2022)