Evaluation of head segmentation quality for treatment planning of tumor treating fields in brain tumors

Reuben R Shamir; Zeev Bomzon

arxiv: 1906.11014 · v1 · pith:JCYYXS6Onew · submitted 2019-06-26 · 📡 eess.IV · cs.CV

Pith reviewed 2026-05-25 15:28 UTC · model grok-4.3

keywords head segmentation · TTFields · machine learning · quality assessment · brain tumors · glioblastoma · decision tree regressor · electric field simulation

The pith

Machine learning on segmentation-relevant features can predict the quality of automatic head segmentations for TTFields treatment planning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to establish that automatic assessment of head tissue segmentation quality is feasible for Tumor Treating Fields therapy planning in brain tumors. Accurate segmentation into tissues with similar electrical properties is required to simulate electric fields and guide placement of transducer arrays on the scalp. The authors identify features tied to atlas-based segmentation, show they correlate significantly with a similarity measure against validated segmentations, and train a decision tree regressor that predicts this similarity. On 20 patients the predictions match actual similarities with 3 percent average absolute difference and correlation 0.92. This matters to a reader because manual visual checks are slow, while automatic quality scores could support refinement loops and flag when a segmentation is reliable enough for field simulations.

Core claim

A set of features relevant to atlas-based segmentation correlate significantly with similarity between validated and automatic head segmentations. These features, when supplied to a decision tree regressor, predict the similarity measures for 20 TTFields patients under leave-one-out validation with average absolute difference of 3 percent and correlation coefficient 0.92. The work therefore concludes that quality estimation of segmentations is feasible by incorporating machine learning and segmentation-relevant features.

What carries the argument

Decision tree regressor trained on atlas-based segmentation features to predict similarity between automatic and validated head segmentations.
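The evaluation loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic stand-ins (20 cases, 4 made-up features), the helper names `loocv_predictions` and `agreement` are hypothetical, and scikit-learn's `DecisionTreeRegressor` is used with default hyperparameters.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.tree import DecisionTreeRegressor


def loocv_predictions(features, targets, random_state=0):
    """Predict each case with a tree trained on all the other cases."""
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    predictions = np.empty_like(targets)
    for train_idx, test_idx in LeaveOneOut().split(features):
        model = DecisionTreeRegressor(random_state=random_state)
        model.fit(features[train_idx], targets[train_idx])
        predictions[test_idx] = model.predict(features[test_idx])
    return predictions


def agreement(predicted, actual):
    """Return (mean absolute difference, Pearson correlation coefficient)."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    mean_abs_diff = float(np.mean(np.abs(predicted - actual)))
    pearson_r = float(np.corrcoef(predicted, actual)[0, 1])
    return mean_abs_diff, pearson_r


# Synthetic stand-in data (NOT the paper's data): 20 cases, 4 made-up features.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
noise = rng.normal(scale=0.01, size=20)
y = np.clip(0.8 + 0.05 * X[:, 0] - 0.03 * X[:, 1] + noise, 0.0, 1.0)

pred = loocv_predictions(X, y)
mad, r = agreement(pred, y)
print(f"mean abs difference = {mad:.3f}, Pearson r = {r:.2f}")
```

The two numbers printed at the end play the same role as the 3 percent and 0.92 figures quoted above, but on toy data they say nothing about the real cohort.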

If this is right

Automatic refinement of segmentation parameters becomes possible without constant expert oversight.
The method can highlight specific flaw points in a segmentation for user correction.
Segmentations can be screened for sufficient accuracy before electric-field simulations are run.
The quality score supports iterative improvement of transducer-array placement recommendations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same feature-plus-regressor approach could be tested on segmentation tasks for other therapies that rely on tissue property maps.
Embedding the predictor inside clinical software might shorten the time from imaging to finalized TTFields plan.
If the similarity measure is later validated directly against field simulation error, the estimator could serve as a gatekeeper for automated planning pipelines.

Load-bearing premise

The chosen similarity measure between validated and automatic segmentations serves as a sufficient proxy for the accuracy needed to produce reliable electric-field simulations in TTFields planning.

What would settle it

Finding that large differences in the predicted similarity measure produce only negligible changes in the resulting electric-field distributions or in the recommended transducer-array positions would show the quality estimator does not track planning-relevant accuracy.

Figures

Figures reproduced from arXiv: 1906.11014 by Reuben R Shamir, Zeev Bomzon.

**Figure 2.** (a) Absolute Pearson’s correlation between computed features (columns) and the Dice coefficients of each segmented tissue (rows). The Dice coefficients were computed between the validated head segmentations and those that were computed with a new automatic segmentation method (* p < 0.05). (b) The suggested features and a decision tree regressor output predictions of Dice coefficients that are in a high c…
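For readers unfamiliar with the Dice coefficient named in the caption, here is a minimal sketch of how a per-tissue Dice score between two label maps can be computed. The function name `dice_coefficient`, the integer tissue labels, and the tiny arrays are illustrative assumptions; the paper's actual label scheme and image sizes are not given here.

```python
import numpy as np


def dice_coefficient(reference, candidate, label):
    """Dice overlap 2|A and B| / (|A| + |B|) for one tissue label."""
    ref_mask = np.asarray(reference) == label
    cand_mask = np.asarray(candidate) == label
    denom = int(ref_mask.sum()) + int(cand_mask.sum())
    if denom == 0:
        return float("nan")  # label absent from both maps: overlap undefined
    overlap = int(np.logical_and(ref_mask, cand_mask).sum())
    return 2.0 * overlap / denom


# Toy label maps (hypothetical labels: 0 = background, 1 and 2 = two tissues).
reference = np.array([[0, 1, 1],
                      [2, 2, 2]])
candidate = np.array([[0, 1, 2],
                      [2, 2, 1]])

print(dice_coefficient(reference, candidate, 1))  # 0.5
print(dice_coefficient(reference, candidate, 2))  # 2 * 2 / (3 + 3), about 0.667
```

A score of 1 means the two maps agree perfectly on that tissue and 0 means no overlap, which is why a per-tissue Dice value is a natural regression target for a quality estimator.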

Original abstract

Tumor treating fields (TTFields) is an FDA approved therapy for the treatment of Gliobastoma Multiform (GBM) and currently being investigated for additional tumor types. TTFields are delivered to the tumor through the placement of transducer arrays (TAs) placed on the patient scalp. The positions of the TAs are associated with treatment outcomes via simulations of the electric fields. Therefore, we are currently developing a method for recommending optimal placement of TAs. A key step to achieve this goal is to correctly segment the head into tissues of similar electrical properties. Visual inspection of segmentation quality is invaluable but time-consuming. Automatic quality assessment can assist in automatic refinement of the segmentation parameters, suggest flaw points to the user and indicate if the segmented method is of sufficient accuracy for TTFields simulation. As a first step in this direction, we identified a set of features that are relevant to atlas-based segmentation and show that these are significantly correlated (p < 0.05) with a similarity measure between validated and automatically computed segmentations. Furthermore, we incorporated these features in a decision tree regressor to predict the similarity of the validated and computed segmentations of 20 TTFields patients using a leave-one-out approach. The predicted similarity measures were highly correlated with the actual ones (average abs. difference 3% (SD = 3%); r = 0.92, p < 0.001). We conclude that quality estimation of segmentations is feasible by incorporating machine learning and segmentation-relevant features.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague.

The paper gets a usable r=0.92 predictor for segmentation similarity on 20 patients but never checks whether that similarity actually moves the electric-field numbers that drive TTFields planning.

The core result is straightforward: a handful of atlas-style features correlate with a segmentation similarity score, and a decision-tree regressor recovers that score at r=0.92 (mean absolute error 3%) under leave-one-out on 20 cases. That is a clean, small-scale demonstration that off-the-shelf ML can stand in for visual quality checks in this workflow. The motivation is stated plainly and the numbers are reported without obvious inflation. Credit for shipping a concrete, reproducible-looking pipeline on real patient data rather than synthetic examples.

The soft spot is exactly where the stress-test flagged it. The similarity metric is treated as a sufficient proxy for the segmentation accuracy needed to produce reliable electric-field maps, yet the paper contains no forward simulation that shows how changes in that metric translate into changes in |E| inside the tumor or at the resection cavity. Without that link, the clinical claim rests on an untested assumption. The cohort is also small, feature-selection details are thin, and there is no external validation set.

For readers already building automated TTFields planning tools, the work supplies a practical starting point and a clear next experiment. For anyone outside that narrow pipeline, the missing validation step makes the result preliminary rather than actionable.

I would send it to review with the explicit request that the authors add at least one simulation study tying segmentation error to field error; without it the paper stays too disconnected from its stated clinical goal.

Referee Report

2 major / 2 minor

Summary. The paper claims that a set of atlas-based segmentation features is significantly correlated (p<0.05) with a similarity measure between validated and automatic head segmentations, and that a decision-tree regressor trained on these features can predict the similarity measure for 20 TTFields patients with r=0.92 and a mean absolute error of 3% under leave-one-out cross-validation. It concludes that automatic quality estimation of segmentations is feasible for TTFields treatment planning.

Significance. If the chosen similarity metric is shown to be a reliable proxy for segmentation-induced errors in electric-field distributions, the approach could support automated quality control and parameter refinement in TTFields planning pipelines. The reported correlations and LOOCV performance provide preliminary evidence that the selected features carry predictive information about segmentation overlap, but the absence of any forward simulation tying the metric to field quantities limits the assessed clinical significance.

major comments (2)

[Abstract] Abstract (final paragraph) and Methods: The similarity measure is adopted as the sole target variable for both correlation analysis and regression without any reported forward simulation or sensitivity study demonstrating that changes in this measure produce clinically relevant variation in the electric-field quantities (e.g., |E| inside the tumor or resection cavity) that determine transducer-array placement. This link is load-bearing for the stated motivation.
[Results] Results (regressor evaluation): The manuscript provides no description of the feature-selection procedure, whether selection was performed inside or outside the LOOCV loop, or any hyperparameter tuning details for the decision tree. On a cohort of only 20 patients this omission directly affects the credibility of the reported r=0.92 and 3% mean absolute error.

minor comments (2)

[Abstract] Abstract: 'Gliobastoma Multiform' should read 'Glioblastoma Multiforme'.
[Abstract] Abstract: 'abs. difference' and 'SD =' should be written out consistently for readability.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We appreciate the referee's thorough review and constructive suggestions. Below we provide point-by-point responses to the major comments.

Referee: [Abstract] Abstract (final paragraph) and Methods: The similarity measure is adopted as the sole target variable for both correlation analysis and regression without any reported forward simulation or sensitivity study demonstrating that changes in this measure produce clinically relevant variation in the electric-field quantities (e.g., |E| inside the tumor or resection cavity) that determine transducer-array placement. This link is load-bearing for the stated motivation.

Authors: The referee correctly identifies that our study uses the segmentation similarity measure as the target without direct validation against electric field simulation errors. This is a valid point regarding the strength of the clinical motivation. As the work is presented as a preliminary study on the feasibility of using machine learning for quality estimation, we will revise the abstract and discussion to better contextualize the similarity measure as a proxy and to outline the need for future sensitivity analyses linking it to field distributions.

revision: partial

Referee: [Results] Results (regressor evaluation): The manuscript provides no description of the feature-selection procedure, whether selection was performed inside or outside the LOOCV loop, or any hyperparameter tuning details for the decision tree. On a cohort of only 20 patients this omission directly affects the credibility of the reported r=0.92 and 3% mean absolute error.

Authors: We agree that additional details on the feature selection procedure and hyperparameter settings are necessary for full transparency, especially given the small sample size. The features were pre-selected based on the statistically significant correlations reported earlier in the manuscript, with the selection performed on the entire dataset prior to LOOCV. The decision tree regressor was implemented using default hyperparameters from the scikit-learn library without further tuning. We will update the Methods section to explicitly describe this process and discuss its implications for the LOOCV results.

revision: yes
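The distinction the referee raises, selection inside versus outside the LOOCV loop, can be made concrete. The sketch below repeats the feature selection within every fold so the held-out case never influences which features are kept. It is an illustration under assumptions, not the procedure used in the paper: the data are synthetic, the helper names `select_features` and `nested_loocv` are hypothetical, the p < 0.05 Pearson criterion simply mirrors the threshold quoted in the abstract, and falling back to all features when none pass is an arbitrary choice made here.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import LeaveOneOut
from sklearn.tree import DecisionTreeRegressor


def select_features(X_train, y_train, alpha=0.05):
    """Indices of columns whose Pearson correlation with y has p < alpha."""
    keep = [j for j in range(X_train.shape[1])
            if pearsonr(X_train[:, j], y_train)[1] < alpha]
    # Assumption: fall back to all columns if nothing passes the threshold.
    return keep if keep else list(range(X_train.shape[1]))


def nested_loocv(X, y, alpha=0.05, random_state=0):
    """LOOCV in which feature selection sees only the training fold."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    preds = np.empty_like(y)
    for train_idx, test_idx in LeaveOneOut().split(X):
        cols = select_features(X[train_idx], y[train_idx], alpha)
        model = DecisionTreeRegressor(random_state=random_state)
        model.fit(X[train_idx][:, cols], y[train_idx])
        preds[test_idx] = model.predict(X[test_idx][:, cols])
    return preds


# Synthetic stand-in data (NOT the paper's data): only column 0 is informative.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 6))
y = 0.8 + 0.05 * X[:, 0] + rng.normal(scale=0.01, size=20)

preds = nested_loocv(X, y)
print("mean abs difference:", float(np.mean(np.abs(preds - y))))
```

Comparing the error from this loop with the error obtained when features are chosen once on all 20 cases gives a rough sense of how much optimism the pre-selection step may have introduced.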

standing simulated objections not resolved

Forward simulation studies to establish the relationship between the similarity measure and clinically relevant electric field variations, which are outside the scope of the present work.

Circularity Check

0 steps flagged

No significant circularity; standard ML prediction of independent similarity metric

full rationale

The paper extracts atlas-relevant features, demonstrates their correlation (p<0.05) with a Dice-like similarity measure between validated and automatic segmentations, and trains a decision-tree regressor via LOOCV to predict that measure (r=0.92). This is a conventional supervised regression setup where features are independently computed quantities and the target is an external validation metric; the regressor output is not equivalent to any fitted parameter or input by construction. No self-citations, self-definitional steps, or imported uniqueness claims appear in the text. The derivation chain is self-contained against the reported cross-validation benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the unstated premise that atlas-based segmentation features capture the aspects of head anatomy that matter for electric-field accuracy; no explicit free parameters, axioms, or invented entities are named in the abstract.

Reference graph

Works this paper leans on

9 extracted references · 9 canonical work pages

[1] Stupp, R., Taillibert, S., Kanner, A., Read, W., Steinberg, D., Lhermitte, B., Toms, S., Idbaih, A., Ahluwalia, M.S., Fink, K., Di Meco, F., Lieberman, F., Zhu, J.-J., Stragliotto, G., Tran, D., Brem, S., Hottinger, A., Kirson, E.D., Lavy-Shahaf, G., Weinberg, U., Kim, C.-Y., Paek, S.-H., Nicholas, G., Bruna, J., Hirte, H., Weller, M., Palti, Y., Hegi, ... (2017)

[2] Bomzon, Z., Hershkovich, H.S., Urman, N., Chaudhry, A., Garcia-Carracedo, D., Korshoej, A.R., Weinberg, U., Wenger, C., Miranda, P., Wasserman, Y., Kirson, E.D., Yoram: Using computational phantoms to improve delivery of Tumor Treating Fields (TTFields) to patients. In: 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Bio...

[3] Huang, Y., Parra, L.C.: Fully Automated Whole-Head Segmentation with Improved Smoothness and Continuity, with Theory Reviewed. PLoS One. 10, e0125477 (2015)

[4] Gerig, G., Jomier, M., Chakos, M.: Valmet: A New Validation Tool for Assessing and Improving 3D Object Segmentation. Presented at the October 14 (2001)

[5] Warfield, S.K., Zou, K.H., Wells, W.M.: Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. IEEE Trans. Med. Imaging. 23, 903–21 (2004)

[6] Commowick, O., Akhondi-Asl, A., Warfield, S.K.: Estimating a reference standard segmentation with spatially varying performance parameters: local MAP STAPLE. IEEE Trans. Med. Imaging. 31, 1593–606 (2012)

[7] Rivest-Hénault, D., Dowson, N., Greer, P.B., Fripp, J., Dowling, J.A.: Robust inverse-consistent affine CT–MR registration in MRI-assisted and MRI-alone prostate radiation therapy. Med. Image Anal. 23, 56–69 (2015)

[8] Breiman, L.: Classification And Regression Trees. Routledge (2017)

[9] Akhondi-Asl, A., Warfield, S.K.: Simultaneous truth and performance level estimation through fusion of probabilistic segmentations. IEEE Trans. Med. Imaging. 32, 1840–52 (2013)
