Automated Segmentation of Hip and Thigh Muscles in Metal Artifact-Contaminated CT using Convolutional Neural Network-Enhanced Normalized Metal Artifact Reduction

Masaki Takao; Mitsuki Sakamoto; Nobuhiko Sugano; Yoshinobu Sato; Yoshito Otake; Yuki Suzuki; Yuta Hiasa

arXiv: 1906.11484 · v1 · pith:T3QJGEWFnew · submitted 2019-06-27 · 📡 eess.IV · cs.CV · physics.med-ph

Mitsuki Sakamoto, Yuta Hiasa, Yoshito Otake, Masaki Takao, Yuki Suzuki, Nobuhiko Sugano, Yoshinobu Sato

Pith reviewed 2026-05-25 14:31 UTC · model grok-4.3

classification 📡 eess.IV · cs.CV · physics.med-ph

keywords muscle segmentation · metal artifact reduction · CT imaging · convolutional neural network · hip arthroplasty · U-net · postoperative analysis · normalized metal artifact reduction

The pith

A pipeline of NMAR followed by two U-nets improves automatic segmentation of 19 hip and thigh muscles in postoperative CT scans with metal artifacts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors build a method to segment hip and thigh muscles automatically from CT images taken after hip replacement surgery, where metal implants create streak artifacts that distort the image near the implant. They first apply Normalized Metal Artifact Reduction (NMAR) to lessen the streaks, then feed the result into one U-net that refines the image and a second U-net that labels the 19 muscles. On images simulated from 20 patients, the pipeline lowers the average symmetric surface distance (ASD) from 1.17 mm to 1.10 mm relative to the authors' previous method and achieves statistically significant gains (p<0.05) on 14 of the 19 muscles; on three real patient scans, the ASD for gluteus maximus and medius is 1.32 mm. Accurate segmentation would let clinicians measure muscle volume and position after surgery without drawing outlines by hand.

Core claim

The authors propose a pipeline that first applies Normalized Metal Artifact Reduction to reduce artifacts in postoperative CT, refines the result with a U-net, and then segments 19 muscles with a second U-net. On simulated data, this yields statistically significant improvement in surface distance for 14 of 19 muscles, reducing average ASD from 1.17 mm to 1.10 mm. On real data, ASD for gluteus maximus and medius is 1.32 mm.
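ASD, as it is commonly defined in the segmentation literature, averages the distance from every surface point of one segmentation to the nearest surface point of the other, taken in both directions. The sketch below is a minimal pure-Python illustration under stated assumptions: masks are given as sets of integer (z, y, x) voxel indices, a 6-neighbourhood decides which voxels lie on the surface, and the names `surface` and `asd` are hypothetical rather than taken from the paper.

```python
import math

# Offsets of the six face-adjacent neighbours of a voxel.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def surface(voxels):
    """Return the voxels that have at least one face neighbour outside the mask."""
    return {
        v for v in voxels
        if any((v[0] + dz, v[1] + dy, v[2] + dx) not in voxels
               for dz, dy, dx in NEIGHBOURS)
    }


def asd(mask_a, mask_b, spacing=(1.0, 1.0, 1.0)):
    """Average symmetric surface distance between two voxel sets, in mm.

    spacing is the physical voxel size along (z, y, x); an assumption of
    this sketch, since the paper's voxel size is not given here.
    """
    surf_a, surf_b = surface(mask_a), surface(mask_b)
    if not surf_a or not surf_b:
        raise ValueError("both masks must contain at least one voxel")

    def dist(p, q):
        return math.sqrt(sum(((pi - qi) * s) ** 2
                             for pi, qi, s in zip(p, q, spacing)))

    def directed(src, dst):
        # Distance from each source surface voxel to its nearest target voxel.
        return [min(dist(p, q) for q in dst) for p in src]

    distances = directed(surf_a, surf_b) + directed(surf_b, surf_a)
    return sum(distances) / len(distances)
```

The nearest-neighbour search here is brute force, so it only suits small masks; a distance transform or a spatial index would be the usual choice for full CT volumes.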

What carries the argument

A two-U-net architecture in which the first network refines the output of Normalized Metal Artifact Reduction and the second network performs the muscle segmentation on the refined images.
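The ordering of the three stages can be written down as a simple composition. The sketch below only illustrates that ordering: the function `segment_postoperative_ct` and the three `toy_*` stand-ins are hypothetical, and the stand-ins are deliberately trivial placeholders, not implementations of NMAR (which works on projection data) or of either trained U-net.

```python
from typing import Callable, List

Slice = List[List[float]]  # toy 2D image of HU-like values (assumed representation)
Labels = List[List[int]]   # per-pixel label, 0 = background


def segment_postoperative_ct(
    ct: Slice,
    nmar: Callable[[Slice], Slice],
    refine_net: Callable[[Slice], Slice],
    segment_net: Callable[[Slice], Labels],
) -> Labels:
    """Chain the stages in the order the paper describes."""
    nmar_output = nmar(ct)             # stage 1: classical metal artifact reduction
    refined = refine_net(nmar_output)  # stage 2: first U-net refines the NMAR result
    return segment_net(refined)        # stage 3: second U-net labels the muscles


# Toy stand-ins so the sketch runs end to end. They are NOT the real algorithms.
def toy_nmar(ct: Slice) -> Slice:
    # Placeholder: clamp very bright values; real NMAR is far more involved.
    return [[min(v, 1000.0) for v in row] for row in ct]


def toy_refine(img: Slice) -> Slice:
    # Placeholder for the refinement network: returns a copy unchanged.
    return [row[:] for row in img]


def toy_segment(img: Slice) -> Labels:
    # Placeholder for the segmentation network: one label for a soft-tissue-like range.
    return [[1 if 0.0 <= v <= 100.0 else 0 for v in row] for row in img]


labels = segment_postoperative_ct(
    [[3000.0, 50.0], [-100.0, 80.0]], toy_nmar, toy_refine, toy_segment
)
```

Passing the stages as callables keeps each one replaceable, which is also the shape a single jointly trained model would slot into.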

If this is right

Segmentation accuracy improves for most muscles near the metal implant.
Postoperative CT analysis can proceed with less manual correction for the 14 muscles that showed gains.
The two-stage refinement plus segmentation approach provides a concrete baseline for further automation of hip-implant image analysis.
Real-image validation on gluteus maximus and medius supports use of the method when only limited manual traces are available.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the simulation-to-real gap is small, the pipeline could be deployed to quantify muscle atrophy or implant positioning across larger patient cohorts without extra annotation effort.
The observed local improvement near implants suggests the method could be tested on CTs containing other orthopedic hardware such as knee or shoulder replacements.
Because the second network operates on already-refined images, replacing the two separate U-nets with a single joint model might reduce cumulative error propagation.

Load-bearing premise

The simulated metal-artifact images used for the main quantitative evaluation accurately represent the appearance and severity of artifacts in real postoperative CT scans of patients with hip implants.

What would settle it

Running the pipeline on a larger set of real postoperative CT scans with hip implants and finding no reduction or an increase in average symmetric surface distance relative to the previous method would falsify the reported improvement.

Original abstract

In total hip arthroplasty, analysis of postoperative medical images is important to evaluate surgical outcome. Since Computed Tomography (CT) is most prevalent modality in orthopedic surgery, we aimed at the analysis of CT image. In this work, we focus on the metal artifact in postoperative CT caused by the metallic implant, which reduces the accuracy of segmentation especially in the vicinity of the implant. Our goal was to develop an automated segmentation method of the bones and muscles in the postoperative CT images. We propose a method that combines Normalized Metal Artifact Reduction (NMAR), which is one of the state-of-the-art metal artifact reduction methods, and a Convolutional Neural Network-based segmentation using two U-net architectures. The first U-net refines the result of NMAR and the muscle segmentation is performed by the second U-net. We conducted experiments using simulated images of 20 patients and real images of three patients to evaluate the segmentation accuracy of 19 muscles. In simulation study, the proposed method showed statistically significant improvement (p<0.05) in the average symmetric surface distance (ASD) metric for 14 muscles out of 19 muscles and the average ASD of all muscles from 1.17 +/- 0.543 mm (mean +/- std over all patients) to 1.10 +/- 0.509 mm over our previous method. The real image study using the manual trace of gluteus maximus and medius muscles showed ASD of 1.32 +/- 0.25 mm. Our future work includes training of a network in an end-to-end manner for both the metal artifact reduction and muscle segmentation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Small simulated gains from NMAR plus two sequential U-Nets, but real-data support is too narrow to back the clinical claim.

The paper's core result is a modest ASD drop from 1.17 mm to 1.10 mm on 20 simulated patients, with p<0.05 on 14 of 19 muscles, using NMAR followed by one U-Net for artifact refinement and a second for segmentation. That pipeline is the only concrete novelty here; it is a straightforward stacking of existing tools on hip/thigh anatomy after arthroplasty. The simulation experiments are cleanly reported and the statistical comparison to their prior method is straightforward. Real-image numbers are given for only two muscles in three patients (ASD 1.32 +/- 0.25 mm), with no baseline shown and no method details supplied in the abstract. The absolute gain remains small even on simulation, and the work provides no quantitative check that the simulated artifacts match the spatial pattern or severity seen around real hip implants. That leaves the central claim resting on an untested assumption about simulation fidelity. The paper is aimed at groups already doing orthopedic CT segmentation and metal-artifact work; it gives them a usable recipe but does not change the state of the field. It is coherent and cites the relevant prior art, so it is worth sending to referees who can ask for more real cases and a direct sim-to-real artifact comparison. I would not cite it myself unless those gaps are closed.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes combining Normalized Metal Artifact Reduction (NMAR) with two sequential U-Net architectures (one to refine the NMAR output, the second to segment 19 hip and thigh muscles) for postoperative CT images containing metal artifacts from hip implants. On 20 simulated patients the method yields statistically significant ASD improvement for 14 of 19 muscles (mean ASD reduced from 1.17 mm to 1.10 mm); on three real patients it reports ASD of 1.32 mm for the gluteus maximus and medius.

Significance. If the simulated artifacts faithfully reproduce real clinical beam-hardening and scatter patterns, the pipeline offers a modest, empirically demonstrated gain in segmentation accuracy for orthopedic CT analysis. The work benefits from held-out evaluation on both simulated and real data and from the use of an established MAR pre-processing step, but the small absolute improvement and limited real-world cohort constrain its immediate clinical impact.

major comments (2)

[Simulation study] The central quantitative claim (statistically significant ASD improvement on 14/19 muscles) rests entirely on the unverified assumption that the simulated metal-artifact images accurately reproduce the spatial distribution, severity, and tissue-specific effects of real postoperative CT artifacts around hip implants. No quantitative match (e.g., HU histograms, artifact extent maps, or tissue-specific error statistics) between simulated and real artifact distributions is provided.
[Real image study] Evaluation is performed on only three patients and reports ASD solely for gluteus maximus and medius without any baseline comparison to the previous method, rendering the real-data results insufficient to support the claim of improved segmentation accuracy in clinical metal-contaminated CT.
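The HU-histogram comparison raised in the first major comment could take roughly the following form. This is a sketch under assumptions, not a procedure from the paper: the HU range, the bin count, the choice of histogram intersection as the similarity score, and the function names are all illustrative.

```python
def hu_histogram(values, lo=-1000.0, hi=3000.0, bins=40):
    """Normalised histogram of HU values falling in [lo, hi).

    The range and bin count are assumptions chosen for illustration.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    kept = 0
    for v in values:
        if lo <= v < hi:
            # min() guards against floating-point rounding at the upper edge.
            counts[min(bins - 1, int((v - lo) / width))] += 1
            kept += 1
    if kept == 0:
        raise ValueError("no values inside the histogram range")
    return [c / kept for c in counts]


def histogram_intersection(p, q):
    """Overlap of two normalised histograms: 1.0 = identical, 0.0 = disjoint."""
    return sum(min(a, b) for a, b in zip(p, q))
```

In use, the HU values would be sampled from the same anatomical region (for example, a band around the implant) in simulated and real scans, and a low intersection score would flag a mismatch in artifact severity.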

minor comments (2)

[Abstract] The statistical procedure used to obtain the p<0.05 values and the precise method for computing mean and standard deviation across patients are not described.
[Method] Training hyperparameters, loss functions, and data-augmentation details for the two U-Nets are only sketched; fuller specification would aid reproducibility.
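Since the abstract does not name the statistical test, the following is only one plausible option for paired per-patient ASD values: an exact sign-flip permutation test. The function name and inputs are hypothetical, and nothing here should be read as the procedure the authors actually used.

```python
from itertools import product


def paired_sign_flip_pvalue(baseline, proposed):
    """Two-sided exact sign-flip permutation test on paired differences.

    baseline, proposed: per-patient metric values (e.g. ASD in mm) for the
    same patients in the same order. Enumerates all 2**n sign patterns, so
    n = 20 patients means about one million patterns.
    """
    if len(baseline) != len(proposed) or not baseline:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [b - p for b, p in zip(baseline, proposed)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance so ties caused by rounding still count as extreme.
        if flipped >= observed - 1e-12:
            extreme += 1
    return extreme / total
```

A rank-based paired test would be another reasonable choice; whichever test is used, it would need to be run per muscle, which raises the multiple-comparison question for the 19 muscles.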

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and valuable feedback. We address each major comment below with clarifications and proposed revisions where appropriate. Our responses focus on substance and aim to strengthen the manuscript without overstating the results.

Referee: [Simulation study] The central quantitative claim (statistically significant ASD improvement on 14/19 muscles) rests entirely on the unverified assumption that the simulated metal-artifact images accurately reproduce the spatial distribution, severity, and tissue-specific effects of real postoperative CT artifacts around hip implants. No quantitative match (e.g., HU histograms, artifact extent maps, or tissue-specific error statistics) between simulated and real artifact distributions is provided.

Authors: We agree that the manuscript does not include a direct quantitative comparison (such as HU histograms or artifact extent maps) between the simulated and real artifact distributions. The simulation follows standard forward-projection methods with polychromatic modeling that are widely used in the MAR literature. To address the concern, we will add a new subsection with side-by-side HU distribution plots and visual artifact comparisons using the three available real patient scans. This will provide empirical support for the simulation fidelity.
Revision: yes

Referee: [Real image study] Evaluation is performed on only three patients and reports ASD solely for gluteus maximus and medius without any baseline comparison to the previous method, rendering the real-data results insufficient to support the claim of improved segmentation accuracy in clinical metal-contaminated CT.

Authors: We concur that the real-image cohort is small (three patients) and limited to two muscles without a baseline comparison to the prior method. Comprehensive manual annotation of all 19 muscles on real postoperative scans is resource-intensive and was not feasible within the study scope. The real-data results are presented as preliminary validation only. In revision we will (i) explicitly label them as such, (ii) remove any implication of superiority on real data, and (iii) expand the limitations section to discuss the practical barriers to larger real-world cohorts and full baseline comparisons.
Revision: partial

Circularity Check

0 steps flagged

No circularity; empirical evaluation on held-out data

The paper reports an empirical pipeline (NMAR + two U-nets) whose central claims are quantitative ASD improvements measured on 20 simulated patients (held-out) and 3 real patients. No equations, derivations, or first-principles results are presented that reduce the reported metrics to fitted parameters, self-definitions, or self-citation chains by construction. The comparison is against a prior method on independent test data; the simulation-to-real fidelity concern is an external-validity issue, not a circularity reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the approach rests on standard assumptions of U-Net architectures and the NMAR algorithm from prior literature.

pith-pipeline@v0.9.0 · 5862 in / 1229 out tokens · 30253 ms · 2026-05-25T14:31:49.458384+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

We propose a method that combines Normalized Metal Artifact Reduction (NMAR) ... and a Convolutional Neural Network-based segmentation using two U-net architectures.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages · 4 internal anchors

[1] Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention, 234-241.
[2] Meyer, E., Raupach, R., Lell, M., Schmidt, B., & Kachelrieß, M. (2010). Normalized metal artifact reduction (NMAR) in computed tomography. Medical Physics, 37(10), 5482-5493.
[3] Sakamoto, M., Hiasa, Y., Otake, Y., Takao, M., Suzuki, Y., Sugano, N., & Sato, Y. (2019). Automated segmentation of hip and thigh muscles in metal artifact contaminated CT using CNN. In International Forum on Medical Imaging in Asia 2019. International Society for Optics and Photonics, 11050, 110500
[4] Gjesteby, L., Shan, H., Yang, Q., Xi, Y., Claus, B., Jin, Y., et al. (2018). Deep Neural Network for CT Metal Artifact Reduction with a Perceptual Loss Function. Proceedings of The Fifth International Conference on Image Formation in X-ray Computed Tomography, 439-443.
[5] Zhang, Y., & Yu, H. (2018). Convolutional Neural Network based Metal Artifact Reduction in X-ray Computed Tomography. IEEE Transactions on Medical Imaging, 37(6), 1370-1381.
[6] Herman, G. T. (1979). Correction for beam hardening in computed tomography. Physics in Medicine & Biology, 24(1), 81.
[7] Kyriakou, Y., Meyer, E., Prell, D., & Kachelrieß, M. (2010). Empirical beam hardening correction (EBHC) for CT. Medical Physics, 37(10), 5179-5187.
[8] Berger, M. XCOM: photon cross sections database. http://www.nist.gov/pml/data/xcom/index.cfm. Accessed 21 June 2019.
[9] Siemens Healthineers. Simulation of x-ray spectra. https://www.oem-xray-components.siemens.com/x-ray-spectra-simulation. Accessed 21 June 2019.
[10] Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[11] Zhong, Z., Zheng, L., Kang, G., Li, S., & Yang, Y. (2017). Random erasing data augmentation. arXiv preprint arXiv:1708.04896.
[12] Bloice, M. D., Stocker, C., & Holzinger, A. (2017). Augmentor: An image augmentation library for machine learning. arXiv preprint arXiv:1708.04680.
[13] Van Ginneken, B., Heimann, T., & Styner, M. (2007). 3D segmentation in the clinic: A grand challenge. 3D Segmentation in the Clinic: A Grand Challenge, 7-15.
[14] Wu, D., Kim, K., Dong, B., & Li, Q. (2017). End-to-end abnormality detection in medical imaging. arXiv preprint arXiv:1711.02074.
[15] Shrivastava, A., Pfister, T., Tuzel, O., Susskind, J., Wang, W., & Webb, R. (2017). Learning from simulated and unsupervised images through adversarial training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2107-2116.
