Automated Segmentation of Hip and Thigh Muscles in Metal Artifact-Contaminated CT using Convolutional Neural Network-Enhanced Normalized Metal Artifact Reduction
Pith reviewed 2026-05-25 14:31 UTC · model grok-4.3
The pith
A pipeline of NMAR followed by two U-nets improves automatic segmentation of 19 hip and thigh muscles in postoperative CT scans with metal artifacts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors propose a pipeline that first applies Normalized Metal Artifact Reduction to reduce artifacts in postoperative CT, refines the result with a U-net, and then segments 19 muscles with a second U-net. On simulated data, this yields statistically significant improvement in surface distance for 14 of 19 muscles, reducing average ASD from 1.17 mm to 1.10 mm. On real data, ASD for gluteus maximus and medius is 1.32 mm.
What carries the argument
A two-U-net architecture in which the first network refines the output of Normalized Metal Artifact Reduction and the second network performs the muscle segmentation on the refined images.
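The stage ordering can be made concrete with a minimal sketch. It assumes each stage is available as a plain callable; `run_pipeline` and the three `toy_*` stand-ins are hypothetical names introduced here for illustration and do not come from the paper, which does not publish an interface.

```python
from typing import Callable, List

Image = List[List[float]]   # one 2D slice of CT values (placeholder type)
LabelMap = List[List[int]]  # per-pixel label, 0 = background


def run_pipeline(
    ct: Image,
    nmar: Callable[[Image], Image],
    refine_net: Callable[[Image], Image],
    segment_net: Callable[[Image], LabelMap],
) -> LabelMap:
    """Chain the three stages in the order the paper describes."""
    reduced = nmar(ct)             # stage 1: classical metal artifact reduction
    refined = refine_net(reduced)  # stage 2: first U-net refines the NMAR output
    return segment_net(refined)    # stage 3: second U-net labels the muscles


# Toy stand-ins so the sketch runs. None of these is real NMAR or U-net code.
def toy_nmar(img: Image) -> Image:
    return [[min(v, 1000.0) for v in row] for row in img]  # clip bright streaks


def toy_refine(img: Image) -> Image:
    return [[float(round(v)) for v in row] for row in img]


def toy_segment(img: Image) -> LabelMap:
    return [[1 if 0 <= v <= 100 else 0 for v in row] for row in img]


labels = run_pipeline(
    [[3000.0, 50.2], [-100.0, 80.7]], toy_nmar, toy_refine, toy_segment
)
print(labels)
```

Passing the stages as arguments keeps the sketch agnostic about how each one is implemented, which also makes it easy to swap the two networks for a single joint model.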
If this is right
- Segmentation accuracy improves for most muscles near the metal implant.
- Postoperative CT analysis can proceed with less manual correction for the 14 muscles that showed gains.
- The two-stage refinement plus segmentation approach provides a concrete baseline for further automation of hip-implant image analysis.
- Real-image validation on gluteus maximus and medius supports use of the method when only limited manual traces are available.
Where Pith is reading between the lines
- If the simulation-to-real gap is small, the pipeline could be deployed to quantify muscle atrophy or implant positioning across larger patient cohorts without extra annotation effort.
- The observed local improvement near implants suggests the method could be tested on CTs containing other orthopedic hardware such as knee or shoulder replacements.
- Because the second network operates on already-refined images, replacing the two separate U-nets with a single joint model might reduce cumulative error propagation.
Load-bearing premise
The simulated metal-artifact images used for the main quantitative evaluation accurately represent the appearance and severity of artifacts in real postoperative CT scans of patients with hip implants.
What would settle it
Running the pipeline on a larger set of real postoperative CT scans with hip implants and finding no reduction or an increase in average symmetric surface distance relative to the previous method would falsify the reported improvement.
Original abstract
In total hip arthroplasty, analysis of postoperative medical images is important to evaluate surgical outcome. Since Computed Tomography (CT) is the most prevalent modality in orthopedic surgery, we aimed at the analysis of CT images. In this work, we focus on the metal artifact in postoperative CT caused by the metallic implant, which reduces the accuracy of segmentation, especially in the vicinity of the implant. Our goal was to develop an automated segmentation method for the bones and muscles in postoperative CT images. We propose a method that combines Normalized Metal Artifact Reduction (NMAR), which is one of the state-of-the-art metal artifact reduction methods, and a Convolutional Neural Network-based segmentation using two U-net architectures. The first U-net refines the result of NMAR and the muscle segmentation is performed by the second U-net. We conducted experiments using simulated images of 20 patients and real images of three patients to evaluate the segmentation accuracy of 19 muscles. In the simulation study, the proposed method showed statistically significant improvement (p<0.05) in the average symmetric surface distance (ASD) metric for 14 of the 19 muscles and reduced the average ASD over all muscles from 1.17 +/- 0.543 mm (mean +/- std over all patients) to 1.10 +/- 0.509 mm relative to our previous method. The real image study using manual traces of the gluteus maximus and medius muscles showed an ASD of 1.32 +/- 0.25 mm. Our future work includes training a network in an end-to-end manner for both the metal artifact reduction and muscle segmentation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes combining Normalized Metal Artifact Reduction (NMAR) with two sequential U-Net architectures (one to refine the NMAR output, the second to segment 19 hip and thigh muscles) for postoperative CT images containing metal artifacts from hip implants. On 20 simulated patients the method yields statistically significant ASD improvement for 14 of 19 muscles (mean ASD reduced from 1.17 mm to 1.10 mm); on three real patients it reports ASD of 1.32 mm for the gluteus maximus and medius.
Significance. If the simulated artifacts faithfully reproduce real clinical beam-hardening and scatter patterns, the pipeline offers a modest, empirically demonstrated gain in segmentation accuracy for orthopedic CT analysis. The work benefits from held-out evaluation on both simulated and real data and from the use of an established MAR pre-processing step, but the small absolute improvement and limited real-world cohort constrain its immediate clinical impact.
major comments (2)
- [Simulation study] The central quantitative claim (statistically significant ASD improvement on 14/19 muscles) rests entirely on the unverified assumption that the simulated metal-artifact images accurately reproduce the spatial distribution, severity, and tissue-specific effects of real postoperative CT artifacts around hip implants. No quantitative match (e.g., HU histograms, artifact extent maps, or tissue-specific error statistics) between simulated and real artifact distributions is provided.
- [Real image study] Evaluation is performed on only three patients and reports ASD solely for gluteus maximus and medius without any baseline comparison to the previous method, rendering the real-data results insufficient to support the claim of improved segmentation accuracy in clinical metal-contaminated CT.
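One way the requested HU-histogram comparison could be quantified is sketched below. The bin range, the bin count, and the choice of total variation distance are assumptions made here for illustration. Neither the paper nor the report prescribes a metric, and the sample values are invented, not taken from any scan.

```python
def hu_histogram(values, lo=-1000, hi=3000, bins=8):
    """Normalised histogram of HU values over [lo, hi); out-of-range values are skipped."""
    width = (hi - lo) / bins
    counts = [0] * bins
    kept = 0
    for v in values:
        if lo <= v < hi:
            counts[int((v - lo) // width)] += 1
            kept += 1
    if kept == 0:
        raise ValueError("no values inside the histogram range")
    return [c / kept for c in counts]


def total_variation(p, q):
    """Half the L1 distance between two normalised histograms (0 = identical, 1 = disjoint)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Invented HU samples near an implant, for illustration only.
simulated = [-900, 40, 60, 1200]
real = [-900, 40, 700, 1200]

distance = total_variation(hu_histogram(simulated), hu_histogram(real))
print(distance)  # 0.25
```

Restricting the samples to a region of interest around the implant, or to a single tissue class, would give the tissue-specific comparison the comment asks for.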
minor comments (2)
- [Abstract] The statistical procedure used to obtain the p<0.05 values and the precise method for computing the mean and standard deviation across patients are not described.
- [Method] Training hyperparameters, loss functions, and data-augmentation details for the two U-Nets are only sketched; fuller specification would aid reproducibility.
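To illustrate the kind of procedure the first minor comment asks the authors to name, here is a minimal sketch of one nonparametric option for paired per-patient ASD values: an exact two-sided sign-flip permutation test. This is an assumption for illustration only. The paper does not say which test it used, `paired_sign_flip_p` is a hypothetical name, and the ASD values below are invented.

```python
from itertools import product


def paired_sign_flip_p(before, after):
    """Exact two-sided p-value for paired samples under random sign flips of the differences."""
    diffs = [b - a for b, a in zip(before, after)]
    observed = abs(sum(diffs))
    hits = total = 0
    # Enumerate all 2^n sign assignments; fine for small n, use sampling for large n.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:
            hits += 1
    return hits / total


# Invented per-patient ASD values in mm (previous method vs. proposed method).
previous = [1.20, 1.10, 1.30, 1.05, 1.25, 1.15]
proposed = [1.10, 1.05, 1.20, 1.00, 1.15, 1.05]

p_value = paired_sign_flip_p(previous, proposed)
print(p_value)  # 0.03125: only the all-plus and all-minus assignments reach the observed sum
```

With 19 muscles tested separately, a revision would also need to state whether any multiple-comparison correction was applied.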
Simulated Author's Rebuttal
We thank the referee for the thorough review and valuable feedback. We address each major comment below with clarifications and proposed revisions where appropriate. Our responses focus on substance and aim to strengthen the manuscript without overstating the results.
Point-by-point responses
- Referee: [Simulation study] The central quantitative claim (statistically significant ASD improvement on 14/19 muscles) rests entirely on the unverified assumption that the simulated metal-artifact images accurately reproduce the spatial distribution, severity, and tissue-specific effects of real postoperative CT artifacts around hip implants. No quantitative match (e.g., HU histograms, artifact extent maps, or tissue-specific error statistics) between simulated and real artifact distributions is provided.
  Authors: We agree that the manuscript does not include a direct quantitative comparison (such as HU histograms or artifact extent maps) between the simulated and real artifact distributions. The simulation follows standard forward-projection methods with polychromatic modeling that are widely used in the MAR literature. To address the concern, we will add a new subsection with side-by-side HU distribution plots and visual artifact comparisons using the three available real patient scans. This will provide empirical support for the simulation fidelity.
  Revision: yes
- Referee: [Real image study] Evaluation is performed on only three patients and reports ASD solely for gluteus maximus and medius without any baseline comparison to the previous method, rendering the real-data results insufficient to support the claim of improved segmentation accuracy in clinical metal-contaminated CT.
  Authors: We concur that the real-image cohort is small (three patients) and limited to two muscles without a baseline comparison to the prior method. Comprehensive manual annotation of all 19 muscles on real postoperative scans is resource-intensive and was not feasible within the study scope. The real-data results are presented as preliminary validation only. In revision we will (i) explicitly label them as such, (ii) remove any implication of superiority on real data, and (iii) expand the limitations section to discuss the practical barriers to larger real-world cohorts and full baseline comparisons.
  Revision: partial
Circularity Check
No circularity; empirical evaluation on held-out data
full rationale
The paper reports an empirical pipeline (NMAR + two U-nets) whose central claims are quantitative ASD improvements measured on 20 simulated patients (held-out) and 3 real patients. No equations, derivations, or first-principles results are presented that reduce the reported metrics to fitted parameters, self-definitions, or self-citation chains by construction. The comparison is against a prior method on independent test data; the simulation-to-real fidelity concern is an external-validity issue, not a circularity reduction.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We propose a method that combines Normalized Metal Artifact Reduction (NMAR) ... and a Convolutional Neural Network-based segmentation using two U-net architectures."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.