
arxiv: 1906.11484 · v1 · pith:T3QJGEWFnew · submitted 2019-06-27 · 📡 eess.IV · cs.CV · physics.med-ph

Automated Segmentation of Hip and Thigh Muscles in Metal Artifact-Contaminated CT using Convolutional Neural Network-Enhanced Normalized Metal Artifact Reduction

Pith reviewed 2026-05-25 14:31 UTC · model grok-4.3

classification 📡 eess.IV · cs.CV · physics.med-ph
keywords muscle segmentation · metal artifact reduction · CT imaging · convolutional neural network · hip arthroplasty · U-net · postoperative analysis · normalized metal artifact reduction

The pith

A pipeline of NMAR followed by two U-nets improves automatic segmentation of 19 hip and thigh muscles in postoperative CT scans with metal artifacts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors build a method to automatically segment hip and thigh muscles in CT images taken after hip replacement surgery, where metal implants create artifacts that distort the image near the implant. They first apply Normalized Metal Artifact Reduction (NMAR) to lessen the streaks, then feed the result into one U-net that refines the image and a second U-net that labels the 19 muscles. On images simulated from 20 patients, the pipeline lowers the average symmetric surface distance (ASD) from 1.17 mm to 1.10 mm relative to the authors' previous method and achieves statistically significant gains (p<0.05) on 14 of the 19 muscles; on three real patient scans, the ASD for the gluteus maximus and medius is 1.32 mm. Accurate segmentation would let clinicians measure muscle volume and position after surgery without drawing outlines by hand.

Core claim

The authors propose a pipeline that first applies Normalized Metal Artifact Reduction to reduce artifacts in postoperative CT, refines the result with a U-net, and then segments 19 muscles with a second U-net. On simulated data, this yields statistically significant improvement in surface distance for 14 of 19 muscles, reducing average ASD from 1.17 mm to 1.10 mm. On real data, ASD for gluteus maximus and medius is 1.32 mm.
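For readers unfamiliar with the metric, the average symmetric surface distance can be sketched as follows. This is a minimal 2D illustration and not the authors' evaluation code: the paper evaluates 3D muscle labels, and the function names, the 4-neighbour surface definition, and the `spacing` parameter are assumptions introduced here.

```python
import math


def surface_pixels(mask):
    """Return (row, col) of foreground pixels that touch background or the image border."""
    rows, cols = len(mask), len(mask[0])
    surface = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or not mask[nr][nc]:
                    surface.append((r, c))
                    break
    return surface


def average_symmetric_surface_distance(mask_a, mask_b, spacing=(1.0, 1.0)):
    """Mean nearest-surface distance, pooled over both directions (A to B and B to A).

    `spacing` is the assumed pixel size (row, col) in mm.
    """
    surf_a = surface_pixels(mask_a)
    surf_b = surface_pixels(mask_b)
    if not surf_a or not surf_b:
        raise ValueError("both masks need at least one foreground pixel")

    def dist(p, q):
        return math.hypot((p[0] - q[0]) * spacing[0], (p[1] - q[1]) * spacing[1])

    def directed_sum(src, dst):
        # For every surface point in src, distance to the closest point in dst.
        return sum(min(dist(p, q) for q in dst) for p in src)

    total = directed_sum(surf_a, surf_b) + directed_sum(surf_b, surf_a)
    return total / (len(surf_a) + len(surf_b))


if __name__ == "__main__":
    predicted = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    reference = [[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 0]]
    print(average_symmetric_surface_distance(predicted, reference))
```

The brute-force nearest-neighbour search is quadratic in the number of surface points, which is acceptable for a toy example; a practical 3D implementation would more likely use a distance transform.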

What carries the argument

A two-U-net architecture in which the first network refines the output of Normalized Metal Artifact Reduction and the second network performs the muscle segmentation on the refined images.
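The data flow described above can be sketched as a simple composition of three stages. This is only an illustration of the ordering (NMAR, then refinement, then segmentation); `run_pipeline` and the three `toy_*` stand-ins are hypothetical names, and the stand-ins are trivial placeholders rather than a real NMAR implementation or trained U-nets.

```python
from typing import Callable, List

Image = List[List[float]]   # 2D slice of CT intensities (toy representation)
LabelMap = List[List[int]]  # per-pixel muscle label


def run_pipeline(
    ct: Image,
    nmar: Callable[[Image], Image],
    refine_net: Callable[[Image], Image],
    segment_net: Callable[[Image], LabelMap],
) -> LabelMap:
    """Apply the three stages in the order described in the text."""
    reduced = nmar(ct)             # stage 1: classical metal artifact reduction
    refined = refine_net(reduced)  # stage 2: first network refines the NMAR output
    return segment_net(refined)    # stage 3: second network labels the muscles


# Placeholder stages, assumptions for illustration only.
def toy_nmar(img: Image) -> Image:
    # Clip extreme values as a crude stand-in for artifact reduction.
    return [[min(v, 1000.0) for v in row] for row in img]


def toy_refine(img: Image) -> Image:
    # Identity stand-in for the refinement network.
    return [row[:] for row in img]


def toy_segment(img: Image) -> LabelMap:
    # Arbitrary intensity window standing in for a segmentation network.
    return [[1 if 0.0 <= v <= 100.0 else 0 for v in row] for row in img]


if __name__ == "__main__":
    slice_ = [[50.0, 3000.0], [-1000.0, 80.0]]
    print(run_pipeline(slice_, toy_nmar, toy_refine, toy_segment))
```

Passing the stages as callables keeps the ordering explicit and makes it easy to swap any stage, for example to compare against a variant without the refinement step.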

If this is right

  • Segmentation accuracy improves for most muscles near the metal implant.
  • Postoperative CT analysis can proceed with less manual correction for the 14 muscles that showed gains.
  • The two-stage refinement plus segmentation approach provides a concrete baseline for further automation of hip-implant image analysis.
  • Real-image validation on gluteus maximus and medius supports use of the method when only limited manual traces are available.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the simulation-to-real gap is small, the pipeline could be deployed to quantify muscle atrophy or implant positioning across larger patient cohorts without extra annotation effort.
  • The observed local improvement near implants suggests the method could be tested on CTs containing other orthopedic hardware such as knee or shoulder replacements.
  • Because the second network operates on already-refined images, replacing the two separate U-nets with a single joint model might reduce cumulative error propagation.

Load-bearing premise

The simulated metal-artifact images used for the main quantitative evaluation accurately represent the appearance and severity of artifacts in real postoperative CT scans of patients with hip implants.
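To make the premise concrete, here is a toy sketch of why a polychromatic X-ray beam produces beam hardening, one of the physical effects a metal-artifact simulation has to reproduce. It is not the authors' simulator: the function name, the two-bin spectrum, and the attenuation coefficients are made-up illustrative values.

```python
import math


def polychromatic_projection(path_cm, weights, mu_per_cm):
    """Log-attenuation measured for a beam with several energy bins.

    weights   : fraction of detected intensity in each energy bin (must sum to 1)
    mu_per_cm : attenuation coefficient of the material in each bin
    """
    if len(weights) != len(mu_per_cm):
        raise ValueError("weights and mu_per_cm must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    transmitted = sum(w * math.exp(-mu * path_cm) for w, mu in zip(weights, mu_per_cm))
    return -math.log(transmitted)


if __name__ == "__main__":
    # Hypothetical two-bin spectrum: low-energy photons are attenuated more strongly.
    weights = [0.5, 0.5]
    mu = [1.0, 0.5]
    for length in (1.0, 2.0, 5.0):
        p = polychromatic_projection(length, weights, mu)
        # The effective attenuation p / length drops as the path gets longer.
        print(length, p / length)
```

With a single energy bin the projection is exactly linear in path length; with several bins the effective attenuation falls as the path grows, and that nonlinearity is what reconstruction turns into streaks and shading near dense metal.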

What would settle it

Running the pipeline on a larger set of real postoperative CT scans with hip implants and finding no reduction or an increase in average symmetric surface distance relative to the previous method would falsify the reported improvement.

Original abstract

In total hip arthroplasty, analysis of postoperative medical images is important to evaluate surgical outcome. Since Computed Tomography (CT) is most prevalent modality in orthopedic surgery, we aimed at the analysis of CT image. In this work, we focus on the metal artifact in postoperative CT caused by the metallic implant, which reduces the accuracy of segmentation especially in the vicinity of the implant. Our goal was to develop an automated segmentation method of the bones and muscles in the postoperative CT images. We propose a method that combines Normalized Metal Artifact Reduction (NMAR), which is one of the state-of-the-art metal artifact reduction methods, and a Convolutional Neural Network-based segmentation using two U-net architectures. The first U-net refines the result of NMAR and the muscle segmentation is performed by the second U-net. We conducted experiments using simulated images of 20 patients and real images of three patients to evaluate the segmentation accuracy of 19 muscles. In simulation study, the proposed method showed statistically significant improvement (p<0.05) in the average symmetric surface distance (ASD) metric for 14 muscles out of 19 muscles and the average ASD of all muscles from 1.17 +/- 0.543 mm (mean +/- std over all patients) to 1.10 +/- 0.509 mm over our previous method. The real image study using the manual trace of gluteus maximus and medius muscles showed ASD of 1.32 +/- 0.25 mm. Our future work includes training of a network in an end-to-end manner for both the metal artifact reduction and muscle segmentation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes combining Normalized Metal Artifact Reduction (NMAR) with two sequential U-Net architectures (one to refine the NMAR output, the second to segment 19 hip and thigh muscles) for postoperative CT images containing metal artifacts from hip implants. On 20 simulated patients the method yields statistically significant ASD improvement for 14 of 19 muscles (mean ASD reduced from 1.17 mm to 1.10 mm); on three real patients it reports ASD of 1.32 mm for the gluteus maximus and medius.

Significance. If the simulated artifacts faithfully reproduce real clinical beam-hardening and scatter patterns, the pipeline offers a modest, empirically demonstrated gain in segmentation accuracy for orthopedic CT analysis. The work benefits from held-out evaluation on both simulated and real data and from the use of an established MAR pre-processing step, but the small absolute improvement and limited real-world cohort constrain its immediate clinical impact.

major comments (2)
  1. [Simulation study] Simulation study: The central quantitative claim (statistically significant ASD improvement on 14/19 muscles) rests entirely on the unverified assumption that the simulated metal-artifact images accurately reproduce the spatial distribution, severity, and tissue-specific effects of real postoperative CT artifacts around hip implants. No quantitative match (e.g., HU histograms, artifact extent maps, or tissue-specific error statistics) between simulated and real artifact distributions is provided.
  2. [Real image study] Real image study: Evaluation is performed on only three patients and reports ASD solely for gluteus maximus and medius without any baseline comparison to the previous method, rendering the real-data results insufficient to support the claim of improved segmentation accuracy in clinical metal-contaminated CT.
minor comments (2)
  1. [Abstract] Abstract: The statistical procedure used to obtain p<0.05 values and the precise method for computing mean and standard deviation across patients are not described.
  2. [Method] Method section: Training hyperparameters, loss functions, and data-augmentation details for the two U-Nets are only sketched; fuller specification would aid reproducibility.
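Since the abstract does not name the statistical procedure, the following sketch shows one plausible way to obtain a per-muscle p-value from paired per-patient ASD values: a two-sided paired sign-flip permutation test. This is an assumption for illustration, not the test used in the paper, and the function name, defaults, and example numbers are hypothetical.

```python
import random


def paired_sign_flip_test(baseline, proposed, n_resamples=10000, seed=0):
    """Two-sided Monte Carlo sign-flip test on paired differences.

    Under the null hypothesis the two methods are exchangeable per patient,
    so each paired difference is equally likely to have either sign.
    """
    if len(baseline) != len(proposed) or not baseline:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [b - p for b, p in zip(baseline, proposed)]
    observed = abs(sum(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= observed - 1e-12:
            hits += 1
    # The +1 terms count the observed labelling itself and avoid a p-value of 0.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Made-up per-patient ASD values in mm for one muscle (not data from the paper).
    previous_method = [1.25, 1.10, 1.31, 1.02, 1.18, 1.40, 1.07, 1.22]
    proposed_method = [1.19, 1.08, 1.20, 1.03, 1.11, 1.29, 1.05, 1.15]
    print(paired_sign_flip_test(previous_method, proposed_method))
```

Because the test is repeated for 19 muscles, a full analysis would also need to state whether any multiple-comparison correction was applied; the abstract does not say.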

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and valuable feedback. We address each major comment below with clarifications and proposed revisions where appropriate. Our responses focus on substance and aim to strengthen the manuscript without overstating the results.

Point-by-point responses
  1. Referee: [Simulation study] The central quantitative claim (statistically significant ASD improvement on 14/19 muscles) rests entirely on the unverified assumption that the simulated metal-artifact images accurately reproduce the spatial distribution, severity, and tissue-specific effects of real postoperative CT artifacts around hip implants. No quantitative match (e.g., HU histograms, artifact extent maps, or tissue-specific error statistics) between simulated and real artifact distributions is provided.

    Authors: We agree that the manuscript does not include a direct quantitative comparison (such as HU histograms or artifact extent maps) between the simulated and real artifact distributions. The simulation follows standard forward-projection methods with polychromatic modeling that are widely used in the MAR literature. To address the concern, we will add a new subsection with side-by-side HU distribution plots and visual artifact comparisons using the three available real patient scans. This will provide empirical support for the simulation fidelity. revision: yes

  2. Referee: [Real image study] Evaluation is performed on only three patients and reports ASD solely for gluteus maximus and medius without any baseline comparison to the previous method, rendering the real-data results insufficient to support the claim of improved segmentation accuracy in clinical metal-contaminated CT.

    Authors: We concur that the real-image cohort is small (three patients) and limited to two muscles without a baseline comparison to the prior method. Comprehensive manual annotation of all 19 muscles on real postoperative scans is resource-intensive and was not feasible within the study scope. The real-data results are presented as preliminary validation only. In revision we will (i) explicitly label them as such, (ii) remove any implication of superiority on real data, and (iii) expand the limitations section to discuss the practical barriers to larger real-world cohorts and full baseline comparisons. revision: partial
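The HU-distribution comparison raised in the first response could be quantified along the following lines. This is a hedged sketch of one possible similarity measure (histogram intersection over a fixed HU window), not something the paper or the rebuttal specifies; the function names, window, bin count, and sample values are assumptions.

```python
def normalized_histogram(values, lo, hi, n_bins):
    """Histogram of values in [lo, hi), normalised so the bins sum to 1."""
    if hi <= lo or n_bins <= 0:
        raise ValueError("need hi > lo and n_bins > 0")
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    kept = 0
    for v in values:
        if lo <= v < hi:
            index = min(int((v - lo) / width), n_bins - 1)
            counts[index] += 1
            kept += 1
    if kept == 0:
        raise ValueError("no values fall inside the histogram window")
    return [c / kept for c in counts]


def histogram_intersection(p, q):
    """Overlap of two normalised histograms: 1.0 means identical, 0.0 means disjoint."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(p, q))


if __name__ == "__main__":
    # Made-up HU samples near an implant (not data from the paper).
    simulated_hu = [-120.0, -40.0, 35.0, 60.0, 410.0, 900.0]
    real_hu = [-150.0, -20.0, 50.0, 75.0, 380.0, 1200.0]
    h_sim = normalized_histogram(simulated_hu, -1000.0, 2000.0, 30)
    h_real = normalized_histogram(real_hu, -1000.0, 2000.0, 30)
    print(histogram_intersection(h_sim, h_real))
```

A single overlap score ignores where in the image the differences occur, so it would complement rather than replace the artifact extent maps the referee also asks for.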

Circularity Check

0 steps flagged

No circularity; empirical evaluation on held-out data

Full rationale

The paper reports an empirical pipeline (NMAR + two U-nets) whose central claims are quantitative ASD improvements measured on 20 simulated patients (held-out) and 3 real patients. No equations, derivations, or first-principles results are presented that reduce the reported metrics to fitted parameters, self-definitions, or self-citation chains by construction. The comparison is against a prior method on independent test data; the simulation-to-real fidelity concern is an external-validity issue, not a circularity reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract states no explicit free parameters, axioms, or invented entities; the approach rests on the standard assumptions of U-Net architectures and of the NMAR algorithm from prior literature.


