A New Angle on Bones: Robust Pose Estimation in X-Ray and Ultrasound

Anne-Nele Schröder; Christoph Großbröhmer; Franziska Halm; Lasse Hansen; Ludger Tüshaus; Mattias P. Heinrich; Miriam Johann; Ron Keuth

arxiv: 2606.04700 · v2 · pith:TMJHF3NVnew · submitted 2026-06-03 · 💻 cs.CV

Ron Keuth, Christoph Großbröhmer, Franziska Halm, Miriam Johann, Anne-Nele Schröder, Ludger Tüshaus, Mattias P. Heinrich, Lasse Hansen

Pith reviewed 2026-06-28 07:20 UTC · model grok-4.3

classification 💻 cs.CV

keywords bone angle estimation · pose estimation · X-ray · ultrasound · RANSAC · Hough transform · paediatric radiology · robust fitting

The pith

A learning-based point proposal step followed by robust line fitting estimates bone angles in X-rays and ultrasound to within clinical observer error.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an automated pipeline for measuring angles between bone structures in medical images, a task that guides diagnosis and treatment in orthopaedics. It first trains a network to suggest candidate points along bone axes, then uses outlier-resistant line-fitting methods such as RANSAC and the Hough transform to recover the axes even when some proposals are wrong. The method is tested on three paediatric tasks: fracture fragment alignment in radiographs, fracture alignment in ultrasound, and hip dysplasia assessment by the Graf method in ultrasound. Reported mean angular errors are 4.1°, 5.4°, and 5.51° respectively, values that fall inside typical inter-observer variability and beat purely landmark-driven baselines. If the approach holds, routine angle measurements could be performed faster and with greater consistency without requiring manual landmark placement on every image.

Core claim

By generating point candidates with a learned model and then recovering bone axes through robust line estimation rather than least-squares or direct landmark regression, the pipeline produces angular measurements whose average errors on the three clinical tasks remain inside the range of human observer disagreement while exceeding the accuracy of landmark-only alternatives.

What carries the argument

learning-based point candidate proposal followed by robust line fitting (RANSAC or Hough transform) to extract axis parameters
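
To make the second stage concrete, here is a minimal sketch of RANSAC line fitting on 2-D point candidates, followed by an angle between two recovered axes. It is an illustration under stated assumptions, not the authors' implementation: the function names (`fit_axis_ransac`, `angle_between_axes_deg`), the iteration count, the inlier tolerance, and the synthetic points are all hypothetical, and the refinement step (principal axis of the consensus set) is one common choice among several.

```python
import math
import random


def fit_axis_ransac(points, n_iters=200, inlier_tol=2.0, seed=0):
    """Fit a 2-D line to candidate points; return (centroid, unit direction, inliers)."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        dx, dy = x2 - x1, y2 - y1
        norm = math.hypot(dx, dy)
        if norm == 0.0:
            continue  # degenerate sample: both points coincide
        # Keep points whose perpendicular distance to the hypothesis is small.
        inliers = [(x, y) for x, y in points
                   if abs(dy * (x - x1) - dx * (y - y1)) / norm <= inlier_tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no valid line hypothesis found")
    # Refine on the consensus set: principal axis of the inlier scatter.
    n = len(best_inliers)
    mx = sum(x for x, _ in best_inliers) / n
    my = sum(y for _, y in best_inliers) / n
    sxx = sum((x - mx) ** 2 for x, _ in best_inliers)
    syy = sum((y - my) ** 2 for _, y in best_inliers)
    sxy = sum((x - mx) * (y - my) for x, y in best_inliers)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta)), best_inliers


def angle_between_axes_deg(d1, d2):
    """Angle between two undirected axes, in degrees, within [0, 90]."""
    dot = abs(d1[0] * d2[0] + d1[1] * d2[1])
    return math.degrees(math.acos(min(1.0, dot)))


# Synthetic candidates (not from the paper): two straight axes plus false positives.
axis_a = [(float(i), 0.0) for i in range(20)] + [(3.0, 15.0), (7.0, -12.0), (15.0, 9.0)]
axis_b = [(float(i), float(i)) for i in range(20)] + [(2.0, 18.0), (17.0, 1.0)]
_, dir_a, inl_a = fit_axis_ransac(axis_a)
_, dir_b, inl_b = fit_axis_ransac(axis_b)
angle = angle_between_axes_deg(dir_a, dir_b)
```

Treating the axes as undirected keeps the angle in [0, 90] degrees; a clinical convention that distinguishes direction would need a signed variant.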

If this is right

Angle measurements for fracture assessment and hip dysplasia screening can be obtained without manual landmark annotation on each new image.
Reproducibility of paediatric orthopaedic metrics improves because the pipeline replaces variable human landmark placement with a fixed algorithmic procedure.
The same two-stage design (point proposal then robust axis fit) can be retrained for additional bone structures once suitable annotated data exist.
Clinical workflow time for quantitative angle reporting decreases because the pipeline runs end-to-end from image to numeric angle.
Outlier rejection steps inside the line-fitting stage protect accuracy even when the point-proposal network produces occasional false positives.
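
The outlier-rejection idea in the last point can also be sketched with a Hough-style vote over point candidates. The version below is an assumption-laden illustration: it votes directly on a list of points rather than on an image, uses the normal form x·cos(θ) + y·sin(θ) = ρ, and the name `hough_line`, the bin sizes, and the sample points are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def hough_line(points, n_theta=180, rho_step=1.0):
    """Return (theta_rad, rho, votes) of the most supported line in normal form."""
    votes = Counter()
    for x, y in points:
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            # Each point votes for every (theta, rho) bin it could lie on.
            votes[(k, round(rho / rho_step))] += 1
    (k, r), count = votes.most_common(1)[0]
    return math.pi * k / n_theta, r * rho_step, count


# Synthetic candidates (not from the paper): a vertical axis at x = 5 plus false positives.
line_pts = [(5.0, 10.0 * i) for i in range(20)]
false_positives = [(40.0, 30.0), (80.0, 120.0), (-20.0, 60.0)]
theta, rho, n_votes = hough_line(line_pts + false_positives)
```

The line direction follows from the normal angle as (-sin θ, cos θ). Because false positives scatter their votes across many bins, they do not shift the winning bin as long as the true axis collects more votes than any accidental alignment.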

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be extended to three-dimensional volumetric data if the point-proposal network is replaced by a 3-D analogue and the line model becomes a plane or cylinder fit.
Real-time use during ultrasound-guided procedures becomes feasible once the network is quantized and the fitting step is optimized for low latency.
Systematic comparison against other robust estimators such as M-estimators or graph-cut methods would clarify whether RANSAC and Hough are the only viable choices for this domain.
If the point-proposal network is trained on a broader mix of scanners and patient ages, the same pipeline might maintain accuracy across adult as well as paediatric populations.

Load-bearing premise

The learned point proposals supply enough inliers that robust fitting recovers the true bone axes without systematic offset from false positives or image artifacts.

What would settle it

On a held-out set of the same three tasks, if the method's angular errors rise above published clinical observer variability ranges or display consistent directional bias on images containing common artifacts, the central performance claim would be refuted.
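
A check of this kind could be scripted roughly as follows. This is a sketch only: `summarize_angle_errors` is a hypothetical helper, the observer-variability threshold must come from the clinical literature for each task and is passed in as a parameter, and the numbers in the example are made up.

```python
def summarize_angle_errors(pred_deg, true_deg, observer_variability_deg):
    """Mean absolute error, mean signed error (bias), and a threshold check."""
    signed = [p - t for p, t in zip(pred_deg, true_deg)]
    n = len(signed)
    mae = sum(abs(e) for e in signed) / n
    bias = sum(signed) / n  # a value far from zero suggests a directional offset
    return {
        "mae": mae,
        "bias": bias,
        "within_observer_range": mae <= observer_variability_deg,
    }


# Made-up predictions and references in degrees; 5.0 is a placeholder threshold.
report = summarize_angle_errors([10.0, 22.0, 31.0], [12.0, 20.0, 30.0], 5.0)
```

Reporting the signed mean alongside the absolute mean matters here because a method can have an acceptable average error and still lean consistently in one direction on artifact-heavy images.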

Figures

Figures reproduced from arXiv: 2606.04700 by Anne-Nele Schröder, Christoph Großbröhmer, Franziska Halm, Lasse Hansen, Ludger Tüshaus, Mattias P. Heinrich, Miriam Johann, Ron Keuth.

**Figure 2.** Statistics of absolute angle estimation error (in degrees) across cross […]

**Figure 3.** Qualitative result of the best method (bold in Tab. 2a) in the three med […]

Original abstract

Measuring the angle between bone structures is a routine task in medical image analysis and provides a key quantitative parameter for diagnosis and treatment planning. Automated methods can reduce time and cost while improving reproducibility. In this work, we address automatic bone pose estimation using a learning-based point candidate proposal followed by a line model to extract axis parameters. Since conventional line models such as least squares are sensitive to outliers, we incorporate false-positive reduction strategies and robust fitting techniques, such as RANSAC and Hough transforms, to improve robustness. We evaluate our method on three clinically relevant paediatric angle estimation tasks: fracture fragment assessment in radiographs and ultrasound and developmental dysplasia of the hip evaluation in ultrasound using the Graf method. Our approach achieves mean errors of $4.1^\circ$, $5.4^\circ$, and $5.51^\circ$, respectively, not only remaining within the expected clinical observer variability, but also significantly outperforming landmark-based methods. Our code and annotations for fracture angle assessment in radiographs are publicly available on GitHub.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper pairs a point-proposal network with RANSAC/Hough fitting for three paediatric angle tasks and reports usable errors, but supplies no ablations or inlier statistics to show the robust step is necessary.

The paper's core contribution is a straightforward pipeline: a learned point candidate network followed by robust line fitting (RANSAC or Hough) to measure bone angles on radiographs and ultrasound. They test it on fracture fragment assessment in both modalities and on Graf-method hip dysplasia in ultrasound, reporting mean errors of 4.1°, 5.4°, and 5.51° that stay inside typical observer variability and beat plain landmark baselines. They also release code and annotations for the radiograph fracture task.

What stands out is the practical focus and the open release; those elements make the work easier to check or extend than many medical imaging papers. The tasks are narrow but clinically routine, so the numbers could matter to people who actually run these measurements.

The soft spots are in the validation. The abstract and stress-test note give no inlier ratios from the proposal network, no ablation that turns the robust fitting off, and no error bars or statistical tests on the reported means. Without those, it is hard to tell whether the fitting step is fixing real false positives or whether the network is already producing clean enough points on these particular images. The three tasks span different artifacts, so the same gap applies across modalities.

This is the kind of paper that belongs in a medical image analysis venue or a clinical engineering journal. Readers who work on automated orthopaedic measurements or who need a starting point for similar angle tasks will get something concrete from it. The open data helps.

I would send it to peer review. The application is clear, the errors look plausible on the surface, and the release lowers the barrier for referees to dig into the details. A revision that adds the missing ablations and basic stats would strengthen it without changing the main claim.

Referee Report

3 major / 1 minor

Summary. The paper proposes a two-stage approach for automatic bone axis and angle estimation in radiographs and ultrasound: a learning-based network proposes point candidates on bone structures, followed by robust line fitting (RANSAC or Hough) with false-positive reduction to recover axes. It reports evaluation on three paediatric tasks (fracture fragment assessment in X-ray, fracture assessment in ultrasound, and Graf-method DDH assessment in ultrasound), with mean angular errors of 4.1°, 5.4°, and 5.51° that are claimed to lie within clinical observer variability and to significantly outperform landmark-based baselines. Code and annotations for one task are released publicly.

Significance. If the performance claims are substantiated with proper statistical reporting and component ablations, the work would offer a practical, modality-robust tool for a routine clinical measurement task, potentially improving reproducibility in paediatric orthopaedics. The public release of code and annotations is a clear strength that supports reproducibility.

major comments (3)

[Abstract] Abstract: the headline mean errors (4.1°, 5.4°, 5.51°) and the claim of statistically significant outperformance are presented without standard deviations, dataset sizes (train/test splits), number of images or patients, or any statistical test results. This directly weakens the assertion that the method remains within clinical observer variability and outperforms landmarks.
[Abstract] Abstract / method description: the paper states that false-positive reduction strategies are incorporated, yet supplies no quantitative characterization of the proposal network (inlier ratio, precision, false-positive rate) on the test sets, nor any ablation that isolates the contribution of the robust fitting stage versus the network alone. Because the three tasks involve distinct artifact profiles, this leaves the central robustness claim untested.
[Abstract] Abstract: no details are given on how the false-positive reduction was implemented or tuned, nor on whether the same hyper-parameters were used across the radiograph and ultrasound tasks. This information is load-bearing for the claim that the pipeline generalizes without systematic bias from network errors.
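
The inlier ratio requested in the second comment could be computed along these lines. This is a hedged sketch: it assumes each test image comes with a ground-truth axis given as an anchor point and a direction vector, and the helper name `inlier_ratio`, the distance tolerance, and the sample points are hypothetical.

```python
import math


def inlier_ratio(points, anchor, direction, tol=2.0):
    """Fraction of proposed points within `tol` of the ground-truth axis."""
    ax, ay = anchor
    dx, dy = direction
    norm = math.hypot(dx, dy)
    hits = sum(
        1 for x, y in points
        # Perpendicular distance from (x, y) to the line through the anchor.
        if abs(dy * (x - ax) - dx * (y - ay)) / norm <= tol
    )
    return hits / len(points)


# Made-up proposals around a horizontal axis through the origin.
proposals = [(0.0, 0.0), (1.0, 0.5), (2.0, -1.0), (3.0, 10.0), (4.0, -8.0)]
ratio = inlier_ratio(proposals, anchor=(0.0, 0.0), direction=(1.0, 0.0))
```

Reported per task, this number would indicate how much work the robust fitting stage actually has to do on each modality.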

minor comments (1)

[Abstract] The abstract mentions three tasks but does not name the exact clinical angles measured in each (e.g., which Graf angles or fracture angles); adding one sentence would improve clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which identify opportunities to strengthen the abstract's support for our claims. We address each point below and will revise the abstract accordingly.

Referee: [Abstract] Abstract: the headline mean errors (4.1°, 5.4°, 5.51°) and the claim of statistically significant outperformance are presented without standard deviations, dataset sizes (train/test splits), number of images or patients, or any statistical test results. This directly weakens the assertion that the method remains within clinical observer variability and outperforms landmarks.

Authors: We agree that the abstract would benefit from these details to better substantiate the claims. In the revised version we will incorporate the standard deviations, dataset sizes (including train/test splits, image and patient counts), and statistical test results (e.g., p-values from paired tests) directly into the abstract while preserving its conciseness.
Revision: yes

Referee: [Abstract] Abstract / method description: the paper states that false-positive reduction strategies are incorporated, yet supplies no quantitative characterization of the proposal network (inlier ratio, precision, false-positive rate) on the test sets, nor any ablation that isolates the contribution of the robust fitting stage versus the network alone. Because the three tasks involve distinct artifact profiles, this leaves the central robustness claim untested.

Authors: The manuscript contains ablations in the experiments section that compare the full pipeline against the proposal network alone. To address the abstract-level concern we will add concise quantitative metrics (inlier ratios, precision) and a note on the fitting-stage contribution to the revised abstract.
Revision: yes

Referee: [Abstract] Abstract: no details are given on how the false-positive reduction was implemented or tuned, nor on whether the same hyper-parameters were used across the radiograph and ultrasound tasks. This information is load-bearing for the claim that the pipeline generalizes without systematic bias from network errors.

Authors: We will revise the abstract to include a brief description of the false-positive reduction implementation, tuning procedure, and confirmation that consistent hyper-parameters were applied across tasks.
Revision: yes
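
The paired tests mentioned in the first response could take several forms. One simple option, shown here purely as an assumption and not as the test the authors use, is a sign-flip permutation test on per-image absolute errors of two methods. The function name `paired_sign_flip_test` and all error values are hypothetical.

```python
import random


def paired_sign_flip_test(errors_a, errors_b, n_perm=2000, seed=0):
    """Two-sided Monte Carlo p-value for a difference in mean paired error."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_perm):
        # Under the null hypothesis the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Made-up per-image absolute errors in degrees (not results from the paper).
robust = [3.0, 4.5, 2.0, 5.0, 3.5, 4.0, 2.5, 6.0, 3.0, 4.0, 5.5, 2.0]
baseline = [7.0, 9.0, 6.5, 8.0, 7.5, 9.5, 6.0, 10.0, 8.0, 7.0, 9.0, 6.5]
p_value = paired_sign_flip_test(robust, baseline)
p_same = paired_sign_flip_test(robust, robust)
```

A permutation test avoids distributional assumptions about the angular errors, which is convenient when test sets are small.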

Circularity Check

0 steps flagged

No circularity: empirical pipeline with independent validation

The paper describes a two-stage method (learning-based point proposal + RANSAC/Hough robust fitting) and reports measured mean angular errors on three clinical tasks. No equations, fitted parameters, or self-citations are presented that reduce any claimed result or derivation to the inputs by construction. The performance numbers are post-hoc empirical outcomes, not quantities defined or forced by the method itself. The derivation chain is therefore self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities beyond standard assumptions of supervised learning and robust statistics; no ad-hoc constants or new entities are described.

pith-pipeline@v0.9.1-grok · 5741 in / 1135 out tokens · 22074 ms · 2026-06-28T07:20:13.049620+00:00 · methodology
