pith.

arxiv: 2512.13402 · v2 · pith:TXLAQBB2 · submitted 2025-12-15 · 💻 cs.CV · cs.AI

End2Reg: Learning Task-Specific Segmentation for Markerless Registration in Spine Surgery

Pith reviewed 2026-05-21 17:00 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords end-to-end learning · segmentation · registration · spine surgery · markerless navigation · deep learning · RGB-D

The pith

End-to-end network learns segmentation masks from registration loss to boost markerless spine navigation accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that trains a network to produce segmentation masks for anatomical structures in spine surgery data without any segmentation labels. Instead, the masks are optimized purely through the downstream registration task that aligns preoperative data with intraoperative RGB-D captures. This joint optimization removes the need for manual labeling or a separate segmentation step. If it holds up, it moves intraoperative navigation closer to being fully automatic and markerless, while improving accuracy over methods that rely on weak labels.

Core claim

End2Reg is an end-to-end deep learning framework that jointly optimizes segmentation and registration for markerless RGB-D registration in spine surgery. The network learns task-specific segmentation masks guided solely by the registration objective, without explicit segmentation supervision. This approach achieves state-of-the-art performance, reducing median Target Registration Error by 32% and mean Root Mean Square Error by 61% on ex- and in-vivo benchmarks, while remaining robust to partial occlusions.
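The headline numbers are relative reductions of aggregate error statistics. The sketch below uses the textbook definitions, assuming rigid transforms (R, t), TRE as the per-target distance between estimated and ground-truth placements, and RMSE over a point set. The paper's exact target points, units, and aggregation protocol are not given in this summary, and all helper names and per-case numbers are hypothetical.

```python
import numpy as np

def apply_rigid(R, t, pts):
    # Map an (N, 3) array of points through x -> R x + t.
    return pts @ R.T + t

def target_errors(R_est, t_est, R_gt, t_gt, targets):
    # Per-target Euclidean distance between estimated and ground-truth placements.
    diff = apply_rigid(R_est, t_est, targets) - apply_rigid(R_gt, t_gt, targets)
    return np.linalg.norm(diff, axis=1)

def rmse(R_est, t_est, R_gt, t_gt, pts):
    # Root mean square of the same distances over a (dense) point set.
    return float(np.sqrt(np.mean(target_errors(R_est, t_est, R_gt, t_gt, pts) ** 2)))

def relative_reduction(baseline, new):
    # Fractional improvement, e.g. 0.32 for a "32% reduction".
    return (baseline - new) / baseline

# Toy check: an estimate that is off by 1 unit along x everywhere.
I, zero = np.eye(3), np.zeros(3)
shift = np.array([1.0, 0.0, 0.0])
targets = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 5.0, 2.0]])
per_target = target_errors(I, shift, I, zero, targets)
case_rmse = rmse(I, shift, I, zero, targets)

# Across a test set, "median TRE" and "mean RMSE" aggregate per-case (or
# per-target) values. Hypothetical per-case numbers, not results from the paper:
baseline_tre, new_tre = np.array([3.0, 4.0, 2.5]), np.array([2.0, 2.5, 1.9])
print(relative_reduction(np.median(baseline_tre), np.median(new_tre)))
```

Note that a 32% cut in the median and a 61% cut in the mean can coexist: a mean is pulled down much more than a median when a method mainly eliminates large failure cases.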

What carries the argument

Joint end-to-end optimization where the registration loss directly supervises the generation of task-specific segmentation masks.
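To make the gradient path concrete, here is a toy sketch, not the authors' architecture: free per-point logits stand in for a segmentation network, a sigmoid turns them into a soft mask, a weighted Kabsch/Procrustes fit stands in for the registration head, and central finite differences stand in for backpropagation. All names and numbers are hypothetical; the only supervision is the ground-truth pose inside the registration loss.

```python
import numpy as np

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def weighted_kabsch(src, dst, w):
    # Closed-form rigid fit minimizing sum_i w_i * ||R src_i + t - dst_i||^2.
    w = w / w.sum()
    mu_s, mu_d = w @ src, w @ dst
    H = (src - mu_s).T @ (w[:, None] * (dst - mu_d))
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # flip last axis if SVD yields a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, mu_d - R @ mu_s

def registration_loss(logits, src, dst, R_gt, t_gt):
    w = 1.0 / (1.0 + np.exp(-logits))  # soft mask in (0, 1): the "segmentation"
    R, t = weighted_kabsch(src, dst, w)
    # Mean squared distance between estimated and ground-truth placements of src.
    diff = src @ R.T + t - (src @ R_gt.T + t_gt)
    return float(np.mean(np.sum(diff ** 2, axis=1)))

def numerical_grad(f, x, eps=1e-5):
    # Central differences stand in for backpropagation in this sketch.
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        g[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
R_gt, t_gt = rot_z(0.3), np.array([0.5, -0.2, 0.1])
src = rng.normal(size=(30, 3))         # rows 0-19 "bone", rows 20-29 "soft tissue"
dst = src @ R_gt.T + t_gt
dst[20:] += rng.normal(size=(10, 3))   # tissue does not follow the rigid motion

def loss(z):
    return registration_loss(z, src, dst, R_gt, t_gt)

logits = np.zeros(30)                   # no segmentation labels: start uninformed
initial_loss = loss(logits)
for _ in range(300):
    logits -= 5.0 * numerical_grad(loss, logits)
final_loss = loss(logits)
mask = 1.0 / (1.0 + np.exp(-logits))
print(f"loss {initial_loss:.4f} -> {final_loss:.4f}")
print(f"mean mask: bone {mask[:20].mean():.2f}, tissue {mask[20:].mean():.2f}")
```

Points that disagree with the rigid motion raise the loss when weighted up, so their logits are pushed down without any mask labels. Nothing in this objective forces the down-weighted points to be non-anatomical, which is exactly the concern raised under "Load-bearing premise".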

If this is right

  • Registration accuracy improves significantly without requiring segmentation labels.
  • The learned masks are optimized specifically for the registration task rather than general anatomical accuracy.
  • Performance holds under partial occlusions common in surgical settings.
  • Ablation studies show that removing end-to-end training reduces accuracy.
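The summary does not say how occlusions were simulated. One hypothetical protocol, sketched below, hides the fraction of intraoperative points closest to a chosen centre (as an instrument or hand might) and re-evaluates at several occlusion levels. `occlude_nearest`, `occlusion_sweep`, and the `evaluate` callback are illustrative names, not the paper's.

```python
import numpy as np

def occlude_nearest(points, center, fraction):
    """Hide the `fraction` of points closest to `center` (hypothetical occluder)."""
    d = np.linalg.norm(points - center, axis=1)
    k = int(round(fraction * len(points)))
    keep = np.ones(len(points), dtype=bool)
    keep[np.argsort(d)[:k]] = False
    return points[keep], keep

def occlusion_sweep(points, center, fractions, evaluate):
    # `evaluate` stands in for "register the occluded cloud and return an error".
    return {f: evaluate(occlude_nearest(points, center, f)[0]) for f in fractions}

rng = np.random.default_rng(1)
cloud = rng.normal(size=(100, 3))
# Placeholder evaluator that just counts surviving points.
counts = occlusion_sweep(cloud, cloud[0], [0.0, 0.1, 0.3, 0.5], evaluate=len)
print(counts)
```

Removing a contiguous region, rather than random points, is the harder case: random dropout leaves the overall surface shape intact, while a contiguous hole deletes geometry the registration may rely on.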

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar joint optimization could apply to other medical imaging tasks where intermediate steps like segmentation are currently supervised separately.
  • If the method generalizes, it might reduce reliance on large annotated datasets in surgical AI applications.
  • Future work could test whether these masks reveal new anatomical insights beyond registration.

Load-bearing premise

The registration objective by itself produces segmentation outputs that are both anatomically meaningful and optimal for registration instead of latching onto irrelevant patterns in the data.

What would settle it

Observing that the generated segmentation masks do not correspond to actual anatomical structures on new patient data, or that registration error does not decrease when using these masks compared to random or manual alternatives.
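Both tests are straightforward to quantify. A minimal sketch, assuming the learned soft mask is binarized (the 0.5 threshold is an assumption) and that manual annotations exist for a held-out subset: Dice and IoU measure anatomical overlap, and a random mask with the same foreground fraction serves as the "random alternative" control for the registration comparison. Function names and values are illustrative.

```python
import numpy as np

def dice(pred, ref):
    pred, ref = np.asarray(pred, dtype=bool), np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(pred, ref).sum() / denom)

def iou(pred, ref):
    pred, ref = np.asarray(pred, dtype=bool), np.asarray(ref, dtype=bool)
    union = np.logical_or(pred, ref).sum()
    return 1.0 if union == 0 else float(np.logical_and(pred, ref).sum() / union)

def random_mask_like(pred, rng):
    # Control mask: same number of foreground points as `pred`, placed at random.
    pred = np.asarray(pred, dtype=bool)
    flat = np.zeros(pred.size, dtype=bool)
    flat[rng.choice(pred.size, size=int(pred.sum()), replace=False)] = True
    return flat.reshape(pred.shape)

soft_mask = np.array([0.9, 0.8, 0.6, 0.2, 0.1, 0.7])  # hypothetical network output
manual = np.array([1, 1, 0, 0, 0, 1], dtype=bool)      # hypothetical annotation
learned = soft_mask > 0.5
control = random_mask_like(learned, np.random.default_rng(0))
print(dice(learned, manual), iou(learned, manual), dice(control, manual))
```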

Original abstract

Intraoperative navigation in spine surgery demands millimeter-level accuracy. Currently, this is achieved through radiation-intensive intraoperative imaging and bone-anchored markers that are invasive and disrupt surgical workflow. Markerless RGB-D registration methods offer a promising alternative. However, existing approaches rely on weak segmentation labels to isolate relevant anatomical structures, potentially propagating errors through the registration process. We present End2Reg, an end-to-end deep learning framework that jointly optimizes segmentation and registration, eliminating the need for segmentation labels and manual steps. The network learns task-specific segmentation masks optimized for registration, guided solely by the registration objective without explicit segmentation supervision. End2Reg achieves state-of-the-art performance on ex- and in-vivo benchmarks, reducing median Target Registration Error by 32% and mean Root Mean Square Error by 61%, while maintaining robust performance under partial occlusions. Ablation results confirm that end-to-end optimization significantly improves registration accuracy. Overall, End2Reg advances towards fully automatic, markerless intraoperative navigation. Code and interactive visualizations are available at: https://lorenzopettinari.github.io/end-2-reg/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces End2Reg, an end-to-end deep learning framework for markerless RGB-D registration in spine surgery. A segmentation network is trained jointly with the registration module using only the registration objective, without any explicit segmentation supervision or labels. The learned masks are claimed to be task-specific and optimized for registration. The method reports state-of-the-art results on ex-vivo and in-vivo benchmarks, including a 32% reduction in median Target Registration Error and 61% reduction in mean Root Mean Square Error, plus robustness under partial occlusions. Ablation studies are presented to support the benefit of end-to-end optimization over separate training.

Significance. If the results and the assumption that the registration loss alone yields anatomically meaningful masks hold, the work offers a clear advance toward fully automatic, markerless intraoperative navigation. Eliminating the need for weak segmentation labels addresses a practical bottleneck in surgical settings. The reported error reductions and occlusion robustness, together with public code and visualizations, strengthen the contribution to computer vision methods for medical registration.

major comments (2)
  1. [§4 (Results and Ablations)] The central claim that the registration objective alone produces segmentation masks that isolate relevant anatomy (rather than spurious correlations) is load-bearing but under-supported. The manuscript should include quantitative comparison of the learned masks to anatomical ground truth or expert annotations on held-out data, for example in the results or ablation sections.
  2. [§4.1 (Benchmark Evaluation)] Dataset sizes, cross-validation strategy, and statistical testing for the reported 32% median TRE and 61% mean RMSE reductions are not detailed in the abstract or visible summary; these are required to establish that the improvements are reliable and not driven by small-sample effects or patient-specific artifacts.
minor comments (2)
  1. [§3 (Method)] Clarify the precise formulation of the registration loss and how gradients flow through the segmentation output in the methods section.
  2. [Introduction] Add a reference to recent unsupervised or weakly-supervised segmentation approaches in medical image registration for context.
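Minor comment 1 matters because a hard (thresholded) mask has zero gradient almost everywhere, so a registration loss cannot train the segmentation through it. The toy check below, with hypothetical names and a mask-weighted centroid error standing in for the real registration loss, shows finite-difference gradients vanishing for a hard mask but not for a sigmoid mask. Common workarounds include such soft masks, Gumbel-Softmax sampling, and straight-through estimators; which one End2Reg uses is not stated in this summary.

```python
import numpy as np

def hard_mask(logits):
    return (logits > 0).astype(float)      # binary segmentation: a step function

def soft_mask(logits):
    return 1.0 / (1.0 + np.exp(-logits))   # differentiable relaxation

def centroid_loss(mask, pts, target):
    # Toy registration proxy: squared error of the mask-weighted centroid.
    c = (mask[:, None] * pts).sum(axis=0) / max(mask.sum(), 1e-8)
    return float(np.sum((c - target) ** 2))

def numerical_grad(f, x, eps=1e-4):
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        g[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return g

pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [5.0, 5.0, 5.0]])
target = pts[:3].mean(axis=0)   # the last point is an "outlier"
logits = np.ones(4)
g_hard = numerical_grad(lambda z: centroid_loss(hard_mask(z), pts, target), logits)
g_soft = numerical_grad(lambda z: centroid_loss(soft_mask(z), pts, target), logits)
print(g_hard)   # all zeros: no learning signal reaches the logits
print(g_soft)   # non-zero; positive for the outlier, so its logit would decrease
```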

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments on our manuscript. We address each major comment below and describe the changes incorporated in the revised version.

Point-by-point responses
  1. Referee: [§4 (Results and Ablations)] The central claim that the registration objective alone produces segmentation masks that isolate relevant anatomy (rather than spurious correlations) is load-bearing but under-supported. The manuscript should include quantitative comparison of the learned masks to anatomical ground truth or expert annotations on held-out data, for example in the results or ablation sections.

    Authors: We appreciate the referee's emphasis on this point. The primary support for the task-specific nature of the masks is provided by the ablation studies, which show statistically significant registration improvements when segmentation and registration are optimized jointly rather than separately. This performance gap indicates that the masks capture features relevant to registration rather than spurious correlations. We acknowledge that a quantitative comparison against expert annotations would offer additional reassurance. Because the method is designed to eliminate the need for segmentation labels, such annotations are unavailable for the primary datasets. In the revision we have added extended qualitative analysis, including side-by-side visualizations of the learned masks against manually delineated anatomical structures on a small held-out subset, together with a discussion of why the registration objective encourages anatomically coherent masks. revision: partial

  2. Referee: [§4.1 (Benchmark Evaluation)] Dataset sizes, cross-validation strategy, and statistical testing for the reported 32% median TRE and 61% mean RMSE reductions are not detailed in the abstract or visible summary; these are required to establish that the improvements are reliable and not driven by small-sample effects or patient-specific artifacts.

    Authors: We thank the referee for noting this presentational gap. Although the dataset composition, cross-validation protocol, and statistical tests are described in Sections 3 and 4 of the manuscript, we agree they should be more immediately visible. In the revised manuscript we have updated the abstract to state the dataset sizes and cross-validation strategy, and added a concise summary table in Section 4.1 that reports the number of sequences, patients/cadavers, and the results of the Wilcoxon signed-rank test (including p-values) for the reported error reductions. revision: yes
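The rebuttal names the Wilcoxon signed-rank test. As a reminder of what it computes on paired per-case errors, here is a minimal exact two-sided version in plain Python. It drops zero differences and, for brevity, assumes no tied absolute differences (library implementations such as SciPy's `scipy.stats.wilcoxon` handle ties and large samples). The function name and per-case numbers are hypothetical, not the paper's data.

```python
def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples x, y."""
    d = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    n = len(d)
    if len({abs(v) for v in d}) != n:
        raise ValueError("this sketch assumes no tied |differences|")
    ranks = [0] * n
    for rank, i in enumerate(sorted(range(n), key=lambda j: abs(d[j])), start=1):
        ranks[i] = rank
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    # counts[s] = number of the 2**n equally likely sign patterns whose
    # positive-rank sum is s (subset-sum counting over ranks 1..n).
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    total = 2 ** n
    p_low = sum(counts[: w_plus + 1]) / total
    p_high = sum(counts[w_plus:]) / total
    return w_plus, min(1.0, 2 * min(p_low, p_high))

# Hypothetical per-case TRE (mm) for a baseline and a new method on the same cases.
baseline = [2.1, 3.4, 2.8, 4.0, 3.1, 2.5]
new = [1.5, 2.2, 2.9, 2.6, 2.0, 1.8]
w_plus, p_value = wilcoxon_signed_rank_exact(baseline, new)
print(w_plus, p_value)
```

With six paired cases, even a unanimous improvement gives a two-sided exact p of 2/64 ≈ 0.031 at best, so the number of sequences and patients directly bounds how strong the reported significance can be.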

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The approach assumes that a differentiable registration loss can serve as a sufficient training signal for learning useful segmentation masks. No explicit free parameters, axioms, or invented entities are stated in the abstract beyond standard deep-learning training assumptions.

pith-pipeline@v0.9.0 · 5747 in / 1335 out tokens · 87899 ms · 2026-05-21T17:00:49.051743+00:00 · methodology


