End2Reg: Learning Task-Specific Segmentation for Markerless Registration in Spine Surgery

Carol C. Hasler; Daniel Studer; Lorenzo Pettinari; Maria Licci; Michael Wehrli; Philippe C. Cattin; Sidaty El Hadramy

arXiv: 2512.13402 · v2 · pith:TXLAQBB2new · submitted 2025-12-15 · cs.CV · cs.AI


Pith reviewed 2026-05-21 17:00 UTC · model grok-4.3

keywords end-to-end learning · segmentation · registration · spine surgery · markerless navigation · deep learning · RGB-D

The pith

An end-to-end network learns segmentation masks from the registration loss alone, improving the accuracy of markerless spine navigation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that trains a network to produce segmentation masks for anatomical structures in spine images without any direct segmentation training data. Instead, the masks are optimized purely through the downstream registration task that aligns preoperative and intraoperative images. This joint optimization removes the need for manual labeling or separate segmentation steps. If successful, it simplifies intraoperative navigation by making it fully automatic and markerless while improving accuracy over methods that use weak labels.

Core claim

End2Reg is an end-to-end deep learning framework that jointly optimizes segmentation and registration for markerless RGB-D registration in spine surgery. The network learns task-specific segmentation masks guided solely by the registration objective, without explicit segmentation supervision. This approach achieves state-of-the-art performance, reducing median Target Registration Error by 32% and mean Root Mean Square Error by 61% on ex- and in-vivo benchmarks, while remaining robust to partial occlusions.
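The two metrics named in the core claim can be made concrete with a small sketch. The definitions below are the common ones (TRE as the distance between a target point mapped by the estimated pose and by the ground-truth pose, RMSE over corresponding point pairs); whether the paper uses exactly these definitions is an assumption, and all function names and numbers are illustrative only.

```python
import math
from statistics import median

def transform(R, t, p):
    """Apply a rigid transform (3x3 rotation R, translation t) to a 3D point p."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def target_registration_errors(R_est, t_est, R_gt, t_gt, targets):
    """Per-target distance between the estimated and ground-truth mappings."""
    return [math.dist(transform(R_est, t_est, p), transform(R_gt, t_gt, p))
            for p in targets]

def rmse(points_a, points_b):
    """Root mean square distance over corresponding point pairs."""
    squared = [math.dist(a, b) ** 2 for a, b in zip(points_a, points_b)]
    return math.sqrt(sum(squared) / len(squared))

def percent_reduction(baseline, proposed):
    """Relative improvement of `proposed` over `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline

# Invented example: the estimate is off by a pure translation of (3, 4, 0).
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
errors = target_registration_errors(
    IDENTITY, (3.0, 4.0, 0.0), IDENTITY, (0.0, 0.0, 0.0),
    targets=[(0.0, 0.0, 0.0), (10.0, 5.0, 2.0)],
)
median_tre = median(errors)
```

Read this way, a 32% reduction in median TRE means the new median is 0.68 times the baseline median; `percent_reduction(2.5, 1.7)` reproduces that arithmetic for made-up values.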

What carries the argument

Joint end-to-end optimization where the registration loss directly supervises the generation of task-specific segmentation masks.
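A toy example may help show how a registration loss can act as the only training signal for a mask. The sketch below is not the paper's architecture: it assumes a 2D scene, known correspondences, and a closed-form weighted rigid fit, with a single scalar weight standing in for the mask value on soft-tissue points. All names and numbers are hypothetical.

```python
import math

def weighted_rigid_2d(src, dst, w):
    """Closed-form weighted least-squares rigid fit in 2D; returns (theta, t)."""
    sw = sum(w)
    cs = [sum(wi * p[k] for wi, p in zip(w, src)) / sw for k in (0, 1)]
    cd = [sum(wi * q[k] for wi, q in zip(w, dst)) / sw for k in (0, 1)]
    dot = cross = 0.0
    for wi, p, q in zip(w, src, dst):
        px, py = p[0] - cs[0], p[1] - cs[1]
        qx, qy = q[0] - cd[0], q[1] - cd[1]
        dot += wi * (px * qx + py * qy)
        cross += wi * (px * qy - py * qx)
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    t = (cd[0] - (c * cs[0] - s * cs[1]), cd[1] - (s * cs[0] + c * cs[1]))
    return theta, t

def apply(theta, t, p):
    """Rotate p by theta, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])

# Invented scene: rigid "bone" points plus "soft tissue" points that deform.
BONE = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
TISSUE = [(1.0, 3.0), (3.0, 3.0)]
GT_THETA, GT_T = 0.3, (1.0, -0.5)   # ground-truth pose
SHIFT = (0.8, 0.6)                  # non-rigid displacement of the tissue

def registration_loss(tissue_weight):
    """Mean squared error on bone points between estimated and true pose."""
    src = BONE + TISSUE
    dst = [apply(GT_THETA, GT_T, p) for p in BONE]
    for p in TISSUE:
        gx, gy = apply(GT_THETA, GT_T, p)
        dst.append((gx + SHIFT[0], gy + SHIFT[1]))
    w = [1.0] * len(BONE) + [tissue_weight] * len(TISSUE)
    theta, t = weighted_rigid_2d(src, dst, w)
    err = 0.0
    for p in BONE:
        ex, ey = apply(theta, t, p)
        gx, gy = apply(GT_THETA, GT_T, p)
        err += (ex - gx) ** 2 + (ey - gy) ** 2
    return err / len(BONE)

def loss_gradient(tissue_weight, eps=1e-4):
    """Central finite difference of the loss w.r.t. the tissue mask weight."""
    return (registration_loss(tissue_weight + eps)
            - registration_loss(tissue_weight - eps)) / (2 * eps)
```

In this toy setting the gradient at `tissue_weight = 1.0` is positive, so a gradient step lowers the weight on the deforming points, and the loss vanishes once they are masked out entirely. No segmentation label is involved; the pose error alone shapes the mask.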

If this is right

Registration accuracy improves (median TRE down 32%, mean RMSE down 61% on the reported benchmarks) without requiring segmentation labels.
The learned masks are optimized specifically for the registration task rather than general anatomical accuracy.
Performance holds under partial occlusions common in surgical settings.
Ablation studies show that removing end-to-end training reduces accuracy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar joint optimization could apply to other medical imaging tasks where intermediate steps like segmentation are currently supervised separately.
If the method generalizes, it might reduce reliance on large annotated datasets in surgical AI applications.
Future work could test if these masks reveal new anatomical insights beyond registration.

Load-bearing premise

The registration objective by itself produces segmentation outputs that are both anatomically meaningful and optimal for registration instead of latching onto irrelevant patterns in the data.

What would settle it

Observing that the generated segmentation masks do not correspond to actual anatomical structures on new patient data, or that registration error does not decrease when using these masks compared to random or manual alternatives.

Original abstract

Intraoperative navigation in spine surgery demands millimeter-level accuracy. Currently, this is achieved through radiation-intensive intraoperative imaging and bone-anchored markers that are invasive and disrupt surgical workflow. Markerless RGB-D registration methods offer a promising alternative. However, existing approaches rely on weak segmentation labels to isolate relevant anatomical structures, potentially propagating errors through the registration process. We present End2Reg, an end-to-end deep learning framework that jointly optimizes segmentation and registration, eliminating the need for segmentation labels and manual steps. The network learns task-specific segmentation masks optimized for registration, guided solely by the registration objective without explicit segmentation supervision. End2Reg achieves state-of-the-art performance on ex- and in-vivo benchmarks, reducing median Target Registration Error by 32% and mean Root Mean Square Error by 61%, while maintaining robust performance under partial occlusions. Ablation results confirm that end-to-end optimization significantly improves registration accuracy. Overall, End2Reg advances towards fully automatic, markerless intraoperative navigation. Code and interactive visualizations are available at: https://lorenzopettinari.github.io/end-2-reg/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

End2Reg improves spine registration by learning segmentation masks end-to-end from the registration loss alone, without labels, though the masks could be latching onto dataset quirks.

End2Reg gets better registration results in spine surgery by learning the segmentation masks end-to-end using only the registration loss, with no segmentation labels at all. This setup cuts median target registration error by 32 percent and mean RMSE by 61 percent on the ex-vivo and in-vivo tests. The fresh angle is treating segmentation as a task-specific auxiliary that gets optimized directly for the registration goal. Most prior markerless approaches still leaned on some kind of pre-segmentation or weak labels, which could pass errors along. The joint training here lets the network discover what image parts actually help alignment. The ablations support that this end-to-end step matters, and the work shows decent robustness when parts of the view are occluded. Having the code out there makes it easier to verify or extend.

One spot that feels less secure is the assumption that the registration objective will produce masks that isolate real anatomy instead of latching onto training-specific patterns like textures or lighting that happen to help on these datasets. Medical RGB-D data often has limited variety, so that risk is there. The paper likely has some mask visuals, but confirming they correspond to actual structures and hold up beyond the benchmarks would tighten things up. More transparency on the loss details, network choices, and how they handled statistics would also help.

This is the kind of paper that fits for folks in medical robotics or image-guided surgery. Readers who care about self-supervised learning tricks in clinical computer vision would pick up some ideas. It has a distinct method and real numbers, so it should go to a proper referee. I would send it out for peer review. The results are concrete enough to be worth the time, even with the open question on what the masks are really capturing.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces End2Reg, an end-to-end deep learning framework for markerless RGB-D registration in spine surgery. A segmentation network is trained jointly with the registration module using only the registration objective, without any explicit segmentation supervision or labels. The learned masks are claimed to be task-specific and optimized for registration. The method reports state-of-the-art results on ex-vivo and in-vivo benchmarks, including a 32% reduction in median Target Registration Error and 61% reduction in mean Root Mean Square Error, plus robustness under partial occlusions. Ablation studies are presented to support the benefit of end-to-end optimization over separate training.

Significance. If the results and the assumption that the registration loss alone yields anatomically meaningful masks hold, the work offers a clear advance toward fully automatic, markerless intraoperative navigation. Eliminating the need for weak segmentation labels addresses a practical bottleneck in surgical settings. The reported error reductions and occlusion robustness, together with public code and visualizations, strengthen the contribution to computer vision methods for medical registration.

major comments (2)

[§4 (Results and Ablations)] The central claim that the registration objective alone produces segmentation masks that isolate relevant anatomy (rather than spurious correlations) is load-bearing but under-supported. The manuscript should include quantitative comparison of the learned masks to anatomical ground truth or expert annotations on held-out data, for example in the results or ablation sections.
[§4.1 (Benchmark Evaluation)] Dataset sizes, cross-validation strategy, and statistical testing for the reported 32% median TRE and 61% mean RMSE reductions are not detailed in the abstract or visible summary; these are required to establish that the improvements are reliable and not driven by small-sample effects or patient-specific artifacts.
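The quantitative mask comparison requested in the first major comment is usually reported as an overlap score. A minimal sketch, assuming binary 2D masks given as nested lists and using the standard Dice and IoU definitions; the masks shown are invented, and nothing here comes from the paper's data.

```python
def dice_and_iou(pred, truth):
    """Overlap scores between two binary masks of identical shape."""
    inter = pred_sum = truth_sum = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            pred_sum += 1 if p else 0
            truth_sum += 1 if t else 0
            inter += 1 if (p and t) else 0
    union = pred_sum + truth_sum - inter
    if union == 0:
        # Both masks empty: treat as perfect agreement.
        return 1.0, 1.0
    return 2.0 * inter / (pred_sum + truth_sum), inter / union

# Hypothetical learned mask vs. expert annotation on a 2x3 patch.
learned = [[1, 1, 0],
           [0, 1, 0]]
expert = [[1, 0, 0],
          [0, 1, 1]]
dice, iou = dice_and_iou(learned, expert)
```

A low score would not by itself invalidate the method, since a mask tuned for registration need not match the full anatomical outline, but it would show how far the learned masks depart from anatomy.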

minor comments (2)

[§3 (Method)] Clarify the precise formulation of the registration loss and how gradients flow through the segmentation output in the methods section.
[Introduction] Add a reference to recent unsupervised or weakly-supervised segmentation approaches in medical image registration for context.
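On the gradient-flow question in the first minor comment: the passage quoted further down this page indicates that the paper adopts a Straight-Through Gumbel-Softmax estimator. The sketch below shows the generic technique, not the paper's code. The forward pass emits a hard one-hot choice, while the backward pass would use the Jacobian of the soft sample as a surrogate. The function name, temperature, and logits are illustrative assumptions.

```python
import math
import random

def st_gumbel_softmax(logits, tau, rng):
    """One Straight-Through Gumbel-Softmax sample.

    Returns (hard, soft, jac):
      hard - one-hot vector used in the forward pass,
      soft - relaxed sample softmax((logits + gumbel_noise) / tau),
      jac  - d soft[i] / d logits[j], the surrogate gradient that the
             straight-through trick substitutes for d hard / d logits.
    """
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]   # numerically stable softmax
    total = sum(exps)
    soft = [e / total for e in exps]
    k = soft.index(max(soft))
    hard = [1.0 if i == k else 0.0 for i in range(len(soft))]
    n = len(soft)
    jac = [[soft[i] * ((1.0 if i == j else 0.0) - soft[j]) / tau
            for j in range(n)] for i in range(n)]
    return hard, soft, jac

rng = random.Random(0)
# Two classes per point, e.g. "keep for registration" vs. "discard".
hard, soft, jac = st_gumbel_softmax([1.0, -1.0], tau=0.5, rng=rng)
```

The discrete `hard` mask is what a downstream registration module would consume, while `jac` is what lets a registration loss reach the segmentation logits. Lower `tau` makes `soft` closer to one-hot at the cost of larger gradient variance.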

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments on our manuscript. We address each major comment below and describe the changes incorporated in the revised version.

Referee: [§4 (Results and Ablations)] The central claim that the registration objective alone produces segmentation masks that isolate relevant anatomy (rather than spurious correlations) is load-bearing but under-supported. The manuscript should include quantitative comparison of the learned masks to anatomical ground truth or expert annotations on held-out data, for example in the results or ablation sections.

Authors: We appreciate the referee's emphasis on this point. The primary support for the task-specific nature of the masks is provided by the ablation studies, which show statistically significant registration improvements when segmentation and registration are optimized jointly rather than separately. This performance gap indicates that the masks capture features relevant to registration rather than spurious correlations. We acknowledge that a quantitative comparison against expert annotations would offer additional reassurance. Because the method is designed to eliminate the need for segmentation labels, such annotations are unavailable for the primary datasets. In the revision we have added extended qualitative analysis, including side-by-side visualizations of the learned masks against manually delineated anatomical structures on a small held-out subset, together with a discussion of why the registration objective encourages anatomically coherent masks.

Revision: partial

Referee: [§4.1 (Benchmark Evaluation)] Dataset sizes, cross-validation strategy, and statistical testing for the reported 32% median TRE and 61% mean RMSE reductions are not detailed in the abstract or visible summary; these are required to establish that the improvements are reliable and not driven by small-sample effects or patient-specific artifacts.

Authors: We thank the referee for noting this presentational gap. Although the dataset composition, cross-validation protocol, and statistical tests are described in Sections 3 and 4 of the manuscript, we agree they should be more immediately visible. In the revised manuscript we have updated the abstract to state the dataset sizes and cross-validation strategy, and we have added a concise summary table in Section 4.1 that reports the number of sequences, the number of patients/cadavers, and the results of the Wilcoxon signed-rank test (including p-values) for the reported error reductions.

Revision: yes
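For readers unfamiliar with the test named in this response, here is a minimal sketch of an exact two-sided Wilcoxon signed-rank test for small paired samples. It enumerates every sign assignment, so it is only practical for a handful of pairs. The error values are invented for illustration and are not taken from the paper.

```python
from itertools import product

def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped and tied magnitudes get average ranks.
    Returns (w, p), where w is the smaller of the two signed rank sums.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    w = min(w_plus, w_minus)
    # Null distribution: each rank is positive or negative with probability 1/2.
    as_extreme = sum(
        1 for signs in product((0, 1), repeat=n)
        if sum(r for r, s in zip(ranks, signs) if s) <= w
    )
    p = min(1.0, 2.0 * as_extreme / 2 ** n)
    return w, p

# Hypothetical per-sequence TRE values in mm (invented).
baseline_tre = [2.1, 1.8, 2.5, 3.0, 1.6, 2.2, 2.9, 1.9]
proposed_tre = [1.4, 1.3, 1.5, 2.2, 1.2, 1.6, 1.8, 1.45]
w, p = wilcoxon_signed_rank_exact(baseline_tre, proposed_tre)
```

With eight pairs that all improve, the smallest achievable two-sided p-value is 2/256, which is one reason the referee's request for dataset sizes matters: very small samples bound how strong the statistical evidence can be.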

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The approach assumes that a differentiable registration loss can serve as a sufficient training signal for learning useful segmentation masks. No explicit free parameters, axioms, or invented entities are stated in the abstract beyond standard deep-learning training assumptions.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "The network learns task-specific segmentation masks optimized for registration, guided solely by the registration objective without explicit segmentation supervision."

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "We adopt the Gumbel-Softmax estimator... Straight-Through Gumbel-Softmax Estimator (ST-GS)"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

