End2Reg: Learning Task-Specific Segmentation for Markerless Registration in Spine Surgery
Pith reviewed 2026-05-21 17:00 UTC · model grok-4.3
The pith
End-to-end network learns segmentation masks from registration loss to boost markerless spine navigation accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
End2Reg is an end-to-end deep learning framework that jointly optimizes segmentation and registration for markerless RGB-D registration in spine surgery. The network learns task-specific segmentation masks guided solely by the registration objective, without explicit segmentation supervision. This approach achieves state-of-the-art performance, reducing median Target Registration Error by 32% and mean Root Mean Square Error by 61% on ex- and in-vivo benchmarks, while remaining robust to partial occlusions.
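The two headline metrics can be made concrete with a short NumPy sketch. This summary does not define them precisely (which target landmarks, which points enter the RMSE, whether the median is taken over targets or over cases), so the functions below assume common conventions. TRE is taken as the distance between target points mapped by the estimated and by the ground-truth transform. RMSE is taken over corresponding surface points after registration. All function names are hypothetical.

```python
import numpy as np

def apply_rigid(T, pts):
    """Apply a 4x4 homogeneous rigid transform T to an (N, 3) array of points."""
    return pts @ T[:3, :3].T + T[:3, 3]

def target_registration_error(T_est, T_gt, targets):
    """Per-target distance between targets mapped by the estimated and ground-truth transforms."""
    return np.linalg.norm(apply_rigid(T_est, targets) - apply_rigid(T_gt, targets), axis=1)

def registration_rmse(T_est, source, target):
    """RMS distance between registered source points and their corresponding target points."""
    d = np.linalg.norm(apply_rigid(T_est, source) - target, axis=1)
    return float(np.sqrt(np.mean(d ** 2)))

def relative_reduction(baseline, proposed):
    """Fractional drop of an error statistic, e.g. 0.32 for a 32% reduction."""
    return (baseline - proposed) / baseline
```

Suppose per-case TRE values are available for a baseline and for End2Reg. Under the assumption that the median is taken over cases, `relative_reduction(np.median(tre_baseline), np.median(tre_end2reg))` then corresponds to the kind of "median TRE reduction" quoted above.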
What carries the argument
Joint end-to-end optimization where the registration loss directly supervises the generation of task-specific segmentation masks.
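A toy version of "the registration loss supervises the mask" helps show why no segmentation label is needed. The sketch below makes several illustrative assumptions that do not describe End2Reg's actual registration module. Point correspondences are known. A sigmoid turns hypothetical per-point logits into a soft mask. The mask weights a closed-form weighted Kabsch fit, and the loss compares the fitted pose with a ground-truth pose on model points.

```python
import numpy as np

def weighted_kabsch(src, dst, w):
    """Closed-form rigid fit (R, t) minimising sum_i w_i * ||R @ src_i + t - dst_i||^2."""
    w = w / w.sum()
    mu_s, mu_d = w @ src, w @ dst                      # weighted centroids
    H = (src - mu_s).T @ (w[:, None] * (dst - mu_d))   # 3x3 weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))             # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t

def registration_loss(mask_logits, src, dst, model_pts, R_gt, t_gt):
    """Hypothetical pose loss: it depends on the mask only through the weighted fit."""
    w = 1.0 / (1.0 + np.exp(-mask_logits))            # soft per-point mask in (0, 1)
    R, t = weighted_kabsch(src, dst, w)
    err = model_pts @ R.T + t - (model_pts @ R_gt.T + t_gt)
    return float(np.mean(np.sum(err ** 2, axis=1)))
```

When the logits are high on anatomy points and low on inconsistent clutter, the loss is close to zero. A flat mask lets the clutter drag the fit away. Every step (sigmoid, weighted means, SVD) is differentiable almost everywhere, so an autodiff framework could backpropagate this pose loss into the mask logits. That is the sense in which registration alone can shape the mask.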
If this is right
- Registration accuracy improves significantly without requiring segmentation labels.
- The learned masks are optimized specifically for the registration task rather than general anatomical accuracy.
- Performance holds under partial occlusions common in surgical settings.
- Ablation studies show that removing end-to-end training reduces accuracy.
Where Pith is reading between the lines
- Similar joint optimization could apply to other medical imaging tasks where intermediate steps like segmentation are currently supervised separately.
- If the method generalizes, it might reduce reliance on large annotated datasets in surgical AI applications.
- Future work could test if these masks reveal new anatomical insights beyond registration.
Load-bearing premise
The registration objective by itself produces segmentation outputs that are both anatomically meaningful and optimal for registration instead of latching onto irrelevant patterns in the data.
What would settle it
Observing that the generated segmentation masks do not correspond to actual anatomical structures on new patient data, or that registration error does not decrease when using these masks compared to random or manual alternatives.
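The first test can be quantified with a standard overlap score between a learned mask and a manual delineation. A random mask with the same foreground fraction gives a chance-level reference, and the same random mask can serve as the "random alternative" input for the registration comparison. The minimal NumPy sketch below assumes both masks are binarised on the same pixel grid. The function names are illustrative.

```python
import numpy as np

def dice(pred, ref):
    """Dice overlap of two binary masks: 1.0 = identical, 0.0 = disjoint."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # both masks empty: count as perfect agreement
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)

def random_mask_like(pred, rng):
    """Chance-level baseline: same number of foreground pixels, placed at random."""
    flat = np.asarray(pred, dtype=bool).ravel()
    return rng.permutation(flat).reshape(np.shape(pred))
```

A learned mask whose Dice against the manual delineation is no better than that of `random_mask_like` would point to the failure mode described above.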
Original abstract
Intraoperative navigation in spine surgery demands millimeter-level accuracy. Currently, this is achieved through radiation-intensive intraoperative imaging and bone-anchored markers that are invasive and disrupt surgical workflow. Markerless RGB-D registration methods offer a promising alternative. However, existing approaches rely on weak segmentation labels to isolate relevant anatomical structures, potentially propagating errors through the registration process. We present End2Reg, an end-to-end deep learning framework that jointly optimizes segmentation and registration, eliminating the need for segmentation labels and manual steps. The network learns task-specific segmentation masks optimized for registration, guided solely by the registration objective without explicit segmentation supervision. End2Reg achieves state-of-the-art performance on ex- and in-vivo benchmarks, reducing median Target Registration Error by 32% and mean Root Mean Square Error by 61%, while maintaining robust performance under partial occlusions. Ablation results confirm that end-to-end optimization significantly improves registration accuracy. Overall, End2Reg advances towards fully automatic, markerless intraoperative navigation. Code and interactive visualizations are available at: https://lorenzopettinari.github.io/end-2-reg/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces End2Reg, an end-to-end deep learning framework for markerless RGB-D registration in spine surgery. A segmentation network is trained jointly with the registration module using only the registration objective, without any explicit segmentation supervision or labels. The learned masks are claimed to be task-specific and optimized for registration. The method reports state-of-the-art results on ex-vivo and in-vivo benchmarks, including a 32% reduction in median Target Registration Error and 61% reduction in mean Root Mean Square Error, plus robustness under partial occlusions. Ablation studies are presented to support the benefit of end-to-end optimization over separate training.
Significance. If the results and the assumption that the registration loss alone yields anatomically meaningful masks hold, the work offers a clear advance toward fully automatic, markerless intraoperative navigation. Eliminating the need for weak segmentation labels addresses a practical bottleneck in surgical settings. The reported error reductions and occlusion robustness, together with public code and visualizations, strengthen the contribution to computer vision methods for medical registration.
major comments (2)
- [§4 (Results and Ablations)] The central claim that the registration objective alone produces segmentation masks that isolate relevant anatomy (rather than spurious correlations) is load-bearing but under-supported. The manuscript should include quantitative comparison of the learned masks to anatomical ground truth or expert annotations on held-out data, for example in the results or ablation sections.
- [§4.1 (Benchmark Evaluation)] Dataset sizes, cross-validation strategy, and statistical testing for the reported 32% median TRE and 61% mean RMSE reductions are not detailed in the abstract or visible summary; these are required to establish that the improvements are reliable and not driven by small-sample effects or patient-specific artifacts.
minor comments (2)
- [§3 (Method)] Clarify the precise formulation of the registration loss and how gradients flow through the segmentation output in the methods section.
- [Introduction] Add a reference to recent unsupervised or weakly-supervised segmentation approaches in medical image registration for context.
Simulated Author's Rebuttal
We thank the referee for their constructive and insightful comments on our manuscript. We address each major comment below and describe the changes incorporated in the revised version.
Point-by-point responses
- Referee: [§4 (Results and Ablations)] The central claim that the registration objective alone produces segmentation masks that isolate relevant anatomy (rather than spurious correlations) is load-bearing but under-supported. The manuscript should include quantitative comparison of the learned masks to anatomical ground truth or expert annotations on held-out data, for example in the results or ablation sections.
Authors: We appreciate the referee's emphasis on this point. The primary support for the task-specific nature of the masks is provided by the ablation studies, which show statistically significant registration improvements when segmentation and registration are optimized jointly rather than separately. This performance gap indicates that the masks capture features relevant to registration rather than spurious correlations. We acknowledge that a quantitative comparison against expert annotations would offer additional reassurance. Because the method is designed to eliminate the need for segmentation labels, such annotations are unavailable for the primary datasets. In the revision we have added extended qualitative analysis, including side-by-side visualizations of the learned masks against manually delineated anatomical structures on a small held-out subset, together with a discussion of why the registration objective encourages anatomically coherent masks. revision: partial
- Referee: [§4.1 (Benchmark Evaluation)] Dataset sizes, cross-validation strategy, and statistical testing for the reported 32% median TRE and 61% mean RMSE reductions are not detailed in the abstract or visible summary; these are required to establish that the improvements are reliable and not driven by small-sample effects or patient-specific artifacts.
Authors: We thank the referee for noting this presentational gap. Although the dataset composition, cross-validation protocol, and statistical tests are described in Sections 3 and 4 of the manuscript, we agree they should be more immediately visible. In the revised manuscript we have updated the abstract to state the dataset sizes and cross-validation strategy, and we have added a concise summary table in Section 4.1 that reports the number of sequences, the number of patients/cadavers, and the results of the Wilcoxon signed-rank test (including p-values) for the reported error reductions. revision: yes
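The paired test named in this response can be sketched without a statistics library. The function below is a textbook two-sided Wilcoxon signed-rank test using the large-sample normal approximation. It is applied to per-case errors of a baseline and of the proposed method. The authors' actual settings (exact or approximate p-values, tie handling, which error is paired per case) are not stated in this summary, so treat those choices as assumptions. In practice `scipy.stats.wilcoxon` implements the same test, including exact small-sample p-values.

```python
import math

def wilcoxon_signed_rank(baseline, proposed):
    """Two-sided Wilcoxon signed-rank test on paired errors (normal approximation).

    Returns (W_plus, p_value), where W_plus sums the ranks of pairs in which the
    baseline error exceeds the proposed error. Zero differences are dropped and
    tied |differences| get average ranks; no tie or continuity correction is applied.
    """
    diffs = [b - p for b, p in zip(baseline, proposed) if b != p]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided: 2 * (1 - Phi(|z|))
    return w_plus, p
```

With few cases the normal approximation is coarse. That is one reason the referee's request for dataset sizes matters when reading any reported p-value.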
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem.
  Passage: "The network learns task-specific segmentation masks optimized for registration, guided solely by the registration objective without explicit segmentation supervision."
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration · unclear
  Relation between the paper passage and the cited Recognition theorem.
  Passage: "We adopt the Gumbel-Softmax estimator... Straight-Through Gumbel-Softmax Estimator (ST-GS)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
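The second passage quoted above names the Straight-Through Gumbel-Softmax estimator [20, 21]. The NumPy sketch below illustrates the general technique and is not the paper's code. In the forward pass it draws a hard one-hot label per point. It also returns a `backward` function that routes the upstream gradient through the softmax of the relaxed sample. The two-class "background vs. anatomy" reading of the logits, the temperature value and all function names are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sample_gumbel(shape, rng):
    """Standard Gumbel noise via -log(-log(U)); U is kept away from 0."""
    u = rng.uniform(low=1e-12, high=1.0, size=shape)
    return -np.log(-np.log(u))

def st_gumbel_softmax(logits, gumbel, tau=1.0):
    """Straight-through Gumbel-Softmax for per-point class logits of shape (N, K).

    Forward value: a hard one-hot sample. The returned `backward` maps an upstream
    gradient dL/dy to dL/dlogits as if the output had been the relaxed softmax sample.
    """
    y_soft = softmax((logits + gumbel) / tau)
    y_hard = np.eye(logits.shape[-1])[y_soft.argmax(axis=-1)]

    def backward(grad_y):
        # Vector-Jacobian product of softmax: dy_i/dz_j = y_i * (delta_ij - y_j),
        # followed by dz/dlogits = 1 / tau.
        inner = np.sum(grad_y * y_soft, axis=-1, keepdims=True)
        return y_soft * (grad_y - inner) / tau

    return y_hard, y_soft, backward
```

Autodiff frameworks usually express the same trick as `y_hard - stop_gradient(y_soft) + y_soft`. In PyTorch, `torch.nn.functional.gumbel_softmax(logits, tau=..., hard=True)` provides it directly.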
Reference graph
Works this paper leans on
- [1] Luther, N., Iorgulescu, J.B., et al.: Comparison of navigated versus non-navigated pedicle screw placement in 260 patients and 1434 screws: screw accuracy, screw size, and the complexity of surgery. Clin. Spine Surg. 28(5), 298–303 (2015)
- [2] Gelalis, I.D., Paschos, N.K., et al.: Accuracy of pedicle screw placement: a systematic review of prospective in vivo studies comparing free hand, fluoroscopy guidance and navigation techniques. Eur. Spine J. 21(2), 247–255 (2012)
- [3] Karkenny, A.J., Mendelis, J.R., et al.: The role of intraoperative navigation in orthopaedic surgery. JAAOS-J. Am. Acad. Orthop. Surg. 27(19), 849–858 (2019)
- [4]
- [5] Holly, L.T., Foley, K.T.: Intraoperative spinal navigation. Spine 28(15S), 54–61 (2003)
- [6] Striano, B.M., Crawford, A.M., et al.: Intraoperative navigation increases the projected lifetime cancer risk in patients undergoing surgery for adolescent idiopathic scoliosis. The Spine J. 24(6), 1087–1094 (2024)
- [7] Tonetti, J., Boudissa, M., et al.: Role of 3D intraoperative imaging in orthopedic and trauma surgery. Orthop. & Traumatol.: Surg. & Res. 106(1), 19–25 (2020)
- [8] Liebmann, F., Atzigen, M., et al.: Automatic registration with continuous pose updates for marker-less surgical navigation in spine surgery. Med. Image Anal. 91, 103027 (2024)
- [9] Hu, X., Nguyen, A., Baena, F.R.: Occlusion-robust visual markerless bone tracking for computer-assisted orthopedic surgery. IEEE Trans. on Instrum. and Meas. 71, 1–11 (2021)
- [10] Daly, C., Marconi, E., et al.: Towards markerless intraoperative tracking of deformable spine tissue. arXiv preprint arXiv:2506.23657 (2025)
- [11] Liu, H., Baena, F.R.Y.: Automatic markerless registration and tracking of the bone for computer-assisted orthopaedic surgery. IEEE Access 8, 42010–42020 (2020)
- [12] Zhu, S., Zhao, Z., Pan, Y., Zheng, G.: Markerless robotic pedicle screw placement based on structured light tracking. Int. J. of Comput. Assist. Radiol. and Surg. 15(8), 1347–1358 (2020)
- [13] Lyu, M., Yang, J., et al.: Rigid pairwise 3D point cloud registration: A survey. Pattern Recognit. 151, 110408 (2024)
- [14] Besl, P.J., McKay, N.D.: Method for registration of 3-D shapes. In: Sensor Fusion IV: Control Paradigms and Data Structures, vol. 1611, pp. 586–606. SPIE (1992)
- [15] Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. of the ACM 24(6), 381–395 (1981)
- [16] Weber, M., Wild, D., et al.: Deep learning-based point cloud registration for augmented reality-guided surgery. In: 2024 IEEE Int. Symp. on Biomed. Imaging (ISBI), pp. 1–5. IEEE (2024)
- [17] Ji, S., Fan, X., et al.: Patient registration using intraoperative stereovision in image-guided open spinal surgery. IEEE Trans. on Biomed. Eng. 62(9), 2177–2186 (2015)
- [18]
- [19] Liebmann, F., Stütz, D., et al.: SpineDepth: a multi-modal data collection approach for automatic labelling and intraoperative spinal shape reconstruction based on RGB-D data. J. of Imaging 7(9), 164 (2021)
- [20] Jang, E., Gu, S., Poole, B.: Categorical reparameterization with Gumbel-Softmax. arXiv preprint arXiv:1611.01144 (2016)
- [21] Bengio, Y., Léonard, N., Courville, A.: Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432 (2013)
- [22]
- [23] Qin, Z., Yu, H., et al.: GeoTransformer: Fast and robust point cloud registration with geometric transformer. IEEE Trans. on Pattern Anal. and Mach. Intell. 45(8), 9806–9821 (2023)
- [24]
- [25] Ravi, N., Gabeur, V., et al.: SAM 2: Segment anything in images and videos. arXiv
- [26] arXiv preprint arXiv:2408.00714
- [27] Pan, L., Cai, Z., Liu, Z.: Robust partial-to-partial point cloud registration in a full range. IEEE Robot. and Autom. Lett. 9(3), 2861–2868 (2024)