Warm-Started Reinforcement Learning for Iterative 3D/2D Liver Registration
Pith reviewed 2026-05-21 09:05 UTC · model grok-4.3
The pith
A warm-started reinforcement learning policy registers preoperative CT to laparoscopic video by choosing transformation steps and a stopping point automatically.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Formulating CT-to-video registration as a sequential decision process with a shared warm-started encoder and an RL policy head that outputs discrete six-degree-of-freedom actions plus a stop action produces alignments whose final target registration error is comparable to optimized supervised baselines while eliminating the need to choose step sizes or iteration counts by hand.
What carries the argument
A shared feature encoder warm-started from a supervised pose-estimation network whose output is fed to an RL policy head that selects discrete rigid transformations along six degrees of freedom and decides when to terminate the registration sequence.
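As a rough illustration of the decision loop described above, the sketch below rolls out a discrete policy over twelve signed unit steps (one pair per degree of freedom) plus a stop action. Everything specific here is an assumption made for illustration: the step sizes, the additive pose parameterisation, and the names `apply_action`, `oracle_policy`, and `register` do not come from the paper. The `oracle_policy` function is a hand-written stand-in that uses a known target pose; in the paper this role is played by a learned policy head on top of the warm-started encoder.

```python
# Hypothetical step sizes; the paper's actual values are not stated in this summary.
TRANS_STEP_MM = 1.0
ROT_STEP_DEG = 1.0

# Pose as [tx, ty, tz, rx, ry, rz]; 12 signed unit steps plus one stop action.
ACTIONS = [(axis, sign) for axis in range(6) for sign in (+1, -1)]
STOP = len(ACTIONS)  # index 12


def apply_action(pose, action):
    """Return a new pose after one discrete step; STOP leaves the pose unchanged."""
    new_pose = list(pose)
    if action == STOP:
        return new_pose
    axis, sign = ACTIONS[action]
    step = TRANS_STEP_MM if axis < 3 else ROT_STEP_DEG
    new_pose[axis] += sign * step
    return new_pose


def pose_error(pose, target):
    """Sum of absolute per-axis differences (mixes mm and degrees; illustration only)."""
    return sum(abs(p - t) for p, t in zip(pose, target))


def oracle_policy(pose, target):
    """Stand-in for a learned policy: pick the step that most reduces the error
    to a known target, or STOP when no step improves it."""
    best_action, best_err = STOP, pose_error(pose, target)
    for action in range(len(ACTIONS)):
        err = pose_error(apply_action(pose, action), target)
        if err < best_err - 1e-9:
            best_action, best_err = action, err
    return best_action


def register(initial_pose, target, max_steps=200):
    """Roll out the policy until it emits STOP or the step budget runs out."""
    pose, steps = list(initial_pose), 0
    while steps < max_steps:
        action = oracle_policy(pose, target)
        if action == STOP:
            break
        pose = apply_action(pose, action)
        steps += 1
    return pose, steps
```

The point of the sketch is structural: the stopping decision is just another action in the same discrete set, so no separate iteration count or convergence threshold has to be chosen by hand.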
If this is right
- Registration becomes fully automatic once the policy is trained, removing the requirement to select step sizes or stopping criteria for each new case.
- The learned policy converges faster than non-warm-started alternatives on the same data.
- The discrete-action formulation supplies a direct starting point for later extensions to continuous actions or deformable models.
- Accuracy remains within the range reported for supervised methods that still apply separate optimization refinement.
Where Pith is reading between the lines
- The same warm-start strategy could be tested on other rigid-registration problems where supervised pose networks already exist.
- If the stopping decision generalizes across patients, the method could reduce the need for per-case parameter tuning in operating-room software.
- Replacing the discrete action space with a learned continuous controller might further improve final alignment without changing the warm-start backbone.
Load-bearing premise
The warm-started encoder supplies sufficiently stable geometric features for the reinforcement-learning policy to learn useful transformation steps and a reliable stopping rule.
What would settle it
An ablation in which the same RL policy head is trained from random initialization instead of the supervised warm-start and still reaches the same final TRE and convergence speed on the identical public dataset.
Original abstract
Registration between preoperative CT and intraoperative laparoscopic video plays a crucial role in augmented reality (AR) guidance for minimally invasive surgery. Learning-based methods have recently achieved registration errors comparable to optimization-based approaches while offering faster inference. However, many supervised methods produce coarse alignments that rely on additional optimization-based refinement, thereby increasing inference time. We present a discrete-action reinforcement learning (RL) framework that formulates CT-to-video registration as a sequential decision-making process. A shared feature encoder, warm-started from a supervised pose estimation network to provide stable geometric features and faster convergence, extracts representations from CT renderings and laparoscopic frames, while an RL policy head learns to choose rigid transformations along six degrees of freedom and to decide when to stop the iteration. Experiments on a public laparoscopic dataset demonstrated that our method achieved an average target registration error (TRE) of 15.70 mm, comparable to supervised approaches with optimization, while achieving faster convergence. The proposed RL-based formulation enables automated, efficient iterative registration without manually tuned step sizes or stopping criteria. This discrete framework provides a practical foundation for future continuous-action and deformable registration models in surgical AR applications.
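For readers unfamiliar with the metric, target registration error can be sketched as the mean Euclidean distance between target points mapped by the estimated transform and by the ground-truth transform. The code below is a minimal illustration under that common definition; the helper names (`rot_z`, `transform`, `target_registration_error`) are hypothetical, and the exact evaluation protocol of the dataset used in the paper may differ.

```python
import math


def rot_z(deg):
    """3x3 rotation matrix about the z-axis (angle in degrees)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def transform(R, t, p):
    """Apply the rigid transform x -> R x + t to a 3D point."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def target_registration_error(targets, R_est, t_est, R_gt, t_gt):
    """Mean distance (in the units of the points) between each target mapped by
    the estimated transform and by the ground-truth transform."""
    dists = [
        math.dist(transform(R_est, t_est, p), transform(R_gt, t_gt, p))
        for p in targets
    ]
    return sum(dists) / len(dists)
```

With identical estimated and ground-truth transforms the function returns zero; a pure translation offset of (3, 4, 0) mm between them gives a TRE of 5 mm for any set of targets.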
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a discrete-action reinforcement learning framework for iterative 3D/2D registration of preoperative CT to intraoperative laparoscopic video for liver AR guidance. A shared feature encoder is warm-started from a supervised pose estimation network to extract geometric representations from CT renderings and video frames; an RL policy head then selects rigid 6DoF transformations and learns an automated stopping criterion. Experiments on a public laparoscopic dataset report an average target registration error (TRE) of 15.70 mm, claimed to be comparable to supervised methods that include optimization-based refinement while achieving faster convergence without manual step-size or stopping tuning.
Significance. If the reported TRE and convergence advantages are confirmed with explicit baselines, ablations, and statistical support, the work would provide a useful automated alternative to hybrid supervised-plus-optimization pipelines in surgical AR. The RL formulation for joint transformation selection and stopping is a conceptually clean contribution that could serve as a foundation for continuous-action and deformable extensions.
major comments (2)
- [Abstract / Experiments] Abstract and Experiments section: The central claim of an average TRE of 15.70 mm being 'comparable to supervised approaches with optimization' is presented without any tabulated baseline TRE values, error bars, dataset size, number of test cases, or statistical significance tests. This absence prevents verification of the comparability and faster-convergence assertions that constitute the primary empirical result.
- [Method] Method section: The key modeling assumption that the warm-started shared feature encoder supplies stable geometric features enabling the RL policy to learn 6DoF actions and stopping is load-bearing for both convergence speed and final accuracy, yet no ablation is described that compares warm-start initialization against random initialization or alternative pretraining schemes.
minor comments (2)
- [Abstract] Abstract: The phrase 'public laparoscopic dataset' should be accompanied by the dataset name and citation for reproducibility.
- [Method] Notation: The six degrees of freedom and the discrete action space would benefit from an explicit equation or table defining the action set and the stopping action.
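One possible form for the explicit definition requested in the last comment is given below. It is a sketch only: the step sizes $\delta_t$ and $\delta_r$, the pose vector $\boldsymbol{\theta}_n$, and the additive update are illustrative assumptions, and the paper may instead compose increments on the rigid transformation group.

```latex
% Assumed notation: e_k is the k-th unit vector of the 6-DoF pose parameters
% (k = 1,2,3 translations, k = 4,5,6 rotations); delta_t, delta_r are step sizes.
\[
\mathcal{A} \;=\;
\{\pm\delta_t\,\mathbf{e}_k : k = 1,2,3\}
\;\cup\;
\{\pm\delta_r\,\mathbf{e}_k : k = 4,5,6\}
\;\cup\;
\{a_{\mathrm{stop}}\},
\qquad |\mathcal{A}| = 13,
\]
\[
\boldsymbol{\theta}_{n+1} =
\begin{cases}
\boldsymbol{\theta}_n + a_n, & a_n \neq a_{\mathrm{stop}},\\[2pt]
\boldsymbol{\theta}_n \ \text{(episode ends)}, & a_n = a_{\mathrm{stop}}.
\end{cases}
\]
```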
Simulated Authors' Rebuttal
We thank the referee for the thoughtful and constructive review. The comments highlight important areas for strengthening the empirical presentation and validation of our approach. We address each major comment below and have prepared revisions to the manuscript accordingly.
- Referee: [Abstract / Experiments] Abstract and Experiments section: The central claim of an average TRE of 15.70 mm being 'comparable to supervised approaches with optimization' is presented without any tabulated baseline TRE values, error bars, dataset size, number of test cases, or statistical significance tests. This absence prevents verification of the comparability and faster-convergence assertions that constitute the primary empirical result.
  Authors: We agree that explicit baseline values, variability measures, dataset details, and statistical tests are needed to substantiate the comparability and convergence claims. In the revised manuscript we will add a results table in the Experiments section listing TRE for our method alongside relevant supervised and hybrid baselines (with and without optimization), including standard deviations as error bars, the precise number of test cases drawn from the public laparoscopic dataset, and p-values from appropriate statistical tests. We will also include iteration-wise convergence curves to directly support the faster-convergence statement without manual tuning.
  Revision: yes
- Referee: [Method] Method section: The key modeling assumption that the warm-started shared feature encoder supplies stable geometric features enabling the RL policy to learn 6DoF actions and stopping is load-bearing for both convergence speed and final accuracy, yet no ablation is described that compares warm-start initialization against random initialization or alternative pretraining schemes.
  Authors: We concur that an ablation isolating the contribution of the warm-start is essential to support the modeling assumption. The revised manuscript will include a new ablation subsection that trains the RL policy from random encoder initialization and from at least one alternative pretraining scheme, reporting both final TRE and convergence speed (iterations to stopping) for each variant. This will quantify the benefit of the supervised warm-start for stable geometric features and automated stopping.
  Revision: yes
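The reporting promised in these responses (mean with standard deviation, plus a significance test on paired per-case results) could be computed along the following lines. This is a minimal sketch, not the authors' analysis: the choice of an exact sign-flip permutation test is an assumption made here for illustration, and the function names are hypothetical. The inputs would be per-case TRE values (or iterations to stopping) for two methods evaluated on the same test cases.

```python
import itertools
import statistics


def mean_sd(values):
    """Sample mean and sample standard deviation of per-case values."""
    return statistics.mean(values), statistics.stdev(values)


def paired_permutation_p(a, b):
    """Exact two-sided sign-flip permutation test on the summed paired difference.

    Enumerates all 2**n sign assignments, so it is only practical for a small
    number of paired cases.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    count = total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        # Small tolerance so floating-point ties count as "at least as extreme".
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            count += 1
    return count / total
```

A paired test is the natural choice for the warm-start ablation, because both variants are evaluated on the same cases and per-case difficulty would otherwise dominate the variance.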
Circularity Check
No significant circularity in derivation chain
The paper formulates CT-to-video registration as a discrete-action RL process with a shared feature encoder warm-started from a supervised pose estimation network. The reported average TRE of 15.70 mm is presented strictly as an experimental outcome on a public laparoscopic dataset, not as a quantity derived from or reduced to the method equations by construction. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or method description. The derivation relies on standard RL policy learning and empirical validation, remaining self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Rigid transformations along six degrees of freedom are sufficient to model the required alignment between preoperative CT and intraoperative video.
- domain assumption: A public laparoscopic dataset provides a representative testbed for evaluating registration accuracy in surgical AR.