
arxiv: 2604.10245 · v2 · pith:YYUIGAWDnew · submitted 2026-04-11 · 💻 cs.CV · physics.med-ph

Warm-Started Reinforcement Learning for Iterative 3D/2D Liver Registration

Pith reviewed 2026-05-21 09:05 UTC · model grok-4.3

keywords liver registration · reinforcement learning · CT to laparoscopic video · iterative registration · surgical augmented reality · warm-started learning · target registration error

The pith

A warm-started reinforcement learning policy registers preoperative CT to laparoscopic video by choosing transformation steps and a stopping point automatically.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that registration between a preoperative CT scan and live laparoscopic video can be cast as a discrete reinforcement learning task in which an agent learns both how to adjust the alignment and when to halt the process. A feature encoder pre-trained on a supervised pose-estimation task supplies stable geometric cues that let the policy head converge quickly without hand-tuned step sizes or stopping thresholds. On a public laparoscopic liver dataset the approach reaches an average target registration error of 15.70 mm, which the authors describe as comparable to supervised methods that still require a later optimization stage. Because the entire iteration is learned rather than engineered, the method removes a common source of manual intervention in surgical augmented-reality pipelines.

Core claim

Formulating CT-to-video registration as a sequential decision process with a shared warm-started encoder and an RL policy head that outputs discrete six-degree-of-freedom actions plus a stop action produces alignments whose final target registration error is comparable to optimized supervised baselines while eliminating the need to choose step sizes or iteration counts by hand.

What carries the argument

A shared feature encoder warm-started from a supervised pose-estimation network whose output is fed to an RL policy head that selects discrete rigid transformations along six degrees of freedom and decides when to terminate the registration sequence.
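The decision loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the step sizes, the 13-action indexing, the additive Euler-style pose update, and the `toy_policy` stand-in (which peeks at the target pose in place of a learned encoder and policy head) are all hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical step sizes; the summary does not report the paper's values.
TRANS_STEP_MM = 1.0
ROT_STEP_DEG = 1.0
STOP = 12  # actions 0..11 are +/- steps on the six DoF, 12 terminates

Pose = List[float]  # (tx, ty, tz, rx, ry, rz); Euler-style parameters assumed


def apply_action(pose: Pose, action: int) -> Pose:
    """Return the pose after one discrete step; STOP leaves it unchanged."""
    new_pose = list(pose)
    if action == STOP:
        return new_pose
    axis, negative = divmod(action, 2)  # even index = +step, odd index = -step
    step = TRANS_STEP_MM if axis < 3 else ROT_STEP_DEG
    new_pose[axis] += -step if negative else step
    return new_pose


def toy_policy(pose: Pose, target: Pose, tol: float = 0.5) -> int:
    """Stand-in for the learned policy head. It peeks at the target pose,
    which a real policy cannot do: it steps along the largest residual and
    stops once every residual is within tol."""
    residuals = [t - p for p, t in zip(pose, target)]
    axis = max(range(6), key=lambda i: abs(residuals[i]))
    if abs(residuals[axis]) <= tol:
        return STOP
    return 2 * axis + (0 if residuals[axis] > 0 else 1)


def register(pose: Pose, policy: Callable[[Pose], int],
             max_steps: int = 200) -> Tuple[Pose, int]:
    """Apply policy actions until STOP or until the step budget runs out."""
    for n_steps in range(max_steps):
        action = policy(pose)
        if action == STOP:
            return pose, n_steps
        pose = apply_action(pose, action)
    return pose, max_steps


target = [3.0, -2.0, 0.0, 1.0, 0.0, -4.0]
final_pose, n_steps = register([0.0] * 6, lambda p: toy_policy(p, target))
print(final_pose, n_steps)
```

In the method as summarized, the policy would see encoder features of the CT rendering and the laparoscopic frame rather than the target pose, and the stop action is what replaces a hand-set iteration count or convergence threshold.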

If this is right

  • Registration becomes fully automatic once the policy is trained, removing the requirement to select step sizes or stopping criteria for each new case.
  • The learned policy converges faster than non-warm-started alternatives on the same data.
  • The discrete-action formulation supplies a direct starting point for later extensions to continuous actions or deformable models.
  • Accuracy remains within the range reported for supervised methods that still apply separate optimization refinement.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same warm-start strategy could be tested on other rigid-registration problems where supervised pose networks already exist.
  • If the stopping decision generalizes across patients, the method could reduce the need for per-case parameter tuning in operating-room software.
  • Replacing the discrete action space with a learned continuous controller might further improve final alignment without changing the warm-start backbone.

Load-bearing premise

The warm-started encoder supplies sufficiently stable geometric features for the reinforcement-learning policy to learn useful transformation steps and a reliable stopping rule.

What would settle it

An ablation in which the same RL policy head is trained from random initialization instead of the supervised warm-start and still reaches the same final TRE and convergence speed on the identical public dataset.

Original abstract

Registration between preoperative CT and intraoperative laparoscopic video plays a crucial role in augmented reality (AR) guidance for minimally invasive surgery. Learning-based methods have recently achieved registration errors comparable to optimization-based approaches while offering faster inference. However, many supervised methods produce coarse alignments that rely on additional optimization-based refinement, thereby increasing inference time. We present a discrete-action reinforcement learning (RL) framework that formulates CT-to-video registration as a sequential decision-making process. A shared feature encoder, warm-started from a supervised pose estimation network to provide stable geometric features and faster convergence, extracts representations from CT renderings and laparoscopic frames, while an RL policy head learns to choose rigid transformations along six degrees of freedom and to decide when to stop the iteration. Experiments on a public laparoscopic dataset demonstrated that our method achieved an average target registration error (TRE) of 15.70 mm, comparable to supervised approaches with optimization, while achieving faster convergence. The proposed RL-based formulation enables automated, efficient iterative registration without manually tuned step sizes or stopping criteria. This discrete framework provides a practical foundation for future continuous-action and deformable registration models in surgical AR applications.
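For readers unfamiliar with the metric, the sketch below shows one common way to compute a target registration error: the mean Euclidean distance between target points mapped by the estimated rigid transform and by the ground-truth transform. The abstract does not spell out the evaluation protocol (which targets are used, or whether the error is measured in 3D or after reprojection), so the function names and the toy inputs here are assumptions for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix = Sequence[Sequence[float]]  # 3x3 rotation, row-major


def transform(points: Sequence[Point], R: Matrix, t: Sequence[float]) -> List[Point]:
    """Apply the rigid map x -> R x + t to every point."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]


def target_registration_error(targets: Sequence[Point],
                              R_est: Matrix, t_est: Sequence[float],
                              R_gt: Matrix, t_gt: Sequence[float]) -> float:
    """Mean distance (same unit as the inputs, e.g. mm) between targets
    under the estimated and the ground-truth transforms."""
    est = transform(targets, R_est, t_est)
    gt = transform(targets, R_gt, t_gt)
    return sum(math.dist(a, b) for a, b in zip(est, gt)) / len(targets)


# Toy check: a pure translation offset of (3, 4, 0) mm shifts every target by 5 mm.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_targets = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 5.0)]
tre = target_registration_error(toy_targets, identity, (3.0, 4.0, 0.0),
                                identity, (0.0, 0.0, 0.0))
print(tre)
```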

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a discrete-action reinforcement learning framework for iterative 3D/2D registration of preoperative CT to intraoperative laparoscopic video for liver AR guidance. A shared feature encoder is warm-started from a supervised pose estimation network to extract geometric representations from CT renderings and video frames; an RL policy head then selects rigid 6DoF transformations and learns an automated stopping criterion. Experiments on a public laparoscopic dataset report an average target registration error (TRE) of 15.70 mm, claimed to be comparable to supervised methods that include optimization-based refinement while achieving faster convergence without manual step-size or stopping tuning.

Significance. If the reported TRE and convergence advantages are confirmed with explicit baselines, ablations, and statistical support, the work would provide a useful automated alternative to hybrid supervised-plus-optimization pipelines in surgical AR. The RL formulation for joint transformation selection and stopping is a conceptually clean contribution that could serve as a foundation for continuous-action and deformable extensions.

major comments (2)
  1. [Abstract / Experiments] Abstract and Experiments section: The central claim of an average TRE of 15.70 mm being 'comparable to supervised approaches with optimization' is presented without any tabulated baseline TRE values, error bars, dataset size, number of test cases, or statistical significance tests. This absence prevents verification of the comparability and faster-convergence assertions that constitute the primary empirical result.
  2. [Method] Method section: The key modeling assumption that the warm-started shared feature encoder supplies stable geometric features enabling the RL policy to learn 6DoF actions and stopping is load-bearing for both convergence speed and final accuracy, yet no ablation is described that compares warm-start initialization against random initialization or alternative pretraining schemes.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'public laparoscopic dataset' should be accompanied by the dataset name and citation for reproducibility.
  2. [Method] Notation: The six degrees of freedom and the discrete action space would benefit from an explicit equation or table defining the action set and the stopping action.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. The comments highlight important areas for strengthening the empirical presentation and validation of our approach. We address each major comment below and have prepared revisions to the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract / Experiments] Abstract and Experiments section: The central claim of an average TRE of 15.70 mm being 'comparable to supervised approaches with optimization' is presented without any tabulated baseline TRE values, error bars, dataset size, number of test cases, or statistical significance tests. This absence prevents verification of the comparability and faster-convergence assertions that constitute the primary empirical result.

    Authors: We agree that explicit baseline values, variability measures, dataset details, and statistical tests are needed to substantiate the comparability and convergence claims. In the revised manuscript we will add a results table in the Experiments section listing TRE for our method alongside relevant supervised and hybrid baselines (with and without optimization), including standard deviations as error bars, the precise number of test cases drawn from the public laparoscopic dataset, and p-values from appropriate statistical tests. We will also include iteration-wise convergence curves to directly support the faster-convergence statement without manual tuning. revision: yes

  2. Referee: [Method] Method section: The key modeling assumption that the warm-started shared feature encoder supplies stable geometric features enabling the RL policy to learn 6DoF actions and stopping is load-bearing for both convergence speed and final accuracy, yet no ablation is described that compares warm-start initialization against random initialization or alternative pretraining schemes.

    Authors: We concur that an ablation isolating the contribution of the warm-start is essential to support the modeling assumption. The revised manuscript will include a new ablation subsection that trains the RL policy from random encoder initialization and from at least one alternative pretraining scheme, reporting both final TRE and convergence speed (iterations to stopping) for each variant. This will quantify the benefit of the supervised warm-start for stable geometric features and automated stopping. revision: yes
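The statistical comparison promised in the first response could take many forms; the rebuttal does not name a test. As one hedged possibility, the sketch below runs an exact paired sign-flip permutation test on per-case TRE differences between two methods evaluated on the same test cases. The function name and the numbers in the usage lines are synthetic placeholders, not results from the paper.

```python
from itertools import product
from typing import Sequence


def paired_sign_flip_p_value(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided sign-flip permutation test on paired differences.

    Under the null hypothesis each difference a[i] - b[i] is equally likely
    to be positive or negative, so every sign pattern is enumerated and the
    p-value is the fraction of patterns whose absolute summed difference is
    at least as large as the observed one. The cost is 2**n, so this exact
    form only suits small test sets.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return extreme / total


# Synthetic per-case TREs in mm, for illustration only.
method_a = [14.0, 16.0, 15.0, 17.0]
method_b = [15.0, 16.5, 15.5, 18.0]
print(paired_sign_flip_p_value(method_a, method_b))
```

A paired test fits this setting because both methods would be scored on the same cases; with larger test sets a sampled permutation test or a Wilcoxon signed-rank test would be the usual substitutes.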

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

The paper formulates CT-to-video registration as a discrete-action RL process with a shared feature encoder warm-started from a supervised pose estimation network. The reported average TRE of 15.70 mm is presented strictly as an experimental outcome on a public laparoscopic dataset, not as a quantity derived from or reduced to the method equations by construction. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or method description. The derivation relies on standard RL policy learning and empirical validation, remaining self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard domain assumptions in medical image registration and RL; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)
  • Domain assumption: rigid transformations along six degrees of freedom are sufficient to model the required alignment between preoperative CT and intraoperative video.
    Invoked by the formulation of the RL policy choosing rigid transformations.
  • Domain assumption: a public laparoscopic dataset provides a representative testbed for evaluating registration accuracy in surgical AR.
    Used to support the reported TRE result.

pith-pipeline@v0.9.0 · 5767 in / 1336 out tokens · 41751 ms · 2026-05-21T09:05:37.087923+00:00 · methodology

