SutureFormer: Learning Surgical Trajectories via Goal-conditioned Offline RL in Pixel Space

Chunlin Tian; Guy Rosman; Huanrong Liu; Qingbiao Li; Qin Liu; Tailai Zhou; Tongyu Jia; Xin Ma; Yu Gao; Yun Gu; Yutong Ban

arxiv: 2603.26720 · v2 · pith:I4X36GSJnew · submitted 2026-03-19 · 💻 cs.RO · cs.AI

Huanrong Liu, Chunlin Tian, Tongyu Jia, Tailai Zhou, Qin Liu, Yu Gao, Yutong Ban, Yun Gu, Guy Rosman, Xin Ma, Qingbiao Li

Pith reviewed 2026-05-21 11:30 UTC · model grok-4.3

keywords: surgical needle trajectory · goal-conditioned offline RL · pixel space prediction · endoscopic video analysis · robot-assisted suturing · cubic spline interpolation · conservative Q-learning · average displacement error

The pith

By treating the needle tip as an agent that takes sequential actions in pixel space, SutureFormer learns more accurate surgical trajectories from endoscopic videos using goal-conditioned offline reinforcement learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to improve prediction of surgical needle paths in robot-assisted suturing by reframing the task as a goal-conditioned offline reinforcement learning problem in pixel space. This approach models the needle tip's movement step by step, capturing the sequential nature of motion that direct learning methods miss. Sparse waypoint annotations are turned into dense rewards through cubic spline interpolation, allowing the policy to learn plausible transitions while following expert guidance. If successful, this would lead to better anticipatory planning and safer motion in surgical robots by reducing prediction errors substantially on real patient data.

Core claim

SutureFormer formulates needle trajectory prediction as a sequential decision-making task where the needle tip is an agent moving in pixel space. It uses an observation encoder to process variable-length video clips and predicts future waypoints autoregressively via actions of discrete directions and continuous magnitudes. Dense rewards are generated from sparse annotations using cubic spline interpolation, and the policy is trained with Conservative Q-Learning regularized by Behavioral Cloning on a dataset of 1,158 trajectories from 50 patients, achieving a 58.6% reduction in Average Displacement Error compared to the strongest baseline.
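To make the action representation concrete, here is a minimal Python sketch of how a (discrete direction, continuous magnitude) action could move a needle-tip position in pixel space, and how waypoints could be rolled out autoregressively. The eight-bin direction discretization, the names `apply_action` and `rollout`, and the fixed action list standing in for a learned policy are all assumptions made for illustration; this summary does not state the paper's actual discretization.

```python
import math

# Hypothetical number of direction bins; the paper's value is not given here.
NUM_DIRECTIONS = 8


def apply_action(pos, direction_bin, magnitude, num_directions=NUM_DIRECTIONS):
    """Move a pixel-space point by one (direction bin, magnitude) action."""
    angle = 2.0 * math.pi * direction_bin / num_directions
    x, y = pos
    return (x + magnitude * math.cos(angle), y + magnitude * math.sin(angle))


def rollout(start, actions):
    """Apply actions one after another; each new waypoint is the next state."""
    waypoints = []
    pos = start
    for direction_bin, magnitude in actions:
        pos = apply_action(pos, direction_bin, magnitude)
        waypoints.append(pos)
    return waypoints


# A hand-written action sequence stands in for the learned policy.
path = rollout((100.0, 100.0), [(0, 5.0), (2, 3.0), (4, 5.0)])
```

In a full system the action at each step would come from a policy conditioned on the encoded video clip, the current position, and the goal, rather than from a fixed list.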

What carries the argument

Goal-conditioned offline reinforcement learning with Conservative Q-Learning and Behavioral Cloning regularization, applied to pixel-space actions consisting of discrete directions and continuous magnitudes, with cubic spline interpolation for dense rewards from sparse waypoints.
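As a rough illustration of how these pieces can combine, the sketch below writes a per-sample loss for the discrete-direction part of the action only: a temporal-difference term, the Conservative Q-Learning penalty (log-sum-exp of Q over all actions minus Q on the dataset action), and a behavioral-cloning negative log-likelihood. This is one common way to combine the terms, not necessarily the paper's exact objective; the function name `cql_bc_loss` and the weights `alpha` and `bc_weight` are hypothetical, and the continuous magnitude is left out.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_bc_loss(q_values, policy_logits, expert_action, td_target,
                alpha=1.0, bc_weight=1.0):
    """Per-sample loss: TD error + CQL penalty + behavioral-cloning term.

    q_values and policy_logits hold one entry per discrete direction.
    alpha and bc_weight are illustrative weights, not values from the paper.
    """
    q_data = q_values[expert_action]
    td_loss = 0.5 * (q_data - td_target) ** 2
    # CQL penalty: push Q down over all actions, up on the dataset action.
    cql_penalty = logsumexp(q_values) - q_data
    # BC term: negative log-likelihood of the expert action under the policy.
    bc_loss = logsumexp(policy_logits) - policy_logits[expert_action]
    return td_loss + alpha * cql_penalty + bc_weight * bc_loss
```

The CQL penalty is never negative and shrinks as the Q-value of the expert action dominates the alternatives, which is what keeps the learned values pessimistic about actions absent from the demonstrations.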

Load-bearing premise

Cubic spline interpolation of sparse waypoint annotations creates dense reward signals that accurately capture physically plausible pixel-wise state transitions without introducing artifacts or biases into the learned policy.
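The densification step can be sketched as follows, assuming a natural cubic spline fitted independently to the x and y coordinates over keyframe indices, and a reward equal to the negative pixel distance to the interpolated target. Both choices are assumptions for illustration: this summary does not specify the paper's spline boundary conditions or its reward shape, and the names `natural_cubic_spline`, `densify`, and `dense_reward` are hypothetical.

```python
import bisect
import math


def natural_cubic_spline(ts, ys):
    """Return f(t) interpolating (ts, ys); ts must be strictly increasing."""
    n = len(ts) - 1
    h = [ts[i + 1] - ts[i] for i in range(n)]
    # Second derivatives m[0..n]; the natural boundary sets m[0] = m[n] = 0.
    m = [0.0] * (n + 1)
    if n >= 2:
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        # Thomas algorithm: forward elimination, then back substitution.
        for k in range(1, n - 1):
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        for k in range(n - 2, -1, -1):
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def f(t):
        i = min(max(bisect.bisect_right(ts, t) - 1, 0), n - 1)
        a, b = ts[i + 1] - t, t - ts[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return f


def densify(keyframes, waypoints, num_steps):
    """Interpolate sparse (x, y) waypoints at num_steps evenly spaced times."""
    fx = natural_cubic_spline(keyframes, [p[0] for p in waypoints])
    fy = natural_cubic_spline(keyframes, [p[1] for p in waypoints])
    t0, t1 = keyframes[0], keyframes[-1]
    times = [t0 + (t1 - t0) * k / (num_steps - 1) for k in range(num_steps)]
    return [(fx(t), fy(t)) for t in times]


def dense_reward(position, target):
    """Illustrative reward: negative Euclidean distance to the dense target."""
    return -math.dist(position, target)
```

Because the spline passes exactly through the annotated waypoints, any artifact it introduces lies between keyframes, which is where the premise above is most exposed.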

What would settle it

Observing that the interpolated dense trajectories contain non-physical jumps or curves that do not match actual needle motion recorded at higher frame rates, or that the model's performance advantage disappears when tested on densely annotated ground truth data without relying on interpolation.

Figures

Figures reproduced from arXiv: 2603.26720 by Chunlin Tian, Guy Rosman, Huanrong Liu, Qingbiao Li, Qin Liu, Tailai Zhou, Tongyu Jia, Xin Ma, Yu Gao, Yun Gu, Yutong Ban.

**Figure 1.** Overview of the proposed framework. (i) Given the observed video segment, the observation encoder extracts local visual guidance features from needle-centered crops and aggregates their temporal dependencies with a Transformer to obtain the contextual representation $z_c$. (ii) At each prediction step $k$, the goal-conditioned state encoder constructs the state $s_k$ by combining $z_c$ with the encoded current positi…

**Figure 2.** Qualitative comparison of predicted trajectories on the test set. The yellow curve denotes the observed trajectory, the green curve represents the ground-truth future trajectory, the red curve shows the prediction from our SutureAgent, and the blue curve indicates the best baseline prediction.

**Figure 3.** Distribution of Average Displacement Error (ADE) across all methods on the test set. (a) Violin plot showing the ADE distribution for each method, with individual data points overlaid. Black diamonds indicate the mean and white horizontal lines indicate the median. (b) Empirical cumulative distribution function (CDF) of ADE. The dashed vertical line marks the ADE = 100 pixel threshold, where our method ach…

**Figure 4.** Per-trajectory Q-value curves on four test trajectories of increasing prediction horizon, demonstrating generalisation across variable-length sequences. $Q_{\text{policy}}(s_k, a^{\pi}_k)$ (solid blue) is the pessimistic value estimate $\min(Q_1, Q_2)$ of the policy's chosen action at step $k$; $Q_{\text{expert}}(s_k, a^{*}_k)$ (dashed red) is the value of the corresponding ground-truth expert action. Orange vertical lines indicate keyframe posi…

Original abstract

Predicting surgical needle trajectories from endoscopic video is critical for robot-assisted suturing, enabling anticipatory planning, real-time guidance, and safer motion execution. Existing methods that directly learn motion distributions from visual observations tend to overlook the sequential dependency among adjacent motion steps. Moreover, sparse waypoint annotations often fail to provide sufficient supervision, further increasing the difficulty of supervised or imitation learning methods. To address these challenges, we formulate image-based needle trajectory prediction as a sequential decision-making problem, in which the needle tip is treated as an agent that moves step by step in pixel space. This formulation naturally captures the continuity of needle motion and enables the explicit modeling of physically plausible pixel-wise state transitions over time. From this perspective, we propose SutureFormer, a goal-conditioned offline reinforcement learning framework that leverages sparse annotations to dense reward signals via cubic spline interpolation, encouraging the policy to exploit limited expert guidance while exploring plausible future motion paths. SutureFormer encodes variable-length clips using an observation encoder to capture both local spatial cues and long-range temporal dynamics, and autoregressively predicts future waypoints through actions composed of discrete directions and continuous magnitudes. To enable stable offline policy optimization from expert demonstrations, we adopt Conservative Q-Learning with Behavioral Cloning regularization. Experiments on a new kidney wound suturing dataset containing 1,158 trajectories from 50 patients show that SutureFormer reduces Average Displacement Error by 58.6% compared with the strongest baseline, demonstrating the effectiveness of modeling needle trajectory prediction as pixel-level sequential action learning.
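For readers unfamiliar with the headline metric, the following sketch shows the standard definition of Average Displacement Error (mean Euclidean distance between matched predicted and ground-truth waypoints) and how a relative reduction is computed. The function names and the numbers in the usage lines are illustrative only; they are not values from the paper.

```python
import math


def average_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance (in pixels) between matched waypoints."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and equally long")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


def relative_reduction(baseline_error, model_error):
    """Fractional error reduction relative to a baseline."""
    return (baseline_error - model_error) / baseline_error


# Toy trajectories: one waypoint is 5 px off, the other is exact.
ade = average_displacement_error([(3.0, 4.0), (10.0, 0.0)],
                                 [(0.0, 0.0), (10.0, 0.0)])
# Hypothetical ADE values chosen only to show the arithmetic.
reduction = relative_reduction(100.0, 41.4)
```

Note that a relative reduction says nothing about the absolute error scale, so it is most informative when reported alongside the baseline's ADE in pixels.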

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SutureFormer reports a 58.6% ADE reduction on a new kidney suturing dataset by framing needle trajectory prediction as goal-conditioned offline RL in pixel space with spline-derived dense rewards, but the experiments need more detail to pin down where the gain comes from.

The headline take is that SutureFormer shows a substantial error reduction in surgical needle trajectory prediction by casting it as goal-conditioned offline RL in pixel space on a new patient dataset. The work is new in applying this RL framing to the suturing task, with an observation encoder for clips and autoregressive action prediction of directions and magnitudes. It uses CQL plus behavioral cloning to train stably from expert demos, and cubic spline interpolation to create dense rewards from sparse waypoints. This setup lets the model explore plausible paths while sticking close to limited annotations.

The new kidney suturing dataset, with over a thousand trajectories from fifty patients, adds concrete value for the field. The paper does well at highlighting the sequential nature of the motion, which direct supervised methods can overlook. Treating the needle tip as an agent moving step by step in pixels captures continuity without needing full 3D models.

The soft spots are in the experimental support. The abstract claims a 58.6 percent drop versus the strongest baseline, but lacks visible ablations, full baseline descriptions, or statistical details like error bars and splits. That makes it hard to confirm the gain comes from the RL components specifically. The spline-based reward densification is central, yet without checks showing it matches real motion physics or velocities from the videos, it risks introducing artifacts that bias the learned policy toward unrealistic paths.

This paper suits researchers in medical robotics who want to see offline RL applied to endoscopic trajectory tasks. Readers working on vision-based prediction in surgery would find the formulation and dataset useful. I would send it for peer review. The core idea is grounded enough and the data is fresh, so referees can help tighten the experiments and validate the reward modeling.

Referee Report

2 major / 2 minor

Summary. The paper claims that formulating surgical needle trajectory prediction as goal-conditioned offline RL in pixel space, with cubic spline interpolation of sparse waypoints to generate dense rewards, enables better modeling of sequential motion dependencies than direct supervised or imitation learning approaches. SutureFormer uses an observation encoder for variable-length video clips, autoregressive prediction of discrete-direction and continuous-magnitude actions, and CQL with BC regularization for stable offline optimization. On a new kidney wound suturing dataset of 1,158 trajectories from 50 patients, it reports a 58.6% reduction in Average Displacement Error relative to the strongest baseline.

Significance. If the central result holds after addressing experimental gaps, the work would be significant for robot-assisted surgery by demonstrating that an RL formulation can capture temporal continuity in pixel-space trajectories where standard regression methods fall short. The introduction of a sizable multi-patient dataset and the explicit use of offline RL components (CQL + BC) to leverage limited expert data are concrete strengths that could influence future trajectory prediction pipelines.

major comments (2)

[Abstract / Experimental evaluation] Abstract and method description: the reported 58.6% ADE reduction is attributed to the RL formulation and cubic-spline dense rewards, yet the manuscript provides no ablation studies, baseline implementation details, error bars, or train/validation/test split information. Without these, it is impossible to determine whether the gain arises from the sequential decision-making model or from unstated factors such as encoder architecture or data preprocessing.
[Method (reward formulation)] Reward design (cubic spline interpolation): the central claim that interpolated dense rewards accurately reflect physically plausible pixel-wise state transitions rests on an unvalidated assumption. No quantitative check (e.g., deviation from dense manual labels, consistency with observed needle velocities, or curvature statistics) is reported, leaving open the possibility that C2-continuous splines introduce non-physical artifacts that bias the learned policy.

minor comments (2)

[Method] The observation encoder's handling of variable-length clips and the precise discretization of action directions should be clarified with a diagram or pseudocode for reproducibility.
[Experiments] Dataset description would benefit from explicit mention of how the 50 patients were partitioned and whether any patient-level leakage exists between training and test trajectories.
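The patient-level leakage concern in the last comment can be made concrete with a small sketch of a patient-wise split, in which every trajectory from a given patient lands on the same side of the train/test boundary. The function name `patient_wise_split`, the 20% test fraction, and the synthetic patient IDs are assumptions for illustration; the paper's actual partitioning is not described in this summary.

```python
import random


def patient_wise_split(trajectory_patient_ids, test_fraction=0.2, seed=0):
    """Split trajectory indices so that no patient appears in both sets."""
    patients = sorted(set(trajectory_patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)
    num_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:num_test])
    train_idx = [i for i, p in enumerate(trajectory_patient_ids)
                 if p not in test_patients]
    test_idx = [i for i, p in enumerate(trajectory_patient_ids)
                if p in test_patients]
    return train_idx, test_idx


# Synthetic example: 20 trajectories spread evenly over 5 patients.
ids = [i % 5 for i in range(20)]
train_idx, test_idx = patient_wise_split(ids)
```

Splitting at the trajectory level instead would let clips from the same operation appear in both sets, which can inflate test performance on visually similar scenes.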

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the revisions planned for the manuscript.

Referee: [Abstract / Experimental evaluation] Abstract and method description: the reported 58.6% ADE reduction is attributed to the RL formulation and cubic-spline dense rewards, yet the manuscript provides no ablation studies, baseline implementation details, error bars, or train/validation/test split information. Without these, it is impossible to determine whether the gain arises from the sequential decision-making model or from unstated factors such as encoder architecture or data preprocessing.

Authors: We agree that these experimental details are necessary for reproducibility and to isolate the source of the reported gains. In the revised manuscript we will add ablation studies that separately evaluate the contribution of the goal-conditioned offline RL formulation versus the spline-based reward densification. We will also provide full implementation details for all baselines, report error bars from multiple random seeds, and explicitly state the train/validation/test split ratios and patient-wise partitioning used for the 1,158-trajectory dataset.
Revision: yes
Referee: [Method (reward formulation)] Reward design (cubic spline interpolation): the central claim that interpolated dense rewards accurately reflect physically plausible pixel-wise state transitions rests on an unvalidated assumption. No quantitative check (e.g., deviation from dense manual labels, consistency with observed needle velocities, or curvature statistics) is reported, leaving open the possibility that C2-continuous splines introduce non-physical artifacts that bias the learned policy.

Authors: We acknowledge that the original submission did not include quantitative validation of the cubic-spline interpolation. In the revision we will add an analysis section that reports consistency of the interpolated trajectories with observed needle velocities and curvature statistics derived from the expert demonstrations. Because the dataset contains only sparse waypoint annotations, direct deviation metrics against dense manual labels are not feasible; we will therefore focus on the velocity and curvature checks that can be performed with the available data.
Revision: partial
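The velocity and curvature checks mentioned in the second response could take a form like the sketch below, which computes per-step displacement and turning angle along a dense trajectory and flags steps that exceed chosen limits. Turning angle is used here as a simple proxy for curvature, and the function names and thresholds are hypothetical; suitable limits would have to be estimated from the video data.

```python
import math


def step_speeds(points):
    """Pixel displacement between consecutive trajectory points."""
    return [math.dist(a, b) for a, b in zip(points, points[1:])]


def turning_angles(points):
    """Absolute heading change (radians) at each interior point."""
    angles = []
    for a, b, c in zip(points, points[1:], points[2:]):
        h1 = math.atan2(b[1] - a[1], b[0] - a[0])
        h2 = math.atan2(c[1] - b[1], c[0] - b[0])
        d = abs(h2 - h1)
        # Wrap so that the result lies in [0, pi].
        angles.append(min(d, 2.0 * math.pi - d))
    return angles


def has_implausible_step(points, max_speed, max_turn):
    """True if any step is too long or any turn too sharp (thresholds assumed)."""
    return (any(s > max_speed for s in step_speeds(points))
            or any(a > max_turn for a in turning_angles(points)))
```

Running such a check on the interpolated trajectories and comparing the resulting statistics with those measured between annotated keyframes would give a first indication of whether the splines introduce non-physical motion.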

Circularity Check

0 steps flagged

No significant circularity; standard RL application to new task with held-out evaluation

The derivation applies established offline RL components (CQL + BC regularization) to formulate pixel-space needle trajectory prediction as goal-conditioned sequential decision making. Sparse waypoint annotations are converted to dense rewards via cubic spline interpolation as a preprocessing step; this does not create a self-definitional loop or fitted-input-called-prediction because the reported 58.6% ADE reduction is measured against ground-truth trajectories on held-out patient data rather than being recovered by construction from the same interpolation. No equations reduce the central claim to its inputs, no uniqueness theorem is imported from self-citations, and no ansatz is smuggled via prior work by the same authors. The method remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the validity of treating pixel trajectories as an MDP, the suitability of spline interpolation for medical motion, and standard offline RL convergence assumptions; no new physical entities are postulated.

axioms (2)

domain assumption: Sparse waypoint annotations can be densified via cubic spline interpolation to produce valid dense rewards that encourage physically plausible pixel-wise transitions.
Explicitly invoked in the abstract when converting sparse annotations to dense reward signals.
domain assumption: Conservative Q-Learning with Behavioral Cloning regularization yields stable policies from expert demonstrations in this pixel-space setting.
Adopted to enable offline policy optimization; standard in the cited RL literature but assumed to transfer here.

pith-pipeline@v0.9.0 · 5833 in / 1347 out tokens · 44466 ms · 2026-05-21T11:30:22.727785+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "leverages sparse annotations to dense reward signals via cubic spline interpolation... Conservative Q-Learning with Behavioral Cloning regularization"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "1,158 trajectories from 50 patients... Average Displacement Error by 58.6%"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · 1 internal anchor

[1] Attanasio, A., Scaglioni, B., De Momi, E., Fiorini, P., Valdastri, P.: Autonomy in surgical robotics. Annual Review of Control, Robotics, and Autonomous Systems 4, 651–679 (2021). https://doi.org/10.1146/annurev-control-062420-090543, https://www.annualreviews.org/content/journals/10.1146/annurev-control-062420-090543

[2] Bain, M., Sammut, C.: A framework for behavioural cloning. In: Machine Intelligence 15, Intelligent Agents [St. Catherine's College, Oxford, July 1995]. pp. 103–129. Oxford University, GBR (1999)

[3] Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., Jackel, L.D., Monfort, M., Muller, U., Zhang, J., et al.: End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316 (2016)

[4] Cai, P., Wang, H., Huang, H., Liu, Y., Liu, M.: Vision-based autonomous car racing using deep imitative reinforcement learning. IEEE Robotics and Automation Letters 6(4), 7262–7269 (2021)

[5] De Boor, C.: A practical guide to splines, vol. 27. Springer, New York (1978)

[6] Florence, P., Lynch, C., Zeng, A., Ramirez, O.A., Wahid, A., Downs, L., Wong, A., Lee, J., Mordatch, I., Tompson, J.: Implicit behavioral cloning. In: Faust, A., Hsu, D., Neumann, G. (eds.) Proceedings of the 5th Conference on Robot Learning. Proceedings of Machine Learning Research, vol. 164, pp. 158–168. PMLR (08–11 Nov 2022), https://proceedings.mlr...

[7] Gu, T., Chen, G., Li, J., Lin, C., Rao, Y., Zhou, J., Lu, J.: Stochastic trajectory prediction via motion indeterminacy diffusion. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 17092–17101 (2022). https://doi.org/10.1109/CVPR52688.2022.01660

[8] Ho, J., Ermon, S.: Generative adversarial imitation learning. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. pp. 4572–4580. NIPS'16, Curran Associates Inc., Red Hook, NY, USA (2016)

[9] Ji, G., Gao, Q., Zhang, T., Cao, L., Sun, Z.: A heuristically accelerated reinforcement learning-based neurosurgical path planner. Cyborg and Bionic Systems 4, 0026 (2023)

[10] Jin, Y., Long, Y., Gao, X., Stoyanov, D., Dou, Q., Heng, P.A.: Trans-svnet: hybrid embedding aggregation transformer for surgical workflow analysis. International Journal of Computer Assisted Radiology and Surgery 17(12), 2193–2202 (2022)

[11] Kumar, A., Zhou, A., Tucker, G., Levine, S.: Conservative Q-learning for offline reinforcement learning. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS '20, Curran Associates Inc., Red Hook, NY, USA (2020)

[12] Li, J., Jin, Y., Chen, Y., Yip, H.C., Scheppach, M., Chiu, P.W.Y., Yam, Y., Meng, H.M.L., Dou, Q.: Imitation learning from expert video data for dissection trajectory prediction in endoscopic surgical procedure. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2023: 26th International Conference, Vancouver, BC, Canada, Oct... doi:10.1007/978-3-031-43996-4_47

[13] Lin, H., Li, B., Wong, C.W., Rojas, J., Chu, X., Au, K.W.S.: World models for general surgical grasping. arXiv preprint arXiv:2405.17940 (2024)

[14] Maier-Hein, L., Vedula, S.S., Speidel, S., Navab, N., Kikinis, R., Park, A., Eisenmann, M., Feussner, H., Forestier, G., Giannarou, S., et al.: Surgical data science for next-generation interventions. Nature Biomedical Engineering 1(9), 691–696 (2017)

[15] Nwoye, C.I., Yu, T., Gonzalez, C., Seeliger, B., Mascagni, P., Mutter, D., Marescaux, J., Padoy, N.: Rendezvous: Attention mechanisms for the recognition of surgical action triplets in endoscopic videos. Medical Image Analysis 78, 102433 (2022)

[16] Pomerleau, D.A.: ALVINN: an autonomous land vehicle in a neural network. In: Proceedings of the 2nd International Conference on Neural Information Processing Systems. pp. 305–313. NIPS'88, MIT Press, Cambridge, MA, USA (1988)

[17] Qin, Y., Feyzabadi, S., Allan, M., Burdick, J.W., Azizian, M.: davincinet: Joint prediction of motion and surgical state in robot-assisted surgery. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 2921–2928 (2020). https://doi.org/10.1109/IROS45743.2020.9340723

[18] Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: MICCAI. pp. 234–241. Springer (2015)

[19] Shi, C., Zheng, Y., Fey, A.M.: Recognition and prediction of surgical gestures and trajectories using transformer models in robot-assisted surgery. In: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 8017–8024 (2022). https://doi.org/10.1109/IROS47612.2022.9981611

[20] Simon, H.A., et al.: Theories of bounded rationality. Decision and organization 1(1), 161–176 (1972)

[21] Tamar, A., Wu, Y., Thomas, G., Levine, S., Abbeel, P.: Value iteration networks. Advances in Neural Information Processing Systems 29 (2016)

[22] Wang, B., Liu, Z., Li, Q., Prorok, A.: Mobile robot path planning in dynamic environments through globally guided reinforcement learning. IEEE Robotics and Automation Letters 5(4), 6932–6939 (2020). https://doi.org/10.1109/LRA.2020.3026638

[23] Weerasinghe, K., Reza Roodabeh, S.H., Hutchinson, K., Alemzadeh, H.: Multimodal transformers for real-time surgical activity prediction. In: 2024 IEEE International Conference on Robotics and Automation (ICRA). pp. 13323–13330 (2024). https://doi.org/10.1109/ICRA57147.2024.10611048

[24] Xu, W., Tan, Z., Cao, Z., Ma, H., Wang, G., Wang, H., Wang, W., Du, Z.: Dp4ausu: Autonomous surgical framework for suturing manipulation using diffusion policy with dynamic time wrapping-based locally weighted regression. The International Journal of Medical Robotics and Computer Assisted Surgery 21(3), e70072 (2025)

[25] Yang, G.Z., Cambias, J., Cleary, K., Daimler, E., Drake, J., Dupont, P.E., Hata, N., Kazanzides, P., Martel, S., Patel, R.V., et al.: Medical robotics—regulatory, ethical, and legal considerations for increasing levels of autonomy (2017)

[26] Zhao, Z., Fang, F., Yang, X., Xu, Q., Guan, C., Zhou, S.K.: See, Predict, Plan: Diffusion for Procedure Planning in Robotic Surgical Videos. In: Proceedings of Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. vol. LNCS 15006. Springer Nature Switzerland (October 2024)
