SutureFormer: Learning Surgical Trajectories via Goal-conditioned Offline RL in Pixel Space
Pith reviewed 2026-05-21 11:30 UTC · model grok-4.3
The pith
By treating the needle tip as an agent that takes sequential actions in pixel space, SutureFormer learns more accurate surgical trajectories from endoscopic videos using goal-conditioned offline reinforcement learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SutureFormer formulates needle trajectory prediction as a sequential decision-making task in which the needle tip is an agent moving in pixel space. An observation encoder processes variable-length video clips, and the policy predicts future waypoints autoregressively through actions composed of a discrete direction and a continuous magnitude. Dense rewards are generated from sparse annotations using cubic spline interpolation, and the policy is trained with Conservative Q-Learning regularized by Behavioral Cloning. On a dataset of 1,158 trajectories from 50 patients, the method achieves a 58.6% reduction in Average Displacement Error compared with the strongest baseline.
What carries the argument
Goal-conditioned offline reinforcement learning with Conservative Q-Learning and Behavioral Cloning regularization, applied to pixel-space actions consisting of discrete directions and continuous magnitudes, with cubic spline interpolation for dense rewards from sparse waypoints.
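To make the action parameterization concrete, the sketch below applies one action (a discrete direction index plus a continuous magnitude) to a pixel-space needle-tip position and rolls out waypoints autoregressively. The number of direction bins (8), the function names, and the hand-written action list are illustrative assumptions; the source does not state the paper's actual discretization, and a learned policy would produce the actions instead.

```python
import math

NUM_DIRECTIONS = 8  # assumption: the paper's number of direction bins is not given in the source


def apply_action(position, direction_idx, magnitude, num_directions=NUM_DIRECTIONS):
    """Move a pixel-space point by `magnitude` pixels along a discretized direction."""
    angle = 2.0 * math.pi * direction_idx / num_directions
    x, y = position
    return (x + magnitude * math.cos(angle), y + magnitude * math.sin(angle))


def rollout(start, actions):
    """Apply (direction_idx, magnitude) actions one after another, returning all waypoints."""
    waypoints = [start]
    for direction_idx, magnitude in actions:
        # Each step starts from the previously predicted waypoint (autoregressive).
        waypoints.append(apply_action(waypoints[-1], direction_idx, magnitude))
    return waypoints


# Hypothetical actions: 5 px along direction 0, then 3 px along direction 2.
path = rollout((100.0, 100.0), [(0, 5.0), (2, 3.0)])
```

Splitting direction from magnitude lets the direction be handled as a classification over bins while the step length stays a regression target, which is one plausible reading of the hybrid action space described above.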
Load-bearing premise
Cubic spline interpolation of sparse waypoint annotations creates dense reward signals that accurately capture physically plausible pixel-wise state transitions without introducing artifacts or biases into the learned policy.
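The densification step can be sketched as follows. This is a minimal natural cubic spline in pure Python, fitted separately to the x and y pixel coordinates of sparse `(frame, x, y)` annotations and evaluated at every integer frame. The natural boundary condition, the function names, and the negative-distance reward are assumptions made for illustration; the source does not specify the paper's boundary conditions or exact reward form.

```python
import math


def natural_cubic_spline(ts, ys):
    """Return a function evaluating the natural cubic spline through (ts[i], ys[i])."""
    n = len(ts) - 1
    h = [ts[i + 1] - ts[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * (ys[i + 1] - ys[i]) / h[i] - 3.0 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal solve for the second-derivative terms.
    l, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2.0 * (ts[i + 1] - ts[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution for the per-segment polynomial coefficients.
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(t):
        j = n - 1
        for k in range(n):
            if t <= ts[k + 1]:
                j = k
                break
        dt = t - ts[j]
        return ys[j] + b[j] * dt + c[j] * dt ** 2 + d[j] * dt ** 3

    return evaluate


def densify(waypoints):
    """waypoints: sparse (frame, x, y) annotations with strictly increasing frames."""
    frames = [float(w[0]) for w in waypoints]
    sx = natural_cubic_spline(frames, [float(w[1]) for w in waypoints])
    sy = natural_cubic_spline(frames, [float(w[2]) for w in waypoints])
    first, last = int(waypoints[0][0]), int(waypoints[-1][0])
    return [(sx(float(f)), sy(float(f))) for f in range(first, last + 1)]


def dense_reward(agent_xy, target_xy):
    """Assumed reward: negative pixel distance to the interpolated target."""
    return -math.dist(agent_xy, target_xy)


# Hypothetical sparse annotations at frames 0, 2 and 4.
dense = densify([(0, 0.0, 0.0), (2, 2.0, 4.0), (4, 4.0, 8.0)])
```

Because the spline is forced through every annotated waypoint, any overshoot between annotations comes purely from the interpolant, which is exactly the kind of artifact the premise above assumes away.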
What would settle it
Observing that the interpolated dense trajectories contain non-physical jumps or curves that do not match actual needle motion recorded at higher frame rates, or that the model's performance advantage disappears when tested on densely annotated ground truth data without relying on interpolation.
Original abstract
Predicting surgical needle trajectories from endoscopic video is critical for robot-assisted suturing, enabling anticipatory planning, real-time guidance, and safer motion execution. Existing methods that directly learn motion distributions from visual observations tend to overlook the sequential dependency among adjacent motion steps. Moreover, sparse waypoint annotations often fail to provide sufficient supervision, further increasing the difficulty of supervised or imitation learning methods. To address these challenges, we formulate image-based needle trajectory prediction as a sequential decision-making problem, in which the needle tip is treated as an agent that moves step by step in pixel space. This formulation naturally captures the continuity of needle motion and enables the explicit modeling of physically plausible pixel-wise state transitions over time. From this perspective, we propose SutureFormer, a goal-conditioned offline reinforcement learning framework that leverages sparse annotations to dense reward signals via cubic spline interpolation, encouraging the policy to exploit limited expert guidance while exploring plausible future motion paths. SutureFormer encodes variable-length clips using an observation encoder to capture both local spatial cues and long-range temporal dynamics, and autoregressively predicts future waypoints through actions composed of discrete directions and continuous magnitudes. To enable stable offline policy optimization from expert demonstrations, we adopt Conservative Q-Learning with Behavioral Cloning regularization. Experiments on a new kidney wound suturing dataset containing 1,158 trajectories from 50 patients show that SutureFormer reduces Average Displacement Error by 58.6% compared with the strongest baseline, demonstrating the effectiveness of modeling needle trajectory prediction as pixel-level sequential action learning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that formulating surgical needle trajectory prediction as goal-conditioned offline RL in pixel space, with cubic spline interpolation of sparse waypoints to generate dense rewards, enables better modeling of sequential motion dependencies than direct supervised or imitation learning approaches. SutureFormer uses an observation encoder for variable-length video clips, autoregressive prediction of discrete-direction and continuous-magnitude actions, and CQL with BC regularization for stable offline optimization. On a new kidney wound suturing dataset of 1,158 trajectories from 50 patients, it reports a 58.6% reduction in Average Displacement Error relative to the strongest baseline.
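For readers unfamiliar with the headline metric, the sketch below computes Average Displacement Error in its usual form (the mean Euclidean distance between predicted and ground-truth waypoints) and the relative reduction against a baseline. The numeric values are made up, chosen only so that the arithmetic reproduces a 58.6% reduction; the source does not give the absolute ADE values.

```python
import math


def average_displacement_error(pred, gt):
    """Mean Euclidean distance (in pixels) between matched predicted and true waypoints."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def relative_reduction(baseline_error, model_error):
    """Fraction by which model_error is lower than baseline_error."""
    return (baseline_error - model_error) / baseline_error


# Toy trajectory: displacements of 0 px and 5 px give an ADE of 2.5 px.
ade = average_displacement_error([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])

# Hypothetical absolute errors, not taken from the paper.
reduction = relative_reduction(10.0, 4.14)
```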
Significance. If the central result holds after addressing experimental gaps, the work would be significant for robot-assisted surgery by demonstrating that an RL formulation can capture temporal continuity in pixel-space trajectories where standard regression methods fall short. The introduction of a sizable multi-patient dataset and the explicit use of offline RL components (CQL + BC) to leverage limited expert data are concrete strengths that could influence future trajectory prediction pipelines.
major comments (2)
- [Abstract / Experimental evaluation] Abstract and method description: the reported 58.6% ADE reduction is attributed to the RL formulation and cubic-spline dense rewards, yet the manuscript provides no ablation studies, baseline implementation details, error bars, or train/validation/test split information. Without these, it is impossible to determine whether the gain arises from the sequential decision-making model or from unstated factors such as encoder architecture or data preprocessing.
- [Method (reward formulation)] Reward design (cubic spline interpolation): the central claim that interpolated dense rewards accurately reflect physically plausible pixel-wise state transitions rests on an unvalidated assumption. No quantitative check (e.g., deviation from dense manual labels, consistency with observed needle velocities, or curvature statistics) is reported, leaving open the possibility that C2-continuous splines introduce non-physical artifacts that bias the learned policy.
minor comments (2)
- [Method] The observation encoder's handling of variable-length clips and the precise discretization of action directions should be clarified with a diagram or pseudocode for reproducibility.
- [Experiments] Dataset description would benefit from explicit mention of how the 50 patients were partitioned and whether any patient-level leakage exists between training and test trajectories.
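The patient-level leakage concern can be illustrated with a split that assigns whole patients, rather than individual trajectories, to the test set. The function name, the 20% test fraction, and the synthetic assignment of trajectories to patients are assumptions for illustration; the source does not describe the partitioning the authors used.

```python
import random


def patient_wise_split(trajectory_patient_ids, test_fraction=0.2, seed=0):
    """Split trajectory indices so that no patient appears in both train and test."""
    patients = sorted(set(trajectory_patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train_idx = [i for i, p in enumerate(trajectory_patient_ids) if p not in test_patients]
    test_idx = [i for i, p in enumerate(trajectory_patient_ids) if p in test_patients]
    return train_idx, test_idx, test_patients


# Synthetic stand-in: 1,158 trajectories spread over 50 patient IDs.
patient_ids = [i % 50 for i in range(1158)]
train_idx, test_idx, test_patients = patient_wise_split(patient_ids)
```

Splitting at the trajectory level instead would let clips from the same operation land on both sides, which can inflate test accuracy.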
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the revisions planned for the manuscript.
Point-by-point responses
- Referee: [Abstract / Experimental evaluation] Abstract and method description: the reported 58.6% ADE reduction is attributed to the RL formulation and cubic-spline dense rewards, yet the manuscript provides no ablation studies, baseline implementation details, error bars, or train/validation/test split information. Without these, it is impossible to determine whether the gain arises from the sequential decision-making model or from unstated factors such as encoder architecture or data preprocessing.
Authors: We agree that these experimental details are necessary for reproducibility and to isolate the source of the reported gains. In the revised manuscript we will add ablation studies that separately evaluate the contribution of the goal-conditioned offline RL formulation versus the spline-based reward densification. We will also provide full implementation details for all baselines, report error bars from multiple random seeds, and explicitly state the train/validation/test split ratios and patient-wise partitioning used for the 1,158-trajectory dataset.
Revision: yes
- Referee: [Method (reward formulation)] Reward design (cubic spline interpolation): the central claim that interpolated dense rewards accurately reflect physically plausible pixel-wise state transitions rests on an unvalidated assumption. No quantitative check (e.g., deviation from dense manual labels, consistency with observed needle velocities, or curvature statistics) is reported, leaving open the possibility that C2-continuous splines introduce non-physical artifacts that bias the learned policy.
Authors: We acknowledge that the original submission did not include quantitative validation of the cubic-spline interpolation. In the revision we will add an analysis section that reports consistency of the interpolated trajectories with observed needle velocities and curvature statistics derived from the expert demonstrations. Because the dataset contains only sparse waypoint annotations, direct deviation metrics against dense manual labels are not feasible; we will therefore focus on the velocity and curvature checks that can be performed with the available data.
Revision: partial
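The velocity and curvature checks mentioned in the response could look roughly like the sketch below. It measures per-frame step length (pixels per frame) and a three-point Menger curvature along a dense trajectory. The choice of Menger curvature, the function names, and the summary statistics are assumptions; the source does not say which estimators the authors plan to use.

```python
import math


def step_lengths(points):
    """Per-frame displacement in pixels, a proxy for needle-tip speed."""
    return [math.dist(points[i], points[i + 1]) for i in range(len(points) - 1)]


def menger_curvature(a, b, c):
    """Curvature (1/radius) of the circle through three 2D points; 0 if degenerate."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    denom = math.dist(a, b) * math.dist(b, c) * math.dist(c, a)
    return 0.0 if denom == 0.0 else 2.0 * abs(cross) / denom


def trajectory_stats(points):
    """Summarize a dense trajectory by its largest step and sharpest bend."""
    speeds = step_lengths(points)
    curvatures = [
        menger_curvature(points[i - 1], points[i], points[i + 1])
        for i in range(1, len(points) - 1)
    ]
    return {
        "max_speed": max(speeds),
        "max_curvature": max(curvatures) if curvatures else 0.0,
    }


# Three points on a circle of radius 10 px should give curvature 0.1.
stats = trajectory_stats([(10.0, 0.0), (0.0, 10.0), (-10.0, 0.0)])
```

Comparing these statistics between interpolated segments and whatever motion is directly observable in the video would flag splines that bend or accelerate more than the real needle does.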
Circularity Check
No significant circularity; standard RL application to new task with held-out evaluation
The derivation applies established offline RL components (CQL + BC regularization) to formulate pixel-space needle trajectory prediction as goal-conditioned sequential decision making. Sparse waypoint annotations are converted to dense rewards via cubic spline interpolation as a preprocessing step; this does not create a self-definitional loop or fitted-input-called-prediction because the reported 58.6% ADE reduction is measured against ground-truth trajectories on held-out patient data rather than being recovered by construction from the same interpolation. No equations reduce the central claim to its inputs, no uniqueness theorem is imported from self-citations, and no ansatz is smuggled via prior work by the same authors. The method remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Sparse waypoint annotations can be densified via cubic spline interpolation to produce valid dense rewards that encourage physically plausible pixel-wise transitions.
- domain assumption: Conservative Q-Learning with Behavioral Cloning regularization yields stable policies from expert demonstrations in this pixel-space setting.
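To show what the second assumption refers to, the sketch below combines a temporal-difference term, the commonly used log-sum-exp form of the Conservative Q-Learning penalty, and a Behavioral Cloning negative log-likelihood for a single transition. It covers only the discrete direction component of the action and ignores the continuous magnitude. The weights `alpha` and `beta`, the way the terms are summed, and all function names are assumptions; the source does not state how the paper combines them.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q_values, data_action):
    """Pushes down Q over all actions while pushing up Q of the dataset action."""
    return logsumexp(q_values) - q_values[data_action]


def bc_loss(policy_logits, data_action):
    """Negative log-likelihood of the expert's action under a softmax policy."""
    return logsumexp(policy_logits) - policy_logits[data_action]


def total_loss(q_values, policy_logits, data_action, td_target, alpha=1.0, beta=1.0):
    """TD error plus weighted conservative and behavioral-cloning terms (assumed form)."""
    td_error = 0.5 * (q_values[data_action] - td_target) ** 2
    return (
        td_error
        + alpha * cql_penalty(q_values, data_action)
        + beta * bc_loss(policy_logits, data_action)
    )


# With uniform Q-values over 4 directions the penalty equals log(4).
penalty = cql_penalty([0.0, 0.0, 0.0, 0.0], 2)
loss = total_loss([0.0, 0.0], [0.0, 0.0], 0, td_target=0.0)
```

The penalty is never negative and shrinks as the dataset action's Q-value dominates the others, which is what keeps the learned values conservative on actions the expert never took.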
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Passage: "leverages sparse annotations to dense reward signals via cubic spline interpolation... Conservative Q-Learning with Behavioral Cloning regularization"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Passage: "1,158 trajectories from 50 patients... Average Displacement Error by 58.6%"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.