Novel evaluation of surgical activity recognition models using task-based efficiency metrics

Aneeq Zia; Anthony Jarc; Irfan Essa; Liheng Guo; Linlin Zhou

arxiv: 1907.02060 · v1 · pith:OJV4ZAYJnew · submitted 2019-07-03 · 💻 cs.CV · eess.IV

Pith reviewed 2026-05-25 10:06 UTC · model grok-4.3

keywords surgical activity recognition · efficiency metrics · robotic prostatectomy · task recognition · CNN-LSTM · video analysis · surgeon training · RARP

The pith

Surgical activity recognition models can be evaluated by the accuracy of the efficiency metrics computed from their task identifications.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes shifting evaluation of surgical task recognition models away from frame-level accuracy toward whether the tasks they identify produce efficiency metrics that match those from expert labels. The authors present RP-Net-V2, a CNN-LSTM model trained to detect the twelve steps of robotic-assisted radical prostatectomy, and compare its outputs both by standard overlap scores and by derived metrics on instrument movements and system events. They report a Jaccard Index of 0.85 and strong correlation between model-derived and expert-derived efficiency values. A reader would care because this offers a practical test for when automated recognition is good enough to support focused training feedback without manual review. If correct, the approach enables scalable post-operative reports that quantify task efficiencies.

Core claim

The central claim is that metrics-based evaluation of surgical activity recognition models is a viable approach to determine when models can be used to quantify surgical efficiencies. RP-Net-V2 achieves a Jaccard Index of 0.85 on the twelve RARP steps and produces task-based efficiency metrics from instrument movements and system events that correlate well with those obtained from clinical expert labels, supporting the conclusion that this form of evaluation can indicate when models are ready for automated surgeon feedback.
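As a point of reference for the overlap score quoted above, here is a minimal sketch of a frame-level Jaccard Index averaged over task labels. The averaging convention (unweighted mean over tasks) and the toy label sequences are assumptions for illustration; the abstract does not state how the paper aggregates the score.

```python
from collections import Counter

def mean_jaccard(expert, model):
    """Unweighted mean of per-task Jaccard indices over frame-level labels.

    For each task, Jaccard = (frames both sequences assign to the task)
                           / (frames either sequence assigns to the task).
    """
    if len(expert) != len(model):
        raise ValueError("label sequences must have the same length")
    inter = Counter()
    union = Counter()
    for e, m in zip(expert, model):
        if e == m:
            inter[e] += 1
            union[e] += 1
        else:
            # A disagreeing frame counts toward the union of both tasks.
            union[e] += 1
            union[m] += 1
    return sum(inter[t] / union[t] for t in union) / len(union)

# Hypothetical frame labels for three tasks (0, 1, 2).
expert = [0, 0, 0, 1, 1, 1, 2, 2]
model = [0, 0, 1, 1, 1, 1, 2, 2]
score = mean_jaccard(expert, model)
print(round(score, 4))
```

A single mislabeled boundary frame lowers the score for both neighbouring tasks, which is why overlap scores alone say little about how much a downstream metric shifts.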

What carries the argument

RP-Net-V2, a CNN-LSTM model that recognizes the twelve steps of robotic-assisted radical prostatectomy and is assessed by how closely its task identifications reproduce expert efficiency metrics on instrument movements and system events.

If this is right

Models that pass the metrics correlation test can generate automated post-operative efficiency reports.
Evaluation can indicate when recognition performance is adequate for providing task-specific surgeon feedback.
Task-based metrics enable focused training interventions instead of whole-procedure review.
The method supports scalable quantification of surgical efficiencies without constant expert labeling.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same correlation test could be applied to activity recognition in other robotic or laparoscopic procedures.
If the chosen efficiency metrics prove predictive of patient outcomes, the evaluation standard would gain clinical weight.
Repeated application might allow models to be refined directly against metric fidelity rather than label overlap alone.

Load-bearing premise

Correlation between efficiency metrics from model-identified tasks and expert-labeled tasks is enough to conclude the model can supply reliable surgeon feedback.

What would settle it

A dataset or procedure in which model-derived efficiency metrics show high correlation with expert labels yet fail to predict measurable differences in surgeon performance or outcomes.

Original abstract

Purpose: Surgical task-based metrics (rather than entire procedure metrics) can be used to improve surgeon training and, ultimately, patient care through focused training interventions. Machine learning models to automatically recognize individual tasks or activities are needed to overcome the otherwise manual effort of video review. Traditionally, these models have been evaluated using frame-level accuracy. Here, we propose evaluating surgical activity recognition models by their effect on task-based efficiency metrics. In this way, we can determine when models have achieved adequate performance for providing surgeon feedback via metrics from individual tasks.

Methods: We propose a new CNN-LSTM model, RP-Net-V2, to recognize the 12 steps of robotic-assisted radical prostatectomies (RARP). We evaluated our model both in terms of conventional methods (e.g. Jaccard Index, task boundary accuracy) as well as novel ways, such as the accuracy of efficiency metrics computed from instrument movements and system events.

Results: Our proposed model achieves a Jaccard Index of 0.85 thereby outperforming previous models on robotic-assisted radical prostatectomies. Additionally, we show that metrics computed from tasks automatically identified using RP-Net-V2 correlate well with metrics from tasks labeled by clinical experts.

Conclusions: We demonstrate that metrics-based evaluation of surgical activity recognition models is a viable approach to determine when models can be used to quantify surgical efficiencies. We believe this approach and our results illustrate the potential for fully automated, post-operative efficiency reports.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

RP-Net-V2 beats prior Jaccard scores on RARP and shows metric correlation, but the case for reliable feedback via this method needs more than correlation to hold up.

The paper's key point is that we should judge activity recognition models in surgery by whether the efficiency metrics they enable match those from expert labels. Their RP-Net-V2 CNN-LSTM model recognizes the 12 steps of robotic-assisted radical prostatectomies with a Jaccard index of 0.85, which beats earlier models, and the efficiency metrics from its task labels correlate with expert ones. This is a sensible shift if the end goal is automated feedback on things like instrument movements and system events. It directly ties the model's performance to practical outcomes in training and patient care. Reporting both the standard metrics and the new correlation is a good step.

The soft spot is the reliance on correlation without showing the magnitude of differences or ruling out biases. If the model consistently mislabels task boundaries in a way that adds a fixed amount to durations or movements, the correlation could stay high while the metrics become unreliable for feedback. The provided abstract gives no dataset details, no statistical tests, and no absolute error numbers, which leaves the viability claim under-supported.

This work is aimed at researchers developing models for surgical video analysis. A reader looking for ways to make model evaluation more clinically relevant would get value from the idea. It deserves a serious referee because the model results are competitive and the evaluation concept has potential, even though the current evidence is preliminary.
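The fixed-offset concern in the letter can be illustrated with a small sketch. The durations and the 45-second offset are invented numbers, not data from the paper; the point is only that a constant shift leaves Pearson correlation at 1 while every value is off by the same amount.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def mean_abs_diff(x, y):
    """Mean absolute difference between paired values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

# Hypothetical per-task durations in seconds from expert labels.
expert_durations = [120.0, 300.0, 95.0, 410.0, 230.0]
# Hypothetical model output: every duration inflated by a fixed offset.
offset = 45.0
model_durations = [d + offset for d in expert_durations]

r = pearson(expert_durations, model_durations)
mad = mean_abs_diff(expert_durations, model_durations)
print(f"r = {r:.3f}, mean absolute difference = {mad:.1f} s")
```

Correlation reports perfect agreement here even though no single duration matches, which is the case for asking for absolute error alongside it.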

Referee Report

2 major / 1 minor

Summary. The paper introduces RP-Net-V2, a CNN-LSTM model for recognizing the 12 steps of robotic-assisted radical prostatectomies (RARP). It reports a Jaccard Index of 0.85 and proposes evaluating activity recognition models via the correlation between task-based efficiency metrics (instrument movements and system events) computed from model-derived task labels versus expert labels. The central claim is that this metrics-based evaluation demonstrates when models achieve adequate performance for automated surgeon feedback on surgical efficiencies.

Significance. If the correlation evidence is strengthened with absolute error analysis, the work could usefully shift evaluation of surgical AI models toward clinically relevant proxies rather than frame-level accuracy alone. The use of independent expert labels as ground truth for the correlation check is a clear methodological strength that supports the independence of the validation.

major comments (2)

[Abstract / Results] Abstract and Results: The claim that efficiency metrics from RP-Net-V2 'correlate well' with expert-derived metrics is presented as evidence that metrics-based evaluation is viable, yet the manuscript reports neither the correlation coefficient values, the number of procedures or tasks evaluated, nor any absolute error measures (e.g., mean absolute difference, Bland-Altman limits of agreement). Without these, systematic biases that preserve rank correlation while rendering the metrics non-interchangeable for feedback cannot be ruled out, directly weakening the central viability conclusion.
[Abstract] Abstract: The Jaccard Index of 0.85 is stated without accompanying dataset size, cross-validation details, statistical tests, error bars, or exclusion criteria. Because the performance of RP-Net-V2 is used to anchor the subsequent metrics-correlation argument, the lack of these basic reporting elements leaves the foundation of the viability claim under-specified.
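The Bland-Altman limits of agreement requested in the first major comment could be computed along these lines. This is a sketch on made-up paired values, using the conventional bias ± 1.96 standard deviations of the paired differences; it is not an analysis the paper reports.

```python
from statistics import mean, stdev

def bland_altman(expert, model):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of (model - expert); the limits of agreement are
    bias +/- 1.96 * sample standard deviation of the differences.
    """
    diffs = [m - e for e, m in zip(expert, model)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical per-procedure values of one efficiency metric.
expert_values = [10.0, 20.0, 30.0, 40.0]
model_values = [12.0, 19.0, 33.0, 42.0]

bias, lower, upper = bland_altman(expert_values, model_values)
print(f"bias = {bias:.2f}, limits of agreement = [{lower:.2f}, {upper:.2f}]")
```

A non-zero bias or wide limits would flag exactly the kind of systematic disagreement that a high correlation coefficient can hide.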

minor comments (1)

[Abstract] The abstract states the purpose of task-based metrics but does not define the specific efficiency metrics (e.g., which instrument movements or system events) until the Methods; moving a brief definition earlier would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed review and constructive suggestions. We agree that additional quantitative details are needed to strengthen the central claims and will revise the manuscript accordingly to address the major comments.

Referee: [Abstract / Results] Abstract and Results: The claim that efficiency metrics from RP-Net-V2 'correlate well' with expert-derived metrics is presented as evidence that metrics-based evaluation is viable, yet the manuscript reports neither the correlation coefficient values, the number of procedures or tasks evaluated, nor any absolute error measures (e.g., mean absolute difference, Bland-Altman limits of agreement). Without these, systematic biases that preserve rank correlation while rendering the metrics non-interchangeable for feedback cannot be ruled out, directly weakening the central viability conclusion.

Authors: We agree with this assessment. The current manuscript does not report the specific correlation coefficient values, the exact number of procedures used for the efficiency metrics evaluation, or absolute error measures. We will revise the Results section and abstract to include these details from our analysis, such as the correlation coefficients for each metric and the sample size. We will also compute and report mean absolute differences to address potential systematic biases. This revision will be made to better support the viability conclusion.
revision: yes

Referee: [Abstract] Abstract: The Jaccard Index of 0.85 is stated without accompanying dataset size, cross-validation details, statistical tests, error bars, or exclusion criteria. Because the performance of RP-Net-V2 is used to anchor the subsequent metrics-correlation argument, the lack of these basic reporting elements leaves the foundation of the viability claim under-specified.

Authors: We concur that the abstract is under-specified in this regard. The current version of the manuscript does not include these elements in the abstract. We will revise the abstract to incorporate the dataset size, cross-validation details, statistical tests performed, and error bars. This will provide a stronger foundation for the claims.
revision: yes

Circularity Check

0 steps flagged

No circularity; evaluation uses independent expert ground truth

The paper trains RP-Net-V2 on RARP videos, reports standard Jaccard index of 0.85 against expert labels, then computes task-based efficiency metrics (instrument movements, system events) from both model outputs and expert labels and reports their correlation. No step defines a quantity in terms of itself, renames a fitted parameter as a prediction, or relies on a self-citation chain for a uniqueness claim. The correlation check is performed against externally provided expert annotations and is therefore falsifiable outside the model's own outputs.
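The comparison described in this rationale can be sketched as follows: the same per-frame signal is aggregated once under expert task labels and once under model task labels, and the two results are compared. The task names, the per-frame path-length signal, and the function `per_task_totals` are all hypothetical stand-ins; the abstract does not specify which instrument-movement or system-event metrics the paper uses.

```python
from collections import defaultdict

def per_task_totals(labels, per_frame_values):
    """Sum a per-frame signal (e.g. instrument path length) within each task."""
    if len(labels) != len(per_frame_values):
        raise ValueError("labels and values must have the same length")
    totals = defaultdict(float)
    for task, value in zip(labels, per_frame_values):
        totals[task] += value
    return dict(totals)

# Hypothetical per-frame instrument displacement in millimetres.
path_mm = [1.0, 2.0, 1.0, 3.0, 2.0, 1.0]
# Hypothetical task labels; the model places the boundary one frame early.
expert_labels = ["dissection"] * 3 + ["suturing"] * 3
model_labels = ["dissection"] * 2 + ["suturing"] * 4

expert_metric = per_task_totals(expert_labels, path_mm)
model_metric = per_task_totals(model_labels, path_mm)
# Signed error of the model-derived metric for each expert-labeled task.
metric_error = {
    task: model_metric.get(task, 0.0) - expert_metric[task]
    for task in expert_metric
}
print(metric_error)
```

Because the underlying signal is identical in both runs, any difference in the totals comes purely from where the task boundaries fall, which is what makes expert labels an independent check on the model.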

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review limited to abstract; main unstated premise is that task efficiency metrics are meaningful performance indicators.

axioms (1)

domain assumption: Efficiency metrics computed from instrument movements and system events are valid proxies for surgical performance quality.
The paper's central claim depends on this to link model output to useful feedback.
