ExpOS: Explainable Open-Surgery Skills Assessment Using 3D Hand Reconstruction

Idan Smoller; Roi Papo; Shlomi Laufer

arxiv: 2605.23653 · v1 · pith:O3IDYDHCnew · submitted 2026-05-22 · 💻 cs.CV

ExpOS: Explainable Open-Surgery Skills Assessment Using 3D Hand Reconstruction

Roi Papo , Idan Smoller , Shlomi Laufer This is my paper

Pith reviewed 2026-05-25 04:47 UTC · model grok-4.3

classification 💻 cs.CV

keywords open surgeryskill assessment3D hand reconstructionexplainable AItemporal convolutional networksattention mechanismskinematic analysissurgical training

0 comments

The pith

ExpOS learns temporal motion patterns from 3D hand reconstructions to predict open-surgery skill levels without expert-defined metrics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Surgical training feedback currently depends on limited expert observation that does not scale. The paper shows that kinematic data extracted from 3D hand poses in videos can be processed automatically to rate performance on open-surgery tasks. A temporal convolutional network with attention pooling identifies the motion segments most predictive of skill, then fuses those with global statistics for the final prediction. The resulting scores correlate with expert ratings at up to 0.778 on fascial closure while also producing attention maps that indicate which behaviors drove each assessment. This removes the need for manually chosen metrics and supplies multi-level explanations of the output.

Core claim

ExpOS extracts hand poses and tool detections from 221 videos of three open-surgery tasks, derives kinematic descriptors and global motion statistics, models spatiotemporal hand-tool dynamics with a temporal convolutional backbone plus attention-based pooling, and fuses the resulting representations to predict skill level while generating frame-level importance maps.

What carries the argument

Attention-based pooling inside a temporal convolutional network applied to kinematic descriptors from 3D hand reconstruction, which produces frame-level importance maps and enables fusion with global motion statistics.

Load-bearing premise

The extracted 3D hand poses and tool detections together with the attention mechanism capture the motion characteristics that actually determine expert-rated skill.

What would settle it

A new collection of open-surgery videos rated independently by experts where the model's predicted skill scores show little or no correlation with those ratings would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.23653 by Idan Smoller, Roi Papo, Shlomi Laufer.

**Figure 1.** Figure 1: Overview of ExpOS explainability outputs: (A) 3D hand pose, (B) temporal attention highlights key video frames, and (C) SHAP-based global feature importance reveals which motion statistics most influenced the skill score. We propose ExpOS, an explainable framework for data-driven surgical skill assessment that enables automatic, feedback-oriented evaluation. 3D hand trajectories and kinematic descriptors … view at source ↗

**Figure 2.** Figure 2: Overview of the procedures executed by the surgical residents A. Suturing, B. Knot Tying, C. Facial closure. A YOLO [12] model was trained to detect surgical tools (forceps, needle driver, scissors, simulator) using 731 annotated frames with augmentation to ensure robustness. For hand detection, we employed RoHans [17] pseudo-label self-training to refine a YOLO-based detector, which was then integrated in… view at source ↗

**Figure 3.** Figure 3: Overview of the proposed framework pipeline. The input video is processed into per-frame features consisting of 3D hand keypoints and bounding box information for hands and tools. Additionally, statistical global features are also calculated. The per-frame features are fed into the MS-TCN++ module, which generates a sequence of hidden representations over time. Temporal Attention Pooling is applied to extr… view at source ↗

**Figure 4.** Figure 4: SHAP beeswarm plot for the Fascial Closure test set. Each point of a feature corresponds to a video, with SHAP values (x-axis) indicating the features impact on the predicted skill score and colors denoting feature values (blue = low, red = high). Task completion time, still forceps usage, and coordination between hands emerged as the most influential descriptors. 6 Discussion and Conclusion Our results de… view at source ↗

read the original abstract

Timely and transparent feedback is essential for effective surgical training, yet current assessment remains dependent on expert observation, limiting scalability and opportunities for autonomous practice. We present ExpOS, an explainable framework for data-driven assessment of open-surgery skills designed to enable automatic, feedback-oriented evaluation. Rather than relying on expert-defined metrics, ExpOS learns discriminative temporal patterns directly from motion data and identifies the segments and behaviors most predictive of skill level. We trained and evaluated the method on 221 videos of medical students performing three open-surgery tasks. Hand poses and tool detections were extracted from each frame to derive kinematic descriptors and global motion statistics. Spatiotemporal hand-tool dynamics were modeled using a temporal convolutional backbone with attention-based pooling to generate frame-level importance maps. These representations were fused with global motion statistics to predict skill level and to provide interpretable feedback. ExpOS provides multi-level explainability by identifying when informative events occur through attention weights and which motion characteristics most influence predictions through global feature analysis. Across tasks, the framework achieved strong correlation with expert ratings, with best performance on fascial closure (r = 0.778, R2 = 0.74). These results demonstrate that combining weakly-supervised temporal importance learning with interpretable motion statistics enables scalable and actionable surgical skill assessment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ExpOS gets a solid correlation (r=0.778) between 3D hand motion features and expert ratings on open surgery tasks via attention, but the validation details are thin.

read the letter

ExpOS shows that 3D hand and tool tracking plus a temporal attention model can produce predictions that line up with expert skill ratings on open procedures, with the strongest result on fascial closure. The dataset of 221 student videos across three tasks is a practical starting point, and the architecture fuses local spatiotemporal patterns with global motion statistics while adding attention weights for frame-level feedback. That combination is the main concrete advance: it moves away from hand-crafted metrics toward learned patterns that still aim to be interpretable at multiple levels. The explainability claims are grounded in the attention maps and feature importance analysis rather than post-hoc stories. The numbers are reported directly against external ratings, which avoids obvious circularity. The soft spots sit in the evaluation. The abstract and summary give correlations but leave open how the train/test splits were handled, whether performance holds across students or only within tasks, and what the inter-rater reliability was on the ground-truth scores. No direct baseline comparisons appear in the provided description, so it is unclear how much the attention component adds over simpler kinematic aggregates. The accuracy of the upstream 3D reconstruction step is also not quantified, which matters because pose noise could inflate or deflate the reported correlations. This paper is for the small group working on automated surgical training tools or applied CV in medicine. A reader who needs a working example of attention-based skill assessment on real video would get usable ideas from it. It has enough data and a clear applied claim that it deserves a serious referee rather than a desk reject; the experiments would likely need tightening but the core pipeline is worth reviewing.

Referee Report

3 major / 2 minor

Summary. The paper introduces ExpOS, a framework for automatic, explainable assessment of open-surgery skills from video. It extracts 3D hand poses and tool detections to derive kinematic descriptors and global motion statistics, models spatiotemporal dynamics with a temporal convolutional backbone plus attention-based pooling to produce frame-level importance maps, fuses these with global statistics to predict expert-rated skill levels, and supplies multi-level interpretability via attention weights and feature analysis. Evaluated on 221 videos of medical students performing three tasks, it reports correlations with expert ratings (best: r=0.778, R²=0.74 on fascial closure).

Significance. If the reported correlations prove robust under proper validation, the approach could meaningfully advance scalable, feedback-oriented surgical training by reducing reliance on constant expert observation while adding interpretability. The data-driven learning of temporal patterns from 3D kinematics, combined with explicit attention maps, is a constructive direction; however, the absence of baseline comparisons and validation details in the provided description limits immediate assessment of impact.

major comments (3)

[Results] Results section: the reported correlations (r=0.778, R²=0.74 on fascial closure; similar figures across tasks) are presented without baseline methods, confidence intervals, or error bars, and without stating the train/test split strategy or subject-wise cross-validation. These omissions are load-bearing for the central claim of 'strong correlation' and generalizability.
[Methods] Methods/Evaluation: no information is given on the number of expert raters, inter-rater reliability (e.g., ICC or Cohen's kappa), or how label variability was handled when training the supervised model. This directly affects the reliability of the target labels used to train the temporal attention model.
[Methods] Methods: the claim that the framework 'rather than relying on expert-defined metrics, learns discriminative temporal patterns directly from motion data' requires clarification on whether any post-hoc feature selection or expert-derived thresholds were applied to the kinematic descriptors before fusion; if present, this would qualify the 'parameter-free' or purely data-driven assertion.

minor comments (2)

[Abstract] Abstract and text: the phrase 'weakly-supervised temporal importance learning' is used but the model is trained end-to-end on expert skill labels; a brief clarification of the supervision level would improve precision.
[Figures] Figure captions and text: ensure that attention-map visualizations are accompanied by quantitative metrics (e.g., overlap with annotated critical segments) rather than qualitative examples alone.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important aspects of validation and methodological clarity. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core contributions.

read point-by-point responses

Referee: [Results] Results section: the reported correlations (r=0.778, R²=0.74 on fascial closure; similar figures across tasks) are presented without baseline methods, confidence intervals, or error bars, and without stating the train/test split strategy or subject-wise cross-validation. These omissions are load-bearing for the central claim of 'strong correlation' and generalizability.

Authors: We agree these reporting elements are necessary to substantiate the claims. In the revised manuscript we will add baseline comparisons (including a non-attention temporal model and regression on global statistics alone), report 95% confidence intervals computed via bootstrapping, include error bars on figures, and explicitly describe the subject-wise cross-validation protocol used to ensure no leakage across individuals. revision: yes
Referee: [Methods] Methods/Evaluation: no information is given on the number of expert raters, inter-rater reliability (e.g., ICC or Cohen's kappa), or how label variability was handled when training the supervised model. This directly affects the reliability of the target labels used to train the temporal attention model.

Authors: The current manuscript omits these details. We will revise the Methods section to include the number of expert raters, describe how ratings were aggregated to produce the target labels, and either report inter-rater reliability if available from the dataset or explicitly note its absence as a limitation while clarifying that the provided expert ratings constitute the standard supervision signal. revision: yes
Referee: [Methods] Methods: the claim that the framework 'rather than relying on expert-defined metrics, learns discriminative temporal patterns directly from motion data' requires clarification on whether any post-hoc feature selection or expert-derived thresholds were applied to the kinematic descriptors before fusion; if present, this would qualify the 'parameter-free' or purely data-driven assertion.

Authors: No post-hoc feature selection or expert-derived thresholds were applied at any stage. Kinematic descriptors are computed directly from the 3D hand poses and tool detections; the temporal convolutional backbone with attention learns discriminative patterns end-to-end, and fusion with global statistics occurs without manual rules. We will add an explicit clarifying sentence in the Methods section to remove any ambiguity regarding the data-driven nature of the pipeline. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper describes a supervised learning pipeline that extracts kinematic features from 3D hand and tool detections, then trains a temporal convolutional model with attention to predict externally provided expert skill ratings on held-out videos. The reported correlations (r = 0.778, R2 = 0.74) are direct statistical comparisons against those independent labels rather than any internal reconstruction, self-fit, or renamed input. No equations, self-citations, or uniqueness claims appear in the provided text that would reduce the performance numbers to a definitional tautology or fitted-input prediction. The derivation chain therefore remains self-contained against the external expert benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies insufficient technical detail to enumerate free parameters, axioms, or invented entities; all fields left empty.

pith-pipeline@v0.9.0 · 5764 in / 1025 out tokens · 40636 ms · 2026-05-25T04:47:57.602898+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages

[1]

IEEE Robotics and Automation Letters 8(2), 1755–1762 (2023)

Anastasiou, D., Jin, Y., Stoyanov, D., Mazomenos, E.: Keep your eye on the best: Contrastive regression transformer for skill assessment in robotic surgery. IEEE Robotics and Automation Letters 8(2), 1755–1762 (2023)

work page 2023
[2]

International Journal of Computer Assisted Radiology and Surgery 18(7), 1279–1285 (2023)

Bkheet, E., DAngelo, A.L., Goldbraikh, A., Laufer, S.: Using hand pose estimation to automate open surgery training feedback. International Journal of Computer Assisted Radiology and Surgery 18(7), 1279–1285 (2023)

work page 2023
[3]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Chefer, H., Gur, S., Wolf, L.: Transformer interpretability beyond attention visu- alization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 782–791 (2021)

work page 2021
[4]

The American journal of surgery 209(4), 645–651 (2015)

D’Angelo, A.L.D., Rutherford, D.N., Ray, R.D., Laufer, S., Kwan, C., Cohen, E.R., Mason, A., Pugh, C.M.: Idle time: an underdeveloped performance metric for as- sessing surgical skill. The American journal of surgery 209(4), 645–651 (2015)

work page 2015
[5]

In: Proceedings of NAACL-HLT

Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: Pre-training of deep bidirec- tional transformers for language understanding. In: Proceedings of NAACL-HLT. pp. 4171–4186. Association for Computational Linguistics (2019)

work page 2019
[6]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2019)

Diaz, R., Marathe, A.: Soft labels for ordinal regression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2019)

work page 2019
[7]

International Journal of Computer Assisted Radiology and Surgery 14, 1217–1225 (2019)

Funke, I., Mees, S., Weitz, J., Speidel, S.: Video-based surgical skill assessment us- ing 3d convolutional neural networks. International Journal of Computer Assisted Radiology and Surgery 14, 1217–1225 (2019)

work page 2019
[8]

In: MICCAI workshop: M2cai

Gao, Y., Vedula, S.S., Reiley, C.E., Ahmidi, N., Varadarajan, B., Lin, H.C., Tao, L., Zappella, L., Béjar, B., Yuh, D.D., et al.: Jhu-isi gesture and skill assessment working set (jigsaws): A surgical activity dataset for human motion modeling. In: MICCAI workshop: M2cai. vol. 3, p. 3 (2014)

work page 2014
[9]

International Journal of Computer As- sisted Radiology and Surgery 17, 437–448 (2022)

Goldbraikh, A., D’Angelo, A., Pugh, C., Laufer, S.: Video-based fully automatic assessment of open surgery suturing skills. International Journal of Computer As- sisted Radiology and Surgery 17, 437–448 (2022)

work page 2022
[10]

International Journal of Computer Assisted Radiology and Surgery (2024)

Hoﬀmann, H., Funke, I., Peters, P., Venkatesh, D., Egger, J., Rivoir, D., Röhrig, R., Hölzle, F., Bodenstedt, S., Willemer, M., Speidel, S., Puladi, B.: Aixsuture: vision-based assessment of open suturing skills. International Journal of Computer Assisted Radiology and Surgery (2024)

work page 2024
[11]

International journal of computer assisted radiology and surgery 14(9), 1611–1617 (2019)

Ismail Fawaz, H., Forestier, G., Weber, J., Idoumghar, L., Muller, P.A.: Accurate and interpretable evaluation of surgical skills from kinematic data using fully con- volutional neural networks. International journal of computer assisted radiology and surgery 14(9), 1611–1617 (2019)

work page 2019
[12]

Jocher, G., Chaurasia, A., Qiu, J.: Ultralytics YOLO (Jan 2023), https://github.com/ultralytics/ultralytics

work page 2023
[13]

NPJ digital medicine 5(1), 24 (2022)

Lam, K., Chen, J., Wang, Z., Iqbal, F.M., Darzi, A., Lo, B., Purkayastha, S., Kin- ross, J.M.: Machine learning for technical skill assessment in surgery: a systematic review. NPJ digital medicine 5(1), 24 (2022)

work page 2022
[14]

IEEE Trans- actions on Pattern Analysis and Machine Intelligence pp

Li, S.J., AbuFarha, Y., Liu, Y., Cheng, M.M., Gall, J.: Ms-tcn++: Multi- stage temporal convolutional network for action segmentation. IEEE Trans- actions on Pattern Analysis and Machine Intelligence pp. 1–1 (2020). https://doi.org/10.1109/TPAMI.2020.3021756

work page doi:10.1109/tpami.2020.3021756 2020
[15]

In: Proceedings of the 31st International Conference on Neural Information Processing Systems

Lundberg, S.M., Lee, S.I.: A uniﬁed approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. pp. 4768–4777. Curran Associates Inc. (2017) 10 Roi Papo et al

work page 2017
[16]

British journal of surgery 84(2), 273–278 (1997)

Martin, J., Regehr, G., Reznick, R., Macrae, H., Murnaghan, J., Hutchison, C., Brown, M.: Objective structured assessment of technical skill (osats) for surgical residents. British journal of surgery 84(2), 273–278 (1997)

work page 1997
[17]

arXiv preprint arXiv:2501.08115 (2025)

Papo, R., Gershov, S., Friedman, T., et al.: Rohan: Robust hand detection in operation room. arXiv preprint arXiv:2501.08115 (2025)

work page arXiv 2025
[18]

Electronics 13(8), 1430 (2024)

Perikos, I., Tzaﬁlkou, K., Grivokostopoulou, F.: Bert-based transformers for sentiment analysis: A comparative study of explainability methods. Electronics 13(8), 1430 (2024). https://doi.org/10.3390/electronics13081430, https://doi.org/10.3390/electronics13081430

work page doi:10.3390/electronics13081430 2024
[19]

arXiv preprint arXiv:2409.12259 (2024)

Potamias, R.A., Zhang, J., Deng, J., Zafeiriou, S.: Wilor: End-to-end 3d hand localization and reconstruction in-the-wild. arXiv preprint arXiv:2409.12259 (2024)

work page arXiv 2024
[20]

Analytical Chemistry 36(8), 1627–1639 (1964)

Savitzky, A., Golay, M.J.E.: Smoothing and diﬀerentiation of data by simpli- ﬁed least squares procedures. Analytical Chemistry 36(8), 1627–1639 (1964). https://doi.org/10.1021/ac60214a047

work page doi:10.1021/ac60214a047 1964
[21]

In: Interna- tional conference on information processing in computer-assisted interventions

Tao, L., Elhamifar, E., Khudanpur, S., Hager, G.D., Vidal, R.: Sparse hidden markov models for surgical gesture classiﬁcation and skill evaluation. In: Interna- tional conference on information processing in computer-assisted interventions. pp. 167–177. Springer (2012)

work page 2012
[22]

IEEE Transactions on Neural Networks and Learning Systems 32(11), 4793–4813 (2020)

Tjoa, E., Guan, C.: A survey on explainable artiﬁcial intelligence (xai): Toward medical xai. IEEE Transactions on Neural Networks and Learning Systems 32(11), 4793–4813 (2020)

work page 2020
[23]

Annual review of biomedical engineering 19, 301–325 (2017)

Vedula, S.S., Ishii, M., Hager, G.D.: Objective assessment of surgical technical skill and competency in the operating room. Annual review of biomedical engineering 19, 301–325 (2017)

work page 2017
[24]

The Journal of Defense Modeling and Simulation 19(2), 159– 171 (2022)

Yanik, E., Intes, X., Kruger, U., Yan, P., Diller, D., Van Voorst, B., Makled, B., Norﬂeet, J., De, S.: Deep neural networks for the assessment of surgical skills: A systematic review. The Journal of Defense Modeling and Simulation 19(2), 159– 171 (2022)

work page 2022

[1] [1]

IEEE Robotics and Automation Letters 8(2), 1755–1762 (2023)

Anastasiou, D., Jin, Y., Stoyanov, D., Mazomenos, E.: Keep your eye on the best: Contrastive regression transformer for skill assessment in robotic surgery. IEEE Robotics and Automation Letters 8(2), 1755–1762 (2023)

work page 2023

[2] [2]

International Journal of Computer Assisted Radiology and Surgery 18(7), 1279–1285 (2023)

Bkheet, E., DAngelo, A.L., Goldbraikh, A., Laufer, S.: Using hand pose estimation to automate open surgery training feedback. International Journal of Computer Assisted Radiology and Surgery 18(7), 1279–1285 (2023)

work page 2023

[3] [3]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Chefer, H., Gur, S., Wolf, L.: Transformer interpretability beyond attention visu- alization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 782–791 (2021)

work page 2021

[4] [4]

The American journal of surgery 209(4), 645–651 (2015)

D’Angelo, A.L.D., Rutherford, D.N., Ray, R.D., Laufer, S., Kwan, C., Cohen, E.R., Mason, A., Pugh, C.M.: Idle time: an underdeveloped performance metric for as- sessing surgical skill. The American journal of surgery 209(4), 645–651 (2015)

work page 2015

[5] [5]

In: Proceedings of NAACL-HLT

Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: Pre-training of deep bidirec- tional transformers for language understanding. In: Proceedings of NAACL-HLT. pp. 4171–4186. Association for Computational Linguistics (2019)

work page 2019

[6] [6]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2019)

Diaz, R., Marathe, A.: Soft labels for ordinal regression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2019)

work page 2019

[7] [7]

International Journal of Computer Assisted Radiology and Surgery 14, 1217–1225 (2019)

Funke, I., Mees, S., Weitz, J., Speidel, S.: Video-based surgical skill assessment us- ing 3d convolutional neural networks. International Journal of Computer Assisted Radiology and Surgery 14, 1217–1225 (2019)

work page 2019

[8] [8]

In: MICCAI workshop: M2cai

Gao, Y., Vedula, S.S., Reiley, C.E., Ahmidi, N., Varadarajan, B., Lin, H.C., Tao, L., Zappella, L., Béjar, B., Yuh, D.D., et al.: Jhu-isi gesture and skill assessment working set (jigsaws): A surgical activity dataset for human motion modeling. In: MICCAI workshop: M2cai. vol. 3, p. 3 (2014)

work page 2014

[9] [9]

International Journal of Computer As- sisted Radiology and Surgery 17, 437–448 (2022)

Goldbraikh, A., D’Angelo, A., Pugh, C., Laufer, S.: Video-based fully automatic assessment of open surgery suturing skills. International Journal of Computer As- sisted Radiology and Surgery 17, 437–448 (2022)

work page 2022

[10] [10]

International Journal of Computer Assisted Radiology and Surgery (2024)

Hoﬀmann, H., Funke, I., Peters, P., Venkatesh, D., Egger, J., Rivoir, D., Röhrig, R., Hölzle, F., Bodenstedt, S., Willemer, M., Speidel, S., Puladi, B.: Aixsuture: vision-based assessment of open suturing skills. International Journal of Computer Assisted Radiology and Surgery (2024)

work page 2024

[11] [11]

International journal of computer assisted radiology and surgery 14(9), 1611–1617 (2019)

Ismail Fawaz, H., Forestier, G., Weber, J., Idoumghar, L., Muller, P.A.: Accurate and interpretable evaluation of surgical skills from kinematic data using fully con- volutional neural networks. International journal of computer assisted radiology and surgery 14(9), 1611–1617 (2019)

work page 2019

[12] [12]

Jocher, G., Chaurasia, A., Qiu, J.: Ultralytics YOLO (Jan 2023), https://github.com/ultralytics/ultralytics

work page 2023

[13] [13]

NPJ digital medicine 5(1), 24 (2022)

Lam, K., Chen, J., Wang, Z., Iqbal, F.M., Darzi, A., Lo, B., Purkayastha, S., Kin- ross, J.M.: Machine learning for technical skill assessment in surgery: a systematic review. NPJ digital medicine 5(1), 24 (2022)

work page 2022

[14] [14]

IEEE Trans- actions on Pattern Analysis and Machine Intelligence pp

Li, S.J., AbuFarha, Y., Liu, Y., Cheng, M.M., Gall, J.: Ms-tcn++: Multi- stage temporal convolutional network for action segmentation. IEEE Trans- actions on Pattern Analysis and Machine Intelligence pp. 1–1 (2020). https://doi.org/10.1109/TPAMI.2020.3021756

work page doi:10.1109/tpami.2020.3021756 2020

[15] [15]

In: Proceedings of the 31st International Conference on Neural Information Processing Systems

Lundberg, S.M., Lee, S.I.: A uniﬁed approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. pp. 4768–4777. Curran Associates Inc. (2017) 10 Roi Papo et al

work page 2017

[16] [16]

British journal of surgery 84(2), 273–278 (1997)

Martin, J., Regehr, G., Reznick, R., Macrae, H., Murnaghan, J., Hutchison, C., Brown, M.: Objective structured assessment of technical skill (osats) for surgical residents. British journal of surgery 84(2), 273–278 (1997)

work page 1997

[17] [17]

arXiv preprint arXiv:2501.08115 (2025)

Papo, R., Gershov, S., Friedman, T., et al.: Rohan: Robust hand detection in operation room. arXiv preprint arXiv:2501.08115 (2025)

work page arXiv 2025

[18] [18]

Electronics 13(8), 1430 (2024)

Perikos, I., Tzaﬁlkou, K., Grivokostopoulou, F.: Bert-based transformers for sentiment analysis: A comparative study of explainability methods. Electronics 13(8), 1430 (2024). https://doi.org/10.3390/electronics13081430, https://doi.org/10.3390/electronics13081430

work page doi:10.3390/electronics13081430 2024

[19] [19]

arXiv preprint arXiv:2409.12259 (2024)

Potamias, R.A., Zhang, J., Deng, J., Zafeiriou, S.: Wilor: End-to-end 3d hand localization and reconstruction in-the-wild. arXiv preprint arXiv:2409.12259 (2024)

work page arXiv 2024

[20] [20]

Analytical Chemistry 36(8), 1627–1639 (1964)

Savitzky, A., Golay, M.J.E.: Smoothing and diﬀerentiation of data by simpli- ﬁed least squares procedures. Analytical Chemistry 36(8), 1627–1639 (1964). https://doi.org/10.1021/ac60214a047

work page doi:10.1021/ac60214a047 1964

[21] [21]

In: Interna- tional conference on information processing in computer-assisted interventions

Tao, L., Elhamifar, E., Khudanpur, S., Hager, G.D., Vidal, R.: Sparse hidden markov models for surgical gesture classiﬁcation and skill evaluation. In: Interna- tional conference on information processing in computer-assisted interventions. pp. 167–177. Springer (2012)

work page 2012

[22] [22]

IEEE Transactions on Neural Networks and Learning Systems 32(11), 4793–4813 (2020)

Tjoa, E., Guan, C.: A survey on explainable artiﬁcial intelligence (xai): Toward medical xai. IEEE Transactions on Neural Networks and Learning Systems 32(11), 4793–4813 (2020)

work page 2020

[23] [23]

Annual review of biomedical engineering 19, 301–325 (2017)

Vedula, S.S., Ishii, M., Hager, G.D.: Objective assessment of surgical technical skill and competency in the operating room. Annual review of biomedical engineering 19, 301–325 (2017)

work page 2017

[24] [24]

The Journal of Defense Modeling and Simulation 19(2), 159– 171 (2022)

Yanik, E., Intes, X., Kruger, U., Yan, P., Diller, D., Van Voorst, B., Makled, B., Norﬂeet, J., De, S.: Deep neural networks for the assessment of surgical skills: A systematic review. The Journal of Defense Modeling and Simulation 19(2), 159– 171 (2022)

work page 2022