Touchless Intraoperative Image Access System Based on Vision-Based Hand Tracking
Pith reviewed 2026-05-08 04:29 UTC · model grok-4.3
The pith
A single RGB camera enables touchless hand-gesture control of medical images during surgery without added hardware or training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors show that real-time hand position data from a single camera can be interpreted through straightforward gesture mappings to deliver continuous translation, rotation, and zoom of medical images. They report latency and stability consistent with fluid interaction, with no extra sensors, user calibration, or changes to the visualization software.
What carries the argument
Real-time mapping of detected hand positions and movements to continuous image manipulation commands for translation, rotation, and zoom.
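As a rough illustration of what such a mapping could look like, the sketch below turns two consecutive frames of hand landmarks into translation, rotation, and zoom deltas. The landmark indices (0 wrist, 4 thumb tip, 8 index fingertip, 9 middle-finger base) follow the MediaPipe Hands 21-point layout. The gesture definitions, the `Command` type, the `map_gesture` function, and the dead-zone threshold are assumptions made for illustration and are not taken from the paper.

```python
import math
from dataclasses import dataclass

# Landmark indices in the MediaPipe Hands 21-point layout.
WRIST, THUMB_TIP, INDEX_TIP, MIDDLE_MCP = 0, 4, 8, 9


@dataclass
class Command:
    """Hypothetical per-frame viewer command (not the paper's data type)."""
    dx: float = 0.0       # horizontal pan, normalized image units
    dy: float = 0.0       # vertical pan, normalized image units
    rotate: float = 0.0   # in-plane rotation, degrees
    zoom: float = 1.0     # multiplicative zoom factor


def _dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def _palm_angle(lm):
    # Direction of the wrist -> middle-finger-base vector in the image plane.
    vx = lm[MIDDLE_MCP][0] - lm[WRIST][0]
    vy = lm[MIDDLE_MCP][1] - lm[WRIST][1]
    return math.degrees(math.atan2(vy, vx))


def map_gesture(prev, curr, dead_zone=0.005):
    """Map two consecutive landmark lists [(x, y, z), ...] to a Command."""
    # Translation: wrist displacement, ignored below an assumed dead zone.
    dx = curr[WRIST][0] - prev[WRIST][0]
    dy = curr[WRIST][1] - prev[WRIST][1]
    if math.hypot(dx, dy) < dead_zone:
        dx = dy = 0.0
    # Rotation: change in palm orientation, wrapped to [-180, 180).
    rotate = (_palm_angle(curr) - _palm_angle(prev) + 180.0) % 360.0 - 180.0
    # Zoom: ratio of thumb-index pinch distances between frames.
    p0 = _dist(prev[THUMB_TIP], prev[INDEX_TIP])
    p1 = _dist(curr[THUMB_TIP], curr[INDEX_TIP])
    zoom = p1 / p0 if p0 > 1e-6 else 1.0
    return Command(dx, dy, rotate, zoom)
```

This sketch emits all three deltas on every frame. A practical system would presumably gate them by an active gesture mode so that, for example, a pan does not also trigger a zoom.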
Load-bearing premise
Hand tracking stays accurate and the selected gestures remain intuitive under the variable lighting, partial hand occlusions, and time pressure of a real operating room without any user-specific calibration or training.
What would settle it
A controlled test in which tracking accuracy or command responsiveness drops sharply when the camera faces typical operating-room lighting, gloved hands, and routine obstructions by instruments or personnel would show the approach does not yet meet surgical conditions.
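One way to make "drops sharply" concrete is to compare the hand-detection rate in each perturbed condition against a baseline and flag any condition whose drop exceeds a chosen tolerance. The sketch below does this on per-frame detection flags. The function names, the condition labels, and the 10-percentage-point tolerance are assumptions for illustration; the paper defines no such criterion.

```python
def detection_rate(flags):
    """Fraction of frames in which a hand was detected."""
    return sum(flags) / len(flags)


def degradation_report(baseline, conditions, max_drop=0.10):
    """Compare each condition's detection rate with the baseline.

    baseline   -- list of booleans, one per frame, from the controlled setup
    conditions -- dict mapping a condition name to its list of booleans
    max_drop   -- assumed tolerance on the absolute drop in detection rate
    """
    base = detection_rate(baseline)
    report = {}
    for name, flags in conditions.items():
        rate = detection_rate(flags)
        drop = base - rate
        report[name] = {"rate": rate, "drop": drop, "fails": drop > max_drop}
    return report
```

The same comparison could be repeated for command latency or gesture error rate, with one tolerance per metric.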
Original abstract
Touchless interaction with medical images is becoming increasingly important in the surgical field, where sterility and continuity of the operational workflow are essential requirements. This work presents a vision-based system for intraoperative navigation of medical images through hand gestures acquired using a single RGB camera. Unlike many existing solutions, the system does not require additional hardware or user-specific training. Hand tracking is performed in real time using MediaPipe Hands, which provides a 2.5D estimation of hand landmarks. Simple and intuitive gestures are then mapped into translation, rotation, and zoom commands, enabling continuous and natural interaction with the image viewer. The system architecture is independent from the visualization software and, for implementation simplicity, in this study it was integrated with PyVista. Performance was evaluated through frame-level logging and quantitative analysis of latency, stability, and interaction robustness metrics. Experimental results highlight real-time behavior, with reduced latencies and stable control, in line with the requirements of fluid interaction. The system demonstrates the feasibility of a low-cost touchless solution for intraoperative access to medical images, laying the groundwork for future clinical evaluations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper describes a vision-based touchless system for intraoperative medical image navigation using a single RGB camera and MediaPipe Hands for real-time 2.5D hand landmark tracking. Simple gestures are mapped to translation, rotation, and zoom commands in a PyVista-integrated viewer. The architecture requires no additional hardware or user-specific training/calibration. Performance is assessed via frame-level logging and quantitative metrics on latency, stability, and robustness, with the central claim being a demonstration of feasibility for a low-cost sterile solution that supports fluid interaction and grounds future clinical work.
Significance. If the reported real-time performance and stability hold under realistic conditions, the work offers a practical, low-cost integration of off-the-shelf vision tools for sterile image access in surgery, potentially reducing workflow interruptions. The calibration-free design and software independence are practical strengths for prototyping, though the absence of detailed numerical results and robustness data limits immediate impact.
major comments (2)
- [Abstract; evaluation section] The manuscript states that 'quantitative analysis of latency, stability, and interaction robustness metrics' was performed and that results show 'real-time behavior, with reduced latencies and stable control,' yet supplies no numerical values, standard deviations, test conditions (e.g., frame rate, hardware, distance), or baseline comparisons. This directly weakens support for the feasibility claim.
- [Evaluation section] Only aggregate metrics from (presumably) controlled conditions are reported; no quantitative breakdown of MediaPipe landmark detection accuracy, gesture misclassification rate, or end-to-end task success under OR-typical perturbations (variable lighting, partial occlusions by gloves/instruments, surgeon movement) is provided. Because the system explicitly avoids calibration or retraining, any degradation in 2.5D estimates directly undermines the 'stable control' and 'intuitive interaction' assertions required for even a feasibility demonstration.
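The numbers requested in these comments could be derived from frame-level logs along the following lines. This is a minimal sketch: the `FrameLog` record, its field names, and the `summarize` function are hypothetical, since the paper's actual log format is not given. It reports frame rate, latency mean, standard deviation and 95th percentile (nearest-rank), detection rate, and gesture misclassification rate.

```python
import math
import statistics
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameLog:
    """Hypothetical per-frame log record (field names are assumptions)."""
    t_capture: float                 # seconds, frame grabbed from the camera
    t_command: float                 # seconds, viewer command issued
    detected: bool                   # a hand was found in this frame
    gesture: Optional[str] = None    # predicted gesture label
    expected: Optional[str] = None   # annotated ground-truth label, if any


def summarize(frames):
    """Aggregate latency, detection, and misclassification statistics."""
    lat_ms = sorted((f.t_command - f.t_capture) * 1000.0
                    for f in frames if f.detected)
    if not lat_ms:
        raise ValueError("no frames with a detected hand")
    labeled = [f for f in frames if f.detected and f.expected is not None]
    errors = sum(f.gesture != f.expected for f in labeled)
    duration = frames[-1].t_capture - frames[0].t_capture
    # Nearest-rank 95th percentile of the sorted latencies.
    p95 = lat_ms[math.ceil(0.95 * len(lat_ms)) - 1]
    return {
        "n_frames": len(frames),
        "fps": (len(frames) - 1) / duration if duration > 0 else float("nan"),
        "latency_mean_ms": statistics.mean(lat_ms),
        "latency_std_ms": statistics.stdev(lat_ms) if len(lat_ms) > 1 else 0.0,
        "latency_p95_ms": p95,
        "detection_rate": len(lat_ms) / len(frames),
        "misclassification_rate": (errors / len(labeled)
                                   if labeled else float("nan")),
    }
```

Running `summarize` once per test condition (lighting, distance, gloves) and tabulating the results alongside the hardware and camera settings would give the breakdown the referee asks for.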
minor comments (2)
- [Abstract] The abstract claims 'in line with the requirements of fluid interaction' without citing specific clinical latency thresholds or prior literature values for comparison.
- [System architecture] Notation for gesture-to-command mapping and the exact PyVista integration interface could be clarified with a diagram or pseudocode for reproducibility.
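The kind of pseudocode requested here might resemble the sketch below, which separates gesture commands from the viewer behind a small interface. The `Viewer` protocol, the `RecordingViewer` stand-in, the `dispatch` function, and the mode names are all hypothetical; none of them is PyVista's API or the authors' interface. A PyVista adapter would implement the same three methods on top of its camera controls.

```python
from typing import Protocol


class Viewer(Protocol):
    """Hypothetical minimal interface a visualization backend would expose."""
    def translate(self, dx: float, dy: float) -> None: ...
    def rotate(self, degrees: float) -> None: ...
    def zoom(self, factor: float) -> None: ...


class RecordingViewer:
    """Stand-in backend that only records the calls it receives."""
    def __init__(self):
        self.calls = []

    def translate(self, dx, dy):
        self.calls.append(("translate", dx, dy))

    def rotate(self, degrees):
        self.calls.append(("rotate", degrees))

    def zoom(self, factor):
        self.calls.append(("zoom", factor))


def dispatch(viewer, mode, dx=0.0, dy=0.0, degrees=0.0, factor=1.0):
    """Forward one gesture command to the viewer according to the active mode."""
    if mode == "translate":
        viewer.translate(dx, dy)
    elif mode == "rotate":
        viewer.rotate(degrees)
    elif mode == "zoom":
        viewer.zoom(factor)
    elif mode == "idle":
        pass  # no hand or no recognized gesture: leave the view unchanged
    else:
        raise ValueError(f"unknown gesture mode: {mode!r}")
```

Keeping the tracking code unaware of the concrete viewer is one way to realize the software independence the paper claims, and it lets the gesture logic be tested without opening a render window.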
Simulated Author's Rebuttal
We thank the referee for the insightful comments on our manuscript. We provide point-by-point responses to the major comments and have updated the manuscript to address the concerns where possible.
- Referee: [Abstract; evaluation section] the manuscript states that 'quantitative analysis of latency, stability, and interaction robustness metrics' was performed and that results show 'real-time behavior, with reduced latencies and stable control,' yet supplies no numerical values, standard deviations, test conditions (e.g., frame rate, hardware, distance), or baseline comparisons. This directly weakens support for the feasibility claim.
  Authors: We agree that the absence of specific numerical values weakens the presentation of our results. In the revised manuscript, we will include the quantitative metrics obtained from our frame-level logging, such as the measured latencies, stability, and robustness values along with the corresponding standard deviations, test conditions including hardware setup, frame rate, and camera distance, as well as baseline comparisons to support the feasibility claim.
  Revision: yes
- Referee: [Evaluation section] only aggregate metrics from (presumably) controlled conditions are reported; no quantitative breakdown of MediaPipe landmark detection accuracy, gesture misclassification rate, or end-to-end task success under OR-typical perturbations (variable lighting, partial occlusions by gloves/instruments, surgeon movement) is provided. Because the system explicitly avoids calibration or retraining, any degradation in 2.5D estimates directly undermines the 'stable control' and 'intuitive interaction' assertions required for even a feasibility demonstration.
  Authors: We recognize the importance of detailed breakdowns for validating the system's performance. We will add a quantitative breakdown of MediaPipe landmark detection accuracy and gesture misclassification rates from our experiments in controlled conditions to the Evaluation section. Regarding OR-typical perturbations, the current study was conducted in a controlled lab environment to demonstrate basic feasibility. We have added a discussion on the potential impact of such perturbations on the calibration-free system and plan to address full robustness testing in future clinical work. This partial revision strengthens the current claims while acknowledging limitations.
  Revision: partial
- Left unaddressed: quantitative evaluation of end-to-end task success under realistic operating room perturbations such as variable lighting and occlusions, since these were not part of the original experiments.
Circularity Check
No circularity: system integration paper with no derivations or fitted predictions
The manuscript describes a vision-based hand-tracking system using MediaPipe, simple gesture-to-command mappings, and integration with PyVista. No equations, parameter fitting, or predictive claims appear in the provided text. Performance metrics (latency, stability) are reported from direct experiments rather than derived from prior fitted quantities. No self-citations form load-bearing premises, and the feasibility claim rests on empirical logging rather than any self-referential reduction. This is a standard engineering integration report whose central assertions do not collapse into their own inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: MediaPipe Hands supplies sufficiently accurate and stable 2.5D hand landmarks in real time for gesture recognition without user-specific training.
- Domain assumption: the chosen gestures are intuitive and do not require training for surgeons.