Framework for 3D acoustic localization of surgical events projected onto RGB-D point clouds with transformer-based detection for multimodal dynamic scene understanding.
https://arxiv.org/abs/2505.24287
4 Pith papers cite this work. Polarity classification is still indexing.
citation summary
verdicts: UNVERDICTED 4
roles: background 1
polarities: background 1
citing papers explorer
- Sound Source Localization for Spatial Mapping of Surgical Actions in Dynamic Scenes
  Framework for 3D acoustic localization of surgical events projected onto RGB-D point clouds with transformer-based detection for multimodal dynamic scene understanding.
- OR-Action: Multi-Role Video Understanding with Fine-Grained Actions
  Introduces OR-Action benchmark for multi-role fine-grained actions in OR videos and a vision-only temporal model with multi-to-single view alignment that outperforms graph-based approaches.
- Where are they looking in the operating room?
  Gaze-following models on extended 4D-OR and Team-OR datasets reach F1 scores of 0.92 for clinical role prediction and 0.95 for surgical phase recognition while improving team communication detection by over 30%.
- Dyadic Partnership(DP): A Missing Link Towards Full Autonomy in Medical Robotics
  The paper introduces Dyadic Partnership (DP) as an intermediate paradigm for robot-clinician collaboration that uses foundation models and multi-modal interfaces to enable safer gradual progress toward autonomous medical robotics.