arxiv: 2605.04169 · v1 · submitted 2026-05-05 · 💻 cs.AI · cs.LG


Actionable Real-Time Modeling of Surgical Team Dynamics via Time-Expanded Interaction Graphs

Vincenzo Marco De Luca , Antonio Longa , Giovanna Varni , Andrea Passerini


Pith reviewed 2026-05-08 17:55 UTC · model grok-4.3

classification 💻 cs.AI cs.LG

keywords: surgical team dynamics · time-expanded graphs · graph neural networks · real-time prediction · counterfactual analysis · procedural efficiency · communication modeling · actionable decision support


The pith

Time-expanded graphs of team communications let a standard neural network predict surgery duration in real time and suggest fixes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that representing surgical teams as nodes indexed by time, with directed edges for each communication exchange, creates a structure that captures how coordination evolves during a procedure. This expansion turns the dynamic process into a static graph that a graph neural network can process efficiently without needing specialized time-aware layers. If the approach holds, systems could flag when a case is likely to run longer than expected well before the end and point to small shifts in who talks to whom that correlate with shorter durations. The experiments on recorded cases test whether this yields both better early warnings and explanations that teams could act on during the operation.

Core claim

Surgical team performance is modeled by expanding interactions into time-indexed nodes connected by directed communication edges, which lets a static graph neural network predict efficiency as the deviation from expected procedure duration while supporting counterfactual queries that identify minimal changes in communication structure associated with better outcomes.

What carries the argument

Time-expanded interaction graphs, in which team members at successive time points become nodes and observed communication exchanges become directed edges, turning the evolving team process into a single static graph suitable for standard graph neural network inference.
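The construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Exchange` record, the function name, and the speaker-to-listener edge format are hypothetical, and the temporal edges that link each member to their own next-window copy are an assumption of this sketch (the abstract does not say how consecutive windows are connected). The 15-second window comes from the Figure 1 caption.

```python
from collections import namedtuple

# Hypothetical event record: `speaker` addresses `listener` at `time` seconds.
Exchange = namedtuple("Exchange", ["time", "speaker", "listener"])

WINDOW_SECONDS = 15  # window length reported in the Figure 1 caption


def build_time_expanded_graph(exchanges, members, window=WINDOW_SECONDS):
    """Return (nodes, edges) of one static graph with (member, window_index) nodes."""
    if not exchanges:
        return [], []
    n_windows = int(max(e.time for e in exchanges) // window) + 1
    nodes = [(m, w) for w in range(n_windows) for m in members]
    edges = set()
    # Communication edges: speaker -> listener inside the same window.
    for e in exchanges:
        w = int(e.time // window)
        edges.add(((e.speaker, w), (e.listener, w)))
    # Temporal edges (assumption of this sketch): member at w -> same member at w + 1.
    for m in members:
        for w in range(n_windows - 1):
            edges.add(((m, w), (m, w + 1)))
    return nodes, sorted(edges)


exchanges = [Exchange(3.0, "surgeon", "nurse"), Exchange(20.0, "nurse", "surgeon")]
nodes, edges = build_time_expanded_graph(exchanges, ["surgeon", "nurse"])
```

The resulting node and edge lists describe an ordinary static directed graph, which is what lets a standard graph neural network consume it without time-aware layers.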

If this is right

Real-time inference on the graphs flags procedures likely to exceed expected duration earlier than methods that ignore team interaction structure.
Counterfactual analysis on the same graphs identifies the smallest set of communication changes that would improve the predicted efficiency score.
The resulting explanations link specific behavioral variables, such as who speaks to whom at what moments, directly to the efficiency prediction.
The model supports deployment inside the operating room because the underlying graph neural network runs on the static expanded graph without requiring recurrent or attention-based time layers.
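The counterfactual query in the list above (the smallest set of communication changes that improves the predicted score) can be illustrated with an exhaustive search over edge flips. This is a sketch under stated assumptions: `toy_predicted_delay` is a made-up scoring rule standing in for a trained model, not the paper's GNN, and the paper's actual search procedure is not described in the abstract. Brute force is only practical for a handful of candidate edits.

```python
from itertools import combinations


def toy_predicted_delay(edges):
    """Stand-in scorer (NOT the paper's GNN): each edge into 'surgeon' cuts delay by 4."""
    replies = sum(1 for _, dst in edges if dst == "surgeon")
    return max(0.0, 10.0 - 4.0 * replies)


def minimal_counterfactual(edges, candidates, predict, target):
    """Find the fewest edge flips (add if absent, remove if present) with predict <= target."""
    base = frozenset(edges)
    # Try edit sets in order of increasing size, so the first hit is minimal.
    for k in range(len(candidates) + 1):
        for flips in combinations(candidates, k):
            edited = base.symmetric_difference(flips)
            score = predict(edited)
            if score <= target:
                return list(flips), score
    return None, predict(base)


observed = [("surgeon", "nurse")]
candidates = [("nurse", "surgeon"), ("anesthetist", "surgeon"), ("surgeon", "nurse")]
flips, score = minimal_counterfactual(observed, candidates, toy_predicted_delay, target=3.0)
```

In this toy case no single flip reaches the target, so the search returns the two added edges into the surgeon node. A real system would replace the scorer with the trained model and restrict the candidates to edits a team could plausibly act on.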

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same node-and-edge construction could be applied to other coordinated team activities that produce timestamped interaction logs, such as emergency response or manufacturing shifts.
Pairing the interaction graphs with existing visual workflow models might produce hybrid predictors that improve accuracy on both duration and coordination quality.
If the counterfactuals prove stable across hospitals, they could serve as training targets for simulation-based team drills focused on communication patterns.

Load-bearing premise

Communication exchanges recorded as directed edges between time-indexed nodes, together with deviation from an expected duration, capture the main factors that determine how long a procedure will take.

What would settle it

A new set of recorded surgeries in which the time-expanded graph model shows no gain in early detection of overruns or no interpretable counterfactuals compared with a baseline that uses only timestamps and visual workflow cues.

Figures

Figures reproduced from arXiv: 2605.04169 by Andrea Passerini, Antonio Longa, Giovanna Varni, Vincenzo Marco De Luca.

**Figure 1.** Overview of the interaction modeling pipeline. Top row: multimodal time-series are segmented into fixed temporal windows (15 seconds). Middle row: for each window, a snapshot interaction graph Gt is built, in which nodes represent team members and edges encode broadcast verbal communication. Node features integrate interpretable paralinguistic (eGeMAPS), pose, and human–tool interaction. Bottom row: snaps…

**Figure 2.** Sensitivity analysis of model predictions when converting a slow surgical procedure into a medium-duration one, or a medium-duration procedure into a fast one. The left panel reports the sensitivity of the predictions to modifications of interaction edges, while the right panel illustrates the sensitivity to changes in team members’ behavioral classes. paralinguistic features, assessing the minimum distanc…

Original abstract

Surgical team performance arises from complex interactions between technical execution and non-technical skills, including communication and coordination dynamics. However, current surgical AI systems predominantly model visual workflow signals, lacking structured representations of intraoperative team interactions over time. We propose a real-time actionable approach for modeling surgical team dynamics using time-expanded interaction graphs, where team members are modeled as time-indexed nodes and communication exchanges define directed edges. This spatio-temporal expansion enables dynamic interaction modeling, while allowing efficient inference with a static graph neural network. The model predicts procedural efficiency as the deviation from the expected duration and supports real-time deployment. Beyond prediction, we perform a counterfactual analysis to identify minimal changes in communication structure and interpretable behavioral variables associated with improved predicted outcomes. Experiments on recorded surgical procedures show that structured modeling of team interactions improves early identification of prolonged interventions and provides coherent, actionable explanations. This work advances surgical AI toward real-time, team-aware, and actionable decision support in the operating room.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's new angle is modeling surgical team comms as time-expanded graphs for static GNN inference on duration deviations plus counterfactuals, but real-time validity hinges on avoiding future data in partial graphs.


The main thing to know is that this work turns intraoperative team interactions into time-expanded graphs (team members as time-indexed nodes, communications as directed edges) and runs a static GNN on them to predict deviation from expected case duration while also doing counterfactual analysis for actionable suggestions. That specific construction for surgical team dynamics looks new relative to standard visual workflow models or off-the-shelf GNNs on static graphs. They do a solid job calling out the gap in current surgical AI, which mostly ignores coordination and non-technical skills, and the counterfactual step is a practical touch for turning predictions into potential interventions. Using a static GNN on the expanded structure for efficiency is a reasonable engineering choice if the temporal expansion is handled right.

The soft spots are in the evidence base and the real-time setup. The abstract claims experiments on recorded procedures improve early identification of overruns and give coherent explanations, yet supplies no numbers, baselines, sample sizes, or split details, so the actual lift is impossible to assess from what's written. More critically, the stress-test point lands: if the time-expanded graphs are built from the full procedure, then at any intraoperative time t the model risks incorporating later edges or outcomes, which would make early predictions non-causal and undermine both the real-time claim and the counterfactuals. The paper needs to show explicit prefix-graph construction or masking to close that gap; without it the central assumption (that these communication edges plus duration deviation are enough to drive reliable inference on partial data) stays shaky. Other factors like technical execution or patient specifics are acknowledged as missing but not deeply addressed.

This is for researchers in surgical AI, OR workflow, or applied temporal graphs who want a concrete example of team modeling. A reader already working on dynamic systems or healthcare graphs could pull the representation idea. I would send it to peer review. The framing is coherent and the application focus is clear, so referees can sort the missing metrics and the temporal leakage issue without it being a desk reject.

Referee Report

3 major / 1 minor

Summary. The paper proposes modeling surgical team dynamics with time-expanded interaction graphs, where team members are represented as time-indexed nodes and communications as directed edges. A static graph neural network is applied to predict deviation from expected procedure duration for real-time inference, with additional counterfactual analysis to generate actionable explanations for improving outcomes. Experiments on recorded surgical procedures are claimed to demonstrate improved early identification of prolonged interventions.

Significance. If the central claims hold under proper real-time constraints, the work could advance surgical AI by shifting from purely visual workflow models to structured representations of team interactions, enabling earlier detection of inefficiencies and interpretable interventions in the operating room. The combination of GNN-based prediction with counterfactual explanations is a potentially useful direction for actionable decision support.

major comments (3)

[Abstract] Abstract: the statement that 'experiments on recorded surgical procedures show that structured modeling of team interactions improves early identification of prolonged interventions' provides no quantitative metrics, baselines, sample sizes, statistical tests, or details on expected-duration computation and data partitioning, so the improvement cannot be evaluated and the central empirical claim remains unverified.
[Abstract] Abstract (description of time-expanded interaction graphs): the construction of time-indexed nodes and directed edges that span the full procedure, followed by inference with a static GNN, does not specify prefix-graph construction, causal masking, or restriction to observations available at intraoperative time t; without such mechanisms, early predictions may incorporate future communications, undermining the real-time and causal validity of both the duration-deviation predictions and the counterfactual explanations.
[Abstract] Abstract (real-time deployment claim): the target variable (deviation from expected duration) is described as externally defined, yet no information is given on how the expected duration is estimated from partial observations or whether the GNN is trained only on prefixes; this leaves open whether the model can perform reliable inference on incomplete intraoperative data.

minor comments (1)

[Abstract] The abstract uses 'spatio-temporal expansion' and 'dynamic interaction modeling' without clarifying whether the GNN itself is dynamic or whether dynamism is achieved solely by the graph construction; a brief clarification of this distinction would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We have revised the abstract and methods to address the concerns about quantitative details, real-time construction, and training procedures. Point-by-point responses follow.


Referee: [Abstract] Abstract: the statement that 'experiments on recorded surgical procedures show that structured modeling of team interactions improves early identification of prolonged interventions' provides no quantitative metrics, baselines, sample sizes, statistical tests, or details on expected-duration computation and data partitioning, so the improvement cannot be evaluated and the central empirical claim remains unverified.

Authors: We agree that the abstract statement is too high-level to allow direct evaluation of the empirical claim. The full manuscript presents these details in the Experiments section, including metrics, baselines, sample sizes, statistical tests, expected-duration estimation, and data partitioning. We have revised the abstract to incorporate a concise summary of the key quantitative results and methodological details so that the central claim can be assessed from the abstract alone. revision: yes
Referee: [Abstract] Abstract (description of time-expanded interaction graphs): the construction of time-indexed nodes and directed edges that span the full procedure, followed by inference with a static GNN, does not specify prefix-graph construction, causal masking, or restriction to observations available at intraoperative time t; without such mechanisms, early predictions may incorporate future communications, undermining the real-time and causal validity of both the duration-deviation predictions and the counterfactual explanations.

Authors: This is a valid concern for ensuring real-time and causal validity. The manuscript constructs the time-expanded graph incrementally from communications observed up to time t and applies the static GNN only to the resulting prefix graph. We have now explicitly added descriptions of prefix-graph construction and causal masking (to block future information) to both the revised abstract and the Methods section, clarifying that all predictions and counterfactual analyses respect intraoperative information constraints. revision: yes
Referee: [Abstract] Abstract (real-time deployment claim): the target variable (deviation from expected duration) is described as externally defined, yet no information is given on how the expected duration is estimated from partial observations or whether the GNN is trained only on prefixes; this leaves open whether the model can perform reliable inference on incomplete intraoperative data.

Authors: We acknowledge the need for explicit clarification on this point. The expected duration is estimated from a separate historical model that does not rely on intraoperative partial observations, while the GNN is trained and evaluated exclusively on graph prefixes to simulate real-time conditions. We have revised the abstract to state this distinction and have added a paragraph in the Methods section detailing the prefix-based training and inference protocol. revision: yes
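The prefix-based protocol discussed in these responses can be sketched as follows. This is an illustration under assumptions, not the manuscript's code: the `(timestamp, source, target)` edge format, the function names, and the 15-second cutoff step are hypothetical. The point it shows is that every training or inference sample contains only exchanges observed up to its cutoff, while the label (known only after the case ends) is attached for training alone.

```python
def prefix_edges(timed_edges, t):
    """Keep only exchanges observed at or before intraoperative time t (seconds)."""
    return [(ts, src, dst) for ts, src, dst in timed_edges if ts <= t]


def training_prefixes(timed_edges, label, step=15.0):
    """Yield (cutoff, prefix, label); each prefix contains nothing after its cutoff."""
    if not timed_edges:
        return
    end = max(ts for ts, _, _ in timed_edges)
    cutoff = step
    # Advance the cutoff until the last exchange is covered.
    while cutoff < end + step:
        yield cutoff, prefix_edges(timed_edges, cutoff), label
        cutoff += step


case = [
    (5.0, "surgeon", "nurse"),
    (22.0, "nurse", "surgeon"),
    (40.0, "surgeon", "anesthetist"),
]
samples = list(training_prefixes(case, label=1))
```

Here the three cutoffs (15, 30, and 45 seconds) give prefixes of one, two, and three exchanges. Evaluating a model on such prefixes, rather than on the full-procedure graph, is what guards against the temporal leakage raised by the referee.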

Circularity Check

0 steps flagged

No significant circularity detected


The derivation chain constructs time-expanded graphs from observed team communications (directed edges between time-indexed nodes) and applies a static GNN to predict deviation from an externally defined expected duration. No self-definitional loops appear (target is not derived from the model itself), no parameters are fitted on a subset and renamed as predictions, and no load-bearing self-citations or imported uniqueness theorems are invoked in the abstract or described pipeline. The real-time claim rests on independent graph construction rather than reducing to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Abstract-only review yields minimal explicit parameters or axioms; the core modeling choices rest on domain assumptions about what constitutes relevant team interaction.

axioms (2)

domain assumption Team interactions can be faithfully represented as directed edges between time-indexed nodes derived from communication exchanges.
Invoked in the definition of the time-expanded interaction graph.
domain assumption Deviation from expected procedure duration is a valid proxy for procedural efficiency driven by team dynamics.
Used to define the prediction target.

pith-pipeline@v0.9.0 · 5470 in / 1359 out tokens · 67904 ms · 2026-05-08T17:55:18.789130+00:00 · methodology

