arxiv: 2606.26886 · v1 · pith:ZX35ZIJ2new · submitted 2026-06-25 · 💻 cs.HC

Optimizing Human-Machine Interface for Real-Time AI Support in the Operating Room: the CVS Copilot

Pith reviewed 2026-06-26 03:06 UTC · model grok-4.3

classification 💻 cs.HC
keywords Human-machine interface · AI-assisted surgery · Critical View of Safety · Laparoscopic cholecystectomy · User-centered design · Intraoperative decision support · Role-adaptive interface

The pith

Surgeons want AI for gallbladder safety checks to stay minimal by default and fully under their control.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes interviews with 17 surgeons about how AI predictions for the Critical View of Safety should appear during laparoscopic cholecystectomy. Surgeons largely want AI to support decisions rather than make them, and they favor displays that show little unless the surgeon chooses to see more. Attendings and residents differed on how much optional guidance they wanted, but both groups rejected persistent overlays and autonomous actions. The resulting design, called CVS Copilot, uses role-adaptive minimal visuals with on-demand anatomical details. The work aims to make AI assistance safer and more acceptable in the operating room by matching real user preferences.

Core claim

The optimal interface is a surgeon-controlled, role-adaptive HMI that delivers AI-based CVS assessment through minimal default visualization plus optional overlays, because this matches the preferences expressed across the surgeon cohort for visual, minimally intrusive, and fully controllable assistance.
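As a rough illustration of what "minimal by default, optional on demand, surgeon-controlled" could mean in software, the sketch below models a display state in which nothing optional is shown until the surgeon switches it on. All names (`Role`, `OPTIONAL_ELEMENTS`, `DisplayState`, `toggle`) and the particular role-to-option mapping are assumptions made for this example; they are not taken from the paper or from the CVS Copilot implementation.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Role(Enum):
    ATTENDING = "attending"
    RESIDENT = "resident"


# Hypothetical mapping: which optional elements each role may switch on.
# The paper reports role-dependent preferences; this exact split is assumed.
OPTIONAL_ELEMENTS = {
    Role.ATTENDING: frozenset({"anatomy_overlay"}),
    Role.RESIDENT: frozenset({"anatomy_overlay", "confidence_indicator"}),
}


@dataclass(frozen=True)
class DisplayState:
    """Optional elements currently shown; empty means the minimal default."""
    role: Role
    enabled: frozenset = frozenset()


def toggle(state: DisplayState, element: str) -> DisplayState:
    """Flip one optional element. Intended to run only on an explicit surgeon action."""
    if element not in OPTIONAL_ELEMENTS[state.role]:
        raise ValueError(f"{element!r} is not offered to {state.role.value}s")
    # Symmetric difference adds the element if absent and removes it if present.
    return replace(state, enabled=state.enabled ^ {element})
```

Keeping the state immutable and routing every change through `toggle` is one way to make sure no optional element appears unless someone asked for it, which is the control property the surgeons emphasized.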

What carries the argument

Reflexive thematic analysis of semi-structured interviews that probed interaction modalities, timing of assistance, visualization strategies, and control mechanisms across surgical roles.

If this is right

  • Sixteen of seventeen surgeons support AI for intraoperative decision support but reject autonomous decision-making.
  • Attendings prefer minimal AI feedback only at decisive moments while residents prefer optional guidance with confidence indicators.
  • A minimal overlay receives the strongest support, followed by on-demand anatomical segmentation.
  • Persistent overlays, haptic feedback, and numeric confidence displays raise recurrent concerns.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The CVS Copilot design could be tested in live cases to check whether the stated preferences actually reduce cognitive load or improve safety metrics.
  • Role-adaptive features might need further tuning when the same system is used across different team compositions or hospital cultures.
  • Similar interview methods could be applied to AI interfaces for other laparoscopic procedures to see if the minimal-default pattern holds.

Load-bearing premise

Preferences stated by seventeen surgeons at one hospital will stay stable and produce safer surgery when the same interface is used by other surgeons in other places.

What would settle it

A larger study at multiple hospitals finding that most surgeons prefer persistent overlays, numeric confidence scores, or haptic signals instead of minimal default visualization.

Figures

Figures reproduced from arXiv: 2606.26886 by Aditya Murali, Lorenzo Arboit, Nicolas Chanel, Nicolas Padoy, Pietro Mascagni.

Figure 1. Surgeons’ preferences across user experience, interface, and interaction dimensions, highlighting areas of consensus, rejection, and role-dependent divergence.
Figure 2. CVS Copilot interface. (A) Minimal Interface (MI) displaying the status of the three Critical View of Safety (CVS) criteria through a compact corner indicator, where for each criterion, gray denotes not achieved and green denotes achieved. (B) Full Dashboard (FD) providing detailed information including confidence scores, frame-level CVS assessment, and anatomy overlays, with the cystic duct shown in green…
Figure 3. Surgeon-centric design process. DeepCVS outputs informed structured interviews on User Experience (UX), User Interface (UI), and mock-ups, followed by qualitative analysis, leading to CVS Copilot (minimal overlay and customizable dashboard).

Abstract

Artificial intelligence (AI) systems for automated Critical View of Safety (CVS) assessment in laparoscopic cholecystectomy are nearing clinical translation. Beyond algorithmic performance, clinical safety and effectiveness depend on the quality of the human-machine interface (HMI). This work examines how AI-generated predictions should be presented and controlled intraoperatively. Seventeen surgeons, including residents, attending surgeons, and professors, took part in a mixed-methods, user-centered design study to optimize an intraoperative HMI for AI-assisted safe laparoscopic cholecystectomy. Interviews explored interaction modalities, timing of assistance, visualization strategies, and control mechanisms across surgical roles, and were analyzed using reflexive thematic analysis and human-factors heuristics. Most surgeons (16/17) supported the use of AI for intraoperative decision support while rejecting autonomous decision-making. Attendings preferred minimal AI feedback at decisive moments (13/14), whereas residents favored optional guidance (3/3) with confidence indicators and on-demand anatomical overlays. Across interviews, surgeons consistently prioritized visual, surgeon-controlled, minimally intrusive displays, with the strongest support for a minimal overlay (16/17) and on-demand anatomical segmentation (13/17). Recurrent concerns included persistent overlays, haptic feedback, and numeric confidence displays, although these were not uniformly raised across the cohort. These findings informed the design of CVS Copilot, a surgeon-controlled, role-adaptive HMI that provides AI-based CVS assessment with minimal default visualization and optional overlays.
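The abstract reports raw counts from a small cohort. To give a sense of how much uncertainty such counts carry, the sketch below computes a 95% Wilson score interval for each reported proportion. The counts are copied from the abstract; the choice of the Wilson interval, the function name `wilson_interval`, and the short labels are assumptions of this example, not an analysis the paper performs.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# Counts as reported in the abstract; labels are shortened for display.
reported = {
    "supports AI decision support": (16, 17),
    "attendings: minimal feedback at decisive moments": (13, 14),
    "residents: optional guidance": (3, 3),
    "minimal overlay": (16, 17),
    "on-demand anatomical segmentation": (13, 17),
}

for label, (k, n) in reported.items():
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.0%} (95% CI {low:.0%} to {high:.0%})")
```

Even unanimous agreement in a group of three leaves a wide interval, which is worth keeping in mind when reading the resident-specific findings.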

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript reports a single-site mixed-methods user-centered design study with 17 surgeons (residents, attendings, professors) who participated in semi-structured interviews. Using reflexive thematic analysis and human-factors heuristics, the authors extract preferences for intraoperative AI support during laparoscopic cholecystectomy: 16/17 surgeons endorse AI for decision support but reject autonomy; attendings (13/14) prefer minimal feedback at key moments while residents (3/3) want optional guidance and overlays; a minimal default overlay (16/17) and on-demand anatomical segmentation (13/17) receive strongest support. These data are stated to have directly informed the design of CVS Copilot, a surgeon-controlled, role-adaptive HMI with minimal default visualization and optional overlays.

Significance. If the reported frequencies and their translation into concrete design decisions are reliable, the work supplies empirical grounding for HMI choices in real-time surgical AI, a domain where interface design is critical for safety and adoption. The explicit role differentiation and emphasis on minimal intrusiveness are useful contributions to human-AI collaboration literature in healthcare.

major comments (1)
  1. [Methods] Methods: The reflexive thematic analysis is described only at a high level (interview topics listed, analysis method named). No information is supplied on the interview guide, number of interviewers/coders, coding reliability or saturation criteria, or how divergent views were reconciled. Because the central claim is that these interview findings directly shaped the CVS Copilot design, the missing methodological detail prevents evaluation of the data-to-design traceability.
minor comments (2)
  1. [Results] Abstract and Results: The specific response counts (16/17, 13/17, etc.) are useful but would be clearer if presented in a table that also shows role breakdowns and theme frequencies.
  2. [Discussion] Discussion: The single-institution sample is noted implicitly but could be stated explicitly as a limitation when interpreting the generalizability of the derived HMI requirements.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We agree that expanding the methodological description will strengthen the paper and improve traceability from data to design. We address the single major comment below.

Point-by-point responses
  1. Referee: [Methods] Methods: The reflexive thematic analysis is described only at a high level (interview topics listed, analysis method named). No information is supplied on the interview guide, number of interviewers/coders, coding reliability or saturation criteria, or how divergent views were reconciled. Because the central claim is that these interview findings directly shaped the CVS Copilot design, the missing methodological detail prevents evaluation of the data-to-design traceability.

    Authors: We acknowledge that the current Methods section provides only a high-level overview. In the revised manuscript we will add: (1) the complete semi-structured interview guide as Supplementary Material; (2) explicit statement that interviews were conducted by two authors (a surgeon and an HCI researcher), with reflexive thematic analysis performed primarily by the first author following Braun & Clarke (2006, 2021); (3) clarification that inter-coder reliability metrics were not computed, consistent with the reflexive approach that prioritizes researcher subjectivity over consensus coding; (4) saturation criteria (thematic saturation reached after 14 interviews, confirmed by no new codes in the final three); and (5) a description of how divergent views were handled through iterative team discussions, with minority perspectives retained as sub-themes. We will also insert a new table that explicitly maps representative quotes and themes to the specific HMI features implemented in CVS Copilot (e.g., minimal default overlay, role-adaptive guidance, on-demand segmentation). These additions will directly address traceability without altering the study design or findings.

    Revision: yes
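The saturation rule mentioned in the rebuttal (no new codes in the final three interviews) can be expressed as a simple check over per-interview code sets. The sketch below is one possible reading of that rule; the function names, the `run_length` parameter, and the toy code labels are hypothetical, and the data are invented purely to exercise the logic, not drawn from the study.

```python
def new_codes_per_interview(coded_interviews):
    """Count how many previously unseen codes each interview contributes."""
    seen = set()
    counts = []
    for codes in coded_interviews:
        fresh = set(codes) - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts


def saturation_point(coded_interviews, run_length=3):
    """Return how many interviews were completed before the first run of
    `run_length` consecutive interviews that added no new codes, or None
    if no such run exists."""
    counts = new_codes_per_interview(coded_interviews)
    for k in range(len(counts) - run_length + 1):
        if all(c == 0 for c in counts[k:k + run_length]):
            return k
    return None


# Toy data (invented): each set holds the codes assigned to one interview.
toy = [{"control", "minimal"}, {"minimal", "overlay"}, {"control"},
       {"overlay"}, {"minimal"}]
print(new_codes_per_interview(toy))  # new codes contributed by each interview
print(saturation_point(toy))         # interviews completed before the quiet run
```

Under this reading, the rebuttal's claim would correspond to `saturation_point` returning 14 for the 17 coded interviews with `run_length=3`.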

Circularity Check

0 steps flagged

No significant circularity identified

The paper reports a mixed-methods user-centered design study based on semi-structured interviews with 17 surgeons, analyzed via reflexive thematic analysis and human-factors heuristics. All central claims (e.g., 16/17 support for AI decision support, role-specific preferences, design of CVS Copilot) are direct mappings from the collected interview data and frequencies; there are no equations, fitted parameters, predictions, derivations, or self-citations that reduce any result to its own inputs by construction. The study is self-contained as a descriptive requirements-gathering exercise whose validity rests on traceable execution of the interviews rather than any external theorem or prior author result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a qualitative empirical study with no mathematical derivations, fitted parameters, or postulated entities.
