Scalable LLM-based Coding of Dialogue in Healthcare Simulation: Balancing Coding Performance, Processing Time, and Environmental Impact
Pith reviewed 2026-05-08 07:40 UTC · model grok-4.3
The pith
Increasing batch sizes in LLM prompting for healthcare dialogue coding improves speed and reduces energy consumption but decreases accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Testing four prompt designs across varying batch sizes on a dataset of 11,647 utterances coded for six dialogue constructs reveals that larger batch sizes improve processing speed and lower energy use while reducing coding performance compared to smaller batches.
What carries the argument
Batch size as the variable controlling the number of utterances processed together in each LLM call, which affects the balance between accuracy, latency, and power draw.
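To make the batching variable concrete, here is a minimal sketch of how utterances might be grouped into one prompt per LLM call. The function names, the numbered-list prompt layout, and the batch sizes mentioned afterwards are illustrative assumptions; the paper's actual four prompt designs are not described on this page.

```python
from typing import Iterator, List


def make_batches(utterances: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` utterances."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(utterances), batch_size):
        yield utterances[start:start + batch_size]


def build_prompt(batch: List[str], instructions: str) -> str:
    """Hypothetical prompt layout: number each utterance so that a single
    reply can carry one label per numbered line."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(batch, start=1))
    return f"{instructions}\n\nUtterances:\n{numbered}"


def calls_needed(n_utterances: int, batch_size: int) -> int:
    """Number of LLM calls required: ceiling of n_utterances / batch_size."""
    return -(-n_utterances // batch_size)
```

Under this sketch, coding all 11,647 utterances takes 11,647 calls at batch size 1 but only 1,165 calls at batch size 10, which is where the speed and energy savings would come from. Each call also carries more text, which is one plausible reason accuracy could suffer.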
If this is right
- Larger batch sizes enable faster processing suitable for real-time debriefing in simulations.
- Reduced energy use supports sustainable deployment of LLM tools in educational settings.
- Smaller batches can be selected when maximum coding fidelity is required for research.
- Practical guidance emerges for scaling dialogue analytics where timeliness and sustainability matter.
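One way to act on these trade-offs is to keep only the configurations that no other configuration beats on all three metrics at once (a Pareto front). The sketch below assumes this selection rule for illustration; this page does not confirm that the paper uses it. The `Config` type and every number in `demo` are made-up placeholders, not results from the paper.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    name: str
    macro_f1: float    # higher is better
    seconds: float     # lower is better
    energy_wh: float   # lower is better


def dominates(a: Config, b: Config) -> bool:
    """True if `a` is no worse than `b` on every metric and better on one."""
    no_worse = (a.macro_f1 >= b.macro_f1 and a.seconds <= b.seconds
                and a.energy_wh <= b.energy_wh)
    strictly_better = (a.macro_f1 > b.macro_f1 or a.seconds < b.seconds
                       or a.energy_wh < b.energy_wh)
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Placeholder values for illustration only; NOT measurements from the paper.
demo = [
    Config("batch=1", 0.80, 100.0, 50.0),
    Config("batch=10", 0.75, 20.0, 12.0),
    Config("batch=50", 0.60, 8.0, 5.0),
    Config("batch=10-slow", 0.70, 25.0, 15.0),
]
front = [c.name for c in pareto_front(demo)]
```

With these placeholder values, `batch=10-slow` drops out because `batch=10` is better on every metric. The other three remain as genuine trade-offs, and the choice among them depends on whether fidelity or timeliness matters more.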
Where Pith is reading between the lines
- If accuracy losses stay within limits acceptable for feedback, efficient batching could support real-time educational systems without high compute demands.
- The batching approach might transfer to dialogue analysis in other collaborative learning contexts such as classrooms if similar patterns hold.
- Different LLMs or refined prompts could be tested to reduce the accuracy penalty while keeping the speed gains.
Load-bearing premise
That the accuracy-speed-energy trade-offs seen with this dataset of 11,647 utterances, these specific prompts, and the chosen LLM will hold in other settings, and that the resulting accuracy levels remain useful for providing educational feedback.
What would settle it
Running the same experiments on a new dataset from a different healthcare simulation, or with another LLM, and finding that accuracy does not decline as batch size increases would challenge the central trade-off claim.
Original abstract
Research shows that dialogue, the interactive process through which participants articulate their thinking, plays a central role in constructing shared understanding, coordinating action, and shaping learning outcomes in teams. Analysing dialogue content has been central to advancing team learning theory and informing the design of computer-supported collaborative learning environments, yet this progress has depended on labour-intensive qualitative coding. LLMs offer new possibilities for automating and enhancing the dialogue layer within emerging multimodal learning analytics approaches, with recent studies showing that they can approximate human coding through few-shot prompting. However, prior work has focused on replicating human coding accuracy for research purposes, rather than addressing a more educationally consequential question: how can we design prompts that allow an LLM to label team dialogue accurately and fast enough to be useful in real settings, such as in-person healthcare simulations, where results must be returned quickly and computational cost and sustainability also matter? This paper investigates how prompt design and batching strategies can be optimised to balance coding accuracy, processing time, and environmental impact in team-based healthcare simulation debriefing. Using a dataset of 11,647 utterances coded across 6 dialogue constructs, we compared 4 prompt designs across varying batch sizes, evaluating coding performance, processing time, and energy consumption, as well as the trade-offs between these metrics. Results indicate that increasing batch size improves speed and reduces energy use, but negatively impacts coding performance. Beyond demonstrating the feasibility of LLM-based qualitative analysis, this study offers practical guidance for scaling dialogue analytics in contexts where timeliness, privacy, and sustainability are critical.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports an empirical study comparing four prompt designs and varying batch sizes for LLM-based coding of team dialogue in healthcare simulation debriefings. On a fixed dataset of 11,647 utterances labeled across six dialogue constructs, it measures coding performance, processing time, and energy consumption, finding that larger batch sizes improve speed and reduce energy use while degrading coding accuracy. The work positions this as practical guidance for scalable, timely, private, and sustainable dialogue analytics in educational settings.
Significance. If the reported accuracy-speed-energy trade-offs are robust, the study usefully shifts LLM dialogue-coding research from pure accuracy replication toward deployment-relevant constraints in real-time healthcare training. It supplies concrete empirical observations on batching effects for one dataset and model family, which could inform prompt and infrastructure choices where latency and carbon cost matter.
major comments (3)
- [Abstract / Results] Abstract and Results: directional claims that increasing batch size 'negatively impacts coding performance' are presented without statistical tests, confidence intervals, error bars, or exact numeric deltas (e.g., F1 or accuracy drops per batch size). This absence makes the central trade-off observation unverifiable from the reported text.
- [Methods / Evaluation] Methods / Evaluation: no human inter-rater reliability baselines (e.g., Cohen's kappa or percent agreement) are supplied for the six dialogue constructs, so the absolute and relative quality of the LLM outputs cannot be contextualized against the human coding standard the paper seeks to approximate.
- [Results] Results: the paper does not specify the exact performance metric(s) used (accuracy, macro-F1, exact match, etc.) or how ground-truth labels were obtained and validated, which is load-bearing for interpreting the reported performance degradation.
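For reference, the Cohen's kappa baseline requested above can be computed from two coders' labels as follows. This is a generic sketch of the standard statistic, not the paper's procedure; the function name and the assumption that each construct is coded as a single label per utterance are ours.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater labelled independently at their own rates.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

Running this once per construct on a double-coded subset would give the human baseline against which LLM agreement with the ground truth could be compared.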
minor comments (2)
- [Abstract] The abstract states '4 prompt designs' and '6 dialogue constructs' but does not name them; adding explicit labels or a small table would improve readability.
- [Methods] Energy-consumption measurement protocol (hardware, carbon-intensity assumptions, tool used) should be stated more explicitly even if relegated to an appendix.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which highlights opportunities to strengthen the clarity and verifiability of our empirical results on LLM batching trade-offs. We address each major comment below and will revise the manuscript accordingly.
- Referee: [Abstract / Results] Abstract and Results: directional claims that increasing batch size 'negatively impacts coding performance' are presented without statistical tests, confidence intervals, error bars, or exact numeric deltas (e.g., F1 or accuracy drops per batch size). This absence makes the central trade-off observation unverifiable from the reported text.
Authors: We agree that the directional claims require stronger statistical support for verifiability. The full results section contains per-batch performance values, but we did not include formal tests, CIs, or deltas in the abstract or summary text. In revision we will add exact F1/accuracy deltas between batch sizes, error bars on figures, and appropriate statistical tests (e.g., repeated-measures ANOVA or paired comparisons with correction) to substantiate the observed degradation.
Revision: yes
- Referee: [Methods / Evaluation] Methods / Evaluation: no human inter-rater reliability baselines (e.g., Cohen's kappa or percent agreement) are supplied for the six dialogue constructs, so the absolute and relative quality of the LLM outputs cannot be contextualized against the human coding standard the paper seeks to approximate.
Authors: The ground-truth labels were produced by a single primary expert coder with secondary validation by the research team rather than independent parallel coding, which is why IRR statistics were not computed or reported. We will expand the Methods section with a full description of the labeling protocol, any available validation agreement figures, and an explicit discussion of how this affects interpretation of LLM performance relative to human standards.
Revision: partial
- Referee: [Results] Results: the paper does not specify the exact performance metric(s) used (accuracy, macro-F1, exact match, etc.) or how ground-truth labels were obtained and validated, which is load-bearing for interpreting the reported performance degradation.
Authors: Macro-F1 was the primary metric (chosen for its suitability to the multi-construct coding task); ground-truth labels were obtained via expert manual coding of the 11,647 utterances by healthcare simulation researchers, with a validation subset reviewed for consistency. We will state the metric explicitly in Methods and Results, add details on label acquisition and validation procedures, and clarify how performance degradation was calculated.
Revision: yes
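As a point of reference for the metric named in the last response, the sketch below computes macro-F1 and the delta between two batch sizes. It averages over the labels that appear in the data; whether the paper averages over labels or over the six constructs is not stated here, so that choice is an assumption. The helper name `f1_delta` is hypothetical.

```python
from typing import Hashable, Sequence


def macro_f1(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-label F1 scores over all labels seen."""
    if len(gold) != len(pred) or len(gold) == 0:
        raise ValueError("gold and pred must be non-empty and of equal length")
    scores = []
    for label in set(gold) | set(pred):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        # F1 = 2TP / (2TP + FP + FN); the denominator is positive because
        # the label occurs in gold or pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


def f1_delta(gold, pred_small_batch, pred_large_batch) -> float:
    """Change in macro-F1 when moving from the smaller to the larger batch size
    (negative values mean the larger batch performed worse)."""
    return macro_f1(gold, pred_large_batch) - macro_f1(gold, pred_small_batch)
```

Reporting such deltas per batch size, together with intervals from resampling utterances, would be one way to supply the exact numeric differences the referee asks for.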
Circularity Check
No significant circularity; purely empirical comparison
The paper reports an empirical study that compares four prompt designs across varying batch sizes on a fixed dataset of 11,647 utterances, measuring coding performance, processing time, and energy consumption directly from the experiments. No mathematical derivations, equations, fitted parameters presented as predictions, or self-citations that reduce the central claims to prior results by construction are present. The headline findings on accuracy-speed-energy trade-offs are stated as observed outcomes for this specific setup rather than as a universal law derived from inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLMs can approximate human qualitative coding of dialogue through few-shot prompting
Received 16 February 2026