Scalable LLM-based Coding of Dialogue in Healthcare Simulation: Balancing Coding Performance, Processing Time, and Environmental Impact

Dragan Gasevic; Gloria Milena Fernandez-Nieto; Kiyoshige Garces; Linxuan Zhao; Roberto Martinez-Maldonado; Sachini Samaraweera; Vanessa Echeverria

arXiv: 2604.23255 · v1 · submitted 2026-04-25 · cs.HC · cs.AI · cs.CY

Kiyoshige Garces, Gloria Milena Fernandez-Nieto, Linxuan Zhao, Sachini Samaraweera, Dragan Gasevic, Roberto Martinez-Maldonado, Vanessa Echeverria

Pith reviewed 2026-05-08 07:40 UTC · model grok-4.3

classification cs.HC · cs.AI · cs.CY

keywords LLM coding · dialogue analysis · healthcare simulation · batch size · energy consumption · qualitative coding · team learning

The pith

Increasing batch sizes in LLM prompting for healthcare dialogue coding improves speed and reduces energy consumption but decreases accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how to make LLM-based coding of team dialogue practical for healthcare simulation training. It tests four prompt designs across varying batch sizes on 11,647 utterances to see how they affect accuracy, speed, and environmental cost. This matters because manual qualitative coding is slow and expensive, while automated systems must return results quickly enough for debriefing sessions without excessive computing resources. The study finds that larger batches trade some coding quality for faster, less energy-intensive processing. This points toward design choices that could make such tools viable in real educational environments where time and sustainability matter.

Core claim

Testing four prompt designs across varying batch sizes on a dataset of 11,647 utterances coded for six dialogue constructs reveals that larger batch sizes improve processing speed and lower energy use while reducing coding performance compared to smaller batches.

What carries the argument

Batch size as the variable controlling the number of utterances processed together in each LLM call, which affects the balance between accuracy, latency, and power draw.
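
To make the role of batch size concrete, here is a minimal sketch of how utterances might be grouped into a single prompt per LLM call. It is an illustration only: the function names, the prompt wording, and the batch sizes in the usage note are hypothetical and are not taken from the paper.

```python
from typing import Iterator, List


def make_batches(utterances: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` utterances."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(utterances), batch_size):
        yield utterances[start:start + batch_size]


def build_prompt(batch: List[str]) -> str:
    """Number the utterances so returned labels can be matched back to them.

    The instruction text is a placeholder, not one of the paper's prompts.
    """
    lines = [f"{i + 1}. {text}" for i, text in enumerate(batch)]
    return "Label each utterance below.\n" + "\n".join(lines)


def num_calls(n_utterances: int, batch_size: int) -> int:
    """Ceiling division: number of LLM calls needed for a given batch size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return -(-n_utterances // batch_size)
```

With the dataset size reported in the paper, `num_calls(11647, 1)` is 11,647 calls while a hypothetical batch size of 50 needs 233, which is why batching reduces per-call overhead. Each call then carries a longer prompt, which is where a loss in coding performance can enter.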

If this is right

Larger batch sizes enable faster processing suitable for real-time debriefing in simulations.
Reduced energy use supports sustainable deployment of LLM tools in educational settings.
Smaller batches can be selected when maximum coding fidelity is required for research.
Practical guidance emerges for scaling dialogue analytics where timeliness and sustainability matter.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If accuracy losses stay within limits acceptable for feedback, efficient batching could support real-time educational systems without high compute demands.
The batching approach might transfer to dialogue analysis in other collaborative learning contexts such as classrooms if similar patterns hold.
Different LLMs or refined prompts could be tested to reduce the accuracy penalty while keeping the speed gains.

Load-bearing premise

That the accuracy-speed-energy trade-offs seen with this dataset of 11,647 utterances, these specific prompts, and the chosen LLM will hold in other settings, and that the resulting accuracy levels remain useful for providing educational feedback.

What would settle it

Running the same experiments on a new dataset from a different healthcare simulation, or with another LLM, and finding that accuracy does not decline as batch size increases would challenge the central trade-off claim.

Figures

Figures reproduced from arXiv: 2604.23255 by Dragan Gasevic, Gloria Milena Fernandez-Nieto, Kiyoshige Garces, Linxuan Zhao, Roberto Martinez-Maldonado, Sachini Samaraweera, Vanessa Echeverria.

**Figure 1.** Processing time in seconds across prompt designs and batch sizes.

**Figure 2.** Macro-averaged F1 scores and batch sizes illustrat…

**Figure 3.** Energy consumption in joules across prompt designs and batch sizes.

**Figure 4.** Pareto front. The highlighted points depict the Pareto front for the two dimensions to optimise: Processing Time and…
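
The Pareto front named in the Figure 4 caption can be computed generically as the set of non-dominated configurations. The sketch below assumes every objective is expressed so that lower is better (a score to be maximised can be negated first). The function names and the sample points are hypothetical and are not values from the paper.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly
    better on at least one, with all objectives minimised."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective_1, objective_2) pairs, both to be minimised.
example_points = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
example_front = pareto_front(example_points)
```

In this made-up example, `(3.0, 4.0)` drops out because `(2.0, 3.0)` is better on both objectives, and the remaining three points form the front. The pairwise check is quadratic in the number of configurations, which is adequate for a small grid of prompt designs and batch sizes.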

Original abstract

Research shows that dialogue, the interactive process through which participants articulate their thinking, plays a central role in constructing shared understanding, coordinating action, and shaping learning outcomes in teams. Analysing dialogue content has been central to advancing team learning theory and informing the design of computer-supported collaborative learning environments, yet this progress has depended on labour-intensive qualitative coding. LLMs offer new possibilities for automating and enhancing the dialogue layer within emerging multimodal learning analytics approaches, with recent studies showing that they can approximate human coding through few-shot prompting. However, prior work has focused on replicating human coding accuracy for research purposes, rather than addressing a more educationally consequential question: how can we design prompts that allow an LLM to label team dialogue accurately and fast enough to be useful in real settings, such as in-person healthcare simulations, where results must be returned quickly and computational cost and sustainability also matter? This paper investigates how prompt design and batching strategies can be optimised to balance coding accuracy, processing time, and environmental impact in team-based healthcare simulation debriefing. Using a dataset of 11,647 utterances coded across 6 dialogue constructs, we compared 4 prompt designs across varying batch sizes, evaluating coding performance, processing time, and energy consumption, as well as the trade-offs between these metrics. Results indicate that increasing batch size improves speed and reduces energy use, but negatively impacts coding performance. Beyond demonstrating the feasibility of LLM-based qualitative analysis, this study offers practical guidance for scaling dialogue analytics in contexts where timeliness, privacy, and sustainability are critical.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper reports an empirical study comparing four prompt designs and varying batch sizes for LLM-based coding of team dialogue in healthcare simulation debriefings. On a fixed dataset of 11,647 utterances labeled across six dialogue constructs, it measures coding performance, processing time, and energy consumption, finding that larger batch sizes improve speed and reduce energy use while degrading coding accuracy. The work positions this as practical guidance for scalable, timely, private, and sustainable dialogue analytics in educational settings.

Significance. If the reported accuracy-speed-energy trade-offs are robust, the study usefully shifts LLM dialogue-coding research from pure accuracy replication toward deployment-relevant constraints in real-time healthcare training. It supplies concrete empirical observations on batching effects for one dataset and model family, which could inform prompt and infrastructure choices where latency and carbon cost matter.

major comments (3)

[Abstract / Results] Directional claims that increasing batch size 'negatively impacts coding performance' are presented without statistical tests, confidence intervals, error bars, or exact numeric deltas (e.g., F1 or accuracy drops per batch size). This absence makes the central trade-off observation unverifiable from the reported text.
[Methods / Evaluation] No human inter-rater reliability baselines (e.g., Cohen's kappa or percent agreement) are supplied for the six dialogue constructs, so the absolute and relative quality of the LLM outputs cannot be contextualized against the human coding standard the paper seeks to approximate.
[Results] The paper does not specify the exact performance metric(s) used (accuracy, macro-F1, exact match, etc.) or how ground-truth labels were obtained and validated, which is load-bearing for interpreting the reported performance degradation.
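
For readers unfamiliar with the inter-rater statistic requested in the second comment, the following is a minimal sketch of Cohen's kappa for two coders, using the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance. The function name and the toy labels are hypothetical and are not data from the paper.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return math.nan
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels for illustration only.
coder_1 = ["x", "x", "y", "y"]
coder_2 = ["x", "y", "y", "y"]
kappa = cohens_kappa(coder_1, coder_2)
```

Here the coders agree on three of four items (p_o = 0.75) and chance agreement is 0.5, so kappa is 0.5. Reporting such a value per construct would give the baseline against which LLM agreement could be judged.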

minor comments (2)

[Abstract] The abstract states '4 prompt designs' and '6 dialogue constructs' but does not name them; adding explicit labels or a small table would improve readability.
[Methods] Energy-consumption measurement protocol (hardware, carbon-intensity assumptions, tool used) should be stated more explicitly even if relegated to an appendix.
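
As an illustration of what an explicit protocol could specify, energy in joules is power in watts integrated over time in seconds. The sketch below applies the trapezoidal rule to sampled power readings. It is an assumption about one common approach, not the measurement method used in the paper, and the function name and sample values are hypothetical.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate sampled power (watts) over time (seconds) with the
    trapezoidal rule, giving energy in joules."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have equal length")
    if len(timestamps_s) < 2:
        raise ValueError("at least two samples are needed to integrate")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


# Hypothetical readings: a constant 100 W draw sampled over two seconds.
example_energy = energy_joules([0.0, 1.0, 2.0], [100.0, 100.0, 100.0])
```

A constant 100 W draw over two seconds yields 200 J. A full protocol would also state which component is sampled, the sampling interval, and whether idle power is subtracted.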

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights opportunities to strengthen the clarity and verifiability of our empirical results on LLM batching trade-offs. We address each major comment below and will revise the manuscript accordingly.

Referee: [Abstract / Results] Directional claims that increasing batch size 'negatively impacts coding performance' are presented without statistical tests, confidence intervals, error bars, or exact numeric deltas (e.g., F1 or accuracy drops per batch size). This absence makes the central trade-off observation unverifiable from the reported text.

Authors: We agree that the directional claims require stronger statistical support for verifiability. The full results section contains per-batch performance values, but we did not include formal tests, confidence intervals, or deltas in the abstract or summary text. In revision we will add exact F1/accuracy deltas between batch sizes, error bars on figures, and appropriate statistical tests (e.g., repeated-measures ANOVA or paired comparisons with correction) to substantiate the observed degradation.

Revision: yes

Referee: [Methods / Evaluation] No human inter-rater reliability baselines (e.g., Cohen's kappa or percent agreement) are supplied for the six dialogue constructs, so the absolute and relative quality of the LLM outputs cannot be contextualized against the human coding standard the paper seeks to approximate.

Authors: The ground-truth labels were produced by a single primary expert coder with secondary validation by the research team rather than independent parallel coding, which is why IRR statistics were not computed or reported. We will expand the Methods section with a full description of the labeling protocol, any available validation agreement figures, and an explicit discussion of how this affects interpretation of LLM performance relative to human standards.

Revision: partial

Referee: [Results] The paper does not specify the exact performance metric(s) used (accuracy, macro-F1, exact match, etc.) or how ground-truth labels were obtained and validated, which is load-bearing for interpreting the reported performance degradation.

Authors: Macro-F1 was the primary metric (chosen for its suitability to the multi-construct coding task); ground-truth labels were obtained via expert manual coding of the 11,647 utterances by healthcare simulation researchers, with a validation subset reviewed for consistency. We will state the metric explicitly in Methods and Results, add details on label acquisition and validation procedures, and clarify how performance degradation was calculated.

Revision: yes
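
Macro-F1, named in the response above as the primary metric, is the unweighted mean of per-class F1 scores, so rare codes count as much as frequent ones. The sketch below assumes one label per utterance for simplicity; whether the paper's six constructs are scored this way is not stated here. The function name and toy labels are hypothetical.

```python
from typing import Hashable, List, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either list."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    labels = set(y_true) | set(y_pred)
    scores: List[float] = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is positive because
        # each label occurs at least once in y_true or y_pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Toy labels for illustration only.
human_codes = ["a", "a", "b", "b"]
model_codes = ["a", "b", "b", "b"]
score = macro_f1(human_codes, model_codes)
```

In this toy case the per-class F1 values are 2/3 for `a` and 4/5 for `b`, giving a macro-F1 of about 0.733. Computing this per batch size is what would supply the exact deltas the referee asks for.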

Circularity Check

0 steps flagged

No significant circularity; purely empirical comparison

The paper reports an empirical study that compares four prompt designs across varying batch sizes on a fixed dataset of 11,647 utterances, measuring coding performance, processing time, and energy consumption directly from the experiments. No mathematical derivations, equations, fitted parameters presented as predictions, or self-citations that reduce the central claims to prior results by construction are present. The headline findings on accuracy-speed-energy trade-offs are stated as observed outcomes for this specific setup rather than as a universal law derived from inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the empirical observation that batch size modulates the three metrics; no free parameters are fitted to produce the result, but the work assumes LLM few-shot prompting can be tuned for practical use.

axioms (1)

Domain assumption: LLMs can approximate human qualitative coding of dialogue through few-shot prompting.
Invoked in the abstract as established by recent studies, forming the basis for testing prompt and batch optimizations.

Reference graph

Works this paper leans on

72 extracted references · 72 canonical work pages

[1]

In: StatPearls

Abulebda, K., Auerbach, M., Limaiem, F.: Debriefing techniques utilized in medical simulation. In: StatPearls. StatPearls Publishing, Treasure Island, FL (2025)

work page 2025
[2]

IEEE Access13, 5858–5870 (2025)

Algarni, A.M., Thayananthan, V.: Digital health: The cybersecurity for ai-based healthcare communication. IEEE Access13, 5858–5870 (2025). https://doi.org/10.1109/ACCESS.2025.3526666

work page doi:10.1109/access.2025.3526666 2025
[3]

McLaren, B.: Lever- aging intelligent tutoring systems to enhance project-based learning in work- force training at community colleges

An, M., Teffera, L., Mehrvarz, M., Li, B., Bogart, C., Sakr, M., M. McLaren, B.: Lever- aging intelligent tutoring systems to enhance project-based learning in work- force training at community colleges. In: Ferreira Mello, R., Rummel, N., Jivet, I., Pishtari, G., Ruipérez Valiente, J.A. (eds.) Technology Enhanced Learning for In- clusive and Equitable Qu...

work page 2024
[4]

In: Proceedings of the Eleventh ACM Conference on Learning @ Scale

Barno, E., Albaladejo-González, M., Reich, J.: Scaling generated feedback for novice teachers by sustaining teacher educators’ expertise: A design to train llms with teacher educator endorsement of generated feedback. In: Proceedings of the Eleventh ACM Conference on Learning @ Scale. p. 412–416. L@S ’24, ACM, New York, NY, USA (2024). https://doi.org/10....

work page doi:10.1145/3657604.3664677 2024
[5]

Berthelot, A., Caron, E., Jay, M., Lefèvre, L.: Understanding the environmen- tal impact of generative ai services. Commun. ACM68(7), 46–53 (Jun 2025). https://doi.org/10.1145/3725984

work page doi:10.1145/3725984 2025
[6]

In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track

Cheng, Z., Kasai, J., Yu, T.: Batch Prompting: Efficient Inference with Large Language Model APIs. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track. pp. 792–810. Association for Computational Linguistics, Singapore (2023). https://doi.org/10.18653/v1/2023.emnlp-industry.74

work page doi:10.18653/v1/2023.emnlp-industry.74 2023
[7]

In: Proceedings of the 2nd Workshop on Sustainable Computer Systems

Chien, A.A., Lin, L., Nguyen, H., Rao, V., Sharma, T., Wijayawardana, R.: Reducing the carbon impact of generative ai inference (today and in 2035). In: Proceedings of the 2nd Workshop on Sustainable Computer Systems. HotCarbon ’23, ACM, New York, NY, USA (2023). https://doi.org/10.1145/3604930.3605705

work page doi:10.1145/3604930.3605705 2035
[8]

Educational Technology & Society21(2), 273–290 (2018)

Choi, S.P.M., Lam, S.S., Li, K.C., Wong, B.T.M.: Learning analytics at low cost: At- risk student prediction with clicker data and systematic proactive interventions. Educational Technology & Society21(2), 273–290 (2018)

work page 2018
[9]

International Journal of Social Research Methodology15(6), 523–543 (2012)

Crowston, K., Allen, E.E., Heckman, R.: Using natural language processing tech- nology for qualitative data analysis. International Journal of Social Research Methodology15(6), 523–543 (2012)

work page 2012
[10]

Devlin, M.-W

Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirec- tional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)...

work page doi:10.18653/v1/n19- 2019
[11]

Medical Teacher31(7), e287–e294 (Jan 2009)

Dieckmann, P., Molin Friis, S., Lippert, A., Østergaard, D.: The art and science of debriefing in simulation: Ideal and practice. Medical Teacher31(7), e287–e294 (Jan 2009). https://doi.org/10.1080/01421590902866218

work page doi:10.1080/01421590902866218 2009
[12]

In: 2024 IEEE 15th International Green and Sustainable Computing Conference (IGSC)

Ding, Y., Shi, T.: Sustainable llm serving: Environmental implications, chal- lenges, and opportunities : Invited paper. In: 2024 IEEE 15th International Green and Sustainable Computing Conference (IGSC). pp. 37–38 (2024). https://doi.org/10.1109/IGSC64514.2024.00016

work page doi:10.1109/igsc64514.2024.00016 2024
[13]

In: Proceedings of the 14th Learning Analytics and Knowledge Conference

Echeverria, V., Yan, L., Zhao, L., Abel, S., Alfredo, R., Dix, S., Jaggard, H., Wother- spoon, R., Osborne, A., Buckingham Shum, S., Gasevic, D., Martinez-Maldonado, R.: TeamSlides: A Multimodal Teamwork Analytics Dashboard for Teacher-guided Reflection in a Physical Learning Space. In: Proceedings of the 14th Learning Analytics and Knowledge Conference. ...

work page doi:10.1145/3636555.3636857 2024
[14]

In: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems

Echeverria, V., Zhao, L., Alfredo, R., Milesi, M.E., Jin, Y., Abel, S., Fan, J.X., Yan, L., Dix, S., Wotherspoon, R., Li, X., Jaggard, H.A., Osborne, A., Buckingham Shum, S., Gasevic, D., Martinez-Maldonado, R.: TeamVision: An AI-powered Learning Analytics System for Supporting Reflection in Team-based Healthcare Simulation. In: Proceedings of the 2025 CH...

work page doi:10.1145/3706598.3713395 2025
[15]

Measuring the environmental impact of delivering AI at Google Scale.arXiv preprint arXiv:2508.15734, 2025

Elsworth, C., Huang, K., Patterson, D., Schneider, I., Sedivy, R., Goodman, S., Townsend, B., Ranganathan, P., Dean, J., Vahdat, A., Gomes, B., Manyika, J.: Measuring the environmental impact of delivering AI at Google Scale (Aug 2025). https://doi.org/10.48550/arXiv.2508.15734

work page doi:10.48550/arxiv.2508.15734 2025
[16]

Educational Technology & Society28(4), 166–182 (October 2025)

Erdoğdu, F., Kara, M., Gökoğlu, S., Telci, S.: Trends and insights in cscl research from the emergence to the present: A review through bibliometric and latent dirichlet allocation analyses. Educational Technology & Society28(4), 166–182 (October 2025)

work page 2025
[17]

Simulation in Healthcare: The Journal of the Society for Simulation in Healthcare 2(2), 115–125 (2007)

Fanning, R.M., Gaba, D.M.: The role of debriefing in simulation-based learning. Simulation in Healthcare: The Journal of the Society for Simulation in Healthcare 2(2), 115–125 (2007). https://doi.org/10.1097/SIH.0b013e3180315539

work page doi:10.1097/sih.0b013e3180315539 2007
[18]

Educational Psychologist48(1), 9–24 (2013)

Fransen, J., Weinberger, A., Kirschner, P.A.: Team effectiveness and team development in cscl. Educational Psychologist48(1), 9–24 (2013). https://doi.org/10.1080/00461520.2012.747947

work page doi:10.1080/00461520.2012.747947 2013
[19]

In: Proceedings of the 14th Learning Analytics and Knowledge Conference

Garg, R., Han, J., Cheng, Y., Fang, Z., Swiecki, Z.: Automated discourse analysis via generative artificial intelligence. In: Proceedings of the 14th Learning Analytics and Knowledge Conference. p. 814–820. LAK ’24, ACM, New York, NY, USA (2024). https://doi.org/10.1145/3636555.3636879

work page doi:10.1145/3636555.3636879 2024
[20]

TechTrends59(1), 64–71 (2015)

Gašević, D., Dawson, S., Siemens, G.: Let’s not forget: Learning analytics are about learning. TechTrends59(1), 64–71 (2015). https://doi.org/10.1007/s11528- 014-0822-x

work page doi:10.1007/s11528- 2015
[21]

Transactions of the Association for Computational Linguistics11, 351–366 (Apr 2023)

Gekhman, Z., Oved, N., Keller, O., Szpektor, I., Reichart, R.: On the robust- ness of dialogue history representation in conversational question answer- ing: A comprehensive study and a new prompt-based method. Transactions of the Association for Computational Linguistics11, 351–366 (Apr 2023). https://doi.org/10.1162/tacl_a_00549

work page doi:10.1162/tacl_a_00549 2023
[22]

doi: 10.1038/s41586-025-09422-z

Guo, D., Yang, D., Zhang, H., Song, J., et al.: DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature645(8081), 633–638 (Sep 2025). https://doi.org/10.1038/s41586-025-09422-z

work page doi:10.1038/s41586-025-09422-z 2025
[23]

In: Cress, U., Rosé, C., Wise, A.F., Oshima, J

Hmelo-Silver, C.E., Jeong, H.: An overview of cscl methods. In: Cress, U., Rosé, C., Wise, A.F., Oshima, J. (eds.) International Handbook of Computer-Supported Col- laborative Learning, Computer-Supported Collaborative Learning Series, vol. 19. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-65291-3_4

work page doi:10.1007/978-3-030-65291-3_4 2021
[24]

In: Proceedings of the Eleventh ACM Conference on Learning @ Scale

Hutt, S., Hieb, G.: Scaling up mastery learning with generative ai: Explor- ing how generative ai can assist in the generation and evaluation of mas- tery quiz questions. In: Proceedings of the Eleventh ACM Conference on Learning @ Scale. p. 310–314. L@S ’24, ACM, New York, NY, USA (2024). https://doi.org/10.1145/3657604.3664699

work page doi:10.1145/3657604.3664699 2024
[25]

In: Proc

Inie, N., Falk, J., Selvan, R.: How co2stly is chi? the carbon footprint of generative ai in hci research and what we should do about it. In: Proc. of the CHI Conference on Human Factors in Computing Systems (CHI ’25). pp. 1–29. ACM, New York, NY, USA (2025)

work page 2025
[26]

International Journal of Computer-Supported Collaborative Learning9(3), 305–334 (2014)

Jeong, H., Hmelo-Silver, C.E., Yu, Y.: An examination of cscl methodological practices and the influence of theoretical frameworks 2005–2009. International Journal of Computer-Supported Collaborative Learning9(3), 305–334 (2014). https://doi.org/10.1007/s11412-014-9198-3

work page doi:10.1007/s11412-014-9198-3 2005
[27]

Proceedings of the VLDB Endowment18(7), 2172–2184 (Mar 2025)

Ji, Z., Wang, X., Luo, Z., Xie, Z., Zhang, M.: Optimized Batch Prompting for Cost-Effective LLMs. Proceedings of the VLDB Endowment18(7), 2172–2184 (Mar 2025). https://doi.org/10.14778/3734839.3734853

work page doi:10.14778/3734839.3734853 2025
[28]

In: Proceedings of the Eleventh ACM Conference on Learning @ Scale

Jin, Y., Yu, J.: Optimizing mentor-student communication using llm-based auto- mated labeling information states. In: Proceedings of the Eleventh ACM Confer- ence on Learning @ Scale. p. 284–288. L@S ’24, ACM, New York, NY, USA (2024). https://doi.org/10.1145/3657604.3664691

work page doi:10.1145/3657604.3664691 2024
[29]

Richard Landis and Gary G

Landis, J.R., Koch, G.G.: The Measurement of Observer Agreement for Categorical Data. Biometrics33(1), 159 (Mar 1977). https://doi.org/10.2307/2529310

work page doi:10.2307/2529310 1977
[30]

In: 2025 7th International Conference on Computer Science and Technologies in Education (CSTE)

Li, M., Qin, W., Tang, Z., Fang, X., He, T., Cao, X.: Automating ssrl detec- tion in asynchronous ocl via llms. In: 2025 7th International Conference on Computer Science and Technologies in Education (CSTE). pp. 548–551 (2025). https://doi.org/10.1109/CSTE64638.2025.11092245

work page doi:10.1109/cste64638.2025.11092245 2025
[31]

Future in Educational Researchn/a(n/a) (2025)

Liao, J., Sun, F., Liu, Y., Hu, Y.: Deepseek in education: Exploring the transfor- mative potential of ai-driven educational intelligence. Future in Educational Researchn/a(n/a) (2025). https://doi.org/10.1002/fer3.70022

work page doi:10.1002/fer3.70022 2025
[32]

Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang

Liu, N.F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., Liang, P.: Lost in the middle: How language models use long contexts. Transac- tions of the Association for Computational Linguistics12, 157–173 (02 2024). https://doi.org/10.1162/tacl_a_00638

work page doi:10.1162/tacl_a_00638 2024
[33]

Journal of Learning Analytics12(1), 169–185 (2025)

Liu, X., Zambrano, A.F., Baker, R.S., Barany, A., Ocumpaugh, J., Zhang, J., Pankiewicz, M., Nasiar, N., Wei, Z.: Qualitative coding with gpt-4: Where it works better. Journal of Learning Analytics12(1), 169–185 (2025). https://doi.org/10.18608/jla.2025.8575

work page doi:10.18608/jla.2025.8575 2025
[34]

ACM Trans

Martinez-Maldonado, R., Echeverria, V., Fernandez-Nieto, G., Yan, L., Zhao, L., Alfredo, R., Li, X., Dix, S., Jaggard, H., Wotherspoon, R., Osborne, A., Shum, S.B., Gašević, D.: Lessons learnt from a multimodal learning analytics de- ployment in-the-wild. ACM Trans. Comput.-Hum. Interact.31(1) (Nov 2023). https://doi.org/10.1145/3622784

work page doi:10.1145/3622784 2023
[35]

Computers in Human Behavior 71, 327–342 (2017)

Martinez-Maldonado, R., Goodyear, P., Carvalho, L., Thompson, K., Hernandez- Leo, D., Dimitriadis, Y., Prieto, L.P., Wardak, D.: Supporting collaborative design activity in a multi-user digital design ecology. Computers in Human Behavior 71, 327–342 (2017). https://doi.org/10.1016/j.chb.2017.01.055

work page doi:10.1016/j.chb.2017.01.055 2017
[36]

In: Proceedings of the Seventh International Learning Analytics & Knowledge Conference

Martinez-Maldonado, R., Power, T., Hayes, C., Abdiprano, A., Vo, T., Axisa, C., Buckingham Shum, S.: Analytics meet patient manikins: challenges in an authen- tic small-group healthcare simulation classroom. In: Proceedings of the Seventh International Learning Analytics & Knowledge Conference. p. 90–94. LAK ’17, ACM, New York, NY, USA (2017). https://doi...

work page doi:10.1145/3027385.3027401 2017
[37]

Journal of Computer Assisted Learning36(5), 741–762 (2020)

Martinez-Maldonado, R., Schulte, J., Echeverria, V., Gopalan, Y., Shum, S.B.: Where is the teacher? digital analytics for classroom proxemics. Journal of Computer Assisted Learning36(5), 741–762 (2020). https://doi.org/10.1111/jcal.12444

work page doi:10.1111/jcal.12444 2020
[38]

Biochemia medica22(3), 276–282 (2012).https://doi.org/10.11613/BM.2012.031

McHugh, M.L.: Interrater reliability: The kappa statistic. Biochemia Medica22(3), 276–282 (2012). https://doi.org/10.11613/BM.2012.031

work page doi:10.11613/bm.2012.031 2012
[39]

In: Proceedings of the Twelfth ACM Confer- ence on Learning @ Scale

Mehta, S., Srivastava, N., Liu, X., Vanacore, K., Baker, R.S.: Do mooc conver- sations matter? investigating the role of social presence and course-relevant discussion in career advancement. In: Proceedings of the Twelfth ACM Confer- ence on Learning @ Scale. p. 236–240. L@S ’25, ACM, New York, NY, USA (2025). https://doi.org/10.1145/3698205.3733930 L@S ’...

work page doi:10.1145/3698205.3733930 2025
[40]

Journal of Nursing Management17(2), 247–255 (Mar 2009)

Miller, K., Riley, W., Davis, S.: Identifying key nursing and team behaviours to achieve high reliability. Journal of Nursing Management17(2), 247–255 (Mar 2009). https://doi.org/10.1111/j.1365-2834.2009.00978.x

work page doi:10.1111/j.1365-2834.2009.00978.x 2009
[41]

In: Proceed- ings of the Eleventh ACM Conference on Learning @ Scale

Moore, S., Schmucker, R., Mitchell, T., Stamper, J.: Automated generation and tagging of knowledge components from multiple-choice questions. In: Proceed- ings of the Eleventh ACM Conference on Learning @ Scale. p. 122–133. L@S ’24, ACM, New York, NY, USA (2024). https://doi.org/10.1145/3657604.3662030

work page doi:10.1145/3657604.3662030 2024
[42]

In: Proceedings of the 13th International Conference on, Intelligent Systems Application to Power Systems

Ngatchou, P., Zarei, A., El-Sharkawi, A.: Pareto multi objective op- timization. In: Proceedings of the 13th International Conference on, Intelligent Systems Application to Power Systems. pp. 84–91 (2005). https://doi.org/10.1109/ISAP.2005.1599245

work page doi:10.1109/isap.2005.1599245 2005
[43]

In: Proceedings of the Eleventh ACM Conference on Learning @ Scale

Nguyen, H., Stott, N., Allan, V.: Comparing feedback from large language mod- els and instructors: Teaching computer science at scale. In: Proceedings of the Eleventh ACM Conference on Learning @ Scale. p. 335–339. L@S ’24, ACM, New York, NY, USA (2024). https://doi.org/10.1145/3657604.3664660

work page doi:10.1145/3657604.3664660 2024
[44]

In: Proceedings of the Twelfth ACM Conference on Learning @ Scale

Nie, A., Chandak, Y., Suzara, M., Malik, A., Woodrow, J., Peng, M., Sahami, M., Brunskill, E., Piech, C.: The gpt surprise: Offering large language model chat in a massive coding class reduced engagement but may increase adopters’ exam performances. In: Proceedings of the Twelfth ACM Conference on Learning @ Scale. p. 376–380. L@S ’25, ACM, New York, NY, ...

work page doi:10.1145/3698205.3733960 2025
[45]

In: Proceedings of the Tenth ACM Conference on Learning @ Scale

Ouhaichi, H., Spikol, D., Vogel, B.: Rethinking mmla: Design considerations for multimodal learning analytics systems. In: Proceedings of the Tenth ACM Conference on Learning @ Scale. p. 354–359. L@S ’23, ACM, New York, NY, USA (2023). https://doi.org/10.1145/3573051.3596186

work page doi:10.1145/3573051.3596186 2023
[46]

In: Proceedings of the Third (2016) ACM Conference on Learning @ Scale

Papathoma, T., Ferguson, R., Littlejohn, A., Coe, A.: Making the production of learning at scale more open and flexible. In: Proceedings of the Third (2016) ACM Conference on Learning @ Scale. p. 273–276. L@S ’16, ACM, New York, NY, USA (2016). https://doi.org/10.1145/2876034.2893432

work page doi:10.1145/2876034.2893432 2016
[47]

In: Proceedings of the 40th International Conference on Machine Learning

Radford, A., Kim, J.W., Xu, T., Brockman, G., McLeavey, C., Sutskever, I.: Robust speech recognition via large-scale weak supervision. In: Proceedings of the 40th International Conference on Machine Learning. ICML’23 (2023)

work page 2023
[48]

In: Henriksen, K., Battles, J.B., Keyes, M.A., Grady, M.L

Riley, W., Hansen, H., Gürses, A.P., Davis, S., Miller, K., Priester, R.: The nature, characteristics and patterns of perinatal critical events teams. In: Henriksen, K., Battles, J.B., Keyes, M.A., Grady, M.L. (eds.) Advances in Patient Safety: New Directions and Alternative Approaches (Vol. 3: Performance and Tools). Agency for Healthcare Research and Qu...

work page 2008
[49]

Journal of Continuing Education in the Health Professions32(4), 243–254 (2012)

Rosen, M.A., Hunt, E.A., Pronovost, P.J., Federowicz, M.A., Weaver, S.J.: In Situ Simulation in Continuing Education for the Health Care Professions: A Systematic Review. Journal of Continuing Education in the Health Professions32(4), 243–254 (2012). https://doi.org/10.1002/chp.21152

work page doi:10.1002/chp.21152 2012
[50]

American Psychologist73(4), 593–600 (2018)

Salas, E., Reyes, D.L., McDaniel, S.H.: The science of teamwork: Progress, re- flections, and the road ahead. American Psychologist73(4), 593–600 (2018). https://doi.org/10.1037/amp0000334

[51]

Samaraweera, S., Zhao, L., Echeverria, V., Alfredo, R., Chen, G., Davis, J., Leonny, S., Sevenhuysen, S., Connell, C., Gasevic, D., Martinez-Maldonado, R., Dharmaratne, A.: From formal learning to professional practice: Automated llm-based coding and visualisation of team dialogue in in-situ healthcare simulation. In: Proceedings of the 16th Internati...

[52]

Sarangi, S.: Editorial: Team work and team talk as distributed and coordinated action in healthcare delivery. Communication & Medicine 13(1), 1–7 (2017). https://doi.org/10.1558/cam.32569

[53]

Sivarajkumar, S., Kelley, M., Samolyk-Mazzanti, A., Visweswaran, S., Wang, Y.: An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing: Algorithm Development and Validation Study. JMIR Medical Informatics 12, e55318 (Apr 2024). https://doi.org/10.2196/55318

[54]

Southwell, R., Pugh, S., Perkoff, E.M., Clevenger, C., Bush, J., Lieber, R., Ward, W., Foltz, P., D’Mello, S.: Challenges and Feasibility of Automatic Speech Recognition for Modeling Student Collaborative Discourse in Classrooms. In: Mitrovic, A., Bosch, N. (eds.) Proceedings of the 15th International Conference on Educational Data Mining. Zenodo ...

[55]

Stadler, W.: Multicriteria Optimization in Engineering and in the Sciences, vol. 37. Springer Science & Business Media (1988)

[56]

Stahl, G.: Group Cognition: Computer Support for Building Collaborative Knowledge. MIT Press, Cambridge, MA (2006). https://doi.org/10.7551/mitpress/3372.001.0001

[57]

Stenseth, H.V., Steindal, S.A., Solberg, M.T., Ølnes, M.A., Sørensen, A.L., Strandell-Laine, C., Olaussen, C., Farsjø Aure, C., Pedersen, I., Zlamal, J., Gue Martini, J., Bresolin, P., Linnerud, S.C.W., Nes, A.A.G.: Simulation-Based Learning Supported by Technology to Enhance Critical Thinking in Nursing Students: Scoping Review. Journal of Medical Int...

[58]

Sun, R., Marshall, D.C., Sykes, M.C., Maruthappu, M., Shalhoub, J.: The impact of improving teamwork on patient outcomes in surgery: A systematic review. International Journal of Surgery 53, 171–177 (2018). https://doi.org/10.1016/j.ijsu.2018.03.044

[59]

Tscholl, D.W., Ebensperger, M., Rahrisch, A., Wang, H., Heckel, H., Thomasius, M., Kaserer, A., Grande, B., Seelandt, J.C., Kolbe, M.: Generative AI in simulation debriefings: An exploratory study using the Team-FIRST framework and qualitative feedback from simulation experts and learners. Advances in Simulation (Jan 2026). https://doi.org/10.11...

[60]

Wang, D., Chen, G.: Evaluating the use of bert and llama to analyse classroom dialogue for teachers’ learning of dialogic pedagogy. British Journal of Educational Technology 56(6), 2671–2704 (2025). https://doi.org/10.1111/bjet.13604

[61]

Wang, D., Tao, Y., Chen, G.: Artificial intelligence in classroom discourse: A systematic review of the past decade. International Journal of Educational Research 123, 102275 (2024)

[62]

Wang, D., Yang, C., Chen, G.: Using lora to fine-tune large language models for analyzing collaborative argumentation in classrooms. In: Proceedings of the Twelfth ACM Conference on Learning @ Scale. p. 207–211. L@S ’25, ACM, New York, NY, USA (2025). https://doi.org/10.1145/3698205.3733924

[63]

White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with chatgpt. In: Proceedings of the 30th Conference on Pattern Languages of Programs. PLoP ’23, The Hillside Group, USA (2023)

[64]

Yan, L., Gašević, D., Echeverria, V., Zhao, L., Jin, Y., Li, X., Martinez-Maldonado, R.: In Sync or Out of Sync? Understanding Stress and Learning Performance in Collaborative Healthcare Simulations through Physiological Synchrony and Arousal. International Journal of Artificial Intelligence in Education 35(4), 2421–2452 (Dec 2025). https://doi.org/10.100...

[65]

Yang, H., Alozie, N., Rachmatullah, A.: Collaboration at scale: Exploring member role changing patterns in collaborative science problem-solving tasks. In: Proceedings of the Ninth ACM Conference on Learning @ Scale. p. 309–312. L@S ’22, ACM, New York, NY, USA (2022). https://doi.org/10.1145/3491140.3528319

[66]

You, J., Chung, J.W., Chowdhury, M.: Zeus: Understanding and optimizing GPU energy consumption of DNN training. In: USENIX NSDI (2023)

[67]

Zambrano, A.F., Liu, X., Barany, A., Baker, R.S., Kim, J., Nasiar, N.: From ncoder to chatgpt: From automated coding to refining human coding. In: Arastoopour Irgens, G., Knight, S. (eds.) Advances in Quantitative Ethnography, Communications in Computer and Information Science, vol. 1895. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-47014-1_32

[68]

Zhang, A.G., Tang, X., Oney, S., Chen, Y.: Cflow: Supporting semantic flow analysis of students’ code in programming problems at scale. In: Proceedings of the Eleventh ACM Conference on Learning @ Scale. p. 188–199. L@S ’24, ACM, New York, NY, USA (2024). https://doi.org/10.1145/3657604.3662025

[69]

Zhao, L., Echeverria, V., Swiecki, Z., Yan, L., Alfredo, R., Li, X., Gasevic, D., Martinez-Maldonado, R.: Epistemic network analysis for end-users: Closing the loop in the context of multimodal analytics for collaborative team learning. In: Proceedings of the 14th Learning Analytics and Knowledge Conference. p. 90–100. LAK ’24, ACM, New York, NY, USA ...

[70]

Zhao, L., Gašević, D., Swiecki, Z., Li, Y., Lin, J., Sha, L., Yan, L., Alfredo, R., Li, X., Martinez-Maldonado, R.: Towards automated transcribing and coding of embodied teamwork communication through multimodal learning analytics. British Journal of Educational Technology 55(4), 1673–1702 (Jul 2024). https://doi.org/10.1111/bjet.13476

[71]

Zhao, L., Swiecki, Z., Gasevic, D., Yan, L., Dix, S., Jaggard, H., Wotherspoon, R., Osborne, A., Li, X., Alfredo, R., Martinez-Maldonado, R.: Mets: Multimodal learning analytics of embodied teamwork learning. In: LAK23: 13th International Learning Analytics and Knowledge Conference. p. 186–196. LAK2023, ACM, New York, NY, USA (2023). https://doi.org/10.11...

[72]

Zhu, Y., Gu, J.C., Sikora, C., Ko, H., Liu, Y., Lin, C.C., Shu, L., Luo, L., Meng, L., Liu, B., Chen, J.: Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection (2024). https://doi.org/10.48550/ARXIV.2405.16178

Received 16 February 2026


[45] [45]

In: Proceedings of the Tenth ACM Conference on Learning @ Scale

Ouhaichi, H., Spikol, D., Vogel, B.: Rethinking mmla: Design considerations for multimodal learning analytics systems. In: Proceedings of the Tenth ACM Conference on Learning @ Scale. p. 354–359. L@S ’23, ACM, New York, NY, USA (2023). https://doi.org/10.1145/3573051.3596186

work page doi:10.1145/3573051.3596186 2023

[46] [46]

In: Proceedings of the Third (2016) ACM Conference on Learning @ Scale

Papathoma, T., Ferguson, R., Littlejohn, A., Coe, A.: Making the production of learning at scale more open and flexible. In: Proceedings of the Third (2016) ACM Conference on Learning @ Scale. p. 273–276. L@S ’16, ACM, New York, NY, USA (2016). https://doi.org/10.1145/2876034.2893432

work page doi:10.1145/2876034.2893432 2016

[47] [47]

In: Proceedings of the 40th International Conference on Machine Learning

Radford, A., Kim, J.W., Xu, T., Brockman, G., McLeavey, C., Sutskever, I.: Robust speech recognition via large-scale weak supervision. In: Proceedings of the 40th International Conference on Machine Learning. ICML’23 (2023)

work page 2023

[48] [48]

In: Henriksen, K., Battles, J.B., Keyes, M.A., Grady, M.L

Riley, W., Hansen, H., Gürses, A.P., Davis, S., Miller, K., Priester, R.: The nature, characteristics and patterns of perinatal critical events teams. In: Henriksen, K., Battles, J.B., Keyes, M.A., Grady, M.L. (eds.) Advances in Patient Safety: New Directions and Alternative Approaches (Vol. 3: Performance and Tools). Agency for Healthcare Research and Qu...

work page 2008

[49] [49]

Journal of Continuing Education in the Health Professions32(4), 243–254 (2012)

Rosen, M.A., Hunt, E.A., Pronovost, P.J., Federowicz, M.A., Weaver, S.J.: In Situ Simulation in Continuing Education for the Health Care Professions: A Systematic Review. Journal of Continuing Education in the Health Professions32(4), 243–254 (2012). https://doi.org/10.1002/chp.21152

work page doi:10.1002/chp.21152 2012

[50] [50]

American Psychologist73(4), 593–600 (2018)

Salas, E., Reyes, D.L., McDaniel, S.H.: The science of teamwork: Progress, re- flections, and the road ahead. American Psychologist73(4), 593–600 (2018). https://doi.org/10.1037/amp0000334

work page doi:10.1037/amp0000334 2018

[51] [51]

In: Pro- ceedings of the 16th International Learning Analytics & Knowledge Conference (LAK ’26)

Samaraweera, S., Zhao, L., Echeverria, V., Alfredo, R., Chen, G., Davis, J., Leonny, S., Sevenhuysen, S., Connell, C., Gasevic, D., Martinez-Maldonado, R., Dhar- maratne, A.: From formal learning to professional practice: Automated llm-based coding and visualisation of team dialogue in in-situ healthcare simulation. In: Pro- ceedings of the 16th Internati...

work page 2026

[52] [52]

Communication & Medicine13(1), 1–7 (2017)

Sarangi, S.: Editorial: Team work and team talk as distributed and coordinated action in healthcare delivery. Communication & Medicine13(1), 1–7 (2017). https://doi.org/10.1558/cam.32569

work page doi:10.1558/cam.32569 2017

[53] [53]

JMIR Medical Informatics12, e55318 (Apr 2024)

Sivarajkumar, S., Kelley, M., Samolyk-Mazzanti, A., Visweswaran, S., Wang, Y.: An Empirical Evaluation of Prompting Strategies for Large Language Mod- els in Zero-Shot Clinical Natural Language Processing: Algorithm Develop- ment and Validation Study. JMIR Medical Informatics12, e55318 (Apr 2024). https://doi.org/10.2196/55318

work page doi:10.2196/55318 2024

[54] [54]

Southwell, R., Pugh, S., E. Margaret Perkoff, Clevenger, C., Bush, J., Lieber, R., Ward, W., Foltz, P., D’Mello, S.: Challenges and Feasibility of Auto- matic Speech Recognition for Modeling Student Collaborative Discourse in Classrooms. In: Mitrovic, A., Bosch, N. (eds.) Proceedings of the 15th International Conference on Educational Data Mining. Zenodo ...

work page doi:10.5281/zenodo.6853109 2022

[55] [55]

Stadler, W.: Multicriteria Optimization in Engineering and in the Sciences, vol. 37. Springer Science & Business Media (1988)

work page 1988

[56] [56]

MIT Press, Cambridge, MA (2006)

Stahl, G.: Group Cognition: Computer Support for Building Collaborative Knowledge. MIT Press, Cambridge, MA (2006). https://doi.org/10.7551/mitpress/3372.001.0001

work page doi:10.7551/mitpress/3372.001.0001 2006

[57] [57]

Journal of Medical Internet Research27, e58744 (Feb 2025)

Stenseth, H.V., Steindal, S.A., Solberg, M.T., Ølnes, M.A., Sørensen, A.L., Strandell- Laine, C., Olaussen, C., Farsjø Aure, C., Pedersen, I., Zlamal, J., Gue Mar- tini, J., Bresolin, P., Linnerud, S.C.W., Nes, A.A.G.: Simulation-Based Learning Supported by Technology to Enhance Critical Thinking in Nursing Students: Scoping Review. Journal of Medical Int...

work page doi:10.2196/58744 2025

[58] [58]

International Journal of Surgery53, 171–177 (2018)

Sun, R., Marshall, D.C., Sykes, M.C., Maruthappu, M., Shalhoub, J.: The impact of improving teamwork on patient outcomes in surgery: A sys- tematic review. International Journal of Surgery53, 171–177 (2018). https://doi.org/10.1016/j.ijsu.2018.03.044

work page doi:10.1016/j.ijsu.2018.03.044 2018

[59] [59]

Advances in Simulation (Jan 2026)

Tscholl, D.W., Ebensperger, M., RahrischRahrisch, A., Wang, H., Heckel, H., Thomasius, M., Kaserer, A., Grande, B., Seelandt, J.C., Kolbe, M.: Generative AI in simulation debriefings: An exploratory study using the Team-FIRST frame- work and qualitative feedback from simulation experts and learners. Advances in Simulation (Jan 2026). https://doi.org/10.11...

work page doi:10.1186/s41077-026-00407-0 2026

[60] [60]

British Journal of Educational Technology56(6), 2671–2704 (2025)

Wang, D., Chen, G.: Evaluating the use of bert and llama to anal- yse classroom dialogue for teachers’ learning of dialogic pedagogy. British Journal of Educational Technology56(6), 2671–2704 (2025). https://doi.org/https://doi.org/10.1111/bjet.13604

work page doi:10.1111/bjet.13604 2025

[61] [61]

International Journal of Educational Research 123, 102275 (2024)

Wang, D., Tao, Y., Chen, G.: Artificial intelligence in classroom discourse: A sys- tematic review of the past decade. International Journal of Educational Research 123, 102275 (2024)

work page 2024

[62] [62]

In: Proceedings of the Twelfth ACM Conference on Learning @ Scale

Wang, D., Yang, C., Chen, G.: Using lora to fine-tune large language models for analyzing collaborative argumentation in classrooms. In: Proceedings of the Twelfth ACM Conference on Learning @ Scale. p. 207–211. L@S ’25, ACM, New York, NY, USA (2025). https://doi.org/10.1145/3698205.3733924

work page doi:10.1145/3698205.3733924 2025

[63]

White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J., Schmidt, D.C.: A prompt pattern catalog to enhance prompt engineering with chatgpt. In: Proceedings of the 30th Conference on Pattern Languages of Programs. PLoP ’23, The Hillside Group, USA (2023)


[64]

Yan, L., Gašević, D., Echeverria, V., Zhao, L., Jin, Y., Li, X., Martinez-Maldonado, R.: In Sync or Out of Sync? Understanding Stress and Learning Performance in Collaborative Healthcare Simulations through Physiological Synchrony and Arousal. International Journal of Artificial Intelligence in Education 35(4), 2421–2452 (Dec 2025). https://doi.org/10.100...

work page doi:10.1007/s40593-025-00475-9 2025

[65]

Yang, H., Alozie, N., Rachmatullah, A.: Collaboration at scale: Exploring member role changing patterns in collaborative science problem-solving tasks. In: Proceedings of the Ninth ACM Conference on Learning @ Scale. p. 309–312. L@S ’22, ACM, New York, NY, USA (2022). https://doi.org/10.1145/3491140.3528319


[66]

You, J., Chung, J.W., Chowdhury, M.: Zeus: Understanding and optimizing GPU energy consumption of DNN training. In: USENIX NSDI (2023)


[67]

Zambrano, A.F., Liu, X., Barany, A., Baker, R.S., Kim, J., Nasiar, N.: From ncoder to chatgpt: From automated coding to refining human coding. In: Arastoopour Irgens, G., Knight, S. (eds.) Advances in Quantitative Ethnography, Communications in Computer and Information Science, vol. 1895. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-47014-1_32


[68]

Zhang, A.G., Tang, X., Oney, S., Chen, Y.: Cflow: Supporting semantic flow analysis of students’ code in programming problems at scale. In: Proceedings of the Eleventh ACM Conference on Learning @ Scale. p. 188–199. L@S ’24, ACM, New York, NY, USA (2024). https://doi.org/10.1145/3657604.3662025


[69]

Zhao, L., Echeverria, V., Swiecki, Z., Yan, L., Alfredo, R., Li, X., Gasevic, D., Martinez-Maldonado, R.: Epistemic network analysis for end-users: Closing the loop in the context of multimodal analytics for collaborative team learning. In: Proceedings of the 14th Learning Analytics and Knowledge Conference. p. 90–100. LAK ’24, ACM, New York, NY, USA ...

work page doi:10.1145/3636555.3636855 2024

[70]

Zhao, L., Gašević, D., Swiecki, Z., Li, Y., Lin, J., Sha, L., Yan, L., Alfredo, R., Li, X., Martinez-Maldonado, R.: Towards automated transcribing and coding of embodied teamwork communication through multimodal learning analytics. British Journal of Educational Technology 55(4), 1673–1702 (Jul 2024). https://doi.org/10.1111/bjet.13476


[71]

Zhao, L., Swiecki, Z., Gasevic, D., Yan, L., Dix, S., Jaggard, H., Wotherspoon, R., Osborne, A., Li, X., Alfredo, R., Martinez-Maldonado, R.: Mets: Multimodal learning analytics of embodied teamwork learning. In: LAK23: 13th International Learning Analytics and Knowledge Conference. p. 186–196. LAK2023, ACM, New York, NY, USA (2023). https://doi.org/10.11...

work page doi:10.1145/3576050.3576076 2023

[72]

Zhu, Y., Gu, J.C., Sikora, C., Ko, H., Liu, Y., Lin, C.C., Shu, L., Luo, L., Meng, L., Liu, B., Chen, J.: Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection (2024). https://doi.org/10.48550/ARXIV.2405.16178

Received 16 February 2026
