
arxiv: 2602.13241 · v2 · pith:L2WBPEUMnew · submitted 2026-01-30 · 💻 cs.CY · cs.AI · cs.HC

Empowering 9-1-1 Calltaking Training with Generative AI: Experiences and Lessons Learned

Pith reviewed 2026-05-25 07:07 UTC · model grok-4.3

classification 💻 cs.CY · cs.AI · cs.HC
keywords generative AI · 911 call-taker training · emergency communications · public safety systems · AI deployment lessons · safety-critical training · real-world evaluation · organizational processes

The pith

A generative AI training system for 9-1-1 call-takers was deployed at scale in Nashville, producing four lessons on design and governance drawn from real operations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports on a partnership to create and run a generative AI system that trains emergency call-takers to handle 911 calls. Traditional methods require hundreds of hours of one-on-one time and cannot keep up with staffing shortages above 25 percent. The system reached 190 users across 1,120 sessions in six months, and logs of 98,429 interactions plus organizational records were examined to surface challenges in delivery, accuracy, reliability, and human fit that lab tests miss. Each challenge is paired with specific practices for building and overseeing such systems. Readers would care because the guidance is taken directly from live use in a safety-critical public service rather than from simulations.

Core claim

Deployment of the GenAI-powered call-taking training system under real-world constraints scaled from pilot to 190 operational users and 1,120 sessions; analysis of 98,429 user interactions, organizational processes, and stakeholder patterns distilled four key lessons on system delivery, rigor, resilience, and human factors, each paired with concrete design and governance practices for safety-critical public sector environments.
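To give a sense of the scale behind these totals, the snippet below turns them into per-user and per-session averages. It is plain arithmetic on the figures quoted above. The averages are derived here and are not reported by the paper, and the per-session figure assumes that every logged interaction belongs to one of the 1,120 sessions, which the source does not confirm.

```python
# Totals quoted in the paper's abstract.
USERS = 190
SESSIONS = 1_120
INTERACTIONS = 98_429


def per_unit(total: int, units: int) -> float:
    """Average of `total` over `units`, rounded to one decimal place."""
    return round(total / units, 1)


# Derived averages (assumption: all logged interactions occurred in sessions).
sessions_per_user = per_unit(SESSIONS, USERS)
interactions_per_session = per_unit(INTERACTIONS, SESSIONS)

print(f"sessions per user: {sessions_per_user}")
print(f"interactions per session: {interactions_per_session}")
```

These are means only; they say nothing about how usage was spread across individual users.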

What carries the argument

The six-month live deployment of the GenAI call-taking training system together with systematic review of its usage logs and organizational patterns to extract lessons.

If this is right

  • AI training systems can expand coverage and speed feedback without pulling experienced staff off active duty for every new hire.
  • Challenges around delivery, rigor, resilience, and human factors only become visible after systems move from controlled tests into daily operations.
  • Concrete design choices must be paired with governance rules to keep the training aligned with safety requirements.
  • Practitioners in other constrained public-sector settings can use the same log-analysis approach to surface their own hidden obstacles.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same pattern of surfacing hidden challenges through live logs could be applied to AI training tools in fields such as healthcare dispatch or aviation ground operations.
  • Organizations could test whether adopting the four paired practices shortens the 720-hour training cycle or lowers the 25 percent staffing gap.
  • Repeating the deployment and analysis in centers with different call volumes or technology stacks would show how much the lessons depend on local conditions.
  • Long-term use may require ongoing monitoring of how the system interacts with existing shift schedules and performance metrics.

Load-bearing premise

The challenges seen in this single Metro Nashville deployment are representative of those that will appear in other public safety organizations that face similar staffing and training limits.

What would settle it

A second deployment in a different emergency communications center that encounters none of the four reported challenges or finds that the recommended design and governance practices produce no measurable improvement in training outcomes.

Figures

Figures reproduced from arXiv: 2602.13241 by Meiyi Ma, Yilin Liu, Zirong Chen.

Figure 1: Workstation view of the deployed training system at a municipal 9-1-1 communications center. We address this gap through a longitudinal deployment of a GenAI-powered training system within Metro Nashville Department of Emergency Communications (MNDEC), see photos captured in routine training in …
Figure 2: System workflow showing training assignment, cloud-based caller simulation, real-time …
Figure 3: Continuously iterative design-develop-deploy workflow. The three phases operate concur…
Figure 4: Dispute rates (noted as ‘phantom error’), in which users attribute mistakes to the system, differ across experience and performance levels (114 trainees, 1,120 completed sessions). 3.3.1 Observations. When training systems perform evaluative functions under pressure, users experiencing difficulty may attribute failures to technology rather than their own performance as a psychological defense mechanism [29, 44]. Thi…
Figure 5: Task complexity versus performance and dispute rates (940 sessions with 12+ turns).
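The dispute rates shown in Figure 4 are grouped by experience and performance level. As a rough sketch of how such a grouped rate could be computed from session records, the example below divides disputed errors by flagged errors within each group. The `Session` record, its field names, the group labels, and the sample values are all hypothetical, and the paper's actual definition of a dispute rate may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Session:
    group: str     # hypothetical label, e.g. an experience level
    flagged: int   # errors the system flagged in this session
    disputed: int  # flagged errors the trainee attributed to the system


def dispute_rates(sessions):
    """Return disputed/flagged per group, skipping groups with no flagged errors."""
    flagged = defaultdict(int)
    disputed = defaultdict(int)
    for s in sessions:
        flagged[s.group] += s.flagged
        disputed[s.group] += s.disputed
    return {g: disputed[g] / flagged[g] for g in flagged if flagged[g] > 0}


# Made-up records for illustration only; not data from the paper.
example = [
    Session("novice", flagged=10, disputed=4),
    Session("novice", flagged=10, disputed=2),
    Session("experienced", flagged=8, disputed=1),
    Session("idle", flagged=0, disputed=0),
]
rates = dispute_rates(example)
print(rates)
```

Pooling counts per group before dividing weights each flagged error equally. Averaging per-session rates instead would weight each session equally and can give a different picture.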
Original abstract

Emergency call-takers form the first operational link in public safety response, handling over 240 million calls annually while facing a sustained training crisis: staffing shortages exceed 25% in many centers, and preparing a single new hire can require up to 720 hours of one-on-one instruction that removes experienced personnel from active duty. Traditional training approaches struggle to scale under these constraints, limiting both coverage and feedback timeliness. In partnership with Metro Nashville Department of Emergency Communications (MNDEC), we designed, developed, and deployed a GenAI-powered call-taking training system under real-world constraints. Over six months, deployment scaled from initial pilot to 190 operational users across 1,120 training sessions, exposing systematic challenges around system delivery, rigor, resilience, and human factors that remain largely invisible in controlled or purely simulated evaluations. By analyzing deployment logs capturing 98,429 user interactions, organizational processes, and stakeholder engagement patterns, we distill four key lessons, each coupled with concrete design and governance practices. These lessons provide grounded guidance for researchers and practitioners seeking to deliver AI-driven training systems in safety-critical public sector environments where practical constraints fundamentally shape human-centric design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper describes a six-month real-world deployment of a generative AI-powered training system for 9-1-1 call-takers at Metro Nashville Department of Emergency Communications (MNDEC). It reports scaling from pilot to 190 users and 1,120 sessions, with analysis of 98,429 user interactions, organizational processes, and stakeholder patterns used to identify four key lessons on challenges in system delivery, rigor, resilience, and human factors. Each lesson is paired with concrete design and governance practices intended to guide similar AI training deployments in safety-critical public-sector settings.

Significance. If the lessons hold and transfer, the work supplies rare empirical detail on operational frictions that arise only after deployment at scale in a high-stakes domain, complementing controlled studies by documenting how staffing shortages, existing LMS constraints, and union rules interact with GenAI training tools.

major comments (2)
  1. [Abstract / Methods (lesson distillation)] Abstract and the section describing lesson extraction: the central claim that four lessons were distilled from the 98,429-interaction logs rests on an unstated analytical process; no description is given of coding procedures, inter-rater reliability, controls for selection bias in reported challenges, or validation steps against raw logs or stakeholder transcripts.
  2. [Abstract] Abstract: the assertion that the lessons supply 'grounded guidance' for other safety-critical public-sector environments is load-bearing for the contribution, yet the manuscript contains no cross-site data, no explicit transferability conditions, and no sensitivity checks showing which lessons depend on MNDEC-specific factors such as local staffing ratios or call-volume profile.
minor comments (2)
  1. [Abstract] The abstract states deployment scaled to 190 operational users but does not clarify whether this figure includes only active call-takers or also supervisors and trainers.
  2. [Results / Discussion] No table or figure summarizes the four lessons alongside the supporting log-derived evidence or the paired design practices; such a summary would improve traceability.
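Major comment 1 asks about inter-rater reliability. For readers unfamiliar with the term, one common measure is Cohen's kappa, which compares the observed agreement between two coders with the agreement expected by chance. The sketch below is a generic illustration with invented labels. Nothing in the source indicates that the authors computed this or any other reliability statistic.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Invented labels for eight log excerpts; not data from the paper.
coder_a = ["delivery", "delivery", "rigor", "rigor",
           "human", "human", "human", "delivery"]
coder_b = ["delivery", "delivery", "rigor", "human",
           "human", "human", "human", "delivery"]
kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.3f}")
```

Here the coders agree on seven of eight items, but kappa is lower than the raw 0.875 agreement because some of that agreement would be expected by chance.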

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. We address each major comment below, indicating where we will revise the paper and where limitations are inherent to the single-site study design.

  1. Referee: [Abstract / Methods (lesson distillation)] Abstract and the section describing lesson extraction: the central claim that four lessons were distilled from the 98,429-interaction logs rests on an unstated analytical process; no description is given of coding procedures, inter-rater reliability, controls for selection bias in reported challenges, or validation steps against raw logs or stakeholder transcripts.

    Authors: We agree that the analytical process used to distill the four lessons requires explicit description. The lessons emerged from iterative team review of deployment logs, organizational records, and stakeholder meeting notes, with challenges identified through pattern recognition across the 98,429 interactions and cross-referenced with operational constraints reported by MNDEC staff. No formal qualitative coding protocol with inter-rater reliability metrics was applied, as the process relied on consensus among the co-authors who had direct access to the deployment. In the revised manuscript we will add a dedicated subsection in Methods that documents the steps taken, including how raw logs were sampled, how selection of reported challenges was reviewed for bias, and the absence of formal IRR procedures. This addition will also note the limitations of the approach. revision: yes

  2. Referee: [Abstract] Abstract: the assertion that the lessons supply 'grounded guidance' for other safety-critical public-sector environments is load-bearing for the contribution, yet the manuscript contains no cross-site data, no explicit transferability conditions, and no sensitivity checks showing which lessons depend on MNDEC-specific factors such as local staffing ratios or call-volume profile.

    Authors: The referee correctly notes that the study is a single-site deployment and therefore cannot supply cross-site empirical validation or sensitivity analyses across varying staffing ratios or call volumes. We will revise the abstract to moderate the phrasing around 'grounded guidance' and add a new Limitations and Transferability section that explicitly lists MNDEC-specific factors (staffing shortages exceeding 25%, existing LMS constraints, union rules) and states the conditions under which the lessons are most likely to apply (public-safety agencies facing comparable training bottlenecks and regulatory environments). No new data collection is possible, but the added section will provide readers with clearer boundaries for generalization. revision: partial

standing simulated objections not resolved
  • We cannot supply cross-site data, multi-site validation, or quantitative sensitivity checks across different staffing or call-volume profiles, as the work reports a single six-month deployment at MNDEC.

Circularity Check

0 steps flagged

No circularity: observational case study with no derivations or fitted constructs


The paper is a six-month deployment study of a GenAI training system at one site (MNDEC). It reports 98,429 logged interactions and distills four lessons from logs, processes, and stakeholder patterns. No equations, parameters, predictions, or derivations appear; lessons are presented as direct outputs of the observed data rather than constructs that reduce to their own inputs by definition or self-citation. The single-site nature raises external-validity questions but does not create circularity in the reported analysis chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical deployment study with no mathematical model, free parameters, axioms, or postulated entities; relies entirely on observed logs and stakeholder input from one partnership.



Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages

  1. [1]

    A systematic review on fostering appropriate trust in human-ai interaction: Trends, opportunities and challenges

    Yasmeen Alufaisan, Layan Alzahrani, Jian Zhou, and Murat Kantarcioglu. A systematic review on fostering appropriate trust in human-ai interaction: Trends, opportunities and challenges. ACM Journal on Responsible Computing, 1(4):1–35, 2024

  2. [2]

    Why does the effective context length of llms fall short?, 2024

    Chenxin An, Jun Zhang, Ming Zhong, Lei Li, Shansan Gong, Yao Luo, Jingjing Xu, and Lingpeng Kong. Why does the effective context length of llms fall short?, 2024. arXiv preprint

  3. [3]

    Minimum training standards for public safety telecommunicators (apco ans 3.103.2-2015)

    APCO International. Minimum training standards for public safety telecommunicators (apco ans 3.103.2-2015). American national standard, APCO International, Daytona Beach, FL,

  4. [4]

    ANSI-approved standard

  5. [5]

    Standard for the establishment of a quality assurance and quality improvement program for public safety answering points (apco/nena ans 1.107.1-2015)

    APCO International and National Emergency Number Association (NENA). Standard for the establishment of a quality assurance and quality improvement program for public safety answering points (apco/nena ans 1.107.1-2015). American national standard, APCO International; National Emergency Number Association (NENA), 2015

  6. [6]

    Chain-of-thought reasoning in the wild is not always faithful

    Iván Arcuschin, Jett Janiak, Robert Krzyzanowski, Senthooran Rajamanoharan, Neel Nanda, and Arthur Conmy. Chain-of-thought reasoning in the wild is not always faithful. ICLR 2025 Workshop on Reasoning and Planning for Large Language Models, 2025

  7. [7]

    Runtime assurance from signal temporal logic safety specifications

    Luke Baird and Samuel Coogan. Runtime assurance from signal temporal logic safety specifications. In 2023 American Control Conference (ACC), pages 3535–3540, San Diego, CA, USA, 2023. IEEE

  8. [8]

    Socio-technical systems: From design methods to systems engineering

    Gordon Baxter and Ian Sommerville. Socio-technical systems: From design methods to systems engineering. Interacting with Computers, 23(1):4–17, 2011

  9. [9]

    Artificial intelligence in emergency medicine: Viewpoint of current applications and foreseeable opportunities and challenges

    Matthias Beham, Carmen Vlad, and Sonja Reuter. Artificial intelligence in emergency medicine: Viewpoint of current applications and foreseeable opportunities and challenges. Journal of Medical Internet Research, 25:e40031, 2023

  10. [10]

    Robert A. Bjork. Memory and metamemory considerations in the training of human beings. In Janet Metcalfe and Arthur P. Shimamura, editors, Metacognition: Knowing About Knowing, pages 185–205. MIT Press, Cambridge, MA, 1994

  11. [11]

    Language models are few-shot learners

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020

  12. [12]

    Zana Buçinca, Maja Barbara Malaya, and Krzysztof Z. Gajos. To trust or to think: Cognitive forcing functions can reduce overreliance on ai in ai-assisted decision-making. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW1):1–21, 2021

  13. [13]

    Overcoming the challenges of collaboratively adopting artificial intelligence in the public sector

    Averill Campion, Mila Gasco-Hernandez, Slava Jankin Mikhaylov, and Marc Esteve. Overcoming the challenges of collaboratively adopting artificial intelligence in the public sector. Social Science Computer Review, 40(2):462–477, 2022

  14. [14]

    Deep learning with edge computing: A review

    Jianyu Chen and Xiang Ran. Deep learning with edge computing: A review. Proceedings of the IEEE, 107(8):1655–1674, 2019

  15. [15]

    Logidebrief: A signal-temporal logic based automated debriefing approach with large language models integration

    Zirong Chen, Ziyan An, Jennifer Reynolds, Kristin Mullen, Stephen Martini, and Meiyi Ma. Logidebrief: A signal-temporal logic based automated debriefing approach with large language models integration. In James Kwok, editor, Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, IJCAI-25, pages 9582–9590. International...

  16. [16]

    Sim911: Towards effective and equitable 9-1-1 dispatcher training with an llm-enabled simulation

    Zirong Chen, Elizabeth Chason, Noah Mladenovski, Erin Wilson, Kristin Mullen, Stephen Martini, and Meiyi Ma. Sim911: Towards effective and equitable 9-1-1 dispatcher training with an llm-enabled simulation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 27896–27904, 2025

  17. [17]

    Cityspec: An intelligent assistant system for requirement specification in smart cities

    Zirong Chen, Isaac Li, Haoxiang Zhang, Sarah Preum, John A Stankovic, and Meiyi Ma. Cityspec: An intelligent assistant system for requirement specification in smart cities. In 2022 IEEE International Conference on Smart Computing (SMARTCOMP), pages 32–39. IEEE, 2022

  18. [18]

    Cityspec with shield: A secure intelligent assistant for requirement formalization

    Zirong Chen, Isaac Li, Haoxiang Zhang, Sarah Preum, John A Stankovic, and Meiyi Ma. Cityspec with shield: A secure intelligent assistant for requirement formalization. Pervasive and Mobile Computing, 92:101802, 2023

  19. [19]

    Auto311: A confidence-guided automated system for non-emergency calls

    Zirong Chen, Xutong Sun, Yuanhe Li, and Meiyi Ma. Auto311: A confidence-guided automated system for non-emergency calls. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 21967–21975, 2024

  20. [20]

    Teacher, trainer, counsel, spy: How generative ai can bridge or widen the gaps in worker-centric digital phenotyping of wellbeing

    Vedant Das Swain and Koustuv Saha. Teacher, trainer, counsel, spy: How generative ai can bridge or widen the gaps in worker-centric digital phenotyping of wellbeing. In Proceedings of the 3rd Annual Meeting of the Symposium on Human-Computer Interaction for Work, CHIWORK ’24, New York, NY, USA, 2024. ACM

  21. [21]

    The participatory turn in ai design: Theoretical foundations and the current state of practice

    Fernando Delgado, Stephen Yang, Michael Madaio, and Qian Yang. The participatory turn in ai design: Theoretical foundations and the current state of practice. In Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO ’23), pages 1–23, New York, NY, USA, 2023. ACM

  22. [22]

    Robust online monitoring of signal temporal logic

    Jyotirmoy V. Deshmukh, Alexandre Donzé, Shromona Ghosh, Xiaoqing Jin, Garvit Juniwal, and Sanjit A. Seshia. Robust online monitoring of signal temporal logic. In Runtime Verification, volume 9333 of Lecture Notes in Computer Science, pages 55–70. Springer, Cham, 2015

  23. [23]

    Exploring context window of large language models via decomposed positional vectors

    Zican Dong, Junyi Li, Xin Men, Xin Zhao, Bingning Wang, Zhen Tian, Ji-Rong Wen, et al. Exploring context window of large language models via decomposed positional vectors. Advances in Neural Information Processing Systems, 37:10320–10347, 2024

  24. [24]

    Upol Ehsan, Koustuv Saha, Munmun De Choudhury, and Mark O. Riedl. Charting the sociotechnical gap in explainable ai: A framework to address the gap in xai. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW1):1–32, 2023

  25. [25]

    Equity within ai systems: What can health leaders expect?

    Christo El Morr. Equity within ai systems: What can health leaders expect? Healthcare Management Forum, 36(2):119–124, 2023

  26. [26]

    Exploring the use of a large language model...

    Eury Hong, Sundes Kazmir, Benjamin Dylik, Rajaram Bellan, William Frey, Sina Ardestani, Ikhlas Al-Hosni, Anna Mattana, Emily Carlson, Margaret Kirk, Megan Fitzwater, Rebecca Goldstein, Dany Furness, Nicola Kydes, Kim Recker, Katie Dilger, Erik Doyle, Traci A. Walbrink, Arielle Shibi Rosen, and Isabel T. Gross. Exploring the use of a large language model...

  27. [27]

    America’s 911 workforce is in crisis

    International Academy of Emergency Dispatch and National Association of State 911 Administrators. America’s 911 workforce is in crisis. Technical report, IAED & NASNA, 2023

  28. [28]

    The global landscape of ai ethics guidelines

    Anna Jobin, Marcello Ienca, and Effy Vayena. The global landscape of ai ethics guidelines. Nature Machine Intelligence, 1(9):389–399, 2019

  29. [29]

    Productive failure

    Manu Kapur. Productive failure. Cognition and Instruction, 26(3):379–424, 2008

  30. [30]

    Stressing the person: Legal and everyday person attributions under stress

    Jennifer T. Kubota, Reihane Mojdehbakhsh, Clarissa I. Cortland, and Elizabeth A. Phelps. Stressing the person: Legal and everyday person attributions under stress. Biological Psychology, 103:117–124, 2014

  31. [31]

    Retrieval-augmented generation for knowledge-intensive nlp tasks

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems, 33:9459–9474, 2020

  32. [32]

    Lost in the middle: How language models use long contexts

    Nelson F Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12:157–173, 2024

  33. [33]

    Google cloud platform

    Google LLC. Google cloud platform. https://cloud.google.com/, 2025. Accessed: 2025-11-11

  34. [34]

    Who should i trust: Ai or myself? leveraging human and ai correctness likelihood to promote appropriate trust in ai-assisted decision-making

    Shuai Ma, Ying Lei, Xinru Wang, Chengbo Zheng, Chuhan Shi, Ming Yin, and Xiaojuan Ma. Who should i trust: Ai or myself? leveraging human and ai correctness likelihood to promote appropriate trust in ai-assisted decision-making. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI ’23, pages 1–19, New York, NY, USA, 2023. ACM

  35. [35]

    Adapting training in real time: An empirical test of adaptive difficulty schedules

    Matthew D. Marraffino, Bradford L. Schroeder, Nicholas W. Fraulini, Wendi L. Van Buskirk, and Cheryl I. Johnson. Adapting training in real time: An empirical test of adaptive difficulty schedules. Military Psychology, 33(3):136–151, 2021

  36. [36]

    Protocol: Simulation training to improve 9-1-1 dispatcher identification of cardiac arrest

    Hendrika Meischke, Ian Painter, Anne M. Turner, Marcia R. Weaver, Carol E. Fahrenbruch, Brooke R. Ike, and Scott Stangenes. Protocol: Simulation training to improve 9-1-1 dispatcher identification of cardiac arrest. BMC Emergency Medicine, 16(1):9, 2016

  37. [37]

    Implementing AI in the public sector

    Ines Mergel, Helen Dickinson, Jari Stenvall, and Mila Gascó. Implementing AI in the public sector. Public Management Review, Special Issue editorial, 2024

  38. [38]

    Harnessing ai’s potential to lift up underserved communities

    Pooja Mittal, Mohamed Jalloh, and Traco Matthews. Harnessing ai’s potential to lift up underserved communities. California Health Care Foundation, April 2025

  39. [39]

    The story of socio-technical design: Reflections on its successes, failures and potential

    Enid Mumford. The story of socio-technical design: Reflections on its successes, failures and potential. Information Systems Journal, 16(4):317–342, 2006

  40. [40]

    Ai risk management framework (ai rmf 1.0)

    National Institute of Standards and Technology. Ai risk management framework (ai rmf 1.0). Technical report, U.S. Department of Commerce, 2023

  41. [41]

    Exploring artificial intelligence adoption in public organizations: A comparative case study

    Oliver Neumann, Kyrillos Guirguis, and Roland Steiner. Exploring artificial intelligence adoption in public organizations: A comparative case study. Public Management Review, 26(5):1–23, 2024

  42. [42]

    Gpt-4o-audio-preview (version 2024-12-17)

    OpenAI. Gpt-4o-audio-preview (version 2024-12-17). Preview model documentation, 2024. Supports audio-in/audio-out for text and audio modalities

  43. [43]

    Reflections: Sociotechnical systems design and organization change

    William A. Pasmore, Stu Winby, Susan A. Mohrman, and Richard Vanasse. Reflections: Sociotechnical systems design and organization change. The Journal of Change Management, 19(2):67–85, 2019

  44. [44]

    Janne Riikonen, Pia Laukkanen-Nevala, Ilkka Virkkunen, Veronica Lindström, and Joonas Pappinen. Differences between the dispatch priority assessments of emergency medical dispatchers and emergency medical services: A prospective register-based study in finland. Scandinavian Journal of Trauma, Resuscitation and Emergency Medicine, 31(1):8, 2023

  45. [45]

    The intuitive psychologist and his shortcomings: Distortions in the attribution process

    Lee Ross. The intuitive psychologist and his shortcomings: Distortions in the attribution process. In Leonard Berkowitz, editor, Advances in Experimental Social Psychology, volume 10, pages 173–220. Academic Press, New York, NY, 1977

  46. [46]

    Association of neighborhood characteristics with bystander-initiated cpr

    Comilla Sasson, David J. Magid, Paul Chan, Elisabeth D. Root, Bryan F. McNally, Arthur L. Kellermann, and Jason S. Haukoos. Association of neighborhood characteristics with bystander-initiated cpr. New England Journal of Medicine, 367(17):1607–1615, 2012

  47. [47]

    Edge computing: Vision and challenges

    Weisong Shi, Jie Cao, Quan Zhang, Youhuizi Li, and Lanyu Xu. Edge computing: Vision and challenges. IEEE Internet of Things Journal, 3(5):637–646, 2016

  48. [48]

    Cognitive load during problem solving: Effects on learning

    John Sweller. Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2):257–285, 1988

  49. [49]

    Democratizing AI in public administration: Implementing U.S. federal AI guidelines with maximum feasible participation

    Michael J. Taylor, Christopher A. Zoda, Kyle L. Rasmussen, Anna M. Washington, Devansh Saxena, and Iheoma U. Ogbonnaya-Ogburu. Democratizing AI in public administration: Implementing U.S. federal AI guidelines with maximum feasible participation. AI & Society, 40(5):3653–3662, 2025

  50. [50]

    Llama: Open and efficient foundation language models, 2023

    Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models, 2023. arXiv preprint, v1

  51. [51]

    Michelangelo: Long context evaluations beyond haystacks via latent structure queries, 2024

    Kiran Vodrahalli, Santiago Ontanon, Nilesh Tripuraneni, Kelvin Xu, Sanil Jain, Rakesh Shivanna, Jeffrey Hui, Nishanth Dikkala, Mehran Kazemi, Bahare Fatemi, et al. Michelangelo: Long context evaluations beyond haystacks via latent structure queries, 2024. arXiv preprint

  52. [52]

    Mind in Society: The Development of Higher Psychological Processes

    Lev S. Vygotsky. Mind in Society: The Development of Higher Psychological Processes. Harvard University Press, Cambridge, MA, 1978

  53. [53]

    Chain-of-thought prompting elicits reasoning in large language models

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35:24824–24837, 2022

  54. [54]

    Mastering symbolic operations: Augmenting language models with compiled neural networks

    Yixuan Weng, Minjun Zhu, Fei Xia, Bin Li, Shizhu He, Kang Liu, and Jun Zhao. Mastering symbolic operations: Augmenting language models with compiled neural networks. In Proceedings of the Twelfth International Conference on Learning Representations (ICLR), Vienna, Austria, 2024. OpenReview

  55. [55]

    Measuring and understanding trust calibrations for automated systems: A survey of the state-of-the-art and future directions

    Magdalena Wischnewski, Nicole Krämer, and Emmanuel Müller. Measuring and understanding trust calibrations for automated systems: A survey of the state-of-the-art and future directions. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI ’23, pages 1–23, New York, NY, USA, 2023. ACM

  56. [56]

    From role-play to drama-interaction: An llm solution

    Weiqi Wu, Hongqiu Wu, Lai Jiang, Xingyuan Liu, Hai Zhao, and Min Zhang. From role-play to drama-interaction: An llm solution. In Findings of the Association for Computational Linguistics: ACL 2024, pages 3271–3290, Bangkok, Thailand, 2024. Association for Computational Linguistics

  57. [57]

    Edge intelligence: Paving the last mile of artificial intelligence with edge computing

    Zhi Zhou, Xu Chen, En Li, Liekang Zeng, Ke Luo, and Junshan Zhang. Edge intelligence: Paving the last mile of artificial intelligence with edge computing. Proceedings of the IEEE, 107(8):1738–1762, 2019

Natural Language Rules → Formalized Specifications
Call-taker asks for the address in the first τ1 turns. → DETECT ω[0,τ1] a, ‘ask address’
Caller provides...
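The natural-language rule quoted above, "Call-taker asks for the address in the first τ1 turns", can be illustrated with a simple bounded-window check over a transcript. The sketch below is a plain-Python stand-in and not the paper's formal specification language. The `(speaker, intent)` turn format, the label names, and the choice to count every turn (caller and call-taker alike) toward τ1 are all assumptions made for illustration.

```python
from typing import List, Tuple

# One turn of a call: (speaker, intent label). Both fields are hypothetical.
Turn = Tuple[str, str]


def asks_address_within(turns: List[Turn], tau1: int) -> bool:
    """True if the call-taker asks for the address in the first `tau1` turns."""
    window = turns[:tau1]  # assumption: every turn counts toward tau1
    return any(
        speaker == "call_taker" and intent == "ask_address"
        for speaker, intent in window
    )


# Invented transcript for illustration only.
call = [
    ("call_taker", "greeting"),
    ("caller", "report_incident"),
    ("call_taker", "ask_address"),
    ("caller", "give_address"),
]
print(asks_address_within(call, tau1=3))
print(asks_address_within(call, tau1=2))
```

In this invented call the address question arrives on the third turn, so the rule holds for τ1 = 3 and fails for τ1 = 2.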